research-article

Adaptive Process Migrations in Coupled Applications for Exchanging Data in Local File Cache

Published: 31 July 2018

Abstract

Many problems in science and engineering are modeled as a set of mutually interacting component models, resulting in a coupled or multiphysics application. These component models pose challenges that stem from their interdisciplinary nature and from their computational and algorithmic complexity. Because the models are generally developed and maintained independently, they commonly exchange their data through the global file system in the coupled application.

To make effective use of the local file cache on a compute node for exchanging data among the processes of such applications, and thereby boost I/O performance, this article presents a novel mechanism that migrates a process from one compute node to another on the basis of block I/O dependency. In the proposed mechanism, the block I/O dependency between two processes running on different nodes is profiled as block access similarity, measured with Cohen’s kappa statistic. A process is then dynamically migrated from its source node to the destination node that hosts another process with which it has a heavy block I/O dependency. As a result, both processes can exchange their data through the local file cache instead of the global file system, which reduces I/O time. The experimental results demonstrate that I/O performance can be significantly improved and that the time required to execute the application decreases accordingly.
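
As a rough illustration of the profiling step, the sketch below computes the textbook Cohen's kappa over two processes' block access records and then selects a candidate pair for migration. It is a minimal sketch under stated assumptions, not the article's implementation: the abstract does not give the exact formulation, so the binary accessed/not-accessed encoding, the function names `cohen_kappa` and `pick_migration`, the `node_of` placement map, and the 0.6 threshold are all hypothetical.

```python
from itertools import combinations
from typing import Dict, Iterable, Optional, Set, Tuple


def cohen_kappa(blocks_a: Iterable[int], blocks_b: Iterable[int],
                total_blocks: int) -> float:
    """Cohen's kappa between two binary block-access vectors.

    Assumption: each of the `total_blocks` blocks is labeled "accessed" or
    "not accessed" by each process, and block ids lie in range(total_blocks).
    """
    a: Set[int] = set(blocks_a)
    b: Set[int] = set(blocks_b)
    n = total_blocks
    both = len(a & b)                      # blocks touched by both processes
    neither = n - len(a | b)               # blocks touched by neither process
    if n <= 0 or neither < 0:
        raise ValueError("total_blocks must cover every accessed block")

    p_o = (both + neither) / n             # observed agreement
    p_a, p_b = len(a) / n, len(b) / n
    p_e = p_a * p_b + (1 - p_a) * (1 - p_b)  # agreement expected by chance
    if p_e >= 1.0:
        # Kappa is undefined when both vectors are constant; treating this
        # case as "no evidence of dependency" is a choice made for this sketch.
        return 0.0
    return (p_o - p_e) / (1 - p_e)


def pick_migration(accesses: Dict[str, Set[int]], node_of: Dict[str, str],
                   total_blocks: int,
                   threshold: float = 0.6) -> Optional[Tuple[str, str, float]]:
    """Return the cross-node process pair with the highest kappa, if any.

    The threshold is a hypothetical cut-off for "heavy" dependency.
    """
    best: Optional[Tuple[str, str, float]] = None
    for p, q in combinations(sorted(accesses), 2):
        if node_of[p] == node_of[q]:
            continue                       # already share a local file cache
        k = cohen_kappa(accesses[p], accesses[q], total_blocks)
        if k >= threshold and (best is None or k > best[2]):
            best = (p, q, k)
    return best


if __name__ == "__main__":
    accesses = {"p0": {0, 1, 2, 3}, "p1": {0, 1, 2, 4}, "p2": {7, 8, 9}}
    node_of = {"p0": "n0", "p1": "n1", "p2": "n0"}
    print(pick_migration(accesses, node_of, total_blocks=20))
```

In this toy input, `p0` and `p1` run on different nodes and share most of their blocks, so they are the pair that would be co-located; which of the two processes actually moves, and when, is a policy decision the sketch leaves open.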
