
Performance Analysis and Optimization of Full Garbage Collection in Memory-hungry Environments

Published: 25 March 2016

Abstract

Garbage collection (GC), especially full GC, can significantly affect overall application performance, particularly for memory-hungry applications that handle large data sets. This paper presents an in-depth analysis of the full GC performance of Parallel Scavenge (PS), a state-of-the-art garbage collector and the default in the HotSpot JVM, using traditional and big-data applications running atop the JVM on CPUs (e.g., Intel Xeon) and many-integrated-core processors (e.g., Intel Xeon Phi). The analysis shows that unnecessary memory accesses and calculations during reference updating in the compaction phase are the main causes of lengthy full GC. To address this, the paper describes an incremental query model for reference calculation, embodied in three schemes (optimistic, sort-based and region-based) for different query patterns. Performance evaluation shows that, compared with the vanilla PS collector on CPU, the incremental query model improves full GC performance by 1.9X on average (up to 2.9X), improves application throughput by 19.3% on average (up to 57.2%), and reduces pause time by 31.2%. The corresponding numbers on Xeon Phi are 2.1X (up to 3.4X), 11.1% (up to 41.2%) and 34.9%.
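
The abstract gives no implementation details, so the following Java sketch only illustrates the general idea of answering a query incrementally. It is not the paper's code. It assumes that finding a moved object's new address requires counting the live words that precede it in its region, and that liveness is recorded in a bitmap. The class name `IncrementalLiveCount`, its methods, the one-bit-per-word layout and the region size are all hypothetical.

```java
import java.util.BitSet;

// Illustrative sketch only: names, bitmap layout and region size are assumptions.
class IncrementalLiveCount {
    private final BitSet live;     // one bit per heap word; a set bit means "live"
    private final int regionSize;  // number of words per region
    private int lastRegion = -1;   // region of the previous incremental query
    private int lastIndex = 0;     // word index of the previous incremental query
    private int lastCount = 0;     // result of the previous incremental query
    long bitsScanned = 0;          // bitmap bits examined so far, for comparison

    IncrementalLiveCount(BitSet live, int regionSize) {
        this.live = live;
        this.regionSize = regionSize;
    }

    // Baseline: always count live words from the start of the region up to index.
    int liveBeforeNaive(int index) {
        int start = (index / regionSize) * regionSize;
        bitsScanned += index - start;
        return live.get(start, index).cardinality();
    }

    // Incremental: if the query falls in the same region as the previous one and
    // is not before it, count only the words between the two query points.
    int liveBeforeIncremental(int index) {
        int region = index / regionSize;
        int from = region * regionSize;
        int base = 0;
        if (region == lastRegion && index >= lastIndex) {
            from = lastIndex;
            base = lastCount;
        }
        bitsScanned += index - from;
        int count = base + live.get(from, index).cardinality();
        lastRegion = region;
        lastIndex = index;
        lastCount = count;
        return count;
    }

    public static void main(String[] args) {
        BitSet bits = new BitSet(32);
        bits.set(2); bits.set(5); bits.set(6); bits.set(20);
        IncrementalLiveCount naive = new IncrementalLiveCount(bits, 16);
        IncrementalLiveCount incr = new IncrementalLiveCount(bits, 16);
        for (int addr : new int[] {6, 7, 12, 21}) {
            System.out.println(addr + ": naive=" + naive.liveBeforeNaive(addr)
                    + " incremental=" + incr.liveBeforeIncremental(addr));
        }
        System.out.println("bits scanned: naive=" + naive.bitsScanned
                + " incremental=" + incr.bitsScanned);
    }
}
```

Both methods return the same counts. The incremental one examines fewer bitmap bits when consecutive queries land in the same region in ascending order, and falls back to a full scan otherwise. That dependence on query order may be why the abstract lists separate schemes for different query patterns.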
