Abstract
Work-stealing is a promising technique for scheduling and balancing parallel workloads. It is widely applicable to middleware, libraries, and the runtime systems of programming languages. OpenJDK uses work-stealing in its copying garbage collection (GC) to balance copying tasks among GC threads. Each thread has its own queue in which it stores tasks. When a thread has no tasks left in its queue, it acts as a thief and attempts to steal a task from another thread's queue. However, this work-stealing algorithm requires expensive memory fences for pushing, popping, and stealing tasks, and the cost is especially high on architectures with weak memory models such as POWER and ARM. To address this problem, we propose a work-stealing algorithm that uses double queues. Each GC thread has a public queue that other GC threads can access and a private queue that only the owning thread can access. Pushing and popping tasks in the private queue requires no expensive memory fences. The most significant part of our algorithm is a mechanism that maintains load balance when double queues are used. We developed a prototype implementation for the parallel GC in OpenJDK 8 for ppc64le. We evaluated our algorithm by using SPECjbb2015, SPECjvm2008, TPC-DS, and Apache DayTrader.
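The double-queue structure described in the abstract can be illustrated with a minimal Java sketch. It is not the authors' implementation (the prototype lives inside the OpenJDK 8 parallel GC), and the abstract does not specify the balancing mechanism. The class name `DoubleQueueWorker`, the `shareThreshold` parameter, and the "publish half of the private tasks when the public queue is empty" policy are all assumptions made for illustration. Checking whether the public queue is empty is still a shared-memory read, so the sketch shows the structure rather than the fence-level details.

```java
import java.util.ArrayDeque;
import java.util.concurrent.ConcurrentLinkedDeque;

// Hypothetical sketch of one worker's pair of queues; every name is illustrative.
final class DoubleQueueWorker<T> {
    // Private queue: touched only by the owning thread, so plain
    // unsynchronized accesses are enough for push and pop.
    private final ArrayDeque<T> privateQueue = new ArrayDeque<>();
    // Public queue: visible to thieves, so it uses a concurrent collection.
    private final ConcurrentLinkedDeque<T> publicQueue = new ConcurrentLinkedDeque<>();
    // Assumed balancing threshold; not a value taken from the paper.
    private final int shareThreshold;

    DoubleQueueWorker(int shareThreshold) {
        this.shareThreshold = shareThreshold;
    }

    // Owner only: add a task to the private queue.
    void push(T task) {
        privateQueue.addLast(task);
        // Assumed balancing policy: when thieves have nothing to steal and the
        // owner holds more than shareThreshold tasks, publish the oldest half.
        if (privateQueue.size() > shareThreshold && publicQueue.isEmpty()) {
            int toShare = privateQueue.size() / 2;
            for (int i = 0; i < toShare; i++) {
                publicQueue.addLast(privateQueue.pollFirst());
            }
        }
    }

    // Owner only: take the newest private task, falling back to the public queue.
    T pop() {
        T task = privateQueue.pollLast();
        return task != null ? task : publicQueue.pollLast();
    }

    // Any thread: a thief takes the oldest task from the public queue only.
    T steal() {
        return publicQueue.pollFirst();
    }

    int privateSize() {
        return privateQueue.size();
    }

    int publicSize() {
        return publicQueue.size();
    }
}
```

In this sketch the owner calls `push` and `pop`, while other threads call only `steal`. A thief never touches the private queue, which is why the owner's common path can avoid synchronized operations.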
Balanced double queues for GC work-stealing on weak memory models
ISMM 2018: Proceedings of the 2018 ACM SIGPLAN International Symposium on Memory Management