Abstract
Applications composed of multiple parallel libraries perform poorly when those libraries interfere with one another by obliviously using the same physical cores, leading to destructive resource oversubscription. This paper presents the design and implementation of Lithe, a low-level substrate that provides the basic primitives and a standard interface for composing parallel codes efficiently. Lithe can be inserted underneath the runtimes of legacy parallel libraries to provide bolt-on composability without needing to change existing application code. Lithe can also serve as the foundation for building new parallel abstractions and libraries that automatically interoperate with one another.
In this paper, we show that versions of Threading Building Blocks (TBB) and OpenMP ported to Lithe perform competitively with their original implementations. Furthermore, for two applications composed of multiple parallel libraries, we show that leveraging our substrate outperforms their original, even expertly tuned, implementations.
Composing parallel software efficiently with Lithe
PLDI '10: Proceedings of the 31st ACM SIGPLAN Conference on Programming Language Design and Implementation