Abstract
With core counts on the rise, the sequential components of applications are becoming the major bottleneck in performance scaling as predicted by Amdahl's law. We are therefore faced with the simultaneous problems of occupying an increasing number of cores and speeding up sequential sections. In this work, we reconcile these two seemingly incompatible problems with a novel programming model called N-way. The core idea behind N-way is to benefit from the algorithmic diversity available to express certain key computational steps. By simultaneously launching in parallel multiple ways to solve a given computation, a runtime can just-in-time pick the best (for example the fastest) way and therefore achieve speedup.
Previous work has demonstrated the benefits of such an approach but has not addressed its inherent waste. In this work, we focus on providing a mathematically sound learning-based statistical model that can be used by a runtime to determine the optimal balance between resources used and benefits obtainable through N-way. We further describe a dynamic culling mechanism to further reduce resource waste.
We present abstractions and a runtime support to cleanly encapsulate the computational-options and monitor their progress. We demonstrate a low-overhead runtime that achieves significant speedup over a range of widely used kernels. Our results demonstrate super-linear speedups in certain cases.
- J. Ansel, Y. L. Wong, C. Chan, M. Olszewski, A. Edelman, and S. Amarasinghe. Language and compiler support for auto-tuning variable-accuracy algorithms. In CGO '11. IEEE Computer Society, 2011. Google Scholar
Digital Library
- K. Asanovic et al. The landscape of parallel computing research: A view from berkeley. Technical Report UCB/EECS-2006--183, EECS Department, University of California, Berkeley, Dec 2006.Google Scholar
- E. D. Berger, T. Yang, T. Liu, and G. Novark. Grace: safe multithreaded programming for C/C+. In OOPSLA '09, pages 81--96, New York, NY, USA, 2009. ACM. Google Scholar
Digital Library
- J. Cachopo and A. Rito-Silva. Versioned boxes as the basis for memory transactions. Sci. Comput. Program., 63(2):172--185, 2006. Google Scholar
Digital Library
- CLANG: A C family frontend for LLVM. http://clang.llvm.org/, 2010.Google Scholar
- R. Cledat, T. Kumar, J. Sreeram, and S. Pande. Opportunistic computing: A new paradigm for scalable realism on many cores. In HotPar 2009: 1st USENIX Workshop on Hot Topics in Parallelism. USENIX, 2009. Google Scholar
Digital Library
- R. Cledat and S. Pande. Energy efficiency via the n-way model. In PESPMA 2010, in conjunction with ISCA. ACM, 2010.Google Scholar
- B. Cox, D. Evans, A. Filipi, J. Rowanhill, W. Hu, J. Davidson, J. Knight, A. Nguyen-tuong, and J. Hiser. N-variant systems: A secretless framework for security through diversity. In In Proceedings of the 15th USENIX Security Symposium, pages 105--120, 2006. Google Scholar
Digital Library
- Dimacs benchmarks. http://tinyurl.com/myj2m7, 2009.Google Scholar
- Y. Hamadi, S. Jabbour, and L. Sais. Manysat: Solver description. Technical Report MSR-TR-2008--83, Microsoft Research, May 2008.Google Scholar
- T. Harris and K. Fraser. Language support for lightweight transactions. In OOPSLA '03: Proceedings of the 18th annual ACM SIGPLAN conference on Object-oriented programing, systems, languages, and applications, pages 388--402, New York, NY, USA, 2003. ACM Press. Google Scholar
Digital Library
- M. D. Hill and M. R. Marty. Amdahl's law in the multicore era. IEEE COMPUTER, 2008. Google Scholar
Digital Library
- Intel haswell. http://tinyurl.com/28dxp67, 2010.Google Scholar
- Intel shows 48-core 'datacentre on a chip'. http://tinyurl.com/2fyhejo, 2010.Google Scholar
- S. K. Iyer, J. Jain, M. R. Prasad, D. Sahoo, and T. Sidle. Error detection using BMC in a parallel environment. In CHARME, pages 354--358, 2005. Google Scholar
Digital Library
- C. F. Joerg. The Cilk System for Parallel Multithreaded Computing. PhD thesis, Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, Massachusetts, Jan. 1996. Available as MIT Laboratory for Computer Science Technical Report MIT/LCS/TR-701. Google Scholar
Digital Library
- K. Knobe. Ease of use with concurrent collections (CnC). In HotPar 2009: 1st USENIX Workshop on Hot Topics in Parallelism. USENIX, 2009. Google Scholar
Digital Library
- J. J. Kuffner Jr. and S. M. Lavalle. RRT-connect: An efficient approach to single-query path planning. In Proc. IEEE Int. Conf. Robot. Autom. (ICRA), pages 995--1001, 2000.Google Scholar
Cross Ref
- M. Kulkarni, K. Pingali, B. Walter, G. Ramanarayanan, K. Bala, and L. P. Chew. Optimistic parallelism requires abstractions. In PLDI '07, pages 211--222, 2007. Google Scholar
Digital Library
- M. Luby and W. Ertel. Optimal parallelization of las vegas algorithms. In STACS '94, pages 463--474. Springer, 1994. Google Scholar
Digital Library
- S. Misailovic, S. Sidiroglou, H. Hoffmann, and M. Rinard. Quality of service profiling. In Proceedings of the 32nd ACM/IEEE International Conference on Software Engineering - Volume 1, ICSE '10, pages 25--34, New York, NY, USA, 2010. ACM. Google Scholar
Digital Library
- M. Mitzenmacher and E. Upfal. Probability and Computing. Cambridge University Press, 2005.Google Scholar
Digital Library
- R. Motwani and P. Raghavan. Randomized Algorithms. Cambridge University Press, 1995. Google Scholar
Digital Library
- D. Patterson. The trouble with multicore. http://spectrum.ieee.org/computing/software/the-trouble-with-multicore/%, July 2010. Google Scholar
Digital Library
- G. Reinelt. TSPLIB - a traveling salesman problem library. In ORSA Journal on Computing, volume 3, pages 376--384, 1991.Google Scholar
- M. Rinard. Probabilistic accuracy bounds for fault-tolerant computations that discard tasks. In Proceedings of the 20th annual international conference on Supercomputing, ICS '06, pages 324--334, New York, NY, USA, 2006. ACM. Google Scholar
Digital Library
- B. Salamat, T. Jackson, A. Gal, and M. Franz. Orchestra: intrusion detection using parallel execution and monitoring of program variants in user-space. In EuroSys '09: Proceedings of the 4th ACM European conference on Computer systems, pages 33--46, New York, NY, USA, 2009. ACM. Google Scholar
Digital Library
- B. Selman, H. Kautz, and B. Cohen. Local search strategies for satisfiability testing. In DIMACS Series in Discrete Mathematics and Theoretical Computer Science, pages 521--532, 1995.Google Scholar
- TomLab. CPLEX parameters interface. http://tomopt.com/docs/cplexug/tomlab_cplex014.php, March 2010.Google Scholar
- O. Trachsel and T. Gross. A platform for competitive execution. In PESPMA 2008, in conjunction with ISCA. ACM, 2008.Google Scholar
- O. Trachsel and T. R. Gross. Variant-based competitive parallel execution of sequential programs. In CF '10: Proceedings of the 7th ACM international conference on Computing frontiers, pages 197--206, New York, NY, USA, 2010. ACM. Google Scholar
Digital Library
- M. Tygert. A fast algorithm for computing minimal-norm solutions to underdetermined systems of linear equations. May 2009.Google Scholar
- V. Vazirani. Approximation Algorithms. Springer, 2001. Google Scholar
Digital Library
- M. Wall. GAlib. http://lancet.mit.edu/ga/, 2009.Google Scholar
- C. M. Wintersteiger, Y. Hamadi, and L. Moura. A concurrent portfolio approach to smt solving. In CAV '09, pages 715--720, Berlin, Heidelberg, 2009. Springer-Verlag. Google Scholar
Digital Library
Index Terms
Efficiently speeding up sequential computation through the n-way programming model
Recommendations
Efficiently speeding up sequential computation through the n-way programming model
OOPSLA '11: Proceedings of the 2011 ACM international conference on Object oriented programming systems languages and applicationsWith core counts on the rise, the sequential components of applications are becoming the major bottleneck in performance scaling as predicted by Amdahl's law. We are therefore faced with the simultaneous problems of occupying an increasing number of ...
Study of parallel programming models on computer clusters with Intel MIC coprocessors
Coprocessors based on the Intel Many Integrated Core MIC Architecture have been adopted in many high-performance computer clusters. Typical parallel programming models, such as MPI and OpenMP, are supported on MIC processors to achieve the parallelism. ...
Speeding Up Galois Field Arithmetic on Intel MIC Architecture
NPC 2013: Proceedings of the 10th IFIP International Conference on Network and Parallel Computing - Volume 8147Galois Field arithmetic is the basis of LRC, RS and many other erasure coding approaches. Traditional implementations of Galois Field arithmetic use multiplication tables or discrete logarithms, which limit the speed of its computation. The Intel Many ...







Comments