Abstract
A plethora of program analysis and optimization techniques rely on linear programming at their heart. However, such techniques are often considered too slow for production use. While today’s best solvers are optimized for complex problems with thousands of dimensions, linear programming, as used in compilers, is typically applied to small and seemingly trivial problems, but to many instances in a single compilation run. As a result, compilers do not benefit from decades of research on optimizing large-scale linear programming. We design a simplex solver targeted at compilers. A novel theory of transprecision computation applied from individual elements to full data structures provides the computational foundation. By carefully combining it with optimized representations for small and sparse matrices and specialized small-coefficient algorithms, we (1) reduce memory traffic, (2) exploit wide vectors, and (3) use low-precision arithmetic units effectively. We evaluate our work by embedding our solver into a state-of-the-art integer set library and implementing one essential operation, coalescing, on top of our transprecision solver. Our evaluation shows more than an order-of-magnitude speedup on the core simplex pivot operation and a mean speedup of 3.2x (vs. GMP) and 4.6x (vs. IMath) for the optimized coalescing operation. Our results demonstrate that our optimizations exploit the wide SIMD instructions of modern microarchitectures effectively. We expect our work to provide foundations for a future integer set library that uses transprecision arithmetic to accelerate compiler analyses.
Index Terms
Fast linear programming through transprecision computing on small and sparse data