Functional collection programming with semi-ring dictionaries

Published: 29 April 2022

Abstract

This paper introduces semi-ring dictionaries, a powerful class of compositional and purely functional collections that subsume other collection types such as sets, multisets, arrays, vectors, and matrices. We developed SDQL, a statically typed language that uses semi-ring dictionaries to express relational algebra with aggregations, linear algebra, and functional collections over data such as relations and matrices. Furthermore, thanks to the algebraic structure behind these dictionaries, SDQL unifies a wide range of optimizations commonly used in databases (DB) and linear algebra (LA). As a result, SDQL enables efficient processing of hybrid DB and LA workloads by combining optimizations that are otherwise confined to either DB systems or LA frameworks. We show experimentally that a handful of DB and LA workloads can take advantage of the SDQL language and its optimizations. SDQL is competitive with, or outperforms, a host of systems that are state of the art in their own domains: the in-memory DB systems Typer and Tectorwise for (flat, not nested) relational data; SciPy for LA workloads; the sparse tensor compiler taco; the Trance nested relational engine; and the in-database machine learning engines LMFAO and Morpheus for hybrid DB/LA workloads over relational data.
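
To make the idea of one dictionary type subsuming sets, multisets, and matrices more concrete, here is a minimal sketch in plain Python. It is not SDQL syntax and not the paper's implementation; the helper names `add` and `matvec` are hypothetical and chosen only for illustration. The sketch assumes that a dictionary maps keys to values drawn from a semi-ring (booleans with "or", numbers with "+", or nested dictionaries), that two dictionaries are added pointwise, and that entries equal to the zero of the value type are dropped.

```python
def add(a, b):
    """Illustrative addition for dictionary values (hypothetical helper).

    Dictionaries are added pointwise and recursively, booleans with
    logical "or", and numbers with "+". Entries that become zero
    (0, False, or an empty dictionary) are dropped from the result.
    """
    if isinstance(a, dict) and isinstance(b, dict):
        out = dict(a)
        for key, value in b.items():
            out[key] = add(out[key], value) if key in out else value
        return {k: v for k, v in out.items() if v not in (0, False, {})}
    if isinstance(a, bool) and isinstance(b, bool):
        return a or b
    return a + b


def matvec(matrix, vector):
    """Sparse matrix-vector product over nested dictionaries (hypothetical helper)."""
    out = {}
    for i, row in matrix.items():
        for j, x in row.items():
            if j in vector:
                # Each contribution is a singleton dictionary added into the result.
                out = add(out, {i: x * vector[j]})
    return out


# A multiset maps each element to its multiplicity.
bag = add({"a": 2, "b": 1}, {"a": 1, "c": 4})

# A set maps each element to True; addition acts as union.
union = add({"a": True}, {"b": True})

# A sparse matrix is a dictionary of rows, each row a dictionary of columns.
m = add({0: {1: 2.0}}, {0: {1: 3.0, 2: 1.0}})

# Matrix-vector product expressed with the same addition.
y = matvec({0: {0: 1.0, 1: 2.0}, 1: {1: 3.0}}, {0: 10.0, 1: 1.0})
```

Under these assumptions, the same `add` function serves as multiset union, set union, and sparse matrix addition, which hints at why a shared algebraic structure lets optimizations carry over between DB-style and LA-style workloads.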

References

  1. Martín Abadi, Paul Barham, Jianmin Chen, Zhifeng Chen, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Geoffrey Irving, and Michael Isard. 2016. TensorFlow: A System for Large-Scale Machine Learning.. In Proceedings of the 12th USENIX Conference on Operating Systems Design and Implementation (OSDI’16). USENIX Association, USA. 265–283.Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. Mahmoud Abo Khamis, Hung Q. Ngo, XuanLong Nguyen, Dan Olteanu, and Maximilian Schleich. 2018. In-Database Learning with Sparse Tensors. In Proceedings of the 37th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems (SIGMOD/PODS ’18). Association for Computing Machinery, New York, NY, USA. 325–340. isbn:9781450347068Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. Mahmoud Abo Khamis, Hung Q. Ngo, and Atri Rudra. 2016. FAQ: Questions Asked Frequently. In Proceedings of the 35th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems (PODS ’16). Association for Computing Machinery, New York, NY, USA. 13–28.Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. Srinivas M Aji and Robert J McEliece. 2000. The generalized distributive law. IEEE transactions on Information Theory, 46, 2 (2000), 325–343.Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. Yael Amsterdamer, Daniel Deutch, and Val Tannen. 2011. Provenance for aggregate queries. In Proceedings of the thirtieth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems. 153–164.Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. Johan Anker and Josef Svenningsson. 2013. An EDSL approach to high performance Haskell programming. In ACM Haskell Symposium. ACM, New York, NY, USA. 1–12.Google ScholarGoogle Scholar
  7. Michael Armbrust, Reynold S. Xin, Cheng Lian, Yin Huai, Davies Liu, Joseph K. Bradley, Xiangrui Meng, Tomer Kaftan, Michael J. Franklin, Ali Ghodsi, and Matei Zaharia. 2015. Spark SQL: Relational Data Processing in Spark. In Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data (SIGMOD ’15). ACM, New York, NY, USA. 1383–1394. isbn:978-1-4503-2758-9Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. Emil Axelsson, Koen Claessen, Mary Sheeran, Josef Svenningsson, David Engdal, and Anders Persson. 2011. The Design and Implementation of Feldspar an Embedded Language for Digital Signal Processing. In Proceedings of the 22Nd International Conference on Implementation and Application of Functional Languages (IFL’10). Springer-Verlag, Berlin, Heidelberg. 121–136. isbn:978-3-642-24275-5Google ScholarGoogle ScholarCross RefCross Ref
  9. R. C. Backhouse and B. A. Carré. 1975. Regular Algebra Applied to Path-finding Problems. IMA Journal of Applied Mathematics, 15, 2 (1975), 04, 161–186. issn:0272-4960Google ScholarGoogle ScholarCross RefCross Ref
  10. Brett W Bader and Tamara G Kolda. 2008. Efficient MATLAB computations with sparse and factored tensors. SIAM Journal on Scientific Computing, 30, 1 (2008), 205–231.Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. Peter Boncz, Thomas Neumann, and Orri Erling. 2014. TPC-H Analyzed: Hidden Messages and Lessons Learned from an Influential Benchmark. Springer International Publishing, Cham. 61–76. isbn:978-3-319-04936-6Google ScholarGoogle Scholar
  12. Val Breazu-Tannen, Peter Buneman, and Limsoon Wong. 1992. Naturally embedded query languages. Springer.Google ScholarGoogle Scholar
  13. Val Breazu-Tannen and Ramesh Subrahmanyam. 1991. Logical and computational aspects of programming with sets/bags/lists. Springer.Google ScholarGoogle Scholar
  14. Robert Brijder, Floris Geerts, Jan Van Den Bussche, and Timmy Weerwag. 2019. On the Expressive Power of Query Languages for Matrices. ACM Trans. Database Syst., 44, 4 (2019), Article 15, oct, 31 pages. issn:0362-5915Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. Robert Brijder, Floris Geerts, Jan Van Den Bussche, and Timmy Weerwag. 2019. On the Expressive Power of Query Languages for Matrices. ACM Trans. Database Syst., 44, 4 (2019), Article 15, oct, 31 pages. issn:0362-5915Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. Peter Buneman, Shamim Naqvi, Val Tannen, and Limsoon Wong. 1995. Principles of Programming with Complex Objects and Collection Types. Theor. Comput. Sci., 149, 1 (1995), Sept., 3–48. issn:0304-3975Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. Andrew Carlson, Justin Betteridge, Bryan Kisiel, Burr Settles, Estevam Hruschka, and Tom Mitchell. 2010. Toward an architecture for never-ending language learning. In Proceedings of the AAAI Conference on Artificial Intelligence. 24.Google ScholarGoogle ScholarCross RefCross Ref
  18. Zachary R. Chalmers, Caitlin F. Connelly, David Fabrizio, Laurie Gay, Siraj M. Ali, Riley Ennis, Alexa Schrock, Brittany Campbell, Adam Shlien, Juliann Chmielecki, Franklin Huang, Yuting He, James Sun, Uri Tabori, Mark Kennedy, Daniel S. Lieber, Steven Roels, Jared White, Geoffrey A. Otto, Jeffrey S. Ross, Levi Garraway, Vincent A. Miller, Phillip J. Stephens, and Garrett M. Frampton. 2017. Analysis of 100,000 human cancer genomes reveals the landscape of tumor mutational burden. Genome Medicine, 9, 1 (2017), 34.Google ScholarGoogle ScholarCross RefCross Ref
  19. Lingjiao Chen, Arun Kumar, Jeffrey Naughton, and Jignesh M Patel. 2017. Towards linear algebra over normalized data. Proceedings of the VLDB Endowment, 10, 11 (2017), 1214–1225.Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. James Cheney, Sam Lindley, and Philip Wadler. 2014. Query shredding: efficient relational evaluation of queries over nested multisets. In Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data. 1027–1038.Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. Charisee Chiw, Gordon Kindlmann, John Reppy, Lamont Samuels, and Nick Seltzer. 2012. Diderot: A Parallel DSL for Image Analysis and Visualization. In Proceedings of the 33rd ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI’12). ACM, 111–120.Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. Stephen Chou, Fredrik Kjolstad, and Saman Amarasinghe. 2018. Format Abstraction for Sparse Tensor Algebra Compilers. Proc. ACM Program. Lang., 2, OOPSLA (2018), Article 123, Oct., 30 pages.Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. Shumo Chu, Konstantin Weitz, Alvin Cheung, and Dan Suciu. 2017. HoTTSQL: Proving query rewrites with univalent SQL semantics. ACM SIGPLAN Notices, 52, 6 (2017), 510–524.Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. Koen Claessen, Mary Sheeran, and Bo Joel Svensson. 2012. Expressive Array Constructs in an Embedded GPU Kernel Programming Language. In Proceedings of the 7th Workshop on Declarative Aspects and Applications of Multicore Programming (DAMP ’12). ACM, NY, USA. 21–30.Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. E. F. Codd. 1970. A Relational Model of Data for Large Shared Data Banks. Commun. ACM, 13, 6 (1970), June, 377–387. issn:0001-0782Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. National Research Council (US) Committee. 2005. On the Nature of Biological Data. In Catalyzing Inquiry at the Interface of Computing and Biology, John C. Wooley and Herbert S. Lin (Eds.). National Academies Press (US).Google ScholarGoogle Scholar
  27. Keith Conrad. 2018. Tensor products. Notes of course, available on-line.Google ScholarGoogle Scholar
  28. Ezra Cooper, Sam Lindley, Philip Wadler, and Jeremy Yallop. 2007. Links: Web Programming Without Tiers. In Proceedings of the 5th International Conference on Formal Methods for Components and Objects (FMCO’06). Springer-Verlag, Berlin, Heidelberg. 266–296. isbn:3-540-74791-5, 978-3-540-74791-8Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. Thomas H Cormen, Charles E Leiserson, Ronald L Rivest, and Clifford Stein. 2009. Introduction to algorithms. MIT press.Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. Duncan Coutts, Roman Leshchinskiy, and Don Stewart. 2007. Stream Fusion. From Lists to Streams to Nothing at All. In ICFP ’07.Google ScholarGoogle Scholar
  31. Andrew Crotty, Alex Galakatos, Kayhan Dursun, Tim Kraska, Ugur Çetintemel, and Stanley B Zdonik. 2015. Tupleware:" Big" Data, Big Analytics, Small Clusters.. In CIDR.Google ScholarGoogle Scholar
  32. Stephen Dolan. 2013. Fun with Semirings: A Functional Pearl on the Abuse of Linear Algebra. In Proceedings of the 18th ACM SIGPLAN International Conference on Functional Programming (ICFP ’13). Association for Computing Machinery, New York, NY, USA. 101–110. isbn:9781450323260Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. Kento Emoto, Sebastian Fischer, and Zhenjiang Hu. 2012. Filter-embedding semiring fusion for programming with MapReduce. Formal Aspects of Computing, 24, 4 (2012), 623–645.Google ScholarGoogle ScholarDigital LibraryDigital Library
  34. Laura Fancello, Sara Gandini, Pier Giuseppe Pelicci, and Luca Mazzarella. 2019. Tumor mutational burden quantification from targeted gene panels: major advancements and challenges. Journal for ImmunoTherapy of Cancer, 7, 1 (2019), 183. isbn:2051-1426 https://doi.org/10.1186/s40425-019-0647-4 Google ScholarGoogle ScholarCross RefCross Ref
  35. Corporacion Favorita. 2017. Corp. Favorita Grocery Sales Forecasting: Can you accurately predict sales for a large grocery chain?Google ScholarGoogle Scholar
  36. Leonidas Fegaras and David Maier. 2000. Optimizing Object Queries Using an Effective Calculus. ACM Trans. Database Syst., 25, 4 (2000), Dec., 457–516. issn:0362-5915Google ScholarGoogle ScholarDigital LibraryDigital Library
  37. Xixuan Feng, Arun Kumar, Benjamin Recht, and Christopher Ré. 2012. Towards a Unified Architecture for in-RDBMS Analytics. In Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data (SIGMOD ’12). ACM, New York, NY, USA. 325–336. isbn:978-1-4503-1247-9Google ScholarGoogle ScholarDigital LibraryDigital Library
  38. Robert Fink, Larisa Han, and Dan Olteanu. 2012. Aggregation in Probabilistic Databases via Knowledge Compilation. 5, 5 (2012), jan, 490–501. issn:2150-8097Google ScholarGoogle ScholarDigital LibraryDigital Library
  39. Floris Geerts, Thomas Muñoz, Cristian Riveros, Jan Van den Bussche, and Domagoj Vrgoč. 2021. Matrix Query Languages. ACM SIGMOD Record, 50, 3 (2021), 6–19.Google ScholarGoogle ScholarDigital LibraryDigital Library
  40. Jeremy Gibbons, Fritz Henglein, Ralf Hinze, and Nicolas Wu. 2018. Relational Algebra by Way of Adjunctions. Proc. ACM Program. Lang., 2, ICFP (2018), Article 86, July, 28 pages. https://doi.org/10.1145/3236781 Google ScholarGoogle ScholarDigital LibraryDigital Library
  41. Andrew Gill, John Launchbury, and Simon L Peyton Jones. 1993. A short cut to deforestation. In Proceedings of the conference on Functional programming languages and computer architecture (FPCA). 223–232.Google ScholarGoogle ScholarDigital LibraryDigital Library
  42. Michel Gondran and Michel Minoux. 2008. Graphs, dioids and semirings: new models and algorithms. 41, Springer Science & Business Media.Google ScholarGoogle Scholar
  43. G. Graefe. 1994. Volcano-an extensible and parallel query evaluation system. IEEE Transactions on Knowledge and Data Engineering, 6, 1 (1994), 120–135.Google ScholarGoogle ScholarDigital LibraryDigital Library
  44. Todd J Green, Grigoris Karvounarakis, and Val Tannen. 2007. Provenance semirings. In Proceedings of the twenty-sixth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems. 31–40.Google ScholarGoogle ScholarDigital LibraryDigital Library
  45. Clemens Grelck and Sven-Bodo Scholz. 2006. SAC—A Functional Array Language for Efficient Multi-threaded Execution. Int. Journal of Parallel Programming, 34, 4 (2006), 383–427. issn:1573-7640Google ScholarGoogle ScholarDigital LibraryDigital Library
  46. Torsten Grust, Jan Rittinger, and Tom Schreiber. 2010. Avalanche-safe LINQ Compilation. PVLDB, 3, 1-2 (2010), Sept., 162–172. issn:2150-8097Google ScholarGoogle ScholarDigital LibraryDigital Library
  47. Torsten Grust and MarcH. Scholl. 1999. How to Comprehend Queries Functionally. Journal of Intelligent Information Systems, 12, 2-3 (1999), 191–218. issn:0925-9902Google ScholarGoogle ScholarDigital LibraryDigital Library
  48. Joseph M Hellerstein, Christoper Ré, Florian Schoppmann, Daisy Zhe Wang, Eugene Fratkin, Aleksander Gorajek, Kee Siong Ng, Caleb Welton, Xixuan Feng, and Kun Li. 2012. The MADlib analytics library: or MAD skills, the SQL. Proceedings of the VLDB Endowment, 5, 12 (2012), 1700–1711.Google ScholarGoogle ScholarDigital LibraryDigital Library
  49. Troels Henriksen, Niels GW Serup, Martin Elsman, Fritz Henglein, and Cosmin E Oancea. 2017. Futhark: purely functional GPU-programming with nested parallelism and in-place array updates. In Proceedings of the 38th ACM SIGPLAN Conference on Programming Language Design and Implementation. 556–571.Google ScholarGoogle ScholarDigital LibraryDigital Library
  50. Dylan Hutchison, Bill Howe, and Dan Suciu. 2017. LaraDB: A minimalist kernel for linear and relational algebra computation. In Proceedings of the 4th ACM SIGMOD Workshop on Algorithms and Systems for MapReduce and Beyond. 1–10.Google ScholarGoogle ScholarDigital LibraryDigital Library
  51. S Idreos, F Groffen, N Nes, S Manegold, S Mullender, and M Kersten. 2012. Monetdb: Two decades of research in column-oriented database. IEEE Data Engineering Bulletin.Google ScholarGoogle Scholar
  52. Kenneth E Iverson. 1962. A Programming Language. In Proceedings of the May 1-3, 1962, spring joint computer conference. 345–351.Google ScholarGoogle Scholar
  53. Hayden Jananthan, Ziqi Zhou, Vijay Gadepally, Dylan Hutchison, Suna Kim, and Jeremy Kepner. 2017. Polystore mathematics of relational algebra. In 2017 IEEE International Conference on Big Data (Big Data). 3180–3189.Google ScholarGoogle ScholarCross RefCross Ref
  54. Simon Peyton Jones, Andrew Tolmach, and Tony Hoare. 2001. Playing by the rules: rewriting as a practical optimisation technique in GHC. In Haskell workshop. 1, 203–233.Google ScholarGoogle Scholar
  55. Simon Peyton Jones and Philip Wadler. 2007. Comprehensive comprehensions. In Proceedings of the ACM SIGPLAN workshop on Haskell workshop. 61–72.Google ScholarGoogle ScholarDigital LibraryDigital Library
  56. Manohar Jonnalagedda and Sandro Stucki. 2015. Fold-based Fusion As a Library: A Generative Programming Pearl. In Proceedings of the 6th ACM SIGPLAN Symposium on Scala. ACM, 41–50. isbn:978-1-4503-3626-0Google ScholarGoogle ScholarDigital LibraryDigital Library
  57. Konstantinos Karanasos, Matteo Interlandi, Doris Xin, Fotis Psallidas, Rathijit Sen, Kwanghyun Park, Ivan Popivanov, Supun Nakandal, Subru Krishnan, and Markus Weimer. 2020. Extending relational query processing with ML inference. In CIDR.Google ScholarGoogle Scholar
  58. Manos Karpathiotakis, Ioannis Alagiannis, Thomas Heinis, Miguel Branco, and Anastasia Ailamaki. 2015. Just-in-time data virtualization: Lightweight data management with ViDa. In CIDR.Google ScholarGoogle Scholar
  59. Grigoris Karvounarakis and Todd J Green. 2012. Semiring-annotated data: queries and provenance? ACM SIGMOD Record, 41, 3 (2012), 5–14.Google ScholarGoogle ScholarDigital LibraryDigital Library
  60. Gabriele Keller, Manuel MT Chakravarty, Roman Leshchinskiy, Simon Peyton Jones, and Ben Lippmeier. 2010. Regular, shape-polymorphic, parallel arrays in Haskell. ACM Sigplan Notices, 45, 9 (2010), 261–272.Google ScholarGoogle ScholarDigital LibraryDigital Library
  61. Jeremy Kepner, Peter Aaltonen, David Bader, Aydin Buluç, Franz Franchetti, John Gilbert, Dylan Hutchison, Manoj Kumar, Andrew Lumsdaine, and Henning Meyerhenke. 2016. Mathematical foundations of the GraphBLAS. In 2016 IEEE High Performance Extreme Computing Conference (HPEC). 1–9.Google ScholarGoogle ScholarCross RefCross Ref
  62. Jeremy Kepner and John Gilbert. 2011. Graph algorithms in the language of linear algebra. 22, SIAM.Google ScholarGoogle Scholar
  63. Timo Kersten, Viktor Leis, Alfons Kemper, Thomas Neumann, Andrew Pavlo, and Peter Boncz. 2018. Everything you always wanted to know about compiled and vectorized queries but were afraid to ask. Proceedings of the VLDB Endowment, 11, 13 (2018), 2209–2222.Google ScholarGoogle ScholarDigital LibraryDigital Library
  64. Mahmoud Abo Khamis, Hung Q. Ngo, XuanLong Nguyen, Dan Olteanu, and Maximilian Schleich. 2018. AC/DC: In-Database Learning Thunderstruck. In Proceedings of the Second Workshop on Data Management for End-To-End Machine Learning (DEEM’18). ACM, New York, NY, USA. Article 8, 10 pages. isbn:978-1-4503-5828-6Google ScholarGoogle ScholarDigital LibraryDigital Library
  65. Oleg Kiselyov. 2018. Reconciling Abstraction with High Performance: A MetaOCaml approach. Foundations and Trends in Programming Languages, 5, 1 (2018), 1–101.Google ScholarGoogle ScholarDigital LibraryDigital Library
  66. Oleg Kiselyov, Aggelos Biboudis, Nick Palladinos, and Yannis Smaragdakis. 2017. Stream Fusion, to Completeness. In Proceedings of the 44th ACM SIGPLAN Symposium on Principles of Programming Languages (POPL 2017). ACM, New York, NY, USA. 285–299. isbn:978-1-4503-4660-3Google ScholarGoogle ScholarDigital LibraryDigital Library
  67. Fredrik Kjolstad, Shoaib Kamil, Stephen Chou, David Lugato, and Saman Amarasinghe. 2017. The Tensor Algebra Compiler. Proc. ACM Program. Lang., 1, OOPSLA (2017), Article 77, Oct., 29 pages. issn:2475-1421Google ScholarGoogle ScholarDigital LibraryDigital Library
  68. Christoph Koch, Yanif Ahmad, Oliver Kennedy, Milos Nikolic, Andres Nötzli, Daniel Lupei, and Amir Shaikhha. 2014. DBToaster: higher-order delta processing for dynamic, frequently fresh views. VLDBJ, 23, 2 (2014), 253–278. issn:1066-8888Google ScholarGoogle ScholarCross RefCross Ref
  69. Christoph Koch, Daniel Lupei, and Val Tannen. 2016. Incremental View Maintenance For Collection Programming. In Proceedings of the 35th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems (PODS ’16). Association for Computing Machinery, New York, NY, USA. 75–90. isbn:9781450341912Google ScholarGoogle ScholarDigital LibraryDigital Library
  70. Konstantinos Krikellas, Stratis Viglas, and Marcelo Cintra. 2010. Generating code for holistic query evaluation. In ICDE. 613–624.Google ScholarGoogle Scholar
  71. Andreas Kunft, Asterios Katsifodimos, Sebastian Schelter, Sebastian Breß, Tilmann Rabl, and Volker Markl. 2019. An intermediate representation for optimizing machine learning pipelines. Proceedings of the VLDB Endowment, 12, 11 (2019), 1553–1567.Google ScholarGoogle ScholarDigital LibraryDigital Library
  72. Daniel J Lehmann. 1977. Algebraic structures for transitive closure. Theoretical Computer Science, 4, 1 (1977), 59–76.Google ScholarGoogle ScholarCross RefCross Ref
  73. Side Li, Lingjiao Chen, and Arun Kumar. 2019. Enabling and Optimizing Non-linear Feature Interactions in Factorized Linear Algebra. In Proceedings of the 2019 International Conference on Management of Data. 1571–1588.Google ScholarGoogle ScholarDigital LibraryDigital Library
  74. Leonid Libkin and Limsoon Wong. 1997. Query languages for bags and aggregate functions. Journal of Computer and System sciences, 55, 2 (1997), 241–272.Google ScholarGoogle ScholarDigital LibraryDigital Library
  75. Geoffrey Mainland, Roman Leshchinskiy, and Simon Peyton Jones. 2013. Exploiting Vector Instructions with Generalized Stream Fusion. In Proceedings of the 18th ACM SIGPLAN International Conference on Functional Programming (ICFP’13). ACM, New York, NY, USA. 37–48. isbn:978-1-4503-2326-0Google ScholarGoogle ScholarDigital LibraryDigital Library
  76. Marco Masseroli, Pietro Pinoli, Francesco Venco, Abdulrahman Kaitoua, Vahid Jalili, Fernando Palluzzi, Heiko Muller, and Stefano Ceri. 2015. GenoMetric Query Language: A Novel Approach to Large-scale Genomic Data Management. Bioinformatics, 31, 12 (2015), 1881–1888.Google ScholarGoogle ScholarCross RefCross Ref
  77. Erik Meijer, Brian Beckman, and Gavin Bierman. 2006. LINQ: Reconciling Object, Relations and XML in the .NET Framework. In Proceedings of the 2006 ACM SIGMOD International Conference on Management of Data (SIGMOD ’06). ACM, 706–706. isbn:1-59593-434-0Google ScholarGoogle ScholarDigital LibraryDigital Library
  78. Xiangrui Meng, Joseph Bradley, Burak Yavuz, Evan Sparks, Shivaram Venkataraman, Davies Liu, Jeremy Freeman, DB Tsai, Manish Amde, Sean Owen, Doris Xin, Reynold Xin, Michael J. Franklin, Reza Zadeh, Matei Zaharia, and Ameet Talwalkar. 2016. MLlib: Machine Learning in Apache Spark. The Journal of Machine Learning Research, 17, 1 (2016), 1235–1241.Google ScholarGoogle ScholarDigital LibraryDigital Library
  79. Guido Moerkotte and Thomas Neumann. 2011. Accelerating queries with group-by and join by groupjoin. Proceedings of the VLDB Endowment, 4, 11 (2011).Google ScholarGoogle ScholarDigital LibraryDigital Library
  80. Mehryar Mohri. 2002. Semiring frameworks and algorithms for shortest-distance problems. Journal of Automata, Languages and Combinatorics, 7, 3 (2002), 321–350.Google ScholarGoogle ScholarDigital LibraryDigital Library
  81. Fabian Nagel, Gavin Bierman, and Stratis D. Viglas. 2014. Code Generation for Efficient Query Processing in Managed Runtimes. PVLDB, 7, 12 (2014), Aug., 1095–1106. issn:2150-8097Google ScholarGoogle ScholarDigital LibraryDigital Library
  82. Thomas Neumann. 2011. Efficiently Compiling Efficient Query Plans for Modern Hardware. PVLDB, 4, 9 (2011), 539–550.Google ScholarGoogle ScholarDigital LibraryDigital Library
  83. Milos Nikolic and Dan Olteanu. 2018. Incremental View Maintenance with Triple Lock Factorization Benefits. In Proceedings of the 2018 International Conference on Management of Data (SIGMOD ’18). ACM, New York, NY, USA. 365–380. isbn:978-1-4503-4703-7Google ScholarGoogle ScholarDigital LibraryDigital Library
  84. Dan Olteanu and Maximilian Schleich. 2016. Factorized Databases. SIGMOD Rec., 45, 2 (2016), Sept., 5–16. issn:0163-5808Google ScholarGoogle ScholarDigital LibraryDigital Library
  85. Shoumik Palkar, James J Thomas, Anil Shanbhag, Deepak Narayanan, Holger Pirk, Malte Schwarzkopf, Saman Amarasinghe, Matei Zaharia, and Stanford InfoLab. 2017. Weld: A Common Runtime for High Performance Data Analytics. In Conference on Innovative Data Systems Research (CIDR).Google ScholarGoogle Scholar
  86. Adam Paszke, Sam Gross, Soumith Chintala, Gregory Chanan, Edward Yang, Zachary DeVito, Zeming Lin, Alban Desmaison, Luca Antiga, and Adam Lerer. 2017. Automatic Differentiation in PyTorch. In NIPS 2017 Autodiff Workshop: The Future of Gradient-based Machine Learning Software and Techniques.Google ScholarGoogle Scholar
  87. Fabian Pedregosa, Gaël Varoquaux, Alexandre Gramfort, Vincent Michel, Bertrand Thirion, Olivier Grisel, Mathieu Blondel, Peter Prettenhofer, Ron Weiss, and Vincent Dubourg. 2011. Scikit-learn: Machine learning in Python. Journal of machine learning research, 12, Oct (2011), 2825–2830.Google ScholarGoogle ScholarDigital LibraryDigital Library
  88. Markus Puschel, José MF Moura, Jeremy R Johnson, David Padua, Manuela M Veloso, Bryan W Singer, Jianxin Xiong, Franz Franchetti, Aca Gacic, and Yevgen Voronenko. 2005. SPIRAL: Code generation for DSP transforms. Proc. IEEE, 93, 2 (2005), 232–275.Google ScholarGoogle ScholarCross RefCross Ref
  89. Chengjie Qin and Florin Rusu. 2015. Speculative approximations for terascale distributed gradient descent optimization. In Proceedings of the Fourth Workshop on Data analytics in the Cloud. 1.Google ScholarGoogle ScholarDigital LibraryDigital Library
  90. Jonathan Ragan-Kelley, Connelly Barnes, Andrew Adams, Sylvain Paris, Frédo Durand, and Saman Amarasinghe. 2013. Halide: A Language and Compiler for Optimizing Parallelism, Locality, and Recomputation in Image Processing Pipelines. In Proceedings of the 34th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI’13). ACM, New York, NY, USA. 519–530. isbn:978-1-4503-2014-6Google ScholarGoogle ScholarDigital LibraryDigital Library
  91. Raghu Ramakrishnan and Johannes Gehrke. 2000. Database Management Systems (2nd ed.). Osborne/McGraw-Hill. isbn:0072440422Google ScholarGoogle ScholarDigital LibraryDigital Library
  92. Mark A Roth, Herry F Korth, and Abraham Silberschatz. 1988. Extended algebra and calculus for nested relational databases. ACM Transactions on Database Systems (TODS), 13, 4 (1988), 389–417.Google ScholarGoogle ScholarDigital LibraryDigital Library
  93. Maximilian Schleich, Dan Olteanu, Mahmoud Abo Khamis, Hung Q. Ngo, and XuanLong Nguyen. 2019. A Layered Aggregate Engine for Analytics Workloads. In Proceedings of the 2019 International Conference on Management of Data (SIGMOD ’19). ACM, New York, NY, USA. 1642–1659. isbn:978-1-4503-5643-5Google ScholarGoogle ScholarDigital LibraryDigital Library
  94. Maximilian Schleich, Dan Olteanu, and Radu Ciucanu. 2016. Learning Linear Regression Models over Factorized Joins. In Proceedings of the 2016 International Conference on Management of Data (SIGMOD ’16). ACM, New York, NY, USA. 3–18. isbn:978-1-4503-3531-7Google ScholarGoogle ScholarDigital LibraryDigital Library
  95. Amir Shaikhha, Mohammad Dashti, and Christoph Koch. 2018. Push versus Pull-Based Loop Fusion in Query Engines. Journal of Functional Programming, 28 (2018), e10.Google ScholarGoogle ScholarCross RefCross Ref
  96. Amir Shaikhha, Andrew Fitzgibbon, Simon Peyton Jones, and Dimitrios Vytiniotis. 2017. Destination-passing Style for Efficient Memory Management. In Proceedings of the 6th ACM SIGPLAN International Workshop on Functional High-Performance Computing (FHPC 2017). ACM, New York, NY, USA. 12–23. isbn:978-1-4503-5181-2Google ScholarGoogle ScholarDigital LibraryDigital Library
  97. Amir Shaikhha, Andrew Fitzgibbon, Dimitrios Vytiniotis, and Simon Peyton Jones. 2019. Efficient differentiable programming in a functional array-processing language. Proceedings of the ACM on Programming Languages, 3, ICFP (2019), 97.Google ScholarGoogle ScholarDigital LibraryDigital Library
  98. Amir Shaikhha, Yannis Klonatos, and Christoph Koch. 2018. Building Efficient Query Engines in a High-Level Language. ACM Transactions on Database Systems, 43, 1 (2018), Article 4, April, 45 pages. issn:0362-5915Google ScholarGoogle ScholarDigital LibraryDigital Library
  99. Amir Shaikhha, Yannis Klonatos, Lionel Parreaux, Lewis Brown, Mohammad Dashti, and Christoph Koch. 2016. How to Architect a Query Compiler. In Proceedings of the 2016 International Conference on Management of Data (SIGMOD’16). ACM, New York, NY, USA. 1907–1922. isbn:978-1-4503-3531-7Google ScholarGoogle ScholarDigital LibraryDigital Library
  100. Amir Shaikhha and Lionel Parreaux. 2019. Finally, a Polymorphic Linear Algebra Language. In Proceedings of the 33rd European Conference on Object-Oriented Programming (ECOOP’19).Google ScholarGoogle Scholar
  101. Amir Shaikhha, Maximilian Schleich, Alexandru Ghita, and Dan Olteanu. 2020. Multi-Layer Optimizations for End-to-End Data Analytics. In CGO. 145–157.Google ScholarGoogle Scholar
  102. Amir Shaikhha, Maximilian Schleich, and Dan Olteanu. 2021. An Intermediate Representation for Hybrid Database and Machine Learning Workloads. Proc. VLDB Endow., 14, 12 (2021), 2831–2834.Google ScholarGoogle ScholarDigital LibraryDigital Library
  103. Arun Kumar Side Li. 2019. MorpheusPy. https://github.com/ADALabUCSD/MorpheusPyGoogle ScholarGoogle Scholar
  104. Arun Kumar Side Li. 2019. MorpheusPy – Issue #3. https://github.com/ADALabUCSD/MorpheusPy/issues/3Google ScholarGoogle Scholar
  105. Jaclyn Smith, Michael Benedikt, Milos Nikolic, and Amir Shaikhha. 2020. Scalable querying of nested data. Proceedings of the VLDB Endowment, 14, 3 (2020), 445–457.Google ScholarGoogle ScholarDigital LibraryDigital Library
  106. Shaden Smith, Niranjay Ravindran, Nicholas D Sidiropoulos, and George Karypis. 2015. SPLATT: Efficient and parallel sparse tensor-matrix multiplication. In 2015 IEEE International Parallel and Distributed Processing Symposium. 61–70.Google ScholarGoogle ScholarDigital LibraryDigital Library
  107. Daniele G. Spampinato, Diego Fabregat-Traver, Paolo Bientinesi, and Markus Püschel. 2018. Program Generation for Small-scale Linear Algebra Applications. In Proceedings of the 2018 International Symposium on Code Generation and Optimization (CGO 2018). ACM, New York, NY, USA. 327–339. isbn:978-1-4503-5617-6Google ScholarGoogle ScholarDigital LibraryDigital Library
  108. Daniele G. Spampinato and Markus Püschel. 2016. A basic linear algebra compiler for structured matrices. In Proceedings of the 2016 International Symposium on Code Generation and Optimization. 117–127.Google ScholarGoogle Scholar
  109. Michel Steuwer, Christian Fensch, Sam Lindley, and Christophe Dubach. 2015. Generating Performance Portable Code Using Rewrite Rules: From High-level Functional Expressions to High-performance OpenCL Code. In Proceedings of the 20th ACM SIGPLAN International Conference on Functional Programming (ICFP 2015). ACM, New York, NY, USA. 205–217. isbn:978-1-4503-3669-7Google ScholarGoogle ScholarDigital LibraryDigital Library
  110. Peter H. Sudmant, Tobias Rausch, Eugene J. Gardner, Robert E. Handsaker, Alexej Abyzov, John Huddleston, Yan Zhang, Kai Ye, Goo Jun, and Markus Hsi-Yang Fritz. 2015. An integrated map of structural variation in 2,504 human genomes. Nature, 526, 7571 (2015), 75–81. issn:1476-4687 https://doi.org/10.1038/nature15394 Google ScholarGoogle ScholarCross RefCross Ref
  111. Arvind Sujeeth, HyoukJoong Lee, Kevin Brown, Tiark Rompf, Hassan Chafi, Michael Wu, Anand Atreya, Martin Odersky, and Kunle Olukotun. 2011. OptiML: An Implicitly Parallel Domain-Specific Language for Machine Learning. In Proceedings of the 28th International Conference on Machine Learning (ICML-11) (ICML ’11). 609–616.Google ScholarGoogle ScholarDigital LibraryDigital Library
  112. Josef Svenningsson. 2002. Shortcut Fusion for Accumulating Parameters & Zip-like Functions. In Proceedings of the Seventh ACM SIGPLAN International Conference on Functional Programming (ICFP ’02). ACM, 124–132. isbn:1-58113-487-8Google ScholarGoogle ScholarDigital LibraryDigital Library
  113. Bo Joel Svensson and Josef Svenningsson. 2014. Defunctionalizing Push Arrays. In Proceedings of the 3rd ACM SIGPLAN Workshop on Functional High-performance Computing (FHPC ’14). ACM, NY, USA. 43–52. isbn:978-1-4503-3040-4Google ScholarGoogle ScholarDigital LibraryDigital Library
  114. Ruby Y Tahboub, Grégory M Essertel, and Tiark Rompf. 2018. How to architect a query compiler, revisited. In Proceedings of the 2018 International Conference on Management of Data. 307–322.
  115. Akihiko Takano and Erik Meijer. 1995. Shortcut Deforestation in Calculational Form. In Proceedings of the Seventh International Conference on Functional Programming Languages and Computer Architecture (FPCA ’95). Association for Computing Machinery, New York, NY, USA. 306–313.
  116. Robert Endre Tarjan. 1981. A Unified Approach to Path Problems. J. ACM, 28, 3 (July 1981), 577–593. issn:0004-5411
  117. Hail Team. 2020. Hail 0.2. https://github.com/hail-is/hail
  118. Phil Trinder. 1992. Comprehensions, a Query Notation for DBPLs. In Proc. of the 3rd DBPL workshop (DBPL3). Morgan Kaufmann Publishers Inc., San Francisco, CA, USA. 55–68. isbn:1-55860-242-9
  119. Leslie G Valiant. 1975. General context-free recognition in less than cubic time. Journal of computer and system sciences, 10, 2 (1975), 308–315.
  120. Nicolas Vasilache, Oleksandr Zinenko, Theodoros Theodoridis, Priya Goyal, Zachary DeVito, William S Moses, Sven Verdoolaege, Andrew Adams, and Albert Cohen. 2018. Tensor comprehensions: Framework-agnostic high-performance machine learning abstractions. arXiv preprint arXiv:1802.04730.
  121. Todd L. Veldhuizen. 2014. Leapfrog Triejoin: A Simple, Worst-Case Optimal Join Algorithm. In Proc. 17th International Conference on Database Theory (ICDT), Athens, Greece, March 24-28, 2014. 96–106.
  122. Stratis Viglas, Gavin M. Bierman, and Fabian Nagel. 2014. Processing Declarative Queries Through Generating Imperative Code in Managed Runtimes. IEEE Data Eng. Bull., 37, 1 (2014), 12–21.
  123. Kate Voss, Jeff Gentry, and Geraldine Van Der Auwera. 2017. Full-stack genomics pipelining with GATK4+ WDL+ Cromwell [version 1; not peer reviewed]. F1000Research, 4. https://doi.org/10.7490/f1000research.1114631.1
  124. Philip Wadler. 1988. Deforestation: Transforming programs to eliminate trees. In ESOP’88. 344–358.
  125. Philip Wadler. 1990. Comprehending Monads. In Proceedings of the 1990 ACM Conference on LISP and Functional Programming (LFP ’90). ACM, New York, NY, USA. 61–78. isbn:0-89791-368-X
  126. Limsoon Wong. 2000. Kleisli, a functional query system. Journal of Functional Programming, 10, 1 (2000), 19–56.
  127. Jianxin Xiong, Jeremy Johnson, Robert Johnson, and David Padua. 2001. SPL: A Language and Compiler for DSP Algorithms. In Proceedings of the ACM SIGPLAN 2001 Conference on Programming Language Design and Implementation (PLDI’01). ACM, New York, NY, USA. 298–308. isbn:1-58113-414-2
  128. Weipeng P. Yan and Per-Åke Larson. 1994. Performing Group-By before Join. In Proceedings of the Tenth International Conference on Data Engineering. IEEE Computer Society, USA. 89–100. isbn:0818654007
  129. Yongyang Yu, Mingjie Tang, and Walid G Aref. 2021. Scalable relational query processing on big matrix data. arXiv preprint arXiv:2110.01767.
  130. Marcin Zukowski, Peter A Boncz, Niels Nes, and Sándor Héman. 2005. MonetDB/X100 - A DBMS In The CPU Cache. IEEE Data Eng. Bull., 28, 2 (2005), 17–22.
