Abstract
This paper introduces semi-ring dictionaries, a powerful class of compositional and purely functional collections that subsume other collection types such as sets, multisets, arrays, vectors, and matrices. We develop SDQL, a statically typed language that uses semi-ring dictionaries to express relational algebra with aggregations, linear algebra, and functional collections over data such as relations and matrices. Furthermore, thanks to the algebraic structure behind these dictionaries, SDQL unifies a wide range of optimizations commonly used in databases (DB) and linear algebra (LA). As a result, SDQL enables efficient processing of hybrid DB and LA workloads by combining optimizations that are otherwise confined to either DB systems or LA frameworks. We show experimentally that a handful of DB and LA workloads can take advantage of the SDQL language and its optimizations. SDQL is competitive with, or outperforms, a host of systems that are state of the art in their own domains: the in-memory DB systems Typer and Tectorwise for (flat, not nested) relational data; SciPy for LA workloads; the sparse tensor compiler taco; the Trance nested relational engine; and the in-database machine learning engines LMFAO and Morpheus for hybrid DB/LA workloads over relational data.
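To make the idea of a semi-ring dictionary more concrete, the following Python sketch models one as a plain dictionary whose values can be added and multiplied, with nested dictionaries handled recursively. It is an illustration under stated assumptions only: it is not SDQL syntax, and the helper names `sd_add`, `sd_scale`, and `sd_sum` are hypothetical. The same three helpers express a group-by aggregation over a multiset of tuples and a sparse matrix-vector product.

```python
def sd_add(a, b):
    """Pointwise addition of two dictionaries (nested dictionaries are added recursively)."""
    out = dict(a)
    for k, v in b.items():
        if k in out:
            out[k] = sd_add(out[k], v) if isinstance(v, dict) else out[k] + v
        else:
            out[k] = v
    return out


def sd_scale(s, d):
    """Multiply every value in d by the scalar s, recursing into nested dictionaries."""
    return {k: sd_scale(s, v) if isinstance(v, dict) else s * v for k, v in d.items()}


def sd_sum(d, f):
    """Add up the dictionaries f(key, value) produced for each entry of d."""
    acc = {}
    for k, v in d.items():
        acc = sd_add(acc, f(k, v))
    return acc


# A relation as a multiset: each (city, price) tuple maps to its multiplicity.
orders = {("paris", 10): 2, ("rome", 5): 1, ("paris", 3): 1}

# Group-by aggregation: total revenue per city.
revenue_by_city = sd_sum(orders, lambda row, mult: {row[0]: row[1] * mult})

# A sparse matrix as a dictionary of rows, and a sparse vector.
M = {0: {0: 1.0, 1: 2.0}, 1: {1: 3.0}}
v = {0: 4.0, 1: 5.0}

# Matrix-vector product written with the same summation helper.
Mv = sd_sum(M, lambda i, row: {i: sum(x * v.get(j, 0.0) for j, x in row.items())})

# Scalar multiplication distributes over nested dictionaries.
doubled = sd_scale(2, M)
```

Here the relational query and the linear algebra operation share one aggregation construct, which is the kind of uniformity the abstract attributes to semi-ring dictionaries; the actual SDQL language and its optimizations are defined in the paper itself.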
Index Terms
Functional collection programming with semi-ring dictionaries