ABSTRACT
There is growing urgency in computer science circles regarding an impending crisis in parallel programming. Emerging computing platforms, from multicore processors to cloud computing, predicate their performance growth on the development of software to harness parallelism. For the first time in the history of computing, the progress of Moore's Law depends on the productivity of software engineers. Unfortunately, parallel and distributed programming today is challenging even for the best programmers, and simply unworkable for the majority. There has never been a more urgent need for breakthroughs in programming models and languages.
While parallel programming in general is considered very difficult, data parallelism has been very successful. The relational algebra parallelizes easily over large datasets, and SQL programmers have long reaped the benefits of parallelism without modifications to their code. This point has been rediscovered and amplified via recent enthusiasm for MapReduce programming and "Big Data", which have turned data parallelism into common culture across computing.
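As a minimal sketch of why relational operators parallelize so readily, consider an equijoin evaluated by hash partitioning. The relation shapes r(a, b) and s(b, c), and the helper names `join`, `partition`, and `partitioned_join`, are illustrative assumptions, not anything defined in the talk. Because tuples with equal join keys always land in the same partition, each pair of partitions can be joined independently and the results simply unioned.

```python
from collections import defaultdict

def join(r, s):
    """Join r(a, b) with s(b, c) on b, returning a set of (a, b, c) tuples."""
    index = defaultdict(list)
    for b, c in s:
        index[b].append(c)
    return {(a, b, c) for a, b in r for c in index.get(b, ())}

def partition(rel, key_pos, n):
    """Split a relation into n partitions by hashing the join-key column."""
    parts = [set() for _ in range(n)]
    for t in rel:
        parts[hash(t[key_pos]) % n].add(t)
    return parts

def partitioned_join(r, s, n):
    """Join matching partitions independently, then union the results.

    The loop runs sequentially here; each (rp, sp) pair shares no state
    with the others, so each could be handed to a separate worker.
    """
    r_parts = partition(r, 1, n)  # join key is column 1 of r
    s_parts = partition(s, 0, n)  # join key is column 0 of s
    out = set()
    for rp, sp in zip(r_parts, s_parts):
        out |= join(rp, sp)
    return out

r = {(1, "x"), (2, "y"), (3, "x")}
s = {("x", 10), ("y", 20), ("z", 30)}
result = partitioned_join(r, s, 4)
```

The query itself (the `join` function) is unchanged by the partitioning; only the execution strategy differs, which is the sense in which the programmer gets parallelism without modifying code.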
As a result, it is increasingly attractive to tackle the challenge of parallel programming on the firm common ground of data parallelism: start with an easy-to-parallelize kernel, relational algebra, and extend it to general-purpose computation. This approach has clear precedents in database theory, where it has long been known that classical relational languages have natural Turing-complete extensions.
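One familiar first step in extending the relational kernel is recursion, as in Datalog. The sketch below evaluates the two-rule transitive closure program shown in its comments, a query that plain relational algebra cannot express, using semi-naive fixpoint iteration. The function name `transitive_closure` and the edge-set representation are assumptions made for illustration. Plain Datalog is not itself Turing-complete, so this shows only the direction of extension, not its endpoint.

```python
def transitive_closure(edges):
    """Semi-naive evaluation of the Datalog program:

        path(X, Y) :- edge(X, Y).
        path(X, Z) :- path(X, Y), edge(Y, Z).

    `edges` is a set of (source, target) pairs.
    """
    path = set(edges)
    delta = set(edges)  # facts derived in the previous round
    while delta:
        # Join only the newly derived facts with edge; older facts
        # were already joined in earlier rounds.
        derived = {(x, z) for (x, y) in delta for (y2, z) in edges if y == y2}
        delta = derived - path
        path |= delta
    return path

closure = transitive_closure({(1, 2), (2, 3), (3, 4)})
```

Each round is an ordinary relational join followed by a set difference and a union, so the data-parallel execution strategies that apply to the kernel apply within each round as well.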
At the same time that this crisis has been evolving, variants of Datalog have been cropping up in a wide range of practical settings, from security to robotics to compiler analysis. Over the past seven years, we have been exploring the use of Datalog-inspired languages in a variety of systems projects, with a focus on inherently parallel tasks in networking and distributed systems. The experience has been largely positive: we have demonstrated full-featured Datalog-based system implementations that are orders of magnitude more compact than equivalent imperatively implemented systems, with competitive performance and significantly accelerated software evolution. Evidence is mounting that Datalog can serve as the basis of a much simpler family of languages for programming serious parallel and distributed software.
This raises many questions that should warm the heart of a database theoretician. How does the complexity hierarchy of logic languages relate to parallel models of computation? Is there a suitable Coordination Complexity model that captures the realities of modern parallel hardware, where computation is cheap and coordination is expensive? Can the lens of logic provide better focus on what is "hard" to parallelize, what is "embarrassingly parallel", and points in between? Does our understanding of non-monotonic reasoning shed light on the ability of loosely-coupled distributed systems to guarantee eventual consistency? And finally, a question close to the heart of the PODS conference: if Datalog has been The Answer all these years, is parallel and distributed programming The Question it has been waiting for?
In this talk and the paper that accompanies it, I present design patterns that arose in our experience building distributed and parallel software in the style of Datalog, and use them to motivate some initial conjectures relating to the questions above.
The full paper was not available at the time these proceedings were printed, but can be found online by searching for the phrase "Springtime for Datalog".
Datalog redux: experience and conjecture