DOI: 10.1145/1755913.1755937
research-article

BOOM Analytics: exploring data-centric, declarative programming for the cloud

Published: 13 April 2010

ABSTRACT

Building and debugging distributed software remains extremely difficult. We conjecture that by adopting a data-centric approach to system design and by employing declarative programming languages, a broad range of distributed software can be recast naturally in a data-parallel programming model. Our hope is that this model can significantly raise the level of abstraction for programmers, improving code simplicity, speed of development, ease of software evolution, and program correctness.

This paper presents our experience with an initial large-scale experiment in this direction. First, we used the Overlog language to implement a "Big Data" analytics stack that is API-compatible with Hadoop and HDFS and provides comparable performance. Second, we extended the system with complex distributed features not yet available in Hadoop, including high availability, scalability, and unique monitoring and debugging facilities. We present both quantitative and anecdotal results from our experience, providing some concrete evidence that both data-centric design and declarative languages can substantially simplify distributed systems programming.

Published in

EuroSys '10: Proceedings of the 5th European conference on Computer systems
April 2010
388 pages
ISBN: 9781605585772
DOI: 10.1145/1755913

Copyright © 2010 ACM

Publisher: Association for Computing Machinery, New York, NY, United States

Overall acceptance rate: 241 of 1,308 submissions, 18%
