Abstract
Many service applications use actors as a programming model for the middle tier, to simplify synchronization, fault-tolerance, and scalability. However, operating such actors efficiently across multiple, geographically distant datacenters is challenging because of the high communication latency between them. Caching and replication are essential to hide latency and exploit locality, but it is not clear a priori how to combine these techniques with the actor programming model.
We present Geo, an open-source geo-distributed actor system that improves performance by caching actor states in one or more datacenters, yet guarantees the existence of a single latest version by virtue of a distributed cache coherence protocol. Geo's programming model supports both volatile and persistent actors, and supports updates with a choice of linearizable and eventual consistency. Our evaluation on several workloads shows substantial performance benefits, and confirms the advantage of supporting both replicated and single-instance coherence protocols as configuration choices. For example, replication can provide fast, always-available reads and updates globally, while batching of linearizable storage accesses at a single location can boost the throughput of an order processing workload by 7x.
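To make the batching claim concrete, the following is a minimal Python sketch of the general idea, not Geo's actual API or implementation. It assumes a single-instance actor that queues updates and persists them with one version-checked write per batch. `FakeStorage`, `BatchingActor`, `enqueue`, and `flush` are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class FakeStorage:
    """Hypothetical stand-in for a remote store; counts successful writes."""
    value: int = 0
    version: int = 0
    writes: int = 0

    def read(self):
        return self.value, self.version

    def conditional_write(self, new_value, expected_version):
        # Succeeds only if no other writer changed the value since it was read.
        if expected_version != self.version:
            return False
        self.value = new_value
        self.version += 1
        self.writes += 1
        return True


@dataclass
class BatchingActor:
    """Hypothetical single-instance actor that batches its storage accesses."""
    storage: FakeStorage
    pending: List[Callable[[int], int]] = field(default_factory=list)

    def enqueue(self, update):
        # Queue an update function instead of writing to storage immediately.
        self.pending.append(update)

    def flush(self):
        """Apply all queued updates in order and persist them with one write."""
        if not self.pending:
            return 0
        while True:
            value, version = self.storage.read()
            for update in self.pending:
                value = update(value)
            if self.storage.conditional_write(value, version):
                applied = len(self.pending)
                self.pending.clear()
                return applied
            # Version conflict: re-read the latest state and retry the batch.


storage = FakeStorage()
actor = BatchingActor(storage)
for _ in range(7):
    actor.enqueue(lambda v: v + 1)
applied = actor.flush()
```

In this sketch the seven queued increments reach storage in one conditional write rather than seven, which is the kind of saving that batching at a single location is meant to provide. The example is unrelated to the 7x figure reported above, which comes from the authors' own evaluation.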