Enabling Lightweight Transactions with Precision Time

Published: 04 April 2017
Abstract

Distributed transactional storage is an important service in today's data centers. Achieving high performance without high complexity is often a challenge for these systems due to sophisticated consistency protocols and multiple layers of abstraction. In this paper we show how to combine two emerging technologies, Software-Defined Flash (SDF) and precise synchronized clocks, to improve performance and reduce complexity for transactional storage within the data center.

We present a distributed transactional system (called MILANA) as a layer above a durable multi-version key-value store (called SEMEL) for read-heavy workloads within a data center. SEMEL exploits write behavior of SSDs to maintain a time-ordered sequence of versions for each key efficiently and durably. MILANA adds a variant of optimistic concurrency control above SEMEL's API to service read requests from a consistent snapshot and to enable clients to make fast local commit or abort decisions for read-only transactions.
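
As a rough illustration of what a time-ordered sequence of versions per key makes possible, the sketch below keeps every version of a key sorted by timestamp and answers reads as of a snapshot timestamp. It is a toy in-memory model, not SEMEL's API or on-flash layout: the class name `MultiVersionStore`, the methods `put` and `get_at`, and the integer timestamps are all assumptions made for this example.

```python
import bisect
from collections import defaultdict


class MultiVersionStore:
    """Toy in-memory multi-version key-value store (hypothetical, illustrative only)."""

    def __init__(self):
        # For each key: parallel lists kept sorted by version timestamp.
        self._timestamps = defaultdict(list)
        self._values = defaultdict(list)

    def put(self, key, value, ts):
        """Record a new version of `key` written at timestamp `ts`."""
        idx = bisect.bisect_right(self._timestamps[key], ts)
        self._timestamps[key].insert(idx, ts)
        self._values[key].insert(idx, value)

    def get_at(self, key, snapshot_ts):
        """Return (ts, value) of the newest version with ts <= snapshot_ts, or None."""
        stamps = self._timestamps.get(key, [])
        idx = bisect.bisect_right(stamps, snapshot_ts)
        if idx == 0:
            return None  # no version existed at or before the snapshot
        return stamps[idx - 1], self._values[key][idx - 1]


store = MultiVersionStore()
store.put("x", "a", ts=10)
store.put("x", "b", ts=20)
store.put("y", "c", ts=15)

# Every read at snapshot 15 sees the state as of time 15,
# regardless of the later write to "x" at time 20.
snapshot = {k: store.get_at(k, 15) for k in ("x", "y")}
```

Because all reads in a transaction use the same snapshot timestamp, they observe one consistent state even while newer versions are being written. How tightly the clocks that issue those timestamps are synchronized is the property the paper studies.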

Experiments with the prototype reveal up to 43% lower transaction abort rates using the IEEE Precision Time Protocol (PTP) vs. the standard Network Time Protocol (NTP). Under the Retwis benchmark, client-local validation of read-only transactions yields a 35% reduction in latency and a 55% increase in transaction throughput.
