
Isotope: ACID Transactions for Block Storage

Published: 16 February 2017

Abstract

Existing storage stacks are top heavy and expect little from block storage. As a result, new high-level storage abstractions—and new designs for existing abstractions—are difficult to realize, requiring developers to implement from scratch complex functionality such as failure atomicity and fine-grained concurrency control. In this article, we argue that pushing transactional isolation into the block store (in addition to atomicity and durability) is both viable and broadly useful, resulting in simpler high-level storage systems that provide strong semantics without sacrificing performance. We present Isotope, a new block store that supports ACID transactions over block reads and writes. Internally, Isotope uses a new multiversion concurrency control protocol that exploits fine-grained, subblock parallelism in workloads and offers both strict serializability and snapshot isolation guarantees. We implemented several high-level storage systems over Isotope, including two key-value stores that implement the LevelDB API over a hash table and B-tree, respectively, and a POSIX file system. We show that Isotope’s block-level transactions enable systems that are simple (100s of lines of code), robust (i.e., providing ACID guarantees), and fast (e.g., 415MB/s for random file writes). We also show that these systems can be composed using Isotope, providing applications with transactions across different high-level constructs such as files, directories, and key-value pairs.



Published in

ACM Transactions on Storage, Volume 13, Issue 1
Special Issue on USENIX FAST 2016 and Regular Papers
February 2017
201 pages
ISSN: 1553-3077
EISSN: 1553-3093
DOI: 10.1145/3054178
Editor: Sam H. Noh

Copyright © 2017 Owner/Author

Publisher

Association for Computing Machinery
New York, NY, United States

Publication History

• Published: 16 February 2017
• Accepted: 1 December 2016
• Received: 1 September 2016


Qualifiers

• research-article
• Research
• Refereed
