research-article

ReFlex: Remote Flash ≈ Local Flash

Published: 4 April 2017
Abstract

Remote access to NVMe Flash enables flexible scaling and high utilization of Flash capacity and IOPS within a datacenter. However, existing systems for remote Flash access either introduce significant performance overheads or fail to isolate the multiple remote clients sharing each Flash device. We present ReFlex, a software-based system for remote Flash access that provides performance nearly identical to that of local Flash. ReFlex uses a dataplane kernel to closely integrate networking and storage processing, achieving low latency and high throughput with low resource requirements. Specifically, ReFlex can serve up to 850K IOPS per core over TCP/IP networking while adding 21 µs of latency over direct access to local Flash. ReFlex uses a QoS scheduler that can enforce tail latency and throughput service-level objectives (SLOs) for thousands of remote clients. We show that ReFlex allows applications to use remote Flash while maintaining the performance they achieve with local Flash.

Published in

ACM SIGPLAN Notices, Volume 52, Issue 4 (ASPLOS '17)
April 2017
811 pages
ISSN: 0362-1340
EISSN: 1558-1160
DOI: 10.1145/3093336

ASPLOS '17: Proceedings of the Twenty-Second International Conference on Architectural Support for Programming Languages and Operating Systems
April 2017
856 pages
ISBN: 9781450344654
DOI: 10.1145/3037697

Copyright © 2017 ACM

Publisher

Association for Computing Machinery, New York, NY, United States
