DOI: 10.1145/3575693.3575732

uBFT: Microsecond-Scale BFT using Disaggregated Memory

Published: 30 January 2023

ABSTRACT

We propose uBFT, the first State Machine Replication (SMR) system to achieve microsecond-scale latency in data centers, while using only 2f+1 replicas to tolerate f Byzantine failures. The Byzantine Fault Tolerance (BFT) provided by uBFT is essential as pure crashes appear to be a mere illusion, with real-life systems reportedly failing in many unexpected ways. uBFT relies on a small non-tailored trusted computing base (disaggregated memory) and consumes a practically bounded amount of memory. uBFT is based on a novel abstraction called Consistent Tail Broadcast, which we use to prevent equivocation while bounding memory. We implement uBFT using RDMA-based disaggregated memory and obtain an end-to-end latency of as little as 10 μs. This is at least 50× faster than MinBFT, a state-of-the-art 2f+1 BFT SMR system based on Intel’s SGX. We use uBFT to replicate two KV-stores (Memcached and Redis), as well as a financial order matching engine (Liquibook). These applications have low latency (up to 20 μs) and become Byzantine tolerant with as little as 10 μs more. The price for uBFT is a small amount of reliable disaggregated memory (less than 1 MiB), which in our prototype consists of a small number of memory servers connected through RDMA and replicated for fault tolerance.
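
To make the replica-count claim concrete, the following minimal sketch compares the 2f+1 bound stated in the abstract with the 3f+1 bound required by classical BFT protocols such as PBFT, which do not rely on a trusted component. The function names and the `non_equivocation` flag are hypothetical and serve only as illustration; this is not code from the uBFT implementation.

```python
def replicas_needed(f: int, non_equivocation: bool) -> int:
    """Minimum number of replicas to tolerate f Byzantine failures.

    Assumption for illustration: when a trusted mechanism prevents
    equivocation (as the abstract describes for uBFT), 2f+1 replicas
    suffice; without one, classical BFT protocols need 3f+1.
    """
    if f < 0:
        raise ValueError("f must be non-negative")
    return 2 * f + 1 if non_equivocation else 3 * f + 1


def replicas_saved(f: int) -> int:
    """Replicas saved by the 2f+1 bound relative to the 3f+1 bound."""
    return replicas_needed(f, False) - replicas_needed(f, True)


if __name__ == "__main__":
    for f in (1, 2, 3):
        print(f"f={f}: 2f+1 -> {replicas_needed(f, True)}, "
              f"3f+1 -> {replicas_needed(f, False)}, "
              f"saved -> {replicas_saved(f)}")
```

Under these assumptions, the saving equals f replicas, so the gap between the two bounds grows linearly with the number of tolerated failures.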

  92. Maofan Yin, Dahlia Malkhi, Michael K. Reiter, Guy Golan Gueta, and Ittai Abraham. 2019. HotStuff: BFT Consensus with Linearity and Responsiveness. In Proceedings of the 2019 ACM Symposium on Principles of Distributed Computing (PODC ’19). Association for Computing Machinery, New York, NY, USA. 347–356. isbn:9781450362177 https://doi.org/10.1145/3293611.3331591 Google ScholarGoogle ScholarDigital LibraryDigital Library
  93. Yang Zhou, Hassan M. G. Wassel, Sihang Liu, Jiaqi Gao, James Mickens, Minlan Yu, Chris Kennelly, Paul Turner, David E. Culler, Henry M. Levy, and Amin Vahdat. 2022. Carbink: Fault-Tolerant Far Memory. In 16th USENIX Symposium on Operating Systems Design and Implementation (OSDI ’22). USENIX Association, Berkeley, CA, USA. 55–71. isbn:978-1-939133-28-1 https://www.usenix.org/conference/osdi22/presentation/zhou-yang Google ScholarGoogle Scholar
  94. Danyang Zhuo, Qiao Zhang, Dan R. K. Ports, Arvind Krishnamurthy, and Thomas Anderson. 2014. Machine Fault Tolerance for Reliable Datacenter Systems. In Proceedings of 5th Asia-Pacific Workshop on Systems (APSys ’14). Association for Computing Machinery, New York, NY, USA. Article 3, 7 pages. isbn:9781450330244 https://doi.org/10.1145/2637166.2637235 Google ScholarGoogle ScholarDigital LibraryDigital Library

