ABSTRACT
With the advent of trusted execution environments in recent general-purpose processors, a class of replication protocols has become more attractive than ever: protocols based on a hybrid fault model tolerate arbitrary faults yet cost significantly less than their traditional Byzantine relatives by employing a small subsystem trusted to fail only by crashing. Unfortunately, existing proposals come at a price: we are not aware of any hybrid protocol that is backed by a comprehensive formal specification, which complicates reasoning about correctness and implications. Moreover, current protocols of that class must be executed largely sequentially. Hence, they are poorly prepared for the very multi-core processors that bring their fault model to a broad audience. In this paper, we present Hybster, a new hybrid state-machine replication protocol that is highly parallelizable and formally specified. With over 1 million operations per second using only four cores, the evaluation of our Intel SGX-based prototype implementation shows that Hybster makes hybrid state-machine replication a viable option even for today's very demanding critical services.
Hybrids on Steroids: SGX-Based High Performance BFT