Abstract
Scaling Byzantine Fault Tolerant (BFT) systems in terms of membership is important for secure applications with large participation such as blockchains. While traditional protocols have low latency, they cannot handle many processors. Conversely, blockchains often have hundreds to thousands of processors to increase robustness, but they typically have high latency or energy costs.
We describe various sources of unscalability in BFT consensus protocols. To improve performance, many BFT protocols optimize the “normal case,” where there are no failures. This can be done in a modular fashion by wrapping existing BFT protocols with a building block that we call alliance. In normal case executions, alliance can scalably determine if the initial conditions of a BFT consensus protocol predetermine the outcome, obviating running the consensus protocol.
We give examples of existing protocols that solve alliance, and we show that a solution based on hypercubes and Message Authentication Codes (MACs) scales particularly well.
1 INTRODUCTION
The availability of tamperproof logs or “blockchains” is driving interest in secure applications that require a trusted database shared by consortiums of mutually distrusting companies. For example, the international real estate market has been struggling with fraud due to the lack of a trusted shared ledger [27, 36]. Using blockchain technology in the food industry may allow consumers to verify the source of products while health inspectors can quickly track down the source of contaminations [1, 64]. There is also much interest in blockchains from the financial technology industry [2, 69]. These consortiums often consist of hundreds of stakeholders, including a plurality of banks, insurance companies, vendors, and government agencies.
Public (aka permissionless) blockchains, like Nakamoto’s blockchain [54] that is used for Bitcoin, can easily scale to thousands of participants. However, they can suffer from various disadvantages that make them poorly matched with scalable applications such as the ones listed above. These disadvantages include low throughput and lack of confidentiality. Fortunately, various more recent designs address low throughput using out-of-band techniques such as sharding and off-chain processing [33, 41, 42, 49, 58], while cryptographic technology based on zero knowledge proofs can add confidentiality to public blockchains [20, 61], although at the expense of throughput. Another serious problem remains: public blockchains rely on diversity in the membership, but studies suggest that they suffer from centralization [28]. Most consortiums therefore opt for private (permissioned) blockchain technology in which they have tight control over the membership in the blockchain and can support confidentiality at low overhead. Asynchronous Byzantine consensus protocols are a logical choice for such consortiums [6, 60]. Unfortunately, the throughput of Byzantine consensus protocols does not naturally scale well with the number of participants due to the all-to-all nature of their communication requirements.
This article addresses membership scalability in asynchronous Byzantine consensus protocols. To maintain good overall throughput, we focus on so-called normal case scenarios, when communication is timely and participants are behaving correctly. While many previous consensus protocols follow similar approaches to optimizing normal case execution [3, 8, 43, 44], they do so to decrease latency. In this article, we focus on optimizing normal case execution to scale with the size of the membership. Our work does not address worst-case performance, so communication or participant failures can lead to temporary performance hiccups that grow worse with the size of the membership. Nonetheless, we propose strategies to reduce or amortize the latency cost of our protocols with such executions.
We take a modular approach that can work with a black box implementation of a Byzantine consensus protocol. Similar to References [8, 45], we isolate the protocol that participants run during normal case execution from the complete consensus protocol they fall back to under failures. This protocol determines if all processors have the same proposal; we call this problem alliance. We give examples of existing alliance protocols [8, 43, 45]. A Byzantine consensus protocol can be combined with an alliance protocol to avoid the overhead of running the Byzantine consensus protocol in normal case scenarios. In executions where failures do occur, the hybrid protocols still operate correctly with modest additional overhead compared to just running the Byzantine consensus protocol.
The focus of this article is to show that alliance can be solved in a scalable manner without significant additional overhead under failures. We first investigate and compare various possible protocols for solving alliance. Inspired by step-count optimality results for broadcast patterns [32], we devise an efficient alliance protocol by organizing processors on a hypercube. The protocol uses only Message Authentication Codes (MACs) rather than more expensive digital signatures.
The contributions of this article are as follows:
— We present various alliance protocols based on communication schemes that appear in the literature and show how alliance protocols can be modularly combined with existing asynchronous Byzantine consensus protocols.
— We propose the MAC Hypercube Alliance Protocol, which organizes processors on a hypercube and uses only MACs for message integrity.
— We describe various performance optimizations for this protocol.
— We evaluate alliance protocols through network simulations and show that the hypercube-based protocol scales well with the size of the membership.
The remainder of the article is organized as follows. In Section 2, we discuss prior work on scaling permissionless and permissioned consensus protocols. In Section 3, we describe our system model, scalability metrics, and cryptographic primitives. We also describe the alliance problem and show how it can be integrated with Byzantine Fault Tolerant (BFT) consensus protocols. In Section 4, we consider different ways of implementing alliance and present our hypercube-based alliance protocol together with various optimizations. We evaluate the alliance protocols in Section 5.
2 BACKGROUND AND RELATED WORK
A plethora of consensus protocols have been designed over the past four decades. Table 1 summarizes prominent Byzantine consensus protocols in chronological order. They make different timing assumptions for providing safety. With respect to liveness, they make different assumptions about time, but may also give different guarantees (such as probabilistic termination). The table also describes the communication patterns that protocols use in normal case executions.
| Protocol | Safety With | Liveness | Normal case Communication |
|---|---|---|---|
| Dolev-Strong [26] | Synchrony | w/Synchrony | All-to-all |
| Byzantine Generals [47] | Synchrony | w/Synchrony | All-to-all |
| Ben-Or [12] | Asynchrony | Randomized | All-to-all |
| Bracha-Toueg [15] | Asynchrony | Randomized | All-to-all |
| PBFT [19] | Asynchrony | w/Weak Synchrony | All-to-all |
| Q/U [3] | Asynchrony | N/A | One-to-many |
| Zyzzyva [43] | Asynchrony | w/Weak Synchrony | One-to-all |
| Bosco [66] | Asynchrony | Randomized | All-to-all |
| Bitcoin [54] | Synchrony | w/Synchrony | Gossip |
| Aardvark [22] | Asynchrony | w/Weak Synchrony | All-to-all |
| BFT-Smart [14] | Asynchrony | w/Weak Synchrony | All-to-all |
| Ethereum [71] | Synchrony | w/Synchrony | Gossip |
| XPaxos [48] | Synchrony | w/Synchrony | All-to-all |
| HoneyBadgerBFT [53] | Asynchrony | Randomized | All-to-all |
| Byzcoin [41] | Synchrony | w/Synchrony | Tree-based |
| Algorand [33] | Synchrony | w/Synchrony | Gossip |
| Omniledger [42] | Synchrony | w/Synchrony | Gossip |
| Thunderella [58] | Synchrony | w/Synchrony | One-to-many |
| HotStuff [72] | Asynchrony | w/Weak Synchrony | One-to-all |
| Monoxide [70] | Synchrony | w/Synchrony | Gossip |
Table 1. Comparison of Byzantine Consensus Protocols in Chronological Order
In this section, we look at various design decisions that affect the scalability of these protocols.
2.1 Optimizing for the “Normal Case”
Many permissioned consensus protocols optimize so-called “normal case executions,” that is, executions that exhibit no failures and little or no contention.
In a crash-failure setting, one example is Mencius [50], where multiple consensus instances can be run concurrently by different leaders. This allows for better network utilization and throughput when requests are well balanced across servers.
In BFT settings, Zyzzyva [43] uses speculative updates and provides a fast response to clients in a failure-free execution. Q/U [3] allows single round updates and queries in the normal case by optimistically executing commands and enabling object versioning. In case of inconsistency, a client needs to set a barrier and write-back the object in question. Thunderella [58] works with permissionless membership. It uses a committee with a leader for the normal case and falls back to a conventional blockchain when the leader is faulty.
Under failures, many permissioned consensus protocols fall back to so-called view-change or recovery protocols (e.g., Reference [8]). Often, throughput of such protocols can be decreased significantly through certain attacks by a faulty client, primary, or network adversary [22, 53]. Aardvark [22] is a consensus protocol that is designed to be robust to attacks in which the primary is faulty. However, this comes at the expense of peak performance. HoneyBadgerBFT [53] is robust under full asynchrony.
Most consensus protocols require at least two communication steps to reach agreement [66]. However, there exist protocols that can reach agreement in one communication step in normal case executions. Brasileiro et al. give one such protocol for the crash failure model when all correct participants propose the same value [16]. Song and Van Renesse give theoretical upper bounds on the fraction of Byzantine members a Byzantine one-step consensus protocol can tolerate and present a corresponding protocol [66].
2.2 Scaling Communication
Practical BFT (PBFT) [19] is perhaps the most well-known consensus protocol for asynchronous settings. PBFT uses an all-to-all communication pattern; however, all-to-all communication does not scale well. Fortunately, in failure-free cases, we can achieve all-to-all information exchange without using all-to-all communication by utilizing different communication patterns. We distinguish deterministic and randomized patterns.
In deterministic communication patterns, a processor communicates with only a certain subset of its peers—its neighbors. Patterns that are commonly used include one-to-all [43, 72], tree-based [25, 68], and ring-based [37] communication. We discuss these communication patterns, their trade-offs, and limitations in Section 4.
Another way to scale communication is using randomized communication or gossip. In Algorand [33], each processor gossips its messages to a small random subset of peers and message integrity is checked through digital signatures. Snowflake [59] illustrates a leaderless design in which each processor collects messages from a random sample of its peers and adjusts estimates based on the distribution of values in these samples.
There has been extensive work on the existence of time, step-count, and message-count optimal broadcast algorithms based on hypercubes. These algorithms cover a range of information exchange archetypes including all-to-all, one-to-all, and one-to-one communication [13, 32, 63, 67]. Hypercube-based communication has also been studied in various settings including broadcasting in clusters [30], message passing multiprocessor networks [23], matrix operations [67], and parallel database processing [10].
2.3 “Permissionless” Protocols
A different approach to scaling is using permissionless protocols such as Bitcoin’s “Nakamoto blockchain.” Simplified, such protocols have a set of anonymous miners solve cryptopuzzles; the first to solve the current cryptopuzzle can propose the next block. Permissionless protocols scale well in the number of members, and they are robust to a greater fraction of members exhibiting Byzantine behavior. However, they assume network synchrony [57] and thus have to make worst-case latency assumptions, and many are highly inefficient, suffering from high latency and low throughput.
2.4 Committee-based Protocols
In a committee-based approach, the processors first elect a committee of modest size. The committee then makes decisions on behalf of the entire membership. Periodically, or after failures, a new committee is elected by the membership. Adaptive online adversaries [11] can focus their resources on attacking just the committee. To prevent this, committee election is randomized and the outcome unpredictable.
For example, in Byzcoin [41] and Thunderella [58], a committee is formed from the miners that recently mined blocks in an underlying permissionless blockchain. Such protocols can only defend against oblivious adversaries [11], as the committee at any time is known. In contrast, in Algorand [33] committees are selected and changed using Verifiable Random Functions [52]. Continuously electing a new committee could eventually result in a committee with a relatively high fraction of Byzantine members.
2.5 Sharding-based Protocols
Sharding is a common approach in databases and key-value stores to “scale out” for improved capacity and throughput. It can also be applied to blockchains [4, 24, 42, 49]. A problem with sharding is that operations that involve objects on multiple shards often require transactional semantics. Still, Omniledger [42] has demonstrated that—in workloads with high locality—throughput can scale approximately linearly with the number of processors. Omniledger periodically creates new shards by random selection from all processors, so, like committees, some shards may have a relatively high fraction of Byzantine members. Also, if locality is poor, cross-shard transactions can negatively impact overall performance.
Monoxide [70] takes a different approach to scaling permissionless blockchains. Instead of improving latency by integrating permissioned protocols within shards, Monoxide focuses entirely on scaling out the computational power. Processors, accounts, and transactions are partitioned into shards based on their identifications or addresses, and each shard works with its own isolated sub-blockchain. Within each shard, processors mine blocks similar to Nakamoto’s blockchain. Cross-shard transactions are split into multiple sub-transactions that are handled within their corresponding shard. Monoxide achieves higher throughput as the number of processors and shards increases. However, it suffers from high latency just as Nakamoto’s blockchain.
2.6 Modular Consensus
There are a number of BFT consensus protocols that follow modular designs for simplicity and ease of provability [7, 9]. The Hyperledger Fabric [6] implements the execute-order-validate architecture that enables customizability and flexibility from an individual application’s perspective. Similarly, Thunderella [58] allows the underlying blockchain to be designed separately from the normal case protocol. Bosco [66] and Brasileiro et al. [16] give consensus protocols that utilize black-box consensus protocols.
In Reference [8], Aublin et al. propose “Abstract” and show that it is possible to develop BFT consensus protocols modularly by separating the design of different execution profiles such as normal case execution and handling failures. Their approach yields a protocol with higher throughput, lower latency, and simpler development. In this article, we take a similar approach in combining alliance protocols and consensus protocols to build improved consensus protocols. However, we are focused specifically on improving the membership scalability of consensus protocols.
In References [44, 45], Kursawe proposes “Optimistic Byzantine Agreement” in which participants run two rounds of all-to-all communication to enable early decision making in the normal-case. Kursawe shows how to use this “pre-protocol” before consensus protocols with different validity assumptions, such that under failure Optimistic Byzantine Agreement falls back to the underlying protocol. We call the problem addressed during the optimistic phase of this work alliance. This article investigates how to solve alliance in a way that scales with membership.
3 ASYNCHRONOUS BYZANTINE CONSENSUS
3.1 System Model
Let \(P = \lbrace p_0, \ldots , p_{n-1}\rbrace\) be a set of \(n\) processors. A correct processor is a processor that follows a protocol specification; a Byzantine processor is a processor that is not correct. We also define an honest processor to be a processor that follows the protocol specification until it experiences a crash failure. Note that all correct processors are honest, and that honest processors that crash are considered Byzantine. We assume that at most \(f\) processors, \(f \lt n\), are Byzantine.
We assume that communication between any two honest processors is First In, First Out (FIFO)-ordered, and communication between any two correct processors is reliable. More formally:
Integrity: If an honest processor \(p\) delivers \((q, m)\) and processor \(q\) is honest, then \(q\) sent \(m\) to \(p\).
FIFO: If an honest processor \(p\) delivers \((q, m_1)\) and \((q, m_2)\) in that order and processor \(q\) is honest, then \(q\) sent \(m_1\) before \(m_2\).
Reliability: If a correct processor \(p\) sends \(m\) to a correct processor \(q\), then \(q\) eventually delivers \((p, m)\).
We do not assume that there are bounds on communication or processing step latencies, that is, the system is asynchronous. Processors may have clocks to implement timers, but the clocks of different processors do not need to be synchronized in any way.
3.2 Consensus
Consensus is defined as the problem of a set of processors agreeing on a common value. Each honest processor \(p_i\) has an initial proposal \(v_i\).
The core properties of consensus are as follows:
Agreement: If two honest processors decide, then they decide the same value.
Termination: Every correct processor eventually decides some value.
Note that these properties can easily be satisfied by honest processors always deciding \(v\) for some predetermined \(v\). To make consensus useful, we have to add a validity property. There are two that we consider in this article.
In the first formulation, the consensus protocol satisfies the following property:
Strong Validity [55]: If all correct processors propose the same proposal \(v\), and if an honest processor decides, then it decides \(v\).
We define strong consensus to be a consensus protocol that satisfies Strong Validity.
In the second formulation, there is an external predicate \(\mathcal {P}\) on proposals and the protocol satisfies the following property:
External Validity [18]: If an honest processor decides \(v\), then \(\mathcal {P}(v)\) holds.
We define validated consensus to be a consensus protocol that satisfies External Validity. Under the requirement that honest processors only propose values that satisfy \(\mathcal {P}\), a validated consensus protocol guarantees that honest processors decide a value that satisfies \(\mathcal {P}\), even if the value was not proposed by an honest processor.
The seminal Fischer, Lynch, and Paterson (FLP) impossibility result [31] implies that there cannot exist an asynchronous BFT consensus protocol that satisfies Agreement, Validity, and Termination if all correct processors are deterministic. From its proof, it is easy to see that the best that may be achieved with nondeterministic processors is an almost surely Termination guarantee. In our work, we will assume that the underlying BFT consensus protocol is partially correct, that is, the protocol satisfies Agreement as well as either Strong or External Validity. We will show that our modular consensus protocols are partially correct and inherit the liveness guarantees of the underlying consensus protocol.
3.3 Performance Metrics
Consensus is implemented by a protocol between the processors in \(P\). Processors send and receive messages. The efficiency of running a consensus protocol can be measured by various metrics. We define three metrics of interest.
The number of steps measures how long a run of a consensus protocol takes to finish, which is when all correct processors have decided. We can define a step as follows: both processors and messages are labeled with a step. Initially, all processors start a protocol in step 0. Each message \(m\) that is sent by a processor in step \(s\) is labeled with \(s+1\). A processor updates its step when it receives a message \(m\) to the maximum of \(m\)’s label and its current step. Thus, the number of steps a run of consensus takes is the maximum of all correct processors’ steps at the moment they have all decided. If we assume that local events take negligible time and message latencies are constant, then steps can be used to estimate the end-to-end latency of a protocol run. Note that many consensus protocols identify rounds, where each round consists of a fixed number of steps.
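The labeling rule above can be sketched in a few lines of Python; the sequential delivery schedule below is an illustrative simplification of a real asynchronous run.

```python
def count_steps(schedule, n):
    """Apply the step-labeling rule to a delivery schedule, given as
    (sender, receiver) pairs in delivery order. A message sent by a
    processor in step s is labeled s + 1, and a delivery raises the
    receiver's step to at least that label."""
    step = [0] * n                       # every processor starts in step 0
    for sender, receiver in schedule:
        step[receiver] = max(step[receiver], step[sender] + 1)
    return max(step)                     # steps taken by the whole run

# One lap around a 4-processor ring takes 4 steps...
print(count_steps([(0, 1), (1, 2), (2, 3), (3, 0)], 4))  # 4
# ...while a one-to-all exchange takes 2, regardless of n.
print(count_steps([(1, 0), (2, 0), (3, 0), (0, 1), (0, 2), (0, 3)], 4))  # 2
```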
The message complexity of a run of a consensus protocol is the total number of messages sent during the run.
Finally, we define maximum fan-in as another efficiency metric that we want to minimize. We define fan-in as the number of neighbors a processor receives messages from in one communication step. The maximum fan-in of a protocol is then defined as the maximum fan-in across all processors and communication steps. The optimal maximum fan-in of a protocol is 1 and represents one-to-one communication. High maximum fan-in may cause throughput collapse through Transmission Control Protocol (TCP) incast (see Section 5.2). Poor distribution of work among processors is also associated with high maximum fan-in. This in turn may then lead to poor network utilization, since imbalance will cause over-subscription in a subset of links while most other links stay idle.
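As a sketch, maximum fan-in can be computed from a run trace of (step, sender, receiver) events; the two traces below are illustrative, not taken from a real execution.

```python
from collections import defaultdict

def maximum_fan_in(trace):
    """Largest number of distinct senders any single processor hears from
    in one communication step; `trace` holds (step, sender, receiver)."""
    senders = defaultdict(set)
    for step, sender, receiver in trace:
        senders[(step, receiver)].add(sender)
    return max((len(s) for s in senders.values()), default=0)

n = 8
# Step 1: everyone sends to a leader p0; step 2: p0 replies to everyone.
one_to_all = [(1, p, 0) for p in range(1, n)] + [(2, 0, p) for p in range(1, n)]
# A token circling a ring: one sender per receiver per step.
ring = [(s + 1, s % n, (s + 1) % n) for s in range(2 * n)]
print(maximum_fan_in(one_to_all))  # 7: the leader hears from n - 1 peers at once
print(maximum_fan_in(ring))        # 1: optimal, one-to-one communication
```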
While we have described the metrics above for a particular run of a consensus protocol, the same metrics can be used for computing expectations, such as the expected message complexity of a run of a consensus protocol.
In our theoretical analysis, we are not concerned with how large messages are, since this may depend on protocol-specific primitives (see Section 3.4). However, in our implementation, optimizations, and evaluation, we will consider the effect of message sizes on the network.
3.4 Message Integrity
In a Byzantine setting, we need to ensure message integrity to prohibit Byzantine processors from forging messages from honest processors. Both public key cryptography with digital signatures and symmetric key cryptography with Message Authentication Codes (MACs) can be used for this purpose.
The majority of consensus protocols use public key cryptography and digital signatures. Digital signatures provide non-repudiation, that is, a processor cannot refute the authenticity of a message it previously signed. In particular, unlike MACs, signed messages can be forwarded and their origin verified by any third party.
In consensus protocols that use MACs, each pair of processors shares a symmetric key. Because a MAC can be verified only by a holder of the corresponding key, a message intended for multiple recipients carries a vector of MACs, one per recipient, often called an authenticator. In exchange, MACs are significantly cheaper to compute and verify than digital signatures.
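As an illustration of per-recipient authentication with MACs, the sketch below builds an authenticator (a vector of HMACs, one per recipient) using only the standard library; the pairwise keys are made up for the example.

```python
import hashlib
import hmac

# Hypothetical pairwise symmetric keys: key[(i, j)] is shared by p_i and p_j.
n = 4
key = {(i, j): f"k-{min(i, j)}-{max(i, j)}".encode()
       for i in range(n) for j in range(n) if i != j}

def authenticator(sender, payload):
    # One MAC per recipient: only p_j holds the key to verify entry j.
    return {j: hmac.new(key[(sender, j)], payload, hashlib.sha256).hexdigest()
            for j in range(n) if j != sender}

def verify(sender, recipient, payload, auth):
    expected = hmac.new(key[(sender, recipient)], payload,
                        hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, auth[recipient])

auth = authenticator(0, b"proposal v")
print(verify(0, 2, b"proposal v", auth))  # True
print(verify(0, 2, b"other", auth))       # False
```

Note that, unlike a digital signature, any entry of the authenticator could have been computed by its recipient, so a MAC cannot be shown to a third party as proof of origin.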
Compacted Signatures. Group signatures and multisignature schemes (e.g., Reference [62]) allow multiple individual signatures to be merged into a single signature with approximately the same size. We can obtain group signatures, even with thousands of participants, using cosigning protocols such as [68]. These schemes require extra communication steps and would introduce computation overheads on processors. However, with very large groups these overheads tend to be negligible compared to the cost of transmitting the large message payloads. Similarly, \((t,n)\)-threshold signature schemes can also provide more compact digital signatures [65].
3.5 Optimizing the Normal Case
Various consensus protocols optimize the normal case execution of consensus by isolating the case where every processor has the same proposal or history [8, 43, 44, 45]. In Reference [8], Aublin et al. present Aliph, a modular consensus protocol that runs Quorum in the normal case. Quorum is a low-latency protocol where a client considers a request that it sent to the processors ordered if it receives a matching history response from every processor, similar to Zyzzyva [43]. Aliph falls back from Quorum to more complicated protocols in other cases. In Reference [45], Kursawe shows that for consensus protocols that support Strong Validity, running an expensive Byzantine consensus protocol can be avoided if correct processors can efficiently determine whether all honest processors have the same proposal: if so, they can simply decide that proposal.
We define alliance to be the problem in which a group of processors, some of which may be Byzantine, determine if they have the same proposal. They conclude either true or false, subject to the following properties:

a-Agreement: If an honest processor concludes true, then all honest processors have the same proposal.

a-Nontriviality: If all processors are honest and have the same proposal, then if a processor concludes, it concludes true.

a-Termination: If all processors are correct, then all processors eventually conclude.

Note that if a processor concludes false, the honest processors may nonetheless all have the same proposal; a false conclusion simply provides no guarantee.
3.5.1 A Simple Alliance Protocol.
Algorithm 1 (\(\mathcal {A}_0\)) is a simple alliance protocol. Here, \(\langle m\rangle _i\) denotes a message \(m\) that is signed by processor \(p_i\). In this protocol, processors broadcast their signed proposals to one another. If an honest processor receives identical and correctly signed proposals for all processors, then it knows that all honest processors had the same proposal. This protocol solves alliance but has quadratic message complexity. We will improve on this in Section 4.
3.5.2 Strong Consensus with Alliance.
As mentioned above, alliance may be used to try to reach consensus in the case that all honest processors have the same proposals, avoiding running a consensus protocol. In this section, we will show how one may use alliance in combination with a consensus protocol that satisfies Strong Validity by generalizing and modularizing the optimistic Byzantine agreement protocol in Reference [45] using alliance as a building block. Note that both the alliance protocol and consensus protocol are used as black boxes.
Algorithm 2 presents a consensus protocol \(\mathcal {S}_a\) (for strong consensus with alliance) based on a combination of an abstract alliance protocol \(\mathcal {A}\) and an abstract underlying strong consensus protocol \(\mathcal {B}_s\) (for strong consensus below) that satisfies Strong Validity.
In this protocol the processors start an instance of an alliance protocol \(\mathcal {A}\) (for example, alliance protocol \(\mathcal {A}_0\)). If a processor concludes true, then it decides its own proposal: by a-Agreement, all honest processors have that same proposal, and Strong Validity guarantees that \(\mathcal {B}_s\) could only decide that proposal as well. If a processor concludes false, or if a timeout expires before it concludes, then the processor falls back to running \(\mathcal {B}_s\) with its original proposal.
The correctness of this approach has been discussed in Reference [45]. Note that the optimistic Byzantine agreement protocol in Reference [45] includes an extra round of communication between processors at the start of the protocol, compared to Algorithm 2. The purpose of the additional round is to attempt to achieve alliance even when honest processors start with different proposals. We have included a proof of correctness of Algorithm 2 in Appendix A.
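A minimal sequential sketch of this combination, with both sub-protocols as stand-in stubs (the actual Algorithm 2 runs the alliance instance concurrently with normal operation):

```python
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout

def strong_consensus_with_alliance(proposal, alliance, strong_consensus,
                                   timeout):
    """Run the alliance protocol under a timeout; decide the proposal
    directly on True, otherwise fall back to the black-box strong
    consensus protocol."""
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(alliance, proposal)
        try:
            if future.result(timeout=timeout):
                return proposal          # safe by Strong Validity of fallback
        except FutureTimeout:
            future.cancel()              # give up on alliance
    return strong_consensus(proposal)    # black-box fallback

# Stubs standing in for real protocol instances.
fast = strong_consensus_with_alliance("v", lambda p: True, lambda p: "w", 1.0)
slow = strong_consensus_with_alliance("v", lambda p: False, lambda p: "w", 1.0)
print(fast, slow)  # v w
```

If the timeout fires, this sequential sketch still waits for the alliance call to return before falling back; a real implementation would let both protocols run concurrently, as discussed in Section 3.5.5.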
3.5.3 Validated Consensus with Alliance.
Algorithm 2 cannot be used if the underlying consensus protocol does not satisfy Strong Validity. After all, in a case where there is a Byzantine processor and all honest processors propose the same value \(v\), it is well possible that some honest processors conclude true in alliance and decide \(v\), while others time out and fall back to the underlying protocol, which, lacking Strong Validity, may decide a value other than \(v\), violating Agreement.
Fortunately, most consensus protocols can be made validated [18] by having honest processors reject any messages that contain proposal values that do not satisfy \(\mathcal {P}\). In Reference [17], Cachin et al. describe how any validated consensus protocol can be transformed into a strong consensus protocol. That transformation can be combined with the timeout approach in Algorithm 2 or [8, 44] to integrate alliance with any validated consensus protocol with no additional overhead other than running the alliance protocol itself. We have included an algorithm that combines an alliance protocol with an abstract validated consensus protocol in Appendix B for reference.
3.5.4 Partial Synchrony and Requirements on the Timeout.
Alliance is safe to run in an asynchronous setting. However, it relies on timely communication between processors to enable early decision-making. In particular, alliance is most useful in partially synchronous settings. For instance, assume that there is a period of synchrony in the system where messages between processors are delivered and processed within \(\delta\) time. Then, an alliance protocol \(\mathcal {A}\) is guaranteed to succeed if the timeout is chosen to be \(s_\mathcal {A} \delta\) where \(s_\mathcal {A}\) is the number of communication steps \(\mathcal {A}\) is designed to take. As we will see in Section 4, \(s_\mathcal {A}\) can range from 2 to \(2n\) where \(n\) is the number of processors.
3.5.5 If Alliance Fails.
The consensus protocols that we have described leverage alliance to speed up normal case executions. In other executions, we have shown that the consensus protocols still satisfy the correctness properties. However, there is the concern that alliance might add unnecessary overhead in such executions: alliance will fail after some time and the underlying consensus protocol still needs to run.
It is possible to run alliance and the underlying consensus protocol concurrently. Doing so would never increase the amount of time to reach consensus compared to just running the underlying consensus protocol. However, it would still add communication overhead. To remove this overhead, it would be necessary to integrate alliance and the underlying consensus protocol so that their messages can be piggybacked on one another. However, this hurts modularity and the underlying consensus protocol would no longer be a black box.
Algorithms 2 and 4 use a timeout to determine when to give up on alliance and start the underlying consensus protocol. This timeout is tunable or could be adaptive. It represents both how long alliance needs to complete in the normal case and the maximum amount of time consensus could be delayed by running alliance. As we will see in Section 5, this time is modest. Moreover, we could further optimize this overhead by disabling the alliance optimization for a period of time after it fails, amortizing its overhead over multiple runs of the underlying consensus protocol.
4 ALLIANCE PROTOCOLS
So far, we have only presented a simple alliance protocol with quadratic message complexity. We will now further explore alliance protocols, present a scalable protocol based on a hypercube-design and MACs, and consider various optimizations.
4.1 Classes of Alliance Protocols
Consensus protocols require all-to-all information exchange. In the simple “all-to-all” alliance protocol \(\mathcal {A}_0\), every processor broadcasts its proposal. While this finishes in the least number of communication steps possible, it results in a message complexity of \(O(n^2)\) with a maximum fan-in of \(O(n)\). This greatly hinders the scalability of the protocol. Furthermore, besides sending and receiving many messages, processors need to establish and manage network connections to all other processors.
We can achieve all-to-all information exchange without using all-to-all communication by using different communication patterns. Table 2 lists efficiency metrics and trade-offs of different communication patterns that are described below.
The One-to-all Protocol. Several communication protocols make use of the “one-to-all” communication pattern [43, 72]. Consider an alliance protocol where one of the processors is designated leader. The protocol consists of two communication steps:
Collection step: Processors send their signed proposals to the leader.
Decision step: The leader broadcasts a vector of the signed proposals to all processors. Each processor can then conclude accordingly.
Note that the proposals need to be signed with public key signatures.
We illustrate this protocol in Figure 1(a). The protocol has optimal message complexity (\(2n\)). However, it has very high maximum fan-in because of the overhead on the leader and in particular can suffer from TCP incast.
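A failure-free run of this two-step protocol can be simulated in a few lines; the data representation is illustrative and signatures are elided.

```python
def one_to_all_alliance(proposals):
    """Simulate the two steps among honest processors.
    `proposals` lists one proposal per processor; p0 acts as leader."""
    # Collection step: the leader gathers every (signed) proposal.
    collected = list(proposals)
    # Decision step: the leader broadcasts the full vector, and each
    # processor concludes True iff all entries are identical.
    conclusion = len(set(collected)) == 1
    return [conclusion] * len(proposals)

print(one_to_all_alliance(["v"] * 5))        # [True, True, True, True, True]
print(one_to_all_alliance(["v", "v", "w"]))  # [False, False, False]
```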
Fig. 1. Illustration of one-to-all communication and ring communication.
The Ring Protocol: One way to drastically reduce message complexity while still achieving optimal fan-in is by organizing the processors in a ring and sending a message around the ring twice [5, 37] (Figure 1(b)). This leads to an alliance protocol that also uses a leader as follows:
Collection steps: The leader initiates the protocol by sending its signed proposal to its clockwise neighbor. Each other processor collects a vector of proposals from its counter-clockwise neighbor, adds its own proposal, and sends the resulting vector to its clockwise neighbor. It takes \(n\) communication steps for the leader to receive a complete vector of proposals from its counter-clockwise neighbor.
Decision steps: To disseminate the complete information to all processors, the complete vector is forwarded along the entire ring (for efficiency, the vector can be decreased by one entry at each step).
Alternatively, the processors can be organized in a single sequence, the leader forming one end of the sequence and the vector being forwarded back and forth. The resulting protocol has similar characteristics. Either way, the number of steps is \(2n\), again limiting scalability.
The \(k\)-ary Tree Protocol: Another example of all-to-all information exchange that has better scalability properties involves organizing the processors into a balanced tree [25, 41]. Generalizing the one-to-all protocol, we can implement an alliance protocol that communicates using a balanced tree with a fixed branching factor \(k\) as follows:
Collection steps: As in the one-to-all protocol, processors send their signed values toward the root (leader) starting from the leaves. Each processor collects values from each of its children and sends a vector to its parent.
Decision steps: The root transmits the complete vector down the tree. Each processor receives the vector from its parent and broadcasts the vector to its children.
The total number of steps is \(2\log _k n\), and the message complexity is \(2n\) (each processor sends and receives a message). The internal nodes, which represent a \(1/k\) fraction of the entire membership, receive from more senders than the leaves do; the maximum fan-in is \(k\). We can represent the one-to-all and ring protocols as \((n-1)\)-ary and 1-ary tree protocols, respectively.
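For a quick side-by-side view, the metrics quoted above can be tabulated programmatically. This is a small illustrative sketch; the helper functions and their names are ours, not from the paper, and the counts follow the text's asymptotic accounting:

```python
def levels(n, k):
    """Smallest L with k**L >= n: depth of a balanced k-ary dissemination tree."""
    L, size = 0, 1
    while size < n:
        size *= k
        L += 1
    return L

def pattern_metrics(n, k):
    """Steps, message complexity, and maximum fan-in of the patterns above."""
    return {
        "all-to-all": {"steps": 1, "messages": n * (n - 1), "fan_in": n - 1},
        "one-to-all": {"steps": 2, "messages": 2 * n, "fan_in": n - 1},
        "ring":       {"steps": 2 * n, "messages": 2 * n, "fan_in": 1},
        "k-ary tree": {"steps": 2 * levels(n, k), "messages": 2 * n, "fan_in": k},
    }
```

For example, `pattern_metrics(16, 4)` reports that a 4-ary tree takes 4 steps with fan-in 4, while the ring takes 32 steps with fan-in 1, illustrating the trade-off between steps and fan-in.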
The Two-level Hierarchy Protocol: We can also arrange processors into a fixed-depth communication tree as follows: We first partition processors into groups. Each group has a group leader and one of the group leaders is the protocol leader:
Collection steps: The processors in a group send their proposals to the group leader. Each group leader collects these proposals and sends a vector to the protocol leader. The protocol leader collects vectors from its peer group leaders.
Decision steps: The protocol leader broadcasts the complete vector of proposals to the group leaders. Each group leader then propagates the vector to each peer processor in its group.
This pattern uses a communication tree of fixed depth 2 and nodes having mixed degrees. Groups are typically based on pragmatic objectives—for example, there may be a group per rack in a machine room, or a group per autonomous system in a wide-area setting. Similar to the one-to-all protocol, the two-level protocol achieves a constant number of steps. Assuming a fixed group size \(g\) that is larger than the number of groups, the maximum fan-in of this protocol is \(O(g)\).
The Hypercube Protocol: We know by the optimality result given in Reference [32] that, if a processor can only send (or receive) one message at a time, a binomial tree minimizes the completion time of disseminating a single packet. This result is directly relevant to our problem, since the completion time of a protocol depends on the completion time of the last processor.
Binomial trees can be perfectly fitted in hypercubes [38, 39]. In particular, a binomial tree that contains every node in the hypercube is a spanning tree of the hypercube. As discussed in Reference [32], a hypercube-like overlay network is suitable for implementing broadcast over a binomial tree. We place \(n\) processors on a \(\log n\)-dimensional hypercube (for now, we assume for simplicity that \(n\) is a power of 2). We arrange processors in the hypercube such that processors \(p_i\) and \(p_j\) are adjacent in dimension \(s\), \(s \in [0, \log n)\), iff \(i = (j \oplus 2^s)\), that is, \(i\) and \(j\) only differ in bit \(s\) (\(\oplus\) is the bitwise exclusive-or operator).
Fig. 2. A three-dimensional hypercube. At each step s, processors communicate with their neighbor at dimension s.
Fig. 3. The communication tree of processor p_0 in a four-dimensional cube with no compaction and 2-compaction.
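The neighbor relation underlying the hypercube arrangement can be sketched in a few lines (illustrative helpers of our own; indices are processor ids):

```python
def neighbor(i, s):
    """p_i's neighbor in dimension s: flip bit s of the index (i XOR 2^s)."""
    return i ^ (1 << s)

def step_pairs(n, s):
    """The disjoint processor pairs that exchange messages in step s
    (n is assumed to be a power of 2)."""
    return {(min(i, neighbor(i, s)), max(i, neighbor(i, s))) for i in range(n)}
```

For the three-dimensional cube of Figure 2, `step_pairs(8, 0)` yields the pairs `{(0,1), (2,3), (4,5), (6,7)}`: in each step every processor has exactly one partner, so fan-in stays constant.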
All-to-all and one-to-all protocols both minimize the number of steps. However, the all-to-all protocol has poor message complexity and the one-to-all protocol has high maximum fan-in. Likewise, a \(k\)-ary tree protocol offers better message complexity than the hypercube, while the hypercube reduces fan-in. Each communication pattern thus minimizes a different set of performance metrics. As we will see later in our evaluation, hypercubes appear to scale better in practice. The hypercube protocol is also open to a variety of optimizations that allow it to perform competitively with protocols that take fewer steps.
4.2 The MAC Hypercube Alliance Protocol
Digital signatures lead to significant computational overheads. To reduce this overhead while taking advantage of the scalability of the hypercubes, we use Message Authentication Codes (MACs). Note that the protocol is correct with any deterministic MAC scheme that satisfies unforgeability.
Let \(n = 2^d\). Every pair of processors \(i\) and \(j\) (not necessarily neighbors) shares two keys. Honest processors keep secret keys to themselves. \(K_i[j]\) denotes the key \(p_i\) shares with \(p_j\) and \(K_j[i]\) denotes the key \(p_j\) shares with \(p_i\).
We denote by \(H_i\) the vector of aggregated MACs maintained by processor \(p_i\); \(H_i[j]\) denotes the entry for processor \(p_j\).
For simplicity, we assume that proposals are unique across different invocations of the protocol so Byzantine processors cannot re-use MACs from earlier invocations.
Each honest processor \(p_i\) executes the protocol as follows. In loop L1, \(p_i\) initializes its vector by setting \(H_i[j] := \texttt {MAC}(K_i[j], v_i)\) for every \(j\). Then, in loop L2, for each step \(s \in [0, d)\), \(p_i\) sends its vector to its neighbor in dimension \(s\), receives that neighbor's vector \(H\), and aggregates it entry-wise: \(H_i[j] := H_i[j] \oplus H[j]\). After all \(d\) steps complete, processor \(p_i\) checks to see if \(H_i[i]\) equals the aggregate \(\texttt {MAC}(K_0[i], v_i) \oplus \cdots \oplus \texttt {MAC}(K_{n-1}[i], v_i)\), which \(p_i\) can compute locally because it knows every key \(K_j[i]\). If so, then \(p_i\) concludes true; otherwise, it concludes false. Note that the aggregation of MACs with \(\oplus\) is commutative and associative, so the order in which vectors are combined does not affect the result.
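A minimal, self-contained simulation of the protocol sketched above, assuming HMAC-SHA256 as the (deterministic, unforgeable) MAC and running all processors in one process. The function and key layout are our own sketch, not the paper's implementation:

```python
import hashlib
import hmac
import secrets

def mac(key, value):
    # any deterministic unforgeable MAC works; we use HMAC-SHA256 here
    return hmac.new(key, value, hashlib.sha256).digest()

def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def run_mhap(proposals):
    """Simulate the MAC hypercube alliance protocol for n = 2^d processors.
    Returns each processor's conclusion (True iff it would conclude that
    all proposals match its own)."""
    n = len(proposals)
    d = n.bit_length() - 1
    assert 1 << d == n, "n must be a power of 2"
    # K[j][i]: the secret key p_j shares with p_i (two keys per pair)
    K = [[secrets.token_bytes(16) for _ in range(n)] for _ in range(n)]
    # loop L1: H_j[i] := MAC(K_j[i], v_j)
    H = [[mac(K[j][i], proposals[j]) for i in range(n)] for j in range(n)]
    # loop L2: in step s, exchange vectors with the dimension-s neighbor
    for s in range(d):
        old = [row[:] for row in H]  # all sends use start-of-step vectors
        for j in range(n):
            received = old[j ^ (1 << s)]
            H[j] = [xor(H[j][i], received[i]) for i in range(n)]
    # p_i concludes true iff H_i[i] equals the locally computable aggregate
    conclusions = []
    for i in range(n):
        local = bytes(32)
        for j in range(n):
            local = xor(local, mac(K[j][i], proposals[i]))
        conclusions.append(H[i][i] == local)
    return conclusions
```

With identical proposals every processor concludes true; if even one processor proposes a different value, the aggregates mismatch (with overwhelming probability) and all processors conclude false.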
Let \(p_i\) be an honest processor that proposes \(v_i\) and concludes true, and let \(p_j\) be any honest processor with proposal \(v_j\). Since \(p_i\) concludes true, the aggregate \(H_i[i]\) that \(p_i\) computed equals \(\texttt {MAC}(K_0[i], v_i) \oplus \cdots \oplus \texttt {MAC}(K_{n-1}[i], v_i)\).
As shown in Reference [40], this can only hold if \(p_j\) has computed \(\texttt {MAC}(K_j[i], v_j)\) such that \(\texttt {MAC}(K_j[i], v_j) = \texttt {MAC}(K_j[i], v_i)\), because the keys of \(p_i\) and \(p_j\) are both secret. Thus \(v_j = v_i\).□
Assume all processors are correct and have the same initial proposal \(v\). Consider some processor \(p_i\). Each processor \(p_j\) initially sets \(H_j[i]\) to \(\texttt {MAC}(K_j[i], v)\). We first prove the following loop invariant for loop L2: at the start of step \(s\), every processor \(p_j\) has \(H_j[i] = \texttt {MAC}(K_{j \oplus 0}[i], v) \oplus \cdots \oplus \texttt {MAC}(K_{j \oplus (2^s-1)}[i], v)\). The proof is by induction on \(s\).
Base Step (\(s=0\)): We have to prove that, at the start of step 0, \(H_j[i] = \texttt {MAC}(K_{j \oplus 0}[i], v)\). This follows trivially from \((j \oplus 0) = j\) and the fact that \(H_j[i]\) was initialized to \(\texttt {MAC}(K_j[i], v)\) in loop L1.□
Induction Step: Assume the induction hypothesis holds for some \(s^{\prime } \in [0, d)\). We have to show that the hypothesis holds for \(s = s^{\prime } + 1\). Let \(p_{j^{\prime }}\) be the neighbor of \(p_j\) in dimension \(s^{\prime }\), that is, \(j^{\prime } = j \oplus 2^{s^{\prime }}\). By the induction hypothesis, we have that at the start of step \(s^{\prime }\): \[\begin{align*} H_j[i] &= \texttt {MAC}(K_{j \oplus 0}[i], v) \oplus \cdots \oplus \texttt {MAC}(K_{j \oplus (2^{s^{\prime }}-1)}[i], v),\\ H_{j^{\prime }}[i] &= \texttt {MAC}(K_{{j^{\prime }} \oplus 0}[i], v) \oplus \cdots \oplus \texttt {MAC}(K_{{j^{\prime }} \oplus (2^{s^{\prime }}-1)}[i], v). \end{align*}\] Given that both \(p_j\) and \(p_{j^{\prime }}\) are correct, \(p_j\) receives \(H = H_{j^{\prime }}\) and updates \(H_j[i] := H_j[i] \oplus H[i]\) in the last step of loop L2. The induction now follows directly from the observation that \[\begin{equation*} \lbrace {j^{\prime }} \oplus 0, \ldots , {j^{\prime }} \oplus (2^{s^{\prime }}-1)\rbrace = \lbrace j \oplus 2^{s^{\prime }}, \ldots , j \oplus (2^s - 1)\rbrace . \end{equation*}\] Thus, directly after loop L2, we have that \[\begin{equation*} H_j[i] = \texttt {MAC}(K_{j \oplus 0}[i], v) \oplus \cdots \oplus \texttt {MAC}(K_{j \oplus (2^d-1)}[i], v). \end{equation*}\] The result then follows directly for \(j=i\) and \(n = 2^d\).□
Assume all processors are correct. Clearly a correct processor finishes loop L1, since initialization requires no communication. We show by induction on \(s\) that every processor completes every step of loop L2.
Base step (\(s = 0\)): We have shown that loop L1 completes at every processor, so every processor reaches step 0 of loop L2.
Induction step: Assume that every processor reaches loop \(s^{\prime }\), \(s^{\prime } \lt d\). To prove the induction, we have to show that every processor finishes step \(s^{\prime }\) to reach step \(s^{\prime } + 1\). This follows immediately from the facts that in each step \(s^{\prime }\) processors are paired in neighbor pairs that exchange messages and communication between correct processors is reliable.
The result that loop L2 terminates at every correct processor follows by induction; hence, every correct processor completes the protocol after \(d\) steps.
4.3 Optimizations
In this section, we describe several improvements we can make over the base MAC Hypercube Alliance Protocol (MHAP).
4.3.1 Early Termination.
The start of step \(s\) at processor \(p_i\) denotes the point before \(p_i\) sends its vector to its neighbor but after it executes any previous step (or after initialization). Similarly, the end of step \(s\) at processor \(p_i\) denotes the point after \(p_i\) receives its neighbor’s vector and updates its own, but before the start of the next step or completion.
Let \(\mathcal {S}_{i, s}\) be the set of processors that have their vectors aggregated in \(p_i\)'s vector at the start of step \(s\); initially, \(\mathcal {S}_{i, 0} = \lbrace p_i\rbrace\).
Similarly, let \(\mathcal {E}_{i, s}\) be the set of processors that have their vectors aggregated in \(p_i\)'s vector at the end of step \(s\): \(\mathcal {E}_{i, s} = \mathcal {S}_{i, s} \cup \mathcal {S}_{i \oplus 2^s, s}\). Finally, for all \(s^* \in [1,d)\), we have that \(\mathcal {S}_{i, s^*} = \mathcal {E}_{i, s^* - 1}\), because the vectors do not change between the end of one step and the start of the next.
\(\mathcal {E}_{i, s} = \lbrace p_{i \oplus 0},p_{i \oplus 1},\ldots ,p_{i \oplus (2^{s+1}-1)}\rbrace .\)
We will prove this by induction on \(s\).
Base Case (\(s = 0\)): We have that \(\mathcal {E}_{i, 0} = \mathcal {S}_{i, 0} \cup \mathcal {S}_{i \oplus 1, 0}\). Since \(\mathcal {S}_{i, 0} = \lbrace p_i\rbrace\) and \(\mathcal {S}_{i \oplus 1, 0} = \lbrace p_{i\oplus 1}\rbrace\), \(\mathcal {E}_{i, 0} = \lbrace p_i, p_{i\oplus 1}\rbrace\).
Induction Step: Assume the hypothesis holds for some \(s^{\prime } \in [0,d-1)\). We will show that it holds for \(s=s^{\prime }+1\). \[\begin{equation*} \begin{aligned}\mathcal {E}_{i, s} &= \mathcal {S}_{i, s} \cup \mathcal {S}_{i \oplus 2^s, s}\\ & = \mathcal {E}_{i, s^{\prime }} \cup \mathcal {E}_{i \oplus 2^s, s^{\prime }}\\ & = \lbrace p_i,p_{i\oplus 1},\ldots ,p_{i\oplus (2^{s} - 1)}\rbrace \cup \lbrace p_{i\oplus 2^s},p_{i\oplus 2^s\oplus 1},\ldots ,p_{i\oplus 2^{s}\oplus (2^{s} - 1)}\rbrace \\ & = \lbrace p_i,p_{i\oplus 1},\ldots ,p_{i\oplus (2^{s} - 1)}\rbrace \cup \lbrace p_{i\oplus 2^s},p_{i\oplus (2^s\oplus 1)},\ldots ,p_{i\oplus (2^{s}\oplus (2^{s} - 1))}\rbrace \\ & = \lbrace p_i,p_{i\oplus 1},\ldots ,p_{i\oplus (2^{s} - 1)}\rbrace \cup \lbrace p_{i\oplus 2^s},p_{i\oplus (2^s+1)},\ldots ,p_{i\oplus (2^{s}+(2^{s} - 1))}\rbrace \\ & = \lbrace p_i,p_{i\oplus 1},\ldots ,p_{i\oplus (2^{s+1} - 1)}\rbrace . \end{aligned} \end{equation*}\]□
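The sets \(\mathcal {S}_{i,s}\) and \(\mathcal {E}_{i,s}\) and the lemma above lend themselves to a mechanical check (a sketch with our own helper names):

```python
def start_set(i, s):
    """S_{i,s}: indices aggregated in p_i's vector at the start of step s."""
    return {i ^ x for x in range(1 << s)}

def end_set(i, s):
    """E_{i,s} = S_{i,s} ∪ S_{i⊕2^s, s}."""
    return start_set(i, s) | start_set(i ^ (1 << s), s)
```

Checking all processors of a four-dimensional cube confirms the lemma: `end_set(i, s)` always equals `{i ^ x for x in range(1 << (s + 1))}`.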
4.3.2 Reducing Message Sizes.
So far, we have ignored message sizes; in this section, we consider them. In MHAP, the size of a processor's vector doubles in each step, so the maximum message size grows linearly with the number of processors. We can, however, avoid sending entries that the receiver no longer needs. When a processor \(p\) sends its vector to its neighbor \(q\) in step \(s\), \(q\) propagates the received entries only to the processors it communicates with, directly or indirectly, in the remaining \(d-s-1\) steps. So the number of processors that receive \(p\)'s entries through \(q\) is \(2^{d-s-1}\), and \(p\) only needs to send the entries destined for those processors.
Thus, even though the maximum message size grows linearly with the number of processors, with this optimization, we can reduce the message size by half with each step. As a result, the average message size is \(O(\frac{n}{\log n})\) bytes, and since there are \(n\) processors and \(\log n\) steps, the total load on the network is \(O(n^2)\).
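Under our reading of this optimization, the per-step message sizes can be sketched as follows (`optimized_message_sizes` is our name, for illustration):

```python
def optimized_message_sizes(d):
    """Entries sent per message in step s under the size optimization:
    only the 2^(d-s-1) entries destined for processors the receiver
    still reaches in the remaining steps."""
    return [1 << (d - s - 1) for s in range(d)]

d = 10                                  # n = 1024 processors
sizes = optimized_message_sizes(d)
average = sum(sizes) / d                # about n / log n entries per message
total_load = (1 << d) * sum(sizes)      # O(n^2) entries network-wide
```

For \(n = 1024\), the message size starts at 512 entries and halves each step down to 1, matching the halving behavior described above.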
In the one-to-all, two-level, and \(k\)-ary tree protocols, there is a single leader that is responsible for collecting all of the information before the decision steps. So during collection steps, each processor would need to send or receive an entire vector of proposals in a single step, and a comparable reduction in message sizes is not possible.
4.3.3 Non-uniform Message Latencies.
In a \(d\)-dimensional hypercube, if we assume that the link delay between two processors is symmetric, then the latency of the protocol will be proportional to the sum of the maximum link delay in each dimension. Formally, let \(\ell _{i \rightarrow j}\) be the link delay between processors \(p_i\) and \(p_j\). Then, assuming no congestion or processing delay, the latency of the protocol is \(\sum _{s=0}^{d-1} \max _{i \in [0,n)} \left(\ell _{i \rightarrow i \oplus 2^s}\right)\).
Consider a two-dimensional hypercube, so a square. Assuming that link delays are symmetric, the latency of the protocol will be \(\propto ~\max (\ell _{0\rightarrow 1},\ell _{2\rightarrow 3})~+~\max (\ell _{0\rightarrow 2},\ell _{1\rightarrow 3})\). Assume we have that \(\ell _{0\rightarrow 1}~=~\ell _{0\rightarrow 2}~=~10\) ms but \(\ell _{1\rightarrow 2}~=~\ell _{1\rightarrow 3}~=~\ell _{0\rightarrow 3}~=~\ell _{2\rightarrow 3}~=~1\) ms. We can achieve a significantly lower protocol latency by switching the logical places of \(p_1\) and \(p_3\) on the hypercube structure (20 ms vs. 11 ms).
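The example above can be checked with a short computation (the delay values are the hypothetical ones from the text; the helpers are ours):

```python
# symmetric link delays (ms) from the example above
delay = {(0, 1): 10, (0, 2): 10, (0, 3): 1, (1, 2): 1, (1, 3): 1, (2, 3): 1}

def ell(a, b):
    return delay[(min(a, b), max(a, b))]

def protocol_latency(position):
    """Latency of the 2-D hypercube protocol for a placement of processors.
    position[v] is the hypercube position assigned to processor v."""
    at = {p: v for v, p in position.items()}   # position -> processor
    total = 0
    for s in range(2):                          # the square's two dimensions
        total += max(ell(at[p], at[p ^ (1 << s)]) for p in range(4))
    return total
```

`protocol_latency({0: 0, 1: 1, 2: 2, 3: 3})` gives 20 ms, while swapping the positions of \(p_1\) and \(p_3\) via `protocol_latency({0: 0, 1: 3, 2: 2, 3: 1})` gives 11 ms, reproducing the improvement claimed above.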
This is an especially important observation in the context of the internet as message delays between different autonomous systems are typically much higher than message delays within one. Therefore, we would like to minimize the number of messages that traverse through different ASes (Autonomous Systems) or through processors that are located further from each other.
The number of distinct hypercube arrangements (excluding symmetry and rotations) grows exponentially with respect to the number of processors. For instance, for \(n=4\) there are only 3 topologically distinct hypercubes. However, for \(n=8\), there are 840; and for \(n=16\), there are more than 54 billion. Finding the optimal hypercube arrangement is a difficult problem, but we can achieve a decent approximation by grouping processors that have low latency links between each other and arranging the hypercube so that most communication is contained within groups.
One simple method to achieve this is to assign processors of a group or AS to the same subcube of a hypercube [29]. In our case, this would require for processors in a group to have a continuous series of indices. We can group processors using simple clustering algorithms such as the \(k\)-medoids algorithm using a pre-computed latency matrix. If the necessary information is available, then we can also group processors that belong to the same AS together, since intra-AS latencies are generally not very variable. Once we obtain groups, we can assign processors within groups of size \(m\) to the same (\(\log m\))-dimensional subcube of the hypercube.
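A sketch of the contiguous-index subcube assignment described above. The helper is ours; it assumes groups are already computed (e.g., by \(k\)-medoids) and rounds each group's size up to a power of 2 so that every group occupies its own aligned subcube:

```python
def subcube_assignment(groups):
    """Assign each group its own aligned subcube by giving its members a
    contiguous, power-of-2-aligned index range.
    groups: list of lists of processor ids; returns {id: hypercube index}."""
    index, assignment = 0, {}
    for g in groups:
        m = 1 << (len(g) - 1).bit_length()   # round size up to a power of 2
        index = -(-index // m) * m           # align start to a multiple of m
        for offset, pid in enumerate(g):
            assignment[pid] = index + offset
        index += m
    return assignment
```

With this placement, members of one group share all high-order index bits, so the early (low-dimension) communication steps stay within the group's low-latency links.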
We can also use grouping to improve the latency of a \(k\)-ary tree protocol. In this protocol, the latency of one communication step is upper-bounded by the maximum latency between a parent processor and its \(k\) children, so we would like to assign processors in a group to the same (or consecutive) levels of the tree.
4.3.4 Compacting Rounds.
In MHAP, the number of communication steps is \(d = \log n\). We can trade a higher fan-in for fewer steps by compacting \(c\) consecutive dimensions into a single round, where \(c\) is the compaction factor.
In compacted round 0, a processor sends messages to its neighbors in the next \(c\) dimensions. It also sends messages to any other processor that would receive this message via the neighbor’s propagation before the \(c{\text{th}}\) regular communication step. In the following compacted rounds, processors cover similarly their neighbors in dimensions \([c, 2c), [2c, 3c),\) and so on. Formally, in a compacted round \(r\), a processor \(p_i\) sends messages to and receives from all processors in \(\mathcal {E}_{i, c(r+1)-1} \setminus \mathcal {S}_{i, cr}\), where \(\mathcal {E}\) and \(\mathcal {S}\) are defined as in Section 4.3.1. The number of compacted rounds required to complete the hypercube protocol is thus \(\lceil d/c\rceil\).
Figure 3 illustrates the communication tree of a single processor in a four-dimensional hypercube with compaction factors 1 (no compaction) and 2.
Increasing the compaction factor reduces the number of steps from \(\log n\) to \(\lceil \frac{\log n}{c} \rceil\). The maximum fan-in is now \(2^c\). In each compacted round, a processor sends \(2^{c-1} + 2^{c-2} + \cdots + 2^0\) messages. So in each round \(n\) processors send \((2^c-1)\) messages each. The message complexity of the compacted protocol is therefore \(n(2^c-1)\lceil \frac{\log n}{c} \rceil\).
We can keep the total network load at \(O(n^2)\) as follows. Arguing as in Section 4.3.2, when a processor \(p\) sends its vector to its neighbor \(q\) in compacted round \(r \in [0, \lceil d/c \rceil)\), \(q\) propagates this vector in the remaining rounds. By then, \(q\) has finished communicating with its first \(c(r+1)\) neighbors. Thus, \(q\) aggregating \(p\)'s vector at the end of a compacted round \(r\) is equivalent to \(q\) aggregating it in step \(c(r+1) - 1\) in a regular hypercube. So at compacted round \(r\), the average message size is \(2^{d - c(r+1)}\), and with each round the message size reduces by a factor of \(2^c\). So compaction increases the message complexity, but it also reduces the average message size.
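The round count and message complexity of the compacted protocol can be computed directly (a sketch; `compacted_metrics` is our name):

```python
import math

def compacted_metrics(n, c):
    """Rounds and message complexity of the c-compacted hypercube protocol
    (n = 2^d assumed)."""
    d = n.bit_length() - 1
    rounds = math.ceil(d / c)
    per_round = (1 << c) - 1          # messages each processor sends per round
    return {"rounds": rounds, "messages": n * per_round * rounds}
```

For \(n = 16\), no compaction (\(c = 1\)) gives 4 rounds and \(n \log n = 64\) messages, while \(c = 2\) halves the rounds to 2 at the cost of 96 messages, illustrating the steps-versus-messages trade-off.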
Comparison of a \(c\)-compacted Hypercube with a \(k\)-ary Tree: We will discuss the number of steps each protocol is expected to take with respect to their branching parameters. A \(c\)-compacted hypercube protocol takes \(\frac{\log n}{c}\) steps, whereas a \(k\)-ary tree protocol takes \(2 \log _k n\) steps. So the hypercube protocol takes fewer steps as long as \(\frac{\log n}{c}~\lt ~2\log _k n~\Rightarrow ~c~\gt ~\frac{\log k}{2} \Rightarrow c \gt \log _4 k\).
4.4 Incomplete Hypercubes
For simplicity, we have assumed that the number of processors is a power of 2. Let \(n = 2^{d-1} + m\) be the number of processors where \(d = \lceil \log n \rceil\) and \(0 \lt m \lt 2^{d-1}\). We make the assumption that processors are indexed and hence placed on the hypercube in a continuous sequence. For instance, for \(n=3\), we have \(P =\lbrace p_0,p_1,p_2\rbrace\), as opposed to \(P =\lbrace p_0,p_1,p_5\rbrace\).
With our assumption, an incomplete hypercube consists of one complete and one incomplete sub-hypercube. The incomplete sub-hypercube, similarly, consists of one complete and one incomplete sub-sub-cube. Such a hypercube is called a composite hypercube [34].
We define a virtual processor as a processor that is not indexed in the composite hypercube but would be indexed in a complete \(d\)-dimensional hypercube, so the set of virtual processors is \(\lbrace p_v : n \le v \lt 2^d\rbrace\). We denote a processor that is located in the largest incomplete subhypercube as an incomplete processor (i.e., \(\lbrace p_i : 2^{d-1} \le i \lt n\rbrace\)). We denote the other processors as complete processors.
In an incomplete hypercube, we can achieve all-to-all communication by embedding a complete \(d\)-dimensional hypercube into the composite. That is, every virtual processor is simulated by a single existing processor. Notice that we can do the assignment of virtual processors to existing processors arbitrarily, since we have an underlying network topology that is a clique.
Now, let \(n_j\) be the number of existing neighbors that a processor \(p_j\) has. Assume an assignment of virtual processors to simulators is load-balanced such that for all \(p_j\), the number of virtual nodes that \(p_j\) simulates is less than or equal to \(d-n_j\). Such an assignment implies that the protocol completes in \(d\) communication steps and has a message complexity of \(n\cdot d\). We can reduce the message complexity further to the optimal \((n-m)(d-1) + md\). This partly follows from the observation that incomplete processors do not need to receive any messages from complete processors before the last communication step, since an incomplete processor is guaranteed to have an existing neighbor then. This complete neighbor will already transmit the aggregate information of all complete processors at step \(d-1\). Furthermore, we can eliminate the extra message overhead on complete processors by carefully assigning virtual processors such that simulation-related messages overlap with existing protocol messages. One scheme that enables both of these optimizations is having incomplete processors simulate their virtual neighbors. So at step \(s\), if an incomplete processor \(p_i\) has a virtual neighbor \(q_i\), it instead sends its vector to the first future existing neighbor of \(q_i\). Complete processors do not need to send any extra messages for simulation.
For example, consider an incomplete hypercube of size \(n=5\). In step 0, \(p_4\) can simulate \(p_5\) by sending its vector to \(p_1\), since \(p_1\) is the first future neighbor of \(p_5\) that exists. Now at step 1, \(p_1\) will simulate \(p_7\) by sending its vector to \(p_3\), since it already received \(p_4\)’s vector. Notice that this comes at no additional message cost, since \(p_1\) and \(p_3\) are already neighbors in the first dimension. Similarly, \(p_4\) can simulate \(p_6\) in step 1 by sending its vector to \(p_2\). Finally, the message complexity is 11 messages, instead of \(n \cdot d = 15\).
5 EVALUATION
In this section, we show that alliance protocols can be implemented efficiently and scalably. In particular, we demonstrate that MHAP scales well to thousands of participants.
With our evaluation, we want to answer the following questions:
Does communication in MHAP scale well in the number of participants?
How does the performance of MHAP compare to traditionally used patterns such as one-to-all and \(k\)-ary tree?
What are the improvements gained by grouping and compaction optimizations? Do such optimizations scale well with MHAP and \(k\)-ary tree?
5.1 Methodology
5.1.1 Simulation.
Our evaluation requires us to run experiments with thousands of processors. We use network simulation to achieve this. To obtain results that are as realistic as possible, we use the well-tested ns-3 network simulator [56] along with the BRITE [51] topology generator to generate realistic network topologies.2
Using ns-3, we describe topologies by specifying nodes (one per processor) and routers with IPv4 addresses. We specify links between nodes and routers and in-between routers. We set the bandwidth and delay of each link according to the topology we want to construct, and we connect processors using TCP sockets. ns-3 is a discrete event simulator: it assigns a point in simulation time for every event. An event may trigger other events. The scheduler orders events and moves the simulation time from one event to another accordingly [35]. We simulate our protocols by specifying “send” events for sending messages and appropriate callbacks for specifying “receive” events for receiving messages. Note that we only simulate the network latency of the protocols. Since all of the alliance protocols that we evaluate use the same MAC-based authentication, their computation costs are comparable and are excluded from the simulation.
5.1.2 Topologies.
We run experiments with 3 different network topologies:
Star Topology: Every processor in the graph is connected to a single core switch. Each link between a processor and the switch has a capacity of 20 Gbps and a latency of 100 ms. Therefore each communication step is expected to take 200 ms. While unrealistic, this topology allows us to control the end-to-end latency between processors and to ensure that communication between two disjoint pairs of processors does not share a link. We experiment with this topology to illustrate the scaling capability of protocols and the improvements obtained with the compaction optimization.
Star-of-Stars Topology: We first partition processors into groups of 128 members each. Every processor inside a group is connected to a single group switch by a 1 ms link. Each group switch is connected to a single core switch by a 100 ms link. Each link has a capacity of 20 Gbps. Since the intra-group and inter-group latencies are significantly different, we can use this simple topology to illustrate improvement obtained by grouping processors.
Top-down Hierarchical Topology (BRITE): The Star and Star-of-Stars topologies allow us to evaluate improvements of optimizations and to estimate performance analytically. However, they are not realistic settings. To evaluate the communication models more realistically, we use the popular BRITE topology generator [51].
We place autonomous systems (ASes) geometrically on a 15,000 km\(^2\) plane uniformly at random. Each AS spans at most 1,000 km\(^2\) of the plane, and 500 routers are located in each AS following a heavy-tailed distribution. Link latencies between routers are calculated from the geometric distance between them; these links have a bandwidth of 20 Gbps. Inter-AS and intra-AS links are placed with preference for smallest-degree, non-leaf routers. Each AS has roughly 250 border routers. Processors are randomly distributed over the ASes, 128 per AS. They connect to the border routers of their AS with links that have a latency of 1 ms and a bandwidth of 10 Gbps.
5.1.3 Experiment Setup.
We set message sizes assuming that all protocols use private keys and MACs to authenticate proposals.
Each experiment is initiated by a starter processor. The starter sends to each processor a small message that signals it to start a run of the protocol.
In experiments with the star-of-stars and BRITE topologies, the geo-distribution of processors may affect the latency of a protocol. So with this topology, if the grouping optimization is not used, we run five different experiments while randomizing the distribution of processes to ASes. If the grouping optimization is used, then processors are distributed to ASes as described in Section 4.3.3.
In each experiment, we run initialization and alliance 120 times. The first 60 runs are used to warm up TCP connections but are otherwise discounted (to remove bias caused by TCP slow start). For the remaining runs, we report the observed protocol latency, which is the average completion time. We define the expected protocol latency to be the estimated completion time of a single protocol run without network congestion. We can estimate the expected latency for protocols in two of our experimental topologies.
In our simulations, we make the assumption that a processor can start sending or receiving multiple messages at the same time. In particular, we are not concerned with transmission and NIC delays at the processors. For example, in the one-to-all protocol, if the leader broadcasts a single message and there is no network congestion, then all processors should receive the message within one link latency as opposed to \(O(n)\) link latencies. This corresponds to our earlier definition of a communication step.
5.2 TCP Incast
In tree-based and broadcast-based protocols, while executing collection steps, each processor needs to collect messages from all of its neighbors. The number of neighbors of a processor depends on different parameters: the branching factor \(k\) in a \(k\)-ary tree protocol, the number of processors in a one-to-all protocol, the number of processors per group in a two-level protocol, and the compaction factor in MHAP.
TCP incast is a common scalability problem that arises when many processors are trying to send to a single processor over TCP connections. It results in a major decline in average throughput, increased protocol latency, and increased variance in both latency and throughput. This is especially relevant in protocols that operate in steps where a processor needs to receive messages from all its peers before moving on to the next step. Finding an accurate model to predict incast is still an open research problem [21].
Recall that, in an experiment, we repeat alliance multiple times. We can compute the latency of each of these runs and obtain a sequence of latencies. If a protocol does not cause significant congestion in the network, then we would expect the latency to gradually decrease until the point where the senders’ congestion windows have converged. In congestion-free experiments, this typically occurs somewhere in the first five protocol runs. However, as the number of processors in many-to-one communication increases, incast leads to increased queuing time and packet drops in the bottleneck switch. This causes timeout events in senders, which in turn result in high and unpredictable protocol latencies. Figure 4 illustrates this problem with the one-to-all protocol in the BRITE topology.
Fig. 4. TCP incast problem in the one-to-all protocol with 4,096 participants. Each line corresponds to a different run of the experiment with a different geo-distribution of processors. In each run of the experiment, the alliance protocol was run 120 times, and data points show the latency measured for each run of alliance.
5.3 Results and Comparison
5.3.1 Star.
In our experiments using the star topology, we evaluate MHAP and its compacted variants against the \(k\)-ary tree and one-to-all protocols.
As seen in Figure 5(a), the one-to-all protocol performs better than both MHAP and the \(k\)-ary tree protocols with smaller numbers of processors, as it requires only two communication steps.
Fig. 5. Evaluation of Protocols in Star Topologies. HC: MHAP, ccHC: \(c\)-compacted MHAP, kT: \(k\)-ary tree protocol, 1A: one-to-all protocol, /g: with grouping, 2L: two-level protocol.
The 2-ary tree protocol, 4-ary tree protocol, and MHAP scale well as the number of processors increases, since their maximum fan-in stays small.
5.3.2 Star-of-stars.
We evaluate the two-level and the 4-ary tree protocols against MHAP, with and without the grouping optimization.
In the star-of-stars topology there is a single core switch that multiple group switches are connected to. Therefore, even communication between two disjoint sets of processors is likely to share a link. With large numbers of processors, congestion occurs in the core switch, which results in poor scaling of the base MHAP.
The two-level protocol outperforms the base MHAP in this topology.
In both of these topologies, there are multiple shared links and switches and limited variance in the end-to-end delays between processors, so protocols are more likely to face network congestion than in a realistic network.
5.3.3 BRITE.
To get more realistic results, we performed experiments using the topology we generated with BRITE.
We first test for improvements gained by reducing message sizes using the optimizations described in Section 4.3.2. We run experiments with MHAP, compacted MHAP, and the \(k\)-ary tree protocol, with and without the message size optimization (Figure 6).
Fig. 6. Comparison of protocols with and without the message size optimization in the BRITE topology. HC: MHAP, ccHC: \(c\)-compacted MHAP, kT: \(k\)-ary tree protocol, /g: with grouping, *: without the optimization.
We then compared MHAP and its optimized variants against the one-to-all, two-level, and \(k\)-ary tree protocols (Figure 7).
Fig. 7. Evaluation of protocols in the BRITE topology. ccHC: \(c\)-compacted MHAP, kT: \(k\)-ary tree protocol, 1A: one-to-all protocol, /g: with grouping, 2L: two-level protocol.
The one-to-all protocol performs near optimally up to \(\sim\)1,200 processors, after which we observe TCP incast in the switch connected to the leader. The two-level protocol performs similarly to the one-to-all protocol, as expected. It also scales better, up to \(\sim\)2,000 processors. However, as we further increase the number of processors, we see that incast during collection steps causes packet drops in flows going from processors to group leaders, which deteriorates performance.
The base \(k\)-ary tree protocol scales well up to \(k=512\). Performance gain is approximately proportional to the depth of the tree. For instance with 4,096 processors, an 8-ary tree has four levels and a latency of approximately 2 s, whereas a 16-ary tree has three levels and a latency of approximately 1.5 s. Thus, increasing \(k\) further than 64 yields marginal improvement toward a latency of 1 s, since the resulting tree has the same depth (2) for large numbers of processors. The depth of a tree with 4,096 processors will at minimum be 2 unless \(k=4,096\), which then reduces to the one-to-all case. We see that the 512-ary and 1,024-ary tree protocols experience TCP incast with 4,096 processors. With the 512-ary tree protocol, two of the five geo-distributions of 4,096 processors result in runs of alliance that experience incast. With the 1,024-ary tree protocol, we observe incast consistently in each geo-distribution. The grouping optimization with the 8-ary protocol reduces the protocol latency significantly. However, as seen previously, grouping with a tree does not scale well.
The base
To sum up, with smaller numbers of processors and with appropriate optimizations, both
6 CONCLUSION
We have demonstrated through analysis and simulation that we can achieve significant improvements in scaling Byzantine consensus protocols using alliance, at least in the normal case when there are no failures, all communication is timely, and there is little contention. Experimenting with different implementations of alliance, we find that while the one-to-all and two-level protocols have minimal latency with fewer than 2,000 participants, they do not scale well with larger memberships. Both tree-based and hypercube-based solutions scale well in the number of processors, but a hypercube-based solution has superior scalability and performance, because load is balanced across all processors and the network is better utilized.
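The load-balancing argument for the hypercube can be illustrated with a standard dimension-wise exchange (an all-gather sketch under our own simplifications, not the MHAP implementation evaluated above): in a \(d\)-dimensional hypercube with \(n = 2^d\) nodes, each node sends and receives exactly one message per round, and after \(d\) rounds every node holds every value.

```python
def hypercube_allgather(values):
    """All-gather by pairwise exchange along one hypercube dimension
    per round. Every node sends and receives exactly one message per
    round, so no single node becomes a fan-in hot spot."""
    n = len(values)
    d = n.bit_length() - 1
    assert 1 << d == n, "node count must be a power of two"
    known = [{i: v} for i, v in enumerate(values)]  # node i starts with its own value
    for dim in range(d):                            # one round per dimension
        merged = [dict(k) for k in known]
        for i in range(n):
            partner = i ^ (1 << dim)                # neighbor across dimension `dim`
            merged[i].update(known[partner])        # exchange everything known so far
        known = merged
    return known

nodes = hypercube_allgather(list("abcdefgh"))       # d = 3, so 3 rounds suffice
print(all(len(k) == 8 for k in nodes))              # True: every node has all 8 values
```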
Appendices
A PROOF OF ALGORITHM 2 - STRONG CONSENSUS WITH ALLIANCE
Algorithm 2 satisfies Agreement.
Let \(p_i\) and \(p_j\) be two honest processors with proposals \(v_i\) and \(v_j\) that decide values \(w_i\) and \(w_j\). We have to show that \(w_i = w_j\). There are three cases:
Case 1: Both \(p_i\) and \(p_j\) decide after \(\mathcal{A}\) concludes true (at A1).
Case 2: Without loss of generality, \(p_i\) decides at A1 and \(p_j\) decides the result of the underlying consensus (at C1).
Case 3: Both \(p_i\) and \(p_j\) decide the result of the underlying consensus \(\mathcal{B}_s\) (at C1).
In case 1, both \(p_i\) and \(p_j\) conclude
Assume case 2 holds and \(p_i\) decides at
In case 3 both \(p_i\) and \(p_j\) decide the result of running \(\mathcal {B}_s\), and thus, by the Agreement property of \(\mathcal {B}_s\), \(w_i = w_j\).□
Algorithm 2 satisfies Strong Validity.
Suppose all honest processors propose the same proposal \(v\) and let \(p_i\) be an honest processor that decides \(w_i\). We have to show that \(v = w_i\). There are two cases:
In case 1, \(p_i\) concludes
Algorithm 2 is a partially correct asynchronous Byzantine consensus protocol with Strong Validity.
Algorithm 2 inherits the Termination properties of \(\mathcal {B}_s\).
Let \(p_i\) be a correct processor that starts running the alliance protocol. There are three outcomes of running alliance with a timeout:
Case 1: \(p_i\) concludes true.
In case 1, \(p_i\) decides at
B VALIDATED CONSENSUS WITH ALLIANCE
ACKNOWLEDGMENTS
We thank the anonymous reviewers for their insightful and detailed feedback.
Footnotes
1 Although this definition of a step resembles that of Lamport clocks [46], a processor’s step is not incremented every time it receives a message.
2 Our code is available at github.com/burcuc/alliance-sim.
REFERENCES
- [1] IBM Food Trust. Retrieved from https://www.ibm.com/products/food-trust.
- [2] 2020. A Digital Platform for Precious Metals. Retrieved from https://tradewindmarkets.com/.
- [3] 2005. Fault-scalable Byzantine fault-tolerant services. ACM SIGOPS Operat. Syst. Rev. 39, 5 (2005), 59–74.
- [4] 2017. Chainspace: A sharded smart contracts platform. arXiv:cs.CR/1708.03778.
- [5] 1992. Transis: A communication subsystem for high availability. In Proceedings of the 22nd Annual International Symposium on Fault-Tolerant Computing. IEEE Computer Society, Washington, DC, 76–84. DOI: https://doi.org/10.1109/FTCS.1992.243613
- [6] 2018. Hyperledger Fabric: A distributed operating system for permissioned blockchains. In Proceedings of the 13th European Conference on Computer Systems (EuroSys'18). ACM, New York, NY, Article 30, 15 pages. DOI: https://doi.org/10.1145/3190508.3190538
- [7] 2012. A modular approach to shared-memory consensus, with applications to the probabilistic-write model. Distrib. Comput. 25, 2 (2012), 179–188.
- [8] 2015. The next 700 BFT protocols. ACM Trans. Comput. Syst. 32, 4, Article 12 (Jan. 2015), 45 pages. DOI: https://doi.org/10.1145/2658994
- [9] 2003. Consensus in Byzantine asynchronous systems. J. Discrete Algor. 1, 2 (2003), 185–210.
- [10] 1989. Database operations in a cube-connected multicomputer system. IEEE Trans. Comput. 38, 6 (June 1989), 920–927. DOI: https://doi.org/10.1109/12.24307
- [11] 1990. On the power of randomization in online algorithms. In Proceedings of the 22nd Annual ACM Symposium on Theory of Computing (STOC'90). ACM, New York, NY, 379–386. DOI: https://doi.org/10.1145/100216.100268
- [12] 1983. Another advantage of free choice (extended abstract): Completely asynchronous agreement protocols. In Proceedings of the 2nd Annual ACM Symposium on Principles of Distributed Computing (PODC'83). ACM, New York, NY, 27–30. DOI: https://doi.org/10.1145/800221.806707
- [13] 1991. Optimal communication algorithms for hypercubes. J. Parallel Distrib. Comput. 11, 4 (1991), 263–275.
- [14] 2014. State machine replication for the masses with BFT-SMART. In Proceedings of the 44th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN'14). IEEE Computer Society, Washington, DC, 355–362. DOI: https://doi.org/10.1109/DSN.2014.43
- [15] 1983. Resilient consensus protocols. In Proceedings of the 2nd Annual ACM Symposium on Principles of Distributed Computing (PODC'83). ACM, New York, NY, 12–26. DOI: https://doi.org/10.1145/800221.806706
- [16] 2001. Consensus in one communication step. In Proceedings of the 6th International Conference on Parallel Computing Technologies (PaCT'01). Springer-Verlag, London, UK, 42–50. Retrieved from http://dl.acm.org/citation.cfm?id=645768.667612
- [17] 2014. Introduction to Reliable and Secure Distributed Programming. Springer.
- [18] 2001. Secure and efficient asynchronous broadcast protocols. In Proceedings of the Conference on Advances in Cryptology (CRYPTO'01). Springer, Berlin, 524–541.
- [19] 1999. Practical Byzantine fault tolerance. In Proceedings of the 3rd Symposium on Operating Systems Design and Implementation (OSDI'99). USENIX Association, Berkeley, CA, 173–186.
- [20] 2017. Solidus: Confidential distributed ledger transactions via PVORM. In Proceedings of the ACM SIGSAC Conference on Computer and Communications Security (CCS'17). ACM, New York, NY, 701–717. DOI: https://doi.org/10.1145/3133956.3134010
- [21] 2009. Understanding TCP incast throughput collapse in datacenter networks. In Proceedings of the 1st ACM Workshop on Research on Enterprise Networking (WREN'09). ACM, New York, NY, 73–82. DOI: https://doi.org/10.1145/1592681.1592693
- [22] 2009. Making Byzantine fault tolerant systems tolerate Byzantine faults. In Proceedings of the 6th USENIX Symposium on Networked Systems Design and Implementation (NSDI'09). USENIX Association, Berkeley, CA, 153–168. Retrieved from http://dl.acm.org/citation.cfm?id=1558977.1558988
- [23] 1989. Dynamic load balancing for distributed memory multiprocessors. J. Parallel Distrib. Comput. 7, 2 (1989), 279–301.
- [24] 2015. Centrally banked cryptocurrencies. arXiv:cs.CR/1505.06895.
- [25] 2003. Exact communication costs for consensus and leader in a tree. J. Discrete Algor. 1, 2 (Apr. 2003), 167–183. DOI: https://doi.org/10.1016/S1570-8667(03)00024-8
- [26] 1982. An efficient algorithm for Byzantine agreement without authentication. Info. Control 52, 3 (1982), 257–274.
- [27] 2018. Reducing Title Fraud: Real Estate Looks Toward Blockchain-based Transactions. Retrieved from https://rismedia.com/2018/12/25/reducing-title-fraud-real-estate-blockchain-transactions/.
- [28] 2018. Decentralization in Bitcoin and Ethereum networks. arXiv:cs.CR/1801.03998.
- [29] 2000. Using a hypercube algorithm for broadcasting in internet-based clusters. In Proceedings of the International Conference on Parallel and Distributed Processing Techniques and Applications (PDPTA'00). CSREA Press, Athens, GA, 120.
- [30] 2005. Dynamically adaptive binomial trees for broadcasting in heterogeneous networks of workstations. In Proceedings of the International Conference on High Performance Computing for Computational Science (VECPAR'04). Springer-Verlag, Heidelberg, 480–495.
- [31] 1985. Impossibility of distributed consensus with one faulty process. J. ACM 32, 2 (Apr. 1985), 374–382. DOI: https://doi.org/10.1145/3149.214121
- [32] 2005. On cooperative content distribution and the price of barter. In Proceedings of the 25th IEEE International Conference on Distributed Computing Systems (ICDCS'05). IEEE Computer Society, Washington, DC, 81–90. DOI: https://doi.org/10.1109/ICDCS.2005.53
- [33] 2017. Algorand: Scaling Byzantine agreements for cryptocurrencies. In Proceedings of the 26th Symposium on Operating Systems Principles (SOSP'17). ACM, New York, NY, 51–68. DOI: https://doi.org/10.1145/3132747.3132757
- [34] 1991. On optimal embeddings into incomplete hypercubes. In Proceedings of the 5th International Parallel Processing Symposium (IPPS'91). IEEE Computer Society, Washington, DC, 416–423. DOI: https://doi.org/10.1109/IPPS.1991.153813
- [35] 2010. ns-3 Tutorial. Retrieved from https://www.nsnam.org/tutorials/geni-tutorial-part1.pdf.
- [36] 2020. Blockchain in Real Estate: How This Disrupts the Market: CB Insights. Retrieved from https://www.cbinsights.com/research/blockchain-real-estate-disruption/.
- [37] 2010. Ring Paxos: A high-throughput atomic broadcast protocol. In Proceedings of the 40th IEEE/IFIP International Conference on Dependable Systems and Networks (DSN'10). IEEE Computer Society, Washington, DC, 527–536. DOI: https://doi.org/10.1109/DSN.2010.5544272
- [38] 1997. Embedding of binomial trees in hypercubes with link faults. In Proceedings of the International Conference on Parallel Processing. 96–99. DOI: https://doi.org/10.1109/ICPP.1997.622564
- [39] 1989. Optimum broadcasting and personalized communication in hypercubes. IEEE Trans. Comput. 38, 9 (1989), 1249–1268. DOI: https://doi.org/10.1109/12.29465
- [40] 2008. Aggregate message authentication codes. In Proceedings of the Cryptographers' Track at the RSA Conference. Springer, 155–169.
- [41] 2016. Enhancing Bitcoin security and performance with strong consistency via collective signing. In Proceedings of the 25th USENIX Security Symposium. USENIX Association, Austin, TX, 279–296. Retrieved from https://www.usenix.org/conference/usenixsecurity16/technical-sessions/presentation/kogias
- [42] 2018. OmniLedger: A secure, scale-out, decentralized ledger via sharding. In Proceedings of the 39th IEEE Symposium on Security and Privacy. IEEE Computer Society, Washington, DC, 583–598. DOI: https://doi.org/10.1109/SP.2018.000-5
- [43] 2007. Zyzzyva: Speculative Byzantine fault tolerance. In Proceedings of the 21st ACM SIGOPS Symposium on Operating Systems Principles (SOSP'07). ACM, New York, NY, 45–58. DOI: https://doi.org/10.1145/1294261.1294267
- [44] 2000. Optimistic Asynchronous Byzantine Agreement. Technical Report. IBM Research Zurich.
- [45] 2002. Optimistic Byzantine agreement. In Proceedings of the 21st IEEE Symposium on Reliable Distributed Systems. 262–267. DOI: https://doi.org/10.1109/RELDIS.2002.1180196
- [46] 1978. Time, clocks, and the ordering of events in a distributed system. Commun. ACM 21, 7 (1978), 558–565.
- [47] 1982. The Byzantine generals problem. ACM Trans. Program. Lang. Syst. 4, 3 (July 1982), 382–401. DOI: https://doi.org/10.1145/357172.357176
- [48] 2016. XFT: Practical fault tolerance beyond crashes. In Proceedings of the 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI'16). USENIX Association, Savannah, GA, 485–500. Retrieved from https://www.usenix.org/conference/osdi16/technical-sessions/presentation/liu
- [49] 2016. A secure sharding protocol for open blockchains. In Proceedings of the ACM SIGSAC Conference on Computer and Communications Security (CCS'16). ACM, New York, NY, 17–30. DOI: https://doi.org/10.1145/2976749.2978389
- [50] 2008. Mencius: Building efficient replicated state machines for WANs. In Proceedings of the 8th USENIX Conference on Operating Systems Design and Implementation (OSDI'08). USENIX Association, Berkeley, CA, 369–384. Retrieved from http://dl.acm.org/citation.cfm?id=1855741.1855767
- [51] 2001. BRITE: An approach to universal topology generation. In Proceedings of the 9th International Workshop on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems (MASCOTS'01). IEEE Computer Society, Washington, DC, 346. DOI: https://doi.org/10.1109/MASCOT.2001.948886
- [52] 1999. Verifiable random functions. In Proceedings of the 40th Annual Symposium on Foundations of Computer Science (FOCS'99). IEEE Computer Society, Washington, DC, 120. Retrieved from http://dl.acm.org/citation.cfm?id=795665.796482
- [53] 2016. The honey badger of BFT protocols. In Proceedings of the ACM SIGSAC Conference on Computer and Communications Security. ACM, New York, NY, 31–42. DOI: https://doi.org/10.1145/2976749.2978399
- [54] 2009. Bitcoin: A Peer-to-peer Electronic Cash System. Retrieved from http://bitcoin.org/bitcoin.pdf.
- [55] 1994. Distributed consensus revisited. Info. Process. Lett. 49, 4 (1994), 195–201.
- [56] ns-3: A Discrete-event Network Simulator for Internet Systems. Retrieved from https://www.nsnam.org/.
- [57] 2017. Hybrid consensus: Efficient consensus in the permissionless model. In Proceedings of the 31st International Symposium on Distributed Computing (DISC'17), Vol. 91. Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik, 39:1–39:16. DOI: https://doi.org/10.4230/LIPIcs.DISC.2017.39
- [58] 2018. Thunderella: Blockchains with optimistic instant confirmation. In Proceedings of the 37th Annual International Conference on the Theory and Applications of Cryptographic Techniques, Part II (EUROCRYPT'18), Vol. 10821. Springer, Cham, Switzerland, 3–33. DOI: https://doi.org/10.1007/978-3-319-78375-8_1
- [59] 2018. Snowflake to Avalanche: A Novel Metastable Consensus Protocol Family for Cryptocurrencies. Retrieved from https://ipfs.io/ipfs/QmUy4jh5mGNZvLkjies1RWM4YuvJh5o2FYopNPVYwrRVGV.
- [60] 2019. CCF: A Framework for Building Confidential Verifiable Replicated Services. Technical Report MSR-TR-2019-16. Microsoft.
- [61] 2014. Zerocash: Decentralized anonymous payments from Bitcoin. In Proceedings of the IEEE Symposium on Security and Privacy. 459–474. DOI: https://doi.org/10.1109/SP.2014.36
- [62] 1991. Efficient signature generation by smart cards. J. Cryptol. 4, 3 (Jan. 1991), 161–174. DOI: https://doi.org/10.1007/BF00196725
- [63] 1991. Efficient all-to-all communication patterns in hypercube and mesh topologies. In Proceedings of the 6th Distributed Memory Computing Conference. IEEE Computer Society, Washington, DC, 398–403. DOI: https://doi.org/10.1109/DMCC.1991.633174
- [64] 2020. Applications of Blockchain Technology in the Food Industry. Retrieved from https://www.newfoodmagazine.com/article/110116/blockchain/.
- [65] 2000. Practical threshold signatures. In Proceedings of the 19th International Conference on Theory and Application of Cryptographic Techniques (EUROCRYPT'00). Springer-Verlag, Berlin, 207–220. Retrieved from http://dl.acm.org/citation.cfm?id=1756169.1756190
- [66] 2008. Bosco: One-step Byzantine asynchronous consensus. In Proceedings of the 22nd International Symposium on Distributed Computing (DISC'08). Springer-Verlag, Berlin, 438–450. DOI: https://doi.org/10.1007/978-3-540-87779-0_30
- [67] 1990. Intensive hypercube communication: Prearranged communication in link-bound machines. J. Parallel Distrib. Comput. 10, 2 (1990), 167–181.
- [68] 2016. Keeping authorities "Honest or Bust" with decentralized witness cosigning. In Proceedings of the 37th IEEE Symposium on Security and Privacy. IEEE Computer Society, Washington, DC, 526–545. DOI: https://doi.org/10.1109/SP.2016.38
- [69] Cryptographic Currency for Securities Settlement. US Patent 20150332395. Retrieved from https://patentscope.wipo.int/search/en/detail.jsf?docId=US200513559
- [70] 2019. Monoxide: Scale out blockchains with asynchronous consensus zones. In Proceedings of the 16th USENIX Symposium on Networked Systems Design and Implementation (NSDI'19). USENIX Association, Berkeley, CA, 95–112. Retrieved from https://www.usenix.org/conference/nsdi19/presentation/wang-jiaping
- [71] 2014. Ethereum: A secure decentralised generalised transaction ledger. Ethereum Project Yellow Paper 151 (2014), 1–32.
- [72] 2019. HotStuff: BFT consensus with linearity and responsiveness. In Proceedings of the ACM Symposium on Principles of Distributed Computing (PODC'19). ACM, New York, NY, 347–356. DOI: https://doi.org/10.1145/3293611.3331591
Index Terms
Scaling Membership of Byzantine Consensus