Dynamic Optimization of the Latency-Throughput Trade-off in Parallel Chain Distributed Ledgers

In the realm of Distributed Ledgers (DLs), one solution for enhancing peak throughput involves establishing multiple independent blockchains that evolve concurrently. However, an excess of parallel chains relative to the application's demands can result in increased content-delivery latency, squandered bandwidth, and heightened computational strain on network nodes. Our innovation, Blockmess, introduces a novel parallel chain DL architecture. This solution dynamically adjusts the number of active parallel chains to match the application's throughput, effectively reducing latency for end users and seamlessly accommodating shifting application needs. We have implemented Blockmess in a practical cryptocurrency application, validating its efficacy, and deployed it in a large-scale distributed environment. Notably, Blockmess builds upon and refines existing state-of-the-art scalability solutions, substantially mitigating latency degradation by intelligently adapting the number of active chains to the application's load.


INTRODUCTION
Distributed Ledgers (DLs) enable the development of large-scale applications, where nodes collaboratively maintain a synchronized shared log, free from the control of any centralized authority.
An alternative avenue to boost DL performance is to reimagine the conventional blockchain structure. This entails substituting the blockchain with an alternative data structure and adapting the block ordering and conflict resolution rules from Bitcoin [25]. These alternatives, known as Directed Acyclic Graph (DAG) architectures, offer higher maximum throughput by reducing block proposal intervals and relaxing the constraints imposed by traditional blockchains, while employing more intricate algorithms for block ordering [36].
In particular, we focus on a subset of DAG architectures known as parallel chain protocols [2,12,41]. These solutions instantiate multiple parallel blockchains and enhance the system's performance, whether in terms of throughput [41], block finalization time [2], or transaction settlement time [12]. They achieve these improvements by leveraging blocks proposed across multiple instances, each benefiting from the security properties and configurations of well-established blockchain-based DLs [12,19,41].
Parallel chain protocols exhibit three distinctive features that set them apart from other DAG solutions: (1) their structural similarities to conventional blockchain architectures enable the application of theoretical models originally designed for blockchain systems to the realm of parallel chains [12,19,41], allowing the use of the security proofs and parameterization directives found in [14,19,27]; (2) many established blockchain designs can be seamlessly adapted to incorporate parallel chains without necessitating changes to their core mechanisms [12], allowing the use of these protocols in conjunction with other DL scalability solutions such as hybrid-BFT systems [1,28], sharding [21,42], and Proof-of-Stake [6,15,18]; and (3) these protocols enforce a linear order on all proposed blocks, ensuring a strict serialization of application operations. This property enables parallel chain protocols to accommodate a broader spectrum of applications, a feat not achieved by other DAG-based solutions [24,30-32]. In particular, they allow the use of generic smart contracts [39].
Utilizing parallel chains shifts the bottleneck in the system's throughput from limitations inherent to the DL protocol itself to the available network bandwidth and the computational capabilities of the network's nodes [41]. However, indiscriminate use of parallel chains does not offer a blanket solution to the scalability issues faced by DLs. While increasing the number of parallel chains boosts the system's throughput, an excessive number of chains worsens latency and burdens nodes with unnecessary computational demands. Moreover, without an efficient mechanism to coordinate the placement of transactions, the same transactions may end up duplicated across various chains. This issue becomes more pronounced in systems prone to frequent forks and those with a multitude of parallel chains. This redundancy squanders precious bandwidth, valuable space within blocks, and the computational resources of nodes, which must now sift through duplicated content.
During periods of high demand, applications may require a substantial number of parallel chains to meet their throughput needs. However, provisioning for these periods can worsen the system's performance under normal operating conditions. Consider an online shopping application running on a DL. This application faces the challenge of accommodating occasional surges in demand, especially during peak seasons like year-end holidays. Under typical operating conditions, the system's primary goal should be to minimize response times for client requests, ensuring a smooth and responsive user experience. However, during high-demand periods, the application must have the throughput required to handle the increased volume of requests effectively.

Contributions
To address the challenge of adapting to changing application requirements, we present Blockmess, a parallel chain protocol that extends existing state-of-the-art solutions (see §2). Blockmess dynamically adjusts the number of chains employed to meet the application's demands. It organizes chains into a hierarchical tree structure, allowing transactions to be efficiently multiplexed and deterministically assigned to a specific chain within this hierarchy.
Our contributions can be summarized as follows:
• We provide a detailed explanation of the Blockmess parallel chain protocol. We describe the underlying rationale (§3), the deterministic transaction allocation process (§3.1), and the adaptive mechanism for managing parallel chains to align with application requirements (§3.2).
• We have implemented a functional and secure cryptocurrency application using the Blockmess protocol (§4).
• In a distributed deployment involving 1000 nodes, we conducted an extensive experimental evaluation of Blockmess (§5). This evaluation validates our solution and assesses its performance. Our findings support our claims that Blockmess significantly improves over the baseline throughput and efficiently adapts to changing application requirements.

BACKGROUND
In this section, we provide the necessary background on parallel chain protocols required to understand Blockmess. We begin by elucidating the core mechanisms that underpin the secure use of parallel chain protocols and how they can improve a system's throughput (§2.1). We then delve into the ordering of operations across parallel chains and how their use correlates with increased latency (§2.2). This discussion underscores the necessity of a solution like Blockmess.

2-for-1 PoW
In Proof-of-Work (PoW) DLs, nodes create a block and aim to find a specific nonce value that, when hashed, produces a block hash with the required number of leading zero bits. Once a node identifies such a nonce, it broadcasts the resulting block across the network. If the block is deemed valid by honest nodes, it gets added to the blockchain. However, in parallel chain architectures, allowing nodes to freely choose the chain to which they append their proposed blocks, or tying them to a specific chain, opens up a vulnerability: an adversary could concentrate all its computational power on a single chain, potentially corrupting the system. To counter this attack vector, parallel chain protocols do not permit nodes to choose the chain for block appending. The solution implemented involves a sortition procedure based on the 2-for-1 PoW mechanism introduced by Garay et al. [14]. This mechanism allows the generation of two independent PoW hash results from a single query to a hash function. Extended to parallel chains, it is generalized into an m-for-1 PoW, enabling a nonce query for each parallel chain within a single hash function query [12]. The input for this hash function includes the block contents for all chains. Once the destination chain is determined through this process, the corresponding block is shared by discarding the content on all chains except the one where it will be appended.
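To make the sortition concrete, the following is a minimal sketch of an m-for-1 PoW attempt; the digest layout and names are illustrative assumptions, not the exact construction of [12,14]. A single hash query commits to the content prepared for every chain; part of the digest is tested against the difficulty target, while another part selects the destination chain, so a miner cannot target a chain of its choice.

```java
import java.nio.ByteBuffer;
import java.security.MessageDigest;

// Illustrative m-for-1 PoW sortition: one hash query yields both the
// PoW test and the destination chain. The digest layout is an assumption.
final class MFor1PoW {
    // Returns the index of the chain the block is appended to,
    // or -1 if this nonce does not solve the puzzle.
    static int trySortition(byte[][] contentPerChain, long nonce,
                            int leadingZeroBits) throws Exception {
        MessageDigest sha = MessageDigest.getInstance("SHA-256");
        for (byte[] content : contentPerChain)  // commit to every chain's content
            sha.update(content);
        sha.update(ByteBuffer.allocate(Long.BYTES).putLong(nonce).array());
        byte[] digest = sha.digest();
        if (!hasLeadingZeroBits(digest, leadingZeroBits))
            return -1;                          // puzzle not solved
        // Low-order digest bytes pick the destination chain uniformly,
        // preventing miners from concentrating power on one chain.
        int picked = ByteBuffer.wrap(digest, digest.length - 4, 4).getInt();
        return Math.floorMod(picked, contentPerChain.length);
    }

    static boolean hasLeadingZeroBits(byte[] digest, int bits) {
        for (int i = 0; i < bits; i++)
            if ((digest[i / 8] >> (7 - i % 8) & 1) != 0) return false;
        return true;
    }
}
```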
Parallel chain solutions guarantee the system's security by ensuring that each of the parallel chains grows at the same rate as a single blockchain. Consequently, individual chains will have blocks proposed following the same probability distribution as in a single chain, thus preserving the essential properties of single-chain DLs. It follows that throughput is directly proportional to the number of parallel chains in use. As the count of active parallel chains can theoretically escalate without bounds, the governing constraint on the system's throughput is not protocol-induced throttling, as observed in blockchain-based DLs; instead, it hinges on the available bandwidth and the computational capabilities of the network nodes. In the extreme scenario, OHIE [41] demonstrated that throughput can approach half the total network bandwidth.

Block Finalization
Cryptocurrency applications do not require a strict serialization of transactions, only a mechanism to resolve conflicts when two conflicting transactions are proposed, as highlighted in prior work [12]. However, smart contract platforms, which aim to support a wider range of applications, require stronger consistency guarantees [39]. Both OHIE [41] and our approach linearize blocks before delivering them to the application. However, this approach can inadvertently create bottlenecks, particularly when certain chains lag behind others due to extended intervals between block proposals. Such discrepancies are frequent due to the substantial variation of the exponential distribution governing block proposal intervals [14,27].
Blockmess uses OHIE's [41] method for linearizing blocks: it establishes an order of priority between chains, assigns ranks to blocks, and delivers them based on these ranks. Each block receives a rank higher than that of the highest-ranked block its proposer has seen. All valid blocks have ranks higher than those preceding them in their respective chains. When blocks have the same rank, they are delivered following the priority of their respective chains.
This finalization rule allows slower chains to skip ranks, reducing their impact on the overall progress. It also ensures that recently proposed blocks are more likely to be finalized after older ones, reducing variability in finalization times. Nonetheless, even with this mechanism, the presence of an excessive number of parallel chains still affects the delivery latency of blocks.
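The rule above can be summarized in a short sketch (class and field names are ours; OHIE [41] defines the full protocol): blocks are delivered in ascending rank order, with ties broken by a fixed priority among chains.

```java
import java.util.Comparator;
import java.util.PriorityQueue;
import java.util.function.Consumer;

// Simplified sketch of rank-based linearization in the style of OHIE [41].
final class Linearizer {
    static final class Block {
        final int chainId;   // lower chainId = higher priority on rank ties
        final long rank;     // exceeds the highest rank the proposer had seen
        Block(int chainId, long rank) { this.chainId = chainId; this.rank = rank; }
    }

    private long highestSeenRank = 0;
    // Total order: first by rank, ties broken by chain priority.
    private final PriorityQueue<Block> pending = new PriorityQueue<>(
            Comparator.comparingLong((Block b) -> b.rank)
                      .thenComparingInt(b -> b.chainId));

    // A newly proposed block outranks everything its proposer has seen.
    Block propose(int chainId) {
        return new Block(chainId, ++highestSeenRank);
    }

    void onFinalized(Block b) {
        highestSeenRank = Math.max(highestSeenRank, b.rank);
        pending.add(b);
    }

    // Deliver every pending block whose rank is at most safeRank, i.e.,
    // no chain can still finalize a block with a smaller rank.
    void deliverUpTo(long safeRank, Consumer<Block> app) {
        while (!pending.isEmpty() && pending.peek().rank <= safeRank)
            app.accept(pending.poll());
    }
}
```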

BLOCKMESS SPECIFICATION
The key feature of Blockmess lies in its ability to dynamically adjust the number of chains in use to align with the requirements of the application. Initially, we must address the challenge of determining the load of a DL and ascertaining whether the current number of parallel chains is suitable.
For nodes to agree on the number of chains employed, the decision to alter the number of chains must be completely deterministic and devoid of timing assumptions. Achieving this property necessitates that all nodes rely on the same set of input data to compute this decision. DLs inherently establish a shared state among nodes that is deterministic and unaffected by timing assumptions.
In a traditional blockchain system, this assessment can be made by examining the size of delivered blocks. If a rapid succession of blocks is consistently full, it can be reasonably inferred that the system is struggling to meet the application's demands. Conversely, if most blocks are nearly empty, it suggests that the application's demands are well below the system's capacity. However, for a system employing a parallel chain protocol, estimates derived from observing block sizes can be skewed by content duplication across different chains, especially if adversaries initiate repeated operations. A solution to this issue involves assigning a specific chain to each operation, eliminating content repetition.
A straightforward method for distributing content among chains is to do so uniformly based on its hash or another distinguishing identifier. However, it is crucial to recognize that with this method, whenever the number of parallel chains changes, all operations must be reorganized. To address this inefficiency, we establish a hierarchical order among the chains using a tree structure. Transactions are allocated to chains based on this hierarchy. When new chains are created, the reallocation of transactions is localized to a specific point in the hierarchy, minimizing the amount of content that needs to be redistributed. This hierarchical approach enhances the efficiency of adapting to changing application demands.
At the beginning of the system's operation, there is a single chain, hereafter referred to as the Origin (Og) chain. If we assess that Og is incapable of meeting the application's demands, new chains are created. The evaluation of the adequacy of the current number of chains is based on the ongoing analysis of the blocks placed in these chains, categorized as follows: (1) a block is Overloaded (O) if its size is close to the maximum allowed; (2) a block is Underloaded (U) if it contains very little application content; and (3) otherwise, it is Balanced (B). When several overloaded blocks are added to a chain, it spawns two child chains. Conversely, if a chain accumulates numerous underloaded blocks, it itself becomes underloaded and merges with its two most recently created children, provided these chains are also underloaded.
Each chain determines the finalization of its blocks based on the finalization rules of blockchains (such as the longest chain [25], GHOST [32], or PHANTOM [31]). Once a block is finalized within a blockchain, it is not immediately delivered to the application; instead, it must be placed in a linear order with blocks from other chains. Blockchain-based DLs and Blockmess both deliver to the application a linearly ordered list of blocks; therefore, from the application's perspective, a blockchain and Blockmess are interchangeable.
When chains are created, the application content associated with the parent chain is distributed among the three chains.Each chain has the ability to spawn new chains as needed to alleviate its load, and this process can repeat indefinitely.

Content Multiplexer and Content Allocation
The application content is organized within a ternary tree structure known as the Content Multiplexer (CMux). The behavior of each node within the CMux is outlined in Algorithm 1. Each CMux node has the capacity to store references to application content and is assigned to a specific chain. When blocks need to be placed within a chain, they retrieve the necessary content from a CMux node associated with that chain. Once these blocks are delivered to the application, the references to the carried content are removed from the CMux (lines 16-20). At the outset of the program's execution, there exists a single CMux node, which is linked with the Og chain. When a chain spawns two new chains, the most recently created CMux node associated with the parent chain generates three new nodes (lines 25, 28, and 31). Two of these new nodes are assigned to the new chains, while the third remains associated with the parent chain.
The content intended for Blockmess blocks is enclosed in a structure that contains two values, which do not necessarily have to be unique. These values dictate how content is distributed across the CMux nodes. When new nodes are created, the content references from the parent node are transferred to the children based on these wrapping values (lines 26, 29, and 32). A parent node situated at depth d within the CMux tree examines the d-th bit of the values. If these bits differ from each other, the content is moved to the child node linked with the parent node's chain (lines 11 and 26). If both bits are valued at 0, the content is allocated to one of the remaining children (lines 12 and 29), and if both bits are valued at 1, it is allocated to the other child node (lines 13 and 32). This mechanism ensures that when new chains are spawned, half of the content in the parent chains is transferred to their newly created offspring.
When new application content is introduced to Blockmess, it is initially directed to the root node of the CMux. Following the described allocation rule, the content descends the tree, with each CMux node assessing a different bit of the content's wrapping values (lines 9-13). As a piece of application content descends the tree, all nodes along its path store a reference to it. The system maintains a record of which CMux node each chain should retrieve content from, based on the number of chains in operation. When a chain spawns children, it obtains content from a child node of the CMux node from which it previously retrieved content (the mid node). Conversely, when three chains merge, the chain retrieves content from the parent of the node it previously obtained content from (lines 36-37). The process of spawning new chains can be resource-intensive due to the distribution of content to new CMux nodes. If these leaf nodes are not deleted after they are no longer needed because of a merge, the system avoids redistributing content when new chains are spawned again; only after multiple merges are these unused CMux nodes pruned. (Line 23 of Algorithm 1 creates the CMux nodes that will be used in the new chains (lft, rgt) and sets a new node from which the parent chain gets content (mid); the current node may still be in use in a chain if there is a fork where new chains have not yet spawned.)
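The allocation rule reduces to a few lines of code. The sketch below uses our naming and assumes the d-th bit is taken from the least significant end of each wrapping value; Algorithm 1 specifies the exact bookkeeping.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of a Content Multiplexer (CMux) node. The names lft, mid, and
// rgt follow Algorithm 1; the rest is an illustrative assumption.
final class CMuxNode {
    static final class Content {
        final long v1, v2;     // the two wrapping values
        final byte[] payload;
        Content(long v1, long v2, byte[] payload) {
            this.v1 = v1; this.v2 = v2; this.payload = payload;
        }
    }

    final int depth;           // position of this node in the ternary tree
    final List<Content> refs = new ArrayList<>();
    CMuxNode lft, mid, rgt;    // children, created when the chain spawns

    CMuxNode(int depth) { this.depth = depth; }

    // Route content down the tree; every node on the path keeps a
    // reference so blocks of its chain can still include the content.
    void allocate(Content c) {
        refs.add(c);
        if (mid == null) return;             // leaf: allocation stops here
        child(c).allocate(c);
    }

    private CMuxNode child(Content c) {
        int b1 = (int) (c.v1 >>> depth) & 1; // examine the depth-th bit
        int b2 = (int) (c.v2 >>> depth) & 1;
        if (b1 != b2) return mid;            // bits differ: parent's chain
        return b1 == 0 ? lft : rgt;          // both 0 / both 1: a new chain
    }

    // Called when the associated chain spawns two children: create the
    // three new nodes and redistribute this node's references among them.
    void spawn() {
        lft = new CMuxNode(depth + 1);
        mid = new CMuxNode(depth + 1);
        rgt = new CMuxNode(depth + 1);
        for (Content c : refs) child(c).allocate(c);
    }
}
```

With uniformly distributed wrapping values, the bits differ with probability 1/2, so half the parent's content stays on its own chain's lineage and a quarter goes to each new chain, matching the distribution described above.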

Adaptive Parallel Chains
We assess the workload of a chain by examining the size of its blocks. When we observe several consecutive blocks within a chain that are completely filled, it suggests that the chain is struggling to handle the imposed load, surpassing its throughput capacity. In such cases, the system generates new chains to alleviate this strain. Conversely, if we consistently encounter almost empty blocks in succession within a chain, we can reasonably deduce that this chain is underutilized and, if feasible, should be merged.
Spawn Chains: A chain determines when to spawn new children by continuously monitoring block sizes as blocks are delivered to the application. Each chain maintains a sliding window that tracks the number of overloaded finalized blocks. If, upon the arrival of a new block, the count of overloaded blocks surpasses a predefined threshold, the chain initiates the creation of new chains.
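Such a trigger amounts to a counting sliding window. The sketch below is ours; the window of 15 blocks and the at-least-half threshold mirror the parameterization in §5.

```java
import java.util.ArrayDeque;

// Sliding-window spawn trigger: track whether each of the last WINDOW
// finalized blocks was overloaded, and spawn once at least half were.
final class SpawnMonitor {
    static final int WINDOW = 15;
    static final int THRESHOLD = (WINDOW + 1) / 2;  // at least half

    private final ArrayDeque<Boolean> sample = new ArrayDeque<>();
    private int overloaded = 0;

    // Called per finalized block; returns true if the chain should
    // spawn two children. alpha is the overloaded fraction of maxSize.
    boolean onFinalizedBlock(int blockSize, int maxSize, double alpha) {
        boolean isOverloaded = blockSize >= alpha * maxSize;
        sample.addLast(isOverloaded);
        if (isOverloaded) overloaded++;
        if (sample.size() > WINDOW && sample.removeFirst()) overloaded--;
        return overloaded >= THRESHOLD;
    }
}
```

A symmetric monitor counting underloaded blocks drives the merge decision described below; §3.3 discusses how the two size thresholds are chosen.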
A crucial consideration is the selection of a suitable genesis block for the spawned chains. Using the most recent block in the parent chain as the genesis would be susceptible to exploitation: adversaries who could predict the finalization of this block before honest users deliver it would be able to start mining and extending a private fork over the new chains before honest users create them. To address this concern, we employ a future block within the parent chain as the genesis for the new chains. Unlike the decision on the number of chains, which relies on finalized blocks, initiating new chains from their genesis does not necessitate finalization. The genesis block of the children chains must be positioned at a fixed distance from the block that triggers the creation of new chains. Once this genesis block is added to the parent chain, all users can begin proposing blocks to the new chains.
However, between the block that initiates the creation of new chains and the genesis block of the new chains, the sizes of the delivered blocks are not considered for measurement. In the event of two genesis blocks being created due to a fork, resulting in conflicting pairs of chains, each pair will retrieve content from the same CMux nodes, and blocks in both pairs will be considered valid. Eventually, only one of the genesis blocks in the parent chain will be finalized, leading to the removal of the conflicting pair.
Merge Chains: The process of merging chains occurs between an underloaded chain and its last two spawned children, provided that these are also underloaded. This operation involves stopping the child chains and allowing their parent chain to append blocks with transactions matching those in the former children. Similar to the scenario for spawning new chains, chains maintain a sliding window that tracks the number of underloaded blocks among the most recently delivered blocks. If, with the arrival of a new block, the count of underloaded blocks in the sliding window surpasses a predefined threshold, the chain is considered underloaded and becomes a candidate for merging. The process of merging differs from that of spawning chains because it involves coordinating all three chains: a merge only occurs when the parent chain and both of its latest spawned children are underloaded. In such cases, all blocks in the children chains that were not delivered are discarded.
After the merge operation, the remaining chain does not immediately start collecting samples again, given that subsequent blocks do not contain content from the merged children and are not representative of the load after the merge. The collection of samples from underloaded blocks resumes when a certain number of finalized blocks have been appended after the block that initiated the merge operation. Since any CMux node contains all the content its children have, we are assured that the content in the parent chain following the merge complies with the system's specifications.

Sample Size and Size Threshold Values
The mechanisms for spawning new chains and merging them share a common principle: sampling block sizes to determine whether an operation should be initiated. Each block has a maximum allowable size, denoted S. If a block's size exceeds a fraction α ∈ [0, 1] of S, it is classified as overloaded. Conversely, if a block's size is less than a fraction β ∈ [0, 1] of S, it is underloaded.
When a chain spawns new children, it is expected that the parent chain's load will decrease by half, and each child chain will inherit a quarter of the parent's load. To prevent a chain from merging shortly after spawning children, α must therefore be greater than 2β. Increasing α further beyond this inequality can lead to a better match between the number of chains and the application's load. In addition to setting the ideal threshold values, it is essential to consider the impact of potential adversaries in the system. The maximum computational capacity of an adversary, denoted F, plays a crucial role. If an adversary aims to prevent the creation of new chains, it can propose empty blocks. Therefore, the system must ensure that the ratio of overloaded blocks required to spawn new chains is greater than F. Conversely, if an adversary seeks to create excessive new chains, it could generate a large number of transactions among its identities, not share them with other nodes, and then propose full blocks with these transactions. To prevent this scenario, the ratio threshold for overloaded blocks must be lower than 1 − F. Similar considerations apply to the sampling of underloaded blocks: the threshold ratio for underloaded blocks must be both greater than F and lower than 1 − F to counteract potential adversarial behavior.
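Collecting these constraints, with α and β as above, F the adversary's maximum computational fraction, and r_spawn and r_merge denoting our names for the sampled ratios that trigger a spawn or a merge:

```latex
\begin{align*}
  \alpha &> 2\beta && \text{(a chain must not merge right after spawning)}\\
  F &< r_{\mathrm{spawn}} < 1 - F && \text{(ratio of overloaded blocks triggering a spawn)}\\
  F &< r_{\mathrm{merge}} < 1 - F && \text{(ratio of underloaded blocks triggering a merge)}
\end{align*}
```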
The size of the window used for sampling blocks affects the system's response to adversarial actions. A larger window brings the adversary's block proposal rate closer to the system's chain quality [14], reducing the variance in the number of adversarial blocks in the samples. However, increasing the window size slows the system's adaptation to changes in the application load.

Expected Load Balancing
While chains can independently spawn new ones at any time, we can model the system's evolution in epochs, assuming a gradual increase in application demand and that the operation allocation identifiers (§3.1) are independent and follow a uniform distribution.
In every epoch h, the load distribution ensures that each chain bears a fraction of the load equal to 1/2^h, except for the most recently created chains, whose load is 1/2^(h+1). Table 1 illustrates the evolution of the number of chains as the epochs progress. It indicates that Blockmess does not generate a disproportionately large number of underloaded chains as the epochs progress.
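As a quick sanity check of this distribution (our arithmetic): after the first spawn (h = 1), the Origin chain carries 1/2^1 of the load and each of the two newly created chains carries 1/2^2, so the load fractions sum to one:

```latex
% Epoch 1: Origin carries 1/2^1; the two newest chains carry
% 1/2^{h+1} = 1/4 each. No load is lost or double-counted:
\frac{1}{2^{1}} + \frac{1}{2^{2}} + \frac{1}{2^{2}} = 1
```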

IMPLEMENTATION
We implemented Blockmess, along with all the necessary Distributed Ledger (DL) service plane protocols, and a straightforward cryptocurrency application in Java, utilizing Babel [13].
Content dissemination occurs via broadcast across an overlay network managed by HyParView [23], adapting code from Costa et al. [4]. Application content employs an eager push policy, while blocks use a lazy push approach [22], consistent with established DL content dissemination practices [9].
Content is signed using the secp256r1 elliptic curve [29], and ECDSA signatures are verified both when nodes receive content and when it is incorporated into blocks. During dissemination, nodes only propagate operations and blocks to their neighbors after validation, mirroring Bitcoin [9,25]. Nodes communicate through TCP connections without TLS; however, authentication and integrity checks are performed at the application level.
We assume half the nodes in the network are faulty. In a flat gossip overlay network following the Erdős-Rényi model [11], achieving full connectivity with a probability of over 99.9% in a large network where half the nodes are faulty requires an average degree of at least ⌈2(ln(n/2) + 7)⌉ peers [17]. For 20,000 active nodes, this results in an average degree of 33. In networks of similar size, Ethereum [39] and Bitcoin [25] nodes maintain average degrees of 47 [10,37] and 117 [26], respectively.
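Assuming the bound above, a two-line check (our code, not part of the prototype) reproduces the reported degree; the constant 7 yields a connectivity probability of e^(-e^(-7)) ≈ 99.91% in the Erdős-Rényi model.

```java
// Average degree needed for the honest half of an n-node overlay to
// stay connected with probability above 99.9%: ceil(2 * (ln(n/2) + 7)).
final class OverlayDegree {
    static int requiredDegree(int n) {
        return (int) Math.ceil(2 * (Math.log(n / 2.0) + 7));
    }
    public static void main(String[] args) {
        System.out.println(requiredDegree(20_000)); // prints 33
    }
}
```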
To reduce the energy consumption of our prototype, we implemented a modified Proof-of-Work (PoW) algorithm: nodes are required to wait for a specified interval between attempts to find a valid nonce for proposing a block. To correlate a block proposal interval (Δ_blocks) with puzzle difficulty, we simulated the block proposal distribution, adapting an approach similar to Pass et al. [27]. We approximated the time between blocks in PoW by a geometric distribution representing the number of attempts in the entire system to solve the cryptographic puzzle. In doing so, we discretized time into slots, each with a duration equal to the time nodes wait between computing nonce values (Δ_nonce). With n nodes in the system, the number of leading zeros z required for a successful nonce attempt is determined by the following equation:

z = ⌈log₂(n · Δ_blocks / Δ_nonce)⌉   (1)

We assume the number of nodes in the system is fixed. However, using the timestamp mechanism employed in Bitcoin to estimate the computational power in the network [25], we could estimate the number of nodes. This process is deterministic, ensuring nodes agree on the puzzle difficulty irrespective of the order in which blocks are received and the number of chains employed [38].
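The following check of equation (1) is our own; in particular, the nonce-attempt interval Δ_nonce = 100 ms is an assumed value, chosen because it reproduces the 13.1 s actual interval reported in §5 for n = 1000 nodes and a 7.5 s target.

```java
// Check of equation (1): with n nodes each attempting one nonce every
// dNonce seconds, requiring z leading zero bits yields an expected block
// interval of (2^z / n) * dNonce. dNonce = 0.1 s is our assumption.
final class PoWDifficulty {
    static int leadingZeros(int n, double dBlocks, double dNonce) {
        return (int) Math.ceil(Math.log(n * dBlocks / dNonce) / Math.log(2));
    }
    public static void main(String[] args) {
        int z = leadingZeros(1000, 7.5, 0.1);         // z = 17
        double actual = Math.pow(2, z) * 0.1 / 1000;  // 13.1 s
        System.out.printf("z = %d, actual interval = %.1f s%n", z, actual);
    }
}
```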
Our complete implementation is available on GitHub.

EXPERIMENTAL EVALUATION
We conducted experiments using 1000 application instances for each test scenario. For each instance, we constrained the network bandwidth to 110 Mbps, closely matching the average global bandwidth observed in 2021. To simulate real-world network conditions, we enforced an average network latency of 75 ms (equivalent to an RTT of 150 ms), following the latency distribution observed between cities spread across the world. The experiments were conducted on machines equipped with an Intel Xeon E5-2630 v3 processor running at 2.40 GHz with 32 hardware threads (hyperthreading enabled). Each machine was configured with 128 GB of memory and a 10 Gbps Ethernet connection, and concurrently executed 25 application instances in parallel.
At the start of each test run, every instance was initialized by loading a database with available UTXOs (Unspent Transaction Outputs), from which new transactions were generated by these instances. The average transaction size is 659 bytes; the exact size of each transaction varies with the number of UTXOs used as inputs. Blocks are finalized upon reaching a depth of 6 blocks within the longest chain. Each block has a maximum size of 20 KB, accommodating approximately 30 transactions per block. The median dissemination time required for 20 KB blocks to reach 95% of the nodes in our network was 747 ms. As a result, we set the average time interval between block proposals to 7.5 s, allowing resilience against adversaries with a presence in the system amounting to 47% [27]. On a single chain, the rounding in the computation of leading zeros following equation (1) leads to an actual block proposal interval of 13.1 s, resulting in a peak throughput of 2.3 tx/s and a minimum finalization time of 78.6 seconds. For comparison, Bitcoin achieves a maximum throughput of 7 tx/s, a finalization time of 600 seconds, and resistance against an adversary with 49.1% of the computational power in the network [9]. The base parameters are summarized in Table 2.
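These baseline figures follow directly from the parameters above (our arithmetic):

```latex
\text{throughput} \approx \frac{30\ \text{tx/block}}{13.1\ \text{s/block}}
  \approx 2.3\ \text{tx/s},
\qquad
\text{finalization} \geq 6 \times 13.1\ \text{s} = 78.6\ \text{s}
```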
We sampled both overloaded and underloaded blocks, recording data from the 15 most recently finalized blocks in each chain. A chain was classified as either overloaded or underloaded if half or more of the blocks in the sample fell into the respective category. This ratio withstands an adversary with close to 50% of the computational power in the network attempting to unduly merge or spawn chains by proposing empty or full blocks (§3.3). The creation of new chains occurred 15 blocks after the block that originated the spawning operation. Similarly, the sampling process paused by a merge resumed 15 blocks after the merge operation had taken place. We experimentally identified these values as providing secure results even when employing several parallel chains. Lastly, a block is overloaded if it fills over 90% of the space allocated for content. Given that blocks rarely occupy the full 20 KB, because the sizes of the transactions therein do not exactly match the 20 KB limit, choosing an overloaded threshold above 90% would lead to blocks not being considered overloaded even when the application load justifies a spawn. With a threshold lower than 90%, more chains would be in use than necessary, penalizing the finalization time of blocks.
The parameterization of Blockmess is summarized in Table 3.

Throughput and Latency
The graph depicted in Figure 1 illustrates the relationship between maximum system throughput and median block finalization time.
The number of parallel chains utilized at each data point corresponds to the values outlined in Table 1. Our findings indicate that the optimal throughput for our deployment is achieved when employing 171 chains, surpassing what is attained with 341 chains. This phenomenon arises from the saturation of computational resources within the machines employed. Further examination of the finalization times aligns with our expectations, demonstrating that as the number of parallel chains increases, so does the finalization time. However, it is crucial to highlight that the relative increase in finalization time is significantly outweighed by the improvements in the system's throughput: when utilizing 171 parallel chains, although the median finalization time increases by a factor of 5.1, the throughput experiences a substantial boost, increasing by a factor of 72.6.

Chain Growth in Extreme Environments
Blockmess dynamically instantiates parallel chains, adapting to the specific requirements of an application. We evaluated the effectiveness of this process in meeting the diverse demands of an application. The results of our experiment, illustrated in Fig. 2, involve measuring the increase in the number of chains under high-load conditions (represented by dashed lines) and their decrease under low-load conditions (represented by continuous lines).
Under high load, Blockmess can scale from a single chain to 341 parallel chains in approximately 2172 seconds. While it is improbable that a real-world system would experience such a sudden shift in load, this worst-case scenario ensures that predictions of actual performance would not be worse than the experimental results.
The reduction in the number of parallel chains is both slower and less regular than the increase. It takes about 3250 seconds from the start of the execution until a single chain is used, a significant increase compared to the results obtained under high load (dashed line). The extended plateau at the beginning, before the number of chains begins to decrease, is due to longer intervals between block finalizations and the required coordination across several chains.

Figure 1: Throughput/Latency graph as the number of chains varies between 1 and 341 (following Table 1). Throughput increases 72.6 times between using a single chain (leftmost point) and 171 chains (rightmost point). Saturation of computational resources leads to worse performance in both throughput and latency between 171 and 341 chains.
In both experiments, the system successfully adapted to the application's demands in less than one hour. This adaptability is particularly valuable in applications designed to operate over several years. Even in our worst-case experiment, Blockmess demonstrated its ability to quickly respond to changing requirements.

Attaining Equilibrium in Number of Chains
In this analysis, we intentionally subject Blockmess to a specific application load to observe how it behaves under typical deployment conditions. Nodes are configured to operate under an application demand of 20 tx/s. This experiment provides insights into Blockmess's behavior in a representative deployment scenario.
In Fig. 3, we illustrate a deployment under two distinct underloaded threshold values: 45% (dashed line) and 60% (continuous line). Each experimental run spanned a duration of 4500 seconds.
In both cases, we observe a similar initial trend where the number of parallel chains rapidly increases, significantly boosting system throughput. This surge is a response to the initial low throughput, which leads to transaction accumulation and, consequently, a higher demand for parallel chains. However, after this initial peak, the number of chains gradually decreases as it adapts to the application's demands, approaching an equilibrium point.
When employing an underloaded threshold of 45%, we notice that the number of parallel chains peaks at nearly 80. Subsequently, it decreases steadily until stabilizing at approximately 40 parallel chains. This behavior aligns with the expectation that when chains merge, the parent chain shoulders twice the load of its children; therefore, after a merge, it is unlikely that the resulting chain will spawn new children in the short term.
In contrast, using a higher underloaded threshold of 60% results in a different pattern. Initially, the number of parallel chains increases to compensate for the backlog of transactions. However, after reaching its peak, the number of chains decreases more rapidly than with the 45% threshold. It does not stabilize at a fixed number but hovers around 30. In the context of the throughput results (see Fig. 1), it is important to note that even when the system stabilizes after the peak, the number of parallel chains used remains higher than the theoretically ideal number. For instance, achieving a throughput of 20 tx/s would theoretically require 11 parallel chains with consistently full blocks, resulting in an average block finalization time of approximately 110 seconds, while with 30 chains this average extends to around 153 seconds. Consequently, with a 60% underloaded threshold, the finalization time is roughly 1.4 times longer than it would be with a fixed 11 chains. However, if a static number of 171 parallel chains were used to maximize throughput, the finalization time would be 3.6 times worse than with the optimal number of chains. These findings underscore the considerable advantages of Blockmess's adaptive approach compared to fixed-chain solutions.

RELATED WORK
The field of DL scalability has introduced various mechanisms to enhance the performance of these systems. Several examples include: (1) fine-tuning applications based on measurements of their behavior in use [5,9]; (2) delegating the ordering of blocks to a statistically relevant committee of nodes using a deterministic BFT protocol [1,8,20,21,28]; (3) sharding state and block ordering into disjoint sets of nodes [21,35,42]; and (4) utilizing alternative DAG structures to replace the traditional blockchain [24,30,32,36].
No solution can indefinitely improve DL performance, and all have drawbacks. In (2), the fairness and censorship resistance of the system are reduced given the smaller pool of block proposers, and in (3), transactions across sharded states are slower and the compromise of a single shard can jeopardize the safety of the system as a whole. Within these improvements, we highlight parallel chain protocols, which employ multiple chains to enhance DL performance, albeit not always following the same approach as Blockmess.
For instance, Fitzi et al. [12] concentrate on improving throughput in cryptocurrency applications by delivering transactions without compromising safety against double-spend attacks. This approach does not serialize the delivery of transactions, making it unsuitable for supporting generic applications over a DL.
OHIE [41] is perhaps the parallel chain protocol most similar to Blockmess. It also uses multiple parallel chains to enhance throughput while serializing the delivery of operations. One noteworthy innovation in OHIE is a mechanism to accelerate block ordering, which we applied in our implementation. OHIE's experimental evaluation showed improved throughput up to 50% of the available bandwidth, after which it deteriorates.
In contrast, Prism [2] stands out as a parallel chain protocol designed both to boost throughput and to reduce latency compared to conventional DL designs. Prism capitalizes on the concept of parallel chains to expedite the finalization of blocks and, notably, separates transactions from blocks to enhance throughput, where it performs comparably to OHIE and Blockmess. All these protocols efficiently utilize the available bandwidth and processing power of the nodes in the system. However, Prism's distinctive feature lies in its use of parallel chains to lower the finalization time of blocks. It is important to note that the advantages of Prism are tempered when the system faces double-spend attacks, something that does not happen in Blockmess.
All these parallel chain protocols operate under the assumption of a fixed difficulty level for their PoW challenges. Adjusting the difficulty of these puzzles is a complex task, especially when dealing with multiple chains concurrently. Wang et al. [38] provide an in-depth analysis of these protocols and offer a solution for adapting the PoW difficulty. The insights presented in that work can be applied to enhance the Blockmess protocol.
Recent research has focused on the development of efficient BFT protocols [7,16,33,34,40] to replace the classical PBFT [3]. These protocols are particularly significant in permissioned Distributed Ledgers, as opposed to the permissionless DLs that are the focus of this work. Nevertheless, they can be adapted for use in permissionless DLs by replacing PBFT in hybrid consensus systems [1,28].
The concepts introduced in this work can be effectively combined with other scalability solutions, particularly the aforementioned BFT protocols. As an illustration, ISS [34] instantiates several BFT protocols in parallel and orders the operations linearized by each instance in a manner similar to parallel chain protocols, incurring a latency penalty in the process. Leveraging Blockmess's application load estimation, ISS could be modified to dynamically adjust the number of BFT instances in use.

CONCLUSION
Blockmess enhances system throughput while concurrently addressing the latency challenges commonly associated with DL architectures. While the concept of utilizing parallel chains to enhance throughput is not new [41], previous solutions did not adjust the number of parallel chains to match the demands of an application, resulting in prolonged block finalization times and inefficient bandwidth utilization. Blockmess dynamically adjusts the number of chains through a deterministic process. It allocates application content to specific chains and assesses their load by analyzing the size of delivered blocks. New chains are created if several proposed blocks in sequence are excessively large, and chains are merged if the blocks are relatively small.
Our observations revealed that our implementation significantly enhances system throughput compared with a single chain. Furthermore, Blockmess exhibits exceptional responsiveness to fluctuations in application load, demonstrating the ability to swiftly adapt the number of chains in operation. When a system experiences significant variations in application load, Blockmess efficiently improves the latency of operation delivery by approximating the number of chains in use to the ideal number of chains.

Figure 2: Evolution of the number of chains between 1 and 341 (following Table 1) under heavy load (dashed line) and no load (continuous line). In both situations the system can adapt to its environment in less than an hour.

Figure 3: Equilibrium number of chains for a 20 tx/s load, varying the underloaded block thresholds (§3.3). With the 60% threshold (continuous line) the equilibrium is less stable, but fewer chains are employed than in the 45% threshold experiment (dashed line). In both experiments the number of chains increases suddenly to process straggling operations and then stabilizes in a dynamic equilibrium.

Table 1: Expected number of chains by epoch. In each epoch the number of chains nearly doubles, while the numbers of chains under high load and low load remain very similar.

Table 2: Parameters of the baseline Distributed Ledger components. With 1000 nodes and maximum block sizes of 20 KB, the average block proposal interval was set to 13.1 seconds, ensuring resiliency against a strong adversary (≈47%) [27]. Using a single chain, this leads to a maximum throughput of 2.3 tx/s and a finalization time of 78.6 seconds.

Table 3: Blockmess execution parameters. Block sizes are sampled using a sliding window of 15 blocks. A spawn or merge is triggered if half the sampled blocks are overloaded or underloaded, respectively, thus withstanding a strong adversary (≈50%). A block is considered overloaded if it reaches 90% of the maximum allowed size. The underloaded threshold varies based on the specific test being conducted (§5.3).