FISCO-BCOS: An Enterprise-Grade Permissioned Blockchain System with High Performance

Enterprise-grade permissioned blockchain systems provide a promising infrastructure for data sharing and cooperation between different companies. However, performance bottlenecks seriously hinder the adoption of these systems in many industrial applications that process complex business logic and huge transaction volumes. Our research identifies two key factors that limit system performance: 1) At the block level, the serial dependency of inter-block processing severely limits system throughput, since a new block must wait for the completion of all previous blocks. 2) At the transaction level, the lack of efficient intra-block transaction concurrency makes it difficult to achieve high performance, especially when dealing with multiple CPU-heavy contracts, which are common in industrial scenarios. In this paper, we present FISCO-BCOS, an enterprise-grade permissioned blockchain system with high performance. To overcome the serial limitations and fully utilize machine resources, FISCO-BCOS introduces a Block Level Pipelining (BLP) workflow to process blocks in a pipelined manner. In addition, a scheduling algorithm, Deterministic Multi-Contract (DMC), is designed to efficiently execute transactions in parallel. With BLP and DMC, FISCO-BCOS achieves both inter-block and intra-block parallelism to meet the high-performance requirements of industrial application scenarios. We conducted experiments on two popular test workloads and compared FISCO-BCOS with state-of-the-art platforms in academia and industry, namely BIDL and Hyperledger Fabric (HLF). The results show that FISCO-BCOS achieves 7.4 times and 28.4 times the throughput of BIDL and HLF, respectively, with half their latency. FISCO-BCOS has already been used in over 300 different large-scale industrial scenarios and has become one of the most popular permissioned blockchains.


INTRODUCTION
As a distributed ledger shared by many untrusted participants, blockchain has attracted increasing attention around the world. There are two types of blockchain systems: permissionless (i.e., public blockchain) and permissioned (i.e., consortium blockchain). Bitcoin [38] and Ethereum [22] are two typical permissionless blockchain systems that provide online payment services for individual users. They are built on decentralized, peer-to-peer networks where participants are free to join and leave without trusting each other. Unlike permissionless blockchains, permissioned blockchain systems are designed to enhance mutual trust and improve the efficiency of collaboration between multiple parties in industrial scenarios [53]. Several permissioned blockchain platforms have been built to support the growing number of such requirements, e.g., Hyperledger Fabric (HLF) [3] and Quorum [14].
In permissioned blockchains, participants need to be authenticated to join the network. Permissioned blockchains generally achieve higher performance than permissionless ones because they have a limited number of participants and run more efficient Byzantine fault tolerant (BFT) algorithms [12,13,54,62] as consensus protocols. With the help of a permissioned blockchain, organizations with common goals can improve transparency, accountability, and workflows together, significantly saving the temporal and monetary costs of mutual cooperation [9]. In recent years, permissioned blockchains have been widely used in finance, trade, logistics, and other industrial scenarios [17,30,33,52].
Our experience applying permissioned blockchains to industrial scenarios shows that high performance is a basic requirement. For instance, some financial systems need to handle more than 65,000 transactions per second (tps) [55] with sub-second latency [40]. However, performance bottlenecks significantly hinder the adoption of current permissioned blockchains in these scenarios. HLF [3] and Quorum [14] are the two most popular enterprise-grade permissioned blockchain systems worldwide, yet their throughput and latency on the SmallBank workload [19] are about 3,500 tps at 600 ms [3] and 2,100 tps at 2 s [8], respectively. In recent years, researchers have proposed solutions [28,44,46-48,61] to improve the performance of permissioned blockchains. For example, FastFabric [28] re-architected HLF and increased its throughput to 20,000 tps, and BIDL [46] achieved 41,000 tps in a datacenter network through parallel execution and consensus. These works make progress in permissioned blockchain performance optimization, but many real industrial applications in payments, smart city governance, etc. require much higher performance (e.g., Visa's 65,000 tps [55]), which current solutions still struggle to meet.
By conducting a large number of experiments and taking in-depth measurements, we observe two key issues that seriously affect the performance of existing permissioned blockchain systems.
• For inter-block handling, the serial dependency of inter-block processing severely limits system throughput. The processing of a block usually goes through multiple phases, such as ordering, execution, validation, and commit. The order of these phases varies from system to system. However, in traditional permissioned blockchains such as Quorum [14] and Diem [5], the creation of a new block must wait for all previous blocks to complete their entire flow. In this way, the system can only process one block at a time, resulting in deficient performance.
• For intra-block processing, transaction execution is the most time-consuming task, yet effective concurrency mechanisms are lacking in current systems. Real industrial scenarios involve far more complex contracts that demand more resources in the execution phase. Early HLF [3] tries to address this problem with a parallel mechanism that executes transactions in the endorsing phase; however, this leads to a high rate of transaction aborts in the validation phase when transactions access conflicting resources. Some recent approaches [1,7,18,64] introduce static analysis and speculative execution of contracts to process intra-block transactions concurrently. Unfortunately, it is very difficult to deterministically infer all dependencies across smart contracts. Moreover, waiting for dependency construction significantly constrains the parallelism of the overall system as well.
To address the above challenges, we present FISCO-BCOS, an enterprise-grade permissioned blockchain system with high performance. FISCO-BCOS designs a Block Level Pipelining (BLP) workflow that breaks the serial dependency between blocks and processes them in a pipeline with four stages: some blocks are processed at one stage while others are processed simultaneously at different stages. In addition, Deterministic Multi-Contract (DMC) is introduced to execute transactions concurrently within a block: transactions are dispatched into multiple shards and processed in parallel by a group of executors. As a result, FISCO-BCOS achieves inter-block and intra-block parallelism and can provide high performance in enterprise scenarios. We implemented FISCO-BCOS and evaluated its performance against BIDL [46], the state-of-the-art work in academia, and HLF [3], the most commonly used system in industry. The results show that FISCO-BCOS achieves 7.4 times and 28.4 times the throughput of BIDL and HLF, respectively, with half their latency. FISCO-BCOS has been widely used in more than 300 real-world business applications and has become one of the most popular permissioned blockchains. Taking the Mutual Health Code Recognition System as an example: built on top of FISCO-BCOS, it has supported over 300 million cross-border travels during its service period. More application cases are shown in Sec. 6.
In summary, the main contributions of this paper are as follows: (1) We propose an inter-block processing approach, Block Level Pipelining (BLP), which breaks the serial dependency of block processing and handles blocks in a pipelined manner throughout their lifetime. (2) We present an intra-block scheduling algorithm, Deterministic Multi-Contract (DMC), which dispatches transactions into several shards and leverages a set of executors to process the transactions in each shard concurrently. (3) We have developed a full-fledged high-performance permissioned blockchain system, FISCO-BCOS, and made it available on GitHub 1. As an open-source enterprise-grade permissioned blockchain system, FISCO-BCOS is widely used in various industrial scenarios.

BACKGROUND
We briefly introduce the current permissioned blockchain workflow and the techniques of smart contract concurrency in this section.

Permissioned Blockchain Workflow.
Like permissionless blockchains such as Bitcoin [38] and Ethereum [22], permissioned blockchains receive transactions from clients, run a workflow to update the ledger, and ensure consistency among replicas in a trustless environment. According to their workflows, permissioned blockchains can generally be divided into two categories. The first is the execute → order → validate (EOV) workflow. Typical systems with this workflow are HLF [3] and its optimizations [28,47,48]. In this workflow, execution nodes (endorsers) execute the transactions concurrently under an optimistic concurrency control (OCC) mechanism; ordering nodes (orderers) then reach consensus on the sequence of the transactions, batch them into a block, and finally return the block to the execution nodes for validation and commit. One disadvantage of this workflow is the high rate of transaction aborts when transactions access conflicting resources, which makes it hard to achieve high performance in real-world applications. The second is the order → execute (OE) workflow. Systems like Quorum [14] leverage this workflow to process transactions. Under this mechanism, nodes first agree on the order of transactions.
After that, each node executes transactions according to the established sequence. As a result, transactions tend not to be aborted because of contention. Unfortunately, due to the serial processing of blocks, these systems generally do not provide sufficient performance for many real-world applications.

Smart Contract Concurrency.
There are many smart contract programming languages for permissioned blockchain systems, e.g., Solidity [23], Move [35], and ink! [43]. Among them, Solidity, introduced by the Ethereum [22] platform and run in the Ethereum Virtual Machine (EVM) [21], is the most widely used. A smart contract programmed in Solidity can be thought of as a collection of self-defined states and functions that manipulate those states. Each contract has a separate storage space and communicates with other contracts through function calls. Once contracts are deployed on the blockchain, we can access them by sending transactions that invoke their public functions, through which the ledger is updated. The execution of a transaction involves invocations of one or more contracts and is generally the most time-consuming phase of the entire block-processing workflow.
Inspired by the design of database systems, many studies [1,4,7,10,18,31,64] have proposed adding concurrency to smart contracts. In these approaches, transactions within a block are handled as follows: i) the miner generates a transaction dependency graph (usually a directed acyclic graph) by static analysis of contracts or speculative execution of transactions; ii) the miner sends the transactions and the graph to validators; iii) the validators execute the transactions in parallel according to the graph. These efforts have greatly improved the efficiency of transaction execution, but two limitations remain. On the miner side, static analysis of contracts is challenging when dynamic cross-contract invocations occur, since contract access patterns are not predictable. Moreover, speculative execution may lead to frequent rollbacks and low parallelism when multiple complex contracts call each other. On the validator side, performance is limited by waiting to receive the block and the dependency graph.
To sum up, the inter-block processing workflows and intra-block transaction concurrency mechanisms used by current permissioned blockchain systems face various performance limitations. To deliver an enterprise-grade permissioned blockchain, we must design a more efficient workflow and concurrency mechanism to achieve high performance.

SYSTEM OVERVIEW
FISCO-BCOS is designed to be an enterprise-grade permissioned blockchain with high performance to support industrial scenarios. In this section, we provide an overview of our system.

Threat Model. FISCO-BCOS is composed of a group of participants with two different roles: node and client. All participants are identified and managed by explicit secret/public key pairs, as is common in permissioned blockchains [3,14,46]. Any two nodes ⟨n_i, n_j⟩ are connected by a point-to-point bidirectional communication channel, forming a fully meshed P2P network. FISCO-BCOS adopts a BFT [13,54] protocol to achieve consensus, so the network contains 3f + 1 nodes, of which at most f may be malicious. A client c_i signs a transaction tx with its secret key sk_i and submits it to the network. Upon receiving the signed transaction, a node n_j checks the signature to ensure the transaction is valid, then pushes it into the transaction pool and propagates it to the other nodes. Afterward, transactions are processed through the workflow shown in Figure 1.
Workflow. To prevent transactions from being aborted due to conflicting resource access, FISCO-BCOS performs transaction ordering prior to execution. In addition, FISCO-BCOS divides the validation phase into two phases, checkpoint and commit, where the checkpoint phase is network-intensive and the commit phase is I/O-intensive; in this way, both network and I/O resources are fully utilized. The workflow of FISCO-BCOS therefore consists of four stages that cover the entire lifetime of a block. We describe the four stages here and present their optimization later.
(1) Ordering. This stage is mainly responsible for transaction ordering. Nodes batch transactions into a proposal block, then run a BFT consensus protocol to agree on the sequence of transactions and generate an ordered block O_h (h denotes the block number). The consensus protocol is pluggable, so the different BFT variants adopted in permissioned blockchains (e.g., PBFT [13], HotStuff [62]) can all be applied in FISCO-BCOS as well.
(2) Execution. This stage executes the transactions of the ordered block and yields a new ledger state. For instance, the execution of O_h creates a new state S_h from the previous state S_{h-1} (Figure 1).
(3) Checkpoint. FISCO-BCOS leverages a lightweight protocol to verify that the execution results are consistent across nodes. Concretely, each node broadcasts its execution result R_h and prepares an executed block E_h to commit after collecting enough unanimous results. If a node does not collect enough unanimous R_h before a timeout, an exception-handling mechanism is triggered: FISCO-BCOS drops the illegal transaction and returns to stage 2 for re-execution. Sec. 4.3 illustrates this mechanism in detail.
(4) Commit. Upon receiving the executed block E_h, FISCO-BCOS starts a process to write the block, after which the committed block can be retrieved by clients.

Optimization. In the above workflow, each stage performs tasks on blocks at different points in their lifetime, and thus different stages rely on different resources. In particular, the ordering and checkpoint stages heavily depend on network resources due to inter-node communication, the execution stage relies strongly on CPU resources for computation, and the commit stage is disk I/O intensive. To improve the performance of block processing, we propose two significant mechanisms.
First, we design an inter-block processing mechanism, Block Level Pipelining (BLP), that processes multiple blocks in a pipelined manner throughout their lifetime. With this approach, some blocks are processed at one stage while others are processed at different stages. Second, since the execution stage is generally the most time-consuming in the workflow, FISCO-BCOS designs a scheduling algorithm, Deterministic Multi-Contract (DMC), to improve intra-block handling efficiency by executing transactions in parallel. DMC splits intra-block transactions into several shards and dispatches them to a group of executors for parallel processing. In addition, a mechanism to detect and resolve conflicts is designed for contract calls across shards.
In this way, FISCO-BCOS provides not only inter-block improvements that break the serial dependency of block processing but also intra-block enhancements that efficiently handle complex transactions, greatly improving the overall system's performance.We will illustrate the details of these mechanisms in the next two sections.

BLOCK LEVEL PIPELINING
As introduced above, to make full use of network, CPU, and I/O resources and enhance overall system throughput, FISCO-BCOS introduces Block Level Pipelining (BLP) to optimize the workflow and process several blocks simultaneously. When applying the BLP workflow to FISCO-BCOS, three key problems need to be addressed: (1) How to control the pipeline to balance the use of various resources? (2) How to maintain in-memory states so that FISCO-BCOS can execute blocks based on previously uncommitted blocks? (3) How to ensure that all nodes commit the same result and no fork occurs, i.e., that no two nodes commit blocks with the same number but different results? We address these questions in the following subsections.

Pipeline Control
In our workflow, two consecutive stages form a producer-consumer pair, where the former stage produces blocks carrying new context and passes them to the next stage. Balancing block production and consumption between stages is crucial for the pipeline to work efficiently. Define n_i as the number of blocks in stage i; then the length of the pipeline is N = Σ_{i=1}^{4} n_i. The ideal pipeline status is n_i = 1 for all i (i.e., one block per stage). However, due to imbalances in network, computation, and storage capacity, it is difficult to achieve this ideal state in a practical environment. Instead of struggling to reach the ideal status, we introduce a sliding window algorithm to control the pipeline dynamically. The algorithm adjusts the block-generating speed to balance the ordering and execution stages, thereby making full use of network and computation resources. Specifically, according to the number of blocks in stage 2 (i.e., n_2, the blocks being or waiting to be executed), the algorithm adjusts the number of blocks in stage 1 (i.e., n_1). Let w denote the sliding window size and T the threshold of w; the algorithm distinguishes three situations, as shown in Algorithm 1. In the first situation, where n_2 equals 0, the execution is faster than the ordering, leaving the execution phase idle, so we propose more blocks by doubling w. In the second situation, where n_2 is less than T (but greater than 0), we reduce the growth rate of proposing to balance ordering and execution. In the last situation, where n_2 is greater than T, the execution phase is overloaded, so we suspend proposing and reset w to 0.
In addition, to balance the execution stage against the last two stages, FISCO-BCOS monitors the number of executed and committing blocks, n' = Σ_{i=3}^{4} n_i, to control the production capacity of the execution stage. Block execution is blocked when n' exceeds a threshold T'. T' is related to memory consumption, so it can be configured according to the machine's memory.
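The three-branch window adjustment described above can be sketched as follows. This is a minimal illustration of the idea, not Algorithm 1 itself; the names (`adjust_window`, `n2`, `w`, `T`) and the linear growth step are our assumptions.

```python
# Hedged sketch of the sliding-window pipeline control described above.
# Names (adjust_window, n2, w, T) are illustrative, not from the FISCO-BCOS codebase.

def adjust_window(w, n2, T, growth=1):
    """Return the new sliding-window size w (how many blocks stage 1 may propose),
    given n2, the number of blocks waiting in the execution stage, and threshold T."""
    if n2 == 0:
        # Execution is idle: ordering is too slow, so grow the window aggressively.
        return max(2 * w, 1)
    elif n2 <= T:
        # Execution keeps up: grow conservatively to balance the two stages.
        return w + growth
    else:
        # Execution is overloaded: suspend proposing by resetting the window.
        return 0
```

A scheduler would call this once per round, clamping the result against the n' ≤ T' memory bound described above.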

In-memory States
This section discusses the cache strategy used by FISCO-BCOS to manage in-memory states during block execution. Under the BLP workflow, a block's execution is based on the states of both previously committed and uncommitted blocks; therefore, FISCO-BCOS employs a two-layer cache system, as shown in Figure 2. The first layer (L1) is dedicated to uncommitted blocks (which can be rolled back) and consists of linked in-memory blocks, each storing the states modified by its respective block. The second layer (L2) is for committed blocks (which are persistent) and uses a traditional LRU-based cache. This approach enables efficient management of in-memory states during block execution. As the read flow in Figure 2 shows, when a block wants to read a key, it first searches the L1 cache for that key. If the corresponding key-value pair is not found in L1, FISCO-BCOS searches the L2 cache, followed by persistent storage, until it finds the requested key-value pair. Regarding the writing process (write flow in Figure 2), FISCO-BCOS stores the updated keys of a block in a separate new cache during execution. After the execution of the entire block completes, these updated keys are moved to the L1 cache and made read-only. After the checkpoint phase approves the block, it is committed to the L2 cache and storage (or rolled back if it is not approved). Note that in L1 we only store the states that a block writes and ignore the ones it reads (a copy-on-write mechanism [51]) for memory optimization. Furthermore, we design a paged layout for contract data storage to reduce disk I/O operations: a contract's data is managed by several contiguous pages, each of which stores a set of key-value pairs.
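The read and write flows above can be sketched as a small two-layer cache. This is a hedged illustration under simplified assumptions: a dict-backed `storage` stands in for persistent storage, and the names `StateCache`, `finish_execution`, and `commit_oldest` are ours, not FISCO-BCOS's actual interfaces.

```python
# Minimal sketch of the two-layer state cache: L1 holds per-block write sets of
# uncommitted blocks (linked, rollback-able); L2 is an LRU cache over committed state.
from collections import OrderedDict

class StateCache:
    def __init__(self, l2_capacity=1024):
        self.l1 = []              # (block_number, write_set) per uncommitted block
        self.l2 = OrderedDict()   # LRU cache over committed key-values
        self.l2_capacity = l2_capacity
        self.storage = {}         # stand-in for persistent storage

    def read(self, key):
        # 1) newest-to-oldest scan of uncommitted write sets
        #    (copy-on-write: L1 caches only the keys a block wrote)
        for _, writes in reversed(self.l1):
            if key in writes:
                return writes[key]
        # 2) committed LRU cache
        if key in self.l2:
            self.l2.move_to_end(key)
            return self.l2[key]
        # 3) persistent storage, warming L2 on the way back
        value = self.storage.get(key)
        if value is not None:
            self.l2[key] = value
            if len(self.l2) > self.l2_capacity:
                self.l2.popitem(last=False)
        return value

    def finish_execution(self, block_number, write_set):
        # After execution, the block's write set becomes a read-only L1 entry.
        self.l1.append((block_number, dict(write_set)))

    def commit_oldest(self):
        # After the checkpoint approves the oldest block, flush it to L2 + storage.
        block_number, writes = self.l1.pop(0)
        for k, v in writes.items():
            self.storage[k] = v
            self.l2[k] = v
        return block_number
```

Rolling back an unapproved block would simply drop its L1 entry instead of flushing it.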

Checkpoint Protocol
To ensure that all nodes commit the same result after executing a block, without forking, FISCO-BCOS uses a checkpoint protocol to achieve consistency of execution results across nodes. Unlike traditional BFT [13,54] and CFT [42] protocols, which require three and two rounds of broadcasts respectively, this protocol requires only one round of communication and is therefore very lightweight and efficient. Meanwhile, the protocol still maintains safety and liveness.
There is no single primary node in this protocol. Each node broadcasts its executed block E_h to the other nodes and reaches consistency after receiving q unanimous results before a timeout. For safety, we set q = ⌈1.5f + 1⌉ (i.e., more than half of the 3f + 1 total nodes), which makes it impossible for two blocks with the same number but different results to be approved simultaneously.
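The intersection argument behind this quorum choice can be checked mechanically: since each node broadcasts one result per block height, two conflicting results would each need q distinct supporters, but 2⌈1.5f + 1⌉ always exceeds the 3f + 1 total, so the two quorums would have to share a node. A small sketch (the function name `quorum` is ours):

```python
# Check that q = ceil(1.5*f + 1) quorums for two different results must overlap.
import math

def quorum(f):
    """q = ceil(1.5*f + 1): more than half of the n = 3f + 1 nodes."""
    return math.ceil(1.5 * f + 1)

# Each node broadcasts a single execution result per block height, so two
# quorums for conflicting results would need 2q distinct nodes; but 2q > n,
# so the quorums share at least one node -- a contradiction.
for f in range(1, 100):
    n = 3 * f + 1
    assert 2 * quorum(f) > n
```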
In terms of liveness, FISCO-BCOS introduces exception handling for when the nodes cannot reach consensus before the timeout. This occurs when the block contains non-deterministic transactions that produce divergent state updates. Note that non-deterministic transactions rarely occur and can only be forged by malicious nodes or caused by static analysis errors in smart contracts. To address this situation, each node in FISCO-BCOS records the execution result of each transaction in the current block. Meanwhile, we leverage Algorithm 2 to find the non-deterministic transaction, drop it, and re-execute the block. As the algorithm shows, FISCO-BCOS uses the invalid block together with all the consensus nodes to handle the exception. First, FISCO-BCOS obtains all the transactions from the invalid ordered block O_h (Line 2). For each transaction, FISCO-BCOS counts the nodes whose results differ from the one recorded in the block (Lines 5-9). It then checks whether more than f nodes (which implies at least one honest node) report a different result for the transaction (Line 10). If so, the transaction is illegal and caused the consensus failure in the checkpoint phase; FISCO-BCOS drops it from the ordered block O_h and re-executes the block (Lines 11-12). Finally, the algorithm returns the new block result R_h (Line 13).
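The filtering step of this exception handling can be sketched as below. This is a hedged reconstruction from the description above, with illustrative names (`drop_nondeterministic_txs`) and simplified data shapes; it is not the paper's actual pseudocode for Algorithm 2.

```python
# Hedged reconstruction of the exception-handling step: given the invalid ordered
# block and every node's per-transaction results, drop any transaction whose
# recorded result is disputed by more than f nodes.

def drop_nondeterministic_txs(block_txs, recorded_results, node_results, f):
    """block_txs: list of tx ids in the invalid ordered block.
    recorded_results: {tx_id: result} as recorded in the block.
    node_results: {node_id: {tx_id: result}} reported by all consensus nodes.
    Returns the tx ids to keep for re-execution."""
    keep = []
    for tx in block_txs:
        disagree = sum(
            1 for results in node_results.values()
            if results.get(tx) != recorded_results[tx]
        )
        # More than f disagreeing nodes implies at least one honest node observed
        # a different result, so the transaction is non-deterministic: drop it.
        if disagree <= f:
            keep.append(tx)
    return keep
```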

DETERMINISTIC MULTI-CONTRACT
As previously mentioned, FISCO-BCOS utilizes the BLP workflow to achieve inter-block parallelism and process multiple blocks effectively. However, the execution stage can be time-consuming, particularly in real-world business scenarios where many complex contracts are deployed in a blockchain system; execution can thus become a bottleneck in the workflow. Therefore, an efficient intra-block transaction processing mechanism should be introduced to further improve the performance of FISCO-BCOS.
In FISCO-BCOS's workflow, we place the ordering stage ahead of execution, avoiding transaction aborts. Apart from this, in the execution stage all nodes can start executing intra-block transactions simultaneously, since the block has already been agreed upon in the previous stage; this eliminates the waiting between miner and validators that occurs in prior approaches [4,10,18,64]. Furthermore, we design a parallel mechanism named Deterministic Multi-Contract (DMC), which dispatches intra-block transactions into several shards and enables multiple executors to process them in parallel. As the example in Figure 3 shows, during the DMC execution phase, transactions that invoke different contracts (e.g., tx_1 and tx_4 call contracts /1/1 and /2/3, respectively) are first dispatched into three shards, each of which is assigned to a specific executor. The three executors concurrently process transactions in their respective isolated contexts. With the DMC mechanism, FISCO-BCOS achieves a high degree of parallelism in execution, greatly improving the efficiency of intra-block processing. However, since all nodes process their intra-block transactions with DMC independently and must produce a consistent result, DMC must be not only efficient but also deterministic. To this end, three problems need to be solved: (1) designing an efficient and deterministic scheduling algorithm for the parallel execution of multiple transactions, (2) handling the case where parallel transactions have conflicting resource accesses, and (3) guaranteeing the atomicity of commits among all executors. In the following subsections, we present our solutions in detail.

Parallel Scheduling
The execution of a transaction involves invocations of one or more contracts. Each contract has a separate storage space and communicates with other contracts through function calls. We can therefore observe two types of transactions in FISCO-BCOS: those that only invoke contracts within a shard, and those that invoke contracts across shards. To achieve deterministic results, we use global barriers between rounds to sequence the calls between shards. In particular, DMC schedules transactions by the contract address they invoke and runs for several rounds until all transactions have been fully executed. In each round, all transactions with different contract addresses are assigned to a group of shards and processed by their respective executors. Through several rounds, all transactions in the block are completely processed, and the execution ends.
To design an efficient and deterministic scheduling mechanism, there are two concerns we need to address: (1) how to design an algorithm to parallelize transactions that call contracts within a shard only, and (2) how to design a scheduling method to parallelize transactions invoking contracts across shards.
For convenience of expression, we use a pair ⟨tx_i, c_1⟩ to denote that transaction tx_i invokes contract c_1, while ⟨tx_i, c_1 → c_2⟩ means that tx_i invokes contract c_1 and c_1 calls another contract c_2.
Parallel scheduling within a single shard. In each round, the executor analyzes the transactions assigned to its shard and obtains the resources they access, based on which it constructs a directed acyclic graph (DAG) of the resource dependencies among the transactions. Prior works [26,29,45] have provided many effective strategies for transaction dependency analysis. With the DAG, the executor executes these transactions without resource conflicts in a deterministic parallel manner, significantly improving intra-block execution. As shown in Figure 4, three transactions ⟨tx_1, /1/1⟩, ⟨tx_2, /1/1⟩, and ⟨tx_3, /1/1⟩ are concurrently executed following the DAG's guidance in round 1.
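The intra-shard scheduling above can be illustrated with a small sketch that derives dependency edges from shared resource accesses and groups conflict-free transactions into parallel "waves" (topological levels of the DAG). The name `build_waves` and the single access-set model are simplifying assumptions, not the actual analysis strategies of [26,29,45].

```python
# Sketch of deterministic intra-shard scheduling: build a DAG from the resources
# each transaction accesses, then execute conflict-free transactions in waves.
from collections import defaultdict

def build_waves(txs):
    """txs: ordered list of (tx_id, set_of_accessed_keys).
    An earlier transaction that shares a key with a later one becomes its
    DAG predecessor; each returned wave can run concurrently."""
    deps = defaultdict(set)
    for i, (tid_i, keys_i) in enumerate(txs):
        for tid_j, keys_j in txs[:i]:
            if keys_i & keys_j:          # conflicting resource access
                deps[tid_i].add(tid_j)
    done, waves = set(), []
    remaining = [tid for tid, _ in txs]
    while remaining:
        # all transactions whose predecessors have finished form the next wave
        wave = [t for t in remaining if deps[t] <= done]
        waves.append(wave)
        done.update(wave)
        remaining = [t for t in remaining if t not in done]
    return waves
```

Because the edges are derived from the deterministic transaction order in the block, every node builds the same waves.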
Parallel scheduling across shards. Each executor processes its transactions independently until a cross-shard call occurs. At that point, the execution of the calling contract (caller) is interrupted and waits until the execution of the contract in the called shard (callee) finishes. As shown in Figure 4, ⟨tx_5, /3/5 → /2/3⟩ is interrupted in round 1 because contract /3/5 makes a call to contract /2/3, so DMC schedules the interrupted transaction to be executed in round 2. After the external call finishes, the executor restores the saved context, changing the contract address back to /3/5, and continues executing contract /3/5 in round 3. Note that we use a save-restore mechanism similar to interrupt handling in an operating system, providing a high degree of parallelism for contract calls across shards.
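The interrupt-and-resume behavior can be sketched with Python generators standing in for execution contexts: a transaction yields when it makes a cross-shard call and is resumed by the scheduler in a later round. All names and the round bookkeeping here are illustrative assumptions, not the real executor interface.

```python
# Sketch of save-and-resume handling for cross-shard calls: each transaction is a
# generator that yields the address of a contract in another shard; the scheduler
# suspends it and resumes it in a later round once the callee has finished.

def run_rounds(txs):
    """txs: {tx_id: generator}. Returns a list of rounds, each listing the
    tx ids that made progress in that round."""
    rounds, active = [], dict(txs)
    while active:
        progressed, still_active = [], {}
        for tx_id, gen in active.items():
            try:
                next(gen)                    # run until the next cross-shard call
                still_active[tx_id] = gen    # interrupted: resume next round
            except StopIteration:
                pass                         # transaction finished this round
            progressed.append(tx_id)
        rounds.append(progressed)
        active = still_active
    return rounds

def tx_with_cross_shard_call():
    # ...executes in contract /3/5...
    yield "/2/3"   # interrupted: calls into another shard
    # ...context restored, resumes in /3/5 after the callee finishes

def simple_tx():
    return
    yield  # unreachable; makes this function a generator that finishes at once
```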

Conflict Resolution
The second issue that the DMC mechanism must address is conflict resolution. There are two types of conflicts in FISCO-BCOS transaction processing: (1) access conflicts and (2) deadlocks. The former happens when two transactions, ⟨tx_i, /1/1 → /3/3⟩ and ⟨tx_j, /2/2 → /3/3⟩, are executed in parallel by two different executors. In the first round, the executors execute contracts /1/1 and /2/2 in their respective shards. In the following round, however, both try to access contract /3/3 across shards, which leads to a conflict. The latter happens when a transaction ⟨tx_i, /1/1 → /2/2⟩ and another ⟨tx_j, /2/2 → /1/1⟩ are processed by two executors in parallel. In this case, the two executors wait for each other to finish first in round 2, resulting in a deadlock.
To detect and resolve these conflicts, FISCO-BCOS adds a lock on every state key that is read or written by a transaction. In this way, when an access conflict occurs, the executor can easily find that the key needed by transaction ⟨tx_j, /2/2 → /3/3⟩ has already been locked by transaction ⟨tx_i, /1/1 → /3/3⟩. In common pessimistic concurrency control (PCC) mechanisms, either of the two transactions could release its locks to resolve the access conflict. However, this approach cannot be used in our system because we must provide deterministic conflict resolution. Instead, we use a global barrier to sort these transactions when they invoke a contract across shards: conflicting transactions (those accessing the same contract) are queued by their transaction index in the block (if i < j, tx_i comes before tx_j) and wait to be scheduled in order.
For the deadlock situation, each transaction context uses two lists to record the locking status of state keys: a locked-key list records the keys that are locked by the current context, and a waiting-key list stores the locked keys that the current context needs. In each round of scheduling, the scheduler constructs a dependency graph from these two lists, from which it can infer a deadlock if the graph contains a cycle. FISCO-BCOS caches the transaction execution result of each round as ⟨tx, r, ctx⟩, where tx, r, and ctx denote the transaction, round, and context, respectively. To resolve a deadlock, FISCO-BCOS reverts the transaction with the larger index to its context of the last round and pushes it into a pending queue. After all other transactions are executed, FISCO-BCOS pops transactions from the pending queue to resume their execution.
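The wait-for-graph check can be sketched as follows, under the simplifying assumption that each context exposes its held and awaited keys as plain sets; the name `find_deadlocked` and the integer transaction indices are illustrative.

```python
# Sketch of deadlock detection: build a wait-for graph from the keys each
# transaction holds and waits on; a cycle signals a deadlock, resolved
# deterministically by reverting the transaction with the larger index.

def find_deadlocked(holds, waits):
    """holds: {tx_index: set of keys locked by that context}.
    waits: {tx_index: set of locked keys that context needs}.
    Returns the tx index to revert (largest index on a cycle), or None."""
    # wait-for edges: tx -> owner of a key that tx is waiting on
    edges = {
        tx: {owner for key in needed for owner, held in holds.items() if key in held}
        for tx, needed in waits.items()
    }

    def on_cycle(start):
        # depth-first search for a path that returns to `start`
        seen, stack = set(), [start]
        while stack:
            node = stack.pop()
            for nxt in edges.get(node, ()):
                if nxt == start:
                    return True
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return False

    cyclic = [tx for tx in waits if on_cycle(tx)]
    return max(cyclic) if cyclic else None   # deterministic: revert the larger index
```

Reverting always the larger index keeps every node's choice identical, preserving determinism across replicas.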

Commit Atomicity
The last issue is guaranteeing atomicity during the commit stage. Because DMC utilizes multiple executors with independent contexts to process intra-block transactions, it must ensure that a block's execution results from these executors are either all committed or none committed. To address this problem, FISCO-BCOS proposes a two-phase commit algorithm.
Algorithm 3 shows the commit process. The algorithm takes as input the DMC scheduler, all the executors, and the current block header. First, the scheduler gets a primary key from storage (Line 2). After that, the scheduler pre-writes the block header guided by the primary key (Line 3). The scheduler then broadcasts a 'NOTIFY' message to all executors (Line 4). After receiving the message, the executors store their execution results in parallel (Lines 5-11). If an exception occurs in this process, the scheduler rolls back the commit and retries later. If an executor successfully stores its results, it notifies the scheduler that it has finished its work. Finally, the scheduler commits to storage after receiving notifications from all executors (Line 12). In this way, the DMC mechanism ensures commit atomicity. Meanwhile, this algorithm also reduces the overhead of the commit stage, since multiple executors commit in parallel (Lines 5-9).
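The two-phase commit can be sketched as below. The sequential loop stands in for the parallel executor commits, and all names (`two_phase_commit`, `FakeExecutor`, the dict-backed storage) are illustrative assumptions rather than FISCO-BCOS's actual code.

```python
# Hedged sketch of the two-phase commit between the DMC scheduler and executors:
# pre-write the block header, notify every executor to persist its shard's
# results, and finalize the commit only once all of them acknowledge.

def two_phase_commit(storage, executors, block_header):
    primary_key = storage.setdefault("primary_key", 0) + 1
    storage["prewrite"] = (primary_key, block_header)   # phase 1: pre-write
    for executor in executors:                          # 'NOTIFY' all executors
        try:
            executor.store_results(primary_key)         # may run in parallel
        except Exception:
            storage.pop("prewrite", None)               # rollback, retry later
            return False
    # phase 2: all executors acknowledged, finalize the commit
    storage["primary_key"] = primary_key
    storage["committed"] = storage.pop("prewrite")
    return True

class FakeExecutor:
    """Toy executor used only to exercise the sketch above."""
    def __init__(self, fail=False):
        self.fail, self.stored = fail, None
    def store_results(self, key):
        if self.fail:
            raise RuntimeError("storage error")
        self.stored = key
```

Because the pre-written header is keyed by the primary key, a crash between the two phases leaves either no trace or a pre-write that can be rolled back or replayed.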

IMPLEMENTATION AND APPLICATIONS
We have implemented FISCO-BCOS in C++ (200,000+ lines of code) using a plug-in framework in which most core modules can be replaced on demand, including cryptography algorithms, consensus protocols, contract virtual machines, etc. By default, we use libsecp256k1 [11] for ECDSA signatures and secret/public key operations. We upgrade the PBFT [13] protocol to support transaction ordering and block generation. In addition to the traditional EVM [21], a WASM [57] engine is integrated for transaction execution as well. We utilize RocksDB [36] to store archived blocks, transactions, and states. Besides, the distributed storage engine TiKV [6] is also introduced to support storage scaling.
According to statistics from a recent blockchain application whitepaper, 300+ real-world business blockchain applications have been successfully deployed on FISCO-BCOS, serving 4,000+ enterprises. These cases cover more than 16 important business scenarios, such as cross-border collaboration, judicial services, financial services, smart governance, and supply chain management [24, 25]. We show some typical cases here.
Cross-border collaboration. The Mutual Health Code Recognition System, implemented on FISCO-BCOS, is deployed in both Macau and Guangdong to support travelers crossing between the two areas. The platform has supported over 300 million traveler crossings during its service period (30 months) [24].
Judicial Services. WeBank, YIBI Technology, and Arbitrators have partnered to launch a blockchain-based evidence and arbitration platform to provide real-time, high-performance, authentic, and traceable data for judicial services. With the help of FISCO-BCOS, the platform standardizes evidence and trial processes to meet the requirements of authenticity, legality, and relevance of evidence. The platform has processed 3 billion pieces of co-deposited evidence [24].
Financial Services. With the help of FISCO-BCOS, WeBank, together with its partner banks, has established a permissioned-blockchain-based inter-institutional reconciliation platform (shown in Figure 5). The platform stores business information, including funds and transactions, in the form of copies on the chain for reconciliation purposes. Since its launch, the platform has accumulated over 200 million transactions [24].
Smart Governance. With the distributed, transparent, and tamper-resistant characteristics of blockchain, the Identification Bureau, Telecom, and Polytechnic Institute of Macau built a permissioned blockchain platform based on FISCO-BCOS to implement the electronic flow of cross-departmental government information and double the efficiency of government services [24].
Supply Chain Management. Based on the FISCO-BCOS platform, GRGBanking has built a supply chain finance platform that realizes a multi-level split-and-flow model of accounts receivable. The platform transforms the accounts payable of core enterprises into electronic vouchers for payment and financing through smart contracts, realizing multi-level transmission of core enterprises' credit and benefiting multiple parties in the industrial chain. The platform connects 6 banks and one factoring institution, and the capital flow in the industrial chain exceeds 1.5 million dollars [24].

EVALUATION
We design FISCO-BCOS as an enterprise-grade permissioned blockchain system, so it is important to evaluate its performance in different enterprise environments. In addition, FISCO-BCOS introduces inter-block pipelining and intra-block parallelization, and it is essential to analyze the effectiveness of these mechanisms. Therefore, we conduct a series of experiments on FISCO-BCOS, seeking to answer the following research questions: • How does FISCO-BCOS perform in different environments?
• How do BLP and DMC work in FISCO-BCOS respectively?
• Is FISCO-BCOS scalable in real-world scenarios?

Experiment Setup
We first introduce the basic information about our experiment setup.
Baseline. We compare FISCO-BCOS with two permissioned blockchain platforms, HLF [3] and BIDL [46]. HLF is the most popular enterprise-grade permissioned blockchain platform and is used worldwide in various scenarios [30]. BIDL is a state-of-the-art work in academia that uses a shepherded parallel workflow to achieve high performance in datacenter networks.
TestBed. We deploy two typical scenarios for testing FISCO-BCOS and the baseline solutions in cloud environments. First, a small-scale testbed consists of 11 virtual machines (1 as client and 10 as nodes), each equipped with an Intel(R) Xeon(R) Platinum 8378C CPU @ 2.80GHz (16 cores, 32 threads), 64 GB of RAM, and Ubuntu 20.04. IP multicast [60] communication between the VMs is enabled to support BIDL [46]. In addition, we also deploy a large-scale testbed that contains 100 lower-spec nodes with 8 vCPUs each (4 cores, 8 threads).
Workloads. We conduct evaluations using the popular blockchain benchmark BlockBench [19], which contains both macro-benchmark workloads for evaluating overall performance and micro-benchmark workloads for evaluating the performance of individual layers. In particular, we use SmallBank (a macro-benchmark workload), which creates a group of accounts and performs random transfers among them, to evaluate performance in IO-intensive scenarios, and CPUHeavy (a micro-benchmark workload), which performs a quick sort over an array of integers, to simulate performance in computation-intensive scenarios.
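For concreteness, the CPUHeavy transaction body boils down to a quick sort over random integers. The sketch below shows the equivalent computation in Python; the actual workload runs as a smart contract, and names such as `cpuheavy_txn` are ours.

```python
import random

def quicksort(arr):
    """Functional quick sort: the kind of CPU-bound work a
    CPUHeavy transaction performs."""
    if len(arr) <= 1:
        return arr
    pivot = arr[len(arr) // 2]
    left = [x for x in arr if x < pivot]
    mid = [x for x in arr if x == pivot]
    right = [x for x in arr if x > pivot]
    return quicksort(left) + mid + quicksort(right)

def cpuheavy_txn(n=100_000, seed=0):
    """One CPUHeavy transaction: sort n random integers
    (n = 100,000 in the experiments below)."""
    rng = random.Random(seed)
    data = [rng.randrange(n) for _ in range(n)]
    return quicksort(data)
```

Since each such transaction burns CPU rather than I/O, intra-block parallel execution pays off directly in this workload.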

End-to-end Performance Comparison.
We evaluate the end-to-end performance of FISCO-BCOS, HLF, and BIDL in the small-scale testbed.
Performance in conventional environments. First, we perform an end-to-end performance comparison over the SmallBank workload. In this experiment, we randomly generate a batch of transactions, each involving two accounts belonging to different organizations. As shown in Figure 6, FISCO-BCOS outperforms HLF and BIDL in both throughput and latency. For throughput, HLF processes transactions in a parallel fashion in the endorsing phase; however, in the ordering and validation phases, it must process transactions and blocks serially, which limits its throughput. BIDL, on the other hand, introduces a parallel workflow to enhance the efficiency of transaction consensus and execution, and therefore achieves higher throughput than HLF. FISCO-BCOS achieves the highest throughput with the BLP and DMC mechanisms, i.e., 93.82K Txn/s, more than 7.4 times that of BIDL (12.6K Txn/s) and more than 28.4 times that of HLF (3.3K Txn/s). In terms of latency, at the same level of throughput, FISCO-BCOS achieves the lowest latency among the three solutions. When the throughput is below 10K Txn/s, the end-to-end latency of FISCO-BCOS is less than half that of HLF and BIDL. In addition, we can observe from Figure 6 that the latency of FISCO-BCOS rises very slowly even as the throughput increases quickly.
Second, to evaluate performance in computation-intensive cases, we also run experiments with the CPUHeavy workload, comparing FISCO-BCOS with HLF and BIDL on a quick sort over an array of 100,000 integers. Again, FISCO-BCOS achieves higher throughput and lower latency than BIDL and HLF, as shown in Figure 7. Due to its serial execution of transactions, BIDL shows an obvious performance degradation in the CPUHeavy case compared to the SmallBank workload. In contrast, with the DMC mechanism, FISCO-BCOS maintains relatively high throughput and low latency, i.e., 12.1K Txn/s and 200ms respectively. In the case of HLF, its performance increases slightly compared to the SmallBank workload, although it is much lower than FISCO-BCOS in both cases. The increase is due to the fact that the performance of HLF strongly depends on the read/write sets generated by the transactions (since HLF nodes verify the validity of transactions one by one based on the read/write sets during the validation phase), while the CPUHeavy workload actually generates fewer read/write operations than the SmallBank workload.
The actual usage of hardware resources depends on the business logic and transaction volume. In this experiment, the resource bottlenecks for the SmallBank and CPUHeavy workloads are bandwidth and CPU, respectively. In the SmallBank experiment, bandwidth utilization peaks at more than 800 MBit/s, while the CPUHeavy evaluation consumes approximately 30 vCPUs.
Performance in constrained environments. Unlike traditional centralized systems, which are deployed entirely by one organization in a high-speed, highly stable network environment such as a datacenter, permissioned blockchains are usually deployed across multiple organizations over their respective network environments [30]. As an enterprise-grade permissioned blockchain, it is important for FISCO-BCOS to perform well in constrained environments. We evaluate the performance of FISCO-BCOS on SmallBank over different bandwidths. As depicted in Figure 8, although the performance of FISCO-BCOS also decreases as the bandwidth becomes more stringent, it still outperforms BIDL and HLF. With only 100Mbps of bandwidth, FISCO-BCOS still delivers 15.3K Txn/s throughput and less than 400 ms latency. This is because the pipeline mechanism greatly reduces the impact of the network on the overall performance of the system.
Apart from this, we also deployed 10 nodes in a wide-area network environment for evaluation, with 3, 3, and 4 nodes in each of 3 cities that are more than 1000 km away from each other. Again, FISCO-BCOS performs well, achieving 14K Txn/s throughput and 400 ms latency.

Evaluations of BLP and DMC.
To better understand the advantages of the two key mechanisms in FISCO-BCOS, i.e., the pipeline workflow and the DMC mechanism, we conduct several experiments with different system configurations over the SmallBank and CPUHeavy workloads. As shown in Table 1, both BLP and DMC provide significant improvements on both workloads. However, BLP provides a larger improvement on SmallBank than on CPUHeavy, while DMC offers a greater improvement on CPUHeavy than on SmallBank. Specifically, BLP improves the throughput of SmallBank and CPUHeavy by 3.6x and 2.2x, respectively, and DMC further improves them by 2.4x and 11x, respectively. We observe that DMC performs very well on CPUHeavy contracts, as it processes intra-block transactions in a highly parallel manner, which is particularly suitable for CPU-intensive scenarios.
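If DMC's gains compose multiplicatively on top of BLP's (as "further improves" suggests), the cumulative speedups over the serial baseline work out as follows. This is a back-of-the-envelope derivation from the per-mechanism factors above, not a figure reported in the paper:

```python
# Per-mechanism throughput gains reported in Table 1.
blp_gain = {"SmallBank": 3.6, "CPUHeavy": 2.2}
dmc_gain = {"SmallBank": 2.4, "CPUHeavy": 11.0}

def cumulative_speedup(workload):
    """Total speedup over the serial baseline, assuming DMC's
    factor applies on top of BLP's (multiplicative composition)."""
    return blp_gain[workload] * dmc_gain[workload]
```

Under this assumption, SmallBank gains roughly 8.6x and CPUHeavy roughly 24.2x overall, matching the observation that DMC dominates in CPU-intensive scenarios.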

Scalability Evaluations.
To evaluate the performance of FISCO-BCOS on a large-scale network, we compare the performance of the SmallBank workload with varying numbers of FISCO-BCOS nodes on our large-scale testbed.
Authorized licensed use limited to the terms of the applicable license agreement with IEEE.Restrictions apply.
In a real enterprise application scenario, each institutional node involved in building a permissioned blockchain cooperates equally and has reciprocal status. Therefore, in the large-scale performance evaluation of FISCO-BCOS, we keep each node of the system performing the same duties. All nodes of FISCO-BCOS participate in the entire workflow, which means that each node runs the BFT [13] algorithm to participate in consensus and executes each transaction to keep its ledger updated. In this scalability experiment, we increase the number of nodes from 4 to 100 using the SmallBank workload and observe the changes in system throughput while ensuring that FISCO-BCOS maintains relatively stable transaction latency (less than 300ms). As shown in Figure 9(a), throughput decreases as the number of nodes increases, which is understandable since we use the BFT [13] protocol for the ordering phase; nevertheless, FISCO-BCOS still achieves satisfactory performance in large-scale scenarios, i.e., 17K Txn/s when the node number increases to 100.
In addition to the node scalability experiments, we also performed a scalability evaluation of the shards, i.e., shard scalability. In this experiment, we found that the SmallBank and CPUHeavy workloads are not suitable because they are too simple and can be handled efficiently within a single shard. Therefore, here we experimented with two real and much more complex business workloads, namely Evidence and DID [56] (used in judicial services and cross-border collaboration respectively; see Sec. 6). The Evidence application accepts a string of evidence and records it in contracts, while DID performs the issuance of individual certificates. We deploy these two business contracts in 1 to 5 shards and observe their performance to evaluate the scalability of FISCO-BCOS in handling more complex business-logic scenarios. As shown in Figure 9(b), the performance of both applications increases with the number of shards. FISCO-BCOS thus provides good scalability in real-world applications.

RELATED WORK
In recent years, researchers from industry and academia have paid much attention to the design and application of permissioned blockchains.
HyperLedger Fabric (HLF) [3] is considered the most commonly used permissioned blockchain system. HLF is a modular framework for developing enterprise-grade applications and industry solutions among mutually untrusted organizations through a consensus protocol such as Raft [42]. Many works [28, 50, 65] have been proposed to improve HLF's performance; they mainly focus on reducing the computation and I/O overhead in the ordering and validation phases. HLF utilizes an execute → order → validate workflow, through which transactions are processed concurrently during the execution phase, and guarantees correctness by checking for conflicts in the validation phase and discarding conflicting transactions. However, dropping the conflicting transactions may badly affect performance. FISCO-BCOS instead puts ordering ahead of execution, and the parallel transaction execution on different nodes is deterministic, avoiding transaction aborts.
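The conflict-discarding behavior in the validate phase can be illustrated with a simplified read-set check. This is a toy model of version-based validation, not the Fabric implementation; the transaction format and `validate_block` name are ours.

```python
def validate_block(txns, ledger_versions):
    """Simplified execute-order-validate check: a transaction whose
    read versions have become stale (because an earlier transaction
    in the ordered block wrote the same key) is discarded."""
    committed, discarded = [], []
    versions = dict(ledger_versions)
    for tx in txns:  # txns arrive already ordered
        stale = any(versions.get(k) != v for k, v in tx["reads"].items())
        if stale:
            discarded.append(tx["id"])
            continue
        for k in tx["writes"]:           # commit bumps key versions
            versions[k] = versions.get(k, 0) + 1
        committed.append(tx["id"])
    return committed, discarded
```

Two transfers touching the same account illustrate the cost: the second one read a now-stale version and is dropped, whereas a deterministic order-then-execute scheme would simply run it after the first.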
Quorum [14] is another widely used system based on Ethereum [22]. It originates from the Ethereum Golang client (geth) [20] and aims to solve the challenges of the financial industry by providing private transactions. Quorum offers much higher throughput than its public-chain counterpart. Diem [5] is a decentralized, programmable permissioned blockchain proposed by Facebook. It is designed to support a low-volatility cryptocurrency that can serve as an efficient medium of exchange around the world. Diem introduces the novel Move [35] programming language to define the core mechanisms of the blockchain, such as the currency and validator membership. Other systems such as Multichain [37], Corda [15], and Tendermint [49] are also deployed in some scenarios. The difference between these systems and FISCO-BCOS is that they all depend on serial processing mechanisms for both block processing and transaction execution. FISCO-BCOS uses new mechanisms, namely the pipelined block-processing workflow and DMC-based parallel transaction execution, to enhance the performance of the chain.
Some studies [1, 4, 7, 18, 31, 64] rely on a two-step transaction execution parallelism. A leading node executes transactions first and generates a transaction dependency graph for the other validators, so the validators can re-execute the transactions in parallel based on the generated dependency graph. In this way, validators have to trust the dependency graph generated by the leader; if the leader is a malicious node, the system cannot reach a consensus. Moreover, validators have to wait and stay idle before receiving the dependency graph from the leader, which reduces overall system performance. FISCO-BCOS also makes full use of parallelism but does not need such a leader.
BIDL [46] proposes a new framework and achieves very high performance in datacenter networks. It leverages the network-ordering property of a datacenter network to enable a new shepherded parallel workflow that performs consensus in parallel with transaction execution. Unfortunately, some requirements of BIDL, such as the triangle property [46] and IP multicasting [60], are easy to fulfill in datacenter networks but difficult to satisfy in other networks. FISCO-BCOS does not place strict limits on the network environment and is insensitive to packet loss and narrow bandwidth, as we show in Sec. 7.
Furthermore, some researchers have focused on sharding techniques to improve the scalability and performance of blockchains [2,16,32,39,58,63].However, these efforts are mainly designed for permissionless public blockchains and these approaches are too complex to be applied in permissioned blockchains.
Last but not least, some classical studies on databases and operating systems also inspire our work, such as SEDA [59], DMT [34,41], etc.

CONCLUSION
We propose FISCO-BCOS, an enterprise-grade permissioned blockchain system with high performance. By using a block-level pipeline workflow for inter-block processing and a deterministic multi-contract mechanism for intra-block handling, FISCO-BCOS breaks the serial dependency of block processing and realizes parallel execution of transactions, significantly improving overall performance. Our experiments show that FISCO-BCOS outperforms state-of-the-art permissioned blockchains from industry and academia, achieving 7.4 times and 28.4 times the throughput of BIDL and HLF, respectively, at half their latency. FISCO-BCOS has already been used in over 300 different real-world industrial scenarios, and its source code is available on GitHub.

REPRODUCIBILITY OF EXPERIMENTS
To facilitate testing, we wrote 'experiments.sh' to run the test cases of DXLedger; below we describe how each experiment is run, one by one. First of all, download the DXLedgernew.zip artifact and decompress it to get 'experiments.sh' and the other components.
Because the experiments in the paper require 10-100 machines to deploy the experimental environment, we provide a standalone 4-node environment in the DOI for functional verification, located in 4nodes_functions_check_env; users can execute the perf scripts to verify functionality. Due to competition for hardware resources between the four nodes and the pressure-testing tools, the results are not representative. If you want to fully replicate the experimental results, it is recommended to use the hardware resources described in the subsequent experiments.
• Docker is required only for HLF experiments.
• tc is required only for network emulation.
• ssh passwordless login is required for remote execution.

ARTIFACT INSTALLATION DEPLOYMENT PROCESS
For DxLedger, everything needed for experiments is included in the artifact DOI (https://zenodo.org/record/8207532). Follow the instructions in AD to install and run the artifact.
For HLF and BIDL, we run the experiments following the official documentation and also offer instructions in 'REPRODUCIBILITY OF EXPERIMENTS' (sections 0.6 and 0.7).

Figure 3: An example of one block being processed by DMC.

Figure 4: The scheduling process of 5 transactions that invoke 3 different contracts. Three transfer transactions tx1, tx2, and tx3 are concurrently executed following the DAG guidance in contract 1; tx4 and tx5 are assigned to contracts 2 and 3 respectively, and tx5 invokes a cross-contract call across contracts 1/2/3.

Figure 7: Performance comparison on two workloads.

Figure 9: Performance of varying numbers of nodes and shards.