Need for Distributed Consensus
If there is only a single decision-maker, we don’t need a consensus algorithm. In the case of two decision-makers (nodes) and the presence of any faults such as crash fault, or network fault or even if the node behaves maliciously, we can not reach a consensus. To reach a consensus, we always require more than two nodes or decision-makers.
In the case of multiple decision-makers, and collectively, they want to come to a certain decision, and then we require a consensus algorithm. The distributed consensus helps us to reach an agreement in the case of distributed computing. In the case of the state machine replication concept, we replicate the common status so that all the processes have the same view of the state.
Examples:
- State machine replication is the flight control system when there are multiple flights. They want to coordinate their positions among themselves in the closed distributed environment to achieve consensus.
- Fund transferring system in a closed distributed environment.
- Distributed leader election in a closed distributed environment where all the nodes collectively need to elect one leader in the system.
Faults in Distributed Consensus
It is easy and trivial to reach a consensus in the absence of failures in the distributed system. It is as simple as each node broadcasts the personal choice to all and applies a choice function, say the maximum of all the values, and achieves the common viewpoint or decision. However, it is challenging in the presence of faults in a distributed system. These are various kinds of faults that exist in the distributed system:
- Crash Fault: A node suddenly crashes or becomes unavailable in the middle of communication. This may be because of hardware or software faults due to which the node or the process is communicating with another one gets disconnect or fail.
- Network or Partitioned Faults: A network fault occurs because of the link failure, and the network gets partitioned. This may be because of the edge router failures and, consequently, hamper reaching the consensus.
- Byzantine Faults: A node starts behaving maliciously. It is a kind of fault that is very difficult to handle as the node’s behavior is unpredictable. It also includes software and hardware faults.
The crash and network faults are expected and can be predicted using the probabilistic models. However, Byzantine fault can be crafted with malicious intent. Thus, it isn’t easy to guess to apply suitable remedies. So the consensus protocols have to deal with these three kinds of faults in the distributed ecosystem.
Requirements of Consensus Algorithms
The distributed consensus protocols need to satisfy certain properties, and these are as follows:
- Termination: Every correct individual decides some value at the end of the consensus protocol. It means whoever are the non-faulty nodes in the network terminates the consensus protocol at the end and decides on one value, and that value must be the correct value.
- Validity: If all the individuals propose the same value, then all correct individuals decide on that value – This is the basic idea of validity property.
- Integrity: Every correct individual decides at most one value, and some individuals must propose the decided value. The integrity property ensures that the consensus value should not deviate from the value individuals in the network propose.
- Agreement: Every correct or non-faulty individual must agree on the same value.
Whenever they agreed on the same value after termination, we call that the system has reached the consensus.
Popular Distributed Consensus Algorithms
There are various consensus algorithms that the distributed system community has explored, and the most famous are as follows:
- Crash or Network Faults:
- PAXOS
- RAFT
- Byzantine Fault (including Crash or Network Failures):
- Byzantine fault tolerance (BFT)
- Practical Byzantine Fault Tolerance (PBFT)
Correctness of a Distributed Consensus Algorithms
These two properties can characterize the correctness of a distributed consensus algorithms:
- Safety: Correct individuals must not agree on an incorrect value. It means nothing bad will happen in the system. This property ensures that the system will never convert to an incorrect value or the correct individuals in the network will never convert to an incorrect value.
- Liveliness: Every correct value must be accepted eventually. It means something good eventually happens. If someone is proposing some good values, that good value will be committed eventually. Although, there can be some time lag or delay in reaching the consensus. But after the consensus protocol terminates, expected to have a correct consensus value.
These two correctness properties for a distributed consensus algorithm need to ensure whenever we design a distributed consensus algorithms.
Summary
We have seen the properties of a consensus algorithm in the distributed permissioned environment, different types of faults, a list of popular distributed consensus algorithms, and consensus algorithms’ correctness properties.
References
- NPTEL lecture series on Blockchains Architecture, Design and Use Cases by Prof. Sandip Chakraborty, IIT Kharagpur.
517 total views, 1 views today