Consensus is a procedure to reach a common agreement in a distributed or decentralized multi-agent platform. It provides reliability and fault tolerance in a distributed system and ensures the correct operations in the presence of faulty individuals.
Why do we require consensus?
In a conventional distributed system, multiple individual parties take their own decision and come up with a common viewpoint. However, some nodes may collude for malicious intent in the decision-making process. The consensus is a mechanism to ensure reliability or to ensure correct operations in the presence of faulty individuals.
Type of Faults in the Distributed System
It is easy and trivial to reach a consensus in the absence of failures in the distributed system. It is as simple as each node broadcasts the personal choice to all and applies a choice function, say the maximum of all the values, and achieves the common viewpoint or decision. However, it is challenging in the presence of faults in a distributed system. These are various kinds of faults that exist in the distributed system:
- Crash Fault: A node suddenly crashes or becomes unavailable in the middle of communication. This may be because of hardware or software faults due to which the node or the process is communicating with another one gets disconnect or fail.
- Network or Partitioned Faults: A network fault occurs because of the link failure, and the network gets partitioned. This may be because of the edge router failures and, consequently, hamper reaching the consensus.
- Byzantine Faults: A node starts behaving maliciously. It is a kind of fault that is very difficult to handle as the node’s behavior is unpredictable.
The crash and network faults are expected and can be predicted using the probabilistic models. However, Byzantine fault can be crafted with malicious intent. Thus, it isn’t easy to guess to apply suitable remedies. So the consensus protocols have to deal with these 3 kinds of faults in the distributed ecosystem.
Properties of distributed consensus protocols
The distributed consensus protocols need to satisfy certain properties, and these are as follows:
- Termination: Every correct individual decides some value at the end of the consensus protocol. It means whoever are the non-faulty nodes in the network terminates the consensus protocol at the end and decides on one value, and that value must be the correct value.
- Validity: If all the individuals propose the same value, then all correct individuals decide on that value – This is the basic idea of validity property.
- Integrity: Every correct individual decides at most one value, and some individuals must propose the decided value. The integrity property ensures that the consensus value should not deviate from the value individuals in the network propose.
- Agreement: Every correct or non-faulty individual must agree on the same value.
Whenever they agreed on the same value after termination, we call that the system has reached the consensus.
Type of Message-Passing Systems
There are two different types of message-passing systems, and one of them can be used to communicate among the nodes in the distributed system.
- Synchronous Message Passing System: The message must be received within a predefined time interval. It means we have a kind of strong guarantee on the message passing delay. And we know apriori that what can be the maximum delay of message passing for this particular network. This kind of synchronicity gives you simplification in designing the protocol. Moreover, we can think that each node will wait for a certain duration, and that duration will be the maximum bound on the message delay. This gives a guarantee that you will receive all the messages at that time.
- Asynchronous Message Passing System: There is no upper bound on the message transmission delay or the message reception time. It means no timing constraint, and the message can be delayed for an arbitrary period of time. So, in this case, we cannot expect that if we wait for a finite duration, we will receive all the messages with a certain probability or guaranteed probability.
Designing a distributed system in a synchronous environment is much easier because we strongly assume the message passing delay. However, in an asynchronous environment, we don’t have such a guarantee, so we have to also deal with the kind of faults that may occur due to the asynchonousity nature of the system. Thus, consensus under this kind of environment is much more difficult than designing a consensus protocol for the synchronous distributed system.
Important Note: FLP85 (Impossibility Result): In a purely asynchronous distributed system, the consensus problem is impossible (with a deterministic solution) to solve if in the presence of a single crash failure. It means in a purely asynchronous distributed environment, if there is a single fault in the system, we cannot design any deterministic consensus protocol. However, we can always have some randomized or probabilistic solution.
Popular Consensus Algorithms
There are various consensus algorithms that the distributed system community has explored, and the most famous are as follows:
- Paxos
- Raft
- Byzantine Fault Tolerance (BFT)
Correctness of a Distributed Consensus Protocol
These two properties can characterize the correctness of a distributed consensus algorithms:
- Safety: Correct individuals must not agree on an incorrect value. It means nothing bad will happen in the system. This property ensures that the system will never convert to an incorrect value or the correct individuals in the network will never convert to an incorrect value.
- Liveliness: Every correct value must be accepted eventually. It means something good eventually happens. If someone is proposing some good values, that good value will be committed eventually. Although, there can be some time lag or delay in reaching the consensus. But after the consensus protocol terminates, expected to have a correct consensus value.
These two correctness properties for a distributed consensus algorithm need to ensure whenever we design a distributed consensus algorithms.
Consensus protocols from the Perspective of a Blockchain Environment
The traditional distributed consensus protocols are based on Message passing, where individuals are connected over the Internet, and Shared Memory, where a common memory place is available to read and write the shared variables that everyone can access. Moreover, message passing requires a closed environment where everyone knows the identity of others, and a shared memory scheme is also not suitable for internet grid computing because we need to put a memory that should be readable and writeable by every individual node in the network. In general, the shared memory algorithms we apply whenever we try to reach consensus among multiple distributed processes inside a system.
The above schemes can not be used in a permissionless blockchain environment as anybody can join and perform the task without the need for registration, verification, and authentication.
Summary
We have seen Consensus mechanisms, types of faults, properties of consensus protocols, and the correctness of consensus protocols in the distributed system.
References
- NPTEL lecture series on Blockchains Architecture, Design and Use Cases by Prof. Sandip Chakraborty, IIT Kharagpur.
245 total views, 1 views today