Problem description
The consensus problem requires agreement among a number of processes (or agents) on a single data value. Some of the processes (agents) may fail or be unreliable in other ways, so consensus protocols must be fault-tolerant or resilient. The processes must put forth their candidate values, communicate with one another, and agree on a single consensus value.
The consensus problem is a fundamental problem in controlling multi-agent systems. One approach to generating consensus is for all processes (agents) to agree on a majority value. In this context, a majority requires more than half of the available votes (where each process is given a vote). However, one or more faulty processes may skew the resultant outcome such that consensus may not be reached or may be reached incorrectly.
Protocols that solve consensus problems are designed to deal with a limited number of faulty processes. These protocols must satisfy several requirements to be useful. For instance, a trivial protocol could have all processes output binary value 1. This is not useful; thus, the requirement is modified such that the output must depend on the input. That is, the output value of a consensus protocol must be the input value of some process. Another requirement is that a process may decide upon an output value only once, and this decision is irrevocable.
A process is correct in an execution if it does not experience a failure. A consensus protocol tolerating halting failures must satisfy the following properties:
* ''Termination'': Eventually, every correct process decides some value.
* ''Integrity'': If all the correct processes proposed the same value v, then any correct process must decide v.
* ''Agreement'': Every correct process must agree on the same value.
Variations on the definition of ''integrity'' may be appropriate, according to the application.
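The properties above can be checked mechanically on a finished execution. The following Python sketch is illustrative only: `ProcessRecord`, `check_properties`, `majority_threshold` and `majority_value` are hypothetical names, and the sketch assumes each process's proposal, decision and correctness are already known. Because termination is a liveness property, a finite record can only show whether every correct process had decided by the end of the recording. The sketch reads a majority as strictly more than half of the n votes, which for whole vote counts is n // 2 + 1.

```python
from __future__ import annotations

from collections import Counter
from collections.abc import Hashable
from dataclasses import dataclass
from typing import Optional


def majority_threshold(n: int) -> int:
    """Smallest vote count strictly greater than n / 2."""
    return n // 2 + 1


def majority_value(votes: list[Hashable]) -> Optional[Hashable]:
    """Return the value that holds a majority of the votes, or None if none does."""
    if not votes:
        return None
    value, count = Counter(votes).most_common(1)[0]
    return value if count >= majority_threshold(len(votes)) else None


@dataclass
class ProcessRecord:
    proposed: Hashable
    correct: bool                # True if the process did not fail in this execution
    decided: Optional[Hashable]  # None if the process never decided


def check_properties(records: list[ProcessRecord]) -> dict[str, bool]:
    """Check Termination, Agreement and Integrity over the correct processes only."""
    correct = [r for r in records if r.correct]
    decisions = {r.decided for r in correct if r.decided is not None}
    proposals = {r.proposed for r in correct}
    return {
        # Termination: every correct process decided some value.
        "termination": all(r.decided is not None for r in correct),
        # Agreement: correct processes decided at most one distinct value.
        "agreement": len(decisions) <= 1,
        # Integrity: if all correct processes proposed the same v,
        # any decision by a correct process must be v.
        "integrity": len(proposals) != 1 or decisions <= proposals,
    }


# Two correct processes propose and decide 1; a crashed process never decides.
records = [
    ProcessRecord(proposed=1, correct=True, decided=1),
    ProcessRecord(proposed=1, correct=True, decided=1),
    ProcessRecord(proposed=0, correct=False, decided=None),
]
print(check_properties(records))  # {'termination': True, 'agreement': True, 'integrity': True}
print(majority_value([1, 1, 0]))  # 1
```

The stronger Integrity property used for Byzantine failures (every decided value must have been proposed by some correct process) would replace the last check with `decisions <= proposals` alone.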
One variation is a weaker form of integrity, in which the decision value must equal a value that some correct process proposed, not necessarily one proposed by all of them. The literature also uses a condition known as ''validity'', which refers to the property that a message sent by a process must be delivered.
A protocol that can correctly guarantee consensus among n processes, of which at most t fail, is said to be ''t-resilient''. In evaluating the performance of consensus protocols, two factors of interest are ''running time'' and ''message complexity''. Running time is given in Big O notation as the number of rounds of message exchange, expressed as a function of some input parameters (typically the number of processes and/or the size of the input domain). Message complexity refers to the amount of message traffic that is generated by the protocol. Other factors may include memory usage and the size of messages.
Models of computation
Different models of computation may define the consensus problem differently. Some models deal with fully connected graphs, while others deal with rings and trees. In some models message authentication is allowed, whereas in others processes are completely anonymous. Shared memory models, in which processes communicate by accessing objects in shared memory, are also an important area of research.
Communication channels with direct or transferable authentication
In most models of communication, protocol participants communicate through ''authenticated channels''. This means that messages are not anonymous, and receivers know the source of every message they receive. Some models assume a stronger, ''transferable'' form of authentication, where each ''message'' is signed by the sender, so that a receiver knows not just the immediate source of every message, but the participant that initially created the message. This stronger type of authentication is achieved by digital signatures, and when it is available, protocols can tolerate a larger number of faults.
The two different authentication models are often called the ''oral communication'' and ''written communication'' models. In an oral communication model, only the immediate source of information is known, whereas in the stronger, written communication models, at every step along the way the receiver learns not just the immediate source of the message, but the communication history of the message.
Inputs and outputs of consensus
In the most traditional single-value consensus protocols such as Paxos, cooperating nodes agree on a single value such as an integer, which may be of variable size so as to encode useful
Crash and Byzantine failures
There are two types of failures a process may undergo: a crash failure or a Byzantine failure. A ''crash failure'' occurs when a process abruptly stops and does not resume. ''Byzantine failures'' are failures in which absolutely no conditions are imposed. For example, they may occur as a result of the malicious actions of an adversary. A process that experiences a Byzantine failure may send contradictory or conflicting data to other processes, or it may sleep and then resume activity after a lengthy delay. Of the two types of failures, Byzantine failures are far more disruptive. Thus, a consensus protocol tolerating Byzantine failures must be resilient to every possible error that can occur.
A stronger version of consensus tolerating Byzantine failures is given by strengthening the Integrity constraint:
* ''Integrity'': If a correct process decides v, then v must have been proposed by some correct process.
Asynchronous and synchronous systems
The consensus problem may be considered in the case of asynchronous or synchronous systems. While real-world communications are often inherently asynchronous, it is more practical and often easier to model synchronous systems, given that asynchronous systems naturally involve more issues than synchronous ones. In synchronous systems, it is assumed that all communications proceed in ''rounds''. In one round, a process may send all the messages it requires, while receiving all messages from other processes. In this manner, no message from one round may influence any messages sent within the same round.
The FLP impossibility result for asynchronous deterministic consensus
In a fully asynchronous message-passing distributed system, in which at least one process may have a ''crash failure'', the famous 1985 FLP impossibility result by Fischer, Lynch and Paterson proves that no deterministic algorithm can always achieve consensus. This impossibility result derives from worst-case scheduling scenarios, which are unlikely to occur in practice except in adversarial situations such as an intelligent denial-of-service attacker in the network. In most normal situations, process scheduling has a degree of natural randomness.
In an asynchronous model, some forms of failures can be handled by a synchronous consensus protocol. For instance, the loss of a communication link may be modeled as a process which has suffered a Byzantine failure.
Randomized consensus algorithms can circumvent the FLP impossibility result by achieving both safety and liveness with overwhelming probability, even under adversarial worst-case scheduling.
Permissioned versus permissionless consensus
Consensus algorithms traditionally assume that the set of participating nodes is fixed and given at the outset: that is, that some prior (manual or automatic) configuration process has permissioned a particular known group of participants who can authenticate each other as members of the group. In the absence of such a well-defined, closed group with authenticated members, a Sybil attack against an open consensus group can defeat even a Byzantine consensus algorithm, simply by creating enough virtual participants to overwhelm the fault tolerance threshold. A permissionless consensus protocol, in contrast, allows anyone in the network to join dynamically and participate without prior permission, but instead imposes a different form of artificial cost or barrier to entry to mitigate the Sybil attack threat.
Equivalency of agreement problems
Three agreement problems of interest are as follows.
Terminating Reliable Broadcast
A collection of n processes, numbered from 0 to n − 1, communicate by sending messages to one another. Process 0 must transmit a value v to all processes such that:
# if process 0 is correct, then every correct process receives v;
# for any two correct processes, each process receives the same value.
It is also known as the Generals' Problem.
Consensus
Formal requirements for a consensus protocol may include:
* ''Agreement'': All correct processes must agree on the same value.
* ''Weak validity'': For each correct process, its output must be the input of some correct process.
* ''Strong validity'': If all correct processes receive the same input value, then they must all output that value.
* ''Termination'': All processes must eventually decide on an output value.
Weak Interactive Consistency
For ''n'' processes in a partially synchronous system (the system alternates between good and bad periods of synchrony), each process chooses a private value. The processes communicate with each other by rounds to determine a public value and generate a consensus vector with the following requirements:
# if a correct process sends v, then all correct processes receive either v or nothing (integrity property);
# all messages sent in a round by a correct process are received in the same round by all correct processes (consistency property).
It can be shown that variations of these problems are equivalent in that the solution for a problem in one type of model may be the solution for another problem in another type of model. For example, a solution to the Weak Byzantine Generals problem in a synchronous authenticated message-passing model leads to a solution for Weak Interactive Consistency. An interactive consistency algorithm can solve the consensus problem by having each process choose the majority value in its consensus vector as its consensus value.
Solvability results for some agreement problems
There is a t-resilient anonymous synchronous protocol which solves the Byzantine Generals problem if t/n < 1/3, and the Weak Byzantine Generals case, where t is the number of failures and n is the number of processes.
For systems with n processors, of which f are Byzantine, it has been shown that no algorithm solves the consensus problem for n ≤ 3f in the ''oral-messages model''. The proof is constructed by first showing the impossibility for the three-node case and using this result to argue about partitions of processors. In the ''written-messages model'' there are protocols that can tolerate n = f + 1.
In a fully asynchronous system there is no consensus solution that can tolerate one or more crash failures, even when only the non-triviality property is required. This result is sometimes called the FLP impossibility proof, named after the authors Michael J. Fischer, Nancy Lynch, and Mike Paterson, who were awarded a Dijkstra Prize for this significant work. The FLP result has been mechanically verified to hold even under fairness assumptions. However, FLP does not state that consensus can never be reached, merely that under the model's assumptions no algorithm can always reach consensus in bounded time. In practice, the adversarial schedules on which the proof relies are highly unlikely to occur.
Some consensus protocols
The Paxos consensus algorithm by Leslie Lamport, and variants of it such as
Permissionless consensus protocols
Consensus number
To solve the consensus problem in a shared-memory system, concurrent objects must be introduced. A concurrent object, or shared object, is a data structure which helps concurrent processes communicate to reach an agreement. Traditional implementations using critical sections risk blocking indefinitely if some process dies inside the critical section or sleeps for an intolerably long time. Researchers defined wait-freedom as the guarantee that every process completes its operation in a finite number of its own steps, regardless of how the other processes behave.
The consensus number of a concurrent object is defined to be the maximum number of processes in the system which can reach consensus by the given object in a wait-free implementation. Objects with a consensus number of n can implement any object with a consensus number of n or lower, but cannot implement any objects with a higher consensus number. The consensus numbers form what is called Herlihy's hierarchy of synchronization objects.
According to the hierarchy, read/write registers cannot solve consensus even in a 2-process system. Data structures like stacks and queues can only solve consensus between two processes. However, some concurrent objects are universal (notated in the table with ∞), which means they can solve consensus among any number of processes and they can simulate any other objects through an operation sequence.
See also
* Uniform consensus
* Quantum Byzantine agreement
* Byzantine fault
Further reading
* Bashir, Imran. "Blockchain Consensus." ''Blockchain Consensus - An Introduction to Classical, Blockchain, and Quantum Consensus Protocols''. Apress, Berkeley, CA, 2022.