
The Boyer–Moore majority vote algorithm is an
algorithm
In mathematics and computer science, an algorithm () is a finite sequence of rigorous instructions, typically used to solve a class of specific problems or to perform a computation. Algorithms are used as specifications for performing ...
for finding the
majority of a sequence of elements using
linear time
In computer science, the time complexity is the computational complexity that describes the amount of computer time it takes to run an algorithm. Time complexity is commonly estimated by counting the number of elementary operations performed by t ...
and constant space. It is named after
Robert S. Boyer and
J Strother Moore, who published it in 1981,
[. Originally published as a technical report in 1981.] and is a prototypical example of a
streaming algorithm In computer science, streaming algorithms are algorithms for processing data streams in which the input is presented as a sequence of items and can be examined in only a few passes (typically just one). In most models, these algorithms have access ...
.
In its simplest form, the algorithm finds a majority element, if there is one: that is, an element that occurs repeatedly for more than half of the elements of the input.
A version of the algorithm that makes a second pass through the data can be used to verify that the element found in the first pass really is a majority.
If a second pass is not performed and there is no majority the algorithm will not detect that no majority exists. In the case that no strict majority exists, the returned element can be arbitrary; it is not guaranteed to be the element that occurs most often (the
mode of the sequence).
It is not possible for a streaming algorithm to find the most frequent element in less than linear space, for sequences whose number of repetitions can be small.
[
]
Description
The algorithm maintains in its local variables a sequence element and a counter, with the counter initially zero.
It then processes the elements of the sequence, one at a time.
When processing an element , if the counter is zero, the algorithm stores as its remembered sequence element and sets the counter to one.
Otherwise, it compares to the stored element and either increments the counter (if they are equal) or decrements the counter (otherwise).
At the end of this process, if the sequence has a majority, it will be the element stored by the algorithm.
This can be expressed in pseudocode
In computer science, pseudocode is a plain language description of the steps in an algorithm or another system. Pseudocode often uses structural conventions of a normal programming language, but is intended for human reading rather than machine re ...
as the following steps:
*Initialize an element and a counter with
*For each element of the input sequence:
**If , then assign and
**else if , then assign
**else assign
*Return
Even when the input sequence has no majority, the algorithm will report one of the sequence elements as its result.
However, it is possible to perform a second pass over the same input sequence in order to count the number of times the reported element occurs and determine whether it is actually a majority.
This second pass is needed, as it is not possible for a sublinear-space algorithm to determine whether there exists a majority element in a single pass through the input.[.]
Analysis
The amount of memory that the algorithm needs is the space for one element and one counter.
In the random access
Random access (more precisely and more generally called direct access) is the ability to access an arbitrary element of a sequence in equal time or any datum from a population of addressable elements roughly as easily and efficiently as any othe ...
model of computing usually used for the analysis of algorithms
In computer science, the analysis of algorithms is the process of finding the computational complexity of algorithms—the amount of time, storage, or other resources needed to execute them. Usually, this involves determining a function that r ...
, each of these values can be stored in a machine word
In computing, a word is the natural unit of data used by a particular processor design. A word is a fixed-sized datum handled as a unit by the instruction set or the hardware of the processor. The number of bits or digits in a word (the ''wor ...
and the total space needed is . If an array index is needed to keep track of the algorithm's position in the input sequence, it doesn't change the overall constant space bound.
The algorithm's bit complexity (the space it would need, for instance, on a Turing machine
A Turing machine is a mathematical model of computation describing an abstract machine that manipulates symbols on a strip of tape according to a table of rules. Despite the model's simplicity, it is capable of implementing any computer algor ...
) is higher, the sum of the binary logarithm
In mathematics, the binary logarithm () is the power to which the number must be raised to obtain the value . That is, for any real number ,
:x=\log_2 n \quad\Longleftrightarrow\quad 2^x=n.
For example, the binary logarithm of is , the ...
s of the input length and the size of the universe from which the elements are drawn.[.] Both the random access model and bit complexity analyses only count the working storage of the algorithm, and not the storage for the input sequence itself.
Similarly, on a random access machine the algorithm takes time (linear time) on an input sequence of items, because it performs only a constant number of operations per input item. The algorithm can also be implemented on a Turing machine in time linear in the input length ( times the number of bits per input item).
Correctness
Assume we have a sequence of size that has a majority element . It can be shown by way of negation that the algorithm can't output a non majority element.
The algorithm chooses the first element in the sequence as the majority candidate and initializes a for that candidate with value 1. If the candidate is not the majority element, then the counter must reach zero since there are more elements in the sequence that are not equal to the candidate.
When the counter reaches zero after < iterations, then we consumed exactly elements with the candidate value (K must be even). Whether the candidate equals to the majority element or not, we are left with a subsequence of length - where is still the majority element.
See also
* Element distinctness problem, the problem of testing whether a collection of elements has any repeated elements
*Majority function In Boolean logic, the majority function (also called the median operator) is the Boolean function that evaluates to false when half or more arguments are false and true otherwise, i.e. the value of the function equals the value of the majority of th ...
, the majority of a collection of Boolean values
*Majority problem (cellular automaton) The majority problem, or density classification task, is the problem of finding one-dimensional cellular automaton rules that accurately perform majority voting.
Using local transition rules, cells cannot know the total count of all the ones in s ...
, the problem of finding a majority element in the cellular automaton
A cellular automaton (pl. cellular automata, abbrev. CA) is a discrete model of computation studied in automata theory. Cellular automata are also called cellular spaces, tessellation automata, homogeneous structures, cellular structures, tess ...
computational model
* Misra–Gries heavy hitters algorithm and Misra–Gries summary, the natural generalization of Boyer Moore to storing more than one item.
References
{{DEFAULTSORT:Boyer-Moore majority vote algorithm
Streaming algorithms