Input and output assumptions
In the most general case, the input to counting sort consists of a collection of n items, each of which has a non-negative integer key whose maximum value is at most k. In some descriptions of counting sort, the input to be sorted is assumed to be more simply a sequence of integers itself, but this simplification does not accommodate many applications of counting sort. For instance, when used as a subroutine in radix sort, the items carry more information than the keys alone, so the output must be the sorted items rather than just a sorted list of keys.
Pseudocode
In pseudocode, the algorithm may be expressed as:

 function CountingSort(input, k)
     count ← array of k + 1 zeros
     output ← array of same length as input

     for i = 0 to length(input) - 1 do
         j = key(input[i])
         count[j] = count[j] + 1

     for i = 1 to k do
         count[i] = count[i] + count[i - 1]

     for i = length(input) - 1 down to 0 do
         j = key(input[i])
         count[j] = count[j] - 1
         output[count[j]] = input[i]

     return output

Here input is the input array to be sorted, key returns the numeric key of each item in the input array, count is an auxiliary array used first to store the numbers of items with each key, and then (after the second loop) to store the positions where items with each key should be placed, k is the maximum value of the non-negative key values, and output is the sorted output array.
In summary, the algorithm loops over the items in the first loop, computing a histogram of the number of times each key occurs within the input collection. In the second loop, it performs a prefix sum computation on count in order to determine, for each key, the position range where the items having that key should be placed; i.e., items of key i should be placed starting at position count[i − 1] (with count[−1] taken to be zero), occupying positions up to count[i] − 1. Finally, in the third loop, it loops over the items of input again, but in reverse order, moving each item into its sorted position in the output array.
The relative order of items with equal keys is preserved here; i.e., this is a stable sort.
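As a concrete illustration, the pseudocode above translates almost line for line into Python. This is a sketch, not a canonical implementation; the key function defaulting to the identity (so plain integers can be sorted directly) is a convenience not present in the pseudocode.

```python
def counting_sort(items, k, key=lambda x: x):
    """Stable counting sort of items whose keys are integers in [0, k]."""
    count = [0] * (k + 1)
    output = [None] * len(items)

    # First loop: histogram of key occurrences.
    for item in items:
        count[key(item)] += 1

    # Second loop: prefix sums; count[j] becomes one past the last
    # output slot reserved for key j.
    for j in range(1, k + 1):
        count[j] += count[j - 1]

    # Third loop: place items in reverse so equal keys keep their
    # relative order (stability).
    for item in reversed(items):
        j = key(item)
        count[j] -= 1
        output[count[j]] = item

    return output
```

Sorting (key, payload) pairs by their first component shows the stability: two pairs with the same key come out in their original order.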
Complexity analysis
Because the algorithm uses only simple for loops, without recursion or subroutine calls, it is straightforward to analyze. The initialization of the count array, and the second for loop which performs a prefix sum on the count array, each iterate at most k + 1 times and therefore take O(k) time. The other two for loops, and the initialization of the output array, each take O(n) time. Therefore, the time for the whole algorithm is the sum of the times for these steps, O(n + k).
Because it uses arrays of length k + 1 and n, the total space usage of the algorithm is also O(n + k). For problem instances in which the maximum key value is significantly smaller than the number of items, counting sort can be highly space-efficient, as the only storage it uses other than its input and output arrays is the count array, which uses space O(k).
Variant algorithms
If each item to be sorted is itself an integer, and used as its own key, then the second and third loops of counting sort can be combined; in the second loop, instead of computing the position where items with key i should be placed in the output, simply append count[i] copies of the number i to the output.
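A minimal sketch of this combined-loop variant, assuming the inputs are the integer keys themselves:

```python
def counting_sort_ints(values, k):
    """Sort a list of integers in [0, k] by emitting each key count[i] times."""
    count = [0] * (k + 1)
    for v in values:
        count[v] += 1
    output = []
    for i in range(k + 1):
        # Append count[i] copies of i instead of computing positions.
        output.extend([i] * count[i])
    return output
```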
This algorithm may also be used to eliminate duplicate keys, by replacing the count array with a bit vector that stores a one for a key that is present in the input and a zero for a key that is not present. If, additionally, the items are the integer keys themselves, both the second and third loops can be omitted entirely and the bit vector will itself serve as output, representing the values as offsets of the nonzero entries, added to the range's lowest value. Thus the keys are sorted and the duplicates are eliminated in this variant just by being placed into the bit array.
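The bit-vector variant can be sketched as follows; the function name and the explicit [lo, hi] key range are illustrative choices, not part of the original description, and a Python list of booleans stands in for the bit vector.

```python
def dedup_sort(values, lo, hi):
    """Return the distinct integers of values (all in [lo, hi]), sorted.

    The count array is replaced by a presence bit per key; placing each
    key into the bit vector both sorts and deduplicates.
    """
    present = [False] * (hi - lo + 1)
    for v in values:
        present[v - lo] = True
    # Each set bit's offset, added to the range's lowest value, is a key.
    return [lo + i for i, flag in enumerate(present) if flag]
```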
For data in which the maximum key size is significantly smaller than the number of data items, counting sort may be parallelized
by splitting the input into subarrays of approximately equal size, processing each subarray in parallel to generate a separate count array for each subarray, and then merging the count arrays. When used as part of a parallel radix sort algorithm, the key size (base of the radix representation) should be chosen to match the size of the split subarrays. The simplicity of the counting sort algorithm and its use of the easily parallelizable prefix sum primitive also make it usable in more fine-grained parallel algorithms.
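The parallel scheme described above can be sketched in Python for the integer-key variant; the per-chunk histograms are computed concurrently and then merged by elementwise addition. The worker count and chunking strategy here are illustrative assumptions.

```python
from concurrent.futures import ThreadPoolExecutor


def histogram(chunk, k):
    """Count array for one subarray of integers in [0, k]."""
    count = [0] * (k + 1)
    for v in chunk:
        count[v] += 1
    return count


def parallel_counting_sort(values, k, workers=4):
    """Split the input, histogram each piece in parallel, merge the counts."""
    step = max(1, -(-len(values) // workers))  # ceil division
    chunks = [values[i:i + step] for i in range(0, len(values), step)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        counts = list(pool.map(lambda c: histogram(c, k), chunks))
    # Merge per-chunk count arrays by elementwise addition.
    total = [sum(cs) for cs in zip(*counts)]
    output = []
    for i, c in enumerate(total):
        output.extend([i] * c)
    return output
```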
As described, counting sort is not an in-place algorithm; even disregarding the count array, it needs separate input and output arrays. It is possible to modify the algorithm so that it places the items into sorted order within the same array that was given to it as the input, using only the count array as auxiliary storage; however, the modified in-place version of counting sort is not stable.
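One common way to carry out such an in-place modification (a sketch, not the only possibility; for a single key this is essentially the bucket-permutation pass of American flag sort) swaps each misplaced item directly into its own key's bucket. Note the loss of stability: equal keys end up in arbitrary relative order.

```python
def counting_sort_in_place(a, k, key=lambda x: x):
    """Sort a in place; keys are integers in [0, k]. Not stable."""
    # Prefix sums arranged so bucket j occupies positions
    # count[j] .. count[j + 1] - 1 of the array.
    count = [0] * (k + 2)
    for x in a:
        count[key(x) + 1] += 1
    for j in range(1, k + 2):
        count[j] += count[j - 1]
    nxt = count[:]  # nxt[j]: next unfilled position in bucket j
    for j in range(k + 1):
        while nxt[j] < count[j + 1]:
            v = key(a[nxt[j]])
            if v == j:
                nxt[j] += 1  # item already in its own bucket; move on
            else:
                # Swap the misplaced item into its own bucket.
                a[nxt[j]], a[nxt[v]] = a[nxt[v]], a[nxt[j]]
                nxt[v] += 1
    return a
```

The only auxiliary storage is the count array (plus its working copy), matching the O(k) extra space claimed above.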
History
Although radix sorting itself dates back far longer,
counting sort, and its application to radix sorting, were both invented by Harold H. Seward
in 1954 (Section 5.2, "Sorting by counting", pp. 75–80, and historical notes, p. 170).
External links
Counting Sort HTML5 visualization