HOME

TheInfoList



OR:

In
computer science Computer science is the study of computation, automation, and information. Computer science spans theoretical disciplines (such as algorithms, theory of computation, information theory, and automation) to Applied science, practical discipli ...
, radix sort is a non-
comparative general linguistics, the comparative is a syntactic construction that serves to express a comparison between two (or more) entities or groups of entities in quality or degree - see also comparison (grammar) for an overview of comparison, as well ...
sorting algorithm In computer science, a sorting algorithm is an algorithm that puts elements of a list into an order. The most frequently used orders are numerical order and lexicographical order, and either ascending or descending. Efficient sorting is important ...
. It avoids comparison by creating and distributing elements into buckets according to their
radix In a positional numeral system, the radix or base is the number of unique digits, including the digit zero, used to represent numbers. For example, for the decimal/denary system (the most common system in use today) the radix (base number) is ...
. For elements with more than one
significant digit Significant figures (also known as the significant digits, ''precision'' or ''resolution'') of a number in positional notation are digits in the number that are reliable and necessary to indicate the quantity of something. If a number expres ...
, this bucketing process is repeated for each digit, while preserving the ordering of the prior step, until all digits have been considered. For this reason, radix sort has also been called bucket sort and digital sort. Radix sort can be applied to data that can be sorted lexicographically, be they integers, words, punch cards, playing cards, or the
mail The mail or post is a system for physically transporting postcards, letters, and parcels. A postal service can be private or public, though many governments place restrictions on private systems. Since the mid-19th century, national postal sys ...
.


History

Radix sort dates back as far as 1887 to the work of
Herman Hollerith Herman Hollerith (February 29, 1860 – November 17, 1929) was a German-American statistician, inventor, and businessman who developed an electromechanical tabulating machine for punched cards to assist in summarizing information and, later, i ...
on tabulating machines. Radix sorting algorithms came into common use as a way to sort
punched card A punched card (also punch card or punched-card) is a piece of stiff paper that holds digital data represented by the presence or absence of holes in predefined positions. Punched cards were once common in data processing applications or to di ...
s as early as 1923.
Donald Knuth Donald Ervin Knuth ( ; born January 10, 1938) is an American computer scientist, mathematician, and professor emeritus at Stanford University. He is the 1974 recipient of the ACM Turing Award, informally considered the Nobel Prize of computer sc ...
. ''The Art of Computer Programming'', Volume 3: ''Sorting and Searching'', Third Edition. Addison-Wesley, 1997. . Section 5.2.5: Sorting by Distribution, pp. 168–179.
The first memory-efficient computer algorithm for this sorting method was developed in 1954 at
MIT The Massachusetts Institute of Technology (MIT) is a private land-grant research university in Cambridge, Massachusetts. Established in 1861, MIT has played a key role in the development of modern technology and science, and is one of the m ...
by Harold H. Seward. Computerized radix sorts had previously been dismissed as impractical because of the perceived need for variable allocation of buckets of unknown size. Seward's innovation was to use a linear scan to determine the required bucket sizes and offsets beforehand, allowing for a single static allocation of auxiliary memory. The linear scan is closely related to Seward's other algorithm —
counting sort In computer science, counting sort is an algorithm for sorting a collection of objects according to keys that are small positive integers; that is, it is an integer sorting algorithm. It operates by counting the number of objects that possess dis ...
. In the modern era, radix sorts are most commonly applied to collections of binary strings and
integers An integer is the number zero (), a positive natural number (, , , etc.) or a negative integer with a minus sign ( −1, −2, −3, etc.). The negative numbers are the additive inverses of the corresponding positive numbers. In the language ...
. It has been shown in some benchmarks to be faster than other more general-purpose sorting algorithms, sometimes 50% to three times faster.


Digit order

Radix sorts can be implemented to start at either the
most significant digit Significant figures (also known as the significant digits, ''precision'' or ''resolution'') of a number in positional notation are digits in the number that are reliable and necessary to indicate the quantity of something. If a number expres ...
(MSD) or least significant digit (LSD). For example, with 1234, one could start with 1 (MSD) or 4 (LSD). LSD radix sorts typically use the following sorting order: short keys come before longer keys, and then keys of the same length are sorted lexicographically. This coincides with the normal order of integer representations, like the sequence , 2, 3, 4, 5, 6, 7, 8, 9, 10, 11''. LSD sorts are generally stable sorts. MSD radix sorts are most suitable for sorting strings or fixed-length integer representations. A sequence like , c, e, d, f, g, ba'' would be sorted as , ba, c, d, e, f, g''. If lexicographic ordering is used to sort variable-length integers in base 10, then numbers from 1 to 10 would be output as , 10, 2, 3, 4, 5, 6, 7, 8, 9'', as if the shorter keys were left-justified and padded on the right with blank characters to make the shorter keys as long as the longest key. MSD sorts are not necessarily stable if the original ordering of duplicate keys must always be maintained. Other than the traversal order, MSD and LSD sorts differ in their handling of variable length input. LSD sorts can group by length, radix sort each group, then concatenate the groups in size order. MSD sorts must effectively 'extend' all shorter keys to the size of the largest key and sort them accordingly, which can be more complicated than the grouping required by LSD. However, MSD sorts are more amenable to subdivision and recursion. Each bucket created by an MSD step can itself be radix sorted using the next most significant digit, without reference to any other buckets created in the previous step. Once the last digit is reached, concatenating the buckets is all that is required to complete the sort.


Examples


Least significant digit

Input list: : 70, 45, 75, 90, 2, 802, 2, 66'' Starting from the rightmost (last) digit, sort the numbers based on that digit: : , , '' Sorting by the next left digit: : , , , '' :Notice that an implicit digit ''0'' is prepended for the two 2s so that 802 maintains its position between them. And finally by the leftmost digit: : , '' :Notice that a ''0'' is prepended to all of the 1- or 2-digit numbers. Each step requires just a single pass over the data, since each item can be placed in its bucket without comparison with any other element. Some radix sort implementations allocate space for buckets by first counting the number of keys that belong in each bucket before moving keys into those buckets. The number of times that each digit occurs is stored in an
array An array is a systematic arrangement of similar objects, usually in rows and columns. Things called an array include: {{TOC right Music * In twelve-tone and serial composition, the presentation of simultaneous twelve-tone sets such that the ...
. Although it is always possible to pre-determine the bucket boundaries using counts, some implementations opt to use dynamic memory allocation instead.


Most significant digit, forward recursive

Input list, fixed width numeric strings with leading zeros: : 70, 045, 075, 025, 002, 024, 802, 066'' First digit, with brackets indicating buckets: : , '' :Notice that 170 and 802 are already complete because they are all that remain in their buckets, so no further recursion is needed Next digit: : 170, 802'' Final digit: :
002, , 045, 066, 075 , 170, 802 The comma is a punctuation mark that appears in several variants in different languages. It has the same shape as an apostrophe or single closing quotation mark () in many typefaces, but it differs from them in being placed on the baseline o ...
'' All that remains is concatenation: : 02, 024, 025, 045, 066, 075, 170, 802''


Complexity and performance

Radix sort operates in O(nw) time, where n is the number of keys, and w is the key length. LSD variants can achieve a lower bound for w of 'average key length' when splitting variable length keys into groups as discussed above. Optimized radix sorts can be very fast when working in a domain that suits them. They are constrained to lexicographic data, but for many practical applications this is not a limitation. Large key sizes can hinder LSD implementations when the induced number of passes becomes the bottleneck.


Specialized variants


In-place MSD radix sort implementations

Binary MSD radix sort, also called binary quicksort, can be implemented in-place by splitting the input array into two bins - the 0s bin and the 1s bin. The 0s bin is grown from the beginning of the array, whereas the 1s bin is grown from the end of the array. The 0s bin boundary is placed before the first array element. The 1s bin boundary is placed after the last array element. The most significant bit of the first array element is examined. If this bit is a 1, then the first element is swapped with the element in front of the 1s bin boundary (the last element of the array), and the 1s bin is grown by one element by decrementing the 1s boundary array index. If this bit is a 0, then the first element remains at its current location, and the 0s bin is grown by one element. The next array element examined is the one in front of the 0s bin boundary (i.e. the first element that is not in the 0s bin or the 1s bin). This process continues until the 0s bin and the 1s bin reach each other. The 0s bin and the 1s bin are then sorted recursively based on the next bit of each array element. Recursive processing continues until the least significant bit has been used for sorting. Handling signed integers requires treating the most significant bit with the opposite sense, followed by unsigned treatment of the rest of the bits. In-place MSD binary-radix sort can be extended to larger radix and retain in-place capability.
Counting sort In computer science, counting sort is an algorithm for sorting a collection of objects according to keys that are small positive integers; that is, it is an integer sorting algorithm. It operates by counting the number of objects that possess dis ...
is used to determine the size of each bin and their starting index. Swapping is used to place the current element into its bin, followed by expanding the bin boundary. As the array elements are scanned the bins are skipped over and only elements between bins are processed, until the entire array has been processed and all elements end up in their respective bins. The number of bins is the same as the radix used - e.g. 16 bins for 16-radix. Each pass is based on a single digit (e.g. 4-bits per digit in the case of 16-radix), starting from the
most significant digit Significant figures (also known as the significant digits, ''precision'' or ''resolution'') of a number in positional notation are digits in the number that are reliable and necessary to indicate the quantity of something. If a number expres ...
. Each bin is then processed recursively using the next digit, until all digits have been used for sorting. Neither in-place binary-radix sort nor n-bit-radix sort, discussed in paragraphs above, are stable algorithms.


Stable MSD radix sort implementations

MSD radix sort can be implemented as a stable algorithm, but requires the use of a memory buffer of the same size as the input array. This extra memory allows the input buffer to be scanned from the first array element to last, and move the array elements to the destination bins in the same order. Thus, equal elements will be placed in the memory buffer in the same order they were in the input array. The MSD-based algorithm uses the extra memory buffer as the output on the first level of recursion, but swaps the input and output on the next level of recursion, to avoid the overhead of copying the output result back to the input buffer. Each of the bins are recursively processed, as is done for the in-place MSD radix sort. After the sort by the last digit has been completed, the output buffer is checked to see if it is the original input array, and if it's not, then a single copy is performed. If the digit size is chosen such that the key size divided by the digit size is an even number, the copy at the end is avoided.


Hybrid approaches

Radix sort, such as the two-pass method where
counting sort In computer science, counting sort is an algorithm for sorting a collection of objects according to keys that are small positive integers; that is, it is an integer sorting algorithm. It operates by counting the number of objects that possess dis ...
is used during the first pass of each level of recursion, has a large constant overhead. Thus, when the bins get small, other sorting algorithms should be used, such as
insertion sort Insertion sort is a simple sorting algorithm that builds the final sorted array (or list) one item at a time by comparisons. It is much less efficient on large lists than more advanced algorithms such as quicksort, heapsort, or merge sort. Ho ...
. A good implementation of
insertion sort Insertion sort is a simple sorting algorithm that builds the final sorted array (or list) one item at a time by comparisons. It is much less efficient on large lists than more advanced algorithms such as quicksort, heapsort, or merge sort. Ho ...
is fast for small arrays, stable, in-place, and can significantly speed up radix sort.


Application to parallel computing

This recursive sorting algorithm has particular application to
parallel computing Parallel computing is a type of computation in which many calculations or processes are carried out simultaneously. Large problems can often be divided into smaller ones, which can then be solved at the same time. There are several different f ...
, as each of the bins can be sorted independently. In this case, each bin is passed to the next available processor. A single processor would be used at the start (the most significant digit). By the second or third digit, all available processors would likely be engaged. Ideally, as each subdivision is fully sorted, fewer and fewer processors would be utilized. In the worst case, all of the keys will be identical or nearly identical to each other, with the result that there will be little to no advantage to using parallel computing to sort the keys. In the top level of recursion, opportunity for parallelism is in the
counting sort In computer science, counting sort is an algorithm for sorting a collection of objects according to keys that are small positive integers; that is, it is an integer sorting algorithm. It operates by counting the number of objects that possess dis ...
portion of the algorithm. Counting is highly parallel, amenable to the parallel_reduce pattern, and splits the work well across multiple cores until reaching memory bandwidth limit. This portion of the algorithm has data-independent parallelism. Processing each bin in subsequent recursion levels is data-dependent, however. For example, if all keys were of the same value, then there would be only a single bin with any elements in it, and no parallelism would be available. For random inputs all bins would be near equally populated and a large amount of parallelism opportunity would be available. There are faster parallel sorting algorithms available, for example optimal complexity O(log(''n'')) are those of the Three Hungarians and Richard Cole and Batcher's bitonic merge sort has an algorithmic complexity of O(log2(''n'')), all of which have a lower algorithmic time complexity to radix sort on a CREW-
PRAM Pram or PRAM may refer to: a bulbous growth on senior canines, varying in size, usually benign and painless. If it bursts, it will ooze pus and blood. Places * Pram, Austria, a municipality in the district of Grieskirchen in the Austrian state o ...
. The fastest known PRAM sorts were described in 1991 by David Powers with a parallelized quicksort that can operate in O(log(n)) time on a CRCW-PRAM with ''n'' processors by performing partitioning implicitly, as well as a radixsort that operates using the same trick in O(''k''), where ''k'' is the maximum keylength. However, neither the PRAM architecture or a single sequential processor can actually be built in a way that will scale without the number of constant
fan-out In digital electronics, the fan-out is the number of gate inputs driven by the output of another single logic gate. In most designs, logic gates are connected to form more complex circuits. While no logic gate input can be fed by more than one ...
gate delays per cycle increasing as O(log(''n'')), so that in effect a pipelined version of Batcher's bitonic mergesort and the O(log(''n'')) PRAM sorts are all O(log2(''n'')) in terms of clock cycles, with Powers acknowledging that Batcher's would have lower constant in terms of gate delays than his Parallel
quicksort Quicksort is an efficient, general-purpose sorting algorithm. Quicksort was developed by British computer scientist Tony Hoare in 1959 and published in 1961, it is still a commonly used algorithm for sorting. Overall, it is slightly faster than ...
and radix sort, or Cole's
merge sort In computer science, merge sort (also commonly spelled as mergesort) is an efficient, general-purpose, and comparison-based sorting algorithm. Most implementations produce a stable sort, which means that the order of equal elements is the same ...
, for a keylength-independent sorting network of O(nlog2(''n'')).David M. W. Powers
Parallel Unification: Practical Complexity
Australasian Computer Architecture Workshop, Flinders University, January 1995


Tree-based radix sort

Radix sorting can also be accomplished by building a
tree In botany, a tree is a perennial plant with an elongated stem, or trunk, usually supporting branches and leaves. In some usages, the definition of a tree may be narrower, including only woody plants with secondary growth, plants that are ...
(or radix tree) from the input set, and doing a pre-order traversal. This is similar to the relationship between
heapsort In computer science, heapsort is a comparison-based sorting algorithm. Heapsort can be thought of as an improved selection sort: like selection sort, heapsort divides its input into a sorted and an unsorted region, and it iteratively shrinks ...
and the heap data structure. This can be useful for certain data types, see burstsort.


See also

* IBM 80 series Card Sorters * Other
distribution sort In computer science, a sorting algorithm is an algorithm that puts elements of a list into an order. The most frequently used orders are numerical order and lexicographical order, and either ascending or descending. Efficient sorting is importa ...
s * Kirkpatrick-Reisch sorting * Prefix sum


References


External links


Explanation, Pseudocode and implementation
in C and Java
High Performance Implementation
of LSD Radix sort in
JavaScript JavaScript (), often abbreviated as JS, is a programming language that is one of the core technologies of the World Wide Web, alongside HTML and CSS. As of 2022, 98% of websites use JavaScript on the client side for webpage behavior, of ...

High Performance Implementation
of LSD & MSD Radix sort in C# with source i
GitHubVideo tutorial of MSD Radix Sort

Demonstration and comparison
of Radix sort with
Bubble sort Bubble sort, sometimes referred to as sinking sort, is a simple sorting algorithm that repeatedly steps through the input list element by element, comparing the current element with the one after it, swapping their values if needed. These passe ...
,
Merge sort In computer science, merge sort (also commonly spelled as mergesort) is an efficient, general-purpose, and comparison-based sorting algorithm. Most implementations produce a stable sort, which means that the order of equal elements is the same ...
and
Quicksort Quicksort is an efficient, general-purpose sorting algorithm. Quicksort was developed by British computer scientist Tony Hoare in 1959 and published in 1961, it is still a commonly used algorithm for sorting. Overall, it is slightly faster than ...
implemented in
JavaScript JavaScript (), often abbreviated as JS, is a programming language that is one of the core technologies of the World Wide Web, alongside HTML and CSS. As of 2022, 98% of websites use JavaScript on the client side for webpage behavior, of ...

Article
about Radix sorting IEEE floating-point numbers with implementation.
Faster Floating Point Sorting and Multiple Histogramming
with implementation in C++ *Pointers t
radix sort visualizationsUSort library
contains tuned implementations of radix sort for most numerical C types (C99) *
Donald Knuth Donald Ervin Knuth ( ; born January 10, 1938) is an American computer scientist, mathematician, and professor emeritus at Stanford University. He is the 1974 recipient of the ACM Turing Award, informally considered the Nobel Prize of computer sc ...
. ''The Art of Computer Programming'', Volume 3: ''Sorting and Searching'', Third Edition. Addison-Wesley, 1997. . Section 5.2.5: Sorting by Distribution, pp. 168–179. *
Thomas H. Cormen Thomas H. Cormen is the co-author of ''Introduction to Algorithms'', along with Charles Leiserson, Ron Rivest, and Cliff Stein. In 2013, he published a new book titled '' Algorithms Unlocked''. He is a professor of computer science at Dartmou ...
, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein. ''
Introduction to Algorithms ''Introduction to Algorithms'' is a book on computer programming by Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein. The book has been widely used as the textbook for algorithms courses at many universities and is ...
'', Second Edition. MIT Press and McGraw-Hill, 2001. . Section 8.3: Radix sort, pp. 170–173.
BRADSORT v1.50 source code

Efficient Trie-Based Sorting of Large Sets of Strings
by Ranjan Sinha and Justin Zobel. This paper describes a method of creating tries of buckets which figuratively burst into sub-tries when the buckets hold more than a predetermined capacity of strings, hence the name, "Burstsort".

Pat Morin * ttp://opendatastructures.org/ods-cpp/11_2_Counting_Sort_Radix_So.html Open Data Structures - C++ Edition - Section 11.2 - Counting Sort and Radix Sort Pat Morin {{DEFAULTSORT:Radix Sort Articles with example C code Sorting algorithms Stable sorts String sorting algorithms