American Flag Sort
   HOME

TheInfoList



OR:

An American flag sort is an efficient,
in-place In computer science, an in-place algorithm is an algorithm that operates directly on the input data structure without requiring extra space proportional to the input size. In other words, it modifies the input in place, without creating a separa ...
variant of
radix sort In computer science, radix sort is a non-comparative sorting algorithm. It avoids comparison by creating and distributing elements into buckets according to their radix. For elements with more than one significant digit, this bucketing process i ...
that distributes items into buckets. Non-comparative sorting algorithms such as radix sort and American flag sort are typically used to sort large objects such as strings, for which comparison is not a unit-time operation. American flag sort iterates through the bits of the objects, considering several bits of each object at a time. For each set of bits, American flag sort makes two passes through the array of objects: first to count the number of objects that will fall in each bin, and second to place each object in its bucket. This works especially well when sorting a byte at a time, using 256 buckets. With some optimizations, it is twice as fast as
quicksort Quicksort is an efficient, general-purpose sorting algorithm. Quicksort was developed by British computer scientist Tony Hoare in 1959 and published in 1961. It is still a commonly used algorithm for sorting. Overall, it is slightly faster than ...
for large sets of
strings String or strings may refer to: *String (structure), a long flexible structure made from threads twisted together, which is used to tie, bind, or hang other objects Arts, entertainment, and media Films * ''Strings'' (1991 film), a Canadian anim ...
. The name
American flag The national flag of the United States, often referred to as the American flag or the U.S. flag, consists of thirteen horizontal Bar (heraldry), stripes, Variation of the field, alternating red and white, with a blue rectangle in the Canton ( ...
sort comes by
analogy Analogy is a comparison or correspondence between two things (or two groups of things) because of a third element that they are considered to share. In logic, it is an inference or an argument from one particular to another particular, as oppose ...
with the Dutch national flag problem in the last step: efficiently partition the array into many "stripes".


Algorithm

Sorting algorithms in general sort a list of objects according to some ordering scheme. In contrast to comparison-based sorting algorithms, such as
quicksort Quicksort is an efficient, general-purpose sorting algorithm. Quicksort was developed by British computer scientist Tony Hoare in 1959 and published in 1961. It is still a commonly used algorithm for sorting. Overall, it is slightly faster than ...
, American flag sort is based on directly comparing the bytes (numerical representation) of the underlying objects. In-place sorting algorithms, including American flag sort, run without allocating a significant amount of memory beyond that used by the original array. This is a significant advantage, both in memory savings and in time saved copying the array. American flag sort works by successively dividing a list of objects into buckets based on the first digit of their base-N representation (the base used is referred to as the ''radix''). When ''N'' is 3, each object can be swapped into the correct bucket by using the Dutch national flag algorithm. When ''N'' is larger, however, objects cannot be immediately swapped into place, because it is unknown where each bucket should begin and end. American flag sort gets around this problem by making two passes through the array. The first pass counts the number of objects that belong in each of the ''N'' buckets. The beginning of each bucket is then computed as the sum of sizes of the preceding buckets. The second pass puts each object into the correct bucket. American flag sort is most efficient with a radix that is a power of 2, because bit-shifting operations can be used instead of expensive
exponentiation In mathematics, exponentiation, denoted , is an operation (mathematics), operation involving two numbers: the ''base'', , and the ''exponent'' or ''power'', . When is a positive integer, exponentiation corresponds to repeated multiplication ...
s to compute the value of each digit. When sorting strings using 8- or 7-bit encodings such as
ASCII ASCII ( ), an acronym for American Standard Code for Information Interchange, is a character encoding standard for representing a particular set of 95 (English language focused) printable character, printable and 33 control character, control c ...
, it is typical to use a radix of 256 or 128, which amounts to sorting character-by-character.


Performance considerations

For pure English alphabet text, the counts histogram is always sparse. Depending on the hardware, it may be worth clearing the counts in correspondence with completing a bucket (as in the original paper); or it may be worth maintaining a max and min active bucket, or a more complex data structure suitable for sparse arrays. It is also important to use a more basic sorting method for very small data sets, except in pathological cases where keys may share very long prefixes. Most critically, this algorithm follows a random permutation, and is thus particularly cache-unfriendly for large datasets. It is a suitable algorithm in conjunction with a ''k''-way merge algorithm. (The original paper was written before cached memory was in common use.)


Pseudocode

american_flag_sort(Array, Radix)
    for each digit D:
        # first pass: compute counts
        Counts <- zeros(Radix)
        for object X in Array:
            Counts igit D of object X in base Radix+= 1
        # compute bucket offsets
        Offsets <-  sum(Counts[0..i for i in 1..Radix">..i">sum(Counts[0..i for i in 1..Radix        # swap objects into place
        for object X in Array:
            swap X to the bucket starting at Offsets[digit D of X in base Radix]
        for each Bucket:
            american_flag_sort(Bucket, Radix)


Sample implementation in Python

This example written in the Python programming language will perform American flag sort for any radix of 2 or greater. Simplicity of exposition is chosen over clever programming, and so the log function is used instead of bit shifting techniques. from math import floor, log from copy import copy def get_radix_val(x, digit, radix): return int(floor(x / radix**digit)) % radix def compute_offsets(a_list, start, end, digit, radix): counts = for _ in range(radix) for i in range(start, end): val = get_radix_val(a_list digit, radix) counts al+= 1 offset = start offsets =
tart A tart is a baked dish consisting of a filling over a pastry base with an open top not covered with pastry. The pastry is usually shortcrust pastry; the filling may be sweet or savoury, though modern tarts are usually fruit-based, sometimes with ...
for i in range(radix): offset += counts offsets.append(offset) return offsets def swap(a_list, offsets, start, end, digit, radix): next_free = copy(offsets) cur_block = 0 while cur_block < radix: i = next_free ur_block if i >= offsets ur_block+1 cur_block += 1 continue radix_val = get_radix_val(a_list digit, radix) if radix_val != cur_block: swap_to = next_free adix_val a_list a_list wap_to= a_list wap_to a_list next_free adix_val+= 1 def american_flag_sort_helper(a_list, start, end, digit, radix): offsets = compute_offsets(a_list, start, end, digit, radix) swap(a_list, offsets, start, end, digit, radix) if digit

0: return for i in range(len(offsets)-1): american_flag_sort_helper(a_list, offsets offsets +1 digit-1, radix) def american_flag_sort(a_list, radix): for x in a_list: assert type(x)

int max_val = max(a_list) max_digit = int(floor(log(max_val, radix))) american_flag_sort_helper(a_list, 0, len(a_list), max_digit, radix)


See also

*
Bucket sort Bucket sort, or bin sort, is a sorting algorithm that works by distributing the elements of an Array data structure, array into a number of buckets. Each bucket is then sorted individually, either using a different sorting algorithm, or by recu ...
* Multi-key quicksort *
Radix sort In computer science, radix sort is a non-comparative sorting algorithm. It avoids comparison by creating and distributing elements into buckets according to their radix. For elements with more than one significant digit, this bucketing process i ...
* Dutch national flag problem


References


General

* {{Sorting String sorting algorithms Articles with example pseudocode Articles with example Python (programming language) code