HOME

TheInfoList



OR:

In
computer science Computer science is the study of computation, automation, and information. Computer science spans theoretical disciplines (such as algorithms, theory of computation, information theory, and automation) to practical disciplines (includin ...
, introselect (short for "introspective selection") is a selection algorithm that is a hybrid of quickselect and median of medians which has fast average performance and optimal worst-case performance. Introselect is related to the introsort
sorting algorithm In computer science, a sorting algorithm is an algorithm that puts elements of a list into an order. The most frequently used orders are numerical order and lexicographical order, and either ascending or descending. Efficient sorting is importan ...
: these are analogous refinements of the basic quickselect and
quicksort Quicksort is an efficient, general-purpose sorting algorithm. Quicksort was developed by British computer scientist Tony Hoare in 1959 and published in 1961, it is still a commonly used algorithm for sorting. Overall, it is slightly faster than ...
algorithms, in that they both start with the quick algorithm, which has good average performance and low overhead, but fall back to an optimal worst-case algorithm (with higher overhead) if the quick algorithm does not progress rapidly enough. Both algorithms were introduced by
David Musser David "Dave" Musser is a professor emeritus of computer science at the Rensselaer Polytechnic Institute in Troy, New York, United States. He is known for his work in generic programming, particularly as applied to C++, and his collaboration with ...
in , with the purpose of providing
generic algorithm Generic programming is a style of computer programming in which algorithms are written in terms of types ''to-be-specified-later'' that are then ''instantiated'' when needed for specific types provided as parameters. This approach, pioneered by ...
s for the
C++ Standard Library The C standard library or libc is the standard library for the C programming language, as specified in the ISO C standard. ISO/ IEC (2018). '' ISO/IEC 9899:2018(E): Programming Languages - C §7'' Starting from the original ANSI C standard, it wa ...
that have both fast average performance and optimal worst-case performance, thus allowing the performance requirements to be tightened. However, in most C++ Standard Library implementations, a different "introselect" algorithm is used, which combines quickselect and
heapselect In computer science, a heap is a specialized tree-based data structure which is essentially an almost complete tree that satisfies the heap property: in a ''max heap'', for any given node C, if P is a parent node of C, then the ''key'' (the '' ...
, and has a worst-case running time of ''O''(''n'' log ''n''). The C++ draft standard, as of 2022, does not have requirements on the worst-case performance, therefore allowing such choice.


Algorithms

Introsort achieves practical performance comparable to quicksort while preserving ''O''(''n'' log ''n'') worst-case behavior by creating a hybrid of quicksort and
heapsort In computer science, heapsort is a comparison-based sorting algorithm. Heapsort can be thought of as an improved selection sort: like selection sort, heapsort divides its input into a sorted and an unsorted region, and it iteratively shrinks the ...
. Introsort starts with quicksort, so it achieves performance similar to quicksort if quicksort works, and falls back to heapsort (which has optimal worst-case performance) if quicksort does not progress quickly enough. Similarly, introselect combines quickselect with median of medians to achieve worst-case linear selection with performance similar to quickselect. Introselect works by optimistically starting out with quickselect and only switching to a worst-case linear-time selection algorithm (the Blum-Floyd-Pratt-Rivest-Tarjan median of medians algorithm) if it recurses too many times without making sufficient progress. The switching strategy is the main technical content of the algorithm. Simply limiting the recursion to constant depth is not good enough, since this would make the algorithm switch on all sufficiently large lists. Musser discusses a couple of simple approaches: * Keep track of the list of sizes of the subpartitions processed so far. If at any point ''k'' recursive calls have been made without halving the list size, for some small positive ''k'', switch to the worst-case linear algorithm. * Sum the size of all partitions generated so far. If this exceeds the list size times some small positive constant ''k'', switch to the worst-case linear algorithm. This sum is easy to track in a single scalar variable. Both approaches limit the recursion depth to ''k'' ⌈log ''n''⌉ = ''O''(log ''n'') and the total running time to ''O''(''n)''. The paper suggested that more research on introselect was forthcoming, but the author retired in 2007 without having published any such further research.


See also

* Floyd–Rivest algorithm


References

* {{refend Selection algorithms