Algorithm
In quicksort, there is a subprocedure called partition that can, in linear time, group a list (ranging from indices left to right) into two parts: those less than a certain element, and those greater than or equal to the element. Here is pseudocode that performs a partition about the element list[pivotIndex]:
function partition(list, left, right, pivotIndex) is
    pivotValue := list[pivotIndex]
    swap list[pivotIndex] and list[right]  ''// Move pivot to end''
    storeIndex := left
    for i from left to right − 1 do
        if list[i] < pivotValue then
            swap list[storeIndex] and list[i]
            increment storeIndex
    swap list[right] and list[storeIndex]  ''// Move pivot to its final place''
    return storeIndex
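The pseudocode above translates directly into Python. A minimal runnable sketch of the Lomuto scheme (function and variable names are illustrative, not from the original):

```python
def partition(lst, left, right, pivot_index):
    """Lomuto partition of lst[left..right] (inclusive) around lst[pivot_index].

    On return the pivot sits at its final sorted index, with strictly
    smaller elements before it and greater-or-equal elements after it.
    """
    pivot_value = lst[pivot_index]
    lst[pivot_index], lst[right] = lst[right], lst[pivot_index]  # move pivot to end
    store_index = left
    for i in range(left, right):
        if lst[i] < pivot_value:
            lst[store_index], lst[i] = lst[i], lst[store_index]
            store_index += 1
    lst[right], lst[store_index] = lst[store_index], lst[right]  # pivot to final place
    return store_index
```

For example, partitioning `[3, 1, 4, 1, 5]` about index 2 (value 4) rearranges the list to `[3, 1, 1, 4, 5]` and returns 3, the pivot's final sorted position.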
This is known as the Lomuto partition scheme, which is simpler but less efficient than Hoare's original partition scheme.
In quicksort, we recursively sort both branches, leading to best-case O(n log n) time. However, when doing selection, we already know which partition our desired element lies in, since the pivot is in its final sorted position: all elements preceding it are smaller (in unsorted order) and all elements following it are greater than or equal (in unsorted order). Therefore, a single recursive call locates the desired element in the correct partition, and we build upon this for quickselect:
''// Returns the k-th smallest element of list within left..right inclusive''
''// (i.e. left <= k <= right).''
function select(list, left, right, k) is
    if left = right then   ''// If the list contains only one element,''
        return list[left]  ''// return that element''
    pivotIndex := ...      ''// select a pivotIndex between left and right,''
                           ''// e.g.,'' left + floor(rand() % (right − left + 1))
    pivotIndex := partition(list, left, right, pivotIndex)
    ''// The pivot is in its final sorted position''
    if k = pivotIndex then
        return list[k]
    else if k < pivotIndex then
        return select(list, left, pivotIndex − 1, k)
    else
        return select(list, pivotIndex + 1, right, k)
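A self-contained Python sketch of this recursive quickselect, using a Lomuto partition and a random pivot (names are illustrative):

```python
import random

def partition(lst, left, right, pivot_index):
    """Lomuto partition of lst[left..right] around lst[pivot_index]."""
    pivot_value = lst[pivot_index]
    lst[pivot_index], lst[right] = lst[right], lst[pivot_index]
    store_index = left
    for i in range(left, right):
        if lst[i] < pivot_value:
            lst[store_index], lst[i] = lst[i], lst[store_index]
            store_index += 1
    lst[right], lst[store_index] = lst[store_index], lst[right]
    return store_index

def select(lst, left, right, k):
    """Return the k-th smallest element (0-based) of lst within left..right."""
    if left == right:               # the range holds a single element
        return lst[left]
    pivot_index = random.randint(left, right)
    pivot_index = partition(lst, left, right, pivot_index)
    if k == pivot_index:            # the pivot is in its final sorted position
        return lst[k]
    elif k < pivot_index:
        return select(lst, left, pivot_index - 1, k)
    else:
        return select(lst, pivot_index + 1, right, k)
```

For example, `select(list(data), 0, len(data) - 1, 0)` returns the minimum of `data`, and `k = len(data) - 1` returns the maximum.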
Note the resemblance to quicksort: just as the minimum-based selection algorithm is a partial selection sort, this is a partial quicksort, generating and partitioning only O(log n) of its O(n) partitions. This simple procedure has expected linear performance, and, like quicksort, has quite good performance in practice. It is also an in-place algorithm, requiring only constant memory overhead if tail call optimization is available, or by eliminating the tail recursion with a loop:
function select(list, left, right, k) is
    loop
        if left = right then
            return list[left]
        pivotIndex := ...  ''// select pivotIndex between left and right''
        pivotIndex := partition(list, left, right, pivotIndex)
        if k = pivotIndex then
            return list[k]
        else if k < pivotIndex then
            right := pivotIndex − 1
        else
            left := pivotIndex + 1
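The loop version in Python, using the same illustrative Lomuto partition as before; no recursion means constant extra space regardless of tail-call support:

```python
import random

def partition(lst, left, right, pivot_index):
    """Lomuto partition of lst[left..right] around lst[pivot_index]."""
    pivot_value = lst[pivot_index]
    lst[pivot_index], lst[right] = lst[right], lst[pivot_index]
    store_index = left
    for i in range(left, right):
        if lst[i] < pivot_value:
            lst[store_index], lst[i] = lst[i], lst[store_index]
            store_index += 1
    lst[right], lst[store_index] = lst[store_index], lst[right]
    return store_index

def select(lst, left, right, k):
    """Iterative quickselect: shrink [left, right] until lst[k] is placed."""
    while True:
        if left == right:
            return lst[left]
        pivot_index = random.randint(left, right)
        pivot_index = partition(lst, left, right, pivot_index)
        if k == pivot_index:
            return lst[k]
        elif k < pivot_index:
            right = pivot_index - 1
        else:
            left = pivot_index + 1
```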
Time complexity
Like quicksort, quickselect has good average performance, but is sensitive to the pivot that is chosen. If good pivots are chosen, meaning ones that consistently decrease the search set by a given fraction, then the search set decreases in size exponentially, and by induction (or summing the geometric series) one sees that performance is linear, as each step is linear and the overall time is a constant times this (depending on how quickly the search set reduces). However, if bad pivots are consistently chosen, such as decreasing by only a single element each time, then worst-case performance is quadratic: O(n²). This occurs for example in searching for the maximum element of a set, using the first element as the pivot, and having sorted data. However, for randomly chosen pivots, this worst case is very unlikely: the probability of using more than ''Cn'' comparisons, for any sufficiently large constant ''C'', is superexponentially small as a function of ''C''.
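The quadratic worst case can be observed directly by counting comparisons: selecting the maximum of already-sorted data with a first-element pivot shrinks the search range by exactly one element per step, giving n(n−1)/2 comparisons, while a random pivot typically needs far fewer. An instrumented sketch (the helper is hypothetical, written for this illustration):

```python
import random

def count_comparisons(n, first_element_pivot, seed=0):
    """Comparisons made by iterative quickselect while selecting the
    maximum of the sorted list [0, 1, ..., n-1]."""
    random.seed(seed)
    lst = list(range(n))          # already sorted: the bad case for a fixed pivot
    left, right, k = 0, n - 1, n - 1
    comparisons = 0
    while left < right:
        pivot_index = left if first_element_pivot else random.randint(left, right)
        # inline Lomuto partition, counting comparisons against the pivot
        pivot_value = lst[pivot_index]
        lst[pivot_index], lst[right] = lst[right], lst[pivot_index]
        store_index = left
        for i in range(left, right):
            comparisons += 1
            if lst[i] < pivot_value:
                lst[store_index], lst[i] = lst[i], lst[store_index]
                store_index += 1
        lst[right], lst[store_index] = lst[store_index], lst[right]
        pivot_index = store_index
        if k == pivot_index:
            break
        elif k < pivot_index:
            right = pivot_index - 1
        else:
            left = pivot_index + 1
    return comparisons
```

With n = 100, the first-element pivot on sorted data performs exactly 100·99/2 = 4950 comparisons; a random pivot can never exceed that bound and in practice stays close to linear.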
Variants
The easiest solution is to choose a random pivot, which yields almost certain linear time. Deterministically, one can use a median-of-3 pivot strategy (as in quicksort), which yields linear performance on partially sorted data, as is common in the real world. However, contrived sequences can still cause worst-case complexity; David Musser describes a "median-of-3 killer" sequence that allows an attack against that strategy, which was one motivation for his introselect algorithm.
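A median-of-3 pivot chooser can be sketched as follows (an illustrative helper, not a specific library API); it returns the index of whichever of the first, middle, and last elements of the range holds the median value:

```python
def median_of_three(lst, left, right):
    """Index of the median of lst[left], lst[middle], lst[right]."""
    mid = (left + right) // 2
    # Sort the three (value, index) pairs by value; the middle pair is the median.
    candidates = sorted([(lst[left], left), (lst[mid], mid), (lst[right], right)])
    return candidates[1][1]
```

The returned index can be passed straight to partition as the pivotIndex, replacing the random choice in the pseudocode above.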
One can assure linear performance even in the worst case by using a more sophisticated pivot strategy; this is done in the median of medians algorithm. However, the overhead of computing the pivot is high, and thus this is generally not used in practice. One can combine basic quickselect with median of medians as fallback to get both fast average case performance and linear worst-case performance; this is done in introselect.
Finer computations of the average time complexity yield a worst case of n(2 + 2 ln 2 + o(1)) ≤ 3.39n for random pivots (in the case of the median; other ''k'' are faster) (Blum-style analysis of Quickselect, David Eppstein, October 9, 2007). The constant can be improved to 3/2 by a more complicated pivot strategy, yielding the Floyd–Rivest algorithm, which has average complexity of 1.5n + O(n^{1/2}) for the median, with other ''k'' being faster.
See also
* Floyd–Rivest algorithm
* Introselect
* Median of medians
References
{{reflist}}
External links
* qselect, ''Quickselect algorithm in Matlab'', Manolis Lourakis
Selection algorithms