Motivation
Quickselect is linear-time on average, but it can require quadratic time with poor pivot choices. This is because quickselect is a divide and conquer algorithm, with each step taking time in the size of the remaining search set. If the search set decreases exponentially quickly in size (by a fixed proportion), this yields aAlgorithm
As stated before, median-of-medians is used as a pivot selection strategy in the quickselect algorithm, which inleft
, right
and n
when implementing. The following pseudocode assumes that left
, right
, and the list use one-based numbering and that select
is initially called with 1 as the argument to left
and the length of the list as the argument to right
. Note that this returns the index of the n'th smallest number after rearranging the list, rather than the actual value of the n'th smallest number.
function select(list, left, right, n)
loop
if left = right then
return left
pivotIndex := pivot(list, left, right)
pivotIndex := partition(list, left, right, pivotIndex, n)
if n = pivotIndex then
return n
else if n < pivotIndex then
right := pivotIndex - 1
else
left := pivotIndex + 1
Subroutine is the actual median-of-medians algorithm. It divides its input (a list of length ) into groups of at most five elements, computes the median of each of those groups using some subroutine, then recursively computes the true median of the medians found in the previous step:. Note that calls ; this is an instance of mutual recursion.
function pivot(list, left, right)
''// for 5 or less elements just get median''
if right − left < 5 then
return partition5(list, left, right)
''// otherwise move the medians of five-element subgroups to the first n/5 positions''
for i from left to right in steps of 5
''// get the median position of the i'th five-element subgroup''
subRight := i + 4
if subRight > right then
subRight := right
median5 := partition5(list, i, subRight)
swap list edian5and list eft + floor((i − left) / 5)
''// compute the median of the n/5 medians-of-five''
mid := floor((right − left) / 10) + left + 1
return select(list, left, left + floor((right − left) / 5), mid)
Partition helper functions
There is a subroutine called that can, in linear time, group a list (ranging from indicesleft
to right
) into three parts, those less than a certain element, those equal to it, and those greater than the element ( a three-way partition). The grouping into three parts ensures that the median-of-medians maintains linear execution time in a case of many or all coincident elements. Here is pseudocode that performs a partition about the element list ivotIndex/code>:
function partition(list, left, right, pivotIndex, n)
pivotValue := list ivotIndex swap list ivotIndexand list ight ''// Move pivot to end''
storeIndex := left
''// Move all elements smaller than the pivot to the left of the pivot''
for i from left to right − 1 do
if list < pivotValue then
swap list toreIndexand list increment storeIndex
''// Move all elements equal to the pivot right after''
''// the smaller elements''
storeIndexEq := storeIndex
for i from storeIndex to right − 1 do
if list = pivotValue then
swap list toreIndexEqand list increment storeIndexEq
swap list ightand list toreIndexEq ''// Move pivot to its final place''
''// Return location of pivot considering the desired location n''
if n < storeIndex then
return storeIndex ''// n is in the group of smaller elements''
if n ≤ storeIndexEq then
return n ''// n is in the group equal to pivot''
return storeIndexEq ''// n is in the group of larger elements''
The subroutine selects the median of a group of at most five elements; an easy way to implement this is insertion sort
Insertion sort is a simple sorting algorithm that builds the final sorted array (or list) one item at a time by comparisons. It is much less efficient on large lists than more advanced algorithms such as quicksort, heapsort, or merge sort. Howev ...
, as shown below. It can also be implemented as a decision tree
A decision tree is a decision support system, decision support recursive partitioning structure that uses a Tree (graph theory), tree-like Causal model, model of decisions and their possible consequences, including probability, chance event ou ...
.
function partition5(list, left, right)
i := left + 1
while i ≤ right do
j := i
while j > left and list−1
In mathematics, −1 (negative one or minus one) is the additive inverse of 1, that is, the number that when added to 1 gives the additive identity element, 0. It is the negative integer greater than negative two (−2) and less than 0.
...
> list do
swap list−1
In mathematics, −1 (negative one or minus one) is the additive inverse of 1, that is, the number that when added to 1 gives the additive identity element, 0. It is the negative integer greater than negative two (−2) and less than 0.
...
and list j := j − 1
i := i + 1
return left + (right - left) / 2
Properties of pivot
Of the groups, half the number of groups have their median less than the pivot (Median of Medians). Also, another half the number of groups have their median greater than the pivot. In each of the groups with median less than the pivot, there are two elements that are smaller than their respective medians, which are smaller than the pivot. Thus, each of the groups have at least 3 elements that are smaller than the pivot. Similarly, in each of the groups with median greater than the pivot, there are two elements that are greater than their respective medians, which are greater than the pivot. Thus, each of the groups have at least 3 elements that are greater than the pivot. Hence, the pivot is less than elements and greater than another elements. Thus the chosen median splits the ordered elements somewhere between 30%/70% and 70%/30%, which assures worst-case linear behavior of the algorithm. To visualize:
(red = "(one of the two possible) median of medians", gray = "number < red", white = "number > red")
5-tuples are shown here sorted by median, for clarity. Sorting the tuples is not necessary because we only need the median for use as pivot element.
Note that all elements above/left of the red (30% of the 100 elements) are less, and all elements below/right of the red (another 30% of the 100 elements) are greater.
Proof of linear running time
The median-calculating recursive call does not exceed worst-case linear behavior because the list of medians has size , while the other recursive call recurses on at most 70% of the list. Let be the time it takes to run a median-of-medians Quickselect algorithm on an array of size . Then we know this time is:
:
where
* the part is for finding the ''true'' median of the medians, by running an (independent) Quickselect on them (since finding the median is just a special case of selecting a ''k''-smallest element)
* the term is for the partitioning work to create the two sides, one of which our Quickselect will recurse (we visited each element a constant number of times in order to form them into n/5 groups and take each median in time).
* the part is for the actual Quickselect recursion (for the worst case, in which the ''k''-th element is in the bigger partition that can be of size maximally)
By induction:
:
Analysis
The key step is reducing the problem to selecting in two lists whose total length is shorter than the original list, plus a linear factor for the reduction step. This allows a simple induction to show that the overall running time is linear.
The specific choice of groups of five elements is explained as follows. Firstly, computing median of an odd list is faster and simpler; while one could use an even list, this requires taking the average of the two middle elements, which is slower than simply selecting the single exact middle element. Secondly, five is the smallest odd number such that median of medians works. With groups of only three elements, the resulting list of medians to search in is length , and reduces the list to recurse into length , since it is greater than 1/2 × 2/3 = 1/3 of the elements and less than 1/2 × 2/3 = 1/3 of the elements. Thus this still leaves elements to search in, not reducing the problem sufficiently. The individual lists are shorter, however, and one can bound the resulting complexity to by the Akra–Bazzi method, but it does not prove linearity.
Conversely, one may instead group by = seven, nine, or more elements, and this does work. This reduces the size of the list of medians to '','' and the size of the list to recurse into asymptotes at 3''n''/4 (75%), as the quadrants in the above table approximate 25%, as the size of the overlapping lines decreases proportionally. This reduces the scaling factor from 10 asymptotically to 4, but accordingly raises the term for the partitioning work. Finding the median of a larger group takes longer, but is a constant factor (depending only on ), and thus does not change the overall performance as ''n'' grows. In fact, considering the number of comparisons in the worst case, the constant factor is .
If one instead groups the other way, say dividing the element list into 5 lists, computing the median of each, and then computing the median of these – i.e., grouping by a constant fraction, not a constant number – one does not as clearly reduce the problem, since it requires computing 5 medians, each in a list of elements, and then recursing on a list of length at most . As with grouping by 3, the individual lists are shorter, but the overall length is no shorter – in fact longer – and thus one can only prove superlinear bounds. Grouping into a square of lists of length is similarly complicated.
References
*
{{refend
External links
*
Lecture notes for January 30, 1996: Deterministic selection
, ''ICS 161: Design and Analysis of Algorithms,'' David Eppstein
*
Fast Deterministic Selection
, Andrei Alexandrescu
Selection algorithms