In
computer science
Computer science is the study of computation, automation, and information. Computer science spans theoretical disciplines (such as algorithms, theory of computation, information theory, and automation) to practical disciplines (includin ...
, partial sorting is a
relaxed variant of the
sorting problem. Total sorting is the problem of returning a list of items such that its elements all appear in order, while partial sorting is returning a list of the ''k'' smallest (or ''k'' largest) elements in order. The other elements (above the ''k'' smallest ones) may also be sorted, as in an in-place partial sort, or may be discarded, which is common in streaming partial sorts. A common practical example of partial sorting is computing the "Top 100" of some list.
In terms of indices, in a partially sorted list, for every index ''i'' from 1 to ''k,'' the ''i''-th element is in the same place as it would be in the fully sorted list: element ''i'' of the partially sorted list contains
order statistic
In statistics, the ''k''th order statistic of a statistical sample is equal to its ''k''th-smallest value. Together with rank statistics, order statistics are among the most fundamental tools in non-parametric statistics and inference.
Importa ...
''i'' of the input list.
Offline problems
Heap-based solution
Heaps admit a simple single-pass partial sort when is fixed: insert the first elements of the input into a max-heap. Then make one pass over the remaining elements, add each to the heap in turn, and remove the largest element. Each insertion operation takes time, resulting in time overall; this "partial heapsort" algorithm is practical for small values of and in
online
In computer technology and telecommunications, online indicates a state of connectivity and offline indicates a disconnected state. In modern terminology, this usually refers to an Internet connection, but (especially when expressed "on line" o ...
settings.
An "online heapselect" algorithm described below, based on a min-heap, takes .
Solution by partitioning selection
A further relaxation requiring only a list of the smallest elements, but without requiring that these be ordered, makes the problem equivalent to
partition-based selection; the original partial sorting problem can be solved by such a selection algorithm to obtain an array where the first elements are the smallest, and sorting these, at a total cost of operations. A popular choice to implement this algorithm scheme is to combine
quickselect
In computer science, quickselect is a selection algorithm to find the ''k''th smallest element in an unordered list. It is also known as the kth order statistics . It is related to the quicksort sorting algorithm. Like quicksort, it was devel ...
and
quicksort
Quicksort is an efficient, general-purpose sorting algorithm. Quicksort was developed by British computer scientist Tony Hoare in 1959 and published in 1961, it is still a commonly used algorithm for sorting. Overall, it is slightly faster than ...
; the result is sometimes called "quickselsort".
Common in current (as of 2022) C++ STL implementations is a pass of
heapselect
In computer science, a heap is a specialized tree-based data structure which is essentially an almost complete tree that satisfies the heap property: in a ''max heap'', for any given node C, if P is a parent node of C, then the ''key'' (the '' ...
for a list of ''k'' elements, followed by a
heapsort
In computer science, heapsort is a comparison-based sorting algorithm. Heapsort can be thought of as an improved selection sort: like selection sort, heapsort divides its input into a sorted and an unsorted region, and it iteratively shrinks the ...
for the final result.
Specialised sorting algorithms
More efficient than the aforementioned are specialized partial sorting algorithms based on
mergesort
In computer science, merge sort (also commonly spelled as mergesort) is an efficient, general-purpose, and comparison-based sorting algorithm. Most implementations produce a stable sort, which means that the order of equal elements is the same ...
and
quicksort
Quicksort is an efficient, general-purpose sorting algorithm. Quicksort was developed by British computer scientist Tony Hoare in 1959 and published in 1961, it is still a commonly used algorithm for sorting. Overall, it is slightly faster than ...
. In the quicksort variant, there is no need to recursively sort partitions which only contain elements that would fall after the 'th place in the final sorted array (starting from the "left" boundary). Thus, if the pivot falls in position or later, we recurse only on the left partition:
function partial_quicksort(A, i, j, k) is
if i < j then
p ← pivot(A, i, j)
p ← partition(A, i, j, p)
partial_quicksort(A, i, p-1, k)
if p < k-1 then
partial_quicksort(A, p+1, j, k)
The resulting algorithm is called partial quicksort and requires an ''expected'' time of only , and is quite efficient in practice, especially if a
selection sort
In computer science, selection sort is an in-place comparison sorting algorithm. It has an O(''n''2) time complexity, which makes it inefficient on large lists, and generally performs worse than the similar insertion sort. Selection sort is n ...
is used as a base case when becomes small relative to . However, the worst-case time complexity is still very bad, in the case of a bad pivot selection. Pivot selection along the lines of the worst-case linear time selection algorithm (see ) could be used to get better worst-case performance. Partial quicksort, quickselect (including the multiple variant), and quicksort can all be generalized into what is known as a ''chunksort''.
Incremental sorting
Incremental sorting is a version of the partial sorting problem where the input is given up front but is unknown: given a -sorted array, it should be possible to extend the partially sorted part so that the array becomes -sorted.
Heaps lead to an "online heapselect" solution to incremental partial sorting: first "heapify", in linear time, the complete input array to produce a min-heap. Then extract the minimum of the heap times.
A different incremental sort can be obtained by modifying quickselect. The version due to Paredes and Navarro maintains a
stack
Stack may refer to:
Places
* Stack Island, an island game reserve in Bass Strait, south-eastern Australia, in Tasmania’s Hunter Island Group
* Blue Stack Mountains, in Co. Donegal, Ireland
People
* Stack (surname) (including a list of people ...
of pivots across calls, so that incremental sorting can be accomplished by repeatedly requesting the smallest item of an array from the following algorithm:
Algorithm returns the 'th smallest element in
* If :
** Pop
** Return
* Let
* Update
* Push onto
* Return
The stack is initialized to contain only the length of . -sorting the array is done by calling for ; this sequence of calls has
average-case complexity
In computational complexity theory, the average-case complexity of an algorithm is the amount of some computational resource (typically time) used by the algorithm, averaged over all possible inputs. It is frequently contrasted with worst-case comp ...
, which is asymptotically equivalent to . The worst-case time is quadratic, but this can be fixed by replacing the random pivot selection by the
median of medians
In computer science, the median of medians is an approximate (median) selection algorithm, frequently used to supply a good pivot for an exact selection algorithm, mainly the quickselect, that selects the ''k''th smallest element of an initially ...
algorithm.
Language/library support
* The
C++ standard specifies a library function called
std::partial_sort
/code>.
* The Python standard library includes functions nlargest
/code> and
/code> in its heapq
module.
* The Julia
Julia is usually a feminine given name. It is a Latinate feminine form of the name Julio and Julius. (For further details on etymology, see the Wiktionary entry "Julius".) The given name ''Julia'' had been in use throughout Late Antiquity (e ...
standard library includes a PartialQuickSort
/code> implementation.
See also
* Selection algorithm
In computer science, a selection algorithm is an algorithm for finding the ''k''th smallest number in a list or array; such a number is called the ''k''th ''order statistic''. This includes the cases of finding the minimum, maximum, and median e ...
References
{{reflist
External links
* J.M. Chambers (1971)
Partial sorting
CACM 14(5):357–358.
Sorting algorithms
Online sorts
Articles with example pseudocode