
ProxmapSort, or Proxmap sort, is a
sorting algorithm
In computer science, a sorting algorithm is an algorithm that puts elements of a list into an order. The most frequently used orders are numerical order and lexicographical order, and either ascending or descending. Efficient sorting is importan ...
that works by partitioning an
array
An array is a systematic arrangement of similar objects, usually in rows and columns.
Things called an array include:
{{TOC right
Music
* In twelve-tone and serial composition, the presentation of simultaneous twelve-tone sets such that the ...
of data items, or keys, into a number of "subarrays" (termed buckets, in similar sorts). The name is short for computing a "proximity map," which indicates for each key K the beginning of a subarray where K will reside in the final sorted order. Keys are placed into each subarray using
insertion sort
Insertion sort is a simple sorting algorithm that builds the final sorted array (or list) one item at a time by comparisons. It is much less efficient on large lists than more advanced algorithms such as quicksort, heapsort, or merge sort. Howe ...
. If keys are "well distributed" among the subarrays, sorting occurs in linear time. The
computational complexity
In computer science, the computational complexity or simply complexity of an algorithm is the amount of resources required to run it. Particular focus is given to computation time (generally measured by the number of needed elementary operations) ...
estimates involve the number of subarrays and the proximity mapping function, the "map key," used. It is a form of
bucket
A bucket is typically a watertight, vertical cylinder or truncated cone or square, with an open top and a flat bottom, attached to a semicircular carrying handle called the ''bail''.
A bucket is usually an open-top container. In contrast, a p ...
and
radix sort
In computer science, radix sort is a non-comparative sorting algorithm. It avoids comparison by creating and distributing elements into buckets according to their radix. For elements with more than one significant digit, this bucketing process ...
.
Once a ProxmapSort is complete, ProxmapSearch can be used to find keys in the sorted array in
time if the keys were well distributed during the sort.
Both algorithms were invented in the late 1980s by Prof. Thomas A. Standish at the
University of California, Irvine
The University of California, Irvine (UCI or UC Irvine) is a public land-grant research university in Irvine, California. One of the ten campuses of the University of California system, UCI offers 87 undergraduate degrees and 129 graduate and p ...
.
Overview
Basic strategy
In general:
Given an array A with ''n'' keys:
* map a key to a subarray of the destination array A2, by applying the map key function to each array item
* determine how many keys will map to the same subarray, using an array of "hit counts," H
* determine where each subarray will begin in the destination array so that each bucket is exactly the right size to hold all the keys that will map to it, using an array of "proxmaps," P
* for each key, compute the subarray it will map to, using an array of "locations," L
* for each key, look up its location, place it into that cell of A2; if it collides with a key already in that position, insertion sort the key into place, moving keys greater than this key to the right by one to make a space for this key. Since the subarray is big enough to hold all the keys mapped to it, such movement will never cause the keys to overflow into the following subarray.
Simplied version:
Given an array A with ''n'' keys
# Initialize: Create and initialize 2 arrays of ''n'' size: hitCount, proxMap, and 2 arrays of A.length: location, and A2.
# Partition: Using a carefully chosen mapKey function, divide the A2 into subarrays using the keys in A
# Disperse: Read over A, dropping each key into its bucket in A2; insertion sorting as needed.
# Collect: Visit the subarrays in order and put all the elements back into the original array, or simply use A2.
Note: "keys" may also contain other data, for instance an array of Student objects that contain the key plus a student ID and name. This makes ProxMapSort suitable for organizing groups of objects, not just keys themselves.
Example
Consider a full array: A
'0'' to ''n-1''with ''n'' keys. Let ''i'' be an index of A. Sort A's keys into array A2 of equal size.
The map key function is defined as mapKey(key) = floor(K).
Pseudocode
// compute hit counts
for i = 0 to 11 // where 11 is n
for i = 0 to 12 // where 12 is A.length
runningTotal = 0; // compute prox map – location of start of each subarray
for i = 0 to 11
if H = 0
P = -9;
else
P = runningTotal;
runningTotal = runningTotal + H
for i = 0 to 12 // compute location – subarray – in A2 into which each item in A is to be placed
L = P apKey(A[i">.html" ;"title="apKey(A[i">apKey(A[i
for I = 0 to 12; // sort items
A2[I">">apKey(A[i<_a>.html" ;"title=".html" ;"title="apKey(A[i">apKey(A[i">.html" ;"title="apKey(A[i">apKey(A[i
for I = 0 to 12; // sort items
A2[I= ;
for i = 0 to 12 // insert each item into subarray beginning at start, preserving order
Here A is the array to be sorted and the mapKey functions determines the number of subarrays to use. For example, floor(K) will simply assign as many subarrays as there are integers from the data in A. Dividing the key by a constant reduces the number of subarrays; different functions can be used to translate the range of elements in A to subarrays, such as converting the letters A–Z to 0–25 or returning the first character (0–255) for sorting strings. Subarrays are sorted as the data comes in, not after all data has been placed into the subarray, as is typical in bucket sorting.
Proxmap searching
ProxmapSearch uses the proxMap array generated by a previously done ProxmapSort to find keys in the sorted array A2 in constant time.
Basic strategy
* Sort the keys using ProxmapSort, keeping the MapKey function, and the P and A2 arrays
* To search for a key, go to P
apKey(k) the start of the subarray that contains the key, if that key is in the data set
*Sequentially search the subarray; if the key is found, return it (and associated information); if find a value greater than the key, the key is not in the data set
* Computing P
apKey(k)takes
time. If a map key that gives a good distribution of keys was used during the sort, each subarray is bounded above by a constant ''c'', so at most ''c'' comparisons are needed to find the key or know it is not present; therefore ProxmapSearch is
. If the worst map key was used, all keys are in the same subarray, so ProxmapSearch, in this worst case, will require
comparisons.
Pseudocode
function mapKey(key) is
return floor(key)
proxMap ← previously generated proxmap array of size n
A2 ← previously sorted array of size n
function proxmap-search(key) is
for i = proxMap
apKey(key)to length(array) − 1 do
if sortedArray
key key then
return sortedArray
Analysis
Performance
Computing H, P, and L all take
time. Each is computed with one pass through an array, with constant time spent at each array location.
* Worst case: MapKey places all items into one subarray, resulting in a standard insertion sort, and time of
.
* Best case: MapKey delivers the same small number of items to each subarray in an order where the best case of insertion sort occurs. Each insertion sort is
, ''c'' the size of the subarrays; there are ''p'' subarrays thus p * c = n, so the insertion phase take O(n); thus, ProxmapSort is
.
* Average case: Each subarray is at most size ''c'', a constant; insertion sort for each subarray is then O(c^2) at worst – a constant. (The actual time can be much better, since c items are not sorted until the last item is placed in the bucket). Total time is the number of buckets, (n/c), times
=
.
Having a good MapKey function is imperative for avoiding the worst case. We must know something about the distribution of the data to come up with a good key.
Optimizations
# Save time: Save the MapKey(i) values so they don't have to be recomputed (as they are in the code above)
# Save space: The proxMaps can be stored in the hitCount array, as the hit counts are not needed once the proxmap is computed; the data can be sorted back into A, instead of using A2, if one takes care to note which A values have been sorted so far, and which not.
JavaScript code implementation:
Array.prototype.ProxmapSort = function()
;
Comparison with other sorting algorithms
Since ProxmapSort is not a
comparison sort
A comparison sort is a type of sorting algorithm that only reads the list elements through a single abstract comparison operation (often a "less than or equal to" operator or a three-way comparison) that determines which of two elements should oc ...
, the Ω(''n'' log ''n'') lower bound is inapplicable. Its speed can be attributed to it not being comparison-based and using arrays instead of dynamically allocated objects and pointers that must be followed, such as is done with when using a
binary search tree
In computer science, a binary search tree (BST), also called an ordered or sorted binary tree, is a rooted binary tree data structure with the key of each internal node being greater than all the keys in the respective node's left subtree and ...
.
ProxmapSort allows for the use of ProxmapSearch. Despite the O(n) build time, ProxMapSearch makes up for it with its
average access time, making it very appealing for large databases. If the data doesn't need to be updated often, the access time may make this function more favorable than other
non-comparison sorting based sorts.
Generic bucket sort related to ProxmapSort
Like ProxmapSort, bucket sort generally operates on a list of ''n'' numeric inputs between zero and some maximum key or value ''M'' and divides the value range into ''n'' buckets each of size ''M''/''n''. If each bucket is sorted using
insertion sort
Insertion sort is a simple sorting algorithm that builds the final sorted array (or list) one item at a time by comparisons. It is much less efficient on large lists than more advanced algorithms such as quicksort, heapsort, or merge sort. Howe ...
, ProxmapSort and bucket sort can be shown to run in predicted linear time.
Thomas H. Cormen
Thomas H. Cormen is the co-author of ''Introduction to Algorithms'', along with Charles Leiserson, Ron Rivest, and Cliff Stein. In 2013, he published a new book titled '' Algorithms Unlocked''. He is a professor of computer science at Dartmout ...
, Charles E. Leiserson
Charles Eric Leiserson is a computer scientist, specializing in the theory of parallel computing and distributed computing, and particularly practical applications thereof. As part of this effort, he developed the Cilk multithreaded language. ...
, Ronald L. Rivest, and Clifford Stein
Clifford Seth Stein (born December 14, 1965), a computer scientist, is a professor of industrial engineering and operations research at Columbia University in New York, NY, where he also holds an appointment in the Department of Computer Scien ...
. ''Introduction to Algorithms
''Introduction to Algorithms'' is a book on computer programming by Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein. The book has been widely used as the textbook for algorithms courses at many universities and is ...
'', Second Edition. MIT Press and McGraw-Hill, 2001. . Section 8.4: Bucket sort, pp.174–177. However, the performance of this sort degrades with clustering (or too few buckets with too many keys); if many values occur close together, they will all fall into a single bucket and performance will be severely diminished. This behavior also holds for ProxmapSort: if the buckets are too large, its performance will degrade severely.
References
* Thomas A. Standish. ''Data Structures in Java.'' Addison Wesley Longman, 1998. . Section 10.6, pp. 394–405.
*
*
* Norman Jacobso
"A Synopsis of ProxmapSort & ProxmapSearch"from Department of Computer Science,
Donald Bren School of Information and Computer Sciences
The Donald Bren School of Information and Computer Sciences, also known colloquially as UCI's School of ICS or simply the Bren School, is an academic unit of University of California, Irvine (UCI), and the only dedicated school of computer scienc ...
,
UC Irvine
UC may refer to:
Arts and entertainment
* '' University Challenge'', a popular British quiz programme airing on BBC Two
** ''University Challenge (New Zealand)'', the New Zealand version of the British programme
* Universal Century, one of the t ...
.
External links
* http://www.cs.uah.edu/~rcoleman/CS221/Sorting/ProxMapSort.html
* https://web.archive.org/web/20120712094020/http://www.valdosta.edu/~sfares/cs330/cs3410.a.sorting.1998.fa.html
* https://web.archive.org/web/20120314220616/http://www.cs.uml.edu/~giam/91.102/Demos/ProxMapSort/ProxMapSort.c
{{DEFAULTSORT:ProxmapSort
Sorting algorithms
Stable sorts
Articles with example pseudocode