Search Data Structure
   HOME

TheInfoList



OR:

In
computer science Computer science is the study of computation, information, and automation. Computer science spans Theoretical computer science, theoretical disciplines (such as algorithms, theory of computation, and information theory) to Applied science, ...
, a search data structure is any
data structure In computer science, a data structure is a data organization and storage format that is usually chosen for Efficiency, efficient Data access, access to data. More precisely, a data structure is a collection of data values, the relationships amo ...
that allows the efficient retrieval of specific items from a
set Set, The Set, SET or SETS may refer to: Science, technology, and mathematics Mathematics *Set (mathematics), a collection of elements *Category of sets, the category whose objects and morphisms are sets and total functions, respectively Electro ...
of items, such as a specific record from a
database In computing, a database is an organized collection of data or a type of data store based on the use of a database management system (DBMS), the software that interacts with end users, applications, and the database itself to capture and a ...
. The simplest, most general, and least efficient search structure is merely an unordered sequential
list A list is a Set (mathematics), set of discrete items of information collected and set forth in some format for utility, entertainment, or other purposes. A list may be memorialized in any number of ways, including existing only in the mind of t ...
of all the items. Locating the desired item in such a list, by the
linear search In computer science, linear search or sequential search is a method for finding an element within a list. It sequentially checks each element of the list until a match is found or the whole list has been searched. A linear search runs in linea ...
method, inevitably requires a number of operations proportional to the number ''n'' of items, in the
worst case In computer science, best, worst, and average cases of a given algorithm express what the resource usage is ''at least'', ''at most'' and ''on average'', respectively. Usually the resource being considered is running time, i.e. time complexity, b ...
as well as in the average case. Useful search data structures allow faster retrieval; however, they are limited to queries of some specific kind. Moreover, since the cost of building such structures is at least proportional to ''n'', they only pay off if several queries are to be performed on the same database (or on a database that changes little between queries). Static search structures are designed for answering many queries on a fixed database; dynamic structures also allow insertion, deletion, or
modification Modification may refer to: * Modifications of school work for students with special educational needs * Modifications (genetics), changes in appearance arising from changes in the environment * Posttranslational modifications, changes to prote ...
of items between successive queries. In the dynamic case, one must also consider the cost of fixing the search structure to account for the changes in the database.


Classification

The simplest kind of query is to locate a record that has a specific field (the ''key'') equal to a specified value ''v''. Other common kinds of query are "find the item with smallest (or largest) key value", "find the item with largest key value not exceeding ''v''", "find all items with key values between specified bounds ''v''min and ''v''max". In certain databases the key values may be points in some multi-dimensional space. For example, the key may be a geographic position (
latitude In geography, latitude is a geographic coordinate system, geographic coordinate that specifies the north-south position of a point on the surface of the Earth or another celestial body. Latitude is given as an angle that ranges from −90° at t ...
and
longitude Longitude (, ) is a geographic coordinate that specifies the east- west position of a point on the surface of the Earth, or another celestial body. It is an angular measurement, usually expressed in degrees and denoted by the Greek lett ...
) on the
Earth Earth is the third planet from the Sun and the only astronomical object known to Planetary habitability, harbor life. This is enabled by Earth being an ocean world, the only one in the Solar System sustaining liquid surface water. Almost all ...
. In that case, common kinds of queries are "find the record with a key closest to a given point ''v''", or "find all items whose key lies at a given distance from ''v''", or "find all items within a specified region ''R'' of the space". A common special case of the latter are simultaneous range queries on two or more simple keys, such as "find all employee records with salary between 50,000 and 100,000 and hired between 1995 and 2007".


Single ordered keys

* Array if the key values span a moderately compact interval. *Priority-sorted list; see
linear search In computer science, linear search or sequential search is a method for finding an element within a list. It sequentially checks each element of the list until a match is found or the whole list has been searched. A linear search runs in linea ...
*Key-sorted array; see
binary search In computer science, binary search, also known as half-interval search, logarithmic search, or binary chop, is a search algorithm that finds the position of a target value within a sorted array. Binary search compares the target value to the m ...
*
Self-balancing binary search tree In computer science, a self-balancing binary search tree (BST) is any node-based binary search tree that automatically keeps its height (maximal number of levels below the root) small in the face of arbitrary item insertions and deletions.Donald ...
*
Hash table In computer science, a hash table is a data structure that implements an associative array, also called a dictionary or simply map; an associative array is an abstract data type that maps Unique key, keys to Value (computer science), values. ...


Finding the smallest element

* Heap


Asymptotic worst-case analysis

In this table, the
asymptotic In analytic geometry, an asymptote () of a curve is a line such that the distance between the curve and the line approaches zero as one or both of the ''x'' or ''y'' coordinates Limit of a function#Limits at infinity, tends to infinity. In pro ...
notation ''O''(''f''(''n'')) means "not exceeding some fixed multiple of ''f''(''n'') in the worst case." ''Note'': Insert on an unsorted array is sometimes quoted as being ''O''(''n'') due to the assumption that the element to be inserted must be inserted at one particular location of the array, which would require shifting all the subsequent elements by one position. However, in a classic array, the array is used to store arbitrary unsorted elements, and hence the exact position of any given element is of no consequence, and insert is carried out by increasing the array size by 1 and storing the element at the end of the array, which is a ''O''(1) operation. Likewise, the deletion operation is sometimes quoted as being ''O''(''n'') due to the assumption that subsequent elements must be shifted, but in a classic unsorted array the order is unimportant (though elements are implicitly ordered by insert-time), so deletion can be carried out by swapping the element to be deleted with the last element in the array and then decrementing the array size by 1, which is a ''O''(1) operation. This table is only an approximate summary; for each data structure there are special situations and variants that may lead to different costs. Also two or more data structures can be combined to obtain lower costs.


Footnotes

{{reflist


See also

* List of data structures *
Skip list In computer science, a skip list (or skiplist) is a Randomized algorithm, probabilistic data structure that allows O(\log n) Average-case complexity, average complexity for search as well as O(\log n) average complexity for insertion within an or ...
Data structures