HOME

TheInfoList



OR:

In computer science, a segment tree, also known as a statistic tree, is a tree
data structure In computer science, a data structure is a data organization, management, and storage format that is usually chosen for efficient access to data. More precisely, a data structure is a collection of data values, the relationships among them, a ...
used for storing information about intervals, or segments. It allows querying which of the stored segments contain a given point. It is, in principle, a static structure and cannot be modified once built. A similar data structure is the interval tree. A segment tree for a set of ''n'' intervals uses ''O''(''n'' log ''n'') storage and can be built in ''O''(''n'' log ''n'') time. Segment trees support searching for all the intervals that contain a query point in time ''O''(log ''n'' + ''k''), ''k'' being the number of retrieved intervals or segments. Applications of the segment tree are in the areas of
computational geometry Computational geometry is a branch of computer science devoted to the study of algorithms which can be stated in terms of geometry. Some purely geometrical problems arise out of the study of computational geometric algorithms, and such problems ar ...
,
geographic information systems A geographic information system (GIS) is a type of database containing geographic data (that is, descriptions of phenomena for which location is relevant), combined with software tools for managing, analyzing, and visualizing those data. In a br ...
and machine learning. The segment tree can be generalized to higher dimension spaces.


Definition


Description

Let ''S'' be a set of intervals, or segments. Let ''p''1, ''p''2, ..., ''pm'' be the list of distinct interval endpoints, sorted from left to right. Consider the partitioning of the real line induced by those points. The regions of this partitioning are called ''elementary intervals''. Thus, the elementary intervals are, from left to right: :(-\infty, p_1), _1,p_1 (p_1, p_2),
_2, p_2 The comma is a punctuation mark that appears in several variants in different languages. It has the same shape as an apostrophe or single closing quotation mark () in many typefaces, but it differs from them in being placed on the baseline o ...
\dots, (p_, p_m), _m, p_m (p_m, +\infty) That is, the list of elementary intervals consists of open intervals between two consecutive endpoints ''pi'' and ''p''''i''+1, alternated with closed intervals consisting of a single endpoint. Single points are treated themselves as intervals because the answer to a query is not necessarily the same at the interior of an elementary interval and its endpoints. Given a set of intervals, or segments, a segment tree ''T'' for is structured as follows: * ''T'' is a
binary tree In computer science, a binary tree is a k-ary k = 2 tree data structure in which each node has at most two children, which are referred to as the ' and the '. A recursive definition using just set theory notions is that a (non-empty) binary t ...
. * Its
leaves A leaf (plural, : leaves) is any of the principal appendages of a vascular plant plant stem, stem, usually borne laterally aboveground and specialized for photosynthesis. Leaves are collectively called foliage, as in "autumn foliage", wh ...
correspond to the elementary intervals induced by the endpoints in , in an ordered way: the leftmost leaf corresponds to the leftmost interval, and so on. The elementary interval corresponding to a leaf ''v'' is denoted Int(''v''). * The internal nodes of ''T'' correspond to intervals that are the union of elementary intervals: the interval Int(''N'') corresponding to node ''N'' is the union of the intervals corresponding to the leaves of the tree rooted at ''N''. That implies that Int(''N'') is the union of the intervals of its two children. * Each node or leaf ''v'' in ''T'' stores the interval Int(''v'') and a set of intervals, in some data structure. This canonical subset of node ''v'' contains the intervals 'x'', ''x′''from such that 'x'', ''x′''contains Int(''v'') and does not contain Int(parent(''v'')). That is, each node in ''T'' stores the segments that span through its interval, but do not span through the interval of its parent.


Construction

A segment tree from the set of segments , can be built as follows. First, the endpoints of the intervals in are sorted. The elementary intervals are obtained from that. Then, a balanced binary tree is built on the elementary intervals, and for each node ''v'' it is determined the interval Int(''v'') it represents. It remains to compute the canonical subsets for the nodes. To achieve this, the intervals in are inserted one by one into the segment tree. An interval ''X'' = 'x'', ''x′''can be inserted in a subtree rooted at ''T'', using the following procedure: * If Int(''T'') is contained in ''X'' then store ''X'' at ''T'', and finish. * Else: ** If ''X'' intersects the interval of the left child of ''T'', then insert ''X'' in that child, recursively. ** If ''X'' intersects the interval of the right child of ''T'', then insert ''X'' in that child, recursively. The complete construction operation takes ''O''(''n'' log ''n'') time, ''n'' being the number of segments in .


Query

A query for a segment tree receives a point ''qx''(should be one of the leaves of tree), and retrieves a list of all the segments stored which contain the point ''qx''. Formally stated; given a node (subtree) ''v'' and a query point ''qx'', the query can be done using the following algorithm: # Report all the intervals in . # If ''v'' is not a leaf: #* If ''qx'' is in Int(left child of ''v'') then #** Perform a query in the left child of ''v''. #* If ''qx'' is in Int(right child of ''v'') then #** Perform a query in the right child of ''v''. In a segment tree that contains ''n'' intervals, those containing a given query point can be reported in ''O''(log ''n'' + ''k'') time, where ''k'' is the number of reported intervals.


Storage requirements

A segment tree ''T'' on a set of ''n'' intervals uses ''O''(''n'' log ''n'') storage. :The set has at most 4''n'' + 1 elementary intervals. Because ''T'' is a binary balanced tree with at most 4''n'' + 1 leaves, its height is O(log ''n''). Since any interval is stored at most twice at a given depth of the tree, that the total amount of storage is ''O''(''n'' log ''n'').


Generalization for higher dimensions

The segment tree can be generalized to higher dimension spaces, in the form of multi-level segment trees. In higher dimensional versions, the segment tree stores a collection of axis-parallel (hyper-)rectangles, and can retrieve the rectangles that contain a given query point. The structure uses ''O''(''n'' log''d'' ''n'') storage, and answers queries in ''O''(log''d'' ''n'') time. The use of fractional cascading lowers the query time bound by a logarithmic factor. The use of the interval tree on the deepest level of associated structures lowers the storage bound by a logarithmic factor.


Notes

A query that asks for all the intervals containing a given point is often referred as a ''stabbing query''. The segment tree is less efficient than the interval tree for range queries in one dimension, due to its higher storage requirement: ''O''(''n'' log ''n'') against the O(''n'') of the interval tree. The importance of the segment tree is that the segments within each node’s canonical subset can be stored in any arbitrary manner. For ''n'' intervals whose endpoints are in a small integer range (e.g., in the range ,…,''O''(''n''), optimal data structures exist with a linear preprocessing time and query time ''O''(1 + ''k'') for reporting all ''k'' intervals containing a given query point. Another advantage of the segment tree is that it can easily be adapted to counting queries; that is, to report the number of segments containing a given point, instead of reporting the segments themselves. Instead of storing the intervals in the canonical subsets, it can simply store the number of them. Such a segment tree uses linear storage, and requires an ''O''(log ''n'') query time, so it is optimal. Higher dimensional versions of the interval tree and the priority search tree do not exist; that is, there is no clear extension of these structures that solves the analogous problem in higher dimensions. But the structures can be used as associated structure of segment trees.


History

The segment tree was invented by Jon Bentley in 1977; in "Solutions to Klee’s rectangle problems".


References


Sources cited

* * http://www.cs.nthu.edu.tw/~wkhon/ds/ds10/tutorial/tutorial6.pdf


External links


Segment Tree – CP-Algorithms
{{DEFAULTSORT:Segment Tree Trees (data structures) Binary trees Computer graphics data structures