computer science Computer science is the study of computation, information, and automation. Computer science spans Theoretical computer science, theoretical disciplines (such as algorithms, theory of computation, and information theory) to Applied science, ...

, a self-balancing binary search tree (BST) is any

node In general, a node is a localized swelling (a "knot") or a point of intersection (a vertex). Node may refer to: In mathematics * Vertex (graph theory), a vertex in a mathematical graph *Vertex (geometry), a point where two or more curves, lines ...

-based

binary search tree In computer science, a binary search tree (BST), also called an ordered or sorted binary tree, is a Rooted tree, rooted binary tree data structure with the key of each internal node being greater than all the keys in the respective node's left ...

that automatically keeps its height (maximal number of levels below the root) small in the face of arbitrary item insertions and deletions.

Donald Knuth Donald Ervin Knuth ( ; born January 10, 1938) is an American computer scientist and mathematician. He is a professor emeritus at Stanford University. He is the 1974 recipient of the ACM Turing Award, informally considered the Nobel Prize of comp ...

. ''

The Art of Computer Programming ''The Art of Computer Programming'' (''TAOCP'') is a comprehensive multi-volume monograph written by the computer scientist Donald Knuth presenting programming algorithms and their analysis. it consists of published volumes 1, 2, 3, 4A, and 4 ...

'', Volume 3: ''Sorting and Searching'', Second Edition. Addison-Wesley, 1998. . Section 6.2.3: Balanced Trees, pp.458–481. These operations when designed for a self-balancing binary search tree, contain precautionary measures against boundlessly increasing tree height, so that these abstract data structures receive the attribute "self-balancing". For height-balanced binary trees, the height is defined to be logarithmic

O(\log n)

in the number

n

of items. This is the case for many binary search trees, such as

AVL tree In computer science, an AVL tree (named after inventors Adelson-Velsky and Landis) is a self-balancing binary search tree. In an AVL tree, the heights of the two child subtrees of any node differ by at most one; if at any time they differ by m ...

s and

red–black tree In computer science, a red–black tree is a self-balancing binary search tree data structure noted for fast storage and retrieval of ordered information. The nodes in a red-black tree hold an extra "color" bit, often drawn as red and black, wh ...

Splay tree A splay tree is a binary search tree with the additional property that recently accessed elements are quick to access again. Like self-balancing binary search trees, a splay tree performs basic operations such as insertion, look-up and removal ...

s and

treap In computer science, the treap and the randomized binary search tree are two closely related forms of binary search tree data structures that maintain a dynamic set of ordered keys and allow binary searches among the keys. After any sequence of i ...

s are self-balancing but not height-balanced, as their height is not guaranteed to be logarithmic in the number of items. Self-balancing binary search trees provide efficient implementations for mutable ordered lists, and can be used for other abstract data structures such as

associative array In computer science, an associative array, key-value store, map, symbol table, or dictionary is an abstract data type that stores a collection of (key, value) pairs, such that each possible key appears at most once in the collection. In math ...

priority queue In computer science, a priority queue is an abstract data type similar to a regular queue (abstract data type), queue or stack (abstract data type), stack abstract data type. In a priority queue, each element has an associated ''priority'', which ...

s and sets.

Overview

Most operations on a binary search tree (BST) take time directly proportional to the height of the tree, so it is desirable to keep the height small. A binary tree with height ''h'' can contain at most 2⁰+2¹+···+2^''h'' = 2^''h''+1−1 nodes. It follows that for any tree with ''n'' nodes and height ''h'': :

n\le 2^-1

And that implies: :

h\ge\lceil\log_2(n+1)-1\rceil\ge \lfloor\log_2 n\rfloor

. In other words, the minimum height of a binary tree with ''n'' nodes is rounded down; that is,

\lfloor \log_2 n\rfloor

. However, the simplest algorithms for BST item insertion may yield a tree with height ''n'' in rather common situations. For example, when the items are inserted in sorted key order, the tree degenerates into a

linked list In computer science, a linked list is a linear collection of data elements whose order is not given by their physical placement in memory. Instead, each element points to the next. It is a data structure consisting of a collection of nodes whi ...

with ''n'' nodes. The difference in performance between the two situations may be enormous: for example, when ''n'' = 1,000,000, the minimum height is

\lfloor \log_2(1,000,000) \rfloor = 19

. If the data items are known ahead of time, the height can be kept small, in the average sense, by adding values in a random order, resulting in a

random binary search tree In computer science and probability theory, a random binary tree is a binary tree selected at random from some probability distribution on binary trees. Different distributions have been used, leading to different properties for these trees. Ran ...

. However, there are many situations (such as

online algorithm In computer science, an online algorithm is one that can process its input piece-by-piece in a serial fashion, i.e., in the order that the input is fed to the algorithm, without having the entire input available from the start. In contrast, an of ...

s) where this

randomization Randomization is a statistical process in which a random mechanism is employed to select a sample from a population or assign subjects to different groups.Oxford English Dictionary "randomization" The process is crucial in ensuring the random alloc ...

is not viable. Self-balancing binary trees solve this problem by performing transformations on the tree (such as

tree rotation In discrete mathematics, tree rotation is an operation on a binary tree that changes the structure without interfering with the order of the elements. A tree rotation moves one node up in the tree and one node down. It is used to change the shape ...

s) at key insertion times, in order to keep the height proportional to Although a certain overhead is involved, it is not bigger than the always necessary lookup cost and may be justified by ensuring fast execution of all operations. While it is possible to maintain a BST with minimum height with expected

O(\log n)

time operations (lookup/insertion/removal), the additional space requirements required to maintain such a structure tend to outweigh the decrease in search time. For comparison, an

is guaranteed to be within a factor of 1.44 of the optimal height while requiring only two additional bits of storage in a naive implementation. Therefore, most self-balancing BST algorithms keep the height within a constant factor of this lower bound. In the

asymptotic In analytic geometry, an asymptote () of a curve is a line such that the distance between the curve and the line approaches zero as one or both of the ''x'' or ''y'' coordinates Limit of a function#Limits at infinity, tends to infinity. In pro ...

(" Big-O") sense, a self-balancing BST structure containing ''n'' items allows the lookup, insertion, and removal of an item in

O(\log n)

worst-case time, and ordered enumeration of all items in

O(n)

time. For some implementations these are per-operation time bounds, while for others they are

amortized In computer science, amortized analysis is a method for analyzing a given algorithm's complexity, or how much of a resource, especially time or memory, it takes to execute. The motivation for amortized analysis is that looking at the worst-case ...

bounds over a sequence of operations. These times are asymptotically optimal among all data structures that manipulate the key only through comparisons.

Implementations

Data structures implementing this type of tree include: * AA tree *

Red–black tree In computer science, a red–black tree is a self-balancing binary search tree data structure noted for fast storage and retrieval of ordered information. The nodes in a red-black tree hold an extra "color" bit, often drawn as red and black, wh ...

* Scapegoat tree * Tango tree *

Treap In computer science, the treap and the randomized binary search tree are two closely related forms of binary search tree data structures that maintain a dynamic set of ordered keys and allow binary searches among the keys. After any sequence of i ...

* Weight-balanced tree

Applications

Self-balancing binary search trees can be used in a natural way to construct and maintain ordered lists, such as

s. They can also be used for

s; key-value pairs are simply inserted with an ordering based on the key alone. In this capacity, self-balancing BSTs have a number of advantages and disadvantages over their main competitor,

hash table In computer science, a hash table is a data structure that implements an associative array, also called a dictionary or simply map; an associative array is an abstract data type that maps Unique key, keys to Value (computer science), values. ...

s. One advantage of self-balancing BSTs is that they allow fast (indeed, asymptotically optimal) enumeration of the items ''in key order'', which hash tables do not provide. One disadvantage is that their lookup algorithms get more complicated when there may be multiple items with the same key. Self-balancing BSTs have better worst-case lookup performance than most

Cuckoo hashing Cuckoo hashing is a scheme in computer programming for resolving hash collisions of values of hash functions in a table, with worst-case constant lookup time. The name derives from the behavior of some species of cuckoo, where the cuckoo chick ...

provides worst-case lookup performance of

O(1)

. hash tables (

O(\log n)

compared to

O(n)

), but have worse average-case performance (

O(\log n)

compared to

O(1)

). Self-balancing BSTs can be used to implement any algorithm that requires mutable ordered lists, to achieve optimal worst-case asymptotic performance. For example, if binary tree sort is implemented with a self-balancing BST, we have a very simple-to-describe yet

asymptotically optimal In computer science, an algorithm is said to be asymptotically optimal if, roughly speaking, for large inputs it performs at worst a constant factor (independent of the input size) worse than the best possible algorithm. It is a term commonly en ...

O(n \log n)

sorting algorithm. Similarly, many algorithms in computational geometry exploit variations on self-balancing BSTs to solve problems such as the line segment intersection problem and the point location problem efficiently. (For average-case performance, however, self-balancing BSTs may be less efficient than other solutions. Binary tree sort, in particular, is likely to be slower than

merge sort In computer science, merge sort (also commonly spelled as mergesort and as ) is an efficient, general-purpose, and comparison sort, comparison-based sorting algorithm. Most implementations of merge sort are Sorting algorithm#Stability, stable, wh ...

quicksort Quicksort is an efficient, general-purpose sorting algorithm. Quicksort was developed by British computer scientist Tony Hoare in 1959 and published in 1961. It is still a commonly used algorithm for sorting. Overall, it is slightly faster than ...

, or

heapsort In computer science, heapsort is an efficient, comparison-based sorting algorithm that reorganizes an input array into a heap (a data structure where each node is greater than its children) and then repeatedly removes the largest node from that ...

, because of the tree-balancing overhead as well as

cache Cache, caching, or caché may refer to: Science and technology * Cache (computing), a technique used in computer storage for easier data access * Cache (biology) or hoarding, a food storing behavior of animals * Cache (archaeology), artifacts p ...

access patterns.) Self-balancing BSTs are flexible data structures, in that it's easy to extend them to efficiently record additional information or perform new operations. For example, one can record the number of nodes in each subtree having a certain property, allowing one to count the number of nodes in a certain key range with that property in

O(\log n)

time. These extensions can be used, for example, to optimize database queries or other list-processing algorithms.

References

External links

Dictionary of Algorithms and Data Structures: Height-balanced binary search tree

GNU libavl
a LGPL-licensed library of binary tree implementations in C, with documentation {{DEFAULTSORT:Self-Balancing Binary Search Tree Binary trees Trees (data structures)

Overview

Implementations

Applications

See also

References

External links