In
computer science
Computer science is the study of computation, automation, and information. Computer science spans theoretical disciplines (such as algorithms, theory of computation, information theory, and automation) to practical disciplines (includin ...
and
probability theory
Probability theory is the branch of mathematics concerned with probability. Although there are several different probability interpretations, probability theory treats the concept in a rigorous mathematical manner by expressing it through a set o ...
, a random binary tree is a
binary tree
In computer science, a binary tree is a k-ary k = 2 tree data structure in which each node has at most two children, which are referred to as the ' and the '. A recursive definition using just set theory notions is that a (non-empty) binar ...
selected at random from some
probability distribution
In probability theory and statistics, a probability distribution is the mathematical function that gives the probabilities of occurrence of different possible outcomes for an experiment. It is a mathematical description of a random phenomeno ...
on binary trees. Two different distributions are commonly used: binary trees formed by inserting nodes one at a time according to a
random permutation
A random permutation is a random ordering of a set of objects, that is, a permutation-valued random variable. The use of random permutations is often fundamental to fields that use randomized algorithms such as coding theory, cryptography, and sim ...
, and binary trees chosen from a
uniform discrete distribution in which all distinct trees are equally likely. It is also possible to form other distributions, for instance by repeated splitting. Adding and removing nodes directly in a random binary tree will in general disrupt its random structure, but the
treap and related randomized binary search tree
data structure
In computer science, a data structure is a data organization, management, and storage format that is usually chosen for Efficiency, efficient Data access, access to data. More precisely, a data structure is a collection of data values, the rel ...
s use the principle of binary trees formed from a random permutation in order to maintain a
balanced binary search tree dynamically as nodes are inserted and deleted.
For random trees that are not necessarily binary, see
random tree.
Binary trees from random permutations
For any set of numbers (or, more generally, values from some
total order
In mathematics, a total or linear order is a partial order in which any two elements are comparable. That is, a total order is a binary relation \leq on some set X, which satisfies the following for all a, b and c in X:
# a \leq a ( reflexiv ...
), one may form a
binary search tree
In computer science, a binary search tree (BST), also called an ordered or sorted binary tree, is a rooted binary tree data structure with the key of each internal node being greater than all the keys in the respective node's left subtree and ...
in which each number is inserted in sequence as a leaf of the tree, without changing the structure of the previously inserted numbers. The position into which each number should be inserted is uniquely determined by a
binary search
In computer science, binary search, also known as half-interval search, logarithmic search, or binary chop, is a search algorithm that finds the position of a target value within a sorted array. Binary search compares the target value to the m ...
in the tree formed by the previous numbers. For instance, if the three numbers (1,3,2) are inserted into a tree in that sequence, the number 1 will sit at the root of the tree, the number 3 will be placed as its right child, and the number 2 as the left child of the number 3. There are six different permutations of the numbers (1,2,3), but only five trees may be constructed from them. That is because the permutations (2,1,3) and (2,3,1) form the same tree.
Expected depth of a node
For any fixed choice of a value in a given set of numbers, if one randomly permutes the numbers and forms a binary tree from them as described above, the
expected value
In probability theory, the expected value (also called expectation, expectancy, mathematical expectation, mean, average, or first moment) is a generalization of the weighted average. Informally, the expected value is the arithmetic mean of a ...
of the length of the path from the root of the tree to is at most , where "" denotes the
natural logarithm
The natural logarithm of a number is its logarithm to the base of the mathematical constant , which is an irrational and transcendental number approximately equal to . The natural logarithm of is generally written as , , or sometimes, if ...
function and the introduces
big O notation. For, the expected number of ancestors of is by linearity of expectation equal to the sum, over all other values in the set, of the probability that is an ancestor of . And a value is an ancestor of exactly when is the first element to be inserted from the elements in the interval . Thus, the values that are adjacent to in the sorted sequence of values have probability of being an ancestor of , the values one step away have probability , etc. Adding these probabilities for all positions in the sorted sequence gives twice a
Harmonic number, leading to the bound above. A bound of this form holds also for the expected search length of a path to a fixed value that is not part of the given set.
To understand it by using min-max records. The number in a random permutation is the min (max) record means it is the min (max) value from the first position to its position. Consider a simple example = (2, 4, 3, 6, 5, 1). The min records in are 2, 1 by searching the min value from start to the end. Similarly, the max recodes in are 2, 4, 6. The expected number of min (max) records is all probability of each number in a random permutation. For position , it has the probability of 1/. Therefore, the expected number of min (max) records is a Harmonic number. To search 3 in , all numbers in (2, 1) is less than 3, and (4, 6, 5, 1) is bigger than 3. The searching on 's random binary tree only counts max records on (2, 1) and min records on (4, 6, 5, 1)—this is why it is twice a Harmonic number.
The longest path
Although not as easy to analyze as the average path length, there has also been much research on determining the expectation (or high probability bounds) of the length of the longest path in a binary search tree generated from a random insertion order. It is now known that this length, for a tree with nodes, is almost surely
:
where is the unique number in the range satisfying the equation
:
Expected number of leaves
In the random permutation model, each of the numbers from the set of numbers used to form the tree, except for the smallest and largest of the numbers, has probability of being a leaf in the tree, for it is a leaf when it inserted after its two neighbors, and any of the six permutations of these two neighbors and it are equally likely. By similar reasoning, the smallest and largest of the numbers have probability of being a leaf. Therefore, the expected number of leaves is the sum of these probabilities, which for is exactly .
Strahler Number
The
Strahler number
In mathematics, the Strahler number or Horton–Strahler number of a mathematical tree is a numerical measure of its branching complexity.
These numbers were first developed in hydrology by and ; in this application, they are referred to as the ...
of a tree is a more sensitive measure of the distance from a leaf in which a node has Strahler number whenever it has either a child with that number or two children with number . For ''n''-node random binary search trees, simulations suggest that the expected Strahler number is
. However, only the upper bound
has actually been proven.
Treaps and randomized binary search trees
In applications of binary search tree data structures, it is rare for the values in the tree to be inserted without deletion in a random order, limiting the direct applications of random binary trees. However, algorithm designers have devised data structures that allow insertions and deletions to be performed in a binary search tree, at each step maintaining as an invariant the property that the shape of the tree is a random variable with the same distribution as a random binary search tree.
If a given set of ordered numbers is assigned numeric priorities (distinct numbers unrelated to their values), these priorities may be used to construct a
Cartesian tree for the numbers, a binary tree that has as its inorder traversal sequence the sorted sequence of the numbers and that is
heap-ordered by priorities. Although more efficient construction algorithms are known, it is helpful to think of a Cartesian tree as being constructed by inserting the given numbers into a binary search tree in priority order. Thus, by choosing the priorities either to be a set of independent random real numbers in the unit interval, or by choosing them to be a random permutation of the numbers from to (where is the number of nodes in the tree), and by maintaining the heap ordering property using
tree rotations after any insertion or deletion of a node, it is possible to maintain a data structure that behaves like a random binary search tree. Such a data structure is known as a
treap or a randomized binary search tree.
Uniformly random binary trees
The number of binary trees with ''n'' nodes is a
Catalan number
In combinatorial mathematics, the Catalan numbers are a sequence of natural numbers that occur in various counting problems, often involving recursively defined objects. They are named after the French-Belgian mathematician Eugène Charles C ...
: for these numbers of trees are
: .
Thus, if one of these trees is selected uniformly at random, its probability is the
reciprocal of a Catalan number. Trees in this model have expected depth proportional to the square root of , rather than to the logarithm. However, the expected
Strahler number
In mathematics, the Strahler number or Horton–Strahler number of a mathematical tree is a numerical measure of its branching complexity.
These numbers were first developed in hydrology by and ; in this application, they are referred to as the ...
of a uniformly random ''n''-node binary tree is
lower than the expected Strahler number of random binary search trees.
Due to their large heights, this model of equiprobable random trees is not generally used for binary search trees, but it has been applied to problems of modeling the
parse tree
A parse tree or parsing tree or derivation tree or concrete syntax tree is an ordered, rooted tree that represents the syntactic structure of a string according to some context-free grammar. The term ''parse tree'' itself is used primarily in comp ...
s of
algebraic expressions in
compiler
In computing, a compiler is a computer program that translates computer code written in one programming language (the ''source'' language) into another language (the ''target'' language). The name "compiler" is primarily used for programs that ...
design (where the above-mentioned bound on Strahler number translates into the
number of registers needed to evaluate an expression) and for modeling
evolutionary tree
A phylogenetic tree (also phylogeny or evolutionary tree Felsenstein J. (2004). ''Inferring Phylogenies'' Sinauer Associates: Sunderland, MA.) is a branching diagram or a tree showing the evolutionary relationships among various biological s ...
s. In some cases the analysis of random binary trees under the random permutation model can be automatically transferred to the uniform model.
[, p. 70.]
Random split trees
generate random binary trees with nodes by generating a real-valued random variable in the unit interval , assigning the first nodes (rounded down to an integer number of nodes) to the left subtree, the next node to the root, and the remaining nodes to the right subtree, and continuing recursively in each subtree. If is chosen uniformly at random in the interval, the result is the same as the random binary search tree generated by a random permutation of the nodes, as any node is equally likely to be chosen as root; however, this formulation allows other distributions to be used instead. For instance, in the uniformly random binary tree model, once a root is fixed each of its two subtrees must also be uniformly random, so the uniformly random model may also be generated by a different choice of distribution for . As Devroye and
Kruszewski show, by choosing a
beta distribution on and by using an appropriate choice of shape to draw each of the branches, the mathematical trees generated by this process can be used to create realistic-looking botanical trees.
Notes
References
*.
*.
*.
*.
*.
*.
*.
*.
*.
*.
*.
*.
*.
*.
*.
*{{citation
, last1 = Seidel , first1 = Raimund
, last2 = Aragon , first2 = Cecilia R.
, doi = 10.1007/s004539900061
, issue = 4/5
, journal = Algorithmica
, pages = 464–497
, title = Randomized Search Trees
, url = http://citeseer.ist.psu.edu/seidel96randomized.html
, volume = 16
, year = 1996.
External links
Open Data Structures - Chapter 7 - Random Binary Search Trees Pat Morin
Binary trees
Statistical randomness
Probabilistic data structures