
In
computer science
Computer science is the study of computation, information, and automation. Computer science spans Theoretical computer science, theoretical disciplines (such as algorithms, theory of computation, and information theory) to Applied science, ...
and
probability theory
Probability theory or probability calculus is the branch of mathematics concerned with probability. Although there are several different probability interpretations, probability theory treats the concept in a rigorous mathematical manner by expre ...
, a random binary tree is a
binary tree
In computer science, a binary tree is a tree data structure in which each node has at most two children, referred to as the ''left child'' and the ''right child''. That is, it is a ''k''-ary tree with . A recursive definition using set theor ...
selected at random from some
probability distribution
In probability theory and statistics, a probability distribution is a Function (mathematics), function that gives the probabilities of occurrence of possible events for an Experiment (probability theory), experiment. It is a mathematical descri ...
on binary trees. Different distributions have been used, leading to different properties for these trees.
Random binary trees have been used for analyzing the
average-case complexity
In computational complexity theory, the average-case complexity of an algorithm is the amount of some computational resource (typically time) used by the algorithm, averaged over all possible inputs. It is frequently contrasted with worst-case comp ...
of
data structure
In computer science, a data structure is a data organization and storage format that is usually chosen for Efficiency, efficient Data access, access to data. More precisely, a data structure is a collection of data values, the relationships amo ...
s based on
binary search tree
In computer science, a binary search tree (BST), also called an ordered or sorted binary tree, is a Rooted tree, rooted binary tree data structure with the key of each internal node being greater than all the keys in the respective node's left ...
s. For this application it is common to use random trees formed by inserting nodes one at a time according to a
random permutation
A random permutation is a sequence where any order of its items is equally likely at random, that is, it is a permutation-valued random variable of a set of objects. The use of random permutations is common in games of chance and in randomized alg ...
. The resulting trees are very likely to have logarithmic depth and logarithmic
Strahler number
In mathematics, the Strahler number or Horton–Strahler number of a mathematical tree (graph theory), tree is a numerical measure of its branching complexity.
These numbers were first developed in hydrology, as a way of measuring the complexity ...
. The
treap
In computer science, the treap and the randomized binary search tree are two closely related forms of binary search tree data structures that maintain a dynamic set of ordered keys and allow binary searches among the keys. After any sequence of i ...
and related
balanced binary search tree
In computer science, a self-balancing binary search tree (BST) is any node-based binary search tree that automatically keeps its height (maximal number of levels below the root) small in the face of arbitrary item insertions and deletions.Donald ...
s use update operations that maintain this random structure even when the update sequence is non-random.
Other distributions on random binary trees include the
uniform discrete distribution in which all distinct trees are equally likely, distributions on a given number of nodes obtained by repeated splitting, binary
trie
In computer science, a trie (, ), also known as a digital tree or prefix tree, is a specialized search tree data structure used to store and retrieve strings from a dictionary or set. Unlike a binary search tree, nodes in a trie do not store t ...
s and
radix tree
In computer science, a radix tree (also radix trie or compact prefix tree or compressed trie) is a data structure that represents a space-optimized trie (prefix tree) in which each node that is the only child is merged with its parent. The resu ...
s for random data, and trees of variable size generated by
branching process
In probability theory, a branching process is a type of mathematical object known as a stochastic process, which consists of collections of random variables indexed by some set, usually natural or non-negative real numbers. The original purpose of ...
es.
For random trees that are not necessarily binary, see
random tree
In common usage, randomness is the apparent or actual lack of definite pattern or predictability in information. A random sequence of events, symbols or steps often has no :wikt:order, order and does not follow an intelligible pattern or com ...
.
Background

A binary tree is a
rooted tree
In graph theory, a tree is an undirected graph in which any two vertices are connected by path, or equivalently a connected acyclic undirected graph. A forest is an undirected graph in which any two vertices are connected by path, or equi ...
in which each node may have up to two children (the nodes directly below it in the tree), and those children are designated as being either left or right. It is sometimes convenient instead to consider ''extended binary trees'' in which each node is either an ''external node'' with zero children, or an ''internal node'' with exactly two children. A binary tree that is not in extended form may be converted into an extended binary tree by treating all its nodes as internal, and adding an external node for each missing child of an internal node. In the other direction, an extended binary tree with at least one internal node may be converted back into a non-extended binary tree by removing all its external nodes. In this way, these two forms are almost entirely equivalent for the purposes of mathematical analysis, except that the extended form allows a tree consisting of a single external node, which does not correspond to anything in the non-extended form. For the purposes of computer data structures, the two forms differ, as the external nodes of the first form may be represented explicitly as objects in a data structure.
In a
binary search tree
In computer science, a binary search tree (BST), also called an ordered or sorted binary tree, is a Rooted tree, rooted binary tree data structure with the key of each internal node being greater than all the keys in the respective node's left ...
the internal nodes are labeled by numbers or other ordered values, called ''keys'', arranged so that an
inorder traversal
In computer science, tree traversal (also known as tree search and walking the tree) is a form of graph traversal and refers to the process of visiting (e.g. retrieving, updating, or deleting) each node in a tree data structure, exactly once. Su ...
of the tree lists the keys in sorted order. The external nodes remain unlabeled. Binary trees may also be studied with all nodes unlabeled, or with labels that are not given in sorted order. For instance, the
Cartesian tree
In computer science, a Cartesian tree is a binary tree derived from a sequence of distinct numbers. To construct the Cartesian tree, set its root to be the minimum number in the sequence, and recursively construct its left and right subtrees fro ...
data structure uses labeled binary trees that are not necessarily binary search trees.
A random binary tree is a random tree drawn from a certain
probability distribution
In probability theory and statistics, a probability distribution is a Function (mathematics), function that gives the probabilities of occurrence of possible events for an Experiment (probability theory), experiment. It is a mathematical descri ...
on binary trees. In many cases, these probability distributions are defined using a given set of keys, and describe the probabilities of binary search trees having those keys. However, other distributions are possible, not necessarily generating binary search trees, and not necessarily giving a fixed number of nodes.
From random permutations
For any sequence of distinct ordered keys, one may form a
binary search tree
In computer science, a binary search tree (BST), also called an ordered or sorted binary tree, is a Rooted tree, rooted binary tree data structure with the key of each internal node being greater than all the keys in the respective node's left ...
in which each key is inserted in sequence as a leaf of the tree, without changing the structure of the previously inserted keys. The position for each insertion can be found by a
binary search
In computer science, binary search, also known as half-interval search, logarithmic search, or binary chop, is a search algorithm that finds the position of a target value within a sorted array. Binary search compares the target value to the m ...
in the previous tree. The random permutation model, for a given set of keys, is defined by choosing the sequence randomly from the
permutation
In mathematics, a permutation of a set can mean one of two different things:
* an arrangement of its members in a sequence or linear order, or
* the act or process of changing the linear order of an ordered set.
An example of the first mean ...
s of the set, with each permutation having equal probability.
For instance, if the three keys 1,3,2 are inserted into a binary search tree in that sequence, the number 1 will sit at the root of the tree, the number 3 will be placed as its right child, and the number 2 as the left child of the There are six different permutations of the keys 1,2, and 3, but only five trees may be constructed from them. That is because the permutations 2,1,3 and 2,3,1 form the same tree. Thus, this tree has probability
of being generated, whereas the other four trees each have
Expected depth of a node
For any key
in a given set of
keys, the
expected value
In probability theory, the expected value (also called expectation, expectancy, expectation operator, mathematical expectation, mean, expectation value, or first Moment (mathematics), moment) is a generalization of the weighted average. Informa ...
of the length of the path from the root to
in a random binary search tree is at most
, where "
" denotes the
natural logarithm
The natural logarithm of a number is its logarithm to the base of a logarithm, base of the e (mathematical constant), mathematical constant , which is an Irrational number, irrational and Transcendental number, transcendental number approxima ...
function and the
introduces
big O notation
Big ''O'' notation is a mathematical notation that describes the asymptotic analysis, limiting behavior of a function (mathematics), function when the Argument of a function, argument tends towards a particular value or infinity. Big O is a memb ...
. By
linearity of expectation
In probability theory, the expected value (also called expectation, expectancy, expectation operator, mathematical expectation, mean, expectation value, or first moment) is a generalization of the weighted average. Informally, the expected va ...
, the expected number of ancestors of
equals the sum, over other keys
, of the probability that
is an ancestor of
. A key
is an ancestor of
exactly when
is the first key to be inserted from the interval