computer science Computer science is the study of computation, information, and automation. Computer science spans Theoretical computer science, theoretical disciplines (such as algorithms, theory of computation, and information theory) to Applied science, ...

and

probability theory Probability theory or probability calculus is the branch of mathematics concerned with probability. Although there are several different probability interpretations, probability theory treats the concept in a rigorous mathematical manner by expre ...

, a random binary tree is a

binary tree In computer science, a binary tree is a tree data structure in which each node has at most two children, referred to as the ''left child'' and the ''right child''. That is, it is a ''k''-ary tree with . A recursive definition using set theor ...

selected at random from some

probability distribution In probability theory and statistics, a probability distribution is a Function (mathematics), function that gives the probabilities of occurrence of possible events for an Experiment (probability theory), experiment. It is a mathematical descri ...

on binary trees. Different distributions have been used, leading to different properties for these trees. Random binary trees have been used for analyzing the

average-case complexity In computational complexity theory, the average-case complexity of an algorithm is the amount of some computational resource (typically time) used by the algorithm, averaged over all possible inputs. It is frequently contrasted with worst-case comp ...

data structure In computer science, a data structure is a data organization and storage format that is usually chosen for Efficiency, efficient Data access, access to data. More precisely, a data structure is a collection of data values, the relationships amo ...

s based on

binary search tree In computer science, a binary search tree (BST), also called an ordered or sorted binary tree, is a Rooted tree, rooted binary tree data structure with the key of each internal node being greater than all the keys in the respective node's left ...

s. For this application it is common to use random trees formed by inserting nodes one at a time according to a

random permutation A random permutation is a sequence where any order of its items is equally likely at random, that is, it is a permutation-valued random variable of a set of objects. The use of random permutations is common in games of chance and in randomized alg ...

. The resulting trees are very likely to have logarithmic depth and logarithmic

Strahler number In mathematics, the Strahler number or Horton–Strahler number of a mathematical tree (graph theory), tree is a numerical measure of its branching complexity. These numbers were first developed in hydrology, as a way of measuring the complexity ...

. The

treap In computer science, the treap and the randomized binary search tree are two closely related forms of binary search tree data structures that maintain a dynamic set of ordered keys and allow binary searches among the keys. After any sequence of i ...

and related

balanced binary search tree In computer science, a self-balancing binary search tree (BST) is any node-based binary search tree that automatically keeps its height (maximal number of levels below the root) small in the face of arbitrary item insertions and deletions.Donald ...

s use update operations that maintain this random structure even when the update sequence is non-random. Other distributions on random binary trees include the uniform discrete distribution in which all distinct trees are equally likely, distributions on a given number of nodes obtained by repeated splitting, binary

trie In computer science, a trie (, ), also known as a digital tree or prefix tree, is a specialized search tree data structure used to store and retrieve strings from a dictionary or set. Unlike a binary search tree, nodes in a trie do not store t ...

s and

radix tree In computer science, a radix tree (also radix trie or compact prefix tree or compressed trie) is a data structure that represents a space-optimized trie (prefix tree) in which each node that is the only child is merged with its parent. The resu ...

s for random data, and trees of variable size generated by

branching process In probability theory, a branching process is a type of mathematical object known as a stochastic process, which consists of collections of random variables indexed by some set, usually natural or non-negative real numbers. The original purpose of ...

es. For random trees that are not necessarily binary, see

random tree In common usage, randomness is the apparent or actual lack of definite pattern or predictability in information. A random sequence of events, symbols or steps often has no :wikt:order, order and does not follow an intelligible pattern or com ...

Background

A binary tree is a

rooted tree In graph theory, a tree is an undirected graph in which any two vertices are connected by path, or equivalently a connected acyclic undirected graph. A forest is an undirected graph in which any two vertices are connected by path, or equi ...

in which each node may have up to two children (the nodes directly below it in the tree), and those children are designated as being either left or right. It is sometimes convenient instead to consider ''extended binary trees'' in which each node is either an ''external node'' with zero children, or an ''internal node'' with exactly two children. A binary tree that is not in extended form may be converted into an extended binary tree by treating all its nodes as internal, and adding an external node for each missing child of an internal node. In the other direction, an extended binary tree with at least one internal node may be converted back into a non-extended binary tree by removing all its external nodes. In this way, these two forms are almost entirely equivalent for the purposes of mathematical analysis, except that the extended form allows a tree consisting of a single external node, which does not correspond to anything in the non-extended form. For the purposes of computer data structures, the two forms differ, as the external nodes of the first form may be represented explicitly as objects in a data structure. In a

the internal nodes are labeled by numbers or other ordered values, called ''keys'', arranged so that an inorder traversal of the tree lists the keys in sorted order. The external nodes remain unlabeled. Binary trees may also be studied with all nodes unlabeled, or with labels that are not given in sorted order. For instance, the

Cartesian tree In computer science, a Cartesian tree is a binary tree derived from a sequence of distinct numbers. To construct the Cartesian tree, set its root to be the minimum number in the sequence, and recursively construct its left and right subtrees fro ...

data structure uses labeled binary trees that are not necessarily binary search trees. A random binary tree is a random tree drawn from a certain

on binary trees. In many cases, these probability distributions are defined using a given set of keys, and describe the probabilities of binary search trees having those keys. However, other distributions are possible, not necessarily generating binary search trees, and not necessarily giving a fixed number of nodes.

From random permutations

For any sequence of distinct ordered keys, one may form a

in which each key is inserted in sequence as a leaf of the tree, without changing the structure of the previously inserted keys. The position for each insertion can be found by a

binary search In computer science, binary search, also known as half-interval search, logarithmic search, or binary chop, is a search algorithm that finds the position of a target value within a sorted array. Binary search compares the target value to the m ...

in the previous tree. The random permutation model, for a given set of keys, is defined by choosing the sequence randomly from the

permutation In mathematics, a permutation of a set can mean one of two different things: * an arrangement of its members in a sequence or linear order, or * the act or process of changing the linear order of an ordered set. An example of the first mean ...

s of the set, with each permutation having equal probability. For instance, if the three keys 1,3,2 are inserted into a binary search tree in that sequence, the number 1 will sit at the root of the tree, the number 3 will be placed as its right child, and the number 2 as the left child of the There are six different permutations of the keys 1,2, and 3, but only five trees may be constructed from them. That is because the permutations 2,1,3 and 2,3,1 form the same tree. Thus, this tree has probability

\tfrac26=\tfrac13

of being generated, whereas the other four trees each have

Expected depth of a node

For any key

x

in a given set of

n

keys, the

expected value In probability theory, the expected value (also called expectation, expectancy, expectation operator, mathematical expectation, mean, expectation value, or first Moment (mathematics), moment) is a generalization of the weighted average. Informa ...

of the length of the path from the root to

x

in a random binary search tree is at most

2\log n+O(1)

, where "

\log

" denotes the

natural logarithm The natural logarithm of a number is its logarithm to the base of a logarithm, base of the e (mathematical constant), mathematical constant , which is an Irrational number, irrational and Transcendental number, transcendental number approxima ...

function and the

O

introduces

big O notation Big ''O'' notation is a mathematical notation that describes the asymptotic analysis, limiting behavior of a function (mathematics), function when the Argument of a function, argument tends towards a particular value or infinity. Big O is a memb ...

. By

linearity of expectation In probability theory, the expected value (also called expectation, expectancy, expectation operator, mathematical expectation, mean, expectation value, or first moment) is a generalization of the weighted average. Informally, the expected va ...

, the expected number of ancestors of

x

equals the sum, over other keys

y

, of the probability that

y

is an ancestor of

x

. A key

y

is an ancestor of

x

exactly when

y

is the first key to be inserted from the interval

,y /math>. Because each key in the interval is equally likely to be first, this happens with probability inverse to the length of the interval. Thus, the keys that are adjacent to x in the sorted sequence of keys have probability \tfrac12 of being an ancestor of x, the keys one step away have probability \tfrac13, etc. The sum of these probabilities forms two copies of the harmonic series extending away from x in both directions in the sorted sequence, giving the 2\log n+O(1) bound above. This bound also holds for the expected search path length for a value x that is one of the given keys.

The longest path

The longest root-to-leaf path, in a random binary search tree, is longer than the expected path length, but only by a constant factor. Its length, for a tree with

n

nodes, is

with high probability In mathematics, an event that occurs with high probability (often shortened to w.h.p. or WHP) is one whose probability depends on a certain number ''n'' and goes to 1 as ''n'' goes to infinity, i.e. the probability of the event occurring can be m ...

approximately where

\beta

is the unique number in the range

0<\beta<1

satisfying the equation

Expected number of leaves

In the random permutation model, each key except the smallest and largest has probability

\tfrac13

of being a leaf in the tree. This is because it is a leaf when it inserted after its two neighbors, which happens for two out of the six permutations of it and its two neighbors, all of which are equally likely. By similar reasoning, the smallest and largest key have probability

\tfrac12

of being a leaf. Therefore, the expected number of leaves is the sum of these probabilities, which for

n\ge 2

is exactly

(n+1)/3

Strahler number

The

of vertices in any tree is a measure of the complexity of the subtrees under those vertices. A leaf (external node) has Strahler number one. For any other node, the Strahler number is defined recursively from the Strahler numbers of its children. In a binary tree, if two children have different Strahler numbers, the Strahler number of their parent is the larger of the two child numbers. But if two children have equal Strahler numbers, their parent has a number that is greater by one. The Strahler number of the whole tree is the number at the root node. For

n

-node random binary search trees, simulations suggest that the expected Strahler number is

\log_3 n + O(1)

. A weaker upper bound

\log_3 n + o(\log n)

has been proven.

Treaps and randomized binary search trees

In applications of binary search tree data structures, it is rare for the keys to be inserted without deletion in a random order, limiting the direct applications of random binary trees. However, algorithm designers have devised data structures that allow arbitrary insertions and deletions to preserve the property that the shape of the tree is random, as if the keys had been inserted randomly. If a given set of keys is assigned numeric priorities (unrelated to their values), these priorities may be used to construct a

for the numbers, the binary search tree that would result from inserting the keys in priority order. By choosing the priorities to be independent random real numbers in the unit interval, and by maintaining the Cartesian tree structure using

tree rotation In discrete mathematics, tree rotation is an operation on a binary tree that changes the structure without interfering with the order of the elements. A tree rotation moves one node up in the tree and one node down. It is used to change the shape ...

s after any insertion or deletion of a node, it is possible to maintain a data structure that behaves like a random binary search tree. Such a data structure is known as a

or a randomized binary search tree.; ; . Variants of the treap including the zip tree and zip-zip tree replace the tree rotations by "zipping" operations that split and merge trees, and that limit the number of random bits that need to be generated and stored alongside the keys. The result of these optimizations is still a tree with a random structure, but one that does not exactly match the random permutation model.

Uniformly random binary trees

The number of binary trees with

n

nodes is a

Catalan number The Catalan numbers are a sequence of natural numbers that occur in various Enumeration, counting problems, often involving recursion, recursively defined objects. They are named after Eugène Charles Catalan, Eugène Catalan, though they were p ...

. For

n=1,2,3,\dots

these numbers of trees are Thus, if one of these trees is selected uniformly at random, its probability is the

reciprocal Reciprocal may refer to: In mathematics * Multiplicative inverse, in mathematics, the number 1/''x'', which multiplied by ''x'' gives the product 1, also known as a ''reciprocal'' * Reciprocal polynomial, a polynomial obtained from another pol ...

of a Catalan number. Trees generated from a model in this distribution are sometimes called ''random binary Catalan trees''. They have expected depth proportional to the square root of

n

, rather than to the logarithm. More precisely, the expected depth of a randomly chosen node in an

n

-node tree of this type is The expected

of a uniformly random

n

-node binary tree is

\log_4 n+O(1)

, lower than the expected Strahler number of random binary search trees. Due to their large heights, this model of equiprobable random trees is not generally used for binary search trees. However, it has other applications, including: *Modeling the

parse tree A parse tree or parsing tree (also known as a derivation tree or concrete syntax tree) is an ordered, rooted tree that represents the syntactic structure of a string according to some context-free grammar. The term ''parse tree'' itself is use ...

s of algebraic expressions in

compiler In computing, a compiler is a computer program that Translator (computing), translates computer code written in one programming language (the ''source'' language) into another language (the ''target'' language). The name "compiler" is primaril ...

design. Here the internal nodes of the tree represent

binary operation In mathematics, a binary operation or dyadic operation is a rule for combining two elements (called operands) to produce another element. More formally, a binary operation is an operation of arity two. More specifically, a binary operation ...

s in an expression and the external nodes represent the variables or constants on which the expressions operate. The bound on Strahler number translates into the number of registers needed to evaluate an expression. *Modeling river networks, the original application for which the Strahler number was developed. *Modeling possible

evolutionary tree A phylogenetic tree or phylogeny is a graphical representation which shows the evolutionary history between a set of species or taxa during a specific time.Felsenstein J. (2004). ''Inferring Phylogenies'' Sinauer Associates: Sunderland, MA. In o ...

s for a fixed number of species. In this application, an extended binary tree is used, with the species at its external nodes. An algorithm of Jean-Luc Rémy generates a uniformly random binary tree of a specified size in time linear in the size, by the following process. Start with a tree consisting of a single external node. Then, while the current tree has not reached the target size, repeatedly choose one of its nodes (internal or external) uniformly at random. Replace the chosen node by a new internal node, having the chosen node as one of its children (equally likely left or right), and having a new external node as its other child. Stop when the target size is reached.

Branching processes

The

Galton–Watson process The Galton–Watson process, also called the Bienaymé-Galton-Watson process or the Galton-Watson branching process, is a branching stochastic process arising from Francis Galton's statistical investigation of the extinction of family names. The ...

describes a family of distributions on trees in which the number of children at each node is chosen randomly, independently of other nodes. For binary trees, two versions of the Galton–Watson process are in use, differing only in whether an extended binary tree with only one node, an external root node, is allowed: *In the version where the root node may be external, it is chosen to be internal with some specified probability

p

or external with probability

1-p

. If it is internal, its two children are trees generated recursively by the same process. *In the version where the root node must be internal, its left and right children are determined to be internal with probability

p

or external with independently of each other. In the case where they are internal, they are the roots of trees that are generated recursively by the same process. Trees generated in this way have been called ''binary Galton–Watson trees''. In the special case where

p=\tfrac12

they are called ''critical binary Galton–Watson trees''.

Analysis

The probability

p=\tfrac12

marks a

phase transition In physics, chemistry, and other related fields like biology, a phase transition (or phase change) is the physical process of transition between one state of a medium and another. Commonly the term is used to refer to changes among the basic Sta ...

for the binary Galton–Watson process: for

p\le\tfrac12

the resulting tree is almost certainly finite, whereas for

p>\tfrac12

it is infinite with positive probability. More precisely, for the probability that the tree remains finite is Another way to generate the same trees is to make a sequence of

coin flip Coin flipping, coin tossing, or heads or tails is using the thumb to make a coin go up while spinning in the air and checking which side is showing when it is down onto a surface, in order to randomly choose between two alternatives. It is a for ...

s, with probability

p

of heads and probability

1-p

of tails, until the first flip at which the number of tails exceeds the number of heads (for the model in which an external root is allowed) or exceeds one plus the number of heads (when the root must be internal), and then use this sequence of coin flips to determine the choices made by the recursive generation process, in depth-first order. Because the number of internal nodes equals the number of heads in this coin flip sequence, all trees with a given number

n

of nodes are generated from (unique) coin flip sequences of the same length, and are equally likely, regardless of

p

. That is, the choice of

p

affects the variation in the size of trees generated by this process, but for a given size the trees are generated uniformly at random. For values of

p

below the critical smaller values of

p

will produce trees with a smaller expected size, while larger values of

p

will produce trees with a larger expected size. At the critical probability

p=\tfrac12

there is no finite bound on the expected size of trees generated by this process. More precisely, for the expected number of nodes at depth

i

in the tree and the expected size of the tree can be obtained by summing the expected numbers of nodes at each depth. For

p<\tfrac12

this gives a

geometric series In mathematics, a geometric series is a series (mathematics), series summing the terms of an infinite geometric sequence, in which the ratio of consecutive terms is constant. For example, 1/2 + 1/4 + 1/8 + 1/16 + ⋯, the series \tfrac12 + \tfrac1 ...

for the expected tree size, but for

p=\tfrac12

this gives

1 + 1 + 1 + 1 + ⋯ In mathematics, , also written , , or simply , is a divergent series. Nevertheless, it is sometimes imputed to have a value of , especially in physics. This value can be justified by certain mathematical methods for obtaining values from diverge ...

, a

divergent series In mathematics, a divergent series is an infinite series that is not convergent, meaning that the infinite sequence of the partial sums of the series does not have a finite limit. If a series converges, the individual terms of the series mus ...

.For the expected number of nodes at each level of the tree, see e.g. , Section I.A.2: Moments, p. 4. any particular tree with

n

internal nodes is generated with and the probability that a random tree has this size is this probability multiplied by a Catalan number,

Applications

Galton–Watson processes were originally developed to study the spread and extinction of human

surname In many societies, a surname, family name, or last name is the mostly hereditary portion of one's personal name that indicates one's family. It is typically combined with a given name to form the full name of a person, although several give ...

s, and have been widely applied more generally to the dynamics of human or animal populations. These processes have been generalized to models where the probability of being an internal or external node at a given level of the tree (a ''generation'', in the population dynamics application) is not fixed, but depends on the number of nodes at the previous level. A version of this process, with the critical has been studied as a model for

speciation Speciation is the evolutionary process by which populations evolve to become distinct species. The biologist Orator F. Cook coined the term in 1906 for cladogenesis, the splitting of lineages, as opposed to anagenesis, phyletic evolution within ...

, where it is known as the ''critical branching process''. In this process, each species has an

exponentially distributed In probability theory and statistics, the exponential distribution or negative exponential distribution is the probability distribution of the distance between events in a Poisson point process, i.e., a process in which events occur continuous ...

lifetime, and over the course of its lifetime produces child species at a rate equal to the lifetime. When a child is produced, the parent continues as the left branch of the evolutionary tree, and the child becomes the right branch. Another application of critical Galton–Watson trees (in the version where the root must be internal) arises in the Karger–Stein algorithm for finding

minimum cut In graph theory, a minimum cut or min-cut of a graph is a cut (a partition of the vertices of a graph into two disjoint subsets) that is minimal in some metric. Variations of the minimum cut problem consider weighted graphs, directed graphs, ter ...

s in graphs, using a recursive

edge contraction In graph theory, an edge contraction is an operation that removes an edge from a graph while simultaneously merging the two vertices that it previously joined. Edge contraction is a fundamental operation in the theory of graph minors. Vertex id ...

process. This algorithm calls itself twice recursively, with each call having probability at least

\tfrac12

of preserving the correct solution value. The random tree models the subtree of correct recursive calls. The algorithm succeeds on a graph of

n

vertices whenever this random tree of correct recursive calls has a branch of depth at least

2\log_2 n

, reaching the base case of its recursion. The success probability producing one of the logarithmic factors in the algorithm's

O(n^2\log^3 n)

runtime.

Yule process

Devroye and Robson consider a related continuous-time random process in which each external node is eventually replaced by an internal node with two external children, at an exponentially distributed time after its first appearance as an external node. The number of external nodes in the tree, at any time, is modeled by a

simple birth process In probability theory, a birth process or a pure birth process is a special case of a continuous-time Markov process and a generalisation of a Poisson process. It defines a continuous process which takes values in the natural numbers and can only ...

or Yule process in which the members of a population give birth at a constant rate: giving birth to one child, in the Yule process, corresponds to being replaced by two children, in Devroye and Robson's model. If this process is stopped at any fixed time, the result is a binary tree of a random size (depending on the stopping time), distributed according to the random permutation model for that size. Devroye and Robson use this model as part of an algorithm to quickly generate trees in the random permutation model, described by their numbers of nodes at each depth rather than by their exact structure. A discrete variant of this process starts with a tree consisting of a single external node, and repeatedly replaces a randomly-chosen external node by an internal node with two external children. Again, if this is stopped at a fixed time (with a fixed size), the resulting tree is distributed according to the random permutation model for that size.

Binary tries

Another form of binary tree, the binary

or digital search tree, has a collection of binary numbers labeling some of its external nodes. The internal nodes of the tree represent

prefixes A prefix is an affix which is placed before the stem of a word. Particularly in the study of languages, a prefix is also called a preformative, because it alters the form of the word to which it is affixed. Prefixes, like other affixes, can b ...

of their binary representations that are shared by two or more of the numbers. The left and right children of an internal node are obtained by extending the corresponding prefix by one more bit, a zero or a one bit respectively. If this extension does not match any of the given numbers, or it matches only one of them, the result is an external node; otherwise it is another internal node. Random binary tries have been studied, for instance for sets of random

real number In mathematics, a real number is a number that can be used to measure a continuous one- dimensional quantity such as a duration or temperature. Here, ''continuous'' means that pairs of values can have arbitrarily small differences. Every re ...

s generated independently in the

unit interval In mathematics, the unit interval is the closed interval , that is, the set of all real numbers that are greater than or equal to 0 and less than or equal to 1. It is often denoted ' (capital letter ). In addition to its role in real analysi ...

. Despite the fact that these trees may have some empty external nodes, they tend to be better balanced than random binary search trees. For

n

uniformly random real numbers in the unit interval, or more generally for any

square-integrable In mathematics, a square-integrable function, also called a quadratically integrable function or L^2 function or square-summable function, is a real- or complex-valued measurable function for which the integral of the square of the absolute value ...

probability distribution on the unit interval, the average depth of a node is asymptotically

\log_2 n

, and the average height of the whole tree is asymptotically

2\log_2 n

. The analysis of these trees can be applied to the

computational complexity In computer science, the computational complexity or simply complexity of an algorithm is the amount of resources required to run it. Particular focus is given to computation time (generally measured by the number of needed elementary operations ...

of trie-based

sorting algorithm In computer science, a sorting algorithm is an algorithm that puts elements of a List (computing), list into an Total order, order. The most frequently used orders are numerical order and lexicographical order, and either ascending or descending ...

s. A variant of the trie, the

or compressed trie, eliminates empty external nodes and their parent internal nodes. The remaining internal nodes correspond to prefixes for which both possible extensions, by a zero or a one bit, are used by at least one of the randomly chosen numbers. For a radix tree for

n

uniformly distributed binary numbers, the shortest leaf-root path has length

\log_2 n-\log_2\log n+o(\log\log n)

and the longest leaf-root path has length

\log_2 n+\sqrt+o(\sqrt),

both

Random split trees

Luc Devroye Luc P. Devroye is a Belgian computer scientist and mathematician and a James McGill Professor in the School of Computer Science of McGill University in Montreal, Quebec, Canada. Devroye wrote around 300 mathematical articles, mostly on probabi ...

and Paul Kruszewski describe a recursive process for constructing random binary trees with

n

nodes. It generates a real-valued random variable

x

in the unit interval

(0,1)

, assigns the first

xn

nodes (rounded down to an integer number of nodes) to the left subtree, the next node to the root, and the remaining nodes to the right subtree. Then, it continues recursively using the same process in the left and right subtrees. If

x

is chosen uniformly at random in the interval, the result is the same as the random binary search tree generated by a random permutation of the nodes, as any node is equally likely to be chosen as root. However, this formulation allows other distributions to be used instead. For instance, in the uniformly random binary tree model, once a root is fixed each of its two subtrees must also be uniformly random, so the uniformly random model may also be generated by a different choice of distribution (depending on

n

) for

x

. As they show, by choosing a

beta distribution In probability theory and statistics, the beta distribution is a family of continuous probability distributions defined on the interval

, 1 The comma is a punctuation mark that appears in several variants in different languages. Some typefaces render it as a small line, slightly curved or straight, but inclined from the vertical; others give it the appearance of a miniature fille ...

or (0, 1) in terms of two positive Statistical parameter, parameters, denoted by ''alpha'' (''α'') an ...

x

and by using an appropriate choice of shape to draw each of the branches, the mathematical trees generated by this process can be used to create realistic-looking botanical trees.

Notes

References

* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * {{refend Binary trees Statistical randomness Probabilistic data structures