In computer science, weight-balanced binary trees (WBTs) are a type of self-balancing binary search trees that can be used to implement dynamic sets, dictionaries (maps) and sequences. These trees were introduced by Nievergelt and Reingold in the 1970s as trees of bounded balance, or BB[α] trees. Their more common name is due to Knuth. A well-known example is a Huffman coding of a corpus.

Like other self-balancing trees, WBTs store bookkeeping information pertaining to balance in their nodes and perform rotations to restore balance when it is disturbed by insertion or deletion operations. Specifically, each node stores the size of the subtree rooted at the node, and the sizes of the left and right subtrees are kept within some factor of each other. Unlike the balance information in AVL trees (using information about the height of subtrees) and red–black trees (which store a fictional "color" bit), the bookkeeping information in a WBT is a genuinely useful property for applications: the number of elements in a tree is equal to the size of its root, and the size information is exactly the information needed to implement the operations of an order statistic tree, viz., getting the $i$-th largest element in a set or determining an element's index in sorted order. Weight-balanced trees are popular in the functional programming community and are used to implement sets and maps in MIT Scheme, SLIB, SML-NJ, and implementations of Haskell.
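Because every node records its subtree size, both order-statistic queries can be answered in a single root-to-leaf descent. The following is a minimal Python sketch, assuming a hypothetical immutable `Node` record with a `size` field, with `None` standing for an empty subtree. `select` and `rank` are illustrative names, not a library API, and the example tree is built by hand without any rebalancing.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Node:
    key: int
    left: Optional[Node]
    right: Optional[Node]
    size: int  # number of keys in the subtree rooted here


def size(t: Optional[Node]) -> int:
    return 0 if t is None else t.size  # an empty subtree has size 0


def node(left: Optional[Node], key: int, right: Optional[Node]) -> Node:
    # The size of a node is derived from its children, as in a WBT.
    return Node(key, left, right, size(left) + size(right) + 1)


def select(t: Optional[Node], i: int) -> int:
    """Return the i-th smallest key (0-based) by comparing i with left-subtree sizes."""
    while t is not None:
        left_size = size(t.left)
        if i < left_size:
            t = t.left
        elif i == left_size:
            return t.key
        else:
            i -= left_size + 1  # skip the left subtree and this node
            t = t.right
    raise IndexError("select index out of range")


def rank(t: Optional[Node], key: int) -> int:
    """Return how many keys are smaller than `key` (its sorted index if present)."""
    smaller = 0
    while t is not None:
        if key < t.key:
            t = t.left
        elif key > t.key:
            smaller += size(t.left) + 1
            t = t.right
        else:
            return smaller + size(t.left)
    return smaller


tree = node(node(None, 1, None), 3, node(node(None, 4, None), 7, None))
print(select(tree, 2), rank(tree, 7))  # 4 3
```

`select` counts from the smallest key; under this sketch the $i$-th largest key (0-based) is `select(t, size(t) - 1 - i)`.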

Description

A weight-balanced tree is a binary search tree that stores the sizes of subtrees in the nodes. That is, a node has fields

* ''key'', of any ordered type
* ''value'' (optional, only for mappings)
* ''left'', ''right'', pointer to node
* ''size'', of type integer.

By definition, the size of a leaf (typically represented by a nil pointer) is zero. The size of an internal node is the sum of sizes of its two children, plus one: $size[n] = size[n.left] + size[n.right] + 1$. Based on the size, one defines the weight to be $weight[n] = size[n] + 1$. Weight has the advantage that the weight of a node is simply the sum of the weights of its left and right children.
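As a minimal Python sketch of these fields (the class and helper names are hypothetical, and `None` plays the role of the nil leaf), the size is computed when a node is built and the weight is derived from it, so the additivity of weights can be checked directly:

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Generic, Optional, TypeVar

K = TypeVar("K")  # any ordered key type
V = TypeVar("V")


@dataclass(frozen=True)
class Node(Generic[K, V]):
    key: K
    value: Optional[V]           # optional, only used for maps
    left: Optional[Node[K, V]]   # None represents a leaf
    right: Optional[Node[K, V]]
    size: int                    # number of internal nodes in this subtree


def size(t: Optional[Node]) -> int:
    return 0 if t is None else t.size  # a leaf has size 0


def weight(t: Optional[Node]) -> int:
    return size(t) + 1  # weight[n] = size[n] + 1


def make_node(left: Optional[Node], key, right: Optional[Node], value=None) -> Node:
    # size[n] = size[n.left] + size[n.right] + 1
    return Node(key, value, left, right, size(left) + size(right) + 1)


t = make_node(make_node(None, "a", None), "b", None, value=2)
print(size(t), weight(t), weight(t.left) + weight(t.right))  # 2 3 3
```

Storing sizes rather than weights keeps the field equal to the element count; the weight is always one more.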

Operations that modify the tree must make sure that the weights of the left and right subtrees of every node remain within some factor of each other, using the same rebalancing operations used in AVL trees: rotations and double rotations. Formally, node balance is defined as follows:

A node $n$ is $\alpha$-weight-balanced if $weight[n.left] \ge \alpha \cdot weight[n]$ and $weight[n.right] \ge \alpha \cdot weight[n]$.

Here, $\alpha$ is a numerical parameter to be determined when implementing weight-balanced trees. Larger values of $\alpha$ produce "more balanced" trees, but not all values of $\alpha$ are appropriate; Nievergelt and Reingold proved that

\alpha < 1 - \frac{1}{\sqrt{2}} \approx 0.29289

is a necessary condition for the balancing algorithm to work. Later work showed a lower bound of $2/11$ for $\alpha$, although it can be made arbitrarily small if a custom (and more complicated) rebalancing algorithm is used. Applying balancing correctly guarantees that a tree of $n$ elements will have height

h \le \log_{1/(1-\alpha)} n = \frac{\log_2 n}{\log_2\left(1/(1-\alpha)\right)} = O(\log n).
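The definition and the height bound can be checked mechanically. This Python sketch uses a hypothetical `Node` record and helper names. Height is counted in edges (a single node has height 0), the convention under which the bound holds for every $n \ge 1$: each step down multiplies the weight by at most $1-\alpha$, the root has weight $n+1$, and every internal node has weight at least 2.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Node:
    key: int
    left: Optional["Node"]
    right: Optional["Node"]
    size: int


def weight(t: Optional[Node]) -> int:
    return 1 if t is None else t.size + 1


def is_alpha_balanced(t: Optional[Node], alpha: float) -> bool:
    """True if every node n has weight(child) >= alpha * weight(n) for both children."""
    if t is None:
        return True
    w = weight(t)
    return (weight(t.left) >= alpha * w and weight(t.right) >= alpha * w
            and is_alpha_balanced(t.left, alpha) and is_alpha_balanced(t.right, alpha))


def height(t: Optional[Node]) -> int:
    """Height in edges: an empty tree is -1, a single node is 0."""
    return -1 if t is None else 1 + max(height(t.left), height(t.right))


def height_bound(n: int, alpha: float) -> float:
    """log_{1/(1-alpha)} n, the worst-case height of an alpha-balanced tree."""
    return math.log(n) / math.log(1 / (1 - alpha))


def perfect(lo: int, hi: int) -> Optional[Node]:
    """Build a perfectly balanced tree holding the keys lo..hi."""
    if lo > hi:
        return None
    mid = (lo + hi) // 2
    return Node(mid, perfect(lo, mid - 1), perfect(mid + 1, hi), hi - lo + 1)


alpha_max = 1 - 1 / math.sqrt(2)               # approx. 0.29289
print(round(height_bound(1024, alpha_max), 6))  # 20.0, i.e. 2 * log2(1024)
```

At the limiting value, $1/(1-\alpha) = \sqrt{2}$, which is why the bound collapses to $2 \log_2 n$.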

If $\alpha$ is given its maximum allowed value, then $1/(1-\alpha) = \sqrt{2}$ and the worst-case height of a weight-balanced tree is the same as that of a red–black tree:

h \le \log_{\sqrt{2}} n = 2 \log_2 n.

The number of balancing operations required in a sequence of $n$ insertions and deletions is linear in $n$, i.e., balancing takes a constant amount of overhead in an amortized sense.

While maintaining a tree with the minimum search cost requires four kinds of double rotations (LL, LR, RL, RR as in an AVL tree) in insert/delete operations, if only logarithmic performance is desired, LR and RL are the only rotations required in a single top-down pass.
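Both kinds of rotation rearrange only a constant number of nodes, so the stored sizes can be recomputed locally from the children. Here is a minimal, non-destructive Python sketch with illustrative names. Which double rotation is labelled LR or RL varies between texts, so the code names the two steps explicitly instead.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Node:
    key: int
    left: Optional[Node]
    right: Optional[Node]
    size: int


def size(t: Optional[Node]) -> int:
    return 0 if t is None else t.size


def node(left: Optional[Node], key: int, right: Optional[Node]) -> Node:
    return Node(key, left, right, size(left) + size(right) + 1)


def rotate_left(t: Node) -> Node:
    """(a, x, (b, y, c)) becomes ((a, x, b), y, c); only the two rebuilt nodes get new sizes."""
    r = t.right
    return node(node(t.left, t.key, r.left), r.key, r.right)


def rotate_right(t: Node) -> Node:
    """((a, x, b), y, c) becomes (a, x, (b, y, c))."""
    l = t.left
    return node(l.left, l.key, node(l.right, t.key, t.right))


def double_rotate_left(t: Node) -> Node:
    """Rotate the right child to the right, then rotate t to the left."""
    return rotate_left(node(t.left, t.key, rotate_right(t.right)))


def to_list(t: Optional[Node]) -> List[int]:
    return [] if t is None else to_list(t.left) + [t.key] + to_list(t.right)


t = node(node(None, 1, None), 2, node(node(None, 3, None), 4, node(None, 5, None)))
print(rotate_left(t).key, double_rotate_left(t).key, to_list(double_rotate_left(t)))
# 4 3 [1, 2, 3, 4, 5]
```

Building new nodes rather than mutating old ones matches the persistent style common in the functional implementations mentioned earlier.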

Set operations and bulk operations

Several set operations have been defined on weight-balanced trees: union, intersection and set difference. Fast ''bulk'' insertions and deletions can then be implemented on top of these set functions. These set operations rely on two helper operations, ''Split'' and ''Join''. With the new operations, the implementation of weight-balanced trees can be more efficient and highly parallelizable.

;''Join'': The function ''Join'' takes two weight-balanced trees $t_1$ and $t_2$ and a key $k$, and returns a tree containing all elements in $t_1$ and $t_2$ as well as $k$. It requires $k$ to be greater than all keys in $t_1$ and smaller than all keys in $t_2$. If the two trees are balanced with each other, ''Join'' simply creates a new node with left subtree $t_1$, root $k$ and right subtree $t_2$. Suppose that $t_1$ is heavier than $t_2$ (the other case is symmetric). ''Join'' follows the right spine of $t_1$ until it reaches a node $c$ that is balanced with $t_2$. At this point a new node with left child $c$, root $k$ and right child $t_2$ is created to replace $c$. The new node may invalidate the weight-balanced invariant. This can be fixed with a single or a double rotation, assuming

\alpha < 1 - \frac{1}{\sqrt{2}}.
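The ''Join'' step can be sketched in Python as follows. This is a non-destructive sketch under stated assumptions. It fixes $\alpha = 1/4$, taken from the range $(2/11,\ 1 - 1/\sqrt{2})$, so two weights $x$ and $y$ count as balanced exactly when $x \le 3y$ and $y \le 3x$, which keeps the arithmetic in integers. The names `join`, `join_right` and `join_left` are hypothetical. After recursing down the right spine, the code tries a single left rotation and checks both rebuilt nodes; otherwise it falls back to a double rotation.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Node:
    left: Optional[Node]
    key: int
    right: Optional[Node]
    size: int


def weight(t: Optional[Node]) -> int:
    return 1 if t is None else t.size + 1  # weight = size + 1


def node(l: Optional[Node], k: int, r: Optional[Node]) -> Node:
    return Node(l, k, r, weight(l) + weight(r) - 1)


def balanced(x: int, y: int) -> bool:
    return x <= 3 * y and y <= 3 * x  # assumed alpha = 1/4


def rotate_left(t: Node) -> Node:
    r = t.right
    return node(node(t.left, t.key, r.left), r.key, r.right)


def rotate_right(t: Node) -> Node:
    l = t.left
    return node(l.left, l.key, node(l.right, t.key, t.right))


def join_right(tl: Optional[Node], k: int, tr: Optional[Node]) -> Node:
    # Walk down the right spine of tl until a subtree is balanced with tr.
    if balanced(weight(tl), weight(tr)):
        return node(tl, k, tr)
    l, k1, t2 = tl.left, tl.key, join_right(tl.right, k, tr)
    if balanced(weight(l), weight(t2)):
        return node(l, k1, t2)
    if balanced(weight(l), weight(t2.left)) and balanced(
            weight(l) + weight(t2.left), weight(t2.right)):
        return rotate_left(node(l, k1, t2))            # single rotation
    return rotate_left(node(l, k1, rotate_right(t2)))  # double rotation


def join_left(tl: Optional[Node], k: int, tr: Optional[Node]) -> Node:
    # Mirror image of join_right: walk down the left spine of tr.
    if balanced(weight(tl), weight(tr)):
        return node(tl, k, tr)
    r, k1, t2 = tr.right, tr.key, join_left(tl, k, tr.left)
    if balanced(weight(t2), weight(r)):
        return node(t2, k1, r)
    if balanced(weight(r), weight(t2.right)) and balanced(
            weight(r) + weight(t2.right), weight(t2.left)):
        return rotate_right(node(t2, k1, r))
    return rotate_right(node(rotate_left(t2), k1, r))


def join(tl: Optional[Node], k: int, tr: Optional[Node]) -> Node:
    """All keys of tl must be < k and all keys of tr must be > k."""
    if weight(tl) > 3 * weight(tr):
        return join_right(tl, k, tr)
    if weight(tr) > 3 * weight(tl):
        return join_left(tl, k, tr)
    return node(tl, k, tr)


def to_list(t: Optional[Node]) -> List[int]:
    return [] if t is None else to_list(t.left) + [t.key] + to_list(t.right)


def is_valid(t: Optional[Node]) -> bool:
    """Check the size fields and the balance condition at every node."""
    return t is None or (t.size == weight(t.left) + weight(t.right) - 1
                         and balanced(weight(t.left), weight(t.right))
                         and is_valid(t.left) and is_valid(t.right))


t: Optional[Node] = None
for i in range(100):
    t = join(t, i, None)  # append ever-larger keys on the right
print(len(to_list(t)), is_valid(t))  # 100 True
```

Repeatedly appending on one side stresses exactly the spine walk and rotation step described in the text.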

;''Split'': To split a weight-balanced tree into two smaller trees, those smaller than key ''x'' and those larger than key ''x'', first draw a path from the root by inserting ''x'' into the tree. After this insertion, all values less than ''x'' will be found on the left of the path, and all values greater than ''x'' will be found on the right. By applying ''Join'', all the subtrees on the left side are merged bottom-up, using the keys on the path as intermediate nodes, to form the left tree; the right part is symmetric. For some applications, ''Split'' also returns a Boolean value denoting whether ''x'' appears in the tree. The cost of ''Split'' is $O(\log n)$, the order of the height of the tree. This algorithm actually has nothing to do with any special properties of a weight-balanced tree, and thus is generic to other balancing schemes such as AVL trees.

The join algorithm is as follows:

    function joinRightWB(T_L, k, T_R)
        if balance(|T_L|, |T_R|)
            return Node(T_L, k, T_R)
        else
            (l, k', c) = expose(T_L)
            T' = joinRightWB(c, k, T_R)
            (l₁, k₁, r₁) = expose(T')
            if balance(|l|, |T'|)
                return Node(l, k', T')
            else if balance(|l|, |l₁|) and balance(|l| + |l₁|, |r₁|)
                return rotateLeft(Node(l, k', T'))
            else
                return rotateLeft(Node(l, k', rotateRight(T')))

    function joinLeftWB(T_L, k, T_R)
        /* symmetric to joinRightWB */

    function join(T_L, k, T_R)
        if heavy(T_L, T_R) return joinRightWB(T_L, k, T_R)
        if heavy(T_R, T_L) return joinLeftWB(T_L, k, T_R)
        return Node(T_L, k, T_R)

Here balance(x, y) means that the two weights x and y are balanced, |T| denotes the weight of tree T, and heavy(T_1, T_2) means that T_1 is heavier than T_2 and the two are not balanced. expose(v) = (l, k, r) means to extract a tree node v's left child l, the key k of the node and the right child r. Node(l, k, r) means to create a node with left child l, key k and right child r.

The split algorithm is as follows:

    function split(T, k)
        if (T = nil) return (nil, false, nil)
        (L, m, R) = expose(T)
        if (k = m) return (L, true, R)
        if (k < m)
            (L', b, R') = split(L, k)
            return (L', b, join(R', m, R))
        if (k > m)
            (L', b, R') = split(R, k)
            return (join(L, m, L'), b, R')

The union of two weight-balanced trees t₁ and t₂ representing sets A and B is a weight-balanced tree t that represents A ∪ B. The following recursive function computes this union:

    function union(t₁, t₂)
        if t₁ = nil: return t₂
        if t₂ = nil: return t₁
        (t_<, t_>) ← split t₂ on t₁.root
        return join(union(left(t₁), t_<), t₁.root, union(right(t₁), t_>))

Here, ''Split'' is presumed to return two trees: one holding the keys less than its input key, the other holding the greater keys. (The algorithm is non-destructive, but an in-place destructive version exists as well.)

The algorithm for intersection or difference is similar, but requires the ''Join2'' helper routine that is the same as ''Join'' but without the middle key. Based on the new functions for union, intersection or difference, either one key or multiple keys can be inserted into or deleted from the weight-balanced tree. Since ''Split'' and ''Union'' call ''Join'' but do not deal with the balancing criteria of weight-balanced trees directly, such an implementation is usually called join-based. The complexity of each of union, intersection and difference is

O\left(m \log \left(\frac{n}{m}+1\right)\right)

for two weight-balanced trees of sizes $m$ and $n \ge m$. This complexity is optimal in terms of the number of comparisons. More importantly, since the recursive calls to union, intersection or difference are independent of each other, they can be executed in parallel with a parallel depth of $O(\log m \log n)$. When $m = 1$, the join-based implementation has the same computational directed acyclic graph (DAG) as single-element insertion and deletion if the root of the larger tree is used to split the smaller tree.
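Split, union and single-key insertion can all be layered on ''Join''. The Python sketch below is self-contained, so it repeats a compact ''Join''. It assumes $\alpha = 1/4$, under which two weights are balanced when neither exceeds three times the other, and its `split` returns the triple (smaller keys, found flag, larger keys) used by the split pseudocode. `insert`, `from_keys` and the other names are hypothetical helpers. Insertion here is simply a split followed by a join, one of the join-based insertions described above, not a tuned top-down insert.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class Node:
    left: Optional[Node]
    key: int
    right: Optional[Node]
    size: int

def weight(t: Optional[Node]) -> int:
    return 1 if t is None else t.size + 1

def node(l: Optional[Node], k: int, r: Optional[Node]) -> Node:
    return Node(l, k, r, weight(l) + weight(r) - 1)

def balanced(x: int, y: int) -> bool:
    return x <= 3 * y and y <= 3 * x  # assumed alpha = 1/4

def rot_l(t: Node) -> Node:
    return node(node(t.left, t.key, t.right.left), t.right.key, t.right.right)

def rot_r(t: Node) -> Node:
    return node(t.left.left, t.left.key, node(t.left.right, t.key, t.right))

def join(tl: Optional[Node], k: int, tr: Optional[Node]) -> Node:
    """Keys of tl < k < keys of tr. Descend the heavier tree's spine, then rotate."""
    if weight(tl) > 3 * weight(tr):  # tl too heavy: follow its right spine
        l, k1, t2 = tl.left, tl.key, join(tl.right, k, tr)
        if balanced(weight(l), weight(t2)):
            return node(l, k1, t2)
        if balanced(weight(l), weight(t2.left)) and balanced(
                weight(l) + weight(t2.left), weight(t2.right)):
            return rot_l(node(l, k1, t2))     # single rotation
        return rot_l(node(l, k1, rot_r(t2)))  # double rotation
    if weight(tr) > 3 * weight(tl):  # mirror image: follow tr's left spine
        r, k1, t2 = tr.right, tr.key, join(tl, k, tr.left)
        if balanced(weight(t2), weight(r)):
            return node(t2, k1, r)
        if balanced(weight(r), weight(t2.right)) and balanced(
                weight(r) + weight(t2.right), weight(t2.left)):
            return rot_r(node(t2, k1, r))
        return rot_r(node(rot_l(t2), k1, r))
    return node(tl, k, tr)

def split(t: Optional[Node], k: int) -> Tuple[Optional[Node], bool, Optional[Node]]:
    """Return (tree of keys < k, whether k is present, tree of keys > k)."""
    if t is None:
        return None, False, None
    if k == t.key:
        return t.left, True, t.right
    if k < t.key:
        lo, found, hi = split(t.left, k)
        return lo, found, join(hi, t.key, t.right)
    lo, found, hi = split(t.right, k)
    return join(t.left, t.key, lo), found, hi

def union(t1: Optional[Node], t2: Optional[Node]) -> Optional[Node]:
    if t1 is None:
        return t2
    if t2 is None:
        return t1
    lo, _, hi = split(t2, t1.key)  # split t2 on the root key of t1
    # The two recursive calls are independent (parallelizable); here they run in sequence.
    return join(union(t1.left, lo), t1.key, union(t1.right, hi))

def insert(t: Optional[Node], k: int) -> Node:
    lo, _, hi = split(t, k)  # drops an existing copy of k, if any
    return join(lo, k, hi)

def from_keys(keys: Iterable[int]) -> Optional[Node]:
    t: Optional[Node] = None
    for k in keys:
        t = insert(t, k)
    return t

def to_list(t: Optional[Node]) -> List[int]:
    return [] if t is None else to_list(t.left) + [t.key] + to_list(t.right)

def is_valid(t: Optional[Node]) -> bool:
    return t is None or (t.size == weight(t.left) + weight(t.right) - 1
                         and balanced(weight(t.left), weight(t.right))
                         and is_valid(t.left) and is_valid(t.right))

print(to_list(union(from_keys([5, 1, 3]), from_keys([4, 2, 3]))))  # [1, 2, 3, 4, 5]
```

Neither `split` nor `union` inspects balance directly; all balancing is delegated to `join`, which is what makes the scheme join-based.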

