Tree (abstract Data Type)
   HOME

TheInfoList



OR:

In
computer science Computer science is the study of computation, information, and automation. Computer science spans Theoretical computer science, theoretical disciplines (such as algorithms, theory of computation, and information theory) to Applied science, ...
, a tree is a widely used
abstract data type In computer science, an abstract data type (ADT) is a mathematical model for data types, defined by its behavior (semantics) from the point of view of a '' user'' of the data, specifically in terms of possible values, possible operations on data ...
that represents a hierarchical
tree structure A tree structure, tree diagram, or tree model is a way of representing the hierarchical nature of a structure in a graphical form. It is named a "tree structure" because the classic representation resembles a tree, although the chart is gen ...
with a set of connected nodes. Each node in the tree can be connected to many children (depending on the type of tree), but must be connected to exactly one parent, except for the ''root'' node, which has no parent (i.e., the root node as the top-most node in the tree hierarchy). These constraints mean there are no cycles or "loops" (no node can be its own ancestor), and also that each child can be treated like the root node of its own subtree, making
recursion Recursion occurs when the definition of a concept or process depends on a simpler or previous version of itself. Recursion is used in a variety of disciplines ranging from linguistics to logic. The most common application of recursion is in m ...
a useful technique for
tree traversal In computer science, tree traversal (also known as tree search and walking the tree) is a form of graph traversal and refers to the process of visiting (e.g. retrieving, updating, or deleting) each node in a Tree (data structure), tree data stru ...
. In contrast to linear data structures, many trees cannot be represented by relationships between neighboring nodes (parent and children nodes of a node under consideration, if they exist) in a single straight line (called edge or link between two adjacent nodes).
Binary tree In computer science, a binary tree is a tree data structure in which each node has at most two children, referred to as the ''left child'' and the ''right child''. That is, it is a ''k''-ary tree with . A recursive definition using set theor ...
s are a commonly used type, which constrain the number of children for each parent to at most two. When the order of the children is specified, this data structure corresponds to an ordered tree in
graph theory In mathematics and computer science, graph theory is the study of ''graph (discrete mathematics), graphs'', which are mathematical structures used to model pairwise relations between objects. A graph in this context is made up of ''Vertex (graph ...
. A value or pointer to other data may be associated with every node in the tree, or sometimes only with the ''leaf nodes'', which have no children nodes. The
abstract data type In computer science, an abstract data type (ADT) is a mathematical model for data types, defined by its behavior (semantics) from the point of view of a '' user'' of the data, specifically in terms of possible values, possible operations on data ...
(ADT) can be represented in a number of ways, including a list of parents with pointers to children, a list of children with pointers to parents, or a list of nodes and a separate list of parent-child relations (a specific type of
adjacency list In graph theory and computer science, an adjacency list is a collection of unordered lists used to represent a finite graph. Each unordered list within an adjacency list describes the set of neighbors of a particular vertex in the graph. This ...
). Representations might also be more complicated, for example using
indexes Index (: indexes or indices) may refer to: Arts, entertainment, and media Fictional entities * Index (A Certain Magical Index), Index (''A Certain Magical Index''), a character in the light novel series ''A Certain Magical Index'' * The Index, a ...
or ancestor lists for performance. Trees as used in computing are similar to but can be different from mathematical constructs of trees in graph theory, trees in set theory, and trees in descriptive set theory.


Terminology

A
node In general, a node is a localized swelling (a "knot") or a point of intersection (a vertex). Node may refer to: In mathematics * Vertex (graph theory), a vertex in a mathematical graph *Vertex (geometry), a point where two or more curves, lines ...
is a structure which may contain data and connections to other nodes, sometimes called edges or links. Each node in a tree has zero or more child nodes, which are below it in the tree (by convention, trees are drawn with ''descendants'' going downwards). A node that has a child is called the child's parent node (or superior). All nodes have exactly one parent, except the topmost root node, which has none. A node might have many ancestor nodes, such as the parent's parent. Child nodes with the same parent are sibling nodes. Typically siblings have an order, with the first one conventionally drawn on the left. Some definitions allow a tree to have no nodes at all, in which case it is called ''empty''. An internal node (also known as an inner node, inode for short, or branch node) is any node of a tree that has child nodes. Similarly, an external node (also known as an outer node, leaf node, or terminal node) is any node that does not have child nodes. The height of a node is the length of the longest downward path to a leaf from that node. The height of the root is the height of the tree. The depth of a node is the length of the path to its root (i.e., its ''root path''). Thus the root node has depth zero, leaf nodes have height zero, and a tree with only a single node (hence both a root and leaf) has depth and height zero. Conventionally, an empty tree (tree with no nodes, if such are allowed) has height −1. Each non-root node can be treated as the root node of its own subtree, which includes that node and all its descendants. Other terms used with trees:


Common operations

* Enumerating all the items * Enumerating a section of a tree * Searching for an item * Adding a new item at a certain position on the tree * Deleting an item * Pruning: Removing a whole section of a tree *
Grafting Grafting or graftage is a horticulture, horticultural technique whereby tissues of plants are joined so as to continue their growth together. The upper part of the combined plant is called the scion () while the lower part is called the roots ...
: Adding a whole section to a tree * Finding the root for any node * Finding the
lowest common ancestor In graph theory and computer science, the lowest common ancestor (LCA) (also called least common ancestor) of two nodes and in a Tree (graph theory), tree or directed acyclic graph (DAG) is the lowest (i.e. deepest) node that has both and a ...
of two nodes


Traversal and search methods

Stepping through the items of a tree, by means of the connections between parents and children, is called walking the tree, and the action is a ''walk'' of the tree. Often, an operation might be performed when a pointer arrives at a particular node. A walk in which each parent node is traversed before its children is called a pre-order walk; a walk in which the children are traversed before their respective parents are traversed is called a post-order walk; a walk in which a node's left subtree, then the node itself, and finally its right subtree are traversed is called an in-order traversal. (This last scenario, referring to exactly two subtrees, a left subtree and a right subtree, assumes specifically a
binary tree In computer science, a binary tree is a tree data structure in which each node has at most two children, referred to as the ''left child'' and the ''right child''. That is, it is a ''k''-ary tree with . A recursive definition using set theor ...
.) A level-order walk effectively performs a
breadth-first search Breadth-first search (BFS) is an algorithm for searching a tree data structure for a node that satisfies a given property. It starts at the tree root and explores all nodes at the present depth prior to moving on to the nodes at the next dept ...
over the entirety of a tree; nodes are traversed level by level, where the root node is visited first, followed by its direct child nodes and their siblings, followed by its grandchild nodes and their siblings, etc., until all nodes in the tree have been traversed.


Representations

There are many different ways to represent trees. In working memory, nodes are typically dynamically allocated records with pointers to their children, their parents, or both, as well as any associated data. If of a fixed size, the nodes might be stored in a list. Nodes and relationships between nodes might be stored in a separate special type of
adjacency list In graph theory and computer science, an adjacency list is a collection of unordered lists used to represent a finite graph. Each unordered list within an adjacency list describes the set of neighbors of a particular vertex in the graph. This ...
. In
relational database A relational database (RDB) is a database based on the relational model of data, as proposed by E. F. Codd in 1970. A Relational Database Management System (RDBMS) is a type of database management system that stores data in a structured for ...
s, nodes are typically represented as table rows, with indexed row IDs facilitating pointers between parents and children. Nodes can also be stored as items in an array, with relationships between them determined by their positions in the array (as in a
binary heap A binary heap is a heap (data structure), heap data structure that takes the form of a binary tree. Binary heaps are a common way of implementing priority queues. The binary heap was introduced by J. W. J. Williams in 1964 as a data structure fo ...
). A
binary tree In computer science, a binary tree is a tree data structure in which each node has at most two children, referred to as the ''left child'' and the ''right child''. That is, it is a ''k''-ary tree with . A recursive definition using set theor ...
can be implemented as a list of lists: the head of a list (the value of the first term) is the left child (subtree), while the tail (the list of second and subsequent terms) is the right child (subtree). This can be modified to allow values as well, as in Lisp
S-expression In computer programming, an S-expression (or symbolic expression, abbreviated as sexpr or sexp) is an expression in a like-named notation for nested List (computing), list (Tree (data structure), tree-structured) data. S-expressions were invented ...
s, where the head (value of first term) is the value of the node, the head of the tail (value of second term) is the left child, and the tail of the tail (list of third and subsequent terms) is the right child. Ordered trees can be naturally encoded by finite sequences, for example with natural numbers.


Examples of trees and non-trees


Type theory

As an
abstract data type In computer science, an abstract data type (ADT) is a mathematical model for data types, defined by its behavior (semantics) from the point of view of a '' user'' of the data, specifically in terms of possible values, possible operations on data ...
, the abstract tree type with values of some type is defined, using the abstract forest type (list of trees), by the functions: : value: → : children: → : nil: () → : node: × → with the axioms: : value(node(, )) = : children(node(, )) = In terms of
type theory In mathematics and theoretical computer science, a type theory is the formal presentation of a specific type system. Type theory is the academic study of type systems. Some type theories serve as alternatives to set theory as a foundation of ...
, a tree is an
inductive type In type theory, a system has inductive types if it has facilities for creating a new type from constants and functions that create terms of that type. The feature serves a role similar to data structures in a programming language and allows a ty ...
defined by the constructors (empty forest) and (tree with root node with given value and children).


Mathematical terminology

Viewed as a whole, a tree data structure is an ordered tree, generally with values attached to each node. Concretely, it is (if required to be non-empty): * A rooted tree with the "away from root" direction (a more narrow term is an " arborescence"), meaning: ** A directed graph, ** whose underlying
undirected graph In discrete mathematics, particularly in graph theory, a graph is a structure consisting of a set of objects where some pairs of the objects are in some sense "related". The objects are represented by abstractions called '' vertices'' (also call ...
is a
tree In botany, a tree is a perennial plant with an elongated stem, or trunk, usually supporting branches and leaves. In some usages, the definition of a tree may be narrower, e.g., including only woody plants with secondary growth, only ...
(any two vertices are connected by exactly one simple path), ** with a distinguished root (one vertex is designated as the root), ** which determines the direction on the edges (arrows point away from the root; given an edge, the node that the edge points from is called the ''parent'' and the node that the edge points to is called the ''child''), together with: * an ordering on the child nodes of a given node, and * a value (of some data type) at each node. Often trees have a fixed (more properly, bounded)
branching factor In computing, tree data structures, and game theory, the branching factor is the number of children at each node, the outdegree. If this value is not uniform, an ''average branching factor'' can be calculated. For example, in chess, if a "node ...
( outdegree), particularly always having two child nodes (possibly empty, hence ''at most'' two ''non-empty'' child nodes), hence a "binary tree". Allowing empty trees makes some definitions simpler, some more complicated: a rooted tree must be non-empty, hence if empty trees are allowed the above definition instead becomes "an empty tree or a rooted tree such that ...". On the other hand, empty trees simplify defining fixed branching factor: with empty trees allowed, a binary tree is a tree such that every node has exactly two children, each of which is a tree (possibly empty).


Applications

Trees are commonly used to represent or manipulate
hierarchical A hierarchy (from Greek: , from , 'president of sacred rites') is an arrangement of items (objects, names, values, categories, etc.) that are represented as being "above", "below", or "at the same level as" one another. Hierarchy is an importan ...
data in applications such as: * File systems for: **
Directory structure In computing, a directory structure is the way an operating system arranges files that are accessible to the user. Files are typically displayed in a hierarchical tree structure. File names and extensions A filename is a string used to uniquely ...
used to organize subdirectories and files (
symbolic link In computing, a symbolic link (also symlink or soft link) is a file whose purpose is to point to a file or directory (called the "target") by specifying a path thereto. Symbolic links are supported by POSIX and by most Unix-like operating syste ...
s create non-tree graphs, as do multiple
hard link In computing, a hard link is a directory entry (in a Directory (computing), directory-based file system) that associates a name with a Computer file, file. Thus, each file must have at least one hard link. Creating additional hard links for a fil ...
s to the same file or directory) ** The mechanism used to allocate and link blocks of data on the storage device *
Class hierarchy A class hierarchy or inheritance tree in computer science is a classification of object types, denoting objects as the instantiations of classes (class is like a blueprint, the object is what is built from that blueprint) inter-relating the var ...
or "inheritance tree" showing the relationships among classes in
object-oriented programming Object-oriented programming (OOP) is a programming paradigm based on the concept of '' objects''. Objects can contain data (called fields, attributes or properties) and have actions they can perform (called procedures or methods and impl ...
;
multiple inheritance Multiple inheritance is a feature of some object-oriented computer programming languages in which an object or class can inherit features from more than one parent object or parent class. It is distinct from single inheritance, where an object ...
produces non-tree graphs *
Abstract syntax tree An abstract syntax tree (AST) is a data structure used in computer science to represent the structure of a program or code snippet. It is a tree representation of the abstract syntactic structure of text (often source code) written in a formal ...
s for computer languages *
Natural language processing Natural language processing (NLP) is a subfield of computer science and especially artificial intelligence. It is primarily concerned with providing computers with the ability to process data encoded in natural language and is thus closely related ...
: **
Parse tree A parse tree or parsing tree (also known as a derivation tree or concrete syntax tree) is an ordered, rooted tree that represents the syntactic structure of a string according to some context-free grammar. The term ''parse tree'' itself is use ...
s ** Modeling utterances in a
generative grammar Generative grammar is a research tradition in linguistics that aims to explain the cognitive basis of language by formulating and testing explicit models of humans' subconscious grammatical knowledge. Generative linguists, or generativists (), ...
**
Dialogue tree A dialogue tree, or conversation tree, is a gameplay mechanic that is used throughout many adventure games (including action-adventure games) and role-playing video games. When interacting with a non-player character, the player is given a choice ...
for generating conversations *
Document Object Model The Document Object Model (DOM) is a cros s-platform and language-independent API that treats an HTML or XML document as a tree structure wherein each node is an object representing a part of the document. The DOM represents a document with ...
s ("DOM tree") of
XML Extensible Markup Language (XML) is a markup language and file format for storing, transmitting, and reconstructing data. It defines a set of rules for encoding electronic document, documents in a format that is both human-readable and Machine-r ...
and
HTML Hypertext Markup Language (HTML) is the standard markup language for documents designed to be displayed in a web browser. It defines the content and structure of web content. It is often assisted by technologies such as Cascading Style Sheets ( ...
documents *
Search tree In computer science, a search tree is a tree data structure used for locating specific keys from within a set. In order for a tree to function as a search tree, the key for each node must be greater than any keys in subtrees on the left, and les ...
s store data in a way that makes an efficient
search algorithm In computer science, a search algorithm is an algorithm designed to solve a search problem. Search algorithms work to retrieve information stored within particular data structure, or calculated in the Feasible region, search space of a problem do ...
possible via
tree traversal In computer science, tree traversal (also known as tree search and walking the tree) is a form of graph traversal and refers to the process of visiting (e.g. retrieving, updating, or deleting) each node in a Tree (data structure), tree data stru ...
** A
binary search tree In computer science, a binary search tree (BST), also called an ordered or sorted binary tree, is a Rooted tree, rooted binary tree data structure with the key of each internal node being greater than all the keys in the respective node's left ...
is a type of
binary tree In computer science, a binary tree is a tree data structure in which each node has at most two children, referred to as the ''left child'' and the ''right child''. That is, it is a ''k''-ary tree with . A recursive definition using set theor ...
* Representing sorted lists of data *
Computer-generated imagery Computer-generated imagery (CGI) is a specific-technology or application of computer graphics for creating or improving images in Digital art, art, Publishing, printed media, Training simulation, simulators, videos and video games. These images ...
: ** Space partitioning, including
binary space partitioning In computer science, binary space partitioning (BSP) is a method for space partitioning which recursively subdivides a Euclidean space into two convex sets by using hyperplanes as partitions. This process of subdividing gives rise to a representa ...
**
Digital compositing Digital compositing is the process of digitally assembling multiple images to make a final image, typically for print, motion pictures or screen display. It is the digital analogue of optical film compositing. It's part of VFX processing. Ma ...
* Storing Barnes–Hut trees used to simulate galaxies * Implementing heaps *
Nested set collection A nested set collection or nested set family is a collection of sets that consists of chains of subsets forming a hierarchical structure, like Matryoshka doll, Russian dolls. It is used as reference concept in hierarchy, scientific hierarchy def ...
s * Hierarchical taxonomies such as the
Dewey Decimal Classification The Dewey Decimal Classification (DDC) (pronounced ) colloquially known as the Dewey Decimal System, is a proprietary library classification system which allows new books to be added to a library in their appropriate location based on subject. ...
with sections of increasing specificity. * Hierarchical temporal memory *
Genetic programming Genetic programming (GP) is an evolutionary algorithm, an artificial intelligence technique mimicking natural evolution, which operates on a population of programs. It applies the genetic operators selection (evolutionary algorithm), selection a ...
*
Hierarchical clustering In data mining and statistics, hierarchical clustering (also called hierarchical cluster analysis or HCA) is a method of cluster analysis that seeks to build a hierarchy of clusters. Strategies for hierarchical clustering generally fall into two ...
Trees can be used to represent and manipulate various mathematical structures, such as: * Paths through an arbitrary node-and-edge graph (including
multigraph In mathematics, and more specifically in graph theory, a multigraph is a graph which is permitted to have multiple edges (also called ''parallel edges''), that is, edges that have the same end nodes. Thus two vertices may be connected by mor ...
s), by making multiple nodes in the tree for each graph node used in multiple paths * Any mathematical hierarchy Tree structures are often used for mapping the relationships between things, such as: * Components and subcomponents which can be visualized in an exploded-view drawing *
Subroutine call In computer programming, a function (also procedure, method, subroutine, routine, or subprogram) is a callable unit of software logic that has a well-defined interface and behavior and can be invoked multiple times. Callable units provide a p ...
s used to identify which subroutines in a program call other subroutines non recursively * Inheritance of DNA among species by
evolution Evolution is the change in the heritable Phenotypic trait, characteristics of biological populations over successive generations. It occurs when evolutionary processes such as natural selection and genetic drift act on genetic variation, re ...
, of source code by software projects (e.g.
Linux distribution timeline Linux ( ) is a family of open source Unix-like operating systems based on the Linux kernel, an kernel (operating system), operating system kernel first released on September 17, 1991, by Linus Torvalds. Linux is typically package manager, pac ...
), of designs in various types of cars, etc. * The contents of hierarchical
namespace In computing, a namespace is a set of signs (''names'') that are used to identify and refer to objects of various kinds. A namespace ensures that all of a given set of objects have unique names so that they can be easily identified. Namespaces ...
s
JSON JSON (JavaScript Object Notation, pronounced or ) is an open standard file format and electronic data interchange, data interchange format that uses Human-readable medium and data, human-readable text to store and transmit data objects consi ...
and YAML documents can be thought of as trees, but are typically represented by nested lists and
dictionaries A dictionary is a listing of lexemes from the lexicon of one or more specific languages, often arranged Alphabetical order, alphabetically (or by Semitic root, consonantal root for Semitic languages or radical-and-stroke sorting, radical an ...
.


See also

*
Distributed tree search Distributed tree search (DTS) algorithm is a class of algorithms for searching values in an efficient and distributed manner. Their purpose is to iterate through a tree by working along multiple branches in parallel and merging the results of each ...
* :Trees (data structures) (catalogs types of computational trees)


Notes


References


Further reading

*
Donald Knuth Donald Ervin Knuth ( ; born January 10, 1938) is an American computer scientist and mathematician. He is a professor emeritus at Stanford University. He is the 1974 recipient of the ACM Turing Award, informally considered the Nobel Prize of comp ...
. ''
The Art of Computer Programming ''The Art of Computer Programming'' (''TAOCP'') is a comprehensive multi-volume monograph written by the computer scientist Donald Knuth presenting programming algorithms and their analysis. it consists of published volumes 1, 2, 3, 4A, and 4 ...
: Fundamental Algorithms'', Third Edition. Addison-Wesley, 1997. . Section 2.3: Trees, pp. 308–423. * Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and
Clifford Stein Clifford Seth Stein (born December 14, 1965), a computer scientist, is a professor of industrial engineering and operations research at Columbia University in New York, NY, where he also holds an appointment in the Department of Computer Scien ...
. ''
Introduction to Algorithms ''Introduction to Algorithms'' is a book on computer programming by Thomas H. Cormen, Charles E. Leiserson, Ron Rivest, Ronald L. Rivest, and Clifford Stein. The book is described by its publisher as "the leading algorithms text in universities w ...
'', Second Edition. MIT Press and McGraw-Hill, 2001. . Section 10.4: Representing rooted trees, pp. 214–217. Chapters 12–14 (Binary Search Trees, Red–Black Trees, Augmenting Data Structures), pp. 253–320.


External links


Description
from the Dictionary of Algorithms and Data Structures {{DEFAULTSORT:Tree (Data Structure) Data types Knowledge representation Abstract data types de:Baum (Graphentheorie)