HOME

TheInfoList



OR:

In computer science, a tagged union, also called a variant, variant record, choice type, discriminated union,
disjoint union In mathematics, a disjoint union (or discriminated union) of a family of sets (A_i : i\in I) is a set A, often denoted by \bigsqcup_ A_i, with an injection of each A_i into A, such that the images of these injections form a partition of A ( ...
, sum type or
coproduct In category theory, the coproduct, or categorical sum, is a construction which includes as examples the disjoint union of sets and of topological spaces, the free product of groups, and the direct sum of modules and vector spaces. The coprodu ...
, is a
data structure In computer science, a data structure is a data organization, management, and storage format that is usually chosen for efficient access to data. More precisely, a data structure is a collection of data values, the relationships among them, a ...
used to hold a value that could take on several different, but fixed, types. Only one of the types can be in use at any one time, and a tag field explicitly indicates which one is in use. It can be thought of as a type that has several "cases", each of which should be handled correctly when that type is manipulated. This is critical in defining recursive datatypes, in which some component of a value may have the same type as the value itself, for example in defining a type for representing trees, where it is necessary to distinguish multi-node subtrees and leaves. Like ordinary unions, tagged unions can save storage by overlapping storage areas for each type, since only one is in use at a time.


Description

Tagged unions are most important in
functional language In computer science, functional programming is a programming paradigm where programs are constructed by applying and composing functions. It is a declarative programming paradigm in which function definitions are trees of expressions that ...
s such as ML and
Haskell Haskell () is a general-purpose, statically-typed, purely functional programming language with type inference and lazy evaluation. Designed for teaching, research and industrial applications, Haskell has pioneered a number of programming lang ...
, where they are called datatypes (see
algebraic data type In computer programming, especially functional programming and type theory, an algebraic data type (ADT) is a kind of composite type, i.e., a type formed by combining other types. Two common classes of algebraic types are product types (i.e., ...
) and the compiler is able to verify that all cases of a tagged union are always handled, avoiding many types of errors. They can, however, be constructed in nearly any
language Language is a structured system of communication. The structure of a language is its grammar and the free components are its vocabulary. Languages are the primary means by which humans communicate, and may be conveyed through a variety of met ...
, and are much safer than untagged unions, often simply called unions, which are similar but do not explicitly keep track of which member of the union is currently in use. Tagged unions are often accompanied by the concept of a
type constructor In the area of mathematical logic and computer science known as type theory, a type constructor is a feature of a typed formal language that builds new types from old ones. Basic types are considered to be built using nullary type constructors. So ...
, which is similar but not the same as a constructor for a class. Type constructors produce a tagged union type, given the initial tag type and the corresponding type. Mathematically, tagged unions correspond to '' disjoint'' or ''discriminated unions'', usually written using +. Given an element of a disjoint union ''A'' + ''B'', it is possible to determine whether it came from ''A'' or ''B''. If an element lies in both, there will be two effectively distinct copies of the value in ''A'' + ''B'', one from ''A'' and one from ''B''. In
type theory In mathematics, logic, and computer science, a type theory is the formal presentation of a specific type system, and in general type theory is the academic study of type systems. Some type theories serve as alternatives to set theory as a found ...
, a tagged union is called a sum type. Sum types are the dual of
product type In programming languages and type theory, a product of ''types'' is another, compounded, type in a structure. The "operands" of the product are types, and the structure of a product type is determined by the fixed order of the operands in the prod ...
s. Notations vary, but usually the sum type comes with two introduction forms ( injections) and The elimination form is case analysis, known as
pattern matching In computer science, pattern matching is the act of checking a given sequence of tokens for the presence of the constituents of some pattern. In contrast to pattern recognition, the match usually has to be exact: "either it will or will not be ...
in ML-style programming languages: if has type and and have type \tau under the assumptions and respectively, then the term \mathsf\ e\ \mathsf\ x \Rightarrow e_1 \mid y \Rightarrow e_2 has type \tau. The sum type corresponds to intuitionistic
logical disjunction In logic, disjunction is a logical connective typically notated as \lor and read aloud as "or". For instance, the English language sentence "it is raining or it is snowing" can be represented in logic using the disjunctive formula R \lor S ...
under the Curry–Howard correspondence. An
enumerated type In computer programming, an enumerated type (also called enumeration, enum, or factor in the R programming language, and a categorical variable in statistics) is a data type consisting of a set of named values called ''elements'', ''members'', '' ...
can be seen as a degenerate case: a tagged union of
unit type In the area of mathematical logic and computer science known as type theory, a unit type is a type that allows only one value (and thus can hold no information). The carrier (underlying set) associated with a unit type can be any singleton set. ...
s. It corresponds to a set of nullary constructors and may be implemented as a simple tag variable, since it holds no additional data besides the value of the tag. Many programming techniques and data structures, including
rope A rope is a group of yarns, plies, fibres, or strands that are twisted or braided together into a larger and stronger form. Ropes have tensile strength and so can be used for dragging and lifting. Rope is thicker and stronger than similarly ...
,
lazy evaluation In programming language theory, lazy evaluation, or call-by-need, is an evaluation strategy which delays the evaluation of an expression until its value is needed ( non-strict evaluation) and which also avoids repeated evaluations (sharing). The ...
,
class hierarchy A class hierarchy or inheritance tree in computer science is a classification of object types, denoting objects as the instantiations of classes (class is like a blueprint, the object is what is built from that blueprint) inter-relating the vari ...
(see below),
arbitrary-precision arithmetic In computer science, arbitrary-precision arithmetic, also called bignum arithmetic, multiple-precision arithmetic, or sometimes infinite-precision arithmetic, indicates that calculations are performed on numbers whose digits of precision are li ...
,
CDR coding In computer science CDR coding is a compressed data representation for Lisp linked lists. It was developed and patented by the MIT Artificial Intelligence Laboratory, and implemented in computer hardware in a number of Lisp machines derived from ...
, the
indirection bit Addressing modes are an aspect of the instruction set architecture in most central processing unit (CPU) designs. The various addressing modes that are defined in a given instruction set architecture define how the machine language instructions i ...
and other kinds of
tagged pointer In computer science, a tagged pointer is a pointer (concretely a memory address) with additional data associated with it, such as an indirection bit or reference count. This additional data is often "folded" into the pointer, meaning stored inline ...
s, etc. are usually implemented using some sort of tagged union. A tagged union can be seen as the simplest kind of
self-describing In computer programming, self-documenting (or self-describing) source code and user interfaces follow naming conventions and structured programming conventions that enable use of the system without prior specific knowledge. In web development, s ...
data format. The tag of the tagged union can be seen as the simplest kind of
metadata Metadata is "data that provides information about other data", but not the content of the data, such as the text of a message or the image itself. There are many distinct types of metadata, including: * Descriptive metadata – the descriptive ...
.


Advantages and disadvantages

The primary advantage of a tagged union over an untagged union is that all accesses are safe, and the compiler can even check that all cases are handled. Untagged unions depend on program logic to correctly identify the currently active field, which may result in strange behavior and hard-to-find bugs if that logic fails. The primary advantage of a tagged union over a simple record containing a field for each type is that it saves storage by overlapping storage for all the types. Some implementations reserve enough storage for the largest type, while others dynamically adjust the size of a tagged union value as needed. When the value is
immutable In object-oriented and functional programming, an immutable object (unchangeable object) is an object whose state cannot be modified after it is created.Goetz et al. ''Java Concurrency in Practice''. Addison Wesley Professional, 2006, Section 3. ...
, it is simple to allocate just as much storage as is needed. The main disadvantage of tagged unions is that the tag occupies space. Since there are usually a small number of alternatives, the tag can often be squeezed into 2 or 3 bits wherever space can be found, but sometimes even these bits are not available. In this case, a helpful alternative may be folded, computed or encoded tags, where the tag value is dynamically computed from the contents of the union field. Common examples of this are the use of ''reserved values'', where, for example, a function returning a positive number may return -1 to indicate failure, and
sentinel value In computer programming, a sentinel value (also referred to as a flag value, trip value, rogue value, signal value, or dummy data) is a special value in the context of an algorithm which uses its presence as a condition of termination, typically ...
s, most often used in
tagged pointer In computer science, a tagged pointer is a pointer (concretely a memory address) with additional data associated with it, such as an indirection bit or reference count. This additional data is often "folded" into the pointer, meaning stored inline ...
s. Sometimes, untagged unions are used to perform bit-level conversions between types, called reinterpret casts in C++. Tagged unions are not intended for this purpose; typically a new value is assigned whenever the tag is changed. Many languages support, to some extent, a universal data type, which is a type that includes every value of every other type, and often a way is provided to test the actual type of a value of the universal type. These are sometimes referred to as ''variants''. While universal data types are comparable to tagged unions in their formal definition, typical tagged unions include a relatively small number of cases, and these cases form different ways of expressing a single coherent concept, such as a data structure node or instruction. Also, there is an expectation that every possible case of a tagged union will be dealt with when it is used. The values of a universal data type are not related and there is no feasible way to deal with them all. Like
option type In programming languages (especially functional programming languages) and type theory, an option type or maybe type is a polymorphic type that represents encapsulation of an optional value; e.g., it is used as the return type of functions whic ...
s and
exception handling In computing and computer programming, exception handling is the process of responding to the occurrence of ''exceptions'' – anomalous or exceptional conditions requiring special processing – during the execution of a program. In general, a ...
, tagged unions are sometimes used to handle the occurrence of exceptional results. Often these tags are folded into the type as "reserved values", and their occurrence is not consistently checked: this is a fairly common source of programming errors. This use of tagged unions can be formalized as a
monad Monad may refer to: Philosophy * Monad (philosophy), a term meaning "unit" **Monism, the concept of "one essence" in the metaphysical and theological theory ** Monad (Gnosticism), the most primal aspect of God in Gnosticism * ''Great Monad'', an ...
with the following functions: :\text\colon A \to \left( A + E \right) = a \mapsto \text \, a :\text\colon \left( A + E \right) \to \left(A \to \left(B + E \right) \right) \to \left( B + E \right) = a \mapsto f \mapsto \begin \text \, e & \text \ a = \text \, e\\ f \, a' & \text \ a = \text \, a' \end where "value" and "err" are the constructors of the union type, ''A'' and ''B'' are valid result types and ''E'' is the type of error conditions. Alternately, the same monad may be described by ''return'' and two additional functions, ''fmap'' and ''join'': :\text \colon (A \to B) \to \left( \left( A + E \right) \to \left( B + E \right) \right) = f \mapsto a \mapsto \begin \text \, e & \text \ a = \text \, e \\ \text \, \text \, f \, a' \, \text & \text \ a = \text \, a' \end :\text \colon ((A + E ) + E) \to (A + E) = a \mapsto \begin \text \, e & \mbox \ a = \text \, e\\ \text \, e & \text \ a = \text \, \text \, e \, \text \\ \text \, a' & \text \ a = \text \, \text \, a' \, \text \end


Examples

Say we wanted to build a
binary tree In computer science, a binary tree is a k-ary k = 2 tree data structure in which each node has at most two children, which are referred to as the ' and the '. A recursive definition using just set theory notions is that a (non-empty) binary tr ...
of integers. In ML, we would do this by creating a datatype like this: datatype tree = Leaf , Node of (int * tree * tree) This is a tagged union with two cases: one, the leaf, is used to terminate a path of the tree, and functions much like a null value would in imperative languages. The other branch holds a node, which contains an integer and a left and right subtree. Leaf and Node are the constructors, which enable us to actually produce a particular tree, such as: Node(5, Node(1, Leaf, Leaf), Node(3, Leaf, Node(4, Leaf, Leaf))) which corresponds to this tree: Now we can easily write a typesafe function that, say, counts the number of nodes in the tree: fun countNodes(Leaf) = 0 , countNodes(Node(int, left, right)) = 1 + countNodes(left) + countNodes(right)


Timeline of language support


1960s

In
ALGOL 68 ALGOL 68 (short for ''Algorithmic Language 1968'') is an imperative programming language that was conceived as a successor to the ALGOL 60 programming language, designed with the goal of a much wider scope of application and more rigorously de ...
, tagged unions are called ''united modes'', the tag is implicit, and the case construct is used to determine which field is tagged: mode node = union (real, int, compl, string); Usage example for union case of node: node n := "1234";   case n in (real r): print(("real:", r)), (int i): print(("int:", i)), (compl c): print(("compl:", c)), (string s): print(("string:", s)) out print(("?:", n)) esac


1970s & 1980s

Although primarily only
functional languages In computer science, functional programming is a programming paradigm where programs are constructed by applying and composing functions. It is a declarative programming paradigm in which function definitions are trees of expressions that ...
such as ML (from the 1970s) and
Haskell Haskell () is a general-purpose, statically-typed, purely functional programming language with type inference and lazy evaluation. Designed for teaching, research and industrial applications, Haskell has pioneered a number of programming lang ...
(from the 1990s) give a central role to tagged unions and have the power to check that all cases are handled, other languages have support for tagged unions as well. However, in practice they can be less efficient in non-functional languages due to optimizations enabled by functional language compilers that can eliminate explicit tag checks and avoid explicit storage of tags. Pascal, Ada, and
Modula-2 Modula-2 is a structured, procedural programming language developed between 1977 and 1985/8 by Niklaus Wirth at ETH Zurich. It was created as the language for the operating system and application software of the Lilith personal workstation. It ...
call them variant records (formally discriminated type in Ada), and require the tag field to be manually created and the tag values specified, as in this Pascal example: type shapeKind = (square, rectangle, circle); shape = record centerx : integer; centery : integer; case kind : shapeKind of square : (side : integer); rectangle : (width, height : integer); circle : (radius : integer); end; and this Ada equivalent: type Shape_Kind is (Square, Rectangle, Circle); type Shape (Kind : Shape_Kind) is record Center_X : Integer; Center_Y : Integer; case Kind is when Square => Side : Integer; when Rectangle => Width, Height : Integer; when Circle => Radius : Integer; end case; end record; -- Any attempt to access a member whose existence depends -- on a particular value of the discriminant, while the -- discriminant is not the expected one, raises an error. In C and
C++ C, or c, is the third letter in the Latin alphabet, used in the modern English alphabet, the alphabets of other western European languages and others worldwide. Its name in English is ''cee'' (pronounced ), plural ''cees''. History "C" ...
, a tagged union can be created from untagged unions using a strict access discipline where the tag is always checked: enum ShapeKind ; struct Shape ; int getSquareSide(struct Shape* s) void setSquareSide(struct Shape* s, int side) /* and so on */ As long as the union fields are only accessed through the functions, the accesses will be safe and correct. The same approach can be used for encoded tags; we simply decode the tag and then check it on each access. If the inefficiency of these tag checks is a concern, they may be automatically removed in the final version. C and C++ also have language support for one particular tagged union: the possibly-null pointer. This may be compared to the option type in ML or the Maybe type in Haskell, and can be seen as a
tagged pointer In computer science, a tagged pointer is a pointer (concretely a memory address) with additional data associated with it, such as an indirection bit or reference count. This additional data is often "folded" into the pointer, meaning stored inline ...
: a tagged union (with an encoded tag) of two types: * Valid pointers, * A
null pointer In computing, a null pointer or null reference is a value saved for indicating that the pointer or reference does not refer to a valid object. Programs routinely use null pointers to represent conditions such as the end of a list of unknown leng ...
type with only one value, null, indicating an exceptional condition. Unfortunately, C compilers do not verify that the null case is always handled, and this is a particularly prevalent source of errors in C code, since there is a tendency to ignore exceptional cases.


2000s

One advanced dialect of C called
Cyclone In meteorology, a cyclone () is a large air mass that rotates around a strong center of low atmospheric pressure, counterclockwise in the Northern Hemisphere and clockwise in the Southern Hemisphere as viewed from above (opposite to an ant ...
has extensive built-in support for tagged unions. The enum types in the
Rust Rust is an iron oxide, a usually reddish-brown oxide formed by the reaction of iron and oxygen in the catalytic presence of water or air moisture. Rust consists of hydrous iron(III) oxides (Fe2O3·nH2O) and iron(III) oxide-hydroxide (FeO(OH) ...
,
Haxe Haxe is an open source high-level cross-platform programming language and compiler that can produce applications and source code, for many different computing platforms from one code-base. It is free and open-source software, released under ...
and
Swift Swift or SWIFT most commonly refers to: * SWIFT, an international organization facilitating transactions between banks ** SWIFT code * Swift (programming language) * Swift (bird), a family of birds It may also refer to: Organizations * SWIFT, ...
languages also work as tagged unions. The variant library from
Boost Boost, boosted or boosting may refer to: Science, technology and mathematics * Boost, positive manifold pressure in turbocharged engines * Boost (C++ libraries), a set of free peer-reviewed portable C++ libraries * Boost (material), a material b ...
has demonstrated it was possible to implement a safe tagged union as a library in C++, visitable using function objects. struct display : boost::static_visitor ; boost::variant v = 42; boost::apply_visitor(display(), v); boost::variant v = "hello world"; boost::apply_visitor(display(), v); Scala has case classes: sealed abstract class Tree case object Leaf extends Tree case class Node(value: Int, left: Tree, right: Tree) extends Tree val tree = Node(5, Node(1, Leaf, Leaf), Node(3, Leaf, Node(4, Leaf, Leaf))) Because the class hierarchy is sealed, the compiler can check that all cases are handled in a pattern match: tree match Scala's case classes also permit reuse through subtyping: sealed abstract class Shape(centerX: Int, centerY: Int) case class Square(side: Int, centerX: Int, centerY: Int) extends Shape(centerX, centerY) case class Rectangle(length: Int, height: Int, centerX: Int, centerY: Int) extends Shape(centerX, centerY) case class Circle(radius: Int, centerX: Int, centerY: Int) extends Shape(centerX, centerY) F# has discriminated unions: type Tree = , Leaf , Node of value: int * left: Tree * right: Tree let tree = Node(5, Node(1, Leaf, Leaf), Node(3, Leaf, Node(4, Leaf, Leaf))) Because the defined cases are exhaustive, the compiler can check that all cases are handled in a pattern match: match tree with , Node (x, _, _) -> printfn "top level node value: %i" x , Leaf -> printfn "top level node is a leaf"
Haxe Haxe is an open source high-level cross-platform programming language and compiler that can produce applications and source code, for many different computing platforms from one code-base. It is free and open-source software, released under ...
's enums also work as tagged unions: enum Color These can be matched using a switch expression: switch (color)
Nim Nim is a mathematical two player game. Nim or NIM may also refer to: * Nim (programming language) Nim is a general-purpose, multi-paradigm, statically typed, compiled systems programming language, designed and developed by a team around And ...
has object variants similar in declaration to those in Pascal and Ada: type ShapeKind = enum skSquare, skRectangle, skCircle Shape = object centerX, centerY: int case kind: ShapeKind of skSquare: side: int of skRectangle: length, height: int of skCircle: radius: int Macros can be used to emulate pattern matching or to create syntactic sugar for declaring object variants, seen here as implemented by the packag
patty
import patty proc `~` a: A): ref A = new(result) result[] = a variant List[A]: Nil Cons(x: A, xs: ref List proc listHelper xs: seq : List = if xs.len

0: Nil ) else: Cons(xs ~listHelper(xs .. xs.high) proc list xs: varargs : List = listHelper(@xs) proc sum(xs: List nt: int = (block: match xs: Nil: 0 Cons(y, ys): y + sum(ys[]) ) echo sum(list(1, 2, 3, 4, 5))


2010s

Enums are added in Scala 3, allowing us to rewrite the earlier Scala examples more concisely: enum Tree T case Leaf case Node(x: Int, left: Tree right: Tree enum Shape(centerX: Int, centerY: Int): case Square(side: Int, centerX: Int, centerY: Int) extends Shape(centerY, centerX) case Rectangle(length: Int, height: Int, centerX: Int, centerY: Int) extends Shape(centerX, centerY) case Circle(radius: Int, centerX: Int, centerY: Int) extends Shape(centerX, centerY) The Rust language has extensive support for tagged unions, called enums. For example: enum Tree It also allows matching on unions: let tree = Tree::Node( 2, Box::new(Tree::Node(0, Box::new(Tree::Leaf), Box::new(Tree::Leaf))), Box::new(Tree::Node(3, Box::new(Tree::Leaf), Box::new(Tree::Node(4, Box::new(Tree::Leaf), Box::new(Tree::Leaf))))) ); fn add_values(tree: Tree) -> i64 assert_eq!(add_values(tree), 9); Rust's error handling model relies extensively on these tagged unions, especially the Option type, which is either None or Some(T), and the Result type, which is either Ok(T) or Err(E).
Swift Swift or SWIFT most commonly refers to: * SWIFT, an international organization facilitating transactions between banks ** SWIFT code * Swift (programming language) * Swift (bird), a family of birds It may also refer to: Organizations * SWIFT, ...
also has substantial support for tagged unions via enumerations. For example: enum Tree let tree = Tree.node( 2, .node(0, .leaf, .leaf), .node(3, .leaf, .node(4, .leaf, .leaf)) ) func add_values(_ tree: Tree) -> Int assert(add_values(tree)

9)
With
TypeScript TypeScript is a free and open source programming language developed and maintained by Microsoft. It is a strict syntactical superset of JavaScript and adds optional static typing to the language. It is designed for the development of large appl ...
it is possible to create tagged unions as well. For example: interface Leaf interface Node type Tree = Leaf , Node const root: Tree = function visit(tree: Tree) Python 3.9
introduces support for typing annotations that can be used to define a tagged union type (PEP-593): Currency = Annotated TypedDict('Currency', , total=False), TaggedUnion,
C++17 C++17 is a version of the ISO/ IEC 14882 standard for the C++ programming language. C++17 replaced the prior version of the C++ standard, called C++14, and was later replaced by C++20. History Before the C++ Standards Committee fixed a 3-year r ...
introduces std::variant an
constexpr if
syntaxhighlight lang="c++"> using Tree = std::variant; struct Leaf ; struct Node ; struct Transverser ; /*Tree forest = ...; std::visit(Transverser, forest);*/


Class hierarchies as tagged unions

In a typical
class hierarchy A class hierarchy or inheritance tree in computer science is a classification of object types, denoting objects as the instantiations of classes (class is like a blueprint, the object is what is built from that blueprint) inter-relating the vari ...
in
object-oriented programming Object-oriented programming (OOP) is a programming paradigm based on the concept of " objects", which can contain data and code. The data is in the form of fields (often known as attributes or ''properties''), and the code is in the form of pr ...
, each subclass can encapsulate data unique to that class. The metadata used to perform
virtual method In object-oriented programming, in languages such as C++, and Object Pascal, a virtual function or virtual method is an inheritable and overridable function or method for which dynamic dispatch is facilitated. This concept is an important par ...
lookup (for example, the object's
vtable In computer programming, a virtual method table (VMT), virtual function table, virtual call table, dispatch table, vtable, or vftable is a mechanism used in a programming language to support dynamic dispatch (or run-time method binding). Whe ...
pointer in most C++ implementations) identifies the subclass and so effectively acts as a tag identifying the particular data stored by the instance (see RTTI). An object's constructor sets this tag, and it remains constant throughout the object's lifetime. Nevertheless, a class hierarchy involves true
subtype polymorphism In programming language theory, subtyping (also subtype polymorphism or inclusion polymorphism) is a form of type polymorphism in which a subtype is a datatype that is related to another datatype (the supertype) by some notion of substitutabilit ...
; it can be extended by creating further subclasses of the same base type, which could not be handled correctly under a tag/dispatch model. Hence, it is usually not possible to do case analysis or dispatch on a subobject's 'tag' as one would for tagged unions. Some languages such as Scala allow base classes to be "sealed", and unify tagged unions with sealed base classes.


See also

*
Discriminator In distributed computing, a discriminator is a typed tag field present in OMG IDL discriminated union type and value definitions that determines which union member is selected in the current union instance. Unlike in some conventional programm ...
, the type tag for discriminated unions in
CORBA The Common Object Request Broker Architecture (CORBA) is a standard defined by the Object Management Group (OMG) designed to facilitate the communication of systems that are deployed on diverse platforms. CORBA enables collaboration between sys ...
*
Variant type (COM) Variant is a data type in certain programming languages, particularly Visual Basic, OCaml, Delphi and C++ when using the Component Object Model. It is an implementation of the eponymous concept in computer science. In Visual Basic (and Visual ...


References


External links


boost::variant
is a C++ typesafe discriminated union

is an implementation of variant type in D 2.0 {{data types Data types Type theory Articles with example Pascal code Articles with example ALGOL 68 code Articles with example C code Articles with example C++ code Articles with example Ada code