In computer science, corecursion is a type of operation that is dual to recursion. Whereas recursion works analytically, starting on data further from a base case and breaking it down into smaller data and repeating until one reaches a base case, corecursion works synthetically, starting from a base case and building it up, iteratively producing data further removed from a base case. Put simply, corecursive algorithms use the data that they themselves produce, bit by bit, as they become available and needed, to produce further bits of data. A similar but distinct concept is ''generative recursion'', which may lack the definite "direction" inherent in corecursion and recursion. Where recursion allows programs to operate on arbitrarily complex data, so long as they can be reduced to simple data (base cases), corecursion allows programs to produce arbitrarily complex and potentially infinite data structures, such as streams, so long as they can be produced from simple data (base cases) in a sequence of ''finite'' steps. Where recursion may not terminate, never reaching a base state, corecursion starts from a base state, and thus produces subsequent steps deterministically, though it may proceed indefinitely (and thus not terminate under strict evaluation), or it may consume more than it produces and thus become non-''productive''. Many functions that are traditionally analyzed as recursive can alternatively, and arguably more naturally, be interpreted as corecursive functions that are terminated at a given stage, for example recurrence relations such as the factorial.

Corecursion can produce both finite and infinite data structures as results, and may employ self-referential data structures. Corecursion is often used in conjunction with lazy evaluation, to produce only a finite subset of a potentially infinite structure (rather than trying to produce an entire infinite structure at once). Corecursion is a particularly important concept in functional programming, where corecursion and codata allow total languages to work with infinite data structures.


Examples

Corecursion can be understood by contrast with recursion, which is more familiar. While corecursion is primarily of interest in functional programming, it can be illustrated using imperative programming, which is done below using the generator facility in Python. In these examples local variables are used, and assigned values imperatively (destructively), though these are not necessary in corecursion in pure functional programming. In pure functional programming, rather than assigning to local variables, these computed values form an invariable sequence, and prior values are accessed by self-reference (later values in the sequence reference earlier values in the sequence to be computed). The assignments simply express this in the imperative paradigm and explicitly specify where the computations happen, which serves to clarify the exposition.


Factorial

A classic example of recursion is computing the factorial, which is defined recursively by ''0! := 1'' and ''n! := n × (n − 1)!''.

To ''recursively'' compute its result on a given input, a recursive function calls (a copy of) ''itself'' with a different ("smaller" in some way) input and uses the result of this call to construct its result. The recursive call does the same, unless the ''base case'' has been reached. Thus a call stack develops in the process. For example, to compute ''fac(3)'', this recursively calls in turn ''fac(2)'', ''fac(1)'', ''fac(0)'' ("winding up" the stack), at which point recursion terminates with ''fac(0) = 1''. Then the stack unwinds in reverse order and the results are calculated on the way back along the call stack to the initial call frame, where ''fac(3)'' uses the result of ''fac(2) = 2'' to calculate the final result as ''3 × 2 = 3 × fac(2) =: fac(3)'' and finally returns ''fac(3) = 6''. In this example a function returns a single value.

This stack unwinding can be explicated, defining the factorial ''corecursively'', as an iterator, where one ''starts'' with the case of ''1 =: 0!'', then from this starting value constructs factorial values for increasing numbers ''1, 2, 3...'', as in the above recursive definition with the "time arrow" reversed, as it were, by reading it ''backwards'' as ''n! × (n + 1) =: (n + 1)!''. The corecursive algorithm thus defined produces a ''stream'' of ''all'' factorials. This may be concretely implemented as a generator. Symbolically, noting that computing the next factorial value requires keeping track of both ''n'' and ''f'' (a previous factorial value), this can be represented as:

    n, f = (0, 1) : (n + 1, f × (n + 1))

or in Haskell,

    (\(n,f) -> (n+1, f*(n+1))) `iterate` (0,1)

meaning, "starting from ''n, f = 0, 1'', on each step the next values are calculated as ''n + 1'' and ''f × (n + 1)''". This is mathematically equivalent and almost identical to the recursive definition, but the ''+1'' emphasizes that the factorial values are being built ''up'', going forwards from the starting case, rather than being computed after first going backwards, ''down'' to the base case, with a ''−1'' decrement. The direct output of the corecursive function does not simply contain the factorial values ''n!'', but also includes for each value the auxiliary data of its index ''n'' in the sequence, so that any one specific result can be selected among them all, as and when needed. There is a connection with denotational semantics, where the denotations of recursive programs are built up corecursively in this way.

In Python, a recursive factorial function can be defined as:

    def factorial(n):
        """Recursive factorial function."""
        if n == 0:
            return 1
        else:
            return n * factorial(n - 1)
This could then be called, for example, as factorial(5) to compute ''5!''.

A corresponding corecursive generator can be defined as:

    def factorials():
        """Corecursive generator."""
        n, f = 0, 1
        while True:
            yield f
            n, f = n + 1, f * (n + 1)

This generates an infinite stream of factorials in order; a finite portion of it can be produced by:

    def n_factorials(k):
        n, f = 0, 1
        while n <= k:
            yield f
            n, f = n + 1, f * (n + 1)

This could then be called to produce the factorials up to ''5!'' via:

    for f in n_factorials(5):
        print(f)

If we're only interested in a certain factorial, just the last value can be taken, or we can fuse the production and the access into one function:

    def nth_factorial(k):
        n, f = 0, 1
        while n < k:
            n, f = n + 1, f * (n + 1)
        yield f

As can be readily seen here, this is practically equivalent (just by substituting return for the only yield there) to the accumulator argument technique for
tail recursion
, unwound into an explicit loop. Thus it can be said that the concept of corecursion is an explication of the embodiment of iterative computation processes by recursive definitions, where applicable.
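To make the comparison concrete, here is a sketch of what that accumulator-argument technique looks like (the function name factorial_acc is hypothetical, introduced only for this illustration): the accumulator ''f'' plays exactly the role of the corecursively built value.

```python
def factorial_acc(k, n=0, f=1):
    """Tail-recursive factorial with an accumulator argument.

    Each call passes the partially built result forward, mirroring the
    corecursive generator's (n, f) state, instead of deferring work on
    the call stack.
    """
    if n == k:
        return f
    return factorial_acc(k, n + 1, f * (n + 1))

print(factorial_acc(5))  # 120
```

Since the recursive call is the final action, the definition unwinds mechanically into the same explicit loop used by nth_factorial above.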


Fibonacci sequence

In the same way, the Fibonacci sequence can be represented as:

    a, b = (0, 1) : (b, a + b)

Because the Fibonacci sequence is a recurrence relation of order 2, the corecursive relation must track two successive terms, with the ''(b, −)'' corresponding to shifting forward by one step, and the ''(−, a+b)'' corresponding to computing the next term. This can then be implemented as follows (using parallel assignment):

    def fibonacci_sequence():
        a, b = 0, 1
        while True:
            yield a
            a, b = b, a + b

In Haskell,

    map fst ( (\(a,b) -> (b,a+b)) `iterate` (0,1) )
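Under lazy iteration only a finite prefix of this infinite stream is ever forced; for example, with itertools.islice (the generator is repeated here so the sketch is self-contained):

```python
from itertools import islice

def fibonacci_sequence():
    """Corecursive Fibonacci generator (as defined above)."""
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b

# Take a finite prefix of the infinite corecursive stream.
print(list(islice(fibonacci_sequence(), 10)))  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
```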


Tree traversal

Tree traversal via a depth-first approach is a classic example of recursion. Dually, breadth-first traversal can very naturally be implemented via corecursion.

Without using recursion or corecursion specifically, one may traverse a tree by starting at the root node, placing its child nodes in a data structure, then iterating by removing node after node from the data structure while placing each removed node's children back into that data structure. If the data structure is a stack (LIFO), this yields depth-first traversal, and if the data structure is a queue (FIFO), this yields breadth-first traversal.

Using recursion, a (post-order) depth-first traversal can be implemented by starting at the root node and recursively traversing each child subtree in turn (the subtree based at each child node) – the second child subtree does not start processing until the first child subtree is finished. Once a leaf node is reached, or the children of a branch node have been exhausted, the node itself is visited (e.g., the value of the node itself is output). In this case, the call stack (of the recursive functions) acts as the stack that is iterated over.

Using corecursion, a breadth-first traversal can be implemented by starting at the root node, outputting its value, then breadth-first traversing the subtrees – i.e., passing on the ''whole list'' of subtrees to the next step (not a single subtree, as in the recursive approach) – at the next step outputting the values of all of their root nodes, then passing on their child subtrees, etc. In this case the generator function, indeed the output sequence itself, acts as the queue. As in the factorial example (above), where the auxiliary information of the index (which step one was at, ''n'') was pushed forward, in addition to the actual output of ''n!'', in this case the auxiliary information of the remaining subtrees is pushed forward, in addition to the actual output. Symbolically:

    v, t = ([], [FullTree]) : (RootValues(v, t), ChildTrees(t))

meaning that at each step, one outputs the list of values of the root nodes, then proceeds to the child subtrees. Generating just the node values from this sequence simply requires discarding the auxiliary child tree data, then flattening the list of lists (values are initially grouped by level (depth); flattening (ungrouping) yields a flat linear list).
In Haskell,

    concatMap fst ( (\(v, t) -> (rootValues v t, childTrees t)) `iterate` ([], fullTree) )

These can be compared as follows. The recursive traversal handles a ''leaf node'' (at the ''bottom'') as the base case (when there are no children, just output the value), and ''analyzes'' a tree into subtrees, traversing each in turn, eventually resulting in just leaf nodes – actual leaf nodes, and branch nodes whose children have already been dealt with (cut off ''below''). By contrast, the corecursive traversal handles a ''root node'' (at the ''top'') as the base case (given a node, first output the value), treats a tree as being ''synthesized'' of a root node and its children, then produces as auxiliary output a list of subtrees at each step, which are then the input for the next step – the child nodes of the original root are the root nodes at the next step, as their parents have already been dealt with (cut off ''above''). In the recursive traversal there is a distinction between leaf nodes and branch nodes, while in the corecursive traversal there is no distinction, as each node is treated as the root node of the subtree it defines.

Notably, given an infinite tree, the corecursive breadth-first traversal will traverse all nodes, just as for a finite tree, while the recursive depth-first traversal will go down one branch and not traverse all nodes, and indeed if traversing post-order, as in this example (or in-order), it will visit no nodes at all, because it never reaches a leaf. This shows the usefulness of corecursion rather than recursion for dealing with infinite data structures.

In Python, this can be implemented as follows. The usual post-order depth-first traversal can be defined as:

    def df(node):
        """Post-order depth-first traversal."""
        if node is not None:
            df(node.left)
            df(node.right)
            print(node.value)

This can then be called by df(t) to print the values of the nodes of the tree in post-order depth-first order.
The breadth-first corecursive generator can be defined as:

    def bf(tree):
        """Breadth-first corecursive generator."""
        tree_list = [tree]
        while tree_list:
            new_tree_list = []
            for tree in tree_list:
                if tree is not None:
                    yield tree.value
                    new_tree_list.append(tree.left)
                    new_tree_list.append(tree.right)
            tree_list = new_tree_list

This can then be called to print the values of the nodes of the tree in breadth-first order:

    for i in bf(t):
        print(i)
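The tree ''t'' is assumed above without a definition; a minimal node class and sample tree that make the traversals runnable might look like this (the Node class and the sample shape are hypothetical, introduced only for illustration; bf is repeated so the sketch is self-contained):

```python
class Node:
    """Minimal binary tree node, as assumed by df and bf above."""
    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right

# A small sample tree:   1
#                       / \
#                      2   3
#                     /
#                    4
t = Node(1, Node(2, Node(4)), Node(3))

def bf(tree):
    """Breadth-first corecursive generator (as defined above)."""
    tree_list = [tree]
    while tree_list:
        new_tree_list = []
        for tree in tree_list:
            if tree is not None:
                yield tree.value
                new_tree_list.append(tree.left)
                new_tree_list.append(tree.right)
        tree_list = new_tree_list

print(list(bf(t)))  # [1, 2, 3, 4]
```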


Definition

Initial data types can be defined as being the least fixpoint (up to isomorphism) of some type equation; the isomorphism is then given by an initial algebra. Dually, final (or terminal) data types can be defined as being the greatest fixpoint of a type equation; the isomorphism is then given by a final coalgebra.

If the domain of discourse is the category of sets and total functions, then final data types may contain infinite, non-wellfounded values, whereas initial types do not. On the other hand, if the domain of discourse is the category of complete partial orders and continuous functions, which corresponds roughly to the Haskell programming language, then final types coincide with initial types, and the corresponding final coalgebra and initial algebra form an isomorphism.

Corecursion is then a technique for recursively defining functions whose range (codomain) is a final data type, dual to the way that ordinary recursion recursively defines functions whose domain is an initial data type.

The discussion below provides several examples in Haskell that distinguish corecursion. Roughly speaking, if one were to port these definitions to the category of sets, they would still be corecursive. This informal usage is consistent with existing textbooks about Haskell. The examples used in this article predate the attempts to define corecursion and explain what it is.
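As an informal illustration of this duality (a sketch only, not tied to any particular formalization): recursion consumes a value of an initial data type down to its base case, while corecursion produces into a final data type, here modeled by a Python generator.

```python
from itertools import islice

# Recursion: the *domain* is an initial (finite, inductive) data type.
def sum_list(xs):
    """Analyzes a finite list down to the base case []."""
    if not xs:
        return 0
    return xs[0] + sum_list(xs[1:])

# Corecursion: the *codomain* is a final (possibly infinite) data type.
def naturals_from(n):
    """Synthesizes an infinite stream upward from a seed."""
    while True:
        yield n
        n += 1

print(sum_list([1, 2, 3]))                # 6
print(list(islice(naturals_from(0), 5)))  # [0, 1, 2, 3, 4]
```

The recursive function is total because its argument is finite; the corecursive one is productive because each step yields another element in finite time.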


Discussion

The rule for ''primitive corecursion'' on codata is the dual to that for primitive recursion on data. Instead of descending on the argument by pattern-matching on its constructors (that ''were called up before'', somewhere, so we receive a ready-made datum and get at its constituent sub-parts, i.e. "fields"), we ascend on the result by filling-in its "destructors" (or "observers", that ''will be called afterwards'', somewhere – so we're actually calling a constructor, creating another bit of the result to be observed later on). Thus corecursion ''creates'' (potentially infinite) codata, whereas ordinary recursion ''analyses'' (necessarily finite) data. Ordinary recursion might not be applicable to the codata because it might not terminate. Conversely, corecursion is not strictly necessary if the result type is data, because data must be finite.

In "Programming with streams in Coq: a case study: the Sieve of Eratosthenes" we find

    hd (conc a s) = a
    tl (conc a s) = s

    (sieve p s) = if div p (hd s) then sieve p (tl s)
                  else conc (hd s) (sieve p (tl s))

    hd (primes s) = (hd s)
    tl (primes s) = primes (sieve (hd s) (tl s))

where primes "are obtained by applying the primes operation to the stream (Enu 2)". Following the above notation, the sequence of primes (with a throwaway 0 prefixed to it) and the number streams being progressively sieved can be represented as

    p, s = (0, [2..]) : (hd(s), sieve(hd(s), tl(s)))

or in Haskell,

    (\(p, s@(h:t)) -> (h, sieve h t)) `iterate` (0, [2..])

The authors discuss how the definition of sieve is not guaranteed always to be ''productive'', and could become stuck, e.g., if called with [5,10..] as the initial stream.

Here is another example in Haskell. The following definition produces the list of Fibonacci numbers
in linear time:

    fibs = 0 : 1 : zipWith (+) fibs (tail fibs)

This infinite list depends on lazy evaluation; elements are computed on an as-needed basis, and only finite prefixes are ever explicitly represented in memory. This feature allows algorithms on parts of codata to terminate; such techniques are an important part of Haskell programming.

This can be done in Python as well:

    from itertools import tee, chain, islice

    def add(x, y):
        return x + y

    def fibonacci():
        def deferred_output():
            for i in output:
                yield i
        result, c1, c2 = tee(deferred_output(), 3)
        paired = map(add, c1, islice(c2, 1, None))
        output = chain([0], [1], paired)
        return result

    for i in islice(fibonacci(), 20):
        print(i)
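Returning to the sieve discussed above, the same corecursive idea can also be sketched as a Python generator (a free translation for illustration, not the authors' Coq code): each step observes the head of the stream and builds the rest of the result from a further-sieved tail.

```python
from itertools import count, islice

def sieve(s):
    """Corecursive sieve: yield the head, then sieve its multiples out of the tail."""
    p = next(s)
    yield p
    # Each recursive layer wraps the stream in another filter.
    yield from sieve(x for x in s if x % p != 0)

print(list(islice(sieve(count(2)), 10)))  # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```

As with the Coq version, productivity depends on the input: started on count(2) every filter eventually lets a number through, but a stream containing only multiples of its own head would make the generator loop without yielding.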
The definition of zipWith can be inlined, leading to this:

    fibs = 0 : 1 : next fibs
      where
        next (a : t@(b:_)) = (a+b) : next t

This example employs a self-referential ''data structure''. Ordinary recursion makes use of self-referential ''functions'', but does not accommodate self-referential data. However, this is not essential to the Fibonacci example. It can be rewritten as follows:

    fibs = fibgen (0, 1)
    fibgen (x, y) = x : fibgen (y, x+y)

This employs only a self-referential ''function'' to construct the result. If it were used with a strict list constructor it would be an example of runaway recursion, but with a non-strict list constructor this guarded recursion gradually produces an indefinitely defined list.

Corecursion need not produce an infinite object; a corecursive queue is a particularly good example of this phenomenon. The following definition produces a breadth-first traversal of a binary tree in linear time:

    data Tree a b = Leaf a | Branch b (Tree a b) (Tree a b)

    bftrav :: Tree a b -> [Tree a b]
    bftrav tree = queue
      where
        queue = tree : gen 1 queue

        gen 0   p                  = []
        gen len (Leaf   _     : s) = gen (len - 1) s
        gen len (Branch _ l r : s) = l : r : gen (len + 1) s

This definition takes an initial tree and produces a list of subtrees. This list serves a dual purpose as both the queue and the result (''gen len p'' produces its output ''len'' notches after its input back-pointer, ''p'', along the ''queue''). It is finite if and only if the initial tree is finite. The length of the queue must be explicitly tracked in order to ensure termination; this can safely be elided if this definition is applied only to infinite trees.

Another particularly good example gives a solution to the problem of breadth-first labeling (Jones and Gibbons 1992). The function label visits every node in a binary tree in a breadth-first fashion, and replaces each label with an integer; each subsequent integer is bigger than the last by one.
This solution employs a self-referential data structure, and the binary tree can be finite or infinite.

    label :: Tree a b -> Tree Int Int
    label t = t′
      where
        (t′, ns) = go t (1 : ns)

        go :: Tree a b -> [Int] -> (Tree Int Int, [Int])
        go (Leaf   _    ) (n:ns) = (Leaf   n       , n+1 : ns  )
        go (Branch _ l r) (n:ns) = (Branch n l′ r′ , n+1 : ns′′)
          where
            (l′, ns′ ) = go l ns
            (r′, ns′′) = go r ns′

An
apomorphism (such as an anamorphism, such as unfold) is a form of corecursion in the same way that a paramorphism (such as a catamorphism, such as fold) is a form of recursion.

The Coq proof assistant supports corecursion and coinduction
using the CoFixpoint command.
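For illustration, an unfold (anamorphism) can be sketched in Python; the helper below is hypothetical, not taken from any library, and produces a finite list so that it stays executable without laziness.

```python
def unfold(f, seed):
    """Anamorphism: build a list corecursively from a seed.

    f maps a seed to None (stop) or to a pair (output, next_seed).
    """
    result = []
    step = f(seed)
    while step is not None:
        out, seed = step
        result.append(out)
        step = f(seed)
    return result

# Counting down from a seed: the mirror image of folding the same list up.
print(unfold(lambda n: None if n == 0 else (n, n - 1), 5))  # [5, 4, 3, 2, 1]
```

Whereas a fold consumes a list by replacing its constructors, this unfold produces a list by repeatedly filling in its head and tail from the evolving seed.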


History

Corecursion, referred to as ''circular programming'', dates at least to Bird (1984), who credits John Hughes and Philip Wadler; more general forms were developed in Allison (1989). The original motivations included producing more efficient algorithms (allowing one pass over data in some cases, instead of requiring multiple passes) and implementing classical data structures, such as doubly linked lists and queues, in functional languages.


See also

* Bisimulation
* Coinduction
* Recursion
* Anamorphism

