computer science Computer science is the study of computation, information, and automation. Computer science spans Theoretical computer science, theoretical disciplines (such as algorithms, theory of computation, and information theory) to Applied science, ...

, Scott encoding is a way to represent (recursive) data types in the

lambda calculus In mathematical logic, the lambda calculus (also written as ''λ''-calculus) is a formal system for expressing computability, computation based on function Abstraction (computer science), abstraction and function application, application using var ...

Church encoding In mathematics, Church encoding is a means of representing data and operators in the lambda calculus. The Church numerals are a representation of the natural numbers using lambda notation. The method is named for Alonzo Church, who first encoded d ...

performs a similar function. The data and operators form a mathematical structure which is embedded in the lambda calculus. Whereas Church encoding starts with representations of the basic data types, and builds up from it, Scott encoding starts from the simplest method to compose algebraic data types. Mogensen–Scott encoding extends and slightly modifies Scott encoding by applying the encoding to

Metaprogramming Metaprogramming is a computer programming technique in which computer programs have the ability to treat other programs as their data. It means that a program can be designed to read, generate, analyse, or transform other programs, and even modi ...

. This encoding allows the representation of

terms, as data, to be operated on by a meta program.

History

Scott encoding appears first in a set of unpublished lecture notes by

Dana Scott Dana Stewart Scott (born October 11, 1932) is an American logician who is the emeritus Hillman University Professor of Computer Science, Philosophy, and Mathematical Logic at Carnegie Mellon University; he is now retired and lives in Berkeley, C ...

whose first citation occurs in the book ''Combinatorial Logic, Volume II''. Michel Parigot gave a logical interpretation of and strongly normalizing recursor for Scott-encoded numerals, referring to them as the "Stack type" representation of numbers. Torben Mogensen later extended Scott encoding for the encoding of Lambda terms as data.

Discussion

Lambda calculus allows data to be stored as parameters to a function that does not yet have all the parameters required for application. For example, :

((\lambda x_1 \ldots x_n.\lambda c.c\ x_1 \ldots x_n)\ v_1 \ldots v_n)\ f

May be thought of as a record or struct where the fields

x_1 \ldots x_n

have been initialized with the values

v_1 \ldots v_n

. These values may then be accessed by applying the term to a function ''f''. This reduces to, :

f\ v_1 \ldots v_n

''c'' may represent a constructor for an algebraic data type in functional languages such as

Haskell Haskell () is a general-purpose, statically typed, purely functional programming language with type inference and lazy evaluation. Designed for teaching, research, and industrial applications, Haskell pioneered several programming language ...

. Now suppose there are ''N'' constructors, each with

A_i

arguments; :

\begin
\text & \text & \text \\
\hline
((\lambda x_1 \ldots x_.\lambda c_1 \ldots c_N.c_1\ x_1 \ldots x_)\ v_1 \ldots v_) &
f_1 \ldots f_N &
f_1\ v_1 \ldots v_ \\
((\lambda x_1 \ldots x_.\lambda c_1 \ldots c_N.c_2\ x_1 \ldots x_)\ v_1 \ldots v_) &
f_1 \ldots f_N &
f_2\ v_1 \ldots v_ \\
\vdots & \vdots & \vdots \\

((\lambda x_1 \ldots x_.\lambda c_1 \ldots c_N.c_N\ x_1 \ldots x_)\ v_1 \ldots v_) &
f_1 \ldots f_N &
f_N\ v_1 \ldots v_
\end

Each constructor selects a different function from the function parameters

f_1 \ldots f_N

. This provides branching in the process flow, based on the constructor. Each constructor may have a different

arity In logic, mathematics, and computer science, arity () is the number of arguments or operands taken by a function, operation or relation. In mathematics, arity may also be called rank, but this word can have many other meanings. In logic and ...

(number of parameters). If the constructors have no parameters then the set of constructors acts like an ''enum''; a type with a fixed number of values. If the constructors have parameters, recursive data structures may be constructed.

Definition

Let ''D'' be a datatype with ''N'' constructors,

\_^N

, such that constructor

c_i

has

A_i

Scott encoding

The Scott encoding of constructor

c_i

of the data type ''D'' is :

\lambda x_1 \ldots x_ . \lambda c_1 \ldots c_N . c_i\ x_1 \ldots x_

Mogensen–Scott encoding

Mogensen extends Scott encoding to encode any untyped lambda term as data. This allows a lambda term to be represented as data, within a Lambda calculus meta program. The meta function ''mse'' converts a lambda term into the corresponding data representation of the lambda term; :

\\ \end

The "lambda term" is represented as a

tagged union In computer science, a tagged union, also called a variant, variant record, choice type, discriminated union, disjoint union, sum type, or coproduct, is a data structure used to hold a value that could take on several different, but fixed, types. ...

with three cases: * Constructor ''a'' - a variable (

1, not recursive) * Constructor ''b'' - function application (arity 2, recursive in both arguments), * Constructor ''c'' - lambda-abstraction (arity 1, recursive). For example, :

)\\ \lambda a, b, c.c\ (\lambda x.\lambda a, b, c.b\ (\lambda a, b, c.a\ f)\ (\lambda a, b, c.b\ (\lambda a, b, c.a\ x)\ (\lambda a, b, c.a\ x))) \end

Comparison to the Church encoding

The Scott encoding coincides with the

for booleans. Church encoding of pairs may be generalized to arbitrary data types by encoding

c_i

of ''D'' above as :

\lambda x_1 \ldots x_ . \lambda c_1 \ldots c_N . c_i (x_1 c_1 \ldots c_N) \ldots (x_ c_1 \ldots c_N)

compare this to the Mogensen Scott encoding, :

\lambda x_1 \ldots x_ . \lambda c_1 \ldots c_N . c_i x_1 \ldots x_

With this generalization, the Scott and Church encodings coincide on all enumerated datatypes (such as the boolean datatype) because each constructor is a constant (no parameters). Concerning the practicality of using either the Church or Scott encoding for programming, there is a symmetric trade-off: Church-encoded numerals support a constant-time addition operation and have no better than a linear-time predecessor operation; Scott-encoded numerals support a constant-time predecessor operation and have no better than a linear-time addition operation.

Type definitions

Church-encoded data and operations on them are typable in

system F System F (also polymorphic lambda calculus or second-order lambda calculus) is a typed lambda calculus that introduces, to simply typed lambda calculus, a mechanism of universal quantification over types. System F formalizes parametric polymorph ...

, as are Scott-encoded data and operations. However, the encoding is significantly more complicated. The type of the Scott encoding of the natural numbers is the positive recursive type: :

\mu X. \forall R. R \to (X \to R) \to R

Full recursive types are not part of System F, but positive recursive types are expressible in System F via the encoding: :

\to X) \to X)

Combining these two facts yields the System F type of the Scott encoding: :

\forall X. (((\forall R. R \to (X \to R) \to R) \to X) \to X)

This can be contrasted with the type of the Church encoding: :

\forall X. X \to (X \to X) \to X

The Church encoding is a second-order type, but the Scott encoding is fourth-order!

Notes

References

*Stump, A. (2009)
Directly reflective meta-programming
''Higher-Order and Symbolic Computation, 22'', 115-144. *Mogensen, T.Æ. (1992)
Efficient Self-Interpretations in lambda Calculus
''J. Funct. Program., 2'', 345-363. {{DEFAULTSORT:Mogensen-Scott encoding Lambda calculus