computer science Computer science is the study of computation, information, and automation. Computer science spans Theoretical computer science, theoretical disciplines (such as algorithms, theory of computation, and information theory) to Applied science, ...

, program synthesis is the task to construct a program that provably satisfies a given high-level

formal specification In computer science, formal specifications are mathematically based techniques whose purpose is to help with the implementation of systems and software. They are used to describe a system, to analyze its behavior, and to aid in its design by verify ...

. In contrast to

program verification In the context of hardware and software systems, formal verification is the act of proving or disproving the correctness of a system with respect to a certain formal specification or property, using formal methods of mathematics. Formal ver ...

, the program is to be constructed rather than given; however, both fields make use of formal proof techniques, and both comprise approaches of different degrees of automation. In contrast to

automatic programming In computer science, automatic programming is a type of computer programming in which some mechanism generates a computer program, to allow human programmers to write the code at a higher abstraction level. There has been little agreement on the ...

techniques, specifications in program synthesis are usually non-

algorithm In mathematics and computer science, an algorithm () is a finite sequence of Rigour#Mathematics, mathematically rigorous instructions, typically used to solve a class of specific Computational problem, problems or to perform a computation. Algo ...

ic statements in an appropriate

logical calculus A formal system is an abstract structure and formalization of an axiomatic system used for deducing, using rules of inference, theorems from axioms. In 1921, David Hilbert proposed to use formal systems as the foundation of knowledge in ma ...

. The primary application of program synthesis is to relieve the programmer of the burden of writing correct, efficient code that satisfies a specification. However, program synthesis also has applications to superoptimization and inference of

loop invariant In computer science, a loop invariant is a property of a program loop that is true before (and after) each iteration. It is a logical assertion, sometimes checked with a code assertion. Knowing its invariant(s) is essential in understanding th ...

Origin

During the Summer Institute of Symbolic Logic at Cornell University in 1957,

Alonzo Church Alonzo Church (June 14, 1903 – August 11, 1995) was an American computer scientist, mathematician, logician, and philosopher who made major contributions to mathematical logic and the foundations of theoretical computer science. He is bes ...

defined the problem to synthesize a circuit from mathematical requirements. Even though the work only refers to circuits and not programs, the work is considered to be one of the earliest descriptions of program synthesis and some researchers refer to program synthesis as "Church's Problem". In the 1960s, a similar idea for an "automatic programmer" was explored by researchers in artificial intelligence. Since then, various research communities considered the problem of program synthesis. Notable works include the 1969 automata-theoretic approach by Büchi and Landweber, and the works by

Manna Manna (, ; ), sometimes or archaically spelled Mahna or Mana, is described in the Bible and the Quran as an edible substance that God in Abrahamic religions, God bestowed upon the Israelites while they were wandering the desert during the 40-year ...

and Waldinger (c. 1980). The development of modern

high-level programming language A high-level programming language is a programming language with strong Abstraction (computer science), abstraction from the details of the computer. In contrast to low-level programming languages, it may use natural language ''elements'', be ea ...

s can also be understood as a form of program synthesis.

21st century developments

The early 21st century has seen a surge of practical interest in the idea of program synthesis in the

formal verification In the context of hardware and software systems, formal verification is the act of proving or disproving the correctness of a system with respect to a certain formal specification or property, using formal methods of mathematics. Formal ver ...

community and related fields. Armando Solar-Lezama showed that it is possible to encode program synthesis problems in

Boolean logic In mathematics and mathematical logic, Boolean algebra is a branch of algebra. It differs from elementary algebra in two ways. First, the values of the variable (mathematics), variables are the truth values ''true'' and ''false'', usually denot ...

and use algorithms for the

Boolean satisfiability problem In logic and computer science, the Boolean satisfiability problem (sometimes called propositional satisfiability problem and abbreviated SATISFIABILITY, SAT or B-SAT) asks whether there exists an Interpretation (logic), interpretation that Satisf ...

to automatically find programs. A broader conceptual development relevant to program synthesis was introduced by Zenil et al. (2019) , who proposed a general framework for extracting generative rules from complex dynamical systems. Based on Algorithmic information theory and an associated algorithmic information calculus (AIC) , the method—called Algorithmic Information Dynamics (AID)—was applied to reconstruct phase spaces and identify causal mechanisms in discrete systems, including cellular automata. Their approach employed perturbation analysis to quantify the algorithmic complexity of system components, enabling rule inference without requiring explicit kinetic equations. This framework provided insights into the causal structure of systems and their reprogrammability toward desired behaviors.

Syntax-guided synthesis

In 2013, a unified framework for program synthesis problems called Syntax-guided Synthesis (stylized SyGuS) was proposed by researchers at UPenn,

UC Berkeley The University of California, Berkeley (UC Berkeley, Berkeley, Cal, or California), is a public land-grant research university in Berkeley, California, United States. Founded in 1868 and named after the Anglo-Irish philosopher George Berkele ...

, and

MIT The Massachusetts Institute of Technology (MIT) is a private research university in Cambridge, Massachusetts, United States. Established in 1861, MIT has played a significant role in the development of many areas of modern technology and sc ...

. The input to a SyGuS algorithm consists of a logical specification along with a

context-free grammar In formal language theory, a context-free grammar (CFG) is a formal grammar whose production rules can be applied to a nonterminal symbol regardless of its context. In particular, in a context-free grammar, each production rule is of the fo ...

of expressions that constrains the syntax of valid solutions. For example, to synthesize a function that returns the maximum of two integers, the logical specification might look like this: and the grammar might be: ::= x , y , 0 , 1 , + , ite(, , ) ::= <= where "ite" stands for "if-then-else". The expression ite(x <= y, y, x) would be a valid solution, because it conforms to the grammar and the specification. From 2014 through 2019, the yearly Syntax-Guided Synthesis Competition (or SyGuS-Comp) compared the different algorithms for program synthesis in a competitive event. The competition used a standardized input format, SyGuS-IF, based on SMT-Lib 2. For example, the following SyGuS-IF encodes the problem of synthesizing the maximum of two integers (as presented above): (set-logic LIA) (synth-fun f ((x Int) (y Int)) Int ((i Int) (c Int) (b Bool)) ((i Int (c x y (+ i i) (ite b i i))) (c Int (0 1)) (b Bool ((<= i i))))) (declare-var x Int) (declare-var y Int) (constraint (>= (f x y) x)) (constraint (>= (f x y) y)) (constraint (or (= (f x y) x) (= (f x y) y))) (check-synth) A compliant solver might return the following output: ((define-fun f ((x Int) (y Int)) Int (ite (<= x y) y x)))

Counter-example guided inductive synthesis

Counter-example guided inductive synthesis (CEGIS) is an effective approach to building sound program synthesizers. CEGIS involves the interplay of two components: a ''generator'' which generates candidate programs, and a ''verifier'' which checks whether the candidates satisfy the specification. Given a set of inputs , a set of possible programs , and a specification , the goal of program synthesis is to find a program in such that for all inputs in , (, ) holds. CEGIS is parameterized over a generator and a verifier: * The ''generator'' takes a set of inputs , and outputs a candidate program that is correct on all the inputs in , that is, a candidate such that for all inputs in , (, ) holds. * The ''verifier'' takes a candidate program and returns ''true'' if the program satisfies on all inputs, and otherwise returns a ''counterexample'', that is, an input in such that (, ) fails. CEGIS runs the generator and verifier run in a loop, accumulating counter-examples: algorithm cegis is input: Program generator ''generate'', verifier ''verify'', specification ''spec'', output: Program that satisfies ''spec'', or failure ''inputs'' := empty set loop ''candidate'' := ''generate''(''spec'', ''inputs'') if ''verify''(''spec'', ''candidate'') then return ''candidate'' else ''verify'' yields a counterexample ''e'' add ''e'' to ''inputs'' end if Implementations of CEGIS typically use

SMT solver In computer science and mathematical logic, satisfiability modulo theories (SMT) is the problem of determining whether a mathematical formula is satisfiable. It generalizes the Boolean satisfiability problem (SAT) to more complex formulas involv ...

s as verifiers. CEGIS was inspired by

counterexample-guided abstraction refinement Counterexample-guided abstraction refinement (CEGAR) is a technique for symbolic model checking. It is also applied in modal logic tableau calculi algorithms to optimise their efficiency. In computer-aided verification and analysis of programs, ...

(CEGAR).

The framework of Manna and Waldinger

The framework of

and Waldinger, published in 1980, starts from a user-given first-order specification formula. For that formula, a proof is constructed, thereby also synthesizing a

functional program In computer science, functional programming is a programming paradigm where programs are constructed by applying and composing functions. It is a declarative programming paradigm in which function definitions are trees of expressions that ma ...

from unifying substitutions. The framework is presented in a table layout, the columns containing: * A line number ("Nr") for reference purposes *

Formulas In science, a formula is a concise way of expressing information symbolically, as in a mathematical formula or a ''chemical formula''. The informal use of the term ''formula'' in science refers to the general construct of a relationship betwe ...

that already have been established, including axioms and preconditions, ("Assertions") * Formulas still to be proven, including postconditions, ("Goals"),The distinction "Assertions" / "Goals" is for convenience only; following the paradigm of

proof by contradiction In logic, proof by contradiction is a form of proof that establishes the truth or the validity of a proposition by showing that assuming the proposition to be false leads to a contradiction. Although it is quite freely used in mathematical pr ...

, a Goal

F

is equivalent to an assertion

\lnot F

. *

Terms Term may refer to: Language *Terminology, context-specific nouns or compound words **Technical term (or ''term of art''), used by specialists in a field ***Scientific terminology, used by scientists *Term (argumentation), part of an argument in d ...

denoting a valid output value ("Program")When

F

and

s

is the Goal formula and the Program term in a line, respectively, then in all cases where

F

holds,

s

is a valid output of the program to be synthesized. This invariant is maintained by all proof rules. An Assertion formula usually is not associated with a Program term. * A justification for the current line ("Origin") Initially, background knowledge, pre-conditions, and post-conditions are entered into the table. After that, appropriate proof rules are applied manually. The framework has been designed to enhance human readability of intermediate formulas: contrary to classical resolution, it does not require

clausal normal form In Boolean algebra, a formula is in conjunctive normal form (CNF) or clausal normal form if it is a conjunction of one or more clauses, where a clause is a disjunction of literals; otherwise put, it is a product of sums or an AND of ORs. In au ...

, but allows one to reason with formulas of arbitrary structure and containing any junctors (" non-clausal resolution"). The proof is complete when

\it true

has been derived in the ''Goals'' column, or, equivalently,

\it false

in the ''Assertions'' column. Programs obtained by this approach are guaranteed to satisfy the specification formula started from; in this sense they are ''correct by construction''. Only a minimalist, yet

Turing-complete In computability theory, a system of data-manipulation rules (such as a model of computation, a computer's instruction set, a programming language, or a cellular automaton) is said to be Turing-complete or computationally universal if it can be ...

purely functional programming language In computer science, purely functional programming usually designates a programming paradigm—a style of building the structure and elements of computer programs—that treats all computation as the evaluation of mathematical functions. Program s ...

, consisting of conditional, recursion, and arithmetic and other operatorsOnly the conditional operator ( ?:) is supported from the beginning. However, arbitrary new operators and relations can be added by providing their properties as axioms. In the toy example below, only the properties of

=

and

\leq

that are actually needed in the proof have been axiomatized, in line 1 to 3. is supported. Case studies performed within this framework synthesized algorithms to compute e.g. division,

remainder In mathematics, the remainder is the amount "left over" after performing some computation. In arithmetic, the remainder is the integer "left over" after dividing one integer by another to produce an integer quotient ( integer division). In a ...

square root In mathematics, a square root of a number is a number such that y^2 = x; in other words, a number whose ''square'' (the result of multiplying the number by itself, or y \cdot y) is . For example, 4 and −4 are square roots of 16 because 4 ...

, term unification, answers to

relational database A relational database (RDB) is a database based on the relational model of data, as proposed by E. F. Codd in 1970. A Relational Database Management System (RDBMS) is a type of database management system that stores data in a structured for ...

queries and several

sorting algorithm In computer science, a sorting algorithm is an algorithm that puts elements of a List (computing), list into an Total order, order. The most frequently used orders are numerical order and lexicographical order, and either ascending or descending ...

Proof rules

Proof rules include: *Non-clausal resolution (see table). :For example, line 55 is obtained by resolving Assertion formulas

E

from 51 and

F

from 52 which both share some common subformula

p

. The resolvent is formed as the disjunction of

E

, with

p

replaced by

\it true

, and

F

, with

p

replaced by

\it false

. This resolvent logically follows from the conjunction of

E

and

F

. More generally,

E

and

F

need to have only two unifiable subformulas

p_1

and

p_2

, respectively; their resolvent is then formed from

E \theta

and

F \theta

as before, where

\theta

is the most general unifier of

p_1

and

p_2

. This rule generalizes resolution of clauses. :Program terms of parent formulas are combined as shown in line 58 to form the output of the resolvent. In the general case,

\theta

is applied to the latter also. Since the subformula

p

appears in the output, care must be taken to resolve only on subformulas corresponding to

computable Computability is the ability to solve a problem by an effective procedure. It is a key topic of the field of computability theory within mathematical logic and the theory of computation within computer science. The computability of a problem is cl ...

properties. *Logical transformations. :For example,

E \land (F \lor G)

can be transformed to

(E \land F) \lor (E \land G)

) in Assertions as well as in Goals, since both are equivalent. *Splitting of conjunctive assertions and of disjunctive goals. :An example is shown in lines 11 to 13 of the toy example below. *

Structural induction Structural induction is a proof method that is used in mathematical logic (e.g., in the proof of Łoś' theorem), computer science, graph theory, and some other mathematical fields. It is a generalization of mathematical induction over natural ...

. :This rule allows for synthesis of recursive functions. For a given pre- and postcondition "Given

x

such that

\textit(x)

, find

f(x) = y

such that

\textit(x,y)

", and an appropriate user-given

well-ordering In mathematics, a well-order (or well-ordering or well-order relation) on a set is a total ordering on with the property that every non-empty subset of has a least element in this ordering. The set together with the ordering is then called ...

\prec

of the domain of

x

, it is always sound to add an Assertion "

x' \prec x \land \textit(x') \implies \textit(x',f(x'))

". Resolving with this assertion can introduce a recursive call to

f

in the Program term. :An example is given in Manna, Waldinger (1980), p.108-111, where an algorithm to compute quotient and remainder of two given integers is synthesized, using the well-order

(n',d') \prec (n,d)

defined by

0 \leq n' < n

(p.110). Murray has shown these rules to be

complete Complete may refer to: Logic * Completeness (logic) * Completeness of a theory, the property of a theory that every formula in the theory's language or its negation is provable Mathematics * The completeness of the real numbers, which implies t ...

for

first-order logic First-order logic, also called predicate logic, predicate calculus, or quantificational logic, is a collection of formal systems used in mathematics, philosophy, linguistics, and computer science. First-order logic uses quantified variables over ...

. In 1986, Manna and Waldinger added generalized E-resolution and paramodulation rules to handle also equality; later, these rules turned out to be incomplete (but nevertheless

sound In physics, sound is a vibration that propagates as an acoustic wave through a transmission medium such as a gas, liquid or solid. In human physiology and psychology, sound is the ''reception'' of such waves and their ''perception'' by the br ...

Example

As a toy example, a functional program to compute the maximum

M

of two numbers

x

and

y

can be derived as follows. Starting from the requirement description "''The maximum is larger than or equal to any given number, and is one of the given numbers''", the first-order formula

\forall X \forall Y \exists M: X \leq M \land Y \leq M \land (X=M \lor Y=M)

is obtained as its formal translation. This formula is to be proved. By reverse

Skolemization In mathematical logic, a Well-formed_formula, formula of first-order logic is in Skolem normal form if it is in prenex normal form with only Universal quantification, universal first-order quantifiers. Every first-order Well-formed formula, formu ...

,While ordinary Skolemization preserves satisfiability, reverse Skolemization, i.e. replacing universally quantified variables by functions, preserves validity. the specification in line 10 is obtained, an upper- and lower-case letter denoting a variable and a

Skolem constant In mathematical logic, a formula of first-order logic is in Skolem normal form if it is in prenex normal form with only universal first-order quantifiers. Every first-order formula may be converted into Skolem normal form while not changing its ...

, respectively. After applying a transformation rule for the

distributive law In mathematics, the distributive property of binary operations is a generalization of the distributive law, which asserts that the equality x \cdot (y + z) = x \cdot y + x \cdot z is always true in elementary algebra. For example, in elementary ...

in line 11, the proof goal is a disjunction, and hence can be split into two cases, viz. lines 12 and 13. Turning to the first case, resolving line 12 with the axiom in line 1 leads to instantiation of the program variable

M

in line 14. Intuitively, the last conjunct of line 12 prescribes the value that

M

must take in this case. Formally, the non-clausal resolution rule shown in line 57 above is applied to lines 12 and 1, with * being the common instance of and , obtained by syntactically unifying the latter formulas, * being , obtained from instantiated line 1 (appropriately padded to make the context around visible), and * being , obtained from instantiated line 12, yielding

\lnot (

)

, which simplifies to

x \leq x \land y \leq x

. In a similar way, line 14 yields line 15 and then line 16 by resolution. Also, the second case,

x \leq M \land y \leq M \land y = M

in line 13, is handled similarly, yielding eventually line 18. In a last step, both cases (i.e. lines 16 and 18) are joined, using the resolution rule from line 58; to make that rule applicable, the preparatory step 15→16 was needed. Intuitively, line 18 could be read as "in case

x \leq y

, the output

y

is valid (with respect to the original specification), while line 15 says "in case

y \leq x

, the output

x

is valid; the step 15→16 established that both cases 16 and 18 are complementary.Axiom 3 was needed for that; in fact, if

\leq

wasn't a

total order In mathematics, a total order or linear order is a partial order in which any two elements are comparable. That is, a total order is a binary relation \leq on some set X, which satisfies the following for all a, b and c in X: # a \leq a ( re ...

, no maximum could be computed for uncomparable inputs

x,y

. Since both line 16 and 18 comes with a program term, a

conditional expression In computer science, conditionals (that is, conditional statements, conditional expressions and conditional constructs) are programming language constructs that perform different computations or actions or return different values depending on t ...

results in the program column. Since the goal formula

\textit

has been derived, the proof is done, and the program column of the "

\textit

" line contains the program.

Notes

References

* * * * {{DEFAULTSORT:Program Synthesis Programming paradigms