History of compiler construction
   HOME

TheInfoList



OR:

In
computing Computing is any goal-oriented activity requiring, benefiting from, or creating computing machinery. It includes the study and experimentation of algorithmic processes, and development of both hardware and software. Computing has scientific, ...
, a
compiler In computing, a compiler is a computer program that translates computer code written in one programming language (the ''source'' language) into another language (the ''target'' language). The name "compiler" is primarily used for programs tha ...
is a
computer program A computer program is a sequence or set of instructions in a programming language for a computer to Execution (computing), execute. Computer programs are one component of software, which also includes software documentation, documentation and oth ...
that transforms
source code In computing, source code, or simply code, is any collection of code, with or without comments, written using a human-readable programming language, usually as plain text. The source code of a program is specially designed to facilitate the ...
written in a
programming language A programming language is a system of notation for writing computer programs. Most programming languages are text-based formal languages, but they may also be graphical. They are a kind of computer language. The description of a programming ...
or computer language (the ''source language''), into another computer language (the ''target language'', often having a binary form known as ''
object code In computing, object code or object module is the product of a compiler In computing, a compiler is a computer program that translates computer code written in one programming language (the ''source'' language) into another language (the ...
'' or ''
machine code In computer programming, machine code is any low-level programming language, consisting of machine language instructions, which are used to control a computer's central processing unit (CPU). Each instruction causes the CPU to perform a ve ...
''). The most common reason for transforming source code is to create an
executable In computing, executable code, an executable file, or an executable program, sometimes simply referred to as an executable or binary, causes a computer "to perform indicated tasks according to encoded instructions", as opposed to a data fil ...
program. Any program written in a
high-level programming language In computer science, a high-level programming language is a programming language with strong abstraction from the details of the computer. In contrast to low-level programming languages, it may use natural language ''elements'', be easier to u ...
must be translated to object code before it can be executed, so all programmers using such a language use a compiler or an interpreter. Thus, compilers are very important to programmers. Improvements to a compiler may lead to a large number of improved features in executable programs. The Production Quality Compiler-Compiler, in the late 1970s, introduced the principles of compiler organization that are still widely used today (e.g., a front-end handling syntax and semantics and a back-end generating machine code).


First compilers

Software for early computers was primarily written in
assembly language In computer programming, assembly language (or assembler language, or symbolic machine code), often referred to simply as Assembly and commonly abbreviated as ASM or asm, is any low-level programming language with a very strong correspondence b ...
, and before that directly in
machine code In computer programming, machine code is any low-level programming language, consisting of machine language instructions, which are used to control a computer's central processing unit (CPU). Each instruction causes the CPU to perform a ve ...
. It is usually more productive for a programmer to use a high-level language, and programs written in a high-level language can be
reused Reuse is the action or practice of using an item, whether for its original purpose (conventional reuse) or to fulfill a different function ( creative reuse or repurposing). It should be distinguished from recycling, which is the breaking down of ...
on different kinds of computers. Even so, it took a while for compilers to become established, because they generated code that did not perform as well as hand-written assembler, they were daunting development projects in their own right, and the very limited
memory Memory is the faculty of the mind by which data or information is encoded, stored, and retrieved when needed. It is the retention of information over time for the purpose of influencing future action. If past events could not be remember ...
capacity of early computers created many technical problems for practical compiler implementations. The first practical compiler was written by
Corrado Böhm Corrado Böhm (17 January 1923 – 23 October 2017) was a Professor Emeritus at the University of Rome "La Sapienza" and a computer scientist known especially for his contributions to the theory of structured programming, constructive mathemati ...
in 1951 for his PhD thesis, one of the first computer science doctorates awarded anywhere in the world. The first implemented compiler was written by
Grace Hopper Grace Brewster Hopper (; December 9, 1906 – January 1, 1992) was an American computer scientist, mathematician, and United States Navy rear admiral. One of the first programmers of the Harvard Mark I computer, she was a pioneer of compu ...
, who also coined the term "compiler",
Maurice V. Wilkes Sir Maurice Vincent Wilkes (26 June 1913 – 29 November 2010) was a British computer scientist who designed and helped build the Electronic Delay Storage Automatic Calculator (EDSAC), one of the earliest stored program computers, and who inv ...
. 1968. Computers Then and Now. Journal of the Association for Computing Machinery, 15(1):1–7, January. p. 3 (a comment in brackets added by editor), "(I do not think that the term compiler was then 953in general use, although it had in fact been introduced by Grace Hopper.)"

The World's First COBOL Compilers
referring to her A-0 system which functioned as a loader or
linker Linker or linkers may refer to: Computing * Linker (computing), a computer program that takes one or more object files generated by a compiler or generated by an assembler and links them with libraries, generating an executable program or shar ...
, not the modern notion of a compiler. The first Autocode and compiler in the modern sense were developed by Alick Glennie in 1952 at the
University of Manchester The University of Manchester is a public university, public research university in Manchester, England. The main campus is south of Manchester city centre, Manchester City Centre on Wilmslow Road, Oxford Road. The university owns and operates majo ...
for the Mark 1 computer. The FORTRAN team led by John W. Backus at IBM introduced the first commercially available compiler, in 1957, which took 18 person-years to create. The first
ALGOL 58 ALGOL 58, originally named IAL, is one of the family of ALGOL computer programming languages. It was an early compromise design soon superseded by ALGOL 60. According to John Backus The Zurich ACM-GAMM Conference had two principal motives in pro ...
compiler was completed by the end of 1958 by
Friedrich L. Bauer Friedrich Ludwig "Fritz" Bauer (10 June 1924 – 26 March 2015) was a German pioneer of computer science and professor at the Technical University of Munich. Life Bauer earned his Abitur in 1942 and served in the Wehrmacht during World Wa ...
, Hermann Bottenbruch,
Heinz Rutishauser Heinz Rutishauser (30 January 1918 – 10 November 1970) was a Swiss mathematician and a pioneer of modern numerical mathematics and computer science. Life Rutishauser's father died when he was 13 years old and his mother died three years lat ...
, and
Klaus Samelson Klaus Samelson (21 December 1918 – 25 May 1980) was a German mathematician, physicist, and computer pioneer in the area of programming language translation and push-pop stack algorithms for sequential formula translation on computers. Early ...
for the Z22 computer. Bauer et al. had been working on compiler technology for the ''Sequentielle Formelübersetzung'' (i.e. ''sequential formula translation'') in the previous years. By 1960, an extended Fortran compiler, ALTAC, was available on the
Philco Philco (an acronym for Philadelphia Battery Company) is an American electronics industry, electronics manufacturer headquartered in Philadelphia. Philco was a pioneer in battery, radio, and television production. In 1961, the company was purchased ...
2000, so it is probable that a Fortran program was compiled for both IBM and Philco
computer architecture In computer engineering, computer architecture is a description of the structure of a computer system made from component parts. It can sometimes be a high-level description that ignores details of the implementation. At a more detailed level, the ...
s in mid-1960. The first known demonstrated
cross-platform In computing, cross-platform software (also called multi-platform software, platform-agnostic software, or platform-independent software) is computer software that is designed to work in several computing platforms. Some cross-platform software ...
high-level language was
COBOL COBOL (; an acronym for "common business-oriented language") is a compiled English-like computer programming language designed for business use. It is an imperative, procedural and, since 2002, object-oriented language. COBOL is primarily u ...
. In a demonstration in December 1960, a COBOL program was compiled and executed on both the UNIVAC II and the
RCA The RCA Corporation was a major American electronics company, which was founded as the Radio Corporation of America in 1919. It was initially a patent trust owned by General Electric (GE), Westinghouse, AT&T Corporation and United Fruit Comp ...
501.


Self-hosting compilers

Like any other software, there are benefits from implementing a compiler in a high-level language. In particular, a compiler can be self-hosted – that is, written in the programming language it compiles. Building a self-hosting compiler is a
bootstrapping In general, bootstrapping usually refers to a self-starting process that is supposed to continue or grow without external input. Etymology Tall boots may have a tab, loop or handle at the top known as a bootstrap, allowing one to use fingers ...
problem, i.e. the first such compiler for a language must be either hand written machine code, compiled by a compiler written in another language, or compiled by running the compiler's source on itself in an interpreter.


Corrado Böhm PhD dissertation

Corrado Böhm developed a language, a machine, and a translation method for compiling that language on the machine in his PhD dissertation dated 1951. He not only described a complete compiler, but also defined for the first time that compiler in its own language. The language was interesting in itself, because every statement (including input statements, output statements and control statements) was a special case of an assignment statement.


NELIAC

The Navy Electronics Laboratory International
ALGOL ALGOL (; short for "Algorithmic Language") is a family of imperative computer programming languages originally developed in 1958. ALGOL heavily influenced many other languages and was the standard method for algorithm description used by the ...
Compiler or
NELIAC The Navy Electronics Laboratory International ALGOL Compiler (NELIAC) is a dialect and compiler implementation of the programming language ALGOL 58, developed by the Navy Electronics Laboratory (NEL) in 1958. It was designed for numeric and logi ...
was a
dialect The term dialect (from Latin , , from the Ancient Greek word , 'discourse', from , 'through' and , 'I speak') can refer to either of two distinctly different types of linguistic phenomena: One usage refers to a variety of a language that is ...
and compiler implementation of the
ALGOL 58 ALGOL 58, originally named IAL, is one of the family of ALGOL computer programming languages. It was an early compromise design soon superseded by ALGOL 60. According to John Backus The Zurich ACM-GAMM Conference had two principal motives in pro ...
programming language A programming language is a system of notation for writing computer programs. Most programming languages are text-based formal languages, but they may also be graphical. They are a kind of computer language. The description of a programming ...
developed by the
Naval Electronics Laboratory The U.S. Navy Electronics Laboratory (''NEL'') was created in 1945, with consolidation of the naval radio station, radar operators training school, and radio security activity of the Navy Radio and Sound Lab (NRSL) and its wartime partner, the U ...
in 1958. NELIAC was the brainchild of
Harry Huskey Harry Douglas Huskey (January 19, 1916 – April 9, 2017) was an American computer design pioneer. Early life and career Huskey was born in Whittier, in the Smoky Mountains region of North Carolina and grew up in Idaho. He received his bache ...
– then Chairman of the ACM and a well known
computer scientist A computer scientist is a person who is trained in the academic study of computer science. Computer scientists typically work on the theoretical side of computation, as opposed to the hardware side on which computer engineers mainly focus (a ...
(and later academic supervisor of
Niklaus Wirth Niklaus Emil Wirth (born 15 February 1934) is a Swiss computer scientist. He has designed several programming languages, including Pascal, and pioneered several classic topics in software engineering. In 1984, he won the Turing Award, generally ...
), and supported by Maury Halstead, the head of the computational center at NEL. The earliest version was implemented on the prototype USQ-17 computer (called the Countess) at the laboratory. It was the world's first self-compiling compiler – the compiler was first coded in simplified form in assembly language (the ''bootstrap''), then re-written in its own language and compiled by the bootstrap, and finally re-compiled by itself, making the bootstrap obsolete.


Lisp

Another early self-hosting compiler was written for
Lisp A lisp is a speech impairment in which a person misarticulates sibilants (, , , , , , , ). These misarticulations often result in unclear speech. Types * A frontal lisp occurs when the tongue is placed anterior to the target. Interdental lispin ...
by Tim Hart and Mike Levin at
MIT The Massachusetts Institute of Technology (MIT) is a private land-grant research university in Cambridge, Massachusetts. Established in 1861, MIT has played a key role in the development of modern technology and science, and is one of the m ...
in 1962. They wrote a Lisp compiler in Lisp, testing it inside an existing Lisp interpreter. Once they had improved the compiler to the point where it could compile its own source code, it was self-hosting. :''The compiler as it exists on the standard compiler tape is a machine language program that was obtained by having the
S-expression In computer programming, an S-expression (or symbolic expression, abbreviated as sexpr or sexp) is an expression in a like-named notation for nested list (tree-structured) data. S-expressions were invented for and popularized by the programming la ...
definition of the compiler work on itself through the interpreter.'' (AI Memo 39) This technique is only possible when an interpreter already exists for the very same language that is to be compiled. It borrows directly from the notion of running a program on itself as input, which is also used in various proofs in
theoretical computer science computer science (TCS) is a subset of general computer science and mathematics that focuses on mathematical aspects of computer science such as the theory of computation, lambda calculus, and type theory. It is difficult to circumscribe the ...
, such as the proof that the
halting problem In computability theory, the halting problem is the problem of determining, from a description of an arbitrary computer program and an input, whether the program will finish running, or continue to run forever. Alan Turing proved in 1936 that a ...
is undecidable.


Forth

Forth Forth or FORTH may refer to: Arts and entertainment * ''forth'' magazine, an Internet magazine * ''Forth'' (album), by The Verve, 2008 * ''Forth'', a 2011 album by Proto-Kaw * Radio Forth, a group of independent local radio stations in Scotla ...
is an example of a self-hosting compiler. The self compilation and cross compilation features of Forth are synonymous with
metacompilation Metacompilation is a computation which involves metasystem transitions (MST) from a computing machine ''M'' to a metamachine ''M' '' which controls, analyzes and imitates the work of ''M''. Semantics-based program transformation, such as partial e ...
and metacompilers. Like
Lisp A lisp is a speech impairment in which a person misarticulates sibilants (, , , , , , , ). These misarticulations often result in unclear speech. Types * A frontal lisp occurs when the tongue is placed anterior to the target. Interdental lispin ...
, Forth is an
extensible programming Extensible programming is a term used in computer science to describe a style of computer programming that focuses on mechanisms to extend the programming language, compiler and runtime environment. Extensible programming languages, supporting this ...
language. It is the
extensible programming Extensible programming is a term used in computer science to describe a style of computer programming that focuses on mechanisms to extend the programming language, compiler and runtime environment. Extensible programming languages, supporting this ...
language features of Forth and Lisp that enable them to generate new versions of themselves or port themselves to new environments.


Context-free grammars and parsers

A
parser Parsing, syntax analysis, or syntactic analysis is the process of analyzing a string of symbols, either in natural language, computer languages or data structures, conforming to the rules of a formal grammar. The term ''parsing'' comes from Lat ...
is an important component of a compiler. It parses the source code of a computer programming language to create some form of internal representation. Programming languages tend to be specified in terms of a
context-free grammar In formal language theory, a context-free grammar (CFG) is a formal grammar whose production rules are of the form :A\ \to\ \alpha with A a ''single'' nonterminal symbol, and \alpha a string of terminals and/or nonterminals (\alpha can be em ...
because fast and efficient parsers can be written for them. Parsers can be written by hand or generated by a
parser generator In computer science, a compiler-compiler or compiler generator is a programming tool that creates a parser, interpreter, or compiler from some form of formal description of a programming language and machine. The most common type of compiler- ...
. A context-free grammar provides a simple and precise mechanism for describing how programming language constructs are built from smaller blocks. The formalism of context-free grammars was developed in the mid-1950s by
Noam Chomsky Avram Noam Chomsky (born December 7, 1928) is an American public intellectual: a linguist, philosopher, cognitive scientist, historian, social critic, and political activist. Sometimes called "the father of modern linguistics", Chomsky i ...
. Block structure was introduced into computer programming languages by the ALGOL project (1957–1960), which, as a consequence, also featured a context-free grammar to describe the resulting ALGOL syntax. Context-free grammars are simple enough to allow the construction of efficient parsing algorithms which, for a given string, determine whether and how it can be generated from the grammar. If a programming language designer is willing to work within some limited subsets of context-free grammars, more efficient parsers are possible.


LR parsing

The
LR parser In computer science, LR parsers are a type of bottom-up parser that analyse deterministic context-free languages in linear time. There are several variants of LR parsers: SLR parsers, LALR parsers, Canonical LR(1) parsers, Minimal LR(1) parse ...
(left to right) was invented by
Donald Knuth Donald Ervin Knuth ( ; born January 10, 1938) is an American computer scientist, mathematician, and professor emeritus at Stanford University. He is the 1974 recipient of the ACM Turing Award, informally considered the Nobel Prize of computer sc ...
in 1965 in a paper, "On the Translation of Languages from Left to Right". An LR parser is a parser that reads input from Left to right (as it would appear if visually displayed) and produces a Rightmost derivation. The term LR(''k'') parser is also used, where ''k'' refers to the number of unconsumed lookahead input symbols that are used in making parsing decisions. Knuth proved that LR(''k'') grammars can be parsed with an execution time essentially proportional to the length of the program, and that every LR(''k'') grammar for ''k'' > 1 can be mechanically transformed into an LR(1) grammar for the same language. In other words, it is only necessary to have one symbol lookahead to parse any deterministic context-free grammar (DCFG). Korenjak (1969) was the first to show parsers for programming languages could be produced using these techniques. Frank DeRemer devised the more practical Simple LR (SLR) and Look-ahead LR (LALR) techniques, published in his PhD dissertation at MIT in 1969. This was an important breakthrough, because LR(k) translators, as defined by Donald Knuth, were much too large for implementation on computer systems in the 1960s and 1970s. In practice, LALR offers a good solution; the added power of LALR(1) parsers over SLR(1) parsers (that is, LALR(1) can parse more complex grammars than SLR(1)) is useful, and, though LALR(1) is not comparable with LL(1)(See below) (LALR(1) cannot parse all LL(1) grammars), most LL(1) grammars encountered in practice can be parsed by LALR(1). LR(1) grammars are more powerful again than LALR(1); however, an LR(1) grammar requires a
canonical LR parser In computer science, a canonical LR parser or LR(1) parser is an LR(k) parser for ''k=1'', i.e. with a single lookahead terminal. The special attribute of this parser is that any LR(k) grammar with ''k>1'' can be transformed into an LR(1) grammar. ...
which would be extremely large in size and is not considered practical. The syntax of many
programming language A programming language is a system of notation for writing computer programs. Most programming languages are text-based formal languages, but they may also be graphical. They are a kind of computer language. The description of a programming ...
s are defined by grammars that can be parsed with an LALR(1) parser, and for this reason LALR parsers are often used by compilers to perform syntax analysis of source code. A recursive ascent parser implements an LALR parser using mutually-recursive functions rather than tables. Thus, the parser is ''directly encoded'' in the host language similar to recursive descent. Direct encoding usually yields a parser which is faster than its table-driven equivalent for the same reason that compilation is faster than interpretation. It is also (in principle) possible to hand edit a recursive ascent parser, whereas a tabular implementation is nigh unreadable to the average human. Recursive ascent was first described by Thomas Pennello in his article "Very fast LR parsing" in 1986. The technique was later expounded upon by G.H. Roberts in 1988 as well as in an article by Leermakers, Augusteijn, Kruseman Aretz in 1992 in the journal ''Theoretical Computer Science''.


LL parsing

An LL parser parses the input from Left to right, and constructs a Leftmost derivation of the sentence (hence LL, as opposed to LR). The class of grammars which are parsable in this way is known as the ''LL grammars''. LL grammars are an even more restricted class of context-free grammars than LR grammars. Nevertheless, they are of great interest to compiler writers, because such a parser is simple and efficient to implement. LL(k) grammars can be parsed by a
recursive descent parser In computer science, a recursive descent parser is a kind of top-down parser built from a set of mutually recursive procedures (or a non-recursive equivalent) where each such procedure implements one of the nonterminals of the grammar. Thus t ...
which is usually coded by hand, although a notation such as
META II META II is a domain-specific programming language for writing compilers. It was created in 1963–1964 by Dewey Val Schorre at UCLA. META II uses what Schorre called syntax equations. Its operation is simply explained as: Each ''syntax equation' ...
might alternatively be used. The design of ALGOL sparked investigation of recursive descent, since the ALGOL language itself is recursive. The concept of recursive descent parsing was discussed in the January 1961 issue of ''
Communications of the ACM ''Communications of the ACM'' is the monthly journal of the Association for Computing Machinery (ACM). It was established in 1958, with Saul Rosen as its first managing editor. It is sent to all ACM members. Articles are intended for readers wi ...
'' in separate papers by A.A. Grau and Edgar T. "Ned" Irons. Richard Waychoff and colleagues also implemented recursive descent in the Burroughs ALGOL compiler in March 1961, the two groups used different approaches but were in at least informal contact. The idea of LL(1) grammars was introduced by Lewis and Stearns (1968). Recursive descent was popularised by
Niklaus Wirth Niklaus Emil Wirth (born 15 February 1934) is a Swiss computer scientist. He has designed several programming languages, including Pascal, and pioneered several classic topics in software engineering. In 1984, he won the Turing Award, generally ...
with
PL/0 PL/0 is a programming language, intended as an educational programming language, that is similar to but much simpler than Pascal, a general-purpose programming language. It serves as an example of how to construct a compiler. It was originally intr ...
, an
educational programming language An educational programming language is a programming language that is designed mostly as an instrument for learning, and less as a tool for writing programs to perform work. Types of educational programming languages Assembly languages Origi ...
used to teach compiler construction in the 1970s. LR parsing can handle a larger range of languages than
LL parsing In computer science, an LL parser (Left-to-right, leftmost derivation) is a top-down parser for a restricted context-free language. It parses the input from Left to right, performing Leftmost derivation of the sentence. An LL parser is called an ...
, and is also better at error reporting (This is disputable, REFERENCE is required), i.e. it detects syntactic errors when the input does not conform to the grammar as soon as possible.


Earley parser

In 1970, Jay Earley invented what came to be known as the
Earley parser In computer science, the Earley parser is an algorithm for parsing strings that belong to a given context-free language, though (depending on the variant) it may suffer problems with certain nullable grammars. The algorithm, named after its inve ...
. Earley parsers are appealing because they can parse all
context-free language In formal language theory, a context-free language (CFL) is a language generated by a context-free grammar (CFG). Context-free languages have many applications in programming languages, in particular, most arithmetic expressions are generated by ...
s reasonably efficiently.


Grammar description languages

John Backus proposed "metalinguistic formulas" to describe the syntax of the new programming language IAL, known today as
ALGOL 58 ALGOL 58, originally named IAL, is one of the family of ALGOL computer programming languages. It was an early compromise design soon superseded by ALGOL 60. According to John Backus The Zurich ACM-GAMM Conference had two principal motives in pro ...
(1959). Backus's work was based on the
Post canonical system A Post canonical system, also known as a Post production system, as created by Emil Post, is a string-manipulation system that starts with finitely-many strings and repeatedly transforms them by applying a finite set j of specified rules of a cert ...
devised by
Emil Post Emil Leon Post (; February 11, 1897 – April 21, 1954) was an American mathematician and logician. He is best known for his work in the field that eventually became known as computability theory. Life Post was born in Augustów, Suwałki Gove ...
. Further development of ALGOL led to
ALGOL 60 ALGOL 60 (short for ''Algorithmic Language 1960'') is a member of the ALGOL family of computer programming languages. It followed on from ALGOL 58 which had introduced code blocks and the begin and end pairs for delimiting them, representing a ...
; in its report (1963), Peter Naur named Backus's notation Backus normal form (BNF), and simplified it to minimize the character set used. However,
Donald Knuth Donald Ervin Knuth ( ; born January 10, 1938) is an American computer scientist, mathematician, and professor emeritus at Stanford University. He is the 1974 recipient of the ACM Turing Award, informally considered the Nobel Prize of computer sc ...
argued that BNF should rather be read as
Backus–Naur form In computer science, Backus–Naur form () or Backus normal form (BNF) is a metasyntax notation for context-free grammars, often used to describe the syntax of languages used in computing, such as computer programming languages, document format ...
, and that has become the commonly accepted usage.
Niklaus Wirth Niklaus Emil Wirth (born 15 February 1934) is a Swiss computer scientist. He has designed several programming languages, including Pascal, and pioneered several classic topics in software engineering. In 1984, he won the Turing Award, generally ...
defined
extended Backus–Naur form In computer science, extended Backus–Naur form (EBNF) is a family of metasyntax notations, any of which can be used to express a context-free grammar. EBNF is used to make a formal description of a formal language such as a computer programm ...
(EBNF), a refined version of BNF, in the early 1970s for PL/0.
Augmented Backus–Naur form In computer science, augmented Backus–Naur form (ABNF) is a metalanguage based on Backus–Naur form (BNF), but consisting of its own syntax and derivation rules. The motive principle for ABNF is to describe a formal system of a language to be use ...
(ABNF) is another variant. Both EBNF and ABNF are widely used to specify the grammar of programming languages, as the inputs to parser generators, and in other fields such as defining communication protocols.


Parser generators

A
parser generator In computer science, a compiler-compiler or compiler generator is a programming tool that creates a parser, interpreter, or compiler from some form of formal description of a programming language and machine. The most common type of compiler- ...
generates the lexical-analyser portion of a compiler. It is a program that takes a description of a
formal grammar In formal language theory, a grammar (when the context is not given, often called a formal grammar for clarity) describes how to form strings from a language's alphabet that are valid according to the language's syntax. A grammar does not describe ...
of a specific programming language and produces a parser for that language. That parser can be used in a compiler for that specific language. The parser detects and identifies the reserved words and symbols of the specific language from a stream of text and returns these as tokens to the code which implements the syntactic validation and translation into object code. This second part of the compiler can also be created by a ''compiler-compiler'' using a formal rules-of-precedence syntax-description as input. The first ''compiler-compiler'' to use that name was written by Tony Brooker in 1960 and was used to create compilers for the
Atlas An atlas is a collection of maps; it is typically a bundle of maps of Earth or of a region of Earth. Atlases have traditionally been bound into book form, but today many atlases are in multimedia formats. In addition to presenting geogra ...
computer at the
University of Manchester The University of Manchester is a public university, public research university in Manchester, England. The main campus is south of Manchester city centre, Manchester City Centre on Wilmslow Road, Oxford Road. The university owns and operates majo ...
, including the Atlas Autocode compiler. However it was rather different from modern compiler-compilers, and today would probably be described as being somewhere between a highly customisable generic compiler and an extensible-syntax language. The name "compiler-compiler" was far more appropriate for Brooker's system than it is for most modern compiler-compilers, which are more accurately described as parser generators. It is almost certain that the "Compiler-Compiler" name has entered common use due to
Yacc Yacc (Yet Another Compiler-Compiler) is a computer program for the Unix operating system developed by Stephen C. Johnson. It is a Look Ahead Left-to-Right Rightmost Derivation (LALR) parser generator, generating a LALR parser (the part of a co ...
rather than Brooker's work being remembered. In the early 1960s, Robert McClure at
Texas Instruments Texas Instruments Incorporated (TI) is an American technology company headquartered in Dallas, Texas, that designs and manufactures semiconductors and various integrated circuits, which it sells to electronics designers and manufacturers globa ...
invented a compiler-compiler called TMG, the name taken from "transmogrification". In the following years TMG was
ported In software engineering, porting is the process of adapting software for the purpose of achieving some form of execution in a computing environment that is different from the one that a given program (meant for such execution) was originally desi ...
to several
UNIVAC UNIVAC (Universal Automatic Computer) was a line of electronic digital stored-program computers starting with the products of the Eckert–Mauchly Computer Corporation. Later the name was applied to a division of the Remington Rand company an ...
and IBM mainframe computers. The
Multics Multics ("Multiplexed Information and Computing Service") is an influential early time-sharing operating system based on the concept of a single-level memory.Dennis M. Ritchie, "The Evolution of the Unix Time-sharing System", Communications of ...
project, a joint venture between
MIT The Massachusetts Institute of Technology (MIT) is a private land-grant research university in Cambridge, Massachusetts. Established in 1861, MIT has played a key role in the development of modern technology and science, and is one of the m ...
and
Bell Labs Nokia Bell Labs, originally named Bell Telephone Laboratories (1925–1984), then AT&T Bell Laboratories (1984–1996) and Bell Labs Innovations (1996–2007), is an American industrial research and scientific development company owned by mul ...
, was one of the first to develop an
operating system An operating system (OS) is system software that manages computer hardware, software resources, and provides common daemon (computing), services for computer programs. Time-sharing operating systems scheduler (computing), schedule tasks for ef ...
in a high-level language.
PL/I PL/I (Programming Language One, pronounced and sometimes written PL/1) is a procedural, imperative computer programming language developed and published by IBM. It is designed for scientific, engineering, business and system programming. I ...
was chosen as the language, but an external supplier could not supply a working compiler. The Multics team developed their own subset dialect of
PL/I PL/I (Programming Language One, pronounced and sometimes written PL/1) is a procedural, imperative computer programming language developed and published by IBM. It is designed for scientific, engineering, business and system programming. I ...
known as Early PL/I (EPL) as their implementation language in 1964. TMG was ported to
GE-600 series The GE-600 series was a family of 36-bit mainframe computers originating in the 1960s, built by General Electric (GE). When GE left the mainframe business the line was sold to Honeywell, which built similar systems into the 1990s as the division ...
and used to develop EPL by
Douglas McIlroy Malcolm Douglas McIlroy (born 1932) is a mathematician, engineer, and programmer. As of 2019 he is an Adjunct Professor of Computer Science at Dartmouth College. McIlroy is best known for having originally proposed Unix pipelines and developed se ...
, Robert Morris, and others. Not long after
Ken Thompson Kenneth Lane Thompson (born February 4, 1943) is an American pioneer of computer science. Thompson worked at Bell Labs for most of his career where he designed and implemented the original Unix operating system. He also invented the B programmi ...
wrote the first version of
Unix Unix (; trademarked as UNIX) is a family of multitasking, multiuser computer operating systems that derive from the original AT&T Unix, whose development started in 1969 at the Bell Labs research center by Ken Thompson, Dennis Ritchie, ...
for the
PDP-7 The PDP-7 was a minicomputer produced by Digital Equipment Corporation as part of the PDP series. Introduced in 1964, shipped since 1965, it was the first to use their Flip-Chip technology. With a cost of , it was cheap but powerful by the s ...
in 1969,
Douglas McIlroy Malcolm Douglas McIlroy (born 1932) is a mathematician, engineer, and programmer. As of 2019 he is an Adjunct Professor of Computer Science at Dartmouth College. McIlroy is best known for having originally proposed Unix pipelines and developed se ...
created the new system's first higher-level language: an implementation of McClure's TMG. TMG was also the compiler definition tool used by Ken Thompson to write the compiler for the B language on his PDP-7 in 1970. B was the immediate ancestor of C. An early
LALR parser generator A lookahead LR parser (LALR) generator is a software tool that reads a BNF grammar and creates an LALR parser which is capable of parsing files written in the computer language defined by the BNF grammar. LALR parsers are desirable because they ...
was called "TWS", created by Frank DeRemer and Tom Pennello.


XPL

XPL is a dialect of the
PL/I PL/I (Programming Language One, pronounced and sometimes written PL/1) is a procedural, imperative computer programming language developed and published by IBM. It is designed for scientific, engineering, business and system programming. I ...
programming language A programming language is a system of notation for writing computer programs. Most programming languages are text-based formal languages, but they may also be graphical. They are a kind of computer language. The description of a programming ...
, used for the development of compilers for computer languages. It was designed and implemented in 1967 by a team with William M. McKeeman, James J. Horning, and David B. Wortman at
Stanford University Stanford University, officially Leland Stanford Junior University, is a private research university in Stanford, California. The campus occupies , among the largest in the United States, and enrolls over 17,000 students. Stanford is conside ...
and the
University of California, Santa Cruz The University of California, Santa Cruz (UC Santa Cruz or UCSC) is a public land-grant research university in Santa Cruz, California. It is one of the ten campuses in the University of California system. Located on Monterey Bay, on the ed ...
. It was first announced at the 1968
Fall Joint Computer Conference The Joint Computer Conferences were a series of computer conferences in the United States held under various names between 1951 and 1987. The conferences were the venue for presentations and papers representing "cumulative work in the omputerfield ...
in San Francisco. XPL featured a relatively simple translator writing system dubbed
ANALYZER An analyser or analyzer is a tool used to analyze data. For example, a gas analyzer tool is used to analyze gases. It examines the given data and tries to find patterns and relationships. An analyser can be a piece of hardware or software. Autoan ...
, based upon a bottom-up compiler precedence parsing technique called MSP (mixed strategy precedence). XPL was bootstrapped through Burroughs Algol onto the
IBM System/360 The IBM System/360 (S/360) is a family of mainframe computer systems that was announced by IBM on April 7, 1964, and delivered between 1965 and 1978. It was the first family of computers designed to cover both commercial and scientific applic ...
computer. (Some subsequent versions of XPL used on
University of Toronto The University of Toronto (UToronto or U of T) is a public research university in Toronto, Ontario, Canada, located on the grounds that surround Queen's Park. It was founded by royal charter in 1827 as King's College, the first institution ...
internal projects utilized an SLR(1) parser, but those implementations have never been distributed).


Yacc

Yacc Yacc (Yet Another Compiler-Compiler) is a computer program for the Unix operating system developed by Stephen C. Johnson. It is a Look Ahead Left-to-Right Rightmost Derivation (LALR) parser generator, generating a LALR parser (the part of a co ...
is a
parser generator In computer science, a compiler-compiler or compiler generator is a programming tool that creates a parser, interpreter, or compiler from some form of formal description of a programming language and machine. The most common type of compiler- ...
(loosely,
compiler-compiler In computer science, a compiler-compiler or compiler generator is a programming tool that creates a parser, interpreter, or compiler from some form of formal description of a programming language and machine. The most common type of compiler ...
), not to be confused with lex, which is a
lexical analyzer In computer science, lexical analysis, lexing or tokenization is the process of converting a sequence of characters (such as in a computer program or web page) into a sequence of ''lexical tokens'' ( strings with an assigned and thus identified ...
frequently used as a first stage by Yacc. Yacc was developed by Stephen C. Johnson at
AT&T AT&T Inc. is an American multinational telecommunications holding company headquartered at Whitacre Tower in Downtown Dallas, Texas. It is the world's largest telecommunications company by revenue and the third largest provider of mobile ...
for the
Unix Unix (; trademarked as UNIX) is a family of multitasking, multiuser computer operating systems that derive from the original AT&T Unix, whose development started in 1969 at the Bell Labs research center by Ken Thompson, Dennis Ritchie, ...
operating system. The name is an acronym for "
Yet Another Among programmers, yet another (often abbreviated ya, Ya, or YA in the initial part of an acronym) is an idiomatic qualifier in the name of a computer program, organisation, or event that is confessedly unoriginal. Stephen C. Johnson is credited ...
Compiler Compiler." It generates an LALR(1) compiler based on a grammar written in a notation similar to Backus–Naur form. Johnson worked on Yacc in the early 1970s at
Bell Labs Nokia Bell Labs, originally named Bell Telephone Laboratories (1925–1984), then AT&T Bell Laboratories (1984–1996) and Bell Labs Innovations (1996–2007), is an American industrial research and scientific development company owned by mul ...
. He was familiar with TMG and its influence can be seen in Yacc and the design of the C programming language. Because Yacc was the default compiler generator on most Unix systems, it was widely distributed and used. Derivatives such as
GNU Bison GNU Bison, commonly known as Bison, is a parser generator that is part of the GNU Project. Bison reads a specification in the BNF notation (a context-free language), warns about any parsing ambiguities, and generates a parser that reads sequen ...
are still in use. The compiler generated by Yacc requires a
lexical analyzer In computer science, lexical analysis, lexing or tokenization is the process of converting a sequence of characters (such as in a computer program or web page) into a sequence of ''lexical tokens'' ( strings with an assigned and thus identified ...
. Lexical analyzer generators, such as lex or
flex Flex or FLEX may refer to: Computing * Flex (language), developed by Alan Kay * FLEX (operating system), a single-tasking operating system for the Motorola 6800 * FlexOS, an operating system developed by Digital Research * FLEX (protocol), a com ...
are widely available. The
IEEE The Institute of Electrical and Electronics Engineers (IEEE) is a 501(c)(3) professional association for electronic engineering and electrical engineering (and associated disciplines) with its corporate office in New York City and its operati ...
POSIX The Portable Operating System Interface (POSIX) is a family of standards specified by the IEEE Computer Society for maintaining compatibility between operating systems. POSIX defines both the system- and user-level application programming in ...
P1003.2 standard defines the functionality and requirements for both Lex and Yacc.


Coco/R

Coco/R Coco/R is a compiler generator that takes wirth syntax notationIn the manual, however, it is referred as L-attributed Extended Backus–Naur Form syntax (EBNF). grammars of a source language and generates a scanner and a parser for that lang ...
is a
parser generator In computer science, a compiler-compiler or compiler generator is a programming tool that creates a parser, interpreter, or compiler from some form of formal description of a programming language and machine. The most common type of compiler- ...
that generates LL(1) parsers in Modula-2 (with plug-ins for other languages) from input grammars written in a variant of EBNF. It was developed by Hanspeter Mössenböck at the Swiss Federal Institute of Technology in Zurich (ETHZ) in 1985.


ANTLR

ANTLR In computer-based language recognition, ANTLR (pronounced ''antler''), or ANother Tool for Language Recognition, is a parser generator that uses LL(*) for parsing. ANTLR is the successor to the Purdue Compiler Construction Tool Set (PCCTS), firs ...
is a
parser generator In computer science, a compiler-compiler or compiler generator is a programming tool that creates a parser, interpreter, or compiler from some form of formal description of a programming language and machine. The most common type of compiler- ...
that generates LL(*) parsers in Java from input grammars written in a variant of EBNF. It was developed by Terence Parr at the University of San Francisco in the early 1990s as a successor of an earlier generator called PCCTS.


Metacompilers

Metacompilers differ from parser generators, taking as input a
program Program, programme, programmer, or programming may refer to: Business and management * Program management, the process of managing several related projects * Time management * Program, a part of planning Arts and entertainment Audio * Programm ...
written in a
metalanguage In logic and linguistics, a metalanguage is a language used to describe another language, often called the ''object language''. Expressions in a metalanguage are often distinguished from those in the object language by the use of italics, quota ...
. Their input consists grammar analyzing formula combined with embedded transform operations that construct abstract syntax trees, or simply output reformatted text strings that may be stack machine code. Many can be programmed in their own metalanguage enabling them to compile themselves, making them self-hosting extensible language compilers. Many metacompilers build on the work of Dewey Val Schorre. His
META II META II is a domain-specific programming language for writing compilers. It was created in 1963–1964 by Dewey Val Schorre at UCLA. META II uses what Schorre called syntax equations. Its operation is simply explained as: Each ''syntax equation' ...
compiler, first released in 1964, was the first documented metacompiler. Able to define its own language and others, META II accepted syntax formula having imbedded output (code production). It also translated to one of the earliest instances of a
virtual machine In computing, a virtual machine (VM) is the virtualization/ emulation of a computer system. Virtual machines are based on computer architectures and provide functionality of a physical computer. Their implementations may involve specialized h ...
. Lexical analysis was performed by built token recognizing functions: .ID, .STRING, and .NUMBER. Quoted strings in syntax formula recognize lexemes that are not kept. TREE-META, a second generation Schorre metacompiler, appeared around 1968. It extended the capabilities of META II, adding unparse rules separating code production from the grammar analysis. Tree transform operations in the syntax formula produce
abstract syntax tree In computer science, an abstract syntax tree (AST), or just syntax tree, is a tree representation of the abstract syntactic structure of text (often source code) written in a formal language. Each node of the tree denotes a construct occurr ...
s that the unparse rules operate on. The unparse tree pattern matching provided peephole optimization ability. CWIC, described in a 1970 ACM publication is a third generation Schorre metacompiler that added lexing rules and backtracking operators to the grammar analysis. LISP 2 was married with the unparse rules of TREEMETA in the CWIC generator language. With LISP 2 processing, CWIC can generate fully optimized code. CWIC also provided binary code generation into named code sections. Single and multipass compiles could be implemented using CWIC. CWIC compiled to 8-bit byte-addressable machine code instructions primarily designed to produce IBM System/360 code. Later generations are not publicly documented. One important feature would be the abstraction of the target processor instruction set, generating to a pseudo machine instruction set, macros, that could be separately defined or mapped to a real machine's instructions. Optimizations applying to sequential instructions could then be applied to the pseudo instruction before their expansion to target machine code.


Cross compilation

A
cross compiler A cross compiler is a compiler capable of creating executable code for a platform other than the one on which the compiler is running. For example, a compiler that runs on a PC but generates code that runs on an Android smartphone is a cross c ...
runs in one environment but produces
object code In computing, object code or object module is the product of a compiler In computing, a compiler is a computer program that translates computer code written in one programming language (the ''source'' language) into another language (the ...
for another. Cross compilers are used for embedded development, where the target computer has limited capabilities. An early example of cross compilation was AIMICO, where a FLOW-MATIC program on a UNIVAC II was used to generate assembly language for the
IBM 705 The IBM 700/7000 series is a series of large-scale (mainframe) computer systems that were made by IBM through the 1950s and early 1960s. The series includes several different, incompatible processor architectures. The 700s use vacuum-tube lo ...
, which was then assembled on the IBM computer. The
ALGOL 68C ALGOL 68C is an imperative computer programming language, a dialect of ALGOL 68, that was developed by Stephen R. Bourne and Michael Guy to program the Cambridge Algebra System (CAMAL). The initial compiler was written in the Princeton Syntax ...
compiler generated ''ZCODE'' output, that could then be either compiled into the local machine code by a ''ZCODE'' translator or run interpreted. ''ZCODE'' is a register-based intermediate language. This ability to interpret or compile ''ZCODE'' encouraged the porting of ALGOL 68C to numerous different computer platforms.


Optimizing compilers

Compiler optimization In computing, an optimizing compiler is a compiler that tries to minimize or maximize some attributes of an executable computer program. Common requirements are to minimize a program's execution time, memory footprint, storage size, and power con ...
is the process of improving the quality of
object code In computing, object code or object module is the product of a compiler In computing, a compiler is a computer program that translates computer code written in one programming language (the ''source'' language) into another language (the ...
without changing the results it produces. The developers of the first FORTRAN compiler aimed to generate code that was ''better'' than the average hand-coded assembler, so that customers would actually use their product. In one of the first real compilers, they often succeeded. Later compilers, like IBM's Fortran IV compiler, placed more priority on good diagnostics and executing more quickly, at the expense of
object code In computing, object code or object module is the product of a compiler In computing, a compiler is a computer program that translates computer code written in one programming language (the ''source'' language) into another language (the ...
optimization. It wasn't until the IBM System/360 series that IBM provided two separate compilers—a fast-executing code checker, and a slower, optimizing one.
Frances E. Allen Frances Elizabeth Allen (August 4, 1932August 4, 2020) was an American computer scientist and pioneer in the field of optimizing compilers. Allen was the first woman to become an IBM Fellow, and in 2006 became the first woman to win the Turing ...
, working alone and jointly with John Cocke, introduced many of the concepts for optimization. Allen's 1966 paper, ''Program Optimization,'' introduced the use of
graph data structures Graph may refer to: Mathematics *Graph (discrete mathematics), a structure made of vertices and edges **Graph theory, the study of such graphs and their properties *Graph (topology), a topological space resembling a graph in the sense of discre ...
to encode program content for optimization. Her 1970 papers, ''Control Flow Analysis'' and ''A Basis for Program Optimization'' established ''intervals'' as the context for efficient and effective data flow analysis and optimization. Her 1971 paper with Cocke, ''A Catalogue of Optimizing Transformations'', provided the first description and systematization of optimizing transformations. Her 1973 and 1974 papers on interprocedural
data flow analysis In the pursuit of knowledge, data (; ) is a collection of discrete values that convey information, describing quantity, quality, fact, statistics, other basic units of meaning, or simply sequences of symbols that may be further interpreted. ...
extended the analysis to whole programs. Her 1976 paper with Cocke describes one of the two main analysis strategies used in optimizing compilers today. Allen developed and implemented her methods as part of compilers for the
IBM 7030 Stretch The IBM 7030, also known as Stretch, was IBM's first transistorized supercomputer. It was the fastest computer in the world from 1961 until the first CDC 6600 became operational in 1964."Designed by Seymour Cray, the CDC 6600 was almost three t ...
-
Harvest Harvesting is the process of gathering a ripe crop from the fields. Reaping is the cutting of grain or pulse for harvest, typically using a scythe, sickle, or reaper. On smaller farms with minimal mechanization, harvesting is the most l ...
and the experimental Advanced Computing System. This work established the feasibility and structure of modern machine- and language-independent optimizers. She went on to establish and lead the PTRAN project on the automatic parallel execution of FORTRAN programs. Her PTRAN team developed new parallelism detection schemes and created the concept of the program dependence graph, the primary structuring method used by most parallelizing compilers. ''Programming Languages and their Compilers'' by John Cocke and
Jacob T. Schwartz __NOTOC__ Jacob Theodore "Jack" Schwartz (January 9, 1930 – March 2, 2009) was an American mathematician, computer scientist, and professor of computer science at the New York University Courant Institute of Mathematical Sciences. He was the ...
, published early in 1970, devoted more than 200 pages to optimization algorithms. It included many of the now familiar techniques such as redundant code elimination and
strength reduction In compiler construction, strength reduction is a compiler optimization where expensive operations are replaced with equivalent but less expensive operations. The classic example of strength reduction converts "strong" multiplications inside a loo ...
.


Peephole optimization

Peephole optimization is a simple but effective optimization technique. It was invented by William M. McKeeman and published in 1965 in CACM. It was used in the XPL compiler that McKeeman helped develop.


Capex COBOL optimizer

Capex Corporation developed the "COBOL Optimizer" in the mid-1970s for
COBOL COBOL (; an acronym for "common business-oriented language") is a compiled English-like computer programming language designed for business use. It is an imperative, procedural and, since 2002, object-oriented language. COBOL is primarily u ...
. This type of optimizer depended, in this case, upon knowledge of "weaknesses" in the standard IBM COBOL compiler, and actually replaced (or
patched Patched (Ptc) is a conserved 12-pass transmembrane protein receptor that plays an obligate negative regulatory role in the Hedgehog signaling pathway in insects and vertebrates. Patched is an essential gene in embryogenesis for proper segme ...
) sections of the
object code In computing, object code or object module is the product of a compiler In computing, a compiler is a computer program that translates computer code written in one programming language (the ''source'' language) into another language (the ...
with more efficient code. The replacement code might replace a linear table lookup with a
binary search In computer science, binary search, also known as half-interval search, logarithmic search, or binary chop, is a search algorithm that finds the position of a target value within a sorted array. Binary search compares the target value to the ...
for example or sometimes simply replace a relatively "slow" instruction with a known faster one that was otherwise functionally equivalent within its context. This technique is now known as "
Strength reduction In compiler construction, strength reduction is a compiler optimization where expensive operations are replaced with equivalent but less expensive operations. The classic example of strength reduction converts "strong" multiplications inside a loo ...
". For example, on the
IBM System/360 The IBM System/360 (S/360) is a family of mainframe computer systems that was announced by IBM on April 7, 1964, and delivered between 1965 and 1978. It was the first family of computers designed to cover both commercial and scientific applic ...
hardware the CLI instruction was, depending on the particular model, between twice and 5 times as fast as a CLC instruction for single byte comparisons. Modern compilers typically provide optimization options to allow programmers to choose whether or not to execute an optimization pass.


Diagnostics

When a compiler is given a syntactically incorrect program, a good, clear error message is helpful. From the perspective of the compiler writer, it is often difficult to achieve. The WATFIV Fortran compiler was developed at the
University of Waterloo The University of Waterloo (UWaterloo, UW, or Waterloo) is a public research university with a main campus in Waterloo, Ontario, Canada. The main campus is on of land adjacent to "Uptown" Waterloo and Waterloo Park. The university also operates ...
, Canada in the late 1960s. It was designed to give better error messages than IBM's Fortran compilers of the time. In addition, WATFIV was far more usable, because it combined compiling, linking and execution into one step, whereas IBM's compilers had three separate components to run.


PL/C

PL/C PL/C is an instructional dialect of the programming language PL/I, developed at the Department of Computer Science of Cornell University in the early 1970s in an effort headed by Professor Richard W. Conway and graduate student Thomas R. Wilcox. ...
was a computer programming language developed at Cornell University in the early 1970s. While PL/C was a subset of IBM's PL/I language, it was designed with the specific goal of being used for teaching programming. The two researchers and academic teachers who designed PL/C were
Richard W. Conway Richard Walter Conway (born December 12, 1931) is an American industrial engineer and computer scientist who is the Emerson Electric Company Professor of Manufacturing Management, Emeritus in the Johnson Graduate School of Management at Cornell ...
and Thomas R. Wilcox. They submitted the famous article "Design and implementation of a diagnostic compiler for PL/I" published in the Communications of ACM in March 1973.CACM March 1973 pp 169–179. PL/C eliminated some of the more complex features of PL/I, and added extensive debugging and error recovery facilities. The PL/C compiler had the unusual capability of never failing to compile any program, through the use of extensive automatic correction of many syntax errors and by converting any remaining syntax errors to output statements.


Just-in-time compilation

Just-in-time (JIT) compilation is the generation of executable code
on-the-fly On the fly is a phrase used to describe something that is being changed while the process that the change affects is ongoing. It is used in the automotive, computer, and culinary industries. In cars, on the fly can be used to describe the changing ...
or as close as possible to its actual execution, to take advantage of runtime metrics or other performance-enhancing options.


Intermediate representation

Most modern compilers have a lexer and parser that produce an intermediate representation of the program. The intermediate representation is a simple sequence of operations which can be used by an optimizer and a code generator which produces instructions in the
machine language In computer programming, machine code is any low-level programming language, consisting of machine language instructions, which are used to control a computer's central processing unit (CPU). Each instruction causes the CPU to perform a ver ...
of the target processor. Because the code generator uses an intermediate representation, the same code generator can be used for many different high-level languages. There are many possibilities for the intermediate representation.
Three-address code In computer science, three-address code (often abbreviated to TAC or 3AC) is an intermediate code used by optimizing compilers to aid in the implementation of code-improving transformations. Each TAC instruction has at most three operands and is ty ...
, also known as a ''quadruple'' or ''quad'' is a common form, where there is an operator, two operands, and a result. Two-address code or ''triples'' have a stack to which results are written, in contrast to the explicit variables of three-address code. Static Single Assignment (SSA) was developed by
Ron Cytron Ron is a shortening of the name Ronald. Ron or RON may also refer to: Arts and media * Big Ron (''EastEnders''), a TV character * Ron (''King of Fighters''), a video game character *Ron Douglas, the protagonist in '' Lucky Stiff'' played by Joe ...
,
Jeanne Ferrante Jeanne Ferrante is a computer scientist active in the field of compiler technology, where she has made important contributions regarding optimization and parallelization. Jeanne Ferrante is Professor of Computer Science and Engineering at Univ ...
, Barry K. Rosen,
Mark N. Wegman Mark N. Wegman is an American computer scientist known for his contributions to algorithms and compiler optimization. Wegman received his B.A. from New York University and his Ph.D. from the University of California, Berkeley. He joined IBM Resea ...
, and
F. Kenneth Zadeck F is the sixth letter of the Latin alphabet. F may also refer to: Science and technology Mathematics * F or f, the number 15 in hexadecimal and higher positional systems * ''p'F'q'', the hypergeometric function * F-distribution, a cont ...
, researchers at IBM in the 1980s. In SSA, a variable is given a value only once. A new variable is created rather than modifying an existing one. SSA simplifies optimization and code generation.


Code generation

A code generator generates
machine language In computer programming, machine code is any low-level programming language, consisting of machine language instructions, which are used to control a computer's central processing unit (CPU). Each instruction causes the CPU to perform a ver ...
instructions for the target processor.


Register allocation

Sethi–Ullman algorithm In computer science, the Sethi–Ullman algorithm is an algorithm named after Ravi Sethi and Jeffrey D. Ullman, its inventors, for translating abstract syntax trees into machine code that uses as few registers as possible. Overview When generati ...
or Sethi–Ullman numbering is a method to minimise the number of registers needed to hold variables.


Notable compilers

*
Amsterdam Compiler Kit The Amsterdam Compiler Kit (ACK) is a retargetable compiler suite and toolchain written by Andrew Tanenbaum and Ceriel Jacobs, since 2005 maintained by David Given. It has frontends for the following programming languages: C, Pascal, Modula- ...
by
Andrew Tanenbaum Andrew Stuart Tanenbaum (born March 16, 1944), sometimes referred to by the handle ast, is an American-Dutch computer scientist and professor emeritus of computer science at the Vrije Universiteit Amsterdam in the Netherlands. He is the author ...
and Ceriel Jacobs * Berkeley Pasca

written by
Ken Thompson Kenneth Lane Thompson (born February 4, 1943) is an American pioneer of computer science. Thompson worked at Bell Labs for most of his career where he designed and implemented the original Unix operating system. He also invented the B programmi ...
in 1975.
Bill Joy William Nelson Joy (born November 8, 1954) is an American computer engineer and venture capitalist. He co-founded Sun Microsystems in 1982 along with Scott McNealy, Vinod Khosla, and Andy Bechtolsheim, and served as Chief Scientist and CTO at ...
and others at University of California, Berkeley added improvements *
GNU Compiler Collection The GNU Compiler Collection (GCC) is an optimizing compiler produced by the GNU Project supporting various programming languages, hardware architectures and operating systems. The Free Software Foundation (FSF) distributes GCC as free softwar ...
, formerly the GNU C Compiler. Originally authored by
Richard Stallman Richard Matthew Stallman (; born March 16, 1953), also known by his initials, rms, is an American free software movement activist and programmer. He campaigns for software to be distributed in such a manner that its users have the freedom to ...
in 1987, GCC is a major modern compiler which is used to compile many
free software Free software or libre software is computer software distributed under terms that allow users to run the software for any purpose as well as to study, change, and distribute it and any adapted versions. Free software is a matter of liberty, n ...
projects, notably
Linux Linux ( or ) is a family of open-source Unix-like operating systems based on the Linux kernel, an operating system kernel first released on September 17, 1991, by Linus Torvalds. Linux is typically packaged as a Linux distribution, whi ...
. *
LLVM LLVM is a set of compiler and toolchain technologies that can be used to develop a front end for any programming language and a back end for any instruction set architecture. LLVM is designed around a language-independent intermediate repre ...
, formerly known as the ''Low Level Virtual Machine'' * Small-C by Ron Cain and James E Hendrix *
Turbo Pascal Turbo Pascal is a software development system that includes a compiler and an integrated development environment (IDE) for the Pascal programming language running on CP/M, CP/M-86, and DOS. It was originally developed by Anders Hejlsberg at ...
, created by
Anders Hejlsberg Anders Hejlsberg (, born 2 December 1960) is a Danish software engineer who co-designed several programming languages and development tools. He was the original author of Turbo Pascal and the chief architect of Delphi. He currently works for ...
, first released in 1983. *
WATFOR WATFIV, or WATerloo FORTRAN IV, developed at the University of Waterloo, Canada is an implementation of the Fortran computer programming language. It is the successor of WATFOR. WATFIV was used from the late 1960s into the mid-1980s. WATFIV w ...
, created at the
University of Waterloo The University of Waterloo (UWaterloo, UW, or Waterloo) is a public research university with a main campus in Waterloo, Ontario, Canada. The main campus is on of land adjacent to "Uptown" Waterloo and Waterloo Park. The university also operates ...
. One of the first popular educational compilers, although now largely obsolete.


See also

* History of programming languages * Lex (and
Flex lexical analyser Flex (fast lexical analyzer generator) is a free and open-source software alternative to lex. It is a computer program that generates lexical analyzers (also known as "scanners" or "lexers"). It is frequently used as the lex implementation toget ...
), the token parser commonly used in conjunction with yacc (and Bison). * BNF, a
metasyntax In logic and computer science, a metasyntax describes the allowable structure and composition of phrases and sentences of a metalanguage, which is used to describe either a natural language or a computer programming language.Sellink, Alex, and Chr ...
used to express
context-free grammar In formal language theory, a context-free grammar (CFG) is a formal grammar whose production rules are of the form :A\ \to\ \alpha with A a ''single'' nonterminal symbol, and \alpha a string of terminals and/or nonterminals (\alpha can be em ...
: that is, a formal way to describe formal languages. *
Self-interpreter In computer science, an interpreter is a computer program that directly execution (computers), executes instructions written in a Programming language, programming or scripting language, without requiring them previously to have been Compiler, co ...
, an interpreter written in a language it can interpret.


References


Further reading

* Backus, John, et al.
"The FORTRAN Automatic Coding System"
Proceedings of the Western Joint Computer Conference, Los Angeles, California, February 1957. Describes the design and implementation of the first FORTRAN compiler by the IBM team. * Knuth, D. E., ''RUNCIBLE-algebraic translation on a limited computer'', Communications of the ACM, Vol. 2, p. 18, (Nov. 1959). * Irons, Edgar T., ''A syntax directed compiler for ALGOL 60'', Communications of the ACM, Vol. 4, p. 51. (Jan. 1961) * * Conway, Melvin E., ''Design of a separable transition-diagram compiler'', Communications of the ACM, Volume 6, Issue 7 (July 1963) * Floyd, R. W., ''Syntactic analysis and operator precedence'', Journal of the ACM, Vol. 10, p. 316. (July 1963). *Cheatham, T. E., and Sattley, K., ''Syntax directed compilation'', SJCC p. 31. (1964). * Randell, Brian; Russell, Lawford John, ''ALGOL 60 Implementation: The Translation and Use of ALGOL 60 Programs on a Computer'', Academic Press, 1964 * * Cocke, John; Schwartz, Jacob T., ''Programming Languages and their Compilers: Preliminary Notes'',
Courant Institute of Mathematical Sciences The Courant Institute of Mathematical Sciences (commonly known as Courant or CIMS) is the mathematics research school of New York University (NYU), and is among the most prestigious mathematics schools and mathematical sciences research cente ...
technical report,
New York University New York University (NYU) is a private research university in New York City. Chartered in 1831 by the New York State Legislature, NYU was founded by a group of New Yorkers led by then- Secretary of the Treasury Albert Gallatin. In 1832, th ...
, 1969. * Bauer, Friedrich L.; Eickel, Jürgen (Eds.), ''Compiler Construction, An Advanced Course'', 2nd ed. Lecture Notes in Computer Science 21, Springer 1976, * Gries, David, ''Compiler Construction for Digital Computers'', New York : Wiley, 1971.


External links


Compiler Construction before 1980
– Annotated literature list by
Dick Grune Dick Grune is a Dutch computer scientist and university lecturer best known for inventing and developing the first version of the Concurrent Versions System (CVS). Grune was involved in the construction of Algol 68 compiler In computing, a co ...
* {{Parsers Compilers History of software History of computer science Parsing algorithms History of computing