Self-hosting (compilers)
   HOME

TheInfoList



OR:

In
computer programming Computer programming or coding is the composition of sequences of instructions, called computer program, programs, that computers can follow to perform tasks. It involves designing and implementing algorithms, step-by-step specifications of proc ...
, self-hosting is the use of a program as part of the
toolchain A toolchain is a set of software development tools used to build and otherwise develop software. Often, the tools are executed sequentially and form a pipeline such that the output of one tool is the input for the next. Sometimes the term is us ...
or
operating system An operating system (OS) is system software that manages computer hardware and software resources, and provides common daemon (computing), services for computer programs. Time-sharing operating systems scheduler (computing), schedule tasks for ...
that produces new versions of that same program—for example, a
compiler In computing, a compiler is a computer program that Translator (computing), translates computer code written in one programming language (the ''source'' language) into another language (the ''target'' language). The name "compiler" is primaril ...
that can compile its own
source code In computing, source code, or simply code or source, is a plain text computer program written in a programming language. A programmer writes the human readable source code to control the behavior of a computer. Since a computer, at base, only ...
. Self-hosting
software Software consists of computer programs that instruct the Execution (computing), execution of a computer. Software also includes design documents and specifications. The history of software is closely tied to the development of digital comput ...
is commonplace on
personal computer A personal computer, commonly referred to as PC or computer, is a computer designed for individual use. It is typically used for tasks such as Word processor, word processing, web browser, internet browsing, email, multimedia playback, and PC ...
s and larger systems. Other programs that are typically self-hosting include kernels,
assemblers Assembler may refer to: Arts and media * Nobukazu Takemura, avant-garde electronic musician, stage name Assembler * Assemblers, a fictional race in the ''Star Wars'' universe * Assemblers, an alternative name of the superhero group Champions of ...
,
command-line interpreter A command-line interface (CLI) is a means of interacting with software via commands each formatted as a line of text. Command-line interfaces emerged in the mid-1960s, on computer terminals, as an interactive and more user-friendly alternativ ...
s and revision control software.


Operating systems

An operating system is self-hosted when the toolchain to build the operating system runs on that same operating system. For example, Windows can be built on a computer running Windows. Before a system can become self-hosted, another system is needed to develop it until it reaches a stage where self-hosting is possible. When developing for a new computer or operating system, a system to run the development software is needed, but development software used to write and build the operating system is also necessary. This is called a
bootstrapping In general, bootstrapping usually refers to a self-starting process that is supposed to continue or grow without external input. Many analytical techniques are often called bootstrap methods in reference to their self-starting or self-supporting ...
problem or, more generically, a chicken or the egg dilemma. A solution to this problem is the cross compiler (or cross assembler when working with assembly language). A cross compiler allows
source code In computing, source code, or simply code or source, is a plain text computer program written in a programming language. A programmer writes the human readable source code to control the behavior of a computer. Since a computer, at base, only ...
on one platform to be compiled for a different machine or operating system, making it possible to create an operating system for a machine for which a self-hosting compiler does not yet exist. Once written, software can be deployed to the target system using means such as an EPROM,
floppy disk A floppy disk or floppy diskette (casually referred to as a floppy, a diskette, or a disk) is a type of disk storage composed of a thin and flexible disk of a magnetic storage medium in a square or nearly square plastic enclosure lined with a ...
ette,
flash memory Flash memory is an Integrated circuit, electronic Non-volatile memory, non-volatile computer memory storage medium that can be electrically erased and reprogrammed. The two main types of flash memory, NOR flash and NAND flash, are named for t ...
(such as a USB thumb drive), or JTAG device. This is similar to the method used to write software for gaming consoles or for handheld devices like cellular phones or tablets, which do not host their own development tools. Once the system is mature enough to compile its own code, the cross-development dependency ends. At this point, an operating system is said to be self-hosted.


Compilers

Software development using compiler or interpreters can also be self hosted when the compiler is capable of compiling itself. Since self-hosted compilers suffer from the same bootstrap problems as operating systems, a compiler for a new programming language needs to be written in an existing language. So the developer may use something like assembly language, C/ C++, or even a scripting language like Python or Lua to build the first version of the compiler. Once the language is mature enough, development of the compiler can shift to the compiler's native language, allowing the compiler to build itself.


History

The first self-hosting compiler (excluding assemblers) was written for
Lisp Lisp (historically LISP, an abbreviation of "list processing") is a family of programming languages with a long history and a distinctive, fully parenthesized Polish notation#Explanation, prefix notation. Originally specified in the late 1950s, ...
by Hart and Levin at MIT in 1962. They wrote a Lisp compiler in Lisp, testing it inside an existing Lisp Interpreter. Once they had improved the compiler to the point where it could compile its own source code, it was self-hosting. This technique is usually only practicable when an interpreter already exists for the very same language that is to be compiled; though possible, it is extremely uncommon to humanly compile a compiler with itself. The concept borrows directly from and is an example of the broader notion of running a program on itself as input, used also in various proofs in
theoretical computer science Theoretical computer science is a subfield of computer science and mathematics that focuses on the Abstraction, abstract and mathematical foundations of computation. It is difficult to circumscribe the theoretical areas precisely. The Associati ...
, such as the proof that the
halting problem In computability theory (computer science), computability theory, the halting problem is the problem of determining, from a description of an arbitrary computer program and an input, whether the program will finish running, or continue to run for ...
is undecidable.


Examples

Ken Thompson Kenneth Lane Thompson (born February 4, 1943) is an American pioneer of computer science. Thompson worked at Bell Labs for most of his career where he designed and implemented the original Unix operating system. He also invented the B (programmi ...
started development on
Unix Unix (, ; trademarked as UNIX) is a family of multitasking, multi-user computer operating systems that derive from the original AT&T Unix, whose development started in 1969 at the Bell Labs research center by Ken Thompson, Dennis Ritchie, a ...
in 1968 by writing and compiling programs on the
GE-635 The GE-600 series is a family of 36-bit mainframe computers originating in the 1960s, built by General Electric (GE). When GE left the mainframe business, the line was sold to Honeywell, which built similar systems into the 1990s as the divisio ...
and carrying them over to the
PDP-7 The PDP-7 is an 18-bit computing, 18-bit minicomputer produced by Digital Equipment Corporation as part of the Programmed Data Processor, PDP series. Introduced in 1964, shipped since 1965, it was the first to use their Flip-Chip module, Flip- ...
for testing. After the initial Unix kernel, a command interpreter, an editor, an assembler, and a few utilities were completed, the Unix operating system was self-hosting â€“ programs could be written and tested on the PDP-7 itself.Dennis M. Ritchie
"The Development of the C Language"
1993.
Douglas McIlroy Malcolm Douglas McIlroy (born 1932) is an American mathematician, engineer, and programmer. As of 2019 he is an Adjunct Professor of Computer Science at Dartmouth College. McIlroy is best known for having originally proposed Unix pipelines and de ...
wrote TMG (a
compiler-compiler In computer science, a compiler-compiler or compiler generator is a programming tool that creates a Parsing#Computer_languages, parser, interpreter (computer software), interpreter, or compiler from some form of formal description of a programm ...
) in TMG on a piece of paper and "decided to give his piece of paper to his piece of paper", doing the computation himself, thus compiling a TMG compiler into assembly, which he typed up and assembled on Ken Thompson's PDP-7. Development of the GNU system relies largely on GCC (the GNU Compiler Collection) and GNU
Emacs Emacs (), originally named EMACS (an acronym for "Editor Macros"), is a family of text editors that are characterized by their extensibility. The manual for the most widely used variant, GNU Emacs, describes it as "the extensible, customizable, s ...
(a popular editor), making possible the self contained, maintained and sustained development of
free software Free software, libre software, libreware sometimes known as freedom-respecting software is computer software distributed open-source license, under terms that allow users to run the software for any purpose as well as to study, change, distribut ...
for the
GNU Project The GNU Project ( ) is a free software, mass collaboration project announced by Richard Stallman on September 27, 1983. Its goal is to give computer users freedom and control in their use of their computers and Computer hardware, computing dev ...
. Many
programming language A programming language is a system of notation for writing computer programs. Programming languages are described in terms of their Syntax (programming languages), syntax (form) and semantics (computer science), semantics (meaning), usually def ...
s have self-hosted implementations: compilers that are both in and for the same language. An approach is
bootstrapping In general, bootstrapping usually refers to a self-starting process that is supposed to continue or grow without external input. Many analytical techniques are often called bootstrap methods in reference to their self-starting or self-supporting ...
, where a core version of the language is initially implemented using another high-level language, assembler, or even
machine language In computer programming, machine code is computer code consisting of machine language instructions, which are used to control a computer's central processing unit (CPU). For conventional binary computers, machine code is the binaryOn nonb ...
; the resulting compiler is then used to start building successive expanded versions of itself.


List of languages having self-hosting compilers

The following programming languages have self-hosting compilers:


See also

*
Cross-compiler A cross compiler is a compiler capable of creating executable code for a platform other than the one on which the compiler is running. For example, a compiler that runs on a PC but generates code that runs on Android devices is a cross compil ...
* Dogfooding * Futamura projection * Self-interpreter *
Self-reference Self-reference is a concept that involves referring to oneself or one's own attributes, characteristics, or actions. It can occur in language, logic, mathematics, philosophy, and other fields. In natural or formal languages, self-reference ...
* Indirect self-modification


References

{{Reflist Computer programming Self-hosting software