W3sDesign Interpreter Design Pattern UML

computer science Computer science is the study of computation, information, and automation. Computer science spans Theoretical computer science, theoretical disciplines (such as algorithms, theory of computation, and information theory) to Applied science, ...

, an interpreter is a

computer program A computer program is a sequence or set of instructions in a programming language for a computer to Execution (computing), execute. It is one component of software, which also includes software documentation, documentation and other intangibl ...

that directly executes instructions written in a programming or

scripting language In computing, a script is a relatively short and simple set of instructions that typically automation, automate an otherwise manual process. The act of writing a script is called scripting. A scripting language or script language is a programming ...

, without requiring them previously to have been compiled into a machine language program. An interpreter generally uses one of the following strategies for program execution: # Parse the

source code In computing, source code, or simply code or source, is a plain text computer program written in a programming language. A programmer writes the human readable source code to control the behavior of a computer. Since a computer, at base, only ...

and perform its behavior directly; # Translate source code into some efficient

intermediate representation An intermediate representation (IR) is the data structure or code used internally by a compiler or virtual machine to represent source code. An IR is designed to be conducive to further processing, such as optimization and translation. A "good" ...

object code In computing, object code or object module is the product of an assembler or compiler In computing, a compiler is a computer program that Translator (computing), translates computer code written in one programming language (the ''source'' ...

and immediately execute that; # Explicitly execute stored precompiled

bytecode Bytecode (also called portable code or p-code) is a form of instruction set designed for efficient execution by a software interpreter. Unlike human-readable source code, bytecodes are compact numeric codes, constants, and references (normal ...

made by a

compiler In computing, a compiler is a computer program that Translator (computing), translates computer code written in one programming language (the ''source'' language) into another language (the ''target'' language). The name "compiler" is primaril ...

and matched with the interpreter's

virtual machine In computing, a virtual machine (VM) is the virtualization or emulator, emulation of a computer system. Virtual machines are based on computer architectures and provide the functionality of a physical computer. Their implementations may involve ...

. Early versions of

Lisp programming language Lisp (historically LISP, an abbreviation of "list processing") is a family of programming languages with a long history and a distinctive, fully parenthesized Polish notation#Explanation, prefix notation. Originally specified in the late 1950s, ...

and minicomputer and microcomputer BASIC dialects would be examples of the first type.

Perl Perl is a high-level, general-purpose, interpreted, dynamic programming language. Though Perl is not officially an acronym, there are various backronyms in use, including "Practical Extraction and Reporting Language". Perl was developed ...

, Raku, Python,

MATLAB MATLAB (an abbreviation of "MATrix LABoratory") is a proprietary multi-paradigm programming language and numeric computing environment developed by MathWorks. MATLAB allows matrix manipulations, plotting of functions and data, implementat ...

, and

Ruby Ruby is a pinkish-red-to-blood-red-colored gemstone, a variety of the mineral corundum ( aluminium oxide). Ruby is one of the most popular traditional jewelry gems and is very durable. Other varieties of gem-quality corundum are called sapph ...

are examples of the second, while

UCSD Pascal UCSD Pascal is a Pascal programming language system that runs on the UCSD p-System, a portable, highly machine-independent operating system. UCSD Pascal was first released in 1977. It was developed at the University of California, San Diego (UC ...

is an example of the third type. Source programs are compiled ahead of time and stored as machine independent code, which is then linked at run-time and executed by an interpreter and/or compiler (for JIT systems). Some systems, such as

Smalltalk Smalltalk is a purely object oriented programming language (OOP) that was originally created in the 1970s for educational use, specifically for constructionist learning, but later found use in business. It was created at Xerox PARC by Learni ...

and contemporary versions of

BASIC Basic or BASIC may refer to: Science and technology * BASIC, a computer programming language * Basic (chemistry), having the properties of a base * Basic access authentication, in HTTP Entertainment * Basic (film), ''Basic'' (film), a 2003 film ...

and

Java Java is one of the Greater Sunda Islands in Indonesia. It is bordered by the Indian Ocean to the south and the Java Sea (a part of Pacific Ocean) to the north. With a population of 156.9 million people (including Madura) in mid 2024, proje ...

, may also combine two and three types. Interpreters of various types have also been constructed for many languages traditionally associated with compilation, such as

Algol ALGOL (; short for "Algorithmic Language") is a family of imperative computer programming languages originally developed in 1958. ALGOL heavily influenced many other languages and was the standard method for algorithm description used by the ...

, Fortran,

Cobol COBOL (; an acronym for "common business-oriented language") is a compiled English-like computer programming language designed for business use. It is an imperative, procedural, and, since 2002, object-oriented language. COBOL is primarily ...

, C and C++. While interpretation and compilation are the two main means by which programming languages are implemented, they are not mutually exclusive, as most interpreting systems also perform some translation work, just like compilers. The terms "

interpreted language In computer science, an interpreter is a computer program that directly executes instructions written in a programming or scripting language, without requiring them previously to have been compiled into a machine language program. An inter ...

" or "

compiled language Compiled language categorizes a programming language as used with a compiler and generally implies not used with an interpreter. But, since any language can theoretically be compiled or interpreted the term lacks clarity. In practice, for some lan ...

" signify that the canonical implementation of that language is an interpreter or a compiler, respectively. A high-level language is ideally an

abstraction Abstraction is a process where general rules and concepts are derived from the use and classifying of specific examples, literal (reality, real or Abstract and concrete, concrete) signifiers, first principles, or other methods. "An abstraction" ...

independent of particular implementations.

History

Interpreters were used as early as 1952 to ease programming within the limitations of computers at the time (e.g. a shortage of program storage space, or no native support for floating point numbers). Interpreters were also used to translate between low-level machine languages, allowing code to be written for machines that were still under construction and tested on computers that already existed. The first interpreted high-level language was

Lisp Lisp (historically LISP, an abbreviation of "list processing") is a family of programming languages with a long history and a distinctive, fully parenthesized Polish notation#Explanation, prefix notation. Originally specified in the late 1950s, ...

. Lisp was first implemented by Steve Russell on an

IBM 704 The IBM 704 is the model name of a large digital computer, digital mainframe computer introduced by IBM in 1954. Designed by John Backus and Gene Amdahl, it was the first mass-produced computer with hardware for floating-point arithmetic. The I ...

computer. Russell had read John McCarthy's paper, "Recursive Functions of Symbolic Expressions and Their Computation by Machine, Part I", and realized (to McCarthy's surprise) that the Lisp ''eval'' function could be implemented in machine code. The result was a working Lisp interpreter which could be used to run Lisp programs, or more properly, "evaluate Lisp expressions". The development of editing interpreters was influenced by the need for interactive computing. In the 1960s, the introduction of time-sharing systems allowed multiple users to access a computer simultaneously, and editing interpreters became essential for managing and modifying code in real-time. The first editing interpreters were likely developed for mainframe computers, where they were used to create and modify programs on the fly. One of the earliest examples of an editing interpreter is the EDT (Editor and Debugger for the TECO) system, which was developed in the late 1960s for the PDP-1 computer. EDT allowed users to edit and debug programs using a combination of commands and macros, paving the way for modern text editors and interactive development environments.

General operation

An interpreter usually consists of a set of known commands it can

execute Execution, in capital punishment Capital punishment, also known as the death penalty and formerly called judicial homicide, is the state-sanctioned killing of a person as punishment for actual or supposed misconduct. The sentence (law), s ...

, and a list of these commands in the order a programmer wishes to execute them. Each command (also known as an Instruction) contains the data the programmer wants to mutate, and information on how to mutate the data. For example, an interpreter might read ADD Books, 5 and ''interpret'' it as a request to add five to the Books variable. Interpreters have a wide variety of instructions which are specialized to perform different tasks, but you will commonly find interpreter instructions for basic mathematical operations, branching, and

memory management Memory management (also dynamic memory management, dynamic storage allocation, or dynamic memory allocation) is a form of Resource management (computing), resource management applied to computer memory. The essential requirement of memory manag ...

, making most interpreters

Turing complete Alan Mathison Turing (; 23 June 1912 – 7 June 1954) was an English mathematician, computer scientist, logician, cryptanalyst, philosopher and theoretical biologist. He was highly influential in the development of theoretical comput ...

. Many interpreters are also closely integrated with a garbage collector and

debugger A debugger is a computer program used to test and debug other programs (the "target" programs). Common features of debuggers include the ability to run or halt the target program using breakpoints, step through code line by line, and display ...

Compilers versus interpreters

Programs written in a

high-level language A high-level programming language is a programming language with strong abstraction from the details of the computer. In contrast to low-level programming languages, it may use natural language ''elements'', be easier to use, or may automate (or ...

are either directly executed by some kind of interpreter or converted into

machine code In computer programming, machine code is computer code consisting of machine language instructions, which are used to control a computer's central processing unit (CPU). For conventional binary computers, machine code is the binaryOn nonb ...

by a compiler (and assembler and

linker Linker or linkers may refer to: Computing * Linker (computing), a computer program that takes one or more object files generated by a compiler or generated by an assembler and links them with libraries, generating an executable program or shar ...

) for the CPU to execute. While compilers (and assemblers) generally produce machine code directly executable by computer hardware, they can often (optionally) produce an intermediate form called

. This is basically the same machine specific code but augmented with a symbol table with names and tags to make executable blocks (or modules) identifiable and relocatable. Compiled programs will typically use building blocks (functions) kept in a library of such object code modules. A

is used to combine (pre-made) library files with the object file(s) of the application to form a single executable file. The object files that are used to generate an executable file are thus often produced at different times, and sometimes even by different languages (capable of generating the same object format). A simple interpreter written in a low-level language (e.g. assembly) may have similar machine code blocks implementing functions of the high-level language stored, and executed when a function's entry in a look up table points to that code. However, an interpreter written in a high-level language typically uses another approach, such as generating and then walking a

parse tree A parse tree or parsing tree (also known as a derivation tree or concrete syntax tree) is an ordered, rooted tree that represents the syntactic structure of a string according to some context-free grammar. The term ''parse tree'' itself is use ...

, or by generating and executing intermediate software-defined instructions, or both. Thus, both compilers and interpreters generally turn source code (text files) into tokens, both may (or may not) generate a parse tree, and both may generate immediate instructions (for a

stack machine In computer science, computer engineering and programming language implementations, a stack machine is a computer processor or a Virtual machine#Process virtual machines, process virtual machine in which the primary interaction is moving short- ...

, quadruple code, or by other means). The basic difference is that a compiler system, including a (built in or separate) linker, generates a stand-alone ''machine code'' program, while an interpreter system instead ''performs'' the actions described by the high-level program. A compiler can thus make almost all the conversions from source code semantics to the machine level once and for all (i.e. until the program has to be changed) while an interpreter has to do ''some'' of this conversion work every time a statement or function is executed. However, in an efficient interpreter, much of the translation work (including analysis of types, and similar) is factored out and done only the first time a program, module, function, or even statement, is run, thus quite akin to how a compiler works. However, a compiled program still runs much faster, under most circumstances, in part because compilers are designed to optimize code, and may be given ample time for this. This is especially true for simpler high-level languages without (many) dynamic data structures, checks, or

type checking In computer programming, a type system is a logical system comprising a set of rules that assigns a property called a ''type'' (for example, integer, floating point, string) to every '' term'' (a word, phrase, or other set of symbols). Usu ...

. In traditional compilation, the executable output of the linkers (.exe files or .dll files or a library, see picture) is typically relocatable when run under a general operating system, much like the object code modules are but with the difference that this relocation is done dynamically at run time, i.e. when the program is loaded for execution. On the other hand, compiled and linked programs for small

embedded systems An embedded system is a specialized computer system—a combination of a computer processor, computer memory, and input/output peripheral devices—that has a dedicated function within a larger mechanical or electronic system. It is em ...

are typically statically allocated, often hard coded in a NOR flash memory, as there is often no secondary storage and no operating system in this sense. Historically, most interpreter systems have had a self-contained editor built in. This is becoming more common also for compilers (then often called an IDE), although some programmers prefer to use an editor of their choice and run the compiler, linker and other tools manually. Historically, compilers predate interpreters because hardware at that time could not support both the interpreter and interpreted code and the typical batch environment of the time limited the advantages of interpretation.

Development cycle

During the software development cycle, programmers make frequent changes to source code. When using a compiler, each time a change is made to the source code, they must wait for the compiler to translate the altered source files and link all of the binary code files together before the program can be executed. The larger the program, the longer the wait. By contrast, a programmer using an interpreter does a lot less waiting, as the interpreter usually just needs to translate the code being worked on to an intermediate representation (or not translate it at all), thus requiring much less time before the changes can be tested. Effects are evident upon saving the source code and reloading the program. Compiled code is generally less readily debugged as editing, compiling, and linking are sequential processes that have to be conducted in the proper sequence with a proper set of commands. For this reason, many compilers also have an executive aid, known as a

Makefile In software development, Make is a command-line interface software tool that performs actions ordered by configured Dependence analysis, dependencies as defined in a configuration file called a ''makefile''. It is commonly used for build automati ...

and program. The Makefile lists compiler and linker command lines and program source code files, but might take a simple command line menu input (e.g. "Make 3") which selects the third group (set) of instructions then issues the commands to the compiler, and linker feeding the specified source code files.

Distribution

converts source code into binary instruction for a specific processor's architecture, thus making it less

portable Portable may refer to: General * Portable building, a manufactured structure that is built off site and moved in upon completion of site and utility work * Portable classroom, a temporary building installed on the grounds of a school to provide a ...

. This conversion is made just once, on the developer's environment, and after that the same binary can be distributed to the user's machines where it can be executed without further translation. A cross compiler can generate binary code for the user machine even if it has a different processor than the machine where the code is compiled. An interpreted program can be distributed as source code. It needs to be translated in each final machine, which takes more time but makes the program distribution independent of the machine's architecture. However, the portability of interpreted source code is dependent on the target machine actually having a suitable interpreter. If the interpreter needs to be supplied along with the source, the overall installation process is more complex than delivery of a monolithic executable, since the interpreter itself is part of what needs to be installed. The fact that interpreted code can easily be read and copied by humans can be of concern from the point of view of

copyright A copyright is a type of intellectual property that gives its owner the exclusive legal right to copy, distribute, adapt, display, and perform a creative work, usually for a limited time. The creative work may be in a literary, artistic, ...

. However, various systems of

encryption In Cryptography law, cryptography, encryption (more specifically, Code, encoding) is the process of transforming information in a way that, ideally, only authorized parties can decode. This process converts the original representation of the inf ...

and

obfuscation Obfuscation is the obscuring of the intended meaning of communication by making the message difficult to understand, usually with confusing and ambiguous language. The obfuscation might be either unintentional or intentional (although intent ...

exist. Delivery of intermediate code, such as bytecode, has a similar effect to obfuscation, but bytecode could be decoded with a

decompiler A decompiler is a computer program that translates an executable file back into high-level source code. Unlike a compiler, which converts high-level code into machine code, a decompiler performs the reverse process. While disassemblers translate e ...

or disassembler.

Efficiency

The main disadvantage of interpreters is that an interpreted program typically runs more slowly than if it had been compiled. The difference in speeds could be tiny or great; often an order of magnitude and sometimes more. It generally takes longer to run a program under an interpreter than to run the compiled code but it can take less time to interpret it than the total time required to compile and run it. This is especially important when prototyping and testing code when an edit-interpret-debug cycle can often be much shorter than an edit-compile-run-debug cycle. Interpreting code is slower than running the compiled code because the interpreter must analyze each statement in the program each time it is executed and then perform the desired action, whereas the compiled code just performs the action within a fixed context determined by the compilation. This run-time analysis is known as "interpretive overhead". Access to variables is also slower in an interpreter because the mapping of identifiers to storage locations must be done repeatedly at run-time rather than at

compile time In computer science, compile time (or compile-time) describes the time window during which a language's statements are converted into binary instructions for the processor to execute. The term is used as an adjective to describe concepts relat ...

. There are various compromises between the development speed when using an interpreter and the execution speed when using a compiler. Some systems (such as some Lisps) allow interpreted and compiled code to call each other and to share variables. This means that once a routine has been tested and debugged under the interpreter it can be compiled and thus benefit from faster execution while other routines are being developed. Many interpreters do not execute the source code as it stands but convert it into some more compact internal form. Many

interpreters replace keywords with single

byte The byte is a unit of digital information that most commonly consists of eight bits. Historically, the byte was the number of bits used to encode a single character of text in a computer and for this reason it is the smallest addressable un ...

tokens which can be used to find the instruction in a jump table. A few interpreters, such as the PBASIC interpreter, achieve even higher levels of program compaction by using a bit-oriented rather than a byte-oriented program memory structure, where commands tokens occupy perhaps 5 bits, nominally "16-bit" constants are stored in a

variable-length code In coding theory, a variable-length code is a code which maps source symbols to a ''variable'' number of bits. The equivalent concept in computer science is '' bit string''. Variable-length codes can allow sources to be compressed and decompr ...

requiring 3, 6, 10, or 18 bits, and address operands include a "bit offset". Many BASIC interpreters can store and read back their own tokenized internal representation. An interpreter might well use the same lexical analyzer and

parser Parsing, syntax analysis, or syntactic analysis is a process of analyzing a string of symbols, either in natural language, computer languages or data structures, conforming to the rules of a formal grammar by breaking it into parts. The term '' ...

as the compiler and then interpret the resulting

abstract syntax tree An abstract syntax tree (AST) is a data structure used in computer science to represent the structure of a program or code snippet. It is a tree representation of the abstract syntactic structure of text (often source code) written in a formal ...

. Example data type definitions for the latter, and a toy interpreter for syntax trees obtained from C expressions are shown in the box.

Regression

Interpretation cannot be used as the sole method of execution: even though an interpreter can itself be interpreted and so on, a directly executed program is needed somewhere at the bottom of the stack because the code being interpreted is not, by definition, the same as the machine code that the CPU can execute.

Variations

Bytecode interpreters

There is a spectrum of possibilities between interpreting and compiling, depending on the amount of analysis performed before the program is executed. For example,

Emacs Lisp Emacs Lisp is a Lisp dialect made for Emacs. It is used for implementing most of the editing functionality built into Emacs, the remainder being written in C, as is the Lisp interpreter. Emacs Lisp code is used to modify, extend and customi ...

is compiled to

, which is a highly compressed and optimized representation of the Lisp source, but is not machine code (and therefore not tied to any particular hardware). This "compiled" code is then interpreted by a bytecode interpreter (itself written in C). The compiled code in this case is machine code for a

, which is implemented not in hardware, but in the bytecode interpreter. Such compiling interpreters are sometimes also called ''compreters''. In a bytecode interpreter each instruction starts with a byte, and therefore bytecode interpreters have up to 256 instructions, although not all may be used. Some bytecodes may take multiple bytes, and may be arbitrarily complicated. Control tables - that do not necessarily ever need to pass through a compiling phase - dictate appropriate algorithmic

control flow In computer science, control flow (or flow of control) is the order in which individual statements, instructions or function calls of an imperative program are executed or evaluated. The emphasis on explicit control flow distinguishes an '' ...

via customized interpreters in similar fashion to bytecode interpreters.

Threaded code interpreters

Threaded code interpreters are similar to bytecode interpreters but instead of bytes they use pointers. Each "instruction" is a word that points to a function or an instruction sequence, possibly followed by a parameter. The threaded code interpreter either loops fetching instructions and calling the functions they point to, or fetches the first instruction and jumps to it, and every instruction sequence ends with a fetch and jump to the next instruction. Unlike bytecode there is no effective limit on the number of different instructions other than available memory and address space. The classic example of threaded code is the Forth code used in Open Firmware systems: the source language is compiled into "F code" (a bytecode), which is then interpreted by a

Abstract syntax tree interpreters

In the spectrum between interpreting and compiling, another approach is to transform the source code into an optimized abstract syntax tree (AST), then execute the program following this tree structure, or use it to generate native code just-in-time. In this approach, each sentence needs to be parsed just once. As an advantage over bytecode, the AST keeps the global program structure and relations between statements (which is lost in a bytecode representation), and when compressed provides a more compact representation. Thus, using AST has been proposed as a better intermediate format for just-in-time compilers than bytecode. Also, it allows the system to perform better analysis during runtime. However, for interpreters, an AST causes more overhead than a bytecode interpreter, because of nodes related to syntax performing no useful work, of a less sequential representation (requiring traversal of more pointers) and of overhead visiting the tree.

Just-in-time compilation

Further blurring the distinction between interpreters, bytecode interpreters and compilation is just-in-time (JIT) compilation, a technique in which the intermediate representation is compiled to native

at runtime. This confers the efficiency of running native code, at the cost of startup time and increased memory use when the bytecode or AST is first compiled. The earliest published JIT compiler is generally attributed to work on

LISP Lisp (historically LISP, an abbreviation of "list processing") is a family of programming languages with a long history and a distinctive, fully parenthesized Polish notation#Explanation, prefix notation. Originally specified in the late 1950s, ...

by John McCarthy in 1960. Adaptive optimization is a complementary technique in which the interpreter profiles the running program and compiles its most frequently executed parts into native code. The latter technique is a few decades old, appearing in languages such as

in the 1980s. Just-in-time compilation has gained mainstream attention amongst language implementers in recent years, with

, the .NET Framework, most modern

JavaScript JavaScript (), often abbreviated as JS, is a programming language and core technology of the World Wide Web, alongside HTML and CSS. Ninety-nine percent of websites use JavaScript on the client side for webpage behavior. Web browsers have ...

implementations, and

Matlab MATLAB (an abbreviation of "MATrix LABoratory") is a proprietary multi-paradigm programming language and numeric computing environment developed by MathWorks. MATLAB allows matrix manipulations, plotting of functions and data, implementat ...

now including JIT compilers.

Template interpreter

Making the distinction between compilers and interpreters yet again even more vague is a special interpreter design known as a template interpreter. Rather than implement the execution of code by virtue of a large switch statement containing every possible bytecode, while operating on a software stack or a tree walk, a template interpreter maintains a large array of bytecode (or any efficient intermediate representation) mapped directly to corresponding native machine instructions that can be executed on the host hardware as key value pairs (or in more efficient designs, direct addresses to the native instructions), known as a "Template". When the particular code segment is executed the interpreter simply loads or jumps to the opcode mapping in the template and directly runs it on the hardware. Due to its design, the template interpreter very strongly resembles a just-in-time compiler rather than a traditional interpreter, however it is technically not a JIT due to the fact that it merely translates code from the language into native calls one opcode at a time rather than creating optimized sequences of CPU executable instructions from the entire code segment. Due to the interpreter's simple design of simply passing calls directly to the hardware rather than implementing them directly, it is much faster than every other type, even bytecode interpreters, and to an extent less prone to bugs, but as a tradeoff is more difficult to maintain due to the interpreter having to support translation to multiple different architectures instead of a platform independent virtual machine/stack. To date, the only template interpreter implementations of widely known languages to exist are the interpreter within Java's official reference implementation, the Sun HotSpot Java Virtual Machine, and the Ignition Interpreter in the Google V8 JavaScript execution engine.

Self-interpreter

A self-interpreter is a

programming language A programming language is a system of notation for writing computer programs. Programming languages are described in terms of their Syntax (programming languages), syntax (form) and semantics (computer science), semantics (meaning), usually def ...

interpreter written in a programming language which can interpret itself; an example is a

interpreter written in BASIC. Self-interpreters are related to self-hosting compilers. If no

exists for the language to be interpreted, creating a self-interpreter requires the implementation of the language in a host language (which may be another programming language or assembler). By having a first interpreter such as this, the system is bootstrapped and new versions of the interpreter can be developed in the language itself. It was in this way that

Donald Knuth Donald Ervin Knuth ( ; born January 10, 1938) is an American computer scientist and mathematician. He is a professor emeritus at Stanford University. He is the 1974 recipient of the ACM Turing Award, informally considered the Nobel Prize of comp ...

developed the TANGLE interpreter for the language

WEB Web most often refers to: * Spider web, a silken structure created by the animal * World Wide Web or the Web, an Internet-based hypertext system Web, WEB, or the Web may also refer to: Computing * WEB, a literate programming system created by ...

of the de-facto standard

TeX Tex, TeX, TEX, may refer to: People and fictional characters * Tex (nickname), a list of people and fictional characters with the nickname * Tex Earnhardt (1930–2020), U.S. businessman * Joe Tex (1933–1982), stage name of American soul singer ...

typesetting system Typesetting is the composition of Written language, text for publication, display, or distribution by means of arranging metal type, physical ''type'' (or ''sort'') in mechanical systems or ''glyphs'' in digital systems representing ''char ...

. Defining a computer language is usually done in relation to an abstract machine (so-called

operational semantics Operational semantics is a category of formal programming language semantics in which certain desired properties of a program, such as correctness, safety or security, are verified by constructing proofs from logical statements about its exec ...

) or as a mathematical function (

denotational semantics In computer science, denotational semantics (initially known as mathematical semantics or Scott–Strachey semantics) is an approach of formalizing the meanings of programming languages by constructing mathematical objects (called ''denotations'' ...

). A language may also be defined by an interpreter in which the semantics of the host language is given. The definition of a language by a self-interpreter is not well-founded (it cannot define a language), but a self-interpreter tells a reader about the expressiveness and elegance of a language. It also enables the interpreter to interpret its source code, the first step towards reflective interpreting. An important design dimension in the implementation of a self-interpreter is whether a feature of the interpreted language is implemented with the same feature in the interpreter's host language. An example is whether a closure in a

-like language is implemented using closures in the interpreter language or implemented "manually" with a data structure explicitly storing the environment. The more features implemented by the same feature in the host language, the less control the programmer of the interpreter has; for example, a different behavior for dealing with number overflows cannot be realized if the arithmetic operations are delegated to corresponding operations in the host language. Some languages such as

and

Prolog Prolog is a logic programming language that has its origins in artificial intelligence, automated theorem proving, and computational linguistics. Prolog has its roots in first-order logic, a formal logic. Unlike many other programming language ...

have elegant self-interpreters. Much research on self-interpreters (particularly reflective interpreters) has been conducted in the Scheme programming language, a dialect of Lisp. In general, however, any

Turing-complete In computability theory, a system of data-manipulation rules (such as a model of computation, a computer's instruction set, a programming language, or a cellular automaton) is said to be Turing-complete or computationally universal if it can be ...

language allows writing of its own interpreter. Lisp is such a language, because Lisp programs are lists of symbols and other lists. XSLT is such a language, because XSLT programs are written in XML. A sub-domain of

metaprogramming Metaprogramming is a computer programming technique in which computer programs have the ability to treat other programs as their data. It means that a program can be designed to read, generate, analyse, or transform other programs, and even modi ...

is the writing of

domain-specific language A domain-specific language (DSL) is a computer language specialized to a particular application domain. This is in contrast to a general-purpose language (GPL), which is broadly applicable across domains. There are a wide variety of DSLs, ranging ...

s (DSLs). Clive Gifford introduced a measure quality of self-interpreter (the eigenratio), the limit of the ratio between computer time spent running a stack of ''N'' self-interpreters and time spent to run a stack of self-interpreters as ''N'' goes to infinity. This value does not depend on the program being run. The book '' Structure and Interpretation of Computer Programs'' presents examples of meta-circular interpretation for Scheme and its dialects. Other examples of languages with a self-interpreter are Forth and Pascal.

Microcode

Microcode is a very commonly used technique "that imposes an interpreter between the hardware and the architectural level of a computer". As such, the microcode is a layer of hardware-level instructions that implement higher-level

instructions or internal

state machine A finite-state machine (FSM) or finite-state automaton (FSA, plural: ''automata''), finite automaton, or simply a state machine, is a mathematical model of computation. It is an abstract machine that can be in exactly one of a finite number o ...

sequencing in many

digital processing Digital data, in information theory and information systems, is information represented as a string of discrete symbols, each of which can take on one of only a finite number of values from some alphabet, such as letters or digits. An example ...

elements. Microcode is used in general-purpose

central processing unit A central processing unit (CPU), also called a central processor, main processor, or just processor, is the primary Processor (computing), processor in a given computer. Its electronic circuitry executes Instruction (computing), instructions ...

s, as well as in more specialized processors such as

microcontroller A microcontroller (MC, uC, or μC) or microcontroller unit (MCU) is a small computer on a single integrated circuit. A microcontroller contains one or more CPUs (processor cores) along with memory and programmable input/output peripherals. Pro ...

digital signal processor A digital signal processor (DSP) is a specialized microprocessor chip, with its architecture optimized for the operational needs of digital signal processing. DSPs are fabricated on metal–oxide–semiconductor (MOS) integrated circuit chips. ...

s, channel controllers,

disk controller A disk controller is a controller circuit that enables a CPU to communicate with a hard disk, floppy disk or other kind of disk drive. It also provides an interface between the disk drive and the bus connecting it to the rest of the system.{ ...

network interface controller A network interface controller (NIC, also known as a network interface card, network adapter, LAN adapter and physical network interface) is a computer hardware component that connects a computer to a computer network. Early network interface ...

s, network processors,

graphics processing unit A graphics processing unit (GPU) is a specialized electronic circuit designed for digital image processing and to accelerate computer graphics, being present either as a discrete video card or embedded on motherboards, mobile phones, personal ...

s, and in other hardware. Microcode typically resides in special high-speed memory and translates machine instructions,

data or other input into sequences of detailed circuit-level operations. It separates the machine instructions from the underlying

electronics Electronics is a scientific and engineering discipline that studies and applies the principles of physics to design, create, and operate devices that manipulate electrons and other Electric charge, electrically charged particles. It is a subfield ...

so that instructions can be designed and altered more freely. It also facilitates the building of complex multi-step instructions, while reducing the complexity of computer circuits. Writing microcode is often called microprogramming and the microcode in a particular processor implementation is sometimes called a microprogram. More extensive microcoding allows small and simple

microarchitecture In electronics, computer science and computer engineering, microarchitecture, also called computer organization and sometimes abbreviated as μarch or uarch, is the way a given instruction set architecture (ISA) is implemented in a particular ...

s to emulate more powerful architectures with wider

word length In computing, a word is any processor design's natural unit of data. A word is a fixed-sized datum handled as a unit by the instruction set or the hardware of the processor. The number of bits or digits in a word (the ''word size'', ''word wid ...

, more

execution unit In computer engineering, an execution unit (E-unit or EU) is a part of a processing unit that performs the operations and calculations forwarded from the instruction unit. It may have its own internal control sequence unit (not to be confused w ...

s and so on, which is a relatively simple way to achieve software compatibility between different products in a processor family.

Computer processor

Even a non microcoding computer processor itself can be considered to be a parsing immediate execution interpreter that is written in a general purpose hardware description language such as

VHDL VHDL (Very High Speed Integrated Circuit Program, VHSIC Hardware Description Language) is a hardware description language that can model the behavior and structure of Digital electronics, digital systems at multiple levels of abstraction, ran ...

to create a system that parses the machine code instructions and immediately executes them.

Applications

* Interpreters are frequently used to execute

command language A command language is a language for job control in computing. It is a domain-specific and interpreted language; common examples of a command language are shell or batch programming languages. These languages can be used directly at the ...

s, and glue languages since each operator executed in command language is usually an invocation of a complex routine such as an editor or compiler. *

Self-modifying code In computer science, self-modifying code (SMC or SMoC) is source code, code that alters its own instruction (computer science), instructions while it is execution (computing), executing – usually to reduce the instruction path length and imp ...

can easily be implemented in an interpreted language. This relates to the origins of interpretation in Lisp and

artificial intelligence Artificial intelligence (AI) is the capability of computer, computational systems to perform tasks typically associated with human intelligence, such as learning, reasoning, problem-solving, perception, and decision-making. It is a field of re ...

research. *

Virtualization In computing, virtualization (abbreviated v12n) is a series of technologies that allows dividing of physical computing resources into a series of virtual machines, operating systems, processes or containers. Virtualization began in the 1960s wit ...

. Machine code intended for a hardware architecture can be run using a

. This is often used when the intended architecture is unavailable, or among other uses, for running multiple copies. * Sandboxing: While some types of sandboxes rely on operating system protections, an interpreter or virtual machine is often used. The actual hardware architecture and the originally intended hardware architecture may or may not be the same. This may seem pointless, except that sandboxes are not compelled to actually execute all the instructions the source code it is processing. In particular, it can refuse to execute code that violates any

security Security is protection from, or resilience against, potential harm (or other unwanted coercion). Beneficiaries (technically referents) of security may be persons and social groups, objects and institutions, ecosystems, or any other entity or ...

constraints it is operating under. *

Emulator In computing, an emulator is Computer hardware, hardware or software that enables one computer system (called the ''host'') to behave like another computer system (called the ''guest''). An emulator typically enables the host system to run sof ...

s for running computer software written for obsolete and unavailable hardware on more modern equipment.

References

Sources

External links

IBM Card Interpreters
page at Columbia University
Theoretical Foundations For Practical 'Totally Functional Programming'
(Chapter 7 especially) Doctoral dissertation tackling the problem of formalising what is an interpreter
Short animation
explaining the key conceptual difference between interpreters and compilers. Archived a
ghostarchive.org
on May 9, 2022. {{Authority control * Programming language implementation