HOME

TheInfoList



OR:

In
computer science Computer science is the study of computation, automation, and information. Computer science spans theoretical disciplines (such as algorithms, theory of computation, information theory, and automation) to practical disciplines (includin ...
, an interpreter is a
computer program A computer program is a sequence or set of instructions in a programming language for a computer to execute. Computer programs are one component of software, which also includes documentation and other intangible components. A computer progra ...
that directly executes instructions written in a programming or
scripting language A scripting language or script language is a programming language that is used to manipulate, customize, and automate the facilities of an existing system. Scripting languages are usually interpreted at runtime rather than compiled. A scripti ...
, without requiring them previously to have been
compiled In computing, a compiler is a computer program that Translator (computing), translates computer code written in one programming language (the ''source'' language) into another language (the ''target'' language). The name "compiler" is primarily ...
into a
machine language In computer programming, machine code is any low-level programming language, consisting of machine language instructions, which are used to control a computer's central processing unit (CPU). Each instruction causes the CPU to perform a ver ...
program. An interpreter generally uses one of the following strategies for program execution: #
Parse Parsing, syntax analysis, or syntactic analysis is the process of analyzing a string of symbols, either in natural language, computer languages or data structures, conforming to the rules of a formal grammar. The term ''parsing'' comes from Lati ...
the
source code In computing, source code, or simply code, is any collection of code, with or without comment (computer programming), comments, written using a human-readable programming language, usually as plain text. The source code of a Computer program, p ...
and perform its behavior directly; #
Translate Translation is the communication of the meaning of a source-language text by means of an equivalent target-language text. The English language draws a terminological distinction (which does not exist in every language) between ''transla ...
source code into some efficient
intermediate representation An intermediate representation (IR) is the data structure or code used internally by a compiler or virtual machine to represent source code. An IR is designed to be conducive to further processing, such as optimization and translation. A "good ...
or
object code In computing, object code or object module is the product of a compiler. In a general sense object code is a sequence of statements or instructions in a computer language, usually a machine code language (i.e., binary) or an intermediate lang ...
and immediately execute that; # Explicitly execute stored precompiled
bytecode Bytecode (also called portable code or p-code) is a form of instruction set designed for efficient execution by a software interpreter. Unlike human-readable source code, bytecodes are compact numeric codes, constants, and references (norma ...
made by a
compiler In computing, a compiler is a computer program that translates computer code written in one programming language (the ''source'' language) into another language (the ''target'' language). The name "compiler" is primarily used for programs that ...
and matched with the interpreter
Virtual Machine In computing, a virtual machine (VM) is the virtualization/ emulation of a computer system. Virtual machines are based on computer architectures and provide functionality of a physical computer. Their implementations may involve specialized har ...
. Early versions of
Lisp programming language Lisp (historically LISP) is a family of programming languages with a long history and a distinctive, fully parenthesized prefix notation. Originally specified in 1960, Lisp is the second-oldest high-level programming language still in common ...
and minicomputer and microcomputer BASIC dialects would be examples of the first type.
Perl Perl is a family of two High-level programming language, high-level, General-purpose programming language, general-purpose, Interpreter (computing), interpreted, dynamic programming languages. "Perl" refers to Perl 5, but from 2000 to 2019 it ...
, Raku, Python,
MATLAB MATLAB (an abbreviation of "MATrix LABoratory") is a proprietary multi-paradigm programming language and numeric computing environment developed by MathWorks. MATLAB allows matrix manipulations, plotting of functions and data, implementa ...
, and
Ruby A ruby is a pinkish red to blood-red colored gemstone, a variety of the mineral corundum (aluminium oxide). Ruby is one of the most popular traditional jewelry gems and is very durable. Other varieties of gem-quality corundum are called sapp ...
are examples of the second, while
UCSD Pascal UCSD Pascal is a Pascal programming language system that runs on the UCSD p-System, a portable, highly machine-independent operating system. UCSD Pascal was first released in 1977. It was developed at the University of California, San Diego (UC ...
is an example of the third type. Source programs are compiled ahead of time and stored as machine independent code, which is then linked at run-time and executed by an interpreter and/or compiler (for JIT systems). Some systems, such as
Smalltalk Smalltalk is an object-oriented, dynamically typed reflective programming language. It was designed and created in part for educational use, specifically for constructionist learning, at the Learning Research Group (LRG) of Xerox PARC by ...
and contemporary versions of
BASIC BASIC (Beginners' All-purpose Symbolic Instruction Code) is a family of general-purpose, high-level programming languages designed for ease of use. The original version was created by John G. Kemeny and Thomas E. Kurtz at Dartmouth College ...
and
Java Java (; id, Jawa, ; jv, ꦗꦮ; su, ) is one of the Greater Sunda Islands in Indonesia. It is bordered by the Indian Ocean to the south and the Java Sea to the north. With a population of 151.6 million people, Java is the world's mo ...
may also combine two and three. Interpreters of various types have also been constructed for many languages traditionally associated with compilation, such as
Algol ALGOL (; short for "Algorithmic Language") is a family of imperative computer programming languages originally developed in 1958. ALGOL heavily influenced many other languages and was the standard method for algorithm description used by th ...
, Fortran,
Cobol COBOL (; an acronym for "common business-oriented language") is a compiled English-like computer programming language designed for business use. It is an imperative, procedural and, since 2002, object-oriented language. COBOL is primarily ...
, C and C++. While interpretation and compilation are the two main means by which programming languages are implemented, they are not mutually exclusive, as most interpreting systems also perform some translation work, just like compilers. The terms "
interpreted language In computer science, an interpreter is a computer program that directly executes instructions written in a programming or scripting language, without requiring them previously to have been compiled into a machine language program. An interpre ...
" or "
compiled language A compiled language is a programming language whose implementations are typically compilers (translators that generate machine code from source code), and not interpreters (step-by-step executors of source code, where no pre-runtime translation t ...
" signify that the canonical implementation of that language is an interpreter or a compiler, respectively. A
high-level language In computer science, a high-level programming language is a programming language with strong abstraction from the details of the computer. In contrast to low-level programming languages, it may use natural language ''elements'', be easier to u ...
is ideally an
abstraction Abstraction in its main sense is a conceptual process wherein general rules and concepts are derived from the usage and classification of specific examples, literal ("real" or " concrete") signifiers, first principles, or other methods. "An a ...
independent of particular implementations.


History

Interpreters were used as early as 1952 to ease programming within the limitations of computers at the time (e.g. a shortage of program storage space, or no native support for floating point numbers). Interpreters were also used to translate between low-level machine languages, allowing code to be written for machines that were still under construction and tested on computers that already existed. The first interpreted high-level language was
Lisp A lisp is a speech impairment in which a person misarticulates sibilants (, , , , , , , ). These misarticulations often result in unclear speech. Types * A frontal lisp occurs when the tongue is placed anterior to the target. Interdental lispi ...
. Lisp was first implemented in 1958 by Steve Russell on an
IBM 704 The IBM 704 is a large digital mainframe computer introduced by IBM in 1954. It was the first mass-produced computer with hardware for floating-point arithmetic. The IBM 704 ''Manual of operation'' states: The type 704 Electronic Data-Proc ...
computer. Russell had read John McCarthy's paper, and realized (to McCarthy's surprise) that the Lisp ''eval'' function could be implemented in machine code. The result was a working Lisp interpreter which could be used to run Lisp programs, or more properly, "evaluate Lisp expressions".


General operation

An interpreter usually consists of a set of known
commands Command may refer to: Computing * Command (computing), a statement in a computer language * COMMAND.COM, the default operating system shell and command-line interpreter for DOS * Command key, a modifier key on Apple Macintosh computer keyboards * ...
it can
execute Execute, in capital punishment Capital punishment, also known as the death penalty, is the state-sanctioned practice of deliberately killing a person as a punishment for an actual or supposed crime, usually following an authorized, rule- ...
, and a list of these commands in the order a programmer wishes to execute them. Each command (also known as an Instruction) contains the data the programmer wants to mutate, and information on how to mutate the data. For example, an interpreter might read ADD Wikipedia_Users, 5 and ''interpret'' it as a request to add five to the Wikipedia_Users variable. Interpreters have a wide variety of instructions which are specialized to perform different tasks, but you will commonly find interpreter instructions for basic mathematical operations, branching, and
memory management Memory management is a form of resource management applied to computer memory. The essential requirement of memory management is to provide ways to dynamically allocate portions of memory to programs at their request, and free it for reuse when ...
, making most interpreters
Turing complete Alan Mathison Turing (; 23 June 1912 – 7 June 1954) was an English mathematician, computer scientist, logician, cryptanalyst, philosopher, and theoretical biologist. Turing was highly influential in the development of theoretical co ...
. Many interpreters are also closely integrated with a
garbage collector A waste collector, also known as a garbageman, garbage collector, trashman (in the US), binman or (rarely) dustman (in the UK), is a person employed by a public or private enterprise to collect and dispose of municipal solid waste (refuse) and ...
and
debugger A debugger or debugging tool is a computer program used to test and debug other programs (the "target" program). The main use of a debugger is to run the target program under controlled conditions that permit the programmer to track its execut ...
.


Compilers versus interpreters

Programs written in a
high-level language In computer science, a high-level programming language is a programming language with strong abstraction from the details of the computer. In contrast to low-level programming languages, it may use natural language ''elements'', be easier to u ...
are either directly executed by some kind of interpreter or converted into
machine code In computer programming, machine code is any low-level programming language, consisting of machine language instructions, which are used to control a computer's central processing unit (CPU). Each instruction causes the CPU to perform a ver ...
by a compiler (and
assembler Assembler may refer to: Arts and media * Nobukazu Takemura, avant-garde electronic musician, stage name Assembler * Assemblers, a fictional race in the ''Star Wars'' universe * Assemblers, an alternative name of the superhero group Champions of A ...
and
linker Linker or linkers may refer to: Computing * Linker (computing), a computer program that takes one or more object files generated by a compiler or generated by an assembler and links them with libraries, generating an executable program or shar ...
) for the
CPU A central processing unit (CPU), also called a central processor, main processor or just processor, is the electronic circuitry that executes instructions comprising a computer program. The CPU performs basic arithmetic, logic, controlling, an ...
to execute. While compilers (and assemblers) generally produce machine code directly executable by computer hardware, they can often (optionally) produce an intermediate form called
object code In computing, object code or object module is the product of a compiler. In a general sense object code is a sequence of statements or instructions in a computer language, usually a machine code language (i.e., binary) or an intermediate lang ...
. This is basically the same machine specific code but augmented with a
symbol table In computer science, a symbol table is a data structure used by a language translator such as a compiler or interpreter, where each identifier (or symbols), constants, procedures and functions in a program's source code is associated with inf ...
with names and tags to make executable blocks (or modules) identifiable and relocatable. Compiled programs will typically use building blocks (functions) kept in a library of such object code modules. A
linker Linker or linkers may refer to: Computing * Linker (computing), a computer program that takes one or more object files generated by a compiler or generated by an assembler and links them with libraries, generating an executable program or shar ...
is used to combine (pre-made) library files with the object file(s) of the application to form a single executable file. The object files that are used to generate an executable file are thus often produced at different times, and sometimes even by different languages (capable of generating the same object format). A simple interpreter written in a low-level language (e.g.
assembly Assembly may refer to: Organisations and meetings * Deliberative assembly, a gathering of members who use parliamentary procedure for making decisions * General assembly, an official meeting of the members of an organization or of their representa ...
) may have similar machine code blocks implementing functions of the high-level language stored, and executed when a function's entry in a look up table points to that code. However, an interpreter written in a high-level language typically uses another approach, such as generating and then walking a
parse tree A parse tree or parsing tree or derivation tree or concrete syntax tree is an ordered, rooted tree that represents the syntactic structure of a string according to some context-free grammar. The term ''parse tree'' itself is used primarily in comp ...
, or by generating and executing intermediate software-defined instructions, or both. Thus, both compilers and interpreters generally turn source code (text files) into tokens, both may (or may not) generate a parse tree, and both may generate immediate instructions (for a
stack machine In computer science, computer engineering and programming language implementations, a stack machine is a computer processor or a virtual machine in which the primary interaction is moving short-lived temporary values to and from a push down s ...
, quadruple code, or by other means). The basic difference is that a compiler system, including a (built in or separate) linker, generates a stand-alone ''machine code'' program, while an interpreter system instead ''performs'' the actions described by the high-level program. A compiler can thus make almost all the conversions from source code semantics to the machine level once and for all (i.e. until the program has to be changed) while an interpreter has to do ''some'' of this conversion work every time a statement or function is executed. However, in an efficient interpreter, much of the translation work (including analysis of types, and similar) is factored out and done only the first time a program, module, function, or even statement, is run, thus quite akin to how a compiler works. However, a compiled program still runs much faster, under most circumstances, in part because compilers are designed to optimize code, and may be given ample time for this. This is especially true for simpler high-level languages without (many) dynamic data structures, checks, or
type checking In computer programming, a type system is a logical system comprising a set of rules that assigns a property called a type to every "term" (a word, phrase, or other set of symbols). Usually the terms are various constructs of a computer program ...
. In traditional compilation, the executable output of the linkers (.exe files or .dll files or a library, see picture) is typically relocatable when run under a general operating system, much like the object code modules are but with the difference that this relocation is done dynamically at run time, i.e. when the program is loaded for execution. On the other hand, compiled and linked programs for small
embedded systems An embedded system is a computer system—a combination of a computer processor, computer memory, and input/output peripheral devices—that has a dedicated function within a larger mechanical or electronic system. It is ''embedded'' ...
are typically statically allocated, often hard coded in a NOR flash memory, as there is often no secondary storage and no operating system in this sense. Historically, most interpreter systems have had a self-contained editor built in. This is becoming more common also for compilers (then often called an IDE), although some programmers prefer to use an editor of their choice and run the compiler, linker and other tools manually. Historically, compilers predate interpreters because hardware at that time could not support both the interpreter and interpreted code and the typical batch environment of the time limited the advantages of interpretation.


Development cycle

During the software development cycle, programmers make frequent changes to source code. When using a compiler, each time a change is made to the source code, they must wait for the compiler to translate the altered source files and link all of the binary code files together before the program can be executed. The larger the program, the longer the wait. By contrast, a programmer using an interpreter does a lot less waiting, as the interpreter usually just needs to translate the code being worked on to an intermediate representation (or not translate it at all), thus requiring much less time before the changes can be tested. Effects are evident upon saving the source code and reloading the program. Compiled code is generally less readily debugged as editing, compiling, and linking are sequential processes that have to be conducted in the proper sequence with a proper set of commands. For this reason, many compilers also have an executive aid, known as a
Makefile In software development, Make is a build automation tool that automatically builds executable programs and libraries from source code by reading files called ''Makefiles'' which specify how to derive the target program. Though integrated deve ...
and program. The Makefile lists compiler and linker command lines and program source code files, but might take a simple command line menu input (e.g. "Make 3") which selects the third group (set) of instructions then issues the commands to the compiler, and linker feeding the specified source code files.


Distribution

A
compiler In computing, a compiler is a computer program that translates computer code written in one programming language (the ''source'' language) into another language (the ''target'' language). The name "compiler" is primarily used for programs that ...
converts source code into binary instruction for a specific processor's architecture, thus making it less
portable Portable may refer to: General * Portable building, a manufactured structure that is built off site and moved in upon completion of site and utility work * Portable classroom, a temporary building installed on the grounds of a school to provide a ...
. This conversion is made just once, on the developer's environment, and after that the same binary can be distributed to the user's machines where it can be executed without further translation. A
cross compiler A cross compiler is a compiler capable of creating executable code for a platform other than the one on which the compiler is running. For example, a compiler that runs on a PC but generates code that runs on an Android smartphone is a cross co ...
can generate binary code for the user machine even if it has a different processor than the machine where the code is compiled. An interpreted program can be distributed as source code. It needs to be translated in each final machine, which takes more time but makes the program distribution independent of the machine's architecture. However, the portability of interpreted source code is dependent on the target machine actually having a suitable interpreter. If the interpreter needs to be supplied along with the source, the overall installation process is more complex than delivery of a monolithic executable since the interpreter itself is part of what need to be installed. The fact that interpreted code can easily be read and copied by humans can be of concern from the point of view of
copyright A copyright is a type of intellectual property that gives its owner the exclusive right to copy, distribute, adapt, display, and perform a creative work, usually for a limited time. The creative work may be in a literary, artistic, education ...
. However, various systems of
encryption In cryptography, encryption is the process of encoding information. This process converts the original representation of the information, known as plaintext, into an alternative form known as ciphertext. Ideally, only authorized parties can dec ...
and
obfuscation Obfuscation is the obscuring of the intended meaning of communication by making the message difficult to understand, usually with confusing and ambiguous language. The obfuscation might be either unintentional or intentional (although intent u ...
exist. Delivery of intermediate code, such as bytecode, has a similar effect to obfuscation, but bytecode could be decoded with a
decompiler A decompiler is a computer program that translates an executable file to a high-level source file which can be recompiled successfully. It does therefore the opposite of a typical compiler, which translates a high-level language to a low-level l ...
or
disassembler A disassembler is a computer program that translates machine language into assembly language—the inverse operation to that of an assembler. A disassembler differs from a decompiler, which targets a high-level language rather than an assembl ...
.


Efficiency

The main disadvantage of interpreters is that an interpreted program typically runs slower than if it had been
compiled In computing, a compiler is a computer program that Translator (computing), translates computer code written in one programming language (the ''source'' language) into another language (the ''target'' language). The name "compiler" is primarily ...
. The difference in speeds could be tiny or great; often an order of magnitude and sometimes more. It generally takes longer to run a program under an interpreter than to run the compiled code but it can take less time to interpret it than the total time required to compile and run it. This is especially important when prototyping and testing code when an edit-interpret-debug cycle can often be much shorter than an edit-compile-run-debug cycle. Interpreting code is slower than running the compiled code because the interpreter must analyze each statement in the program each time it is executed and then perform the desired action, whereas the compiled code just performs the action within a fixed context determined by the compilation. This run-time analysis is known as "interpretive overhead". Access to variables is also slower in an interpreter because the mapping of identifiers to storage locations must be done repeatedly at run-time rather than at
compile time In computer science, compile time (or compile-time) describes the time window during which a computer program is compiled. The term is used as an adjective to describe concepts related to the context of program compilation, as opposed to concept ...
. There are various compromises between the development speed when using an interpreter and the execution speed when using a compiler. Some systems (such as some Lisps) allow interpreted and compiled code to call each other and to share variables. This means that once a routine has been tested and debugged under the interpreter it can be compiled and thus benefit from faster execution while other routines are being developed. Many interpreters do not execute the source code as it stands but convert it into some more compact internal form. Many
BASIC BASIC (Beginners' All-purpose Symbolic Instruction Code) is a family of general-purpose, high-level programming languages designed for ease of use. The original version was created by John G. Kemeny and Thomas E. Kurtz at Dartmouth College ...
interpreters replace keywords with single
byte The byte is a unit of digital information that most commonly consists of eight bits. Historically, the byte was the number of bits used to encode a single character of text in a computer and for this reason it is the smallest addressable unit ...
tokens which can be used to find the instruction in a jump table. A few interpreters, such as the
PBASIC PBASIC is a microcontroller-based version of BASIC created by Parallax, Inc. in 1992. PBASIC was created to bring ease of use to the microcontroller and embedded processor world. It is used for writing code for the BASIC Stamp microcontrollers. A ...
interpreter, achieve even higher levels of program compaction by using a bit-oriented rather than a byte-oriented program memory structure, where commands tokens occupy perhaps 5 bits, nominally "16-bit" constants are stored in a
variable-length code In coding theory a variable-length code is a code which maps source symbols to a ''variable'' number of bits. Variable-length codes can allow sources to be compressed and decompressed with ''zero'' error (lossless data compression) and still b ...
requiring 3, 6, 10, or 18 bits, and address operands include a "bit offset". Many BASIC interpreters can store and read back their own tokenized internal representation. An interpreter might well use the same lexical analyzer and
parser Parsing, syntax analysis, or syntactic analysis is the process of analyzing a string of symbols, either in natural language, computer languages or data structures, conforming to the rules of a formal grammar. The term ''parsing'' comes from Lati ...
as the compiler and then interpret the resulting
abstract syntax tree In computer science, an abstract syntax tree (AST), or just syntax tree, is a tree representation of the abstract syntactic structure of text (often source code) written in a formal language. Each node of the tree denotes a construct occurring ...
. Example data type definitions for the latter, and a toy interpreter for syntax trees obtained from C expressions are shown in the box.


Regression

Interpretation cannot be used as the sole method of execution: even though an interpreter can itself be interpreted and so on, a directly executed program is needed somewhere at the bottom of the stack because the code being interpreted is not, by definition, the same as the machine code that the CPU can execute.


Variations


Bytecode interpreters

There is a spectrum of possibilities between interpreting and compiling, depending on the amount of analysis performed before the program is executed. For example,
Emacs Lisp Emacs Lisp is a dialect of the Lisp programming language used as a scripting language by Emacs (a text editor family most commonly associated with GNU Emacs and XEmacs). It is used for implement