An intermediate representation (IR) is the data structure or code used internally by a compiler or virtual machine to represent source code. An IR is designed to be conducive to further processing, such as optimization and translation.
A "good" IR must be ''accurate'', capable of representing the source code without loss of information, and ''independent'' of any particular source or target language.
An IR may take one of several forms: an in-memory data structure, or a special tuple- or stack-based code readable by the program. In the latter case it is also called an ''intermediate language''.
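To make the two code forms concrete, here is a minimal Python sketch that encodes `(a + b) * c` both as tuple-based code (each tuple holds an operator, two operands and a destination) and as stack-based code, with a tiny evaluator for each. The opcode names (`load`, `add`, `mul`) and the tuple layout are hypothetical choices for illustration, not the format of any particular compiler.

```python
import operator

# Hypothetical operator table shared by both encodings.
OPS = {"add": operator.add, "mul": operator.mul}

# Tuple-based (three-address) form of (a + b) * c:
# each tuple is (op, arg1, arg2, dest); names starting with "t" are temporaries.
TUPLE_CODE = [
    ("add", "a", "b", "t1"),
    ("mul", "t1", "c", "t2"),
]

# Stack-based form of the same expression: operands are pushed,
# and each operator pops two values and pushes its result.
STACK_CODE = [
    ("load", "a"),
    ("load", "b"),
    ("add",),
    ("load", "c"),
    ("mul",),
]


def run_tuples(code, env):
    """Evaluate tuple code; temporaries are stored alongside the variables."""
    regs = dict(env)
    for op, x, y, dest in code:
        regs[dest] = OPS[op](regs[x], regs[y])
    return regs[code[-1][3]]  # the last destination holds the result


def run_stack(code, env):
    """Evaluate stack code on an explicit operand stack."""
    stack = []
    for instr in code:
        if instr[0] == "load":
            stack.append(env[instr[1]])
        else:
            right = stack.pop()
            left = stack.pop()
            stack.append(OPS[instr[0]](left, right))
    return stack.pop()


env = {"a": 2, "b": 3, "c": 4}
print(run_tuples(TUPLE_CODE, env), run_stack(STACK_CODE, env))  # 20 20
```

The tuple form names every intermediate value, which makes results easy to refer to in later analyses; the stack form is more compact because its operands are implicit.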
Most modern compilers, and many interpreters, work through an IR. For example, the CPython interpreter transforms the linear, human-readable text of a program into an intermediate graph structure that allows flow analysis and re-arrangement before execution. Using an intermediate representation in this way allows compiler systems such as the GNU Compiler Collection and LLVM to serve many different source languages and to generate code for many different target architectures.
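CPython exposes two stages of its pipeline through the standard library: `ast.parse` returns the abstract syntax tree of a program, and the `dis` module lists the bytecode compiled from it. The sketch below inspects both for a small function (the function itself is a made-up example). It does not expose CPython's internal flow graph directly, and the exact node layout and opcode names differ between Python versions (for example, `a + b` compiles to `BINARY_ADD` before Python 3.11 and to `BINARY_OP` from 3.11 on), so the printed output is illustrative rather than fixed.

```python
import ast
import dis

SOURCE = """
def scale(x, k):
    if k > 0:
        return x * k
    return 0
"""

# Stage 1: parse the linear source text into a tree-shaped representation.
tree = ast.parse(SOURCE)
print(ast.dump(tree.body[0].body[0].test))  # the `k > 0` comparison node

# Stage 2: compile the tree and execute it to obtain the function object.
namespace = {}
exec(compile(tree, "<ir-demo>", "exec"), namespace)
scale = namespace["scale"]

# Stage 3: list the bytecode; the conditional jump encodes the branch
# structure that the compiler derived from the `if` statement.
for instr in dis.get_instructions(scale):
    print(instr.offset, instr.opname, instr.argrepr)
```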
Intermediate language
An intermediate language is the language of an abstract machine designed to aid in the analysis of computer programs. The term comes from its use in compilers, where the source code of a program is translated into a form more suitable for code-improving transformations before being used to generate object or machine code for a target machine. The design of an intermediate language typically differs from that of a practical machine language in three fundamental ways:
* Each instruction represents exactly one fundamental operation; ''e.g.'' "shift-add" addressing modes common in microprocessors are not present.
* Control flow information may not be included in the instruction set.
* The number of processor registers available may be large, even limitless.
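The first and third properties can be illustrated with a minimal Python sketch that lowers a shift-add addressing mode, which many processors fold into a single load instruction, into IR instructions that each perform exactly one operation. Fresh virtual registers come from an unbounded counter, mirroring the limitless register supply. The `Instr` class, the `VirtualRegisters` helper and the opcode names are hypothetical, chosen only for this illustration.

```python
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class Instr:
    """One IR instruction: a single operation, a destination and its operands."""
    op: str
    dest: str
    args: tuple


class VirtualRegisters:
    """Hands out an unbounded supply of fresh virtual register names."""

    def __init__(self):
        self._ids = count(1)

    def fresh(self):
        return f"v{next(self._ids)}"


def lower_scaled_load(base, index, shift, regs):
    """Lower a load from address [base + (index << shift)] into three
    single-operation IR instructions."""
    scaled = regs.fresh()
    addr = regs.fresh()
    value = regs.fresh()
    return [
        Instr("shl", scaled, (index, shift)),  # scaled = index << shift
        Instr("add", addr, (base, scaled)),    # addr = base + scaled
        Instr("load", value, (addr,)),         # value = memory[addr]
    ]


regs = VirtualRegisters()
for instr in lower_scaled_load("base", "idx", 2, regs):
    print(instr.dest, "=", instr.op, *instr.args)
# v1 = shl idx 2
# v2 = add base v1
# v3 = load v2
```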
A popular format for intermediate languages is three-address code.
The term is also used for languages that serve as intermediates for some high-level programming languages which do not output object or machine code themselves, but only the intermediate language. That intermediate language is then submitted to its own compiler, which outputs the finished object or machine code. This is usually done to ease optimization, or to increase portability by using an intermediate language, such as C, that has compilers for many processors and operating systems. Languages used for this purpose fall in complexity between high-level languages and low-level languages such as assembly languages.
Languages
Although C was not explicitly designed as an intermediate language, its nature as an abstraction of assembly and its ubiquity as the de facto system language in Unix-like and other operating systems have made it a popular one. Eiffel, Sather, Esterel, some dialects of Lisp (Lush, Gambit), Haskell (Glasgow Haskell Compiler), Squeak's Smalltalk-subset Slang, Nim, Cython, Seed7, SystemTap, Vala, V, and others make use of C as an intermediate language. Variants of C have been designed to provide C's features as a portable assembly language, including C-- and the C Intermediate Language.
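A compiler that targets C only has to print C source text, leaving register allocation and instruction selection to the C compiler. The minimal sketch below translates hypothetical three-address code, given as `(dest, op, left, right)` tuples, into a C function; the tuple layout and the `emit_c` helper are assumptions for illustration, and the resulting text would then be passed to any C compiler.

```python
# Hypothetical three-address code for: int f(int a, int b) { return (a + b) * b; }
TAC = [
    ("t1", "+", "a", "b"),
    ("t2", "*", "t1", "b"),
]


def emit_c(name, params, code, result):
    """Translate three-address tuples into the text of a C function.
    Every temporary becomes a local variable of type int."""
    args = ", ".join("int " + p for p in params)
    lines = [f"int {name}({args}) {{"]
    for dest, op, left, right in code:
        lines.append(f"    int {dest} = {left} {op} {right};")
    lines.append(f"    return {result};")
    lines.append("}")
    return "\n".join(lines)


print(emit_c("f", ["a", "b"], TAC, "t2"))
# int f(int a, int b) {
#     int t1 = a + b;
#     int t2 = t1 * b;
#     return t2;
# }
```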
Any language targeting a virtual machine or p-code machine can be considered an intermediate language:
* Java bytecode
* Microsoft's Common Intermediate Language is an intermediate language designed to be shared by all compilers for the .NET Framework, before static or dynamic compilation to machine code.
* While most intermediate languages are designed to support statically typed languages, the Parrot intermediate representation is designed to support dynamically typed languages, initially Perl and Python.
* TIMI is used by compilers on the IBM i platform.
* O-code for BCPL
* MATLAB precompiled code
* Microsoft P-Code
* Pascal p-code
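Java bytecode, CIL and Pascal p-code are all executed by stack machines. As a rough illustration of how such code runs, here is a minimal Python interpreter for a hypothetical p-code-like instruction set with conditional and unconditional jumps; the opcodes (`push`, `load`, `store`, `add`, `sub`, `mul`, `jz`, `jmp`, `halt`) are invented for this sketch and do not match any real instruction set. The sample program computes a factorial with a loop.

```python
import operator

# Hypothetical arithmetic opcodes mapped to Python functions.
ARITH = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}


def run(program, variables):
    """Execute (opcode, operand) pairs on an operand stack.
    `pc` is the index of the next instruction; jumps simply reassign it."""
    stack, pc = [], 0
    while True:
        op, arg = program[pc]
        pc += 1
        if op == "push":
            stack.append(arg)
        elif op == "load":
            stack.append(variables[arg])
        elif op == "store":
            variables[arg] = stack.pop()
        elif op in ARITH:
            right = stack.pop()
            left = stack.pop()
            stack.append(ARITH[op](left, right))
        elif op == "jz":  # pop a value and jump if it is zero
            if stack.pop() == 0:
                pc = arg
        elif op == "jmp":
            pc = arg
        elif op == "halt":
            return variables
        else:
            raise ValueError(f"unknown opcode {op!r}")


# acc = 1; while n != 0: acc = acc * n; n = n - 1
FACTORIAL = [
    ("push", 1),      # 0
    ("store", "acc"), # 1
    ("load", "n"),    # 2: loop test
    ("jz", 13),       # 3: leave the loop when n == 0
    ("load", "acc"),  # 4
    ("load", "n"),    # 5
    ("mul", None),    # 6
    ("store", "acc"), # 7
    ("load", "n"),    # 8
    ("push", 1),      # 9
    ("sub", None),    # 10
    ("store", "n"),   # 11
    ("jmp", 2),       # 12: back to the loop test
    ("halt", None),   # 13
]

print(run(FACTORIAL, {"n": 5})["acc"])  # 120
```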
The GNU Compiler Collection (GCC) uses several intermediate languages internally to simplify portability and cross-compilation. Among these languages are
* the long-standing Register Transfer Language (RTL), still used in the back end
* the tree language GENERIC
* the SSA-based GIMPLE (lower-level than GENERIC; the input for most optimizers; has a compact "bytecode" notation)
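In static single assignment (SSA) form, such as GIMPLE's, every variable has exactly one definition. For straight-line code with no branches, where no φ-functions are needed, conversion to SSA amounts to giving each assignment a fresh version; the minimal sketch below does this for hypothetical `(dest, op, left, right)` tuples. It is a simplification: real SSA construction, including GCC's, must also place φ-functions where control-flow paths merge.

```python
def to_ssa(code):
    """Rename straight-line three-address code into SSA form.
    Each assignment to x creates a new version x_1, x_2, ...; every use
    refers to the latest version. Names never assigned (function inputs)
    and constants are left unchanged."""
    version = {}  # variable -> number of definitions seen so far

    def use(name):
        if isinstance(name, str) and name in version:
            return f"{name}_{version[name]}"
        return name

    ssa = []
    for dest, op, left, right in code:
        new_left, new_right = use(left), use(right)  # read before redefining
        version[dest] = version.get(dest, 0) + 1
        ssa.append((f"{dest}_{version[dest]}", op, new_left, new_right))
    return ssa


# x = a + 1; x = x * 2; y = x - a
CODE = [("x", "+", "a", 1), ("x", "*", "x", 2), ("y", "-", "x", "a")]
for dest, op, left, right in to_ssa(CODE):
    print(dest, "=", left, op, right)
# x_1 = a + 1
# x_2 = x_1 * 2
# y_1 = x_2 - a
```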
GCC also supports generating the following IRs as a final target:
* HSA Intermediate Layer (HSAIL)
* LLVM Intermediate Representation (converted from GIMPLE in the now-defunct llvm-gcc, which used LLVM's optimizers and code generator)
The LLVM compiler framework is based on the LLVM IR intermediate language, whose compact, binary serialized representation is also referred to as "bitcode" and has been productized by Apple.
Like GIMPLE bytecode, LLVM bitcode is useful in link-time optimization. Like GCC, LLVM also targets some IRs meant for direct distribution, including Google's PNaCl IR and SPIR. A further development within LLVM is the use of ''Multi-Level Intermediate Representation'' (MLIR), which has the potential to generate code for different heterogeneous targets and to combine the outputs of different compilers.
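LLVM IR also has a human-readable textual form. As a hedged illustration, this minimal Python sketch prints textual IR for a small function built from hypothetical three-address tuples; the `emit_llvm` helper and the tuple layout are assumptions, while the `define`, `add`, `mul` and `ret` syntax follows LLVM's textual IR. Text like this could then be handed to LLVM tools such as `llc` to produce assembly.

```python
# Map hypothetical three-address operators to LLVM integer instructions.
LLVM_OPS = {"+": "add", "-": "sub", "*": "mul"}


def emit_llvm(name, params, code, result):
    """Print an LLVM IR function operating on i32 values.
    Parameters and temporaries become %-prefixed SSA values, so each
    destination in `code` must be assigned only once."""
    def val(x):
        return str(x) if isinstance(x, int) else f"%{x}"

    args = ", ".join(f"i32 %{p}" for p in params)
    lines = [f"define i32 @{name}({args}) {{", "entry:"]
    for dest, op, left, right in code:
        lines.append(f"  %{dest} = {LLVM_OPS[op]} i32 {val(left)}, {val(right)}")
    lines.append(f"  ret i32 {val(result)}")
    lines.append("}")
    return "\n".join(lines)


print(emit_llvm("f", ["a", "b"], [("t1", "+", "a", "b"), ("t2", "*", "t1", 2)], "t2"))
# define i32 @f(i32 %a, i32 %b) {
# entry:
#   %t1 = add i32 %a, %b
#   %t2 = mul i32 %t1, 2
#   ret i32 %t2
# }
```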
The ILOC intermediate language is used in classes on compiler design as a simple target language (see "CISC 471 Compiler Design" by Uli Kremer).
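ILOC writes operations in a register-to-register style in which `=>` separates the source operands from the target, as in `add r1, r2 => r3`. The minimal Python sketch below interprets a tiny ILOC-style subset (`loadI` plus `add`, `sub` and `mult`) to show how simple such a target is to simulate; it covers only these four opcodes, and the `run_iloc` helper is hypothetical.

```python
import re

# Matches lines such as "loadI 5 => r1" or "add r1, r2 => r3".
LINE = re.compile(r"^\s*(\w+)\s+(.*?)\s*=>\s*(r\d+)\s*$")

ARITH = {"add": lambda a, b: a + b,
         "sub": lambda a, b: a - b,
         "mult": lambda a, b: a * b}


def run_iloc(source):
    """Interpret a tiny ILOC-style subset: loadI plus three-operand
    arithmetic on an unbounded set of registers r0, r1, ..."""
    regs = {}
    for raw in source.strip().splitlines():
        m = LINE.match(raw)
        if m is None:
            raise SyntaxError(f"cannot parse: {raw!r}")
        opcode, operands, target = m.groups()
        srcs = [s.strip() for s in operands.split(",")]
        if opcode == "loadI":
            regs[target] = int(srcs[0])
        elif opcode in ARITH:
            regs[target] = ARITH[opcode](regs[srcs[0]], regs[srcs[1]])
        else:
            raise ValueError(f"unsupported opcode {opcode!r}")
    return regs


PROGRAM = """
loadI 6 => r1
loadI 7 => r2
mult r1, r2 => r3
loadI 2 => r4
sub r3, r4 => r5
"""
print(run_iloc(PROGRAM)["r5"])  # 40
```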
Other
Static analysis tools often use an intermediate representation. For instance, radare2 is a toolbox for binary file analysis and reverse engineering; it uses the intermediate languages ESIL and REIL to analyze binary files.
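One benefit of lifting machine code into an IR is that an analysis can be written once against the IR rather than once per instruction set. A minimal Python sketch of one such analysis, forward constant propagation over straight-line code, is below; the `(dest, op, left, right)` tuple format and the sample "lifted" code are hypothetical and do not reflect the actual syntax of ESIL or REIL.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def propagate_constants(code, known=None):
    """Forward pass over straight-line (dest, op, left, right) tuples.
    Returns the registers whose values are known constants after the
    code runs; a register computed from any unknown input is dropped."""
    env = dict(known or {})

    def value(x):
        if isinstance(x, int):
            return x
        return env.get(x)  # None means "not a known constant"

    for dest, op, left, right in code:
        a, b = value(left), value(right)
        if a is not None and b is not None:
            env[dest] = OPS[op](a, b)
        else:
            env.pop(dest, None)  # overwritten with an unknown value
    return env


# Hypothetical lifted code: eax = 4; ebx = eax * 8; ecx = ebx + edx (edx unknown)
LIFTED = [("eax", "+", 0, 4), ("ebx", "*", "eax", 8), ("ecx", "+", "ebx", "edx")]
print(propagate_constants(LIFTED))  # {'eax': 4, 'ebx': 32}
```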
See also
* Interlingual machine translation
* Pivot language
* Abstract syntax tree
* Bytecode (intermediate code)
* Symbol table
* Source-to-source compiler
* Graph rewriting and term rewriting
* UNCOL
External links
The Stanford SUIF Group