Common Intermediate Language (CIL), formerly called Microsoft Intermediate Language (MSIL) or Intermediate Language (IL), is the intermediate language binary instruction set defined within the Common Language Infrastructure (CLI) specification. CIL instructions are executed by a CLI-compatible runtime environment such as the Common Language Runtime. Languages which target the CLI compile to CIL. CIL is object-oriented, stack-based bytecode. Runtimes typically just-in-time compile CIL instructions into native code.
CIL was originally known as Microsoft Intermediate Language (MSIL) during the beta releases of the .NET languages. Due to the standardization of C# and the CLI, the bytecode is now officially known as CIL. Windows Defender virus definitions continue to refer to binaries compiled with it as MSIL.
General information
During compilation of CLI programming languages, the source code is translated into CIL code rather than into platform- or processor-specific object code. CIL is a CPU- and platform-independent instruction set that can be executed in any environment supporting the Common Language Infrastructure, such as the .NET runtime on Windows, or the cross-platform Mono runtime. In theory, this eliminates the need to distribute different executable files for different platforms and CPU types. CIL code is verified for safety at run time, providing better security and reliability than natively compiled executable files.
The execution process looks like this:
#Source code is converted to CIL bytecode and a CLI assembly is created.
#Upon execution of a CIL assembly, its code is passed through the runtime's JIT compiler to generate native code. Ahead-of-time compilation may also be used, which eliminates this step, but at the cost of executable-file portability.
#The computer's processor executes the native code.
Instructions
CIL bytecode has instructions for the following groups of tasks:
*Load and store
*Arithmetic
*Type conversion
*Object creation and manipulation
*Operand stack management (push / pop)
*Control transfer (branching)
*Method invocation and return
*Throwing exceptions
*Monitor-based concurrency
*Data and function pointer manipulation needed for C++/CLI and unsafe C# code
Computational model
The Common Intermediate Language is object-oriented and stack-based, which means that instruction parameters and results are kept on a single stack instead of in several registers or other memory locations, as in most programming languages.
Code that adds two numbers in x86 assembly language, where eax and edx specify two different general-purpose registers:
add eax, edx
The equivalent code in an intermediate language (IL), where local variable 0 stands for eax and local variable 1 for edx:
ldloc.0 // push local variable 0 onto stack
ldloc.1 // push local variable 1 onto stack
add // pop and add the top two stack items then push the result onto the stack
stloc.0 // pop and store the top stack item to local variable 0
In the latter example, the values of the two local variables, which stand in for the registers eax and edx, are first pushed onto the stack. When the add instruction executes, the operands are "popped", or retrieved, and the result is "pushed", or stored, on the stack. The resulting value is then popped from the stack and stored in local variable 0, the counterpart of eax.
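The push and pop behaviour described above can be traced with a small model. The following Python sketch is only an illustration, not how a CLI runtime is implemented: the function name `run` and the choice of a Python list as the evaluation stack are assumptions, and only the three instruction kinds used in the example are supported.

```python
def run(instructions, local_vars):
    """Execute a tiny subset of stack instructions against a list of locals."""
    stack = []
    for op in instructions:
        if op.startswith("ldloc."):
            # Push the value of the numbered local variable.
            stack.append(local_vars[int(op.split(".")[1])])
        elif op.startswith("stloc."):
            # Pop the top of the stack into the numbered local variable.
            local_vars[int(op.split(".")[1])] = stack.pop()
        elif op == "add":
            # Pop two operands, push their sum.
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        else:
            raise ValueError(f"unsupported instruction: {op}")
    return local_vars, stack


local_vars, stack = run(["ldloc.0", "ldloc.1", "add", "stloc.0"], [2, 3])
print(local_vars)  # [5, 3]
print(stack)       # []
```

After the four instructions the stack is empty again and local variable 0 holds the sum, which mirrors `add eax, edx` leaving its result in eax.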
Object-oriented concepts
CIL is designed to be object-oriented. You may create objects, call methods, and use other types of members, such as fields.
Every method needs (with some exceptions) to reside in a class. So does this static method:
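.class public Foo {
    .method public static int32 Add(int32, int32) cil managed {
        .maxstack 2
        ldarg.0 // load the first argument
        ldarg.1 // load the second argument
        add     // add them
        ret     // return the result
    }
}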
Because the method Add is declared as static, it does not require an instance of Foo, and it may be used like this in C#:
int r = Foo.Add(2, 3); // 5
In CIL it would look like this:
ldc.i4.2
ldc.i4.3
call int32 Foo::Add(int32, int32)
stloc.0
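The calling sequence can be modelled in the same spirit. In the Python sketch below, which is an illustration under stated assumptions rather than a description of any real runtime, the `execute` function, the tuple encoding of instructions, and the method table are all hypothetical. The body given to `Foo::Add` (load both arguments, add, return) is likewise assumed. The sketch only handles callees that return a value.

```python
def execute(body, methods, args=()):
    """Run a method body; return (value left on the stack, locals)."""
    stack, local_vars = [], {}
    for op, *operand in body:
        if op == "ldc.i4":
            stack.append(operand[0])            # push an integer constant
        elif op == "ldarg":
            stack.append(args[operand[0]])      # push an incoming argument
        elif op == "add":
            right, left = stack.pop(), stack.pop()
            stack.append(left + right)
        elif op == "call":
            param_count, callee = methods[operand[0]]
            # Arguments were pushed left to right, so the last one is on top.
            call_args = [stack.pop() for _ in range(param_count)][::-1]
            result, _ = execute(callee, methods, call_args)
            stack.append(result)                # the return value replaces the arguments
        elif op == "stloc":
            local_vars[operand[0]] = stack.pop()
        elif op == "ret":
            break
        else:
            raise ValueError(f"unsupported instruction: {op}")
    return (stack.pop() if stack else None), local_vars


# Assumed body for Foo::Add: two parameters, returns their sum.
methods = {"Foo::Add": (2, [("ldarg", 0), ("ldarg", 1), ("add",), ("ret",)])}

caller = [("ldc.i4", 2), ("ldc.i4", 3), ("call", "Foo::Add"), ("stloc", 0)]
_, caller_locals = execute(caller, methods)
print(caller_locals)  # {0: 5}
```

The point of the model is that `call` consumes its arguments from the caller's stack and leaves exactly one value, the result, in their place.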
Instance classes
An instance class contains at least one constructor and some instance members. The following class has a set of methods representing actions of a Car object.
.class public Car
Creating objects
In C# class instances are created like this:
Car myCar = new Car(1, 4);
Car yourCar = new Car(1, 3);
And those statements are roughly the same as these instructions in CIL:
ldc.i4.1
ldc.i4.4
newobj instance void Car::.ctor(int32, int32)
stloc.0 // myCar = new Car(1, 4);
ldc.i4.1
ldc.i4.3
newobj instance void Car::.ctor(int32, int32)
stloc.1 // yourCar = new Car(1, 3);
Invoking instance methods
Instance methods are invoked in C# as follows:
myCar.Move(3);
As invoked in CIL:
ldloc.0 // Load the object "myCar" on the stack
ldc.i4.3
call instance void Car::Move(int32)
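The stack traffic for object creation and an instance call can be traced by hand. The Python sketch below is a loose model only: the `Car` stand-in, its field names, and the behaviour of `move` are assumptions made for illustration, since only the constructor and `Move` signatures appear in the CIL above.

```python
class Car:
    # Hypothetical stand-in; field names and Move's behaviour are assumptions.
    def __init__(self, a, b):
        self.a, self.b = a, b
        self.position = 0

    def move(self, steps):
        self.position += steps


stack, local_vars = [], {}

# ldc.i4.1 / ldc.i4.4 / newobj Car::.ctor(int32, int32) / stloc.0
stack.append(1)
stack.append(4)
b, a = stack.pop(), stack.pop()   # newobj pops the constructor arguments...
stack.append(Car(a, b))           # ...and pushes the new object reference
local_vars[0] = stack.pop()       # stloc.0 stores it as myCar

# ldloc.0 / ldc.i4.3 / call instance void Car::Move(int32)
stack.append(local_vars[0])       # the object reference is pushed first
stack.append(3)                   # then the argument
steps = stack.pop()
this = stack.pop()
this.move(steps)                  # a void method pushes nothing back

print(local_vars[0].position)     # 3
```

The difference from a static call is the extra object reference, which sits below the arguments on the stack and is consumed together with them.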
Metadata
The Common Language Infrastructure (CLI) records information about compiled classes as metadata. Like the type library in the Component Object Model, this enables applications to support and discover the interfaces, classes, types, methods, and fields in the assembly. The process of reading such metadata is called "reflection".
Metadata can also take the form of "attributes". Custom attributes are defined by extending the Attribute class. This lets the creator of a class adorn it with extra information that consumers of the class can use in various ways, depending on the application domain.
Example
Below is a basic Hello, World program written in CIL assembler. It will display the string "Hello, world!".
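.assembly Hello {}
.assembly extern mscorlib {}
.method static void Main()
{
    .entrypoint
    .maxstack 1
    ldstr "Hello, world!"
    call void [mscorlib]System.Console::WriteLine(string)
    ret
}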
The following code is more complex in number of opcodes.
''This code can also be compared with the corresponding code in the article about Java bytecode.''
static void Main(string[] args)
In CIL assembler syntax it looks like this:
.method private hidebysig static void Main(string[] args) cil managed
This is just a representation of how CIL looks near the virtual machine (VM) level. When compiled, the methods are stored in tables and the instructions are stored as bytes inside the assembly, which is a Portable Executable (PE).
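To make "stored as bytes" concrete, the short Python sketch below maps a few mnemonics to their single-byte opcode values as listed in the ECMA-335 instruction set. It is a minimal illustration: the helper names `assemble` and `disassemble` are hypothetical, only operand-less one-byte instructions are covered, and the method headers and metadata tables of a real assembly are ignored.

```python
# Single-byte opcode values from the ECMA-335 instruction set.
OPCODES = {
    "ldloc.0": 0x06,
    "ldloc.1": 0x07,
    "stloc.0": 0x0A,
    "add": 0x58,
    "ret": 0x2A,
}


def assemble(mnemonics):
    """Turn a list of operand-less mnemonics into raw instruction bytes."""
    return bytes(OPCODES[m] for m in mnemonics)


def disassemble(code):
    """Map raw instruction bytes back to mnemonics."""
    names = {value: name for name, value in OPCODES.items()}
    return [names[b] for b in code]


code = assemble(["ldloc.0", "ldloc.1", "add", "stloc.0", "ret"])
print(code.hex(" "))      # 06 07 58 0a 2a
print(disassemble(code))  # ['ldloc.0', 'ldloc.1', 'add', 'stloc.0', 'ret']
```

Because the mapping between mnemonics and bytes is this direct, textual CIL and its binary form can be converted into each other without loss for such instructions.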
Generation
A CIL assembly and instructions are generated by either a compiler or a utility called the ''IL Assembler'' (ILAsm) that is shipped with the execution environment.
Assembled CIL can also be disassembled into code again using the ''IL Disassembler'' (ILDASM). There are other tools such as .NET Reflector that can decompile CIL into a high-level language (e.g. C# or Visual Basic). This makes CIL a very easy target for reverse engineering. This trait is shared with Java bytecode. However, there are tools that can obfuscate the code so that it is no longer easily readable but still runs.
Execution
Just-in-time compilation
Just-in-time compilation (JIT) involves turning the bytecode into code immediately executable by the CPU. The conversion is performed gradually during the program's execution. JIT compilation provides environment-specific optimization, runtime type safety, and assembly verification. To accomplish this, the JIT compiler examines the assembly metadata for any illegal accesses and handles violations appropriately.
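The "gradual" aspect, translating code only when it is first needed and reusing the result afterwards, can be sketched as a cache. The Python model below is an analogy only: the `LazyCompiler` class is hypothetical, Python lambda source text stands in for CIL method bodies, and `eval` stands in for native code generation. It performs no verification or optimization.

```python
class LazyCompiler:
    """Translate each method on its first call and cache the result."""

    def __init__(self, method_sources):
        self._sources = method_sources   # method name -> source text (stand-in for CIL)
        self._compiled = {}              # method name -> callable (stand-in for native code)
        self.compile_count = 0

    def call(self, name, *args):
        if name not in self._compiled:
            # First call: translate once and remember the result.
            self.compile_count += 1
            self._compiled[name] = eval(self._sources[name])
        return self._compiled[name](*args)


jit = LazyCompiler({
    "Add": "lambda a, b: a + b",
    "Unused": "lambda: 0",   # never called, so never translated
})

first = jit.call("Add", 2, 3)
second = jit.call("Add", 10, 20)
print(first, second, jit.compile_count)  # 5 30 1
```

Methods that are never called incur no translation cost, while methods called repeatedly pay it once.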
Ahead-of-time compilation
CLI-compatible execution environments also come with the option to do an ahead-of-time compilation (AOT) of an assembly to make it execute faster by removing the JIT process at runtime.
In the .NET Framework there is a special tool called the Native Image Generator (NGEN) that performs the AOT. A different approach for AOT is CoreRT, which allows the compilation of .NET Core code to a single executable with no dependency on a runtime. In Mono there is also an option to do an AOT.
Pointer instructions - C++/CLI
A notable difference from Java's bytecode is that CIL comes with ldind, stind, ldloca, and many call instructions, which are enough for the data and function pointer manipulation needed to compile C/C++ code into CIL.
class A ;
void test_pointer_operations(int param)
The corresponding code in CIL can be rendered as this:
.method assembly static void modopt([mscorlib]System.Runtime.CompilerServices.CallConvCdecl)
test_pointer_operations(int32 param) cil managed
// end of method 'Global Functions'::test_pointer_operations
See also
*LLVM
*List of CIL instructions
*List of CLI languages
External links
*Common Language Infrastructure (Standard ECMA-335)
*“ECMA C# and Common Language Infrastructure Standards” on the Visual Studio website
*Hello world program in CIL
*Speed: NGen Revs Up Your Performance With Powerful New Features -- MSDN Magazine, April 2005