C-- (pronounced "C minus minus") is a C-like programming language, designed to be generated mainly by compilers for high-level languages rather than written by human programmers. It was created by functional programming researchers Simon Peyton Jones and Norman Ramsey. Unlike many other intermediate languages, it is represented in plain ASCII text, not bytecode or another binary format.
There are two main branches:
* C--, the original branch, with the final version 2.0 released in May 2005
* Cmm, the fork actively used as the intermediate representation (IR) in the Glasgow Haskell Compiler (GHC)
Design
C-- is a "portable assembly language", designed to ease the implementation of compilers that produce high-quality machine code. This is done by delegating low-level code-generation and program optimization to a C-- compiler. The language's syntax borrows heavily from C while omitting or changing standard C features such as variadic functions, pointer syntax, and aspects of C's type system, because they hamper essential features of C-- and ease of code-generation.
The name of the language is an in-joke, indicating that C-- is a reduced form of C, in the same way that "C++" was chosen to connote an improved version of C. (In C, -- and ++ mean "decrement" and "increment", respectively.)
Work on C-- began in the late 1990s. Since writing a custom code generator is a challenge in itself, and the compiler backends available to researchers at that time were complex and poorly documented, several projects had written compilers which generated C code (for instance, the original Modula-3 compiler). However, C is a poor choice for functional languages: it does not guarantee tail-call optimization, nor does it support accurate garbage collection or efficient exception handling. C-- is a tightly defined, simpler alternative to C which supports all of these. Its most innovative feature is a run-time interface which allows writing of portable garbage collectors, exception handling systems and other run-time features which work with any C-- compiler.
The first version of C-- was released in April 1998 as an MSRA paper, accompanied by a January 1999 paper on garbage collection. A revised manual was posted in HTML form in May 1999. Two sets of major changes proposed in 2000 by Norman Ramsey ("Proposed Changes") and Christian Lindig ("A New Grammar") led to C-- version 2, which was finalized around 2004 and officially released in 2005.
Type system
The C-- type system is designed to reflect constraints imposed by hardware rather than conventions imposed by higher-level languages. A value stored in a register or memory may have only one type: bit-vector. However, bit-vector is a polymorphic type which comes in several widths, such as the 32-bit bits32. A separate family of 32-bit and 64-bit floating-point types is supported. In addition to the bit-vector type, C-- provides a boolean type, which can be computed by expressions and used for control flow but cannot be stored in a register or memory. As in an assembly language, any higher type discipline, such as distinctions between signed, unsigned, float, and pointer, is imposed by the C-- operators or other syntactic constructs. C-- is not type-checked, nor does it enforce or check the calling convention.
C-- version 2 removes the distinction between bit-vector and floating-point types. These types can be annotated with a string "kind" tag to distinguish, among other things, a variable's integer vs float typing and its storage behavior (global or local). The former is useful on targets that have separate registers for integer and floating-point values. Special types for pointers and the native word were introduced, although they are mapped to a bit-vector with a target-dependent length.
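The idea that signedness lives in the operator rather than in the stored value can be illustrated with a small C sketch. This is not C-- code and is not taken from the C-- specification: the typedef bits32 merely mirrors the C-- type name, and the helper names as_signed, divu32 and divs32 are hypothetical, chosen for this illustration only.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* A 32-bit bit-vector: the storage itself carries no signedness. */
typedef uint32_t bits32;

/* Reinterpret the same 32 bits as a two's-complement integer. */
int32_t as_signed(bits32 x) {
    int32_t s;
    memcpy(&s, &x, sizeof s);
    return s;
}

/* Unsigned division (hypothetical name): operands are read as unsigned. */
bits32 divu32(bits32 a, bits32 b) {
    assert(b != 0);
    return a / b;
}

/* Signed division (hypothetical name): the same bits, read as signed. */
bits32 divs32(bits32 a, bits32 b) {
    int32_t sa = as_signed(a), sb = as_signed(b);
    assert(sb != 0);
    assert(!(sa == INT32_MIN && sb == -1)); /* quotient would overflow */
    return (bits32)(sa / sb);
}
```

Applied to the same bit pattern 0xFFFFFFFE with divisor 2, divu32 yields 0x7FFFFFFF, while divs32 yields 0xFFFFFFFF (that is, -2 / 2 = -1). The value has a single type throughout; only the choice of operation changes its interpretation.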
Example code
The following C-- code calculates the sum and product of integers 1 through n (n is received as an argument).
It demonstrates two language features:
* Procedures can return multiple results.
* Tail recursion is explicitly requested with the "jump" keyword.
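/* Tail recursion */
export sp;
sp( bits32 n ) {
  jump sp_help( n, 1, 1 );
}
sp_help( bits32 n, bits32 s, bits32 p ) {
  if n == 1 { return( s, p ); } else { jump sp_help( n-1, s+n, p*n ); }
}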
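For readers more familiar with C, the same computation can be sketched as follows. This is an illustration rather than a translation endorsed by the C-- authors: the struct sum_product stands in for C--'s multiple return values, and the accumulator starting values (1, 1) with base case n == 1 are assumptions made for this sketch. Unlike "jump" in C--, the recursive call below is only in tail position; a C compiler may or may not compile it into a jump.

```c
#include <assert.h>
#include <stdint.h>

/* C has no multiple return values, so the pair is packed into a struct. */
typedef struct {
    uint32_t sum;
    uint32_t product;
} sum_product;

/* Accumulator-passing helper: s and p carry the running sum and product. */
sum_product sp_help(uint32_t n, uint32_t s, uint32_t p) {
    if (n == 1) {
        return (sum_product){ s, p };
    }
    /* Tail call: nothing remains to be done after it returns. */
    return sp_help(n - 1, s + n, p * n);
}

/* Sum and product of the integers 1 through n (n must be at least 1). */
sum_product sp(uint32_t n) {
    assert(n >= 1);
    return sp_help(n, 1, 1);
}
```

Calling sp(4) walks through the states (4, 1, 1), (3, 5, 4), (2, 8, 12) and (1, 10, 24), so the result has sum 10 and product 24.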
Implementations
The specification page of C-- lists a few implementations of C--. The "most actively developed" compiler, Quick C--, was abandoned in 2013.
Haskell
Some developers of C--, including Simon Peyton Jones, João Dias, and Norman Ramsey, work or have worked on GHC, whose development has led to extensions in the C-- language, forming the Cmm dialect which uses the C preprocessor for ergonomics.
GHC backends are responsible for further transforming C-- into executable code, via LLVM IR, slow C, or directly through the built-in native backend. Despite the original intention, GHC does perform many of its generic optimizations on C--. As with other compiler IRs, the C-- representation can be dumped for debugging. Target-specific optimizations are performed later by the backend.
Processing systems
As of 2018, most processing systems are not maintained, nor is their source code released.
* Quick C-- is a compiler developed by The Quick C-- Team. It compiles version 2 of C-- code to Intel x86 Linux machine code. Compilation to machine code for other platforms is available as an experimental feature. Previously, Quick C-- was developed in parallel with the evolution of the C-- language specification, but the project was archived in 2019 on GitHub and development has ceased, though the source code is available there.
* cmmc is a C-- compiler implemented in the ML programming language by Fermin Reig. It generates machine code for Alpha, Sparc, and x86 architectures.
* Trampoline C-- Compiler is a C-- to C transpiler developed by Sergei Egorov in May 1999. It translates C-- code into C code, allowing it to be compiled using standard C compilers.
* The Oregon Graduate Institute's C-- compiler (OGI C-- Compiler) is the earliest prototype C-- compiler, developed in 1997 using the ML programming language. Maintenance of the OGI C-- Compiler was discontinued once development of Quick C-- began.
See also
* BCPL
* LLVM
References
* Nordin, Thomas; Jones, Simon Peyton; Iglesias, Pablo Nogueira; Oliva, Dino (1998-04-23). "The C– Language Reference Manual". Microsoft. https://www.microsoft.com/en-us/research/publication/the-c-language-reference-manual/
* Reig, Fermin; Ramsey, Norman; Jones, Simon Peyton (1999-01-01). "C–: a portable assembly language that supports garbage collection". Microsoft. pp. 1–28. https://www.microsoft.com/en-us/research/publication/portable-assembly-language-supports-garbage-collection/
* Ramsey, Norman; Jones, Simon Peyton. "The C-- Language Specification, Version 2.0". https://www.cs.tufts.edu/~nr/c--/extern/man2.pdf (accessed 11 December 2019)
* "GHC Commentary: What the hell is a .cmm file?"
* "An improved LLVM backend". April 2019. https://ghc.haskell.org/trac/ghc/wiki/ImprovedLLVMBackend
* "Debugging compilers with optimization fuel"
* "C-- Downloads". www.cs.tufts.edu. https://www.cs.tufts.edu/~nr/c--/code.html (accessed 11 December 2019)
* Nordin, Thomas; Jones, Simon Peyton; Iglesias, Pablo Nogueira; Oliva, Dino (1999-05-23). "The C– Language Reference Manual". https://www.cs.tufts.edu/~nr/c--/extern/manual.html
External links
* Archive of old official website (cminusminus.org)
* Quick C-- code archive (the reference implementation)