In computer science, a preprocessor (or precompiler) is a program that processes its input data to produce output that is used as input in another program. The output is said to be a preprocessed form of the input data, which is often used by some subsequent programs like compilers. The amount and kind of processing done depends on the nature of the preprocessor; some preprocessors are only capable of performing relatively simple textual substitutions and macro expansions, while others have the power of full-fledged programming languages.
A common example from computer programming is the processing performed on source code before the next step of compilation.
In some computer languages (e.g., C and PL/I) there is a phase of translation known as preprocessing. It can also include macro processing, file inclusion and language extensions.
Lexical preprocessors
Lexical preprocessors are the lowest level of preprocessors, as they only require lexical analysis; that is, they operate on the source text, prior to any parsing, by substituting tokenized character sequences for other tokenized character sequences according to user-defined rules. They typically perform macro substitution, textual inclusion of other files, and conditional compilation or inclusion.
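As a rough illustration of token-level macro substitution, the following Python sketch splits the text into tokens and replaces identifier tokens that name a macro. The function name `expand_macros`, the macro table, and the rescanning limit are assumptions made for this example, not the behaviour of any particular preprocessor; string literals and comments are deliberately not handled.

```python
import re

# Identifier tokens, runs of whitespace, or any other single character.
# Everything that is not a macro name is copied through unchanged.
TOKEN = re.compile(r"[A-Za-z_]\w*|\s+|.", re.DOTALL)

def expand_macros(text, macros, max_passes=10):
    """Replace identifier tokens that name a macro with the macro body.

    Substitution works on whole tokens, so a macro named PI does not
    touch the identifier PIXEL. The text is rescanned (up to max_passes
    times) so that one macro body may mention another macro.
    """
    for _ in range(max_passes):
        tokens = TOKEN.findall(text)
        expanded = "".join(macros.get(tok, tok) for tok in tokens)
        if expanded == text:
            break
        text = expanded
    return text

# Hypothetical macro table for the example.
macros = {"PI": "3.14159", "TAU": "(2 * PI)"}
result = expand_macros("area = PI * r * r; PIXEL = TAU;", macros)
print(result)  # area = 3.14159 * r * r; PIXEL = (2 * 3.14159);
```

Because the substitution is purely lexical, the sketch has no idea whether the result is valid in the target language; that check is left to whatever program consumes the output.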
C preprocessor
The most common example of this is the C preprocessor, which takes lines beginning with '#' as directives.
The C preprocessor does not expect its input to use the syntax of the C language.
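The idea of treating '#' lines as directives can be sketched in a few lines of Python. This is a toy model under stated assumptions: it understands only a small, C-like subset (`#define NAME`, `#ifdef`, `#ifndef`, `#else`, `#endif`), silently drops any other directive, and does not validate malformed input. The function name `preprocess` is hypothetical. The input in the example is ordinary prose rather than C, which mirrors the point that a line-oriented preprocessor need not understand the language it is applied to.

```python
def preprocess(source, defined=()):
    """Process a tiny, C-like subset of '#' directives line by line."""
    defined = set(defined)
    stack = []    # one boolean per open #ifdef/#ifndef: is its branch active?
    output = []
    for line in source.splitlines():
        stripped = line.strip()
        active = all(stack)            # every enclosing branch must be active
        if stripped.startswith("#"):
            parts = stripped[1:].split()
            directive = parts[0] if parts else ""
            if directive == "define" and active and len(parts) > 1:
                defined.add(parts[1])
            elif directive == "ifdef":
                stack.append(parts[1] in defined)
            elif directive == "ifndef":
                stack.append(parts[1] not in defined)
            elif directive == "else":
                stack[-1] = not stack[-1]
            elif directive == "endif":
                stack.pop()
            continue                   # directive lines never reach the output
        if active:
            output.append(line)
    return "\n".join(output)

text = """#define VERBOSE
Dear reader,
#ifdef VERBOSE
this line is kept.
#else
this line is dropped.
#endif
Regards"""
result = preprocess(text)
print(result)
```

With `VERBOSE` defined on the first line, the output consists of "Dear reader,", "this line is kept." and "Regards", each on its own line.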
Some languages take a different approach and use built-in language features to achieve similar things. For example:
* Instead of macros, some languages use aggressive inlining and templates.
* Instead of includes, some languages use compile-time imports that rely on type information in the object code.
* Some languages use if-then-else and dead code elimination to achieve conditional compilation.
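The last approach in the list can be sketched with Python's standard `ast` module: an ordinary `if` statement whose condition is a flag known at build time is replaced by the branch that would be taken, and the other branch is discarded. The class name `FoldConstantIfs` and the `DEBUG` flag are assumptions made for this example, and `ast.unparse` requires Python 3.9 or later. Real compilers perform this kind of elimination as part of general optimization, not as a separate pass like this one.

```python
import ast

class FoldConstantIfs(ast.NodeTransformer):
    """Replace `if NAME:` by the taken branch when NAME is a known flag."""

    def __init__(self, flags):
        self.flags = flags             # e.g. {"DEBUG": False}

    def visit_If(self, node):
        self.generic_visit(node)       # fold nested ifs first
        test = node.test
        if isinstance(test, ast.Name) and test.id in self.flags:
            branch = node.body if self.flags[test.id] else node.orelse
            # Keep the enclosing block non-empty if nothing survives.
            return branch or [ast.Pass()]
        return node

source = """
if DEBUG:
    log('starting')
else:
    quiet = True
run()
"""
tree = FoldConstantIfs({"DEBUG": False}).visit(ast.parse(source))
result = ast.unparse(tree)
print(result)
# quiet = True
# run()
```

The source stays syntactically valid in both configurations, which is the main difference from textual conditional inclusion: both branches are parsed, and only then is one of them dropped.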
Other lexical preprocessors
Other lexical preprocessors include the general-purpose m4, most commonly used in cross-platform build systems such as autoconf, and GEMA, an open source macro processor which operates on patterns of context.
Syntactic preprocessors
Syntactic preprocessors were introduced with the Lisp family of languages. Their role is to transform syntax trees according to a number of user-defined rules. For some programming languages, the rules are written in the same language as the program (compile-time reflection). This is the case with Lisp and OCaml. Some other languages rely on a fully external language to define the transformations, such as the XSLT preprocessor for XML, or its statically typed counterpart CDuce.
Syntactic preprocessors are typically used to customize the syntax of a language, extend a language by adding new primitives, or embed a domain-specific programming language (DSL) inside a general-purpose language.
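A syntax-tree transformation of this kind can be sketched in Python with the standard `ast` module; this is only an analogy for what Lisp or OCaml preprocessors do, written in a language that has no macro system of its own. The rule below is hypothetical: it treats the statement `swap(a, b)` as a new primitive and rewrites it to `a, b = b, a`, something an ordinary function could not do because it cannot rebind the caller's variables. `ast.unparse` requires Python 3.9 or later.

```python
import ast

class SwapMacro(ast.NodeTransformer):
    """Rewrite the statement `swap(a, b)` into `a, b = b, a`."""

    def visit_Expr(self, node):
        call = node.value
        if (isinstance(call, ast.Call)
                and isinstance(call.func, ast.Name)
                and call.func.id == "swap"
                and len(call.args) == 2
                and not call.keywords):
            a, b = (ast.unparse(arg) for arg in call.args)
            # Re-parsing the generated text yields nodes with the correct
            # load/store contexts for an assignment.
            return ast.parse(f"{a}, {b} = {b}, {a}").body[0]
        return node

source = "x, y = 1, 2\nswap(x, y)\n"
tree = SwapMacro().visit(ast.parse(source))
expanded = ast.unparse(tree)           # source text after the rewrite

namespace = {}
exec(compile(ast.fix_missing_locations(tree), "<expanded>", "exec"), namespace)
print(namespace["x"], namespace["y"])  # 2 1
```

Unlike a lexical macro, the rule matches on the shape of the tree (a call to a given name with two arguments, used as a statement), so it cannot accidentally fire inside a string or on a longer identifier.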
Customizing syntax
A good example of syntax customization is the existence of two different syntaxes in the Objective Caml programming language. Programs may be written in either the "normal syntax" or the "revised syntax", and may be pretty-printed with either syntax on demand.
Similarly, a number of programs written in OCaml customize the syntax of the language by the addition of new operators.
Extending a language
The best examples of language extension through macros are found in the Lisp family of languages. While the languages, by themselves, are simple dynamically typed functional cores, the standard distributions of Scheme or Common Lisp permit imperative or object-oriented programming, as well as static typing. Almost all of these features are implemented by syntactic preprocessing, although in Lisp the "macro expansion" phase of compilation is handled by the compiler itself. This can still be considered a form of preprocessing, since it takes place before other phases of compilation.
Specializing a language
One of the unusual features of the Lisp family of languages is the possibility of using macros to create an internal DSL. Typically, in a large Lisp-based project, a module may be written in a variety of such minilanguages, one perhaps using a SQL-based dialect of Lisp, another written in a dialect specialized for GUIs or pretty-printing, etc.
Common Lisp's standard library contains an example of this level of syntactic abstraction in the form of the LOOP macro, which implements an Algol-like minilanguage to describe complex iteration, while still enabling the use of standard Lisp operators.
The MetaOCaml preprocessor/language provides similar features for external DSLs. This preprocessor takes the description of the semantics of a language (i.e. an interpreter) and, by combining compile-time interpretation and code generation, turns that definition into a compiler to the OCaml programming language, and from that language, either to bytecode or to native code.
General purpose preprocessor
Most preprocessors are specific to a particular data processing task (e.g., compiling the C language). A preprocessor may be promoted as being general purpose, meaning that it is not aimed at a specific usage or programming language, and is intended to be used for a wide variety of text processing tasks.
M4 is probably the best-known example of such a general-purpose preprocessor, although the C preprocessor is sometimes used in a role that is not specific to C. Examples:
* using the C preprocessor for JavaScript preprocessing.
* using the C preprocessor for devicetree processing within the Linux kernel.
* using M4 or the C preprocessor as a template engine for HTML generation (see "Using a C preprocessor as an HTML authoring tool" by J. Korpela, 2000).
* imake, a make interface using the C preprocessor, written for the X Window System but now deprecated in favour of automake.
* grompp, a preprocessor for simulation input files for GROMACS (a fast, free, open-source code for some problems in computational chemistry) which calls the system C preprocessor (or other preprocessor as determined by the simulation input file) to parse the topology, using mostly the #define and #include mechanisms to determine the effective topology at grompp run time.
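The template-engine use mentioned in the list relies mostly on textual inclusion. The following Python sketch imitates that mechanism on HTML fragments. It is not an interface to M4 or to the C preprocessor: the `#include "name"` syntax is borrowed for familiarity, the "files" are entries in an in-memory dictionary, and the names `expand_includes`, `header.inc`, `footer.inc` and `index.html.in` are all hypothetical.

```python
import re

# A whole line of the form: #include "name"
INCLUDE = re.compile(r'^#include "([^"]+)"$', re.MULTILINE)

def expand_includes(name, files, depth=0):
    """Return files[name] with each include line replaced by the
    recursively expanded content of the named entry."""
    if depth > 20:                     # crude guard against include cycles
        raise RecursionError(f"include depth exceeded at {name!r}")
    return INCLUDE.sub(
        lambda match: expand_includes(match.group(1), files, depth + 1),
        files[name],
    )

# Hypothetical page fragments standing in for files on disk.
files = {
    "header.inc": "<header>My site</header>",
    "footer.inc": "<footer>Contact</footer>",
    "index.html.in": (
        "<body>\n"
        '#include "header.inc"\n'
        "<p>Hello</p>\n"
        '#include "footer.inc"\n'
        "</body>"
    ),
}
page = expand_includes("index.html.in", files)
print(page)
```

Nothing in the function knows about HTML; the same code would splice together configuration files or plain text, which is what makes a preprocessor of this kind general purpose.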
External links
* DSL Design in Lisp
* The Generic PreProcessor
* Gema, the General Purpose Macro Processor
* The PIKT piktc
* pyexpander, a Python-based general purpose macro processor
* minimac, a minimalist macro processor
* Java Comment Preprocessor