Literate programming (LP) is a programming paradigm introduced in 1984 by Donald Knuth in which a computer program is given as an explanation of how it works in a natural language, such as English, interspersed (embedded) with snippets of macros and traditional source code, from which compilable source code can be generated. The approach is routinely used in scientific computing and in data science for reproducible research and open access purposes. Literate programming tools are used by millions of programmers today.
The literate programming paradigm, as conceived by Donald Knuth, represents a move away from writing computer programs in the manner and order imposed by the compiler, and instead gives programmers macros to develop programs in the order demanded by the logic and flow of their thoughts. Literate programs are written as an exposition of logic in more natural language in which macros are used to hide abstractions and traditional source code, more like the text of an essay.
Literate programming tools are used to obtain two representations from a source file: one understandable by a compiler or interpreter, the "tangled" code, and another for viewing as formatted documentation, which is said to be "woven" from the literate source. (If one remembers that the first version of the tool was called WEB, the literary reference hidden by Knuth in these names becomes obvious: "Oh, what a tangled web we weave when first we practise to deceive", from Canto VI, Stanza 17 of Sir Walter Scott's ''Marmion'' (1808), an epic poem about the Battle of Flodden in 1513. The quotation appeared as an epigraph in a May 1986 article by Jon Bentley and Donald Knuth in one of the classical "Programming Pearls" columns in ''Communications of the ACM'', vol. 29, no. 5, p. 365.) While the first generation of literate programming tools were computer language-specific, the later ones are language-agnostic and exist beyond the individual programming languages.
History and philosophy
Literate programming was first introduced in 1984 by Donald Knuth, who intended it to create programs that were suitable literature for human beings. He implemented it at Stanford University as a part of his research on algorithms and digital typography. The implementation was called "WEB" since he believed that it was one of the few three-letter words of English that had not yet been applied to computing. The name also fits the complicated nature of software, delicately pieced together from simple materials.
The practice of literate programming has seen an important resurgence in the 2010s with the use of computational notebooks, especially in data science.
Concept
Literate programming is writing out the program logic in a human language with included (separated by a primitive markup) code snippets and macros. Macros in a literate source file are simply title-like or explanatory phrases in a human language that describe human abstractions created while solving the programming problem, and that hide chunks of code or lower-level macros. These macros are similar to the algorithms in pseudocode typically used in teaching computer science. These arbitrary explanatory phrases become precise new operators, created on the fly by the programmer, forming a ''meta-language'' on top of the underlying programming language.
A preprocessor is used to substitute arbitrary hierarchies, or rather "interconnected 'webs' of macros", to produce the compilable source code with one command ("tangle"), and documentation with another ("weave"). The preprocessor also provides an ability to write out the content of the macros and to add to already created macros in any place in the text of the literate program source file, thereby disposing of the need to keep in mind the restrictions imposed by traditional programming languages or to interrupt the flow of thought.
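The tangle step described above can be sketched in a few lines of Python. This is a minimal illustration, not the code of WEB, CWEB, or noweb. It assumes noweb-style markup in which <<name>>= opens a code chunk, a lone @ returns to documentation, and a reference to another chunk stands alone on its line. The names parse, tangle, and SOURCE, and the chunk names in the sample, are hypothetical.

```python
import re

# Assumed markup: "<<name>>=" defines a chunk, "<<name>>" alone on a line uses it.
DEF = re.compile(r"^<<(.+)>>=\s*$")
REF = re.compile(r"^(\s*)<<(.+)>>\s*$")


def parse(source):
    """Collect code chunks by name; prose outside chunks is ignored."""
    chunks, current = {}, None
    for line in source.splitlines():
        m = DEF.match(line)
        if m:
            current = chunks.setdefault(m.group(1), [])
        elif line.strip() == "@":
            current = None  # a lone "@" switches back to documentation
        elif current is not None:
            current.append(line)
    return chunks


def tangle(chunks, name="*", indent="", seen=()):
    """Expand chunk `name` recursively, keeping each reference's indentation."""
    if name in seen:
        raise ValueError(f"cyclic reference to chunk {name!r}")
    out = []
    for line in chunks[name]:
        m = REF.match(line)
        if m:
            out.extend(tangle(chunks, m.group(2), indent + m.group(1), seen + (name,)))
        else:
            out.append(indent + line if line else line)
    return out


# Hypothetical literate source: the outline comes first, the headers last.
SOURCE = """\
The program prints a greeting in a loop.
<<*>>=
<<includes>>
int main(void) {
    <<greet forever>>
}
@
The loop body is explained after the outline that uses it.
<<greet forever>>=
while (1) {
    puts("hello");
}
@
<<includes>>=
#include <stdio.h>
@
"""

if __name__ == "__main__":
    print("\n".join(tangle(parse(SOURCE))))
```

Calling tangle(parse(SOURCE)) yields the C lines in compiler order, with the loop indented inside main, even though the source introduces the outline first and the header chunk last.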
Advantages
According to Knuth, literate programming provides higher-quality programs, since it forces programmers to explicitly state the thoughts behind the program, making poorly thought-out design decisions more obvious. Knuth also claims that literate programming provides a first-rate documentation system, which is not an add-on, but is grown naturally in the process of exposition of one's thoughts during a program's creation. The resulting documentation allows the author to restart their own thought processes at any later time, and allows other programmers to understand the construction of the program more easily. This differs from traditional documentation, in which a programmer is presented with source code that follows a compiler-imposed order, and must decipher the thought process behind the program from the code and its associated comments. The meta-language capabilities of literate programming are also claimed to facilitate thinking, giving a higher "bird's eye view" of the code and increasing the number of concepts the mind can successfully retain and process. The applicability of the concept to programming on a large scale, that of commercial-grade programs, is demonstrated by an edition of the TeX code as a literate program.
Knuth also claims that literate programming can lead to easy porting of software to multiple environments, and even cites the implementation of TeX as an example.
Contrast with documentation generation
Literate programming is very often misunderstood to refer only to formatted documentation produced from a common file with both source code and comments – which is properly called documentation generation – or to voluminous commentaries included with code. This is the converse of literate programming: well-documented code or documentation extracted from code follows the structure of the code, with documentation embedded in the code; while in literate programming, code is embedded in documentation, with the code following the structure of the documentation.
This misconception has led to claims that comment-extraction tools, such as the Perl Plain Old Documentation or Java Javadoc systems, are "literate programming tools". However, because these tools do not implement the "web of abstract concepts" hiding behind the system of natural-language macros, or provide an ability to change the order of the source code from a machine-imposed sequence to one convenient to the human mind, they cannot properly be called literate programming tools in the sense intended by Knuth.
Workflow
Implementing literate programming consists of two steps:
# Weaving: Generating a comprehensive document about the program and its maintenance.
# Tangling: Generating machine-executable code.
Weaving and tangling are done on the same source so that they are consistent with each other.
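The weaving step can be illustrated with a short Python sketch. It is only an assumption-laden outline of the idea, not how any particular tool formats its output: the function name weave, the plain-text caption format, and the sample source are all hypothetical, and the markup assumed is the noweb-like <<name>>= and @ convention.

```python
def weave(source):
    """Render a noweb-like source as a plain-text document.

    Prose lines pass through unchanged; each code chunk gets a numbered
    caption and its lines are indented so they stand out from the prose.
    """
    out, in_code, count = [], False, 0
    for line in source.splitlines():
        stripped = line.rstrip()
        if stripped.startswith("<<") and stripped.endswith(">>="):
            count += 1
            out.append(f"[chunk {count}: {stripped[2:-3]}]")
            in_code = True
        elif stripped == "@":
            in_code = False  # a lone "@" ends the code chunk
        elif in_code:
            out.append("    " + line)
        else:
            out.append(line)
    return "\n".join(out)


# Hypothetical literate source used only for this illustration.
SOURCE = """\
We greet the user once.
<<greet>>=
puts("hello");
@
Nothing else is needed.
"""

if __name__ == "__main__":
    print(weave(SOURCE))
```

Because a tangler would read exactly the same text, the code shown in the woven document and the code handed to the compiler cannot drift apart.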
Example
A classic example of literate programming is the literate implementation of the standard Unix wc word counting program. Knuth presented a CWEB version of this example in Chapter 12 of his ''Literate Programming'' book. The same example was later rewritten for the noweb literate programming tool.
This example provides a good illustration of the basic elements of literate programming.
Creation of macros
The following snippet of the wc literate program shows how arbitrary descriptive phrases in a natural language are used in a literate program to create macros, which act as new "operators" in the literate programming language, and hide chunks of code or other macros. The mark-up notation consists of double angle brackets (<<...>>) that indicate macros. The @ symbol, in a noweb file, indicates the beginning of a documentation chunk. The <<*>> symbol stands for the "root", the topmost node from which the literate programming tool will start expanding the web of macros. Actually, writing out the expanded source code can be done from any section or subsection (i.e. a piece of code designated as <<name of the chunk>>=, with the equal sign), so one literate program file can contain several files with machine source code.
The purpose of wc is to count lines, words, and/or characters in a list of files. The
number of lines in a file is ......../more explanations/
Here, then, is an overview of the file wc.c that is defined by the noweb program wc.nw:
<<*>>=
<<...>>
<<...>>
<<...>>
<<...>>
<<...>>
@
We must include the standard I/O definitions, since we want to send formatted output
to stdout and stderr.
<<...>>=
The unraveling of the chunks can be done in any place in the literate program text file, not necessarily in the order they are sequenced in the enclosing chunk, but as is demanded by the logic reflected in the explanatory text that envelops the whole program.
Program as a web
Macros are not the same as "section names" in standard documentation. Literate programming macros hide the real code behind themselves, and can be used inside any low-level machine language operators, often inside logical operators such as if, while or case. This can be seen in the following snippet of the wc literate program.
The present chunk, which does the counting, was actually one of
the simplest to write. We look at each character and change state if it begins or ends
a word.
<<...>>=
while (1)
@
The macros stand for any chunk of code or other macros, and are more general than top-down or bottom-up "chunking", or than subsectioning. Donald Knuth said that when he realized this, he began to think of a program as a ''web'' of various parts.
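One way to make the "web" concrete is to list, for each chunk, the chunks it refers to. The Python sketch below does this for a noweb-like source. It is an illustration only: chunk_graph and the chunk names in the sample are hypothetical, and the simple pattern used for references would also match C shift operators written as << and >> on the same line, which a real tool has to handle properly.

```python
import re

DEF = re.compile(r"^<<(.+)>>=\s*$")   # assumed chunk definition syntax
REF = re.compile(r"<<(.+?)>>")        # naive: any <<...>> inside a chunk body


def chunk_graph(source):
    """Map each chunk name to the set of chunk names it references."""
    graph, current = {}, None
    for line in source.splitlines():
        m = DEF.match(line)
        if m:
            current = graph.setdefault(m.group(1), set())
        elif line.strip() == "@":
            current = None
        elif current is not None:
            current.update(REF.findall(line))
    return graph


# Hypothetical source: references appear inside a while loop and an if.
SOURCE = """\
<<*>>=
<<scan file>>
@
<<scan file>>=
while (1) {
    <<fill buffer>>
    if (c == ' ') { <<end word>> }
}
@
<<fill buffer>>=
/* hypothetical body */
@
<<end word>>=
in_word = 0;
@
"""

if __name__ == "__main__":
    for name, refs in chunk_graph(SOURCE).items():
        print(name, "->", sorted(refs))
```

Here the chunk named scan file refers to two others from inside its while loop, one of them within an if statement, so the structure is a graph of named parts and not a list of sections.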
Order of human logic, not that of the compiler
In a noweb literate program, besides the free order of their exposition, the chunks behind macros, once introduced with <<...>>=, can be grown later in any place in the file by simply writing <<name of the chunk>>= and adding more content to it, as the following snippet illustrates (+ is added by the document formatter for readability, and is not in the code).
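The effect of a repeated chunk definition can be sketched in Python. This is an illustration under stated assumptions, not the noweb implementation: the function collect, the chunk name globals, and the variable names in the sample are hypothetical.

```python
def collect(source):
    """Gather chunk bodies; a repeated `<<name>>=` appends to the earlier one."""
    chunks, current = {}, None
    for line in source.splitlines():
        stripped = line.rstrip()
        if stripped.startswith("<<") and stripped.endswith(">>="):
            # setdefault returns the existing list when the name was seen before,
            # so later definitions extend the chunk instead of replacing it.
            current = chunks.setdefault(stripped[2:-3], [])
        elif stripped == "@":
            current = None
        elif current is not None:
            current.append(line)
    return chunks


# Hypothetical source: the same chunk is opened twice, far apart in the text.
SOURCE = """\
Per-file counters come first.
<<globals>>=
long word_count;
@
Much later, the text turns to totals and extends the same chunk.
<<globals>>=
long total_words;
@
"""

if __name__ == "__main__":
    print(collect(SOURCE)["globals"])
```

Each declaration sits next to the prose that motivates it, while the tangled output still receives both lines together wherever the chunk is referenced.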
Record of the train of thought
The documentation for a literate program is produced as part of writing the program. Instead of comments provided as side notes to source code, a literate program contains the explanation of concepts on each level, with lower-level concepts deferred to their appropriate place, which allows better communication of thought. The snippets of the literate wc above show how an explanation of the program and its source code are interwoven. Such exposition of ideas creates a flow of thought that is like a literary work. Knuth wrote a "novel" which explains the code of the interactive fiction game Colossal Cave Adventure.
Remarkable examples
* Axiom, which evolved from Scratchpad, a computer algebra system developed by IBM. It is now being developed by Tim Daly, one of the developers of Scratchpad. Axiom is written entirely as a literate program.
Literate programming practices
The first published literate programming environment was WEB, introduced by Knuth in 1981 for his TeX typesetting system; it uses Pascal as its underlying programming language and TeX for typesetting of the documentation. The complete commented TeX source code was published in Knuth's ''TeX: The program'', volume B of his 5-volume ''Computers and Typesetting''. Knuth had privately used a literate programming system called DOC as early as 1979. He was inspired by the ideas of Pierre-Arnoul de Marneffe.
The free CWEB, written by Knuth and Silvio Levy, is WEB adapted for C and C++, runs on most operating systems, and can produce TeX and PDF documentation.
There are various other implementations of the literate programming concept as given below. Many of the newer among these do not have macros and hence do not comply with the order of human logic principle, which makes them perhaps "semi-literate" tools. These, however, allow cellular execution of code, which makes them more along the lines of exploratory programming tools.
Other useful tools include:
See also
* Documentation generator – the inverse of literate programming, where documentation is embedded in and generated from source code
* Notebook interface – virtual notebook environment used for literate programming
* Sweave and Knitr – examples of use of the "noweb"-like literate programming tool inside the R language for creation of dynamic statistical reports
* Self-documenting code – source code that can be easily understood without documentation
External links
* LiterateProgramming at WikiWikiWeb
* Literate Programming FAQ at CTAN