In computer programming, a programming language specification (or standard or definition) is a documentation artifact that defines a programming language so that users and implementors can agree on what programs in that language mean. Specifications are typically detailed and formal, and primarily used by implementors, with users referring to them in case of ambiguity; the C++ specification, for instance, is frequently cited by users because of the language's complexity. Related documentation includes a programming language reference, which is intended expressly for users, and a programming language rationale, which explains why the specification is written as it is; these are typically more informal than a specification.


Standardization

Not all major programming languages have specifications, and languages can exist and be popular for decades without a specification. A language may have one or more implementations, whose behavior acts as a ''de facto'' standard, without this behavior being documented in a specification. Perl (through Perl 5) is a notable example of a language without a specification, while PHP was only specified in 2014, after being in use for 20 years.
A language may be implemented and then specified, or specified and then implemented, or these may develop together, which is usual practice today. This is because implementations and specifications provide checks on each other: writing a specification requires precisely stating the behavior of an implementation, and implementation checks that a specification is possible, practical, and consistent. Writing a specification before an implementation has largely been avoided since ALGOL 68 (1968), because deferring implementation until after the specification led to unexpected difficulties. However, languages are still occasionally implemented and gain popularity without a formal specification: an implementation is essential for use, while a specification is desirable but not essential (informally, "code talks").


Forms

A programming language specification can take several forms, including the following:
* An explicit definition of the syntax and semantics of the language. While syntax is commonly specified using a formal grammar, semantic definitions may be written in natural language (e.g., the approach taken for the C language), or a formal semantics (e.g., the Standard ML and Scheme specifications). A notable example is the C language, which gained popularity without a formal specification, instead being described as part of a book, ''The C Programming Language'' (1978), and only much later being formally standardized in ANSI C (1989).
* A description of the behavior of a compiler (sometimes called "translator") for the language (e.g., the C++ language and Fortran). The syntax and semantics of the language have to be inferred from this description, which may be written in natural language or in a formal language.
* A ''model'' implementation, sometimes written in the language being specified (e.g., Prolog). The syntax and semantics of the language are explicit in the behavior of the model implementation.


Syntax

The syntax of a programming language defines which sequences of symbols are acceptable, i.e., the formal rules used to decide whether a given piece of code is valid with respect to the language. A language's syntax is usually specified by a combination of the following three components:
* An alphabet (a non-empty, finite set of symbols, usually Unicode characters)
* Regular expressions describing its lexemes (used to split the input into tokens)
* A context-free grammar describing how the lexemes may be combined to form a correct program
A syntax specification is generally accompanied by a natural language description to make it easier to understand. However, a formal representation of the components outlined above is usually included as well, since it aids both the implementation of the language and the validation of its concepts.
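As a minimal sketch of how the three components fit together, assume a hypothetical toy expression language (not taken from any real specification). Its alphabet is digits, `+`, `*`, parentheses and spaces; its lexemes are given by the regular expressions in `TOKEN_SPEC`; and its context-free grammar is `expr → term ('+' term)*`, `term → factor ('*' factor)*`, `factor → NUMBER | '(' expr ')'`. The names `tokenize`, `Parser` and `is_valid` are illustrative only.

```python
import re
from typing import List, Tuple

# Lexical level: one regular expression per lexeme class (hypothetical toy language).
TOKEN_SPEC = [
    ("NUMBER", r"[0-9]+"),
    ("PLUS", r"\+"),
    ("TIMES", r"\*"),
    ("LPAREN", r"\("),
    ("RPAREN", r"\)"),
    ("SKIP", r"[ \t]+"),
]
MASTER_RE = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

Token = Tuple[str, str]


def tokenize(source: str) -> List[Token]:
    """Split source into (kind, text) tokens; reject characters outside the alphabet."""
    tokens: List[Token] = []
    pos = 0
    while pos < len(source):
        match = MASTER_RE.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at {pos}")
        if match.lastgroup != "SKIP":  # whitespace separates lexemes but is discarded
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    tokens.append(("EOF", ""))
    return tokens


class Parser:
    """Recursive-descent recognizer: one method per nonterminal of the grammar."""

    def __init__(self, tokens: List[Token]) -> None:
        self.tokens = tokens
        self.i = 0

    def peek(self) -> str:
        return self.tokens[self.i][0]

    def expect(self, kind: str) -> None:
        if self.peek() != kind:
            raise SyntaxError(f"expected {kind}, found {self.peek()}")
        self.i += 1

    def expr(self) -> None:  # expr -> term ('+' term)*
        self.term()
        while self.peek() == "PLUS":
            self.expect("PLUS")
            self.term()

    def term(self) -> None:  # term -> factor ('*' factor)*
        self.factor()
        while self.peek() == "TIMES":
            self.expect("TIMES")
            self.factor()

    def factor(self) -> None:  # factor -> NUMBER | '(' expr ')'
        if self.peek() == "NUMBER":
            self.expect("NUMBER")
        else:
            self.expect("LPAREN")
            self.expr()
            self.expect("RPAREN")


def is_valid(source: str) -> bool:
    """True if source is a syntactically valid program of the toy language."""
    try:
        parser = Parser(tokenize(source))
        parser.expr()
        parser.expect("EOF")  # the whole input must be consumed
        return True
    except SyntaxError:
        return False
```

For example, `is_valid("2 * (3 + 4)")` returns `True`, while `"2 + * 3"` is rejected by the grammar and `"2 $ 3"` is rejected already at the lexical level, since `$` is not in the alphabet.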


Semantics

Formulating a rigorous semantics of a large, complex, practical programming language is a daunting task even for experienced specialists, and the resulting specification can be difficult for anyone but experts to understand. The following are some of the ways in which programming language semantics can be described; all languages use at least one of these description methods, and some languages combine more than one:
* Natural language: Description by human natural language.
* Formal semantics: Description by mathematics.
* Reference implementations: Description by computer program.
* Test suites: Description by examples of programs and their expected behaviors. While few language specifications start off in this form, the evolution of some language specifications has been influenced by the semantics of a test suite (e.g., in the past the specification of Ada was modified to match the behavior of the Ada Conformity Assessment Test Suite).


Natural language

Most widely used languages are specified using natural language descriptions of their semantics. This description usually takes the form of a ''reference manual'' for the language. These manuals can run to hundreds of pages, e.g., the print version of ''The Java Language Specification, 3rd Ed.'' is 596 pages long. The imprecision of natural language as a vehicle for describing programming language semantics can lead to problems with interpreting the specification. For example, the semantics of Java threads were specified in English, and it was later discovered that the specification did not provide adequate guidance for implementors (William Pugh, "The Java Memory Model is Fatally Flawed", ''Concurrency: Practice and Experience'' 12(6):445-455, August 2000).


Formal semantics

Formal semantics are grounded in mathematics. As a result, they can be more precise and less ambiguous than semantics given in natural language. However, supplemental natural language descriptions of the semantics are often included to aid understanding of the formal definitions. For example, the ISO standard for Modula-2 contains both a formal and a natural language definition on opposing pages.
Programming languages whose semantics are described formally can reap many benefits. For example:
* Formal semantics enable mathematical proofs of program correctness;
* Formal semantics facilitate the design of type systems, and proofs about the soundness of those type systems;
* Formal semantics can establish unambiguous and uniform standards for implementations of a language.
Automatic tool support can help to realize some of these benefits. For example, an automated theorem prover or theorem checker can increase a programmer's (or language designer's) confidence in the correctness of proofs about programs (or the language itself). The power and scalability of these tools vary widely: full formal verification is computationally intensive, rarely scales beyond programs containing a few hundred lines, and may require considerable manual assistance from a programmer; more lightweight tools such as model checkers require fewer resources and have been used on programs containing tens of thousands of lines; many compilers apply static type checks to any program they compile.
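To give a flavor of "description by mathematics", the sketch below states a big-step operational semantics for a hypothetical toy language of integers, variables, addition and `let` binding (not drawn from any specification cited here). Each inference rule is written as a comment, and the evaluator has exactly one branch per rule, so it is an executable reading of the rules rather than a proof tool. The judgement `env |- e => v` reads "in environment `env`, expression `e` evaluates to value `v`".

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass(frozen=True)
class Num:
    value: int


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Add:
    left: "Expr"
    right: "Expr"


@dataclass(frozen=True)
class Let:
    name: str
    bound: "Expr"
    body: "Expr"


Expr = Union[Num, Var, Add, Let]
Env = Dict[str, int]


def evaluate(env: Env, expr: Expr) -> int:
    """Big-step judgement env |- expr => value; one branch per inference rule."""
    if isinstance(expr, Num):
        # (NUM)  env |- n => n
        return expr.value
    if isinstance(expr, Var):
        # (VAR)  env |- x => env(x), defined only if x is bound in env
        if expr.name not in env:
            raise NameError(f"unbound variable {expr.name}")
        return env[expr.name]
    if isinstance(expr, Add):
        # (ADD)  env |- e1 => v1    env |- e2 => v2
        #        ----------------------------------
        #        env |- e1 + e2 => v1 + v2
        return evaluate(env, expr.left) + evaluate(env, expr.right)
    if isinstance(expr, Let):
        # (LET)  env |- e1 => v1    env[x := v1] |- e2 => v2
        #        -------------------------------------------
        #        env |- let x = e1 in e2 => v2
        v1 = evaluate(env, expr.bound)
        return evaluate({**env, expr.name: v1}, expr.body)  # extend env without mutating it
    raise TypeError(f"not an expression: {expr!r}")
```

Because the `LET` rule extends the environment rather than updating it, an inner binding shadows an outer one: `let x = 1 in let x = 5 in x` evaluates to 5, and the outer environment is left unchanged.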


Reference implementation

A reference implementation is a single implementation of a programming language that is designated as authoritative. The behavior of this implementation is held to define the proper behavior of a program written in the language. Chief among the attractive properties of this approach is precision: it requires no human interpretation, and disputes as to the meaning of a program can be settled simply by executing the program on the reference implementation (provided that the implementation behaves deterministically for that program).
On the other hand, defining language semantics through a reference implementation also has several potential drawbacks. Chief among them is that it conflates limitations of the reference implementation with properties of the language. For example, if the reference implementation has a bug, then that bug must be considered to be an authoritative behavior. Another drawback is that programs written in this language may rely on quirks in the reference implementation, hindering portability across different implementations.
Nevertheless, several languages have successfully used the reference implementation approach. For example, the Perl interpreter is considered to define the authoritative behavior of Perl programs. In the case of Perl, the open-source model of software distribution has contributed to the fact that nobody has ever produced another implementation of the language, so the issues involved in using a reference implementation to define the language semantics are moot.
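The idea of settling disputes by execution can be sketched as differential testing, with the reference implementation serving as the oracle. Everything here is hypothetical: a toy language whose `/` operator is defined only by its reference implementation, which happens to round toward negative infinity, and a second implementation that truncates toward zero. Under this approach, any disagreement counts as a bug in the second implementation, even if the reference behavior was itself an accident.

```python
import random
from typing import Callable, List, Tuple

Divide = Callable[[int, int], int]


def reference_divide(a: int, b: int) -> int:
    """Hypothetical reference implementation of a toy language's '/' operator.
    It happens to round toward negative infinity (Python's floor division)."""
    return a // b


def candidate_divide(a: int, b: int) -> int:
    """Hypothetical second implementation that truncates toward zero instead."""
    quotient = abs(a) // abs(b)
    # Negate only when the operands have opposite signs (for a == 0 this yields 0 anyway).
    return quotient if (a >= 0) == (b > 0) else -quotient


def find_disagreements(reference: Divide, candidate: Divide,
                       trials: int = 1000, seed: int = 0) -> List[Tuple[int, int, int, int]]:
    """Differential test on random inputs: the reference's answer is correct by definition."""
    rng = random.Random(seed)
    divisors = [n for n in range(-10, 11) if n != 0]
    found = []
    for _ in range(trials):
        a = rng.randint(-50, 50)
        b = rng.choice(divisors)
        expected, actual = reference(a, b), candidate(a, b)
        if actual != expected:
            found.append((a, b, expected, actual))  # (a, b, reference, candidate)
    return found


mismatches = find_disagreements(reference_divide, candidate_divide)
# Every mismatch is a negative, inexact quotient, e.g. -7 / 2: the reference gives -4, the candidate -3.
```

With a natural language specification, such a mismatch would raise the question of which rounding rule the language intends; under the reference-implementation approach that question does not arise, which is both the strength and the weakness described above.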


Test suite

Defining the semantics of a programming language in terms of a test suite involves writing a number of example programs in the language, and then describing how those programs ought to behave, perhaps by writing down their correct outputs. The programs, plus their outputs, are called the "test suite" of the language. Any correct language implementation must then produce exactly the correct outputs on the test suite programs.
The chief advantage of this approach to semantic description is that it is easy to determine whether a language implementation passes a test suite. The user can simply execute all the programs in the test suite, and compare the outputs to the desired outputs.
However, when used by itself, the test suite approach has major drawbacks as well. For example, users want to run their own programs, which are not part of the test suite; indeed, a language implementation that could ''only'' run the programs in its test suite would be largely useless. But a test suite does not, by itself, describe how the language implementation should behave on any program not in the test suite; determining that behavior requires some extrapolation on the implementor's part, and different implementors may disagree. In addition, it is difficult to use a test suite to test behavior that is intended or allowed to be nondeterministic. Therefore, in common practice, test suites are used only in combination with one of the other language specification techniques, such as a natural language description or a reference implementation.
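Checking conformance against such a suite is mechanical, as the following sketch shows. It assumes a hypothetical postfix (RPN) calculator language; `SUITE`, `run_rpn` and `check_conformance` are illustrative names, and the `swap_subtraction` flag deliberately injects a bug. The last call illustrates the extrapolation problem: a suite that happens to omit subtraction cannot tell the buggy implementation from the correct one.

```python
from functools import partial
from typing import Callable, List, Tuple

# A test suite for a hypothetical RPN calculator language: each entry pairs a
# program with the output a conforming implementation must produce.
TestSuite = List[Tuple[str, str]]

SUITE: TestSuite = [
    ("2 3 +", "5"),
    ("2 3 4 * +", "14"),
    ("10 2 -", "8"),
]


def run_rpn(program: str, swap_subtraction: bool = False) -> str:
    """Toy implementation of the RPN language; swap_subtraction injects a bug."""
    stack: List[int] = []
    for word in program.split():
        if word in ("+", "-", "*"):
            right = stack.pop()
            left = stack.pop()
            if word == "+":
                stack.append(left + right)
            elif word == "*":
                stack.append(left * right)
            elif swap_subtraction:
                stack.append(right - left)  # deliberate bug: operands reversed
            else:
                stack.append(left - right)
        else:
            stack.append(int(word))
    return str(stack.pop())


def check_conformance(implementation: Callable[[str], str], suite: TestSuite) -> List[str]:
    """Run every program in the suite and describe each one whose output differs."""
    failures = []
    for program, expected in suite:
        actual = implementation(program)
        if actual != expected:
            failures.append(f"{program!r}: expected {expected}, got {actual}")
    return failures


buggy = partial(run_rpn, swap_subtraction=True)
print(check_conformance(run_rpn, SUITE))    # []
print(check_conformance(buggy, SUITE))      # ["'10 2 -': expected 8, got -8"]
print(check_conformance(buggy, SUITE[:2]))  # [] : the bug goes unnoticed
```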


See also

* Programming language reference


External links


Language specifications

A few examples of official or draft language specifications:
* Specifications written primarily in formal mathematics:
  * The Definition of Standard ML, revised edition: a formal definition in an operational semantics style.
  * Scheme R5RS: a formal definition in a denotational semantics style.
* Specifications written primarily in natural language:
  * Algol 60 report
  * Ada 95 reference manual
  * Java language specification
* Specifications via test suite:
  * Ruby's de facto community-driven specification

