Wirth syntax notation (WSN) is a metasyntax, that is, a formal way to describe formal languages. Originally proposed by Niklaus Wirth in 1977 as an alternative to Backus–Naur form (BNF), it has several advantages over BNF: it contains an explicit iteration construct, and it avoids the use of an explicit symbol for the empty string (such as <empty> or ε).
WSN has been used in several international standards, starting with ISO 10303-21. It was also used to define the syntax of EXPRESS, the data modelling language of STEP.
WSN defined in itself
SYNTAX     = { PRODUCTION } .
PRODUCTION = IDENTIFIER "=" EXPRESSION "." .
EXPRESSION = TERM { "|" TERM } .
TERM       = FACTOR { FACTOR } .
FACTOR     = IDENTIFIER
           | LITERAL
           | "[" EXPRESSION "]"
           | "(" EXPRESSION ")"
           | "{" EXPRESSION "}" .
IDENTIFIER = letter { letter } .
LITERAL    = """" character { character } """" .
The equals sign indicates a production. The element on the left is defined to be the combination of elements on the right. A production is terminated by a full stop (period).
*Repetition is denoted by curly brackets, ''e.g.,'' {a} stands for ε | a | aa | aaa | ....
*Optionality is expressed by square brackets, ''e.g.,'' [a]b stands for ab | b.
*Parentheses serve for groupings, ''e.g.,'' (a|b)c stands for ac | bc.
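The three constructs map closely onto familiar regular-expression operators. The following Python sketch is only an illustration: it assumes the operands are the single-character terminals a, b and c, and the names EXAMPLES and accepts are made up for this example. The correspondence does not carry over to recursive productions, which regular expressions cannot express.

```python
import re

# WSN construct -> rough regular-expression counterpart.
# Assumption: operands are plain terminals, not references to other rules.
EXAMPLES = {
    "{a}":    r"(?:a)*",    # repetition: zero or more occurrences
    "[a]b":   r"(?:a)?b",   # optionality: zero or one occurrence
    "(a|b)c": r"(?:a|b)c",  # grouping of alternatives
}

def accepts(wsn, s):
    """Return True if string s belongs to the language of the WSN example."""
    return re.fullmatch(EXAMPLES[wsn], s) is not None
```

For instance, accepts("{a}", "") holds because repetition includes the empty string, while accepts("[a]b", "aab") does not, because an optional element may appear at most once.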
We take these concepts for granted today, but they were novel and even controversial in 1977. Wirth later incorporated some of the concepts (with a different syntax and notation) into extended Backus–Naur form.
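Because the self-describing grammar has only seven productions, a hand-written recursive-descent parser for it stays short. The Python sketch below is one possible reading, not a reference implementation: the class name Parser, the tuple-based tree shape, and the token pattern are all assumptions made for illustration. In particular, identifiers are allowed to contain digits, hyphens and underscores, which is looser than IDENTIFIER = letter { letter }, and a doubled quote inside a literal is read as one quote character.

```python
import re

# A token is a quoted literal, an identifier, or one punctuation symbol.
# Assumption: identifiers may also contain digits, "-" and "_".
TOKEN = re.compile(r'\s*(?:("(?:[^"]|"")+")|([A-Za-z][A-Za-z0-9_-]*)|([=.|()\[\]{}]))')

def tokenize(text):
    """Split WSN source into (kind, value) pairs."""
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if not m:
            raise SyntaxError(f"unexpected character at offset {pos}")
        lit, ident, sym = m.groups()
        if lit is not None:
            # Strip the delimiters and undouble embedded quotes.
            tokens.append(("LITERAL", lit[1:-1].replace('""', '"')))
        elif ident is not None:
            tokens.append(("IDENTIFIER", ident))
        else:
            tokens.append((sym, sym))
        pos = m.end()
    return tokens

class Parser:
    """One method per production of the WSN grammar."""

    def __init__(self, text):
        self.tokens = tokenize(text)
        self.i = 0

    def peek(self):
        return self.tokens[self.i][0] if self.i < len(self.tokens) else None

    def take(self, kind):
        if self.peek() != kind:
            raise SyntaxError(f"expected {kind!r}, found {self.peek()!r}")
        value = self.tokens[self.i][1]
        self.i += 1
        return value

    def syntax(self):        # SYNTAX = { PRODUCTION } .
        rules = {}
        while self.peek() is not None:
            name, body = self.production()
            rules[name] = body
        return rules

    def production(self):    # PRODUCTION = IDENTIFIER "=" EXPRESSION "." .
        name = self.take("IDENTIFIER")
        self.take("=")
        body = self.expression()
        self.take(".")
        return name, body

    def expression(self):    # EXPRESSION = TERM { "|" TERM } .
        terms = [self.term()]
        while self.peek() == "|":
            self.take("|")
            terms.append(self.term())
        return terms[0] if len(terms) == 1 else ("alt", terms)

    def term(self):          # TERM = FACTOR { FACTOR } .
        factors = [self.factor()]
        while self.peek() in ("IDENTIFIER", "LITERAL", "[", "(", "{"):
            factors.append(self.factor())
        return factors[0] if len(factors) == 1 else ("seq", factors)

    def factor(self):        # FACTOR = IDENTIFIER | LITERAL | "[" ... "]" | ...
        kind = self.peek()
        if kind == "IDENTIFIER":
            return ("ref", self.take(kind))
        if kind == "LITERAL":
            return ("lit", self.take(kind))
        closing = {"[": "]", "(": ")", "{": "}"}
        tag = {"[": "opt", "(": "group", "{": "rep"}
        if kind in closing:
            self.take(kind)
            inner = self.expression()
            self.take(closing[kind])
            return (tag[kind], inner)
        raise SyntaxError(f"unexpected token {kind!r}")

# WSN's own definition, used here as the input to be parsed.
WSN = '''
SYNTAX     = { PRODUCTION } .
PRODUCTION = IDENTIFIER "=" EXPRESSION "." .
EXPRESSION = TERM { "|" TERM } .
TERM       = FACTOR { FACTOR } .
FACTOR     = IDENTIFIER | LITERAL | "[" EXPRESSION "]"
           | "(" EXPRESSION ")" | "{" EXPRESSION "}" .
IDENTIFIER = letter { letter } .
LITERAL    = """" character { character } """" .
'''
rules = Parser(WSN).syntax()
```

After this runs, rules maps each production name to a nested tuple. Keeping one method per production makes the code mirror the grammar line by line, which is the usual motivation for recursive descent with a notation this small.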
Notice that letter and character are left undefined. This is because numeric characters (digits 0 through 9) may be included in both definitions or excluded from one, depending on the language being defined, ''e.g.'':
digit      = "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9" .
upper-case = "A" | "B" | … | "Y" | "Z" .
lower-case = "a" | "b" | … | "y" | "z" .
letter     = upper-case | lower-case .
If character goes on to include digit and other printable ASCII characters, then it diverges even more from letter, which one can assume does not include the digit characters or any of the special (non-alphanumeric) characters.
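How the two undefined names might be filled in can be made concrete with ordinary sets. The Python sketch below assumes that character means every printable ASCII character, including the space; that choice and the helper name is_identifier are illustrative assumptions rather than part of WSN.

```python
import string

digit = set(string.digits)
upper_case = set(string.ascii_uppercase)
lower_case = set(string.ascii_lowercase)
letter = upper_case | lower_case          # letter = upper-case | lower-case .

# Assumption: "character" covers printable ASCII, code points 0x20 to 0x7E.
character = {chr(c) for c in range(0x20, 0x7F)}

def is_identifier(s):
    """Check s against IDENTIFIER = letter { letter } ."""
    return len(s) > 0 and all(ch in letter for ch in s)
```

Under these assumptions letter is a proper subset of character, and digit overlaps with character but not with letter, so a name such as x1 is not an identifier in the strict sense of the grammar.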
Another example
The syntax of BNF can be represented with WSN as follows, based on translating
the BNF example of itself:
syntax         = rule [ syntax ] .
rule           = opt-whitespace "<" rule-name ">" opt-whitespace "::="
                 opt-whitespace expression line-end .
opt-whitespace = { " " } .
expression     = list [ "|" expression ] .
line-end       = opt-whitespace EOL | line-end line-end .
list           = term [ opt-whitespace list ] .
term           = literal | "<" rule-name ">" .
literal        = """" text """" | "'" text "'" .
This definition appears overcomplicated because the concept of "optional whitespace" must be explicitly defined in BNF, but it is implicit in WSN. Even in this example, text is left undefined, but it is assumed to mean "ASCII-character { ASCII-character }". (EOL is also left undefined.) Notice how the kludge "<" rule-name ">" has been used twice because text was not explicitly defined.
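Since none of the productions for a single BNF rule needs true recursion, one rule line can be checked with a regular expression assembled production by production. The Python sketch below is a reading of the grammar under stated assumptions, not a validated BNF checker: rule-name is assumed to be a letter followed by letters, digits or hyphens, text is assumed to be one or more characters other than the enclosing quote, and the line is assumed to arrive with its EOL already removed. The names NAME, WS, LITERAL, TERM, LIST, EXPRESSION, RULE and is_bnf_rule are hypothetical.

```python
import re

NAME = r"[A-Za-z][A-Za-z0-9-]*"        # assumed shape of rule-name
WS = r" *"                             # opt-whitespace = { " " } .
LITERAL = r"""(?:"[^"]+"|'[^']+')"""   # literal = """" text """" | "'" text "'" .
TERM = rf"(?:{LITERAL}|<{NAME}>)"      # term = literal | "<" rule-name ">" .
LIST = rf"{TERM}(?:{WS}{TERM})*"       # list = term [ opt-whitespace list ] .
EXPRESSION = rf"{LIST}(?:\|{LIST})*"   # expression = list [ "|" expression ] .

# rule = opt-whitespace "<" rule-name ">" opt-whitespace "::="
#        opt-whitespace expression line-end .
# Only the optional trailing spaces of line-end are matched here.
RULE = re.compile(rf"{WS}<{NAME}>{WS}::={WS}{EXPRESSION}{WS}")

def is_bnf_rule(line):
    """Return True if line is a single BNF rule under the assumptions above."""
    return RULE.fullmatch(line) is not None
```

Read this way, the grammar allows spaces between the terms of a list but not around the "|" separator, so is_bnf_rule('<digit> ::= "0"|"1"') holds while is_bnf_rule('<a> ::= <b> | <c>') does not.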
One problem with BNF that this example illustrates is that allowing both single-quote and double-quote characters to delimit a literal adds potential for human error when creating a machine-readable syntax. One of the concepts that migrated to later metasyntaxes was the idea that giving the user multiple choices made it harder to write parsers for grammars defined by the syntax, so computer languages in general have become more restrictive in how a ''quoted-literal'' is defined.