An affix grammar is a two-level
grammar formalism used to describe the
syntax
In linguistics, syntax ( ) is the study of how words and morphemes combine to form larger units such as phrases and sentences. Central concerns of syntax include word order, grammatical relations, hierarchical sentence structure (constituenc ...
of languages, mainly
computer language
A computer language is a formal language used to communicate with a computer. Types of computer languages include:
* Software construction#Construction languages, Construction language – all forms of communication by which a human can Comput ...
s, using an approach based on how natural language is typically described.
The formalism was invented in 1962 by
Lambert Meertens while developing a grammar for generating English sentences. Meertens also applied affix grammars to the description and composition of music, and obtained a special prize from the jury at the 1968
International Federation for Information Processing
The International Federation for Information Processing (IFIP) is a global organisation for researchers and professionals working in the field of computing to conduct research, develop standards and promote information sharing.
Established in 19 ...
(IFIP) Congress in
Edinburgh
Edinburgh is the capital city of Scotland and one of its 32 Council areas of Scotland, council areas. The city is located in southeast Scotland and is bounded to the north by the Firth of Forth and to the south by the Pentland Hills. Edinburgh ...
for his computer-generated
string quartet
The term string quartet refers to either a type of musical composition or a group of four people who play them. Many composers from the mid-18th century onwards wrote string quartets. The associated musical ensemble consists of two Violin, violini ...
, Quartet No. 1 in C major for 2 violins, viola and violoncello, based on the first non-
context-free affix grammar.
[.][Quartet No. 1 in C major for 2 violins, viola and violoncello](_blank)
Score and links to mp3 sound files of a performance by the Amsterdam String Quartet (1968). The string quartet was published in 1968, as ''Mathematical Centre Report MR 96''.
[
]
The grammatical rules of an affix grammar are those of a
context-free grammar
In formal language theory, a context-free grammar (CFG) is a formal grammar whose production rules
can be applied to a nonterminal symbol regardless of its context.
In particular, in a context-free grammar, each production rule is of the fo ...
, except that certain parts in the nonterminals (the
affix
In linguistics, an affix is a morpheme that is attached to a word stem to form a new word or word form. The main two categories are Morphological derivation, derivational and inflectional affixes. Derivational affixes, such as ''un-'', ''-ation' ...
es) are used as arguments. If the same affix occurs multiple times in a rule, its value must
agree, i.e. it must be the same everywhere. In some types of affix grammar, more complex relationships between affix values are possible.
Example
We can describe an extremely simple fragment of English in the following manner:
: ''Sentence'' → ''Subject'' ''Predicate''
: ''Subject'' → ''Noun''
: ''Predicate'' → ''Verb'' ''Object''
: ''Object'' → ''Noun''
:
: ''Noun'' → John
: ''Noun'' → Mary
: ''Noun'' → children
: ''Noun'' → parents
:
: ''Verb'' → like
: ''Verb'' → likes
: ''Verb'' → help
: ''Verb'' → helps
This
context-free grammar
In formal language theory, a context-free grammar (CFG) is a formal grammar whose production rules
can be applied to a nonterminal symbol regardless of its context.
In particular, in a context-free grammar, each production rule is of the fo ...
describes simple sentences such as
: John likes children
: Mary helps John
: children help parents
: parents like John
With more nouns and verbs, and more rules to introduce other parts of speech, a large range of English sentences can be described; so this is a promising approach for describing the syntax of English.
However, the given grammar also describes sentences such as
: John like children
: children helps parents
These sentences are wrong: in English, subject and verb have a
grammatical number
In linguistics, grammatical number is a Feature (linguistics), feature of nouns, pronouns, adjectives and verb agreement (linguistics), agreement that expresses count distinctions (such as "one", "two" or "three or more"). English and many other ...
, which must agree.
An affix grammar can express this directly:
: ''Sentence'' → ''Subject'' + ''number Predicate'' + ''number''
: ''Subject'' + ''number'' → ''Noun'' + ''number''
: ''Predicate'' + ''number'' → ''Verb'' + ''number Object''
: ''Object'' → ''Noun'' + ''number''
: ''Noun'' + ''singular'' → John
: ''Noun'' + ''singular'' → Mary
: ''Noun'' + ''plural'' → children
: ''Noun'' + ''plural'' → parents
:
: ''Verb'' + ''singular'' → likes
: ''Verb'' + ''plural'' → like
: ''Verb'' + ''singular'' → helps
: ''Verb'' + ''plural'' → help
This grammar only describes correct English sentences, although it could be argued that
: John likes John
is still incorrect and should instead read
: John likes himself
This, too, can be incorporated using affixes, if the means of describing the relationships between different affix values are powerful enough. As remarked above, these means depend on the type of affix grammar chosen.
Types
In the simplest type of affix grammar, affixes can only take values from a finite domain, and affix values can only be related through agreement, as in the example.
Applied in this way, affixes increase compactness of grammars, but do not add expressive power.
Another approach is to allow affixes to take arbitrary strings as values and allow concatenations of affixes to be used in rules. The ranges of allowable values for affixes can be described with context-free grammar rules. This produces the formalism of
two-level grammars, also known as ''
Van Wijngaarden grammars'' or ''2VW'' grammars. These have been successfully used to describe complicated languages, in particular, the syntax of the
Algol 68
ALGOL 68 (short for ''Algorithmic Language 1968'') is an imperative programming language member of the ALGOL family that was conceived as a successor to the ALGOL 60 language, designed with the goal of a much wider scope of application and ...
programming language
A programming language is a system of notation for writing computer programs.
Programming languages are described in terms of their Syntax (programming languages), syntax (form) and semantics (computer science), semantics (meaning), usually def ...
. However, it turns out that, even though affix values can only be manipulated with string concatenation, this formalism is
Turing complete
Alan Mathison Turing (; 23 June 1912 – 7 June 1954) was an English mathematician, computer scientist, logician, cryptanalyst, philosopher and theoretical biologist. He was highly influential in the development of theoretical comput ...
; hence, even the most basic questions about the language described by an arbitrary 2VW grammar are
undecidable in general.
Extended Affix Grammars, developed in the 1980s, are a more restricted version of the same idea. They were mainly applied to describe the grammar of natural language, e.g. English.
Another possibility is to allow the values of affixes to be computed by code written in some programming language. Two basic approaches have been used:
* In
attribute grammar
An attribute grammar is a formal way to supplement a formal grammar with semantic information processing. Semantic information is stored in attributes associated with terminal and nonterminal symbols of the grammar. The values of attributes are t ...
s, the affixes (called attributes) can take values from arbitrary domains (e.g. integer or real numbers, complex data structures) and arbitrary functions can be specified, written in a language of choice, to describe how affix values in rules are derived from each other.
* In CDL (the
Compiler Description Language) and its successor
CDL2, developed in the 1970s, fragments of source code (usually in
assembly language
In computing, assembly language (alternatively assembler language or symbolic machine code), often referred to simply as assembly and commonly abbreviated as ASM or asm, is any low-level programming language with a very strong correspondence bet ...
) can be used in rules instead of normal right-hand sides, allowing primitives for input scanning and affix value computations to be expressed directly. Designed as a basis for practical
compiler
In computing, a compiler is a computer program that Translator (computing), translates computer code written in one programming language (the ''source'' language) into another language (the ''target'' language). The name "compiler" is primaril ...
construction, this approach was used to write compilers, and other software, e.g. a
text editor
A text editor is a type of computer program that edits plain text. An example of such program is "notepad" software (e.g. Windows Notepad). Text editors are provided with operating systems and software development packages, and can be used to c ...
.
References
{{reflist
Formal languages
Compiler construction
Syntax
Grammar frameworks