A context-sensitive grammar (CSG) is a
formal grammar in which the left-hand sides and right-hand sides of any
production rules may be surrounded by a context of
terminal
Terminal may refer to:
Computing Hardware
* Terminal (electronics), a device for joining electrical circuits together
* Terminal (telecommunication), a device communicating over a line
* Computer terminal, a set of primary input and output devi ...
and
nonterminal symbols. Context-sensitive grammars are more general than
context-free grammar
In formal language theory, a context-free grammar (CFG) is a formal grammar whose production rules are of the form
:A\ \to\ \alpha
with A a ''single'' nonterminal symbol, and \alpha a string of terminals and/or nonterminals (\alpha can be em ...
s, in the sense that there are languages that can be described by CSG but not by context-free grammars. Context-sensitive grammars are less general (in the same sense) than
unrestricted grammars. Thus, CSG are positioned between context-free and unrestricted grammars in the
Chomsky hierarchy.
A
formal language
In logic, mathematics, computer science, and linguistics, a formal language consists of words whose letters are taken from an alphabet and are well-formed according to a specific set of rules.
The alphabet of a formal language consists of sym ...
that can be described by a context-sensitive grammar, or, equivalently, by a
noncontracting grammar In formal language theory, a grammar is noncontracting (or monotonic) if all of its production rules are of the form
α → β where α and β are strings of nonterminal and terminal symbols, and
the length of α is less than or equal t ...
or a
linear bounded automaton, is called a
context-sensitive language. Some textbooks actually define CSGs as non-contracting,
although this is not how
Noam Chomsky
Avram Noam Chomsky (born December 7, 1928) is an American public intellectual: a linguist, philosopher, cognitive scientist, historian, social critic, and political activist. Sometimes called "the father of modern linguistics", Chomsky i ...
defined them in 1959.
This choice of definition makes no difference in terms of the languages generated (i.e. the two definitions are
weakly equivalent), but it does make a difference in terms of what grammars are structurally considered context-sensitive; the latter issue was analyzed by Chomsky in 1963.
Chomsky introduced context-sensitive grammars as a way to describe the syntax of
natural language where it is often the case that a word may or may not be appropriate in a certain place depending on the context.
Walter Savitch has criticized the terminology "context-sensitive" as misleading and proposed "non-erasing" as better explaining the distinction between a CSG and an
unrestricted grammar.
Although it is well known that certain features of languages (e.g.
cross-serial dependency) are not context-free, it is an open question how much of CSG's expressive power is needed to capture the context sensitivity found in natural languages. Subsequent research in this area has focused on the more computationally tractable
mildly context-sensitive languages. The syntaxes of some
visual programming languages can be described by context-sensitive
graph grammars.
Formal definition
A
formal grammar ''G'' = (''N'', Σ, ''P'', ''S''), with ''N'' a set of nonterminal symbols, Σ a set of terminal symbols, ''P'' a set of production rules, and ''S'' the
start symbol, is context-sensitive if all rules in ''P'' are of the form
: α''A''β → αγβ
with ''A'' ∈ ''N'',
[i.e., ''A'' a single nonterminal] α,β ∈ (''N''∪Σ)
* [i.e., α and β strings of nonterminals and terminals] and γ ∈ (''N''∪Σ)
+.
[i.e., γ is a nonempty string of nonterminals and terminals]
A string ''u'' ∈ (''N''∪Σ)
* directly yields, or directly derives to, a string ''v'' ∈ (''N''∪Σ)
*, denoted as ''u'' ⇒ ''v'', if ''u'' can be written as ''l''α''A''β''r'', and ''v'' can be written as ''l''αγβ''r'', for some production rule (α''A''β→αγβ) ∈ ''P'', and some context strings ''l'', ''r'' ∈ (''N''∪Σ)
*.
More generally, ''u'' is said to yield, or derive to, ''v'', denoted as ''u'' ⇒
* ''v'', if ''u'' = ''u''
1 ⇒ ... ⇒ ''u''
''n'' = ''v'' for some ''n''≥0 and some strings ''u''
2, ..., ''u''
''n''-1 (''N''∪Σ)
*. That is, the relation (⇒
*) is the
reflexive transitive closure of the relation (⇒).
The language of the grammar ''G'' is the set of all terminal symbol strings derivable from its start symbol, formally: ''L''(''G'') = .
Derivations that do not end in a string composed of terminal symbols only are possible, but don't contribute to ''L''(''G'').
The only difference between this definition of Chomsky and that of
unrestricted grammars is that γ can be empty in the unrestricted case.
Some definitions of a context-sensitive grammar only require that for any production rule of the form u → v, the length of u shall be less than or equal to the length of v. This seemingly weaker requirement is in fact
weakly equivalent, see
Noncontracting grammar#Transforming into context-sensitive grammar.
In addition, a rule of the form
: ''S'' → λ
where λ represents the
empty string
In formal language theory, the empty string, or empty word, is the unique string of length zero.
Formal theory
Formally, a string is a finite, ordered sequence of characters such as letters, digits or spaces. The empty string is the special c ...
and ''S'' does not appear on the right-hand side of any rule is permitted. The addition of the empty string allows the statement that the context sensitive languages are a proper superset of the context-free languages, rather than having to make the weaker statement that all context-free grammars with no →λ productions are also context sensitive grammars.
The name ''context-sensitive'' is explained by the α and β that form the context of ''A'' and determine whether ''A'' can be replaced with γ or not. This is different from a
context-free grammar
In formal language theory, a context-free grammar (CFG) is a formal grammar whose production rules are of the form
:A\ \to\ \alpha
with A a ''single'' nonterminal symbol, and \alpha a string of terminals and/or nonterminals (\alpha can be em ...
where the context of a nonterminal is not taken into consideration. Indeed, every production of a context-free grammar is of the form ''V'' → ''w'' where ''V'' is a ''single'' nonterminal symbol, and ''w'' is a string of terminals and/or nonterminals; ''w'' can be empty.
If the possibility of adding the empty string to a language is added to the strings recognized by the noncontracting grammars (which can never include the empty string) then the languages in these two definitions are identical.
The left-context- and right-context-sensitive grammars are defined by restricting the rules to just the form α''A'' → αγ and to just ''A''β → γβ, respectively. The languages generated by these grammars are also the full class of context-sensitive languages.
[ also at https://www.encyclopediaofmath.org/index.php/Grammar,_context-sensitive] The equivalence was established by
Penttonen normal form In formal language theory, a context-sensitive grammar is in Kuroda normal form if all production rules are of the form:
:''AB'' → ''CD'' or
:''A'' → ''BC'' or
:''A'' → ''B'' or
:''A'' → ''a''
where A, B, C and D are nonterminal symbols and ...
.
[ citing ]
Examples
''a''''n''''b''''n''''c''''n''
The following context-sensitive grammar, with start symbol ''S'', generates the canonical non-
context-free language :
Rules 1 and 2 allow for blowing-up ''S'' to ''a''
''n''''BC''(''BC'')
''n''-1; rules 3 to 6 allow for successively exchanging each ''CB'' to ''BC'' (
four rules are needed for that since a rule ''CB'' → ''BC'' wouldn't fit into the scheme α''A''β → αγβ); rules 7–10 allow replacing a non-terminals ''B'' and ''C'' with its corresponding terminals ''b'' and ''c'' respectively, provided it is in the right place.
A generation chain for ' is:
: ''S''
: →
2
: →
2
: →
1
: →
3
: →
4
: →
5
: →
6
: →
3
: →
4
: →
5
: →
6
: →
3
: →
4
: →
5
: →
6
: →
7
: →
8
: →
8
: →
9
: →
10
: →
10
:
''a''''n''''b''''n''''c''''n''''d''''n'', etc.
More complicated grammars
can be used to parse , and other languages with even more letters. Here we show a simpler approach using non-contracting grammars:
Start with a kernel of regular productions generating the sentential forms
and then include the non contracting productions
,
,
,
,
,
,
,
,
,
.
''a''''m''''b''''n''''c''''m''''d''''n''
A non contracting grammar (for which there is an equivalent
) for the language
is defined by
and
,
,
,
,
,
,
.
With these definitions, a derivation for
is:
.
''a''2i
A noncontracting grammar for the language is constructed in Example 9.5 (p. 224) of (Hopcroft, Ullman, 1979):
#