HOME

TheInfoList



OR:

In
formal language theory In logic, mathematics, computer science, and linguistics, a formal language consists of words whose letters are taken from an alphabet and are well-formed according to a specific set of rules. The alphabet of a formal language consists of symb ...
, an alphabet is a non-empty set of
symbols A symbol is a mark, sign, or word that indicates, signifies, or is understood as representing an idea, object, or relationship. Symbols allow people to go beyond what is known or seen by creating linkages between otherwise very different co ...
/
glyph A glyph () is any kind of purposeful mark. In typography, a glyph is "the specific shape, design, or representation of a character". It is a particular graphical representation, in a particular typeface, of an element of written language. A g ...
s, typically thought of as representing letters, characters, or digits but among other possibilities the "symbols" could also be a set of
phoneme In phonology and linguistics, a phoneme () is a unit of sound that can distinguish one word from another in a particular language. For example, in most dialects of English, with the notable exception of the West Midlands and the north-wes ...
s (sound units). Alphabets in this technical sense of a set are used in a diverse range of fields including logic, mathematics, computer science, and linguistics. An alphabet may have any
cardinality In mathematics, the cardinality of a set is a measure of the number of elements of the set. For example, the set A = \ contains 3 elements, and therefore A has a cardinality of 3. Beginning in the late 19th century, this concept was generalized ...
("size") and depending on its purpose maybe be finite (e.g., the alphabet of letters "a" through "z"),
countable In mathematics, a set is countable if either it is finite or it can be made in one to one correspondence with the set of natural numbers. Equivalently, a set is ''countable'' if there exists an injective function from it into the natural numbers ...
(e.g., \), or even uncountable (e.g., \). Strings, also known as "words", over an alphabet are defined as a
sequence In mathematics, a sequence is an enumerated collection of objects in which repetitions are allowed and order matters. Like a set, it contains members (also called ''elements'', or ''terms''). The number of elements (possibly infinite) is called ...
of the symbols from the alphabet set. For example, the alphabet of lowercase letters "a" through "z" can be used to form English words like "iceberg" while the alphabet of both upper and lower case letters can also be used to form proper names like "Wikipedia". A common alphabet is , the binary alphabet, and a "00101111" is an example of a binary string. Infinite
sequence In mathematics, a sequence is an enumerated collection of objects in which repetitions are allowed and order matters. Like a set, it contains members (also called ''elements'', or ''terms''). The number of elements (possibly infinite) is called ...
of symbols may be considered as well (see Omega language). It is often necessary for practical purposes to restrict the symbols in an alphabet so that they are unambiguous when interpreted. For instance, if the two-member alphabet is , a string written on paper as "000" is ambiguous because it is unclear if it is a sequence of three "0" symbols, a "00" followed by a "0", or a "0" followed by a "00".


Notation

If ''L'' is a formal language, i.e. a (possibly infinite) set of finite-length strings, the alphabet of ''L'' is the set of all symbols that may occur in any string in ''L''. For example, if ''L'' is the set of all variable identifiers in the programming language C, ''L''’s alphabet is the set . Given an alphabet \Sigma, the set of all strings of length n over the alphabet \Sigma is indicated by \Sigma^n. The set \bigcup_ \Sigma^i of all finite strings (regardless of their length) is indicated by the Kleene star operator as \Sigma^*, and is also called the Kleene closure of \Sigma. The notation \Sigma^\omega indicates the set of all infinite sequences over the alphabet \Sigma, and \Sigma^\infty indicates the set \Sigma^\ast \cup \Sigma^\omega of all finite or infinite sequences. For example, using the binary alphabet , the strings ε, 0, 1, 00, 01, 10, 11, 000, etc. are all in the Kleene closure of the alphabet (where ε represents the
empty string In formal language theory, the empty string, or empty word, is the unique string of length zero. Formal theory Formally, a string is a finite, ordered sequence of characters such as letters, digits or spaces. The empty string is the special c ...
).


Applications

Alphabets are important in the use of
formal languages In logic, mathematics, computer science, and linguistics, a formal language consists of words whose letters are taken from an alphabet and are well-formed according to a specific set of rules. The alphabet of a formal language consists of sy ...
,
automata An automaton (; plural: automata or automatons) is a relatively self-operating machine, or control mechanism designed to automatically follow a sequence of operations, or respond to predetermined instructions.Automaton – Definition and More ...
and
semiautomata In mathematics and theoretical computer science, a semiautomaton is a deterministic finite automaton having inputs but no output. It consists of a set ''Q'' of states, a set Σ called the input alphabet, and a function ''T'': ''Q'' × Σ → ''Q' ...
. In most cases, for defining instances of automata, such as deterministic finite automata (DFAs), it is required to specify an alphabet from which the input strings for the automaton are built. In these applications, an alphabet is usually required to be a
finite set In mathematics, particularly set theory, a finite set is a set that has a finite number of elements. Informally, a finite set is a set which one could in principle count and finish counting. For example, :\ is a finite set with five elements. T ...
, but is not otherwise restricted. When using automata,
regular expression A regular expression (shortened as regex or regexp; sometimes referred to as rational expression) is a sequence of characters that specifies a search pattern in text. Usually such patterns are used by string-searching algorithms for "find" ...
s, or
formal grammar In formal language theory, a grammar (when the context is not given, often called a formal grammar for clarity) describes how to form strings from a language's alphabet that are valid according to the language's syntax. A grammar does not describe ...
s as part of string-processing
algorithm In mathematics and computer science, an algorithm () is a finite sequence of rigorous instructions, typically used to solve a class of specific problems or to perform a computation. Algorithms are used as specifications for performing ...
s, the alphabet may be assumed to be the
character set Character encoding is the process of assigning numbers to graphical characters, especially the written characters of human language, allowing them to be stored, transmitted, and transformed using digital computers. The numerical values tha ...
of the text to be processed by these algorithms, or a subset of allowable characters from the character set.


See also

* Combinatorics on words


References


Literature

* John E. Hopcroft and Jeffrey D. Ullman, '' Introduction to Automata Theory, Languages, and Computation'', Addison-Wesley Publishing, Reading Massachusetts, 1979. . {{DEFAULTSORT:Alphabet (formal languages) Formal languages Combinatorics on words