The Speech Assessment Methods Phonetic Alphabet (SAMPA) is a computer-readable phonetic script using 7-bit printable
ASCII
characters, based on the
International Phonetic Alphabet
(IPA). It was originally developed in the late 1980s for six European languages by the
EEC ESPRIT information technology research and development program. As many symbols as possible have been taken over from the IPA; where this is not possible, other available ASCII characters are used, e.g.
<code>@</code> for
schwa (IPA [ə]),
<code>2</code> for the vowel sound found in
French (IPA [ø]), and
<code>9</code> for the vowel sound found in French (IPA [œ]).
The characters
<code>"s{mp@</code> represent the pronunciation of the name SAMPA in English, with the initial symbol
<code>"</code> indicating primary stress (in IPA, ˈ). Like IPA, SAMPA is usually enclosed in
square brackets or
slashes, which are not part of the alphabet proper and merely signify that it is phonetic as opposed to regular text.
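The correspondences described above can be sketched as a simple character lookup. This is a minimal illustration only: the table is a small subset covering the symbols mentioned in this article, the names `SAMPA_TO_IPA`, `sampa_to_ipa` and `is_printable_ascii` are hypothetical, and the one-character-at-a-time mapping is a simplifying assumption rather than a full SAMPA parser.

```python
# Illustrative subset of SAMPA-to-IPA correspondences (not a complete table).
SAMPA_TO_IPA = {
    '"': "\u02c8",  # primary stress mark
    "{": "\u00e6",  # ae ligature vowel, as in English "SAMPA"
    "@": "\u0259",  # schwa
    "2": "\u00f8",  # o with stroke (French vowel)
    "9": "\u0153",  # oe ligature (French vowel)
}


def sampa_to_ipa(text: str) -> str:
    """Replace each known SAMPA character with its IPA counterpart.

    Characters missing from the subset (such as s, m, p, which are the
    same in both notations) are passed through unchanged.
    """
    return "".join(SAMPA_TO_IPA.get(ch, ch) for ch in text)


def is_printable_ascii(text: str) -> bool:
    """Return True if every character is a 7-bit printable ASCII character."""
    return all(0x20 <= ord(ch) <= 0x7E for ch in text)


if __name__ == "__main__":
    word = '"s{mp@'
    print(word, "->", sampa_to_ipa(word))
    print("SAMPA form is printable ASCII:", is_printable_ascii(word))
    print("IPA form is printable ASCII:", is_printable_ascii(sampa_to_ipa(word)))
```

The SAMPA string passes the ASCII check while its IPA rendering does not, which is the property the notation was designed around.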
Languages
Official SAMPA tables have been developed to cover all the sounds of the following languages:
* Arabic
* Bulgarian
* Cantonese
* Czech
* Danish
* Dutch
* English
* Estonian
* French
* German
* Greek
* Hebrew
* Hungarian
* Italian
* Norwegian
* Polish
* Portuguese
* Romanian
* Russian
* Scots
* Serbo-Croatian
* Slovak
* Slovenian
* Spanish
* Swedish
* Thai
* Turkish
Features
SAMPA was developed in the late 1980s in the European Commission-funded ESPRIT project 2589 "Speech Assessment Methods" (SAM), hence "SAM Phonetic Alphabet", in order to facilitate email data exchange and computational processing of transcriptions in phonetics and speech technology.
SAMPA is a partial
encoding of the
IPA. The first version of SAMPA was the union of the sets of phoneme codes for Danish, Dutch, English, French, German and Italian; later versions extended SAMPA to cover other European languages. Since SAMPA is based on phoneme inventories, each SAMPA table is valid only in the language it was created for. In order to make this
IPA encoding technique universally applicable,
X-SAMPA was created, which provides ''one single table'' without language-specific differences.
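The language-specific nature of SAMPA can be illustrated with a sketch in which each language has its own lookup table. The tables below are hypothetical and heavily abbreviated (a handful of symbols per language, chosen for illustration); the names `TABLES` and `lookup` are assumptions, not part of any SAMPA specification.

```python
# Hypothetical, heavily abbreviated per-language SAMPA tables.
# Real tables cover the full phoneme inventory of each language.
TABLES = {
    "en": {"{": "\u00e6", "@": "\u0259", "T": "\u03b8", "D": "\u00f0", "S": "\u0283"},
    "fr": {"2": "\u00f8", "9": "\u0153", "@": "\u0259", "S": "\u0283", "R": "\u0281"},
}


def lookup(language: str, symbol: str) -> str:
    """Resolve a SAMPA symbol against the table of one specific language."""
    table = TABLES[language]
    if symbol not in table:
        # A symbol is only meaningful relative to the table it belongs to.
        raise KeyError(f"{symbol!r} is not defined in the {language!r} table")
    return table[symbol]


if __name__ == "__main__":
    print(lookup("fr", "2"))  # defined in the French table
    try:
        lookup("en", "2")  # not part of the English table
    except KeyError as err:
        print("lookup failed:", err)
```

A single language-independent table, as in X-SAMPA, removes the need for the `language` argument: every symbol resolves the same way regardless of the language being transcribed.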
SAMPA was devised as a
hack to work around the inability of
text encodings to represent IPA symbols. Consequently, as
Unicode
support for IPA symbols becomes more widespread, the necessity for a separate, computer-readable system for representing the IPA in ASCII decreases. However, text input relies on specific keyboard encodings or input devices. For this reason, SAMPA and X-SAMPA are still widely used
in computational phonetics and in speech technology.
See also
* Comparison of ASCII encodings of the International Phonetic Alphabet
* SAMPA chart
* SAMPA chart for English, a concise version
* X-SAMPA, a language-independent notation similar to SAMPA, but covering the entire IPA repertoire
* BABEL Speech Corpus
External links
SAMPA computer readable phonetic alphabet
from (German) written text to SAMPA and IPA (Ajax-application)