Speech Assessment Methods Phonetic Alphabet

The Speech Assessment Methods Phonetic Alphabet (SAMPA) is a computer-readable phonetic script using 7-bit printable ASCII characters, based on the International Phonetic Alphabet (IPA). It was originally developed in the late 1980s for six European languages by the EEC ESPRIT information technology research and development program. As many symbols as possible have been taken over from the IPA; where this is not possible, other available signs are used, e.g. @ for schwa (IPA ə), 2 for the vowel sound found in French deux (IPA ø), and 9 for the vowel sound found in French neuf (IPA œ). The characters "s{mp@ represent the pronunciation of the name SAMPA in English, with the initial symbol (") indicating primary stress (in IPA, ˈsæmpə). Like IPA, SAMPA is usually enclosed in square brackets or slashes, which are not part of the alphabet proper and merely signify that the text is phonetic rather than regular text.
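The symbol substitutions described above can be sketched as a small lookup table. This is a minimal illustration, not an official SAMPA tool: the names `SAMPA_TO_IPA` and `sampa_to_ipa` are hypothetical, the table covers only the five symbols mentioned in the text, and a per-character lookup is a simplification, since full SAMPA tables are language-specific and may need more careful handling.

```python
# Illustrative subset only: the symbols discussed in the text above.
# A real converter would load a complete, language-specific SAMPA table.
SAMPA_TO_IPA = {
    '"': "ˈ",  # primary stress mark
    "{": "æ",  # vowel in the first syllable of English "SAMPA"
    "@": "ə",  # schwa
    "2": "ø",  # vowel in French "deux"
    "9": "œ",  # vowel in French "neuf"
}


def sampa_to_ipa(text: str) -> str:
    """Map each known SAMPA character to IPA; pass all others through unchanged."""
    return "".join(SAMPA_TO_IPA.get(ch, ch) for ch in text)


print(sampa_to_ipa('"s{mp@'))  # ˈsæmpə
```

Letters such as s, m and p pass through unchanged because SAMPA takes them over directly from the IPA.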


Languages

Officially, SAMPA has been developed for all the sounds of the following languages:

* Arabic
* Bulgarian
* Cantonese
* Czech
* Danish
* Dutch
* English
* Estonian
* French
* German
* Greek
* Hebrew
* Hungarian
* Italian
* Norwegian
* Polish
* Portuguese
* Romanian
* Russian
* Scots
* Serbo-Croatian
* Slovak
* Slovenian
* Spanish
* Swedish
* Thai
* Turkish


Features

SAMPA was developed in the late 1980s in the European Commission-funded ESPRIT project 2589, "Speech Assessment Methods" (SAM), hence "SAM Phonetic Alphabet", in order to facilitate email data exchange and computational processing of transcriptions in phonetics and speech technology.

SAMPA is a partial encoding of the IPA. The first version of SAMPA was the union of the sets of phoneme codes for Danish, Dutch, English, French, German and Italian; later versions extended SAMPA to cover other European languages. Since SAMPA is based on phoneme inventories, each SAMPA table is valid only in the language it was created for. To make this IPA encoding technique universally applicable, X-SAMPA was created, which provides one single table without language-specific differences.

SAMPA was devised as a hack to work around the inability of text encodings to represent IPA symbols. Consequently, as Unicode support for IPA symbols becomes more widespread, the need for a separate, computer-readable system for representing the IPA in ASCII decreases. However, entering IPA symbols as text still relies on specific keyboard encodings or input devices. For this reason, SAMPA and X-SAMPA are still widely used in computational phonetics and in speech technology.
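The encoding constraint that motivated SAMPA can be checked directly. The sketch below assumes the SAMPA and IPA spellings of the name "SAMPA" given earlier in the article; the helper name `is_printable_ascii` is hypothetical and introduced only for illustration.

```python
def is_printable_ascii(text: str) -> bool:
    """True if every character lies in the 7-bit printable range 0x20-0x7E."""
    return all(0x20 <= ord(ch) <= 0x7E for ch in text)


sampa = '"s{mp@'   # SAMPA transcription of the name "SAMPA"
ipa = "ˈsæmpə"     # the same transcription in IPA

print(is_printable_ascii(sampa))   # True
print(is_printable_ascii(ipa))     # False
print(len(sampa.encode("ascii")))  # 6
print(len(ipa.encode("utf-8")))    # 9
```

The SAMPA form survives any ASCII-only channel, such as the early email systems mentioned above, while the IPA form needs a Unicode encoding such as UTF-8.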


See also

* Comparison of ASCII encodings of the International Phonetic Alphabet
* SAMPA chart
* SAMPA chart for English, a concise version
* X-SAMPA, a language-independent notation similar to SAMPA, but covering the entire IPA repertoire
* BABEL Speech Corpus




External links


SAMPA computer readable phonetic alphabet






from (German) written text to SAMPA and IPA (Ajax-application)
