Data Coding Scheme is a one-octet field in Short Messages (SM) and Cell Broadcast Messages (CB) which carries basic information about how the recipient handset should process the received message. The information includes:
* the character set or message coding, which determines the encoding of the message user data
* the message class, which determines to which component of the Mobile Station (MS) or User Equipment (UE) the message should be delivered
* the request to automatically delete the message after reading
* the state of flags indicating presence of unread voicemail, fax, e-mail or other messages
* the indication that the message content is compressed
* the language of the cell broadcast message
The field is described in 3GPP 23.040 and 3GPP 23.038 under the name TP-DCS.
Message Character Sets
A special 7-bit encoding called the ''GSM 7-bit default alphabet'' was designed for the Short Message System in GSM. The alphabet contains the most often used symbols from most Western European languages (and some Greek uppercase letters). Some ASCII characters and the Euro sign did not fit into the GSM 7-bit default alphabet and must be encoded using two septets. These characters form the GSM 7-bit default alphabet ''extension table''. Support of the GSM 7-bit alphabet is mandatory for GSM handsets and network elements.
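The two-septet cost of extension table characters can be illustrated with a short Python sketch. The names `GSM7_BASIC`, `GSM7_EXTENSION` and `septet_count` are hypothetical, and the character tables are transcribed here as an assumption from 3GPP TS 23.038 rather than taken from this article.

```python
# GSM 7-bit default alphabet in code order 0x00..0x7F (assumed transcription
# of the table in 3GPP TS 23.038; 0x1B is the escape to the extension table).
GSM7_BASIC = (
    "@£$¥èéùìòÇ\nØø\rÅåΔ_ΦΓΛΩΠΨΣΘΞ\x1bÆæßÉ"
    " !\"#¤%&'()*+,-./0123456789:;<=>?"
    "¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑÜ§"
    "¿abcdefghijklmnopqrstuvwxyzäöñüà"
)

# Extension table characters: each is sent as the escape septet plus one more.
GSM7_EXTENSION = "\f^{}\\[~]|€"


def septet_count(text: str) -> int:
    """Return how many septets `text` needs in the GSM 7-bit default alphabet.

    Raises ValueError if a character is in neither table, in which case
    another encoding would have to be used.
    """
    total = 0
    for ch in text:
        if ch in GSM7_EXTENSION:
            total += 2  # escape septet + extension table code
        elif ch in GSM7_BASIC:
            total += 1
        else:
            raise ValueError(f"{ch!r} is not in the GSM 7-bit default alphabet")
    return total
```

For example, under these assumptions the text `10 €` occupies five septets: three for `1`, `0` and the space, and two for the Euro sign.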
Languages which use Latin script but need characters not present in the GSM 7-bit default alphabet often replace characters bearing diacritic marks with the corresponding characters without diacritics. This gives a not entirely satisfactory user experience, but is often accepted. To include the missing characters, the 16-bit UTF-16 (in GSM called UCS-2) encoding may be used, at the price of reducing the length of a (non-segmented) message from 160 to 70 characters.
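The drop from 160 to 70 characters follows from simple arithmetic, sketched below in Python. The sketch assumes that a single short message carries at most 140 octets of user data; that figure and the names `MAX_USER_DATA_OCTETS` and `max_characters` are not given in this article and are used for illustration only.

```python
# Assumption: the user data of one short message is limited to 140 octets.
MAX_USER_DATA_OCTETS = 140


def max_characters(bits_per_char: int) -> int:
    """Whole characters of the given width that fit into one message."""
    return MAX_USER_DATA_OCTETS * 8 // bits_per_char


gsm7_limit = max_characters(7)    # 7-bit characters packed into octets
ucs2_limit = max_characters(16)   # two octets per UCS-2 character
```

With these assumptions `gsm7_limit` is 160 and `ucs2_limit` is 70, matching the figures above.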
Messages in Chinese, Korean or Japanese must be encoded using the UTF-16 character encoding. The same was also true for other languages using non-Latin scripts, such as Russian, Arabic, Hebrew and various Indian languages. 3GPP TS 23.038 version 8.0.0, published in 2008, introduced a new feature, an extended National language shift table, which in version 11.0.0, published in 2012, covers the Turkish, Spanish, Portuguese, Bengali, Gujarati, Hindi, Kannada, Malayalam, Oriya, Punjabi, Tamil, Telugu and Urdu languages. The mechanism replaces the GSM 7-bit default alphabet code table and/or extension table with national table(s) according to special information elements in the User Data Header. A non-segmented message using national language shift table(s) may carry up to 155 (or 153) 7-bit characters.
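How a User Data Header reduces the number of available 7-bit characters can be sketched as follows. The sketch assumes a 140-octet user data field (160 septets), a header padded up to a septet boundary, and a national language shift element that occupies three octets (identifier, length, language code) after the one-octet header length. None of these sizes are stated in this article, and `max_septets_with_header` is a hypothetical name.

```python
def max_septets_with_header(udh_octets: int) -> int:
    """Septets left for text when the header takes `udh_octets` octets.

    `udh_octets` counts the whole header including its length octet.
    Assumes 160 septets of user data in total (140 octets).
    """
    # The header is padded with fill bits up to the next septet boundary,
    # so its cost in septets is the octet count in bits divided by 7,
    # rounded up.
    header_septets = (udh_octets * 8 + 6) // 7
    return 160 - header_septets


# One shift table element: 1 length octet + 3 element octets (assumed sizes).
single_table = max_septets_with_header(4)
# A 6-octet header, used here only as an illustrative input.
six_octet_header = max_septets_with_header(6)
```

Under these assumptions `single_table` is 155 and `six_octet_header` is 153.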
GSM recognizes only two encodings for text messages and one encoding for binary messages:
* GSM 7-bit default alphabet (including the use of National language shift tables)
* UCS-2
* 8-bit data
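A handset can tell these three encodings apart from the TP-DCS octet. The Python sketch below covers only the general data coding group and assumes the bit layout of 3GPP TS 23.038 (bits 7..6 equal to 00, alphabet in bits 3..2), which this article does not spell out; `alphabet_of` is a hypothetical name.

```python
# Alphabet codes carried in bits 3..2 (assumed layout from 3GPP TS 23.038).
_ALPHABETS = ("GSM 7-bit default alphabet", "8-bit data", "UCS-2", "reserved")


def alphabet_of(dcs: int) -> str:
    """Name the alphabet of a DCS value in the general data coding group."""
    if not 0 <= dcs <= 0xFF:
        raise ValueError("DCS is a single octet")
    if dcs & 0xC0 != 0x00:
        # Other coding groups use different layouts and are not handled here.
        raise ValueError("not in the general data coding group")
    return _ALPHABETS[(dcs >> 2) & 0x03]
```

For instance, under this layout `alphabet_of(0x00)` names the GSM 7-bit default alphabet and `alphabet_of(0x08)` names UCS-2.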
Message Classes
The TP-DCS octet has a complex syntax that allows it to carry other information, most notably message classes.
Flash messages are received by a mobile phone even when its memory is full. They are not stored in the phone; they are just shown on the phone display.
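Extracting the message class from a TP-DCS value might look like the Python sketch below. It assumes the bit layout of 3GPP TS 23.038, which is not reproduced in this article: in the coding groups with bits 7..6 equal to 00 or 01 the class in bits 1..0 is meaningful only when bit 4 is set, and in the group with bits 7..4 equal to 1111 a class is always present. Class 0 is what is commonly called a flash message. The function name `message_class` is hypothetical.

```python
from typing import Optional


def message_class(dcs: int) -> Optional[int]:
    """Return the message class (0..3) of a DCS octet, or None if it has none."""
    if not 0 <= dcs <= 0xFF:
        raise ValueError("DCS is a single octet")
    group = dcs >> 4
    if group <= 0x7:
        # Bits 7..6 are 00 or 01: bit 4 says whether bits 1..0 hold a class.
        return dcs & 0x03 if dcs & 0x10 else None
    if group == 0xF:
        # Data coding / message class group: a class is always present.
        return dcs & 0x03
    # Remaining groups (e.g. message waiting indication) carry no class.
    return None


def is_flash(dcs: int) -> bool:
    """True when the DCS octet marks a class 0 ("flash") message."""
    return message_class(dcs) == 0
```

Under this layout the value 0x10 denotes a flash message in the GSM 7-bit default alphabet, while 0x00 carries no class at all.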
Other Features
Automatic Deletion after Reading
The handset should delete any message received with a TP-DCS value falling into the "Message Marked for Automatic Deletion" coding group after the user has read it.
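A check for this coding group can be sketched in Python. It assumes, following 3GPP TS 23.038 rather than this article, that the group is identified by bits 7..6 of the octet being 01; `is_marked_for_auto_deletion` is a hypothetical name.

```python
def is_marked_for_auto_deletion(dcs: int) -> bool:
    """True when the DCS octet falls into the automatic deletion coding group.

    Assumed layout: bits 7..6 equal to 01 select the group; the remaining
    bits have the same meaning as in the general data coding group.
    """
    if not 0 <= dcs <= 0xFF:
        raise ValueError("DCS is a single octet")
    return dcs & 0xC0 == 0x40
```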
Message Waiting Indication
The Message Waiting Indication group of DCS values serves to set or reset flags indicating the presence of unread voicemail, fax, e-mail or other messages.
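Decoding such a value might be sketched as follows in Python. The layout is an assumption taken from 3GPP TS 23.038 and not from this article: bits 7..4 equal to 1100 (discard the message), 1101 or 1110 (store the message) select the group, bit 3 sets or clears the flag, and bits 1..0 select the kind of waiting message. The names `WaitingIndication` and `decode_mwi` are hypothetical.

```python
from typing import NamedTuple, Optional


class WaitingIndication(NamedTuple):
    store_message: bool  # False: the handset may discard the message text
    active: bool         # True sets the flag, False resets it
    kind: str            # which kind of message is waiting


_KINDS = ("voicemail", "fax", "e-mail", "other")


def decode_mwi(dcs: int) -> Optional[WaitingIndication]:
    """Decode a Message Waiting Indication DCS octet, or return None."""
    if not 0 <= dcs <= 0xFF:
        raise ValueError("DCS is a single octet")
    group = dcs >> 4
    if group not in (0xC, 0xD, 0xE):
        return None  # not a message waiting indication value
    return WaitingIndication(
        store_message=group != 0xC,
        active=bool(dcs & 0x08),
        kind=_KINDS[dcs & 0x03],
    )
```

Under this layout the value 0xC8 sets the voicemail flag and lets the handset discard the message itself.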
Data Compression
A special DCS value also allows message compression, but it is perhaps not used by any operator.
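Detecting the compression indication can be sketched in Python. The sketch assumes, following 3GPP TS 23.038 rather than this article, that bit 5 signals compression only in the coding groups where bits 7..6 are 00 or 01; `is_compressed` is a hypothetical name.

```python
def is_compressed(dcs: int) -> bool:
    """True when the DCS octet indicates compressed user data.

    Assumed layout: bit 5 is the compression flag, and it is meaningful
    only when bits 7..6 are 00 or 01 (values below 0x80).
    """
    if not 0 <= dcs <= 0xFF:
        raise ValueError("DCS is a single octet")
    return dcs < 0x80 and bool(dcs & 0x20)
```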
DCS Values
SMS Data Coding Scheme
The values of TP-DCS are defined in GSM recommendation 03.38 (3GPP TS 23.038, Alphabets and language-specific information).
The iDEN mobile standard uses the hexadecimal values F7 and F8 in a special way.
CBS Data Coding Scheme
For the DCS values in Cell Broadcast Messages, see GSM recommendation 03.38.
See also
* Short Message Service
* Cell Broadcast
* GSM 03.38
* GSM 03.40
* User Data Header
* Concatenated SMS
* Short message service technical realisation (GSM)
* Enhanced Messaging Service
* Multimedia Messaging Service
* Short Message Peer-to-Peer
* Universal Computer Protocol