ISCII

Indian Script Code for Information Interchange (ISCII) is a coding scheme for representing various writing systems of India. It encodes the main Indic scripts and a Roman transliteration. The supported scripts are Bengali–Assamese, Devanagari, Gujarati, Gurmukhi, Kannada, Malayalam, Oriya, Tamil, and Telugu. ISCII does not encode the writing systems of India that are based on Persian, but its writing system switching codes nonetheless provide for Kashmiri, Sindhi, Urdu, Persian, Pashto, and Arabic. The Persian-based writing systems were subsequently encoded in the PASCII encoding. ISCII has not been widely used outside certain government institutions, although a variant without the ATR (attribute) mechanism, Mac OS Devanagari, was used on classic Mac OS. ISCII has now been rendered largely obsolete by Unicode. Unicode uses a separate block for each Indic writing system, and largely preserves the ISCII layout within each block.


Background

The Brahmi-derived writing systems have a similar structure, so ISCII encodes letters with the same phonetic value at the same code point, overlaying the various scripts. For example, the ISCII codes 0xB3 0xDB represent "ki". This is rendered as കി in Malayalam, कि in Devanagari, ਕਿ in Gurmukhi, and கி in Tamil. The writing system can be selected in rich text by markup, or in plain text by means of the ATR code described below.

One motivation for using a single encoding is the idea that it allows easy transliteration from one writing system to another. However, there are enough incompatibilities between the scripts that this is not really practical.

ISCII is an 8-bit encoding. The lower 128 code points are plain ASCII; the upper 128 code points are ISCII-specific. In addition to the code points representing characters, ISCII uses a code point with the mnemonic ATR to indicate that the following byte contains one of two kinds of information. One set of values changes the writing system until the next writing system indicator or the end of the line. Another set of values selects display modes such as bold and italic. ISCII does not provide a means of indicating the default writing system.
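The overlay idea can be illustrated with a minimal Python sketch. It covers only the two ISCII codes from the "ki" example. The Unicode block start values and in-block offsets are assumptions taken from the Unicode charts rather than from the text above, and the names `BLOCK_START`, `ISCII_TO_OFFSET`, and `render` are hypothetical. A real converter would need the full table and per-script exceptions.

```python
# Start of the Unicode block for each script (assumed from the Unicode charts).
BLOCK_START = {
    "Devanagari": 0x0900,
    "Gurmukhi": 0x0A00,
    "Tamil": 0x0B80,
    "Malayalam": 0x0D00,
}

# ISCII byte -> offset inside a Unicode Indic block. Only the two bytes from
# the "ki" example are listed; this is not the full ISCII table.
ISCII_TO_OFFSET = {
    0xB3: 0x15,  # consonant ka
    0xDB: 0x3F,  # vowel sign i
}


def render(iscii: bytes, script: str) -> str:
    """Map each ISCII byte to the matching character of the chosen script."""
    start = BLOCK_START[script]
    return "".join(chr(start + ISCII_TO_OFFSET[b]) for b in iscii)


ki = bytes([0xB3, 0xDB])
for script in BLOCK_START:
    print(script, render(ki, script))
```

The same two bytes produce a different script purely by changing the block start, which is the property the shared code points are meant to provide.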


Codepage layout

The following table shows the character set for Devanagari. The code sets for Assamese, Bengali, Gujarati, Gurmukhi, Kannada, Malayalam, Oriya, Tamil, and Telugu are similar, with each Devanagari form replaced by the equivalent form in each writing system. Each character is shown with its decimal code and its Unicode equivalent.


Special code points

INV character, code point D9 (217): The INV (invisible consonant) character is used as a pseudo-consonant to display combining elements in isolation. For example, क (ka) + ् (halant) + INV = क्‍ (half ka). The Unicode equivalent is the zero-width joiner, ZWJ (U+200D). However, as noted below, the ISCII halant character can be doubled or combined with the ISCII nukta to achieve the effects created by the zero-width non-joiner (ZWNJ) or ZWJ in Unicode. For this reason, Apple maps the ISCII INV character to the Unicode left-to-right mark (LRM), so as to guarantee round-tripping.

ATR character, code point EF (239): The ATR (attribute) character followed by a byte code is used to switch to a different font attribute (such as bold) or to a different ISCII or PASCII language (such as Bengali), up to the next ATR sequence or the end of the line. This has no direct Unicode equivalent, as font attributes are not part of Unicode, and each script has a distinct set of code points.

EXT character, code point F0 (240): The EXT (extensions for Vedic) character followed by a byte code indicates a Vedic accent. This has no direct Unicode equivalent, as Vedic accents are assigned to distinct code points.

Halant character ्, code point E8 (232): The halant character removes the implicit vowel from a consonant and is used between consonants to represent conjunct consonants. For example, क (ka) + ् (halant) + त (ta) = क्त (kta). The sequence ् (halant) + ् (halant) displays a conjunct with an explicit halant, for example क (ka) + ् (halant) + ् (halant) + त (ta) = क्‌त. The sequence ् (halant) + ़ (nukta) displays a conjunct with half consonants, if available, for example क (ka) + ् (halant) + ़ (nukta) + त (ta) = क्‍त.

Nukta character ़, code point E9 (233): The nukta character after another ISCII character is used for a number of rarer characters which do not exist in the main ISCII set. For example, क (ka) + ़ (nukta) = क़ (qa). These characters have precomposed forms in Unicode, as shown in the following table.
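The special sequences described above can be sketched as a small converter for Devanagari. This is an illustration under stated assumptions, not a complete implementation: the table holds only ka, halant, and nukta; the bytes following ATR are collected without being interpreted; EXT and INV are not handled; and the function name `iscii_devanagari_to_unicode` is hypothetical. The nukta is emitted as the combining sign U+093C, which is canonically equivalent to the precomposed letters.

```python
HALANT, NUKTA, ATR = 0xE8, 0xE9, 0xEF
VIRAMA = "\u094d"
ZWNJ, ZWJ = "\u200c", "\u200d"

# Tiny illustrative subset of the Devanagari table, not the full character set.
DEVANAGARI = {
    0xB3: "\u0915",   # ka
    HALANT: VIRAMA,   # single halant -> virama
    NUKTA: "\u093c",  # nukta as a combining sign
}


def iscii_devanagari_to_unicode(data: bytes):
    """Return (text, attributes) for a Devanagari ISCII byte string."""
    out = []
    attributes = []  # raw bytes that followed ATR, left uninterpreted
    i = 0
    while i < len(data):
        b = data[i]
        nxt = data[i + 1] if i + 1 < len(data) else None
        if b == ATR and nxt is not None:
            attributes.append(nxt)        # attribute or script switch
            i += 2
        elif b == HALANT and nxt == HALANT:
            out.append(VIRAMA + ZWNJ)     # explicit halant
            i += 2
        elif b == HALANT and nxt == NUKTA:
            out.append(VIRAMA + ZWJ)      # half-consonant form
            i += 2
        elif b < 0x80:
            out.append(chr(b))            # lower half is plain ASCII
            i += 1
        else:
            out.append(DEVANAGARI.get(b, "\ufffd"))  # unknown -> U+FFFD
            i += 1
    return "".join(out), attributes
```

The two-byte halant sequences are checked before the single-byte table lookup, so a doubled halant is never emitted as two viramas.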


Code pages for ISCII conversion

To convert from Unicode (UTF-8) to an ISCII / ANSI coding, the following code pages may be used:
* 57002: Devanagari (Hindi, Marathi, Sanskrit, Konkani)
* 57003: Bengali
* 57004: Tamil
* 57005: Telugu
* 57006: Assamese
* 57007: Odia
* 57008: Kannada
* 57009: Malayalam
* 57010: Gujarati
* 57011: Punjabi (Gurmukhi)
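In a program, the list above might be kept as a lookup table. The sketch below is only that: the names `ISCII_CODE_PAGES`, `ALIASES`, and `iscii_code_page` are hypothetical, and the conversion itself is assumed to be done by a platform API that accepts these identifiers, which is not called here.

```python
# Code page identifiers from the list above, keyed by script name.
ISCII_CODE_PAGES = {
    "Devanagari": 57002,
    "Bengali": 57003,
    "Tamil": 57004,
    "Telugu": 57005,
    "Assamese": 57006,
    "Odia": 57007,
    "Kannada": 57008,
    "Malayalam": 57009,
    "Gujarati": 57010,
    "Gurmukhi": 57011,
}

# Alternative names that appear in this article for the same code pages.
ALIASES = {
    "Oriya": "Odia",
    "Punjabi": "Gurmukhi",
    "Hindi": "Devanagari",
    "Marathi": "Devanagari",
    "Sanskrit": "Devanagari",
    "Konkani": "Devanagari",
}


def iscii_code_page(name: str) -> int:
    """Return the code page identifier for a script or language name."""
    key = name.strip().title()
    key = ALIASES.get(key, key)
    return ISCII_CODE_PAGES[key]  # raises KeyError for unknown names
```

For example, `iscii_code_page("Hindi")` and `iscii_code_page("Devanagari")` both resolve to 57002.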


Code points for all languages



External links


Converters from/to ISCII to/from various fonts

The ISCII 1991 standard (PDF)

Padma – Mozilla extension for transforming ISCII to Unicode



