Indian Script Code for Information Interchange (ISCII) is a coding scheme for representing various writing systems of India. It encodes the main Indic scripts and a Roman transliteration. The supported scripts are: Bengali–Assamese, Devanagari, Gujarati, Gurmukhi, Kannada, Malayalam, Odia, Tamil, and Telugu. ISCII does not encode the writing systems of India that are based on Persian, but its writing system switching codes nonetheless provide for Kashmiri, Sindhi, Urdu, Persian, Pashto, and Arabic. The Persian-based writing systems were subsequently encoded in the PASCII encoding.
ISCII has not been widely used outside certain government institutions, although a variant without the ATR mechanism, Mac OS Devanagari, was used on classic Mac OS. ISCII has now been rendered largely obsolete by Unicode. Unicode uses a separate block for each Indic writing system and largely preserves the ISCII layout within each block.
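The claim that Unicode largely preserves the ISCII layout across its Indic blocks can be checked with Python's standard `unicodedata` module. The minimal sketch below looks up the character at the same offset in each of the nine Indic blocks; the block start addresses are standard Unicode values, while the helper name `names_at_offset` is purely illustrative. Unicode character names still use the older spelling "Oriya".

```python
import unicodedata

# Start address of each Indic Unicode block (each block spans 128 code points).
BLOCK_BASES = {
    "DEVANAGARI": 0x0900,
    "BENGALI": 0x0980,
    "GURMUKHI": 0x0A00,
    "GUJARATI": 0x0A80,
    "ORIYA": 0x0B00,
    "TAMIL": 0x0B80,
    "TELUGU": 0x0C00,
    "KANNADA": 0x0C80,
    "MALAYALAM": 0x0D00,
}


def names_at_offset(offset: int) -> dict:
    """Return the Unicode name found at the same offset in every block
    (None where that code point is unassigned)."""
    return {
        script: unicodedata.name(chr(base + offset), None)
        for script, base in BLOCK_BASES.items()
    }


# Offset 0x15 holds LETTER KA and 0x3F holds VOWEL SIGN I in every block.
for offset in (0x15, 0x3F):
    for name in names_at_offset(offset).values():
        print(name)
```

The alignment is only "largely" true: some offsets are unassigned in some blocks, which `names_at_offset` reports as `None`.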
Background
Because the Brahmi-derived writing systems have a similar structure, ISCII encodes letters with the same phonetic value at the same code point, overlaying the various scripts. For example, the ISCII codes 0xB3 0xDB represent the syllable ki, which is rendered as കി in Malayalam, कि in Devanagari, ਕਿ in Gurmukhi, and கி in Tamil. The writing system can be selected in rich text by markup, or in plain text by means of the ATR code described below.
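Because the scripts are overlaid, the same two bytes decode to different characters depending on which script is selected. The sketch below assumes a hypothetical two-entry ISCII-to-Devanagari table and reaches the other scripts by adding the distance between Unicode blocks. That shortcut happens to work for ka and the vowel sign i; a real converter would use a full table per script, since the Unicode blocks are only largely aligned with ISCII.

```python
# Hypothetical two-entry excerpt of an ISCII -> Unicode Devanagari table:
# 0xB3 is ka and 0xDB is the vowel sign i (the example from the text).
ISCII_TO_DEVANAGARI = {0xB3: 0x0915, 0xDB: 0x093F}

# Distance of each script's Unicode block from the Devanagari block (U+0900).
BLOCK_SHIFT = {
    "devanagari": 0x0000,
    "gurmukhi": 0x0100,
    "tamil": 0x0280,
    "malayalam": 0x0400,
}


def decode(data: bytes, script: str) -> str:
    """Decode ISCII bytes as the given script (illustrative subset only)."""
    shift = BLOCK_SHIFT[script]
    chars = []
    for byte in data:
        if byte < 0x80:
            chars.append(chr(byte))  # the lower half is plain ASCII
        elif byte in ISCII_TO_DEVANAGARI:
            chars.append(chr(ISCII_TO_DEVANAGARI[byte] + shift))
        else:
            raise ValueError(f"byte 0x{byte:02X} is not in this partial table")
    return "".join(chars)


for script in BLOCK_SHIFT:
    print(script, decode(b"\xb3\xdb", script))
```

Run as shown, this prints कि, ਕਿ, கி and കി for the four scripts, matching the renderings listed above.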
One motivation for using a single encoding is the idea that it allows easy transliteration from one writing system to another. However, there are enough incompatibilities between the scripts that this is not practical in general.
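One concrete incompatibility is that Tamil has no separate letters for most aspirated and voiced stops, so the corresponding positions in the Unicode Tamil block are unassigned. The hypothetical helper below shifts Devanagari text into another block by code-point offset and reports which characters land in such gaps. It is a naive illustration of the problem, not a usable transliterator.

```python
import unicodedata

DEVANAGARI_BASE = 0x0900
TAMIL_BASE = 0x0B80


def shift_transliterate(text: str, target_base: int):
    """Naively move Devanagari characters into another Indic block by
    code-point offset. Returns the result plus the source characters whose
    shifted code point is unassigned (those are replaced by U+FFFD)."""
    out, missing = [], []
    for ch in text:
        cp = ord(ch)
        if DEVANAGARI_BASE <= cp < DEVANAGARI_BASE + 0x80:
            shifted = chr(cp - DEVANAGARI_BASE + target_base)
            if unicodedata.name(shifted, None) is None:
                missing.append(ch)
                shifted = "\ufffd"
            out.append(shifted)
        else:
            out.append(ch)  # leave non-Devanagari characters alone
    return "".join(out), missing


# ka maps cleanly to Tamil, but kha and ga have no Tamil counterpart.
result, missing = shift_transliterate("कखग", TAMIL_BASE)
print(result, missing)
```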
ISCII is an 8-bit encoding. The lower 128 code points are plain ASCII; the upper 128 code points are ISCII-specific. In addition to the code points representing characters, ISCII uses a code point with the mnemonic ATR, which indicates that the following byte contains one of two kinds of information. One set of values changes the writing system until the next writing system indicator or the end of the line. Another set of values selects display modes such as bold and italic. ISCII does not provide a means of indicating the default writing system.
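This byte-level structure can be illustrated with a small tokenizer. The sketch assumes only what the text states: bytes below 0x80 are ASCII, higher bytes are ISCII-specific, and ATR (0xEF, from the list of special code points) consumes the byte after it. The function name and token format are hypothetical, ATR values are passed through uninterpreted because their meanings are not listed here, and the value 0x42 in the example is an arbitrary placeholder.

```python
ATR = 0xEF  # attribute / writing-system switch code point


def tokenize(data: bytes) -> list:
    """Split ISCII bytes into ("ascii", b), ("iscii", b) and ("atr", value)
    tokens. ATR values are reported, not interpreted. EXT (0xF0), which also
    takes a following byte, is not handled in this sketch."""
    tokens = []
    i = 0
    while i < len(data):
        byte = data[i]
        if byte == ATR:
            if i + 1 >= len(data):
                raise ValueError("ATR at end of input has no value byte")
            tokens.append(("atr", data[i + 1]))
            i += 2
        elif byte < 0x80:
            tokens.append(("ascii", byte))
            i += 1
        else:
            tokens.append(("iscii", byte))
            i += 1
    return tokens


print(tokenize(b"A\xef\x42\xb3\xdb"))
```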
Codepage layout
The following table shows the character set for Devanagari. The code sets for Assamese, Bengali, Gujarati, Gurmukhi, Kannada, Malayalam, Odia, Tamil, and Telugu are similar, with each Devanagari form replaced by the equivalent form in each writing system. Each character is shown with its decimal code and its Unicode equivalent.
Special code points
; INV character—code point D9 (217): The INV (invisible consonant) character is used as a pseudo-consonant to display combining elements in isolation. For example, क (ka) + ् (halant) + INV = क् (half ka). The Unicode equivalent is (). However, as noted
below, the ISCII halant character can be doubled or combined with the ISCII nukta to achieve the effects that ZWNJ and ZWJ produce in Unicode. For this reason, Apple maps the ISCII INV character to a different Unicode character, so as to guarantee round-tripping.
; ATR character—code point EF (239): The ATR (attribute) character followed by a byte code is used to switch to a different font attribute (such as bold) or to a different ISCII or
PASCII language (such as Bengali), up to the next ATR sequence or the end of the line. This has no direct Unicode equivalent, as font attributes are not part of Unicode, and each script has a distinct set of code points.
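The scope rule for ATR (it applies up to the next ATR sequence or the end of the line) can be sketched as a splitter that returns runs of bytes tagged with a script name. The mapping from ATR values to scripts is a caller-supplied, hypothetical table, and the starting script must also be supplied by the caller, since ISCII has no way of indicating a default writing system.

```python
ATR = 0xEF
NEWLINE = 0x0A


def script_runs(data: bytes, atr_scripts: dict, default: str) -> list:
    """Split ISCII bytes into (script, bytes) runs.

    atr_scripts maps ATR value bytes to script names (a hypothetical,
    caller-supplied table). ATR values not in it, such as display
    attributes, are dropped without changing the script. A newline ends
    the current ATR scope, so the next line starts in `default` again.
    """
    runs = []
    current = default
    buf = bytearray()

    def flush():
        # Emit the bytes collected so far under the script now in effect.
        if buf:
            runs.append((current, bytes(buf)))
            buf.clear()

    i = 0
    while i < len(data):
        byte = data[i]
        if byte == ATR:
            if i + 1 >= len(data):
                raise ValueError("ATR at end of input has no value byte")
            value = data[i + 1]
            if value in atr_scripts:
                flush()
                current = atr_scripts[value]
            i += 2
            continue
        buf.append(byte)
        if byte == NEWLINE:
            flush()
            current = default
        i += 1
    flush()
    return runs


# 0x43 -> "bengali" is a made-up table entry for illustration only.
print(script_runs(b"\xb3\xef\x43\xb3\n\xb3", {0x43: "bengali"}, "devanagari"))
```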
; EXT character—code point F0 (240): The EXT (extensions for Vedic) character followed by a byte code indicates a Vedic accent. This has no direct Unicode equivalent, as Vedic accents are assigned to distinct code points.
; Halant character ्—code point E8 (232): The halant character removes the implicit vowel from a consonant and is used between consonants to form conjunct consonants. For example, क (ka) + ् (halant) + त (ta) = क्त (kta). The sequence ् (halant) + ् (halant) displays the conjunct with an explicit halant instead of a ligature, for example क (ka) + ् (halant) + ् (halant) + त (ta) = क्‌त. The sequence ् (halant) + ़ (nukta) displays the conjunct with half consonants, if available, for example क (ka) + ् (halant) + ़ (nukta) + त (ta) = क्‍त.
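In Unicode, the two halant sequences correspond to virama + ZWNJ (explicit halant) and virama + ZWJ (half form). A minimal decoding sketch follows. The byte table is a partial, hypothetical excerpt: 0xB3 (ka), 0xE8 (halant) and 0xE9 (nukta) come from this article, while 0xC2 (ta) is assumed from the ISCII-91 Devanagari layout.

```python
HALANT, NUKTA = 0xE8, 0xE9
VIRAMA = "\u094d"
ZWNJ, ZWJ = "\u200c", "\u200d"

# Partial ISCII -> Unicode Devanagari table (0xC2 = ta is an assumption).
TABLE = {0xB3: "\u0915", 0xC2: "\u0924", HALANT: VIRAMA, NUKTA: "\u093c"}


def decode_halant(data: bytes) -> str:
    """Decode ISCII bytes, turning halant+halant and halant+nukta into the
    equivalent Unicode virama + ZWNJ / virama + ZWJ sequences."""
    out = []
    i = 0
    while i < len(data):
        byte = data[i]
        nxt = data[i + 1] if i + 1 < len(data) else None
        if byte == HALANT and nxt == HALANT:
            out.append(VIRAMA + ZWNJ)  # explicit halant, no ligature
            i += 2
        elif byte == HALANT and nxt == NUKTA:
            out.append(VIRAMA + ZWJ)  # request the half form
            i += 2
        else:
            out.append(TABLE[byte])  # KeyError outside the partial table
            i += 1
    return "".join(out)


print(decode_halant(bytes([0xB3, 0xE8, 0xE8, 0xC2])))
```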
; Nukta character ़—code point E9 (233): The nukta character, placed after another ISCII character, is used for a number of rarer characters that do not exist in the main ISCII set. For example, क (ka) + ़ (nukta) = क़ (qa). These characters have precomposed forms in Unicode, as shown in the following table.
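The precomposed nukta letters can be derived from the Unicode Character Database rather than typed by hand. The sketch below builds a base-to-precomposed table for U+0958 to U+095F with `unicodedata.decomposition` and applies it to Unicode text; the function name `compose_nukta` is hypothetical. One design point: these eight letters are composition exclusions, so NFC normalization turns them back into base + nukta, which is why the composition is done explicitly here.

```python
import unicodedata

NUKTA = "\u093c"

# {base consonant: precomposed letter} for U+0958..U+095F, each of which
# decomposes canonically to base + nukta in the Unicode database.
PRECOMPOSED = {}
for cp in range(0x0958, 0x0960):
    base_hex, mark_hex = unicodedata.decomposition(chr(cp)).split()
    if chr(int(mark_hex, 16)) == NUKTA:
        PRECOMPOSED[chr(int(base_hex, 16))] = chr(cp)


def compose_nukta(text: str) -> str:
    """Replace each base consonant + nukta pair with its precomposed form."""
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if i + 1 < len(text) and text[i + 1] == NUKTA and ch in PRECOMPOSED:
            out.append(PRECOMPOSED[ch])
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


print(compose_nukta("\u0915\u093c"))  # ka + nukta -> qa (U+0958)
```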
Code pages for ISCII conversion
To convert from Unicode (UTF-8) to ISCII, the following Windows code pages may be used:
* 57002: Devanagari (Hindi, Marathi, Sanskrit, Konkani)
* 57003: Bengali
* 57004: Tamil
* 57005: Telugu
* 57006: Assamese
* 57007: Odia
* 57008: Kannada
* 57009: Malayalam
* 57010: Gujarati
* 57011: Punjabi (Gurmukhi)
Code points for all languages
External links
* Converters from/to ISCII to/from various fonts
* Padma – Mozilla extension for transforming ISCII to Unicode