Tamil Script Code for Information Interchange (TSCII) is a coding scheme for representing the Tamil script. The lower 128 codepoints are plain ASCII; the upper 128 codepoints are TSCII-specific. After many years of use on the Internet by private agreement only, it was successfully registered with the IANA in 2007.
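The split between the two halves of the code page can be illustrated with a short Python sketch. The function name `split_tscii_bytes` is hypothetical, and the sketch only classifies bytes by range; it does not decode the TSCII-specific upper half, whose table is not reproduced here.

```python
def split_tscii_bytes(data: bytes) -> tuple[bytes, bytes]:
    """Separate plain-ASCII bytes (0x00-0x7F) from upper-half bytes (0x80-0xFF).

    Hypothetical helper for illustration only: it classifies bytes by range
    and does not interpret the TSCII-specific half.
    """
    ascii_part = bytes(b for b in data if b < 0x80)
    upper_part = bytes(b for b in data if b >= 0x80)
    return ascii_part, upper_part
```

Because the lower half is plain ASCII, the first element of the returned pair can be decoded with Python's built-in `ascii` codec, while the second needs a TSCII mapping table.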
TSCII encodes the characters in visual (written) order, paralleling the use of the Tamil typewriter. Unicode, instead, uses the logical order encoding strategy for Tamil, following ISCII, in contrast to the case of Thai, where the visual order encoding grandfathered by TIS-620 was adopted.
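The difference between the two orders can be sketched in Python. This is a minimal illustration, not a TSCII converter: it assumes the input is already a string of Unicode code points arranged in visual order (a stand-in for decoded TSCII glyphs), that each left-side vowel sign is immediately followed by a single consonant, and the name `visual_to_logical` is hypothetical.

```python
import unicodedata

# Tamil vowel signs that are written to the LEFT of their consonant,
# although they are pronounced (and stored in Unicode) after it.
PREFIX_SIGNS = {
    "\u0bc6",  # TAMIL VOWEL SIGN E
    "\u0bc7",  # TAMIL VOWEL SIGN EE
    "\u0bc8",  # TAMIL VOWEL SIGN AI
}


def visual_to_logical(visual: str) -> str:
    """Reorder a visually ordered Tamil string into Unicode logical order.

    Simplifying assumption: a prefix sign is followed by exactly one
    consonant code point (conjuncts are not handled).
    """
    out = []
    i = 0
    while i < len(visual):
        ch = visual[i]
        if ch in PREFIX_SIGNS and i + 1 < len(visual):
            # Emit the consonant first, then the sign that was written before it.
            out.append(visual[i + 1])
            out.append(ch)
            i += 2
        else:
            out.append(ch)
            i += 1
    # NFC merges a left part and a right part that now sit side by side
    # (for example U+0BC7 + U+0BBE) into the single two-part vowel sign.
    return unicodedata.normalize("NFC", "".join(out))
```

For example, `visual_to_logical("\u0bc6\u0b95")` returns `"\u0b95\u0bc6"` (கெ): the sign that was typed first ends up after the consonant, as logical order requires.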
The government of Tamil Nadu endorses its own TAB/TAM standards for 8-bit encoding, and other, older encoding schemes can still be found on the web.
The free etext collection at Project Madurai uses the TSCII encoding, but has already started to provide Unicode versions.
History
The need for a common encoding for Tamil was felt by members of various mailing-list-based forums in the mid-1990s, as multiple custom-coded fonts were prevalent in those forums. While some of the commercial encodings were more popular than others, they were not accepted by the wider community because of conflicting commercial interests. While Unicode was accepted by most as the future standard, most desktop systems at that time were still not capable of handling Unicode for the Tamil language, and an interim 8-bit encoding was required.
A separate mailing list for discussion of such encodings ([email protected]) was created in 1997 to initiate this discussion, starting with an email written by Dr. K. Kalyanasundaram to the popular Tamil author Sujatha, who headed the committee for standardization of the Tamil keyboard. This forum quickly attracted enthusiastic participants from across the globe, including several prominent Tamil scholars. Archives of these discussions are maintained by INFITT.
Subsequent to the publication of TSCII, most of the members of the [email protected] mailing list became part of INFITT, which is a wider initiative to bring standardization and continued development to various areas of Tamil computing.
Codepage layout
Conversion Tools
Text encoded in UTF-8 can be converted to TSCII using the GNU iconv tool as follows:
$ iconv -f utf-8 -t tscii hello.utf8 > hello.tscii
Conversion from TSCII to UTF-8 is done by interchanging the encoding names given to the -f and -t flags.
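The two directions can also be driven from a script. The following Python sketch wraps the same `iconv` invocation; the helper names `iconv_argv` and `convert_file` are hypothetical, and running `convert_file` assumes an `iconv` binary with TSCII support is installed and on the `PATH`.

```python
import subprocess


def iconv_argv(from_enc: str, to_enc: str, path: str) -> list[str]:
    """Build the iconv command line that converts `path` from one encoding to another."""
    return ["iconv", "-f", from_enc, "-t", to_enc, path]


def convert_file(from_enc: str, to_enc: str, src: str, dst: str) -> None:
    """Run iconv on `src` and write the converted bytes to `dst`.

    Assumes the installed iconv knows both encoding names; raises
    subprocess.CalledProcessError if the conversion fails.
    """
    result = subprocess.run(
        iconv_argv(from_enc, to_enc, src), check=True, capture_output=True
    )
    with open(dst, "wb") as out:
        out.write(result.stdout)
```

Calling `convert_file("utf-8", "tscii", "hello.utf8", "hello.tscii")` mirrors the command above, and swapping the first two arguments gives the reverse conversion.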
Visual Application
An open source project, AnyTaFont2UTF8, is available and is maintained by the Isaiyini Tamil Community.
See also
* TACE16 (Tamil All Character Encoding)
* Clip font
* Tamil keyboard
* தமிழ் 99
* InScript
* Tamil (Unicode block)
* Tamil blogosphere
* AnyTaFont2UTF8, an open source project for all Tamil encoding/font mapping characters.
External links
* TSCII Start Page
* Unicode Technical Note #15 Text conversion From TSCII 1.7 to Unicode
* INFITT (International Forum for Information Technology in Tamil)
* TSCII to Unicode Online & Webpage Conversion
* Padma, a Mozilla extension for transforming TSCII to Unicode