VISCII

VISCII is an unofficially defined modified ASCII character encoding for using the Vietnamese language with computers. It should not be confused with the similarly named, officially registered VSCII encoding. VISCII keeps the 95 printable characters of ASCII unmodified, but it replaces 6 of the 33 control characters with printable characters. It also adds 128 precomposed characters in the upper half of the 8-bit range. Unicode and the Windows-1258 code page are now used for virtually all Vietnamese computer data, but legacy VSCII and VISCII files may need conversion.
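
The figures above can be cross-checked: 6 reassigned control positions plus 128 upper-half code points give 134 slots, which matches the number of Vietnamese letters that fall outside ASCII. The following Python sketch derives that count from the Vietnamese alphabet using Unicode normalization. The names `BASE_VOWELS`, `TONE_MARKS`, `UPPER_HALF_SLOTS` and `vietnamese_non_ascii_letters` are illustrative only and do not come from any VISCII specification.

```python
import unicodedata

# The 12 Vietnamese vowel letters (NFC-normalised so each is one code point).
BASE_VOWELS = unicodedata.normalize("NFC", "aăâeêioôơuưy")

# No tone mark, then combining grave, acute, hook above, tilde, dot below.
TONE_MARKS = ["", "\u0300", "\u0301", "\u0309", "\u0303", "\u0323"]

# Code points available above 7-bit ASCII in a single-byte encoding.
UPPER_HALF_SLOTS = 128


def vietnamese_non_ascii_letters():
    """Return every precomposed Vietnamese letter that is not plain ASCII."""
    letters = set()
    for vowel in BASE_VOWELS:
        for tone in TONE_MARKS:
            composed = unicodedata.normalize("NFC", vowel + tone)
            # Every vowel/tone pair has a single precomposed code point.
            assert len(composed) == 1
            letters.add(composed)
    letters.add("\u0111")  # d with stroke, the only non-ASCII consonant
    letters |= {ch.upper() for ch in letters}  # add the capital forms
    return {ch for ch in letters if ord(ch) > 0x7F}


EXTRA_LETTERS = vietnamese_non_ascii_letters()
print(len(EXTRA_LETTERS))                     # letters needing a code point
print(len(EXTRA_LETTERS) - UPPER_HALF_SLOTS)  # letters that do not fit
```

The second printed value is the overflow that an 8-bit encoding has to place somewhere other than the upper half, which VISCII does by reusing control-character positions.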


History and naming

VISCII was designed in 1992 by the Vietnamese Standardization Working Group (Viet-Std Group), which was based in Silicon Valley, California, and led by Christopher Cuong T. Nguyen, Cuong M. Bui, and Hoc D. Ngo. At the time, the group was working with the Unicode Consortium to include precomposed Vietnamese characters in the Unicode standard. VISCII, along with VIQR, was first published in a bilingual report in September 1992, in which it was dubbed the "Vietnamese Standard Code for Information Interchange". The report noted a proliferation of computer usage in Vietnam and an increasing volume of computer-based communication among Vietnamese abroad, observed that existing applications used vendor-specific encodings which could not interoperate with one another, and concluded that standardisation between vendors was therefore necessary. The successful inclusion of composed and precomposed Vietnamese in Unicode 1.0 was the result of the lessons learned from the development of 8-bit VISCII and 7-bit VIQR.

The next year, in 1993, Vietnam adopted TCVN 5712, its first national standard in the information technology domain. This defined a character encoding named VSCII, which had been developed by the TCVN Technical Committee on Information Technology (TCVN/TC1), and whose name also stands for "Vietnamese Standard Code for Information Interchange". VSCII is incompatible with, and otherwise unrelated to, the earlier-published VISCII. Unlike VISCII, VSCII is a "Vietnamese Standard" in the sense of a national standard.

VISCII and VIQR were approved as the informational-status RFC 1456, attributed to the Viet-Std group and dated May 1993. As is usual for informational IETF RFCs, RFC 1456 describes them as "conventions" used by overseas Vietnamese speakers on Usenet, and states that it "specifies no level of standard". In spite of this, it continues to call VISCII the "VIetnamese Standard Code for Information Interchange" (the same name taken by VSCII). The labels VISCII and csVISCII are registered with the IANA for VISCII, with reference to RFC 1456. (There is, on the other hand, no official IANA label for TCVN 5712 / VSCII, although x-viet-tcvn5712 was previously supported by Mozilla Firefox.)


Design

A traditional extended ASCII character set consists of the ASCII set plus up to 128 characters. Vietnamese requires 134 additional letter-diacritic combinations, which is six too many. There are (short of dropping tone mark support for capital letters, as in VSCII-3) essentially four different ways to handle this problem:

1. Use variable-width encoding (as does UTF-8)
2. Include combining diacritical marks for tone marks (as do VSCII-2 and Windows-1258) or for diacritics in general (as do ANSEL and VNI)
3. Replace some ASCII punctuation, preferably punctuation which is not invariant in ISO 646 (as does VNI for DOS)
4. Replace at least six of the basic ASCII control characters (as do VPS and VSCII-1)

VISCII went for the last option, replacing six of the least problematic C0 control codes (that is, those least likely to be recognised by an application and acted on specially: STX, ENQ, ACK, DC4, EM, and RS) with six of the least-used uppercase letter-diacritic combinations. While this option may cause programs that use those control codes to malfunction when handling VISCII text, it creates fewer complications than the alternatives (the designers note that non-8-bit-clean transmission had been found to pose more difficulty in practice than the control character re-use).

Nonetheless, the locations of both C0 and C1 control characters, and the codes used for the non-breaking space in ISO-8859-1, Mac OS Roman and OEM-US, were deliberately assigned to uppercase letters. The intention was that, if graphical characters could not be displayed for those codes, using the lowercase code points with an all-capital font would be a serviceable workaround. However, using up all the extended code points for accented letters left no room for the useful symbols, superscripted numbers, curved quotes, proper dashes, etc. that most other extended ASCII character sets provide.

Where the two code pages have characters in common, the location of characters deliberately follows ISO-8859-1 for the most part (the uppercase Õ being noted as an exception), motivated by user-friendliness concerns.
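
The reuse of the six control positions can be sketched as a small lookup table. The Python example below is a minimal illustration rather than a full VISCII decoder: it handles only bytes below 0x80 and rejects upper-half bytes, because the 128-entry upper-half table is not reproduced here. The byte values are the standard ASCII positions of STX, ENQ, ACK, DC4, EM, and RS; the letter assigned to each position is given as understood from RFC 1456 and should be checked against that document before use. The names `VISCII_C0_OVERRIDES` and `decode_viscii_low_half` are hypothetical.

```python
# Byte value -> (name of the replaced control code, Unicode letter).
# Assumption: letter assignments as understood from RFC 1456; verify before use.
VISCII_C0_OVERRIDES = {
    0x02: ("STX", "\u1EB2"),  # capital A with breve and hook above
    0x05: ("ENQ", "\u1EB4"),  # capital A with breve and tilde
    0x06: ("ACK", "\u1EAA"),  # capital A with circumflex and tilde
    0x14: ("DC4", "\u1EF6"),  # capital Y with hook above
    0x19: ("EM", "\u1EF8"),   # capital Y with tilde
    0x1E: ("RS", "\u1EF4"),   # capital Y with dot below
}


def decode_viscii_low_half(data: bytes) -> str:
    """Decode VISCII bytes below 0x80; upper-half bytes are rejected."""
    out = []
    for byte in data:
        if byte >= 0x80:
            # The upper-half table is deliberately left out of this sketch.
            raise ValueError(f"upper-half byte 0x{byte:02X} not handled here")
        if byte in VISCII_C0_OVERRIDES:
            out.append(VISCII_C0_OVERRIDES[byte][1])
        else:
            out.append(chr(byte))  # identical to ASCII everywhere else
    return "".join(out)


print(decode_viscii_low_half(b"M\x19"))
```

A tool that treats any of these six byte values as control codes (for example, as end-of-medium or record separator) would misinterpret such text, which is the trade-off described above.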


Support

VISCII is partially supported by the TriChlor Software Group in California, which has released various VISCII-compliant software packages, libraries, and fonts for MS-DOS and Windows, Unix, and Macintosh. VISCII-compliant software is available at many FTP sites. VISCII was historically offered as an encoding for outgoing email by Mozilla Thunderbird. It was also supported by the Windows Vietnamese keyboard software WinVNKey, created by Christopher Cuong T. Nguyen and later upgraded through various Windows versions by Hoc D. Ngo and others.

VISCII was mostly used by overseas Vietnamese speakers, with VSCII (TCVN) being more popular in northern Vietnam and VNI being more popular in southern Vietnam.


Character set


See also

* ASCII
* Vietnamese Quoted-Readable (VIQR)
* Vietnamese Standard Code for Information Interchange (VSCII)
* Windows-1258


Further reading

* https://www.math.nmsu.edu/~mleisher/Software/csets/VISCII.TXT


External links

* RFC 1456 - Conventions for Encoding the Vietnamese Language
* Vietnamese-Standardization Working Group, based in California
* VISCII-compliant software and fonts for MS-DOS and Windows
* VISCII-compliant software, libraries, and fonts for Unix
* WinVNKey: Vietnamese keyboard driver for Windows supporting multinational character sets, including VISCII
* MacVNKey: VISCII-compliant keyboard driver for Macintosh classic