In CJK (Chinese, Japanese and Korean) computing, graphic characters are traditionally classed into fullwidth (in Taiwan and Hong Kong: 全形; in CJK: 全角) and halfwidth (in Taiwan and Hong Kong: 半形; in CJK: 半角) characters. Unlike in monospaced fonts, where every character has the same width, a halfwidth character occupies half the width of a fullwidth character, hence the name. ''Halfwidth and Fullwidth Forms'' is also the name of a Unicode block, U+FF00–FFEF, provided so that older encodings containing both halfwidth and fullwidth characters can have lossless translation to/from Unicode.
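The lossless translation mentioned above can be illustrated with a minimal sketch. It assumes Python's built-in `shift_jis` codec as the "older encoding"; the helper `round_trip` and the sample table are hypothetical names introduced only for this illustration.

```python
# Sketch: width variants stay distinct when moving between a legacy
# encoding (Shift JIS, via Python's built-in codec) and Unicode.
samples = {
    "halfwidth katakana KA": "\uFF76",  # in the Halfwidth and Fullwidth Forms block
    "ordinary katakana KA": "\u30AB",   # in the Katakana block
    "ASCII letter A": "A",
    "fullwidth letter A": "\uFF21",     # in the Halfwidth and Fullwidth Forms block
}


def round_trip(text: str, encoding: str = "shift_jis") -> str:
    """Encode text to a legacy encoding, then decode it back to Unicode."""
    return text.encode(encoding).decode(encoding)


for name, ch in samples.items():
    legacy = ch.encode("shift_jis")
    back = round_trip(ch)
    print(f"{name}: U+{ord(ch):04X} -> bytes {legacy.hex()} -> U+{ord(back):04X}")
```

Each width variant has its own code point, so the halfwidth and fullwidth forms of the same letter or kana map to different legacy byte sequences and come back unchanged.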


Rationale

In the days of text mode computing, Western characters were normally laid out in a grid on the screen, often 80 columns by 24 or 25 lines. Each character was displayed as a small dot matrix, often about 8 pixels wide, and an SBCS (single-byte character set) was generally used to encode characters of Western languages.

For aesthetic reasons and readability, it is preferable for Han characters to be approximately square-shaped, and thus twice as wide as these fixed-width SBCS characters. As these were typically encoded in a DBCS (double-byte character set), this also meant that their width on screen in a duospaced font was proportional to their byte length. Some terminals and editing programs could not deal with double-byte characters starting at odd columns, only even ones (some could not even put double-byte and single-byte characters in the same line). The DBCS sets therefore generally also included Roman characters and digits, for use alongside the CJK characters in the same line.

On the other hand, early Japanese computing used a single-byte code page called JIS X 0201 for katakana. These would be rendered at the same width as the other single-byte characters, making them half-width kana characters rather than normally proportioned kana. Although the JIS X 0201 standard itself did not specify half-width display for katakana, this became the visually distinguishing feature in Shift JIS between the single-byte JIS X 0201 and double-byte JIS X 0208 katakana. Some IBM code pages used a similar treatment for Korean jamo, based on the N-byte Hangul code and its EBCDIC translation.
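The relationship between byte length and on-screen width described above can be sketched as follows. This is an illustration under stated assumptions, not a model of any particular terminal: it uses Python's built-in `shift_jis` codec and assumes the classic duospaced layout in which each byte occupies one column. The function name `columns_by_bytes` is hypothetical.

```python
def columns_by_bytes(text: str, encoding: str = "shift_jis") -> int:
    """Estimate columns on a duospaced text-mode display.

    Assumption: one column per encoded byte, so single-byte characters
    take one cell and double-byte characters take two.
    """
    return len(text.encode(encoding))


examples = [
    ("ASCII letter A", "A"),
    ("fullwidth letter A", "\uFF21"),
    ("halfwidth katakana KA (JIS X 0201)", "\uFF76"),
    ("ordinary katakana KA (JIS X 0208)", "\u30AB"),
]

for label, ch in examples:
    print(f"{label}: {columns_by_bytes(ch)} column(s)")
```

Under this assumption the single-byte JIS X 0201 katakana occupies one column while its double-byte JIS X 0208 counterpart occupies two, which is the visual distinction the text describes for Shift JIS.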


In Unicode

For compatibility with existing character sets that contained both half- and fullwidth versions of the same character, Unicode allocated a single block at U+FF00–FFEF containing the necessary "alternative width" characters. This includes fullwidth versions of the ASCII graphic characters (! through ~) and some non-ASCII punctuation such as the yen sign, halfwidth versions of katakana and hangul, and halfwidth versions of some other symbols such as circles. Only characters needed for lossless round trip to existing character sets were allocated, rather than (for instance) making a fullwidth version of every Latin accented character.

Unicode assigns ''every'' code point an "East Asian width" property. This may be:
* F (Fullwidth): explicitly fullwidth variants of narrow characters, e.g. fullwidth Latin letters
* H (Halfwidth): explicitly halfwidth variants of wide characters, e.g. halfwidth katakana
* W (Wide): naturally wide characters, e.g. hiragana and Han characters
* Na (Narrow): naturally narrow characters that have fullwidth counterparts, e.g. ASCII letters
* A (Ambiguous): characters whose width depends on context because they occur in both East Asian and other legacy character sets, e.g. Greek letters
* N (Neutral): characters that do not occur in East Asian legacy character sets, e.g. Devanagari
Terminal emulators can use this property to decide whether a character should consume one or two "columns" when figuring out tabs and cursor position.
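A column count based on this property might be sketched as below, using `unicodedata.east_asian_width` from the Python standard library. The function `display_columns` and its `ambiguous_wide` parameter are hypothetical names, and the sketch deliberately simplifies: it treats every character as at least one column, so combining marks and control characters are not handled the way a real terminal emulator would handle them.

```python
import unicodedata


def display_columns(text: str, ambiguous_wide: bool = False) -> int:
    """Estimate the terminal columns that text occupies.

    Wide (W) and Fullwidth (F) characters count as two columns.
    Ambiguous (A) characters count as two only when ambiguous_wide is
    set, as might be chosen for an East Asian legacy context.
    Everything else counts as one column (a simplification).
    """
    total = 0
    for ch in text:
        width_class = unicodedata.east_asian_width(ch)
        if width_class in ("W", "F"):
            total += 2
        elif width_class == "A" and ambiguous_wide:
            total += 2
        else:
            total += 1
    return total


# ASCII A, fullwidth A, halfwidth KA, ordinary KA, Greek alpha
for ch in ["A", "\uFF21", "\uFF76", "\u30AB", "\u03B1"]:
    print(f"U+{ord(ch):04X} {unicodedata.east_asian_width(ch)} "
          f"-> {display_columns(ch)} column(s)")
```

Making the treatment of Ambiguous characters a parameter reflects that their width is context-dependent; a caller would pick the setting that matches the surrounding environment.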


In OpenType

OpenType has the "fwid", "halt", "hwid" and "vhal" feature tags, which are used to provide the fullwidth or halfwidth form of a character.


See also

* Han unification
* East Asian punctuation
* Em size – full width forms
* Hangul Jamo (Unicode block)
* Katakana (Unicode block)
* Latin script in Unicode
* Enclosed Alphanumerics – bullet point sequences, some appear as full width (e.g. ⒈,⓵,⑴,⒜,ⓐ)


External links


* East Asian Width, Unicode Standard Annex #11