
The Hong Kong Supplementary Character Set (commonly abbreviated to HKSCS) is a set of Chinese characters, 4,702 in total in the initial release, used in Cantonese, as well as when writing the names of some places in Hong Kong (whether in written Cantonese or standard written Chinese sentences). It evolved from the preceding Government Chinese Character Set, or GCCS. GCCS is a set of supplementary Chinese characters coded in the user-defined areas of the Big5 character set. It was originally used within the Hong Kong Government and later by the public. It became the Hong Kong Supplementary Character Set when the characters in the set were submitted to ISO 10646 for coding.


History and versions

The HKSCS has gone through a few iterations.


Big-5 extensions (1995–2009)

HKSCS versions up to HKSCS-2008 are encoded in Big5 (Big5-HKSCS, big5hk) and ISO 10646 (Unicode).
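
As a rough illustration of the two encodings mentioned above, the following Python sketch checks whether a string can be represented in a given legacy codec and round-trips it through Big5-HKSCS. It relies on the `big5` and `big5hkscs` codecs shipped with CPython; which HKSCS edition those tables track is not stated in this article, so results for individual characters are deliberately not listed. The helper names `encodable` and `roundtrip` are made up for this example.

```python
def encodable(text: str, codec: str) -> bool:
    """Return True if every character of `text` exists in `codec`."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


def roundtrip(text: str, codec: str = "big5hkscs") -> str:
    """Encode to the legacy codec and decode back to a Unicode string."""
    return text.encode(codec).decode(codec)


if __name__ == "__main__":
    # Compare plain Big5 with Big5-HKSCS for a few sample characters.
    # Output is printed as code points so it is safe on any terminal.
    for sample in ("\u9999\u6e2f", "\u5572", "\u5497"):
        label = " ".join(f"U+{ord(c):04X}" for c in sample)
        for codec in ("big5", "big5hkscs"):
            print(label, codec, encodable(sample, codec))
```

Running the script prints, for each sample, whether it is encodable in each codec; characters missing from plain Big5 but present in the HKSCS extension would show up as a `False`/`True` pair.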


GCCS

Due to the inherent differences between standard written Chinese and written Cantonese, the Government of Hong Kong recognised the need for a standardised set of ''proprietary'' characters that would streamline electronic communication; at the time, the Big5 Chinese encoding scheme lacked the vast majority of these characters (some were erroneously cross-listed with similar characters). The Government Chinese Character Set, or GCCS, was thus developed by the government. The character set consists of Chinese characters commonly used in Hong Kong. Some characters are Cantonese-specific, while others are alternative forms of characters. The set was not well organised, and its characters had not been closely examined.


HKSCS-1999

Subsequently, HKSCS-1999 (the HKSCS 1999 specification) was developed. 106 GCCS characters were removed in HKSCS-1999 as a result of unification, and their Big5 code points are reserved for compatibility; they can be found in the HKSCS-2008 mapping table. Retired "not verifiable" GCCS characters are found in UTC Sources (UTC-00877–UTC-00898), where they are sourced from Adobe-CNS1-1, an Adobe-CNS1 supplement implemented to support GCCS.


HKSCS-2001 and HKSCS-2004

Following the acceptance of HKSCS-1999, newer revisions were released in 2001 (adding 116 new characters) and in 2004 (adding 123 new characters), bringing the total to 4,941 characters. Starting with HKSCS-2004, all characters that previously used the Private Use Area (PUA) of Unicode (via Microsoft's mapping of the Unicode PUA over the private-use ranges of Big5) were remapped, many of them to characters in the Supplementary Ideographic Plane, such as in the CJK Unified Ideographs Extension B or CJK Compatibility Ideographs Supplement Unicode blocks. However, to preserve compatibility with programs that generated PUA code points, the already-allocated code points are reserved, and no new characters will be mapped to the Private Use Area.
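
To make the remapping concrete, here is a minimal Python sketch that classifies a character by the Unicode ranges named above and lists any BMP Private Use Area code points still present in a string. The block boundaries are the standard Unicode ones (BMP PUA U+E000–U+F8FF, Extension B U+20000–U+2A6DF, CJK Compatibility Ideographs Supplement U+2F800–U+2FA1F); the function names are illustrative only, and the sketch does not know which PUA code points were actually used by HKSCS.

```python
# Half-open Python ranges covering the inclusive Unicode block boundaries.
BMP_PUA = range(0xE000, 0xF900)            # U+E000..U+F8FF
CJK_EXT_B = range(0x20000, 0x2A6E0)        # U+20000..U+2A6DF
CJK_COMPAT_SUPP = range(0x2F800, 0x2FA20)  # U+2F800..U+2FA1F


def classify(ch: str) -> str:
    """Name the range a single character falls into."""
    cp = ord(ch)
    if cp in BMP_PUA:
        return "private-use"
    if cp in CJK_EXT_B:
        return "ext-b"
    if cp in CJK_COMPAT_SUPP:
        return "compat-supplement"
    return "other"


def find_pua(text: str) -> list[tuple[int, str]]:
    """Return (index, 'U+XXXX') for every BMP PUA character in `text`."""
    return [(i, f"U+{ord(c):04X}") for i, c in enumerate(text)
            if ord(c) in BMP_PUA]
```

A check such as `find_pua(document_text)` can flag legacy text that still carries pre-2004 private-use code points and may need conversion.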


HKSCS-2008

Since around 2005, many Hong Kong and Macau websites, including HKGolden, have switched encoding from Big5-HKSCS to Unicode. The last edition of HKSCS to encode all of its characters in Big5 was HKSCS-2008.


Unicode subsets (2015 onwards)


HKSCS-2016

By 2015, efforts were underway in Hong Kong to migrate away from Big5-HKSCS and towards a defined subset of Unicode, at the time tentatively termed the ''Hong Kong Character Set (HKCS)''. This was planned to be published by the end of 2015 as "HKCS-2015", and to have four parts differentiated by different Unihan source prefixes:

* A source prefix followed by Big5 hexadecimal: the character repertoire of HKSCS-2008 in the narrow sense
* A source prefix followed by Big5 hexadecimal: the character repertoire of Big5-ETEN
* A source prefix followed by a four-digit decimal incremental accession number: post-2008 vertical extensions (i.e. newly submitted CJK Unified Ideographs)
* A source prefix followed by a hexadecimal Unicode code point: post-2008 horizontal extensions (i.e. the addition of a Hong Kong reference glyph and source reference to an existing CJK Unified Ideograph)

In particular, 22 horizontal extensions were already planned for inclusion as of June 2015. These were all minor variants of existing Big5 characters containing the 昷/𥁕, 兑/兌 and 吿/告 components, for which Hong Kong font conventions were closer to those of mainland China than to those of Taiwan, but for which the preferred versions had been encoded separately from the Big5 versions in the Unified Repertoire and Ordering due to the source separation rule. Of these 22 characters, 14 were considered "core" characters for Hong Kong use.

By November, a total of 78 requested additions to HKSCS had been received by the Hong Kong government, all of which already existed in Unicode. By this point, the planned horizontal extension was being referred to as an "HKSCS" version, rather than as "HKCS".

Ultimately, HKSCS-2016 added a total of 24 characters relative to HKSCS-2008, 22 of which were the source-separated variants of existing Big5 characters. Since all 24 characters already existed in Unicode, all received source references of the horizontal-extension kind (a source prefix followed by the hexadecimal Unicode code point); the accession-number prefix for vertical extensions was not used. As such, the characters added in HKSCS-2016 are referenced to Unicode only, and were not added to the Big5 extension.


Macao Supplementary Character Set

Similarly to Hong Kong's situation, there are characters needed in Macau that are included in neither Big5 nor HKSCS. Hence, the ''Macao Supplementary Character Set'' (MSCS) was developed, building on HKSCS with additional Unicode-mapped characters. The first batch of 121 MSCS characters was submitted for addition to, or horizontal extension in, Unicode (as appropriate) in 2009. At the time, the term ''Macao Information Systems Character Set'' (''MISCS'') was in use for the entire character set, while "MSCS" referred more narrowly to the additional characters only.

The first final version of MSCS, MSCS-2020, was established in 2021 and uses the following Unihan source prefixes. Although the potential scope of these source prefixes collectively comprises a superset of HKSCS-2016, "MSCS" in a strict sense does not cover the Big5 or HKSCS characters (since it is intended to be combined with HKSCS), except those which are used as base characters for ideographic variation sequences.

* A source prefix followed by Big5 hexadecimal: the character repertoire of HKSCS-2008 in the narrow sense (same as the corresponding Hong Kong prefix)
* A source prefix followed by level number and Big5 hexadecimal: the character repertoire of Big5-ETEN (same as the corresponding Hong Kong prefix)
* A source prefix followed by a five-digit decimal incremental accession number: Macau-specific vertical extensions. This prefix initially had a longer form in 2009, but was shortened for vertical extensions in 2020, while horizontal extensions had their source references replaced with references under the horizontal-extension prefix.
* A source prefix followed by a hexadecimal Unicode code point: Macau-specific horizontal extensions
* A source prefix followed by a hexadecimal Unicode code point: horizontal extensions from HKSCS-2016 (same as the corresponding Hong Kong prefix)
* A source prefix followed by a hexadecimal Unicode code point and a three-digit decimal number: used for variation sequences registered in the Ideographic Variation Database (IVD)


Compatibility


Operating systems

In Microsoft Windows 98, NT 4.0, 2000 and XP, HKSCS support can be enabled using Microsoft's patch. In Microsoft's implementation, an application using code page 950 automatically uses a hidden code page 951 table for the Big5 encoding of the HKSCS extensions. The table supports all code points in HKSCS-2001, except for the compatibility code points specified by the standard. In addition, the patch alters the MingLiU font. The patch is known to create conflicts in applications such as Microsoft Office, and in any application using fonts that support simplified Chinese characters (e.g. SimSun). If the target environment contains custom fonts mapped to the code points affected by Microsoft's patch, the custom fonts can undo the patch. Furthermore, the patch breaks the EUDC Editor supplied with the affected versions of Windows.

Starting with Windows Vista, HKSCS-2004 characters are supported only as Unicode 4.1 or later. HKSCS-2001 and HKSCS-1999 characters are supported both as Big5-HKSCS and as Unicode, but Big5-HKSCS is available only if "Language for non-Unicode programs" is set to "Hong Kong" or "Macau". All characters are assigned standard, non-PUA code points. The characters are displayed with the MingLiU font and can be entered via the keyboard. The patch that provides Big5 encoding of HKSCS is unsupported in Windows Vista and later. A utility provided by Microsoft is available to convert HKSCS and Unicode PUA-encoded characters to their Unicode 4.1 versions.

In 2010, Microsoft published an HKSCS-2004 patch for Windows XP and Windows Server 2003. It replaces the Windows XP versions of MingLiU, PMingLiU and MingLiU_HKSCS (if the HKSCS-2001 patch was applied) with the Windows 7 versions of the same fonts. In addition, the MingLiU-ExtB, MingLiU_HKSCS-ExtB and PMingLiU-ExtB fonts are added to the target system. However, the IME is not updated as it was by the HKSCS-2001 patch, and the fonts are from a pre-release of Windows 7. For earlier versions of the OS, HKSCS support requires the use of Microsoft's patch or the utilities from the Hong Kong government's Digital 21.

IBM assigns CCSID 5471 to the HKSCS-2001 Big5 code page (with CPGID 1374 as CCSID 5470 as the double-byte component), CCSID 9567 to the HKSCS-2004 code page (with CPGID 1374 as CCSID 9566 as the double-byte component), and CCSID 13663 to the HKSCS-2008 code page (with CPGID 1374 as CCSID 13662 as the double-byte component), while CCSID 1375 (with CPGID 1374 as CCSID 1374 as its double-byte component) is assigned to a growing HKSCS code page, currently equivalent to CCSID 13663.

HKSCS support was added to glibc in 2000, but it has not been updated since then. HKSCS-2004 support is handled as Unicode 4.1 and later. For freedesktop.org setups, the ''AR PL ShanHeiSun Uni'' font has fully supported HKSCS-2004 since 0.1-0.dot.1, with the latest revision of HKSCS-2004 supported in version 0.1.20060903-1. Modern desktop distributions (e.g. Ubuntu) include Arphic Technology's HKSCS-compliant UKai and UMing fonts out of the box when Traditional Chinese language support is selected during installation. They can also be installed manually at a later time.

Mac OS X 10.0–10.2 supports HKSCS-1999, and 10.3–10.4 supports HKSCS-2001. Some of the characters added in HKSCS-2004 are supported via the Unicode PUA in OS X 10.4. Starting with OS X 10.5, all HKSCS-2004 characters are supported via standard Unicode 4.1 code points.
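
The PUA-to-standard conversion mentioned above for Windows can be pictured as a simple table-driven substitution. The sketch below is not Microsoft's utility and does not contain the real HKSCS mapping; it assumes the caller supplies a dictionary from legacy private-use code points to their standard code points, and the function name `remap_pua` is hypothetical.

```python
def remap_pua(text: str, table: dict[int, int]) -> str:
    """Replace legacy private-use code points using a caller-supplied table.

    `table` maps a PUA code point to its standard Unicode code point.
    The real HKSCS mapping data is NOT included here; it would have to be
    loaded from an official mapping table.
    """
    # str.translate accepts a dict keyed by code point (int).
    return text.translate({src: chr(dst) for src, dst in table.items()})


if __name__ == "__main__":
    # Placeholder entry for demonstration only, not an actual HKSCS mapping.
    demo_table = {0xE000: 0x20000}
    converted = remap_pua("x\ue000y", demo_table)
    print([f"U+{ord(c):04X}" for c in converted])
```

Characters without an entry in the table pass through unchanged, so a conversion of this kind can be applied repeatedly without harm.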


Applications and the Web

Mozilla 1.5 and above supports HKSCS, with HKSCS-2004 support added in the Gecko 1.8.1 code base. Unlike the above-mentioned patch, Mozilla uses its own code page table. However, the fix for bug 343129 does not support characters mapped to code points above the Basic Multilingual Plane.

Qt 3.x-based applications (e.g. KDE) only support characters mapped to code points U+FFFF or lower. In Qt 4, characters outside the BMP are supported via surrogates. The Big5-HKSCS text codec already supported HKSCS-1999 in Qt-2.3.x, but this came too late in the Qt development schedule to be officially included in the Qt-2.3.x series, so it was officially supported from Qt-3.0.1. HKSCS-2001 support was added in Qt-3.0.5.
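
The surrogate mechanism referred to above is the standard UTF-16 one: a code point above U+FFFF is split into a high and a low 16-bit unit. The following Python sketch shows the arithmetic; it illustrates UTF-16 in general rather than Qt's own implementation, and the function name is invented for this example.

```python
def to_surrogates(cp: int) -> tuple[int, int]:
    """Split a supplementary-plane code point into a UTF-16 surrogate pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("code point is not above the Basic Multilingual Plane")
    v = cp - 0x10000                  # 20-bit value
    high = 0xD800 + (v >> 10)         # top 10 bits
    low = 0xDC00 + (v & 0x3FF)        # bottom 10 bits
    return high, low


if __name__ == "__main__":
    # First code point of CJK Unified Ideographs Extension B.
    high, low = to_surrogates(0x20000)
    print(f"{high:04X} {low:04X}")    # prints "D840 DC00"
```

Software limited to 16-bit code points cannot address such a character as a single unit, which is why HKSCS characters assigned to the Supplementary Ideographic Plane needed explicit surrogate support.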
GNOME supports HKSCS characters in Unicode ranges, except those mapped to the Basic Multilingual Plane compatibility block. Patches to support characters mapped above the Basic Multilingual Plane were introduced during Pango 1.1.
The WHATWG Encoding Standard (used by HTML5) includes HKSCS in its definition of Big5 (used even with the plain Big5 label). However, only its decoder uses all HKSCS extensions, while its encoder explicitly excludes those with lead bytes below 0xA1 (thus excluding most of the HKSCS extensions but including, for example, those inherited from Big5 ETEN). Newer browsers, including Firefox, follow this standard.
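
The lead-byte cut-off described above can be sketched in Python as follows. The pointer arithmetic follows the Big5 pointer formula of the Encoding Standard as understood here (157 trail positions per lead byte, counted from lead byte 0x81), but the sketch only models the threshold: it does not consult the standard's actual Big5 index or its other encoder rules, and the function names are hypothetical.

```python
# Pointers below this value correspond to lead bytes 0x81..0xA0,
# which the encoder side leaves out.
ENCODER_MIN_POINTER = (0xA1 - 0x81) * 157


def big5_pointer(lead: int, trail: int) -> int:
    """Compute the index pointer for a two-byte Big5 sequence."""
    if not 0x81 <= lead <= 0xFE:
        raise ValueError("lead byte out of range")
    if 0x40 <= trail <= 0x7E:
        offset = 0x40
    elif 0xA1 <= trail <= 0xFE:
        offset = 0x62
    else:
        raise ValueError("trail byte out of range")
    return (lead - 0x81) * 157 + (trail - offset)


def encoder_may_emit(lead: int, trail: int) -> bool:
    """True if the byte pair lies in the region the encoder is allowed to use."""
    return big5_pointer(lead, trail) >= ENCODER_MIN_POINTER
```

For example, `encoder_may_emit(0x88, 0x40)` is `False` because the lead byte is below 0xA1, while `encoder_may_emit(0xA1, 0x40)` is `True`; a decoder following the standard would still accept both regions on input.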


See also

* Cantonese
* Written Cantonese


External links


* Hong Kong Government site on the HKSCS
* Downloadable HKSCS documents & font (Wayback Machine snapshot)
* Common Chinese Language Interface
* Digital Policy Office of the Hong Kong Government
* Microsoft HKSCS Support for Windows Platform (Wayback Machine snapshot)
* Download page of Dynalab's HKSCS font
* Graphical View of Big5-HKSCS in ICU's Converter Explorer
* A character set that works on Mac OS X
* UMing/UKai – A free, open-source font supporting HKSCS
* Open Source Hong Kong Fonts Project