ARIB STD B24 Character Set
Volume 1 of the Association of Radio Industries and Businesses (ARIB) STD-B24 standard for Broadcast Markup Language specifies, amongst other details, a character encoding for use in Japanese-language broadcasting. The latest revision is version 6.3. It includes a number of characters not found in the base standards (JIS X 0208 and JIS X 0201). It was the source standard for many symbol characters which were added to Unicode, including portions of the Miscellaneous Symbols, Enclosed Alphanumeric Supplement and Enclosed Ideographic Supplement blocks. Its contributions partially overlap the Unicode emoji, but were added a year earlier, in Unicode 5.2. Fascicle 1 of the ARIB STD-B62 standard, published in 2014, defines Unicode mappings for a selection of the B24 extended characters (excluding, for example, those duplicated by JIS X 0213), as well as a few extended Kanji. It also includes a mapping of utilised characters outside the Basic Multilingual Pl ...
JIS X 0201
JIS X 0201, a Japanese Industrial Standard developed in 1969, was the first Japanese electronic character set to become widely used. The character set was initially known as JIS C 6220 before the JIS category reform. It had two forms, a 7-bit encoding and an 8-bit encoding; the 8-bit form was dominant until Unicode (specifically UTF-8) replaced it. The full name of this standard is "7-bit and 8-bit coded character sets for information interchange". The first 96 codes comprise an ISO 646 variant, mostly following ASCII with some differences, while the second 96 character codes represent the phonetic Japanese katakana signs. Since the encoding does not provide any way to express hiragana or kanji, it is only capable of expressing simplified written Japanese. Nevertheless, this simplification can represent the full range of sounds in the language. In the 1970s, this was acceptable for media such as text mode computer terminals, telegrams, r ...
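The two-half layout described above can be sketched as a small decoder for the 8-bit form. This is a minimal illustration, not a reference implementation: the function name `decode_jis_x_0201` is hypothetical, and the specific details it relies on (yen sign at 0x5C and overline at 0x7E in the Roman half, katakana assigned at 0xA1 to 0xDF and mapped to the Unicode halfwidth forms starting at U+FF61) are assumptions drawn from general knowledge of the standard rather than from the text above.

```python
# Hypothetical helper: decode the 8-bit form of JIS X 0201 to a Unicode string.
# Assumed differences from ASCII in the Roman half (not stated in the text above).
ROMAN_OVERRIDES = {0x5C: "\u00A5", 0x7E: "\u203E"}  # yen sign, overline

# Assumed offset: byte 0xA1 corresponds to U+FF61 (halfwidth ideographic full stop).
KATAKANA_OFFSET = 0xFF61 - 0xA1


def decode_jis_x_0201(data: bytes) -> str:
    """Map each byte to one character; reject bytes with no assumed assignment."""
    out = []
    for b in data:
        if b in ROMAN_OVERRIDES:
            out.append(ROMAN_OVERRIDES[b])
        elif b < 0x80:
            out.append(chr(b))                    # Roman half, same as ASCII here
        elif 0xA1 <= b <= 0xDF:
            out.append(chr(b + KATAKANA_OFFSET))  # halfwidth katakana half
        else:
            raise ValueError(f"unassigned byte 0x{b:02X}")
    return "".join(out)


if __name__ == "__main__":
    print(decode_jis_x_0201(b"\xb6\xc5 \\100"))
```

Because every byte maps to exactly one character, no state has to be tracked between bytes, which is part of why the 8-bit form suited simple terminals.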
Enclosed Alphanumeric Supplement
Enclosed Alphanumeric Supplement is a Unicode block consisting of Latin alphabet characters and Arabic numerals enclosed in circles, ovals or boxes, used for a variety of purposes. It is encoded in the range U+1F100–U+1F1FF in the Supplementary Multilingual Plane. The block is mostly an extension of the Enclosed Alphanumerics block, containing further enclosed alphanumeric characters which are not included in that block or in Enclosed CJK Letters and Months. Most of the characters are single alphanumerics in boxes or circles, or with trailing commas. Two of the symbols are identified as dingbats. A number of multiple-letter enclosed abbreviations are also included, mostly to provide compatibility with Broadcast Markup Language standards (see ARIB STD B24 character set) and Japanese telecommunications networks' emoji sets. The block also includes the regional indicator symbols to be used for emoji country flag support. The Enclosed Alphanumeric Supplement ...
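As a rough sketch of how the block range and the regional indicator symbols are used in practice, the snippet below tests block membership and builds a flag sequence from a two-letter country code. The function names are hypothetical, and the starting code point of the regional indicators (U+1F1E6 for the letter A) is an assumption not stated in the text above.

```python
# Block range as given in the text: U+1F100 to U+1F1FF.
BLOCK_START, BLOCK_END = 0x1F100, 0x1F1FF

# Assumption: REGIONAL INDICATOR SYMBOL LETTER A sits at U+1F1E6,
# so the 26 letters end exactly at the last code point of the block.
REGIONAL_INDICATOR_A = 0x1F1E6


def in_enclosed_alphanumeric_supplement(ch: str) -> bool:
    """Return True if the single character lies inside the block range."""
    return BLOCK_START <= ord(ch) <= BLOCK_END


def flag_sequence(country_code: str) -> str:
    """Turn a two-letter code such as 'JP' into a pair of regional indicators."""
    code = country_code.upper()
    if len(code) != 2 or not all("A" <= c <= "Z" for c in code):
        raise ValueError("expected two ASCII letters")
    return "".join(chr(REGIONAL_INDICATOR_A + ord(c) - ord("A")) for c in code)


if __name__ == "__main__":
    flag = flag_sequence("JP")
    print([f"U+{ord(c):X}" for c in flag])
    print(all(in_enclosed_alphanumeric_supplement(c) for c in flag))
```

Whether the resulting pair is drawn as a flag depends on the font and renderer; the code only produces the code point sequence.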
Space Character
A whitespace character is a character data element that represents white space when text is rendered for display by a computer. For example, a space character (ASCII 32) represents blank space such as a word divider in a Western script. A printable character results in output when rendered, but a whitespace character does not. Instead, whitespace characters define the layout of text to a limited degree, interrupting the normal sequence of rendering characters next to each other. The output of subsequent characters is typically shifted to the right (or to the left for right-to-left script) or to the start of the next line. The effect of multiple sequential whitespace characters is cumulative, such that the next printable character is rendered at a location based on the accumulated effect of preceding whitespace characters. The origin of the term "whitespace" is rooted in the common practice of rendering text on white paper. Normally, a whitespace character is ''n ...
ISO-IR-137
The character sets used by Videotex are based, to greater or lesser extents, on ISO/IEC 2022. Three Data Syntax systems are defined by ITU T.101, corresponding to the Videotex systems of different countries. Data Syntax 1 is defined in Annex B of T.101:1994. It is based on the CAPTAIN system used in Japan. Its graphical sets include JIS X 0201 and JIS X 0208, and its G-sets are available through ISO/IEC 2022-based designation escapes. The mosaic sets for Data Syntax 1 supply characters for use in semigraphics. Data Syntax 2 is defined in Annex C of T.101:1994. It corresponds to some European Videotex systems such as CEPT T/CD 06-01. The graphical character coding of Data Syntax 2 is based on T.51. The default G2 set of Data Syntax 2 is based on an older version of T.51, lacking the non-breaking space, soft hyphen, not sign (¬) and broken bar (¦) present in the current version, but adding a ...
Pseudographics
Text-based semigraphics, pseudographics, or character graphics is a primitive method used in early text mode video hardware to emulate raster graphics without having to implement the logic for such a display mode. There are two different ways to accomplish the emulation of raster graphics. The first one is to create a low-resolution all points addressable mode using a set of special characters with all binary combinations of a certain subdivision matrix of the text mode character size; this method is referred to as block graphics, or sometimes mosaic graphics. The second one is to use special shapes instead of glyphs (letters and figures) that appear as if drawn in raster graphics mode, sometimes referred to as semi- or pseudo-graphics; an important example of this is box-drawing characters. Semigraphical characters (including some block elements) are still incorporated into the BIOS of any VGA compatible vi ...
JISCII
Code page 895 (CCSID 895) is a 7-bit character set and is Japan's national ISO 646 variant. It is the Roman set (first or left half) of the JIS X 0201 (formerly JIS C 6220) Japanese Standard and is variously called Japan 7-Bit Latin, JISCII, JIS Roman, JIS C6220-1969-ro, ISO646-JP or Japanese-Roman. Its ISO-IR registration number is 14. Amongst IBM's code pages, it accompanies code page 896 (half-width katakana), which encodes the Kana set of JIS X 0201 with extensions, and code page 897, which encodes the 8-bit form of JIS X 0201. It is used in Unix-like systems and, when combined with code page 896 and the 2-byte IBM code pages 952 and 953, makes up the four code-sets of code page 954, one of IBM's versions of EUC-JP. See also: Shift JIS. Shift JIS (also SJIS, MIME name Shift_JIS, known as PCK in Solaris contexts) is a character encoding for the Japanese language, originally developed by the Japanese company ASCII Corporation in conjun ...
ISO-2022-JP
ISO/IEC 2022, "Information technology - Character code structure and extension techniques", is an ISO/IEC standard in the field of character encoding. It is equivalent to the ECMA standard ECMA-35, the ANSI standard ANSI X3.41 and the Japanese Industrial Standard JIS X 0202. Originating in 1971, it was most recently revised in 1994. ISO 2022 specifies a general structure which character encodings can conform to, dedicating particular ranges of bytes (hexadecimal 0x00–1F and 0x7F–9F) to non-printing control codes (the C0 and C1 control codes) for formatting and in-band instructions (such as line breaks or formatting instructions for text terminals), rather than graphical characters. It also specifies a syntax for escape sequences, multiple-byte sequences beginning with the ESC control code, which can likewise be used for in-band instr ...
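The byte ranges and escape sequences described above can be observed with a short sketch. The helper name `is_control_byte` is hypothetical; the codec name `iso2022_jp` is Python's built-in ISO-2022-JP codec, used here only as one convenient ISO 2022-conforming encoding, and the particular escape sequences it emits are a property of that encoding rather than something stated in the text above.

```python
ESC = 0x1B  # the control code that begins every escape sequence


def is_control_byte(b: int) -> bool:
    """True for the ranges the text reserves for control codes: 0x00-1F and 0x7F-9F."""
    if not 0 <= b <= 0xFF:
        raise ValueError("not a byte value")
    return b <= 0x1F or 0x7F <= b <= 0x9F


# Encode a Latin letter followed by HIRAGANA LETTER A (U+3042).
encoded = "A\u3042".encode("iso2022_jp")

# Count the escape sequences the codec inserted to switch character sets.
escape_count = encoded.count(bytes([ESC]))

if __name__ == "__main__":
    print(encoded)
    print(escape_count)
    print([hex(b) for b in encoded if is_control_byte(b)])
```

The output makes the in-band switching visible: one escape sequence selects the two-byte kanji set before the hiragana, and another returns to the single-byte set at the end.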
Private Use Areas
In Unicode, a Private Use Area (PUA) is a range of code points that, by definition, will not be assigned characters by the standard. Three Private Use Areas are defined: one in the Basic Multilingual Plane (U+E000–U+F8FF), and one each in, and nearly covering, planes 15 and 16 (U+F0000–U+FFFFD, U+100000–U+10FFFD). They are intentionally left undefined so that third parties may assign their own characters without conflicting with Unicode Standard assignments. Under the Unicode Stability Policy, the Private Use Areas will remain allocated for that purpose in all future Unicode versions. Assignments to private-use code points need not be "private" in the sense of strictly internal to an organisation; a number of assignment schemes have been published by several organisations. Such publication may include a font that supports the definition (showing the glyphs), and software making use of the private-use characters (e.g., a graphics character for a "print document" function). By definition, multiple private parties may assign d ...
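A minimal sketch of a private-use check follows. The function name `is_private_use` is hypothetical, and the three ranges are hard-coded from general knowledge of the Unicode Standard (U+E000 to U+F8FF, U+F0000 to U+FFFFD, U+100000 to U+10FFFD); they should be treated as an assumption to verify against the standard. The standard library's `unicodedata` module offers an independent cross-check through the general category `Co`.

```python
import unicodedata

# Assumed Private Use Area ranges (inclusive), one per area.
PRIVATE_USE_RANGES = (
    (0xE000, 0xF8FF),      # Basic Multilingual Plane
    (0xF0000, 0xFFFFD),    # plane 15
    (0x100000, 0x10FFFD),  # plane 16
)


def is_private_use(cp: int) -> bool:
    """Return True if the code point falls inside one of the three ranges."""
    return any(lo <= cp <= hi for lo, hi in PRIVATE_USE_RANGES)


if __name__ == "__main__":
    for cp in (0xE000, 0xF900, 0xFFFFD, 0xFFFFE, 0x10FFFD):
        # unicodedata reports private-use code points as category "Co".
        category = unicodedata.category(chr(cp))
        print(f"U+{cp:04X}", is_private_use(cp), category)
```

The last two code points of planes 15 and 16 are excluded because they are noncharacters, which is why the areas only "nearly" cover those planes.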
Basic Multilingual Plane
In the Unicode standard, a plane is a contiguous group of 65,536 (2^16) code points. There are 17 planes, identified by the numbers 0 to 16, which correspond to the possible values 00–10 (hexadecimal) of the first two positions in the six-position hexadecimal format (U+hhhhhh). Plane 0 is the Basic Multilingual Plane (BMP), which contains most commonly used characters. The higher planes 1 through 16 are called "supplementary planes". The last code point in Unicode is the last code point in plane 16, U+10FFFF. As of the Unicode version current when this was written, five of the planes have assigned code points (characters), and seven are named. The limit of 17 planes is due to UTF-16, which can encode 2^20 code points (16 planes) as pairs of words, plus the BMP as a single word. UTF-8 was designed with a much larger limit of 2^31 (2,147,483,648) code points (32,768 planes), and would still be able to encode 2^21 (2,097,152) code points (32 planes) even under the current limit of 4 bytes. The 17 planes can accommodate 1,114 ...
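The plane arithmetic above can be checked with a few lines of code. This is an illustrative sketch with hypothetical function names; the surrogate base values 0xD800 and 0xDC00 come from the UTF-16 encoding form and are not stated in the text above.

```python
PLANE_SIZE = 1 << 16     # 65,536 code points per plane
LAST_CODE_POINT = 0x10FFFF


def plane_of(cp: int) -> int:
    """Return the plane number (0 to 16) of a code point."""
    if not 0 <= cp <= LAST_CODE_POINT:
        raise ValueError("outside the Unicode code space")
    return cp >> 16


def utf16_units(cp: int) -> list[int]:
    """Encode a code point as one 16-bit word (BMP) or a surrogate pair."""
    if plane_of(cp) == 0:
        return [cp]
    offset = cp - 0x10000  # 20-bit value covering the 16 supplementary planes
    return [0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)]


if __name__ == "__main__":
    # 2^20 supplementary code points plus the 2^16 of the BMP give 17 planes.
    total = (1 << 20) + PLANE_SIZE
    print(total, total // PLANE_SIZE)
    print(plane_of(0x1F100), [hex(u) for u in utf16_units(0x1F600)])
```

Each surrogate carries 10 bits of the offset, which is where the 2^20 figure for word pairs comes from.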
JIS X 0213
JIS X 0213 is a Japanese Industrial Standard defining coded character sets for encoding the characters used in Japan. This standard extends JIS X 0208. The first version was published in 2000 and revised in 2004 (JIS2004) and 2012. As well as adding a number of special characters, characters with diacritic marks, etc., it included an additional 3,625 kanji. JIS X 0213 has two "planes" (94×94 character tables). Plane 1 is a superset of JIS X 0208 containing kanji sets level 1 to 3 and non-kanji characters such as Hiragana, Katakana (including letters used to write the Ainu language), Latin, Greek and Cyrillic alphabets, digits, symbols and so on. Plane 2 contains only the level 4 kanji set. The total number of defined characters is 11,233. Each character is capable of being encoded in two bytes. This standard largely replaced the rarely used JIS X 0212-1990 "supplementary" standard, which included 5,801 kanji and 266 non-kanji. Of the additional 3 ...
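The 94×94 table structure can be illustrated with a short sketch that converts a row and cell number into a byte pair and computes the capacity of the two planes. The function name `row_cell_to_bytes` is hypothetical, the 0x20 offset is the usual convention for 94-character sets under ISO/IEC 2022 rather than something stated above, and the example position (row 4, cell 2 for the hiragana letter A) is an assumption drawn from the JIS X 0208 layout that plane 1 extends.

```python
ROWS = CELLS = 94  # each plane is a 94 x 94 table


def row_cell_to_bytes(row: int, cell: int) -> bytes:
    """Convert a 1-based row/cell position to its 7-bit byte pair (0x21 to 0x7E each)."""
    if not (1 <= row <= ROWS and 1 <= cell <= CELLS):
        raise ValueError("row and cell must be in 1..94")
    return bytes([0x20 + row, 0x20 + cell])


# Capacity of the two planes versus the number of characters the text says are defined.
POSITIONS_PER_PLANE = ROWS * CELLS
TOTAL_POSITIONS = 2 * POSITIONS_PER_PLANE
DEFINED_CHARACTERS = 11_233

if __name__ == "__main__":
    pair = row_cell_to_bytes(4, 2)
    print(pair, POSITIONS_PER_PLANE, TOTAL_POSITIONS - DEFINED_CHARACTERS)
    # Setting the high bit of each byte gives the EUC form used by Python's
    # built-in euc_jisx0213 codec for plane 1 characters.
    print(bytes(b | 0x80 for b in pair) == "\u3042".encode("euc_jisx0213"))
```

The byte pair here addresses a position within a single plane; how a given encoding signals plane 2 differs between encodings and is outside the scope of this sketch.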