Variation Selector

A variant form is a different glyph for a character, encoded in Unicode through the mechanism of variation sequences: sequences that consist of a base character followed by a variation selector character.

A variant form usually has a very similar appearance and meaning to its base form. The mechanism is intended for variant forms where, in general, displaying the base character when the variant form is unavailable does not change the meaning of the text and may not even be noticeable to many readers.

Unicode defines two types of variation sequences:
* Standardized variation sequences, defined in StandardizedVariants.txt
* Ideographic variation sequences, defined in the Ideographic Variation Database (IVD)

Variation selector characters reside in several Unicode blocks:
* Variation Selectors (16 characters, abbreviated VS1–VS16)
* Variation Selectors Supplement (240 characters, abbreviated VS17–VS256)
* Mongolian (4 characters, abbreviated FVS1–FVS4)

Variation selectors are not required for Arabic and Latin cursive characters, where glyph substitution can occur based on context: glyphs may be connected together depending on whether the character is the initial character in a word, the final character, a medial character or an isolated character. These substitutions can be resolved from the character's context alone, with no additional input from the author. Authors may also use special-purpose characters such as joiners and non-joiners to force an alternate glyph form where it would not otherwise appear. Ligatures are a similar case: glyphs may be substituted simply by turning ligatures on or off as a rich text attribute.

For other kinds of glyph substitution, the author's intent cannot be determined contextually and may need to be encoded with the text. This is the case with the characters or glyphs referred to as gaiji, where different glyphs are used for the same character, either historically or for ideographs in family names. This is one of the gray areas in distinguishing between a glyph and a character: if a family name differs slightly from the ideograph it derives from, is that a simple glyph variant or a character variant? Character substitutions may also occur outside of Unicode, for example with OpenType Layout tags.
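As a minimal sketch of the mechanism described above, the following Python builds a variation sequence by appending a selector to a base character, names a selector by its abbreviation, and strips selectors to fall back to the base text. The code point ranges (U+FE00–U+FE0F, U+E0100–U+E01EF, and U+180B–U+180D plus U+180F for Mongolian) are not given in the text above and are taken from the Unicode code charts; all function names are hypothetical.

```python
from typing import Optional


def selector_name(ch: str) -> Optional[str]:
    """Return the abbreviation (VS1..VS256, FVS1..FVS4) or None if ch is not a selector."""
    cp = ord(ch)
    if 0xFE00 <= cp <= 0xFE0F:          # Variation Selectors block
        return f"VS{cp - 0xFE00 + 1}"
    if 0xE0100 <= cp <= 0xE01EF:        # Variation Selectors Supplement block
        return f"VS{cp - 0xE0100 + 17}"
    if 0x180B <= cp <= 0x180D:          # Mongolian free variation selectors 1-3
        return f"FVS{cp - 0x180B + 1}"
    if cp == 0x180F:                    # FVS4, added in Unicode 14.0
        return "FVS4"
    return None


def variation_sequence(base: str, selector_number: int) -> str:
    """Append VS<selector_number> (1..256) to a single base character."""
    if len(base) != 1:
        raise ValueError("base must be a single character")
    if 1 <= selector_number <= 16:
        return base + chr(0xFE00 + selector_number - 1)
    if 17 <= selector_number <= 256:
        return base + chr(0xE0100 + selector_number - 17)
    raise ValueError("selector_number must be in 1..256")


def strip_selectors(text: str) -> str:
    """Drop every variation selector, leaving only the base characters."""
    return "".join(ch for ch in text if selector_name(ch) is None)
```

Stripping selectors mirrors the intended fallback: a renderer without the variant glyph shows the base character, and the text keeps its meaning. This sketch does not check whether a given base and selector pair is actually a defined sequence.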


Blocks with standardized variation sequences

As of Unicode 15.0, standardized variation sequences specifically for emoji/text presentation are defined for base characters in twenty blocks:
* Arrows
* Basic Latin
* CJK Symbols and Punctuation
* Dingbats
* Emoticons
* Enclosed Alphanumeric Supplement
* Enclosed Alphanumerics
* Enclosed CJK Letters and Months
* Enclosed Ideographic Supplement
* General Punctuation
* Geometric Shapes
* Latin-1 Supplement
* Letterlike Symbols
* Mahjong Tiles
* Miscellaneous Symbols
* Miscellaneous Symbols and Arrows
* Miscellaneous Symbols and Pictographs
* Miscellaneous Technical
* Supplemental Arrows-B
* Transport and Map Symbols

Other standardized variation sequences are formed with base characters in the following fourteen blocks:
* CJK Unified Ideographs
* CJK Unified Ideographs Extension A
* CJK Unified Ideographs Extension B
* Egyptian Hieroglyph Format Controls
* Egyptian Hieroglyphs
* Halfwidth and Fullwidth Forms
* Manichaean
* Mathematical Alphanumeric Symbols
* Mathematical Operators
* Mongolian
* Myanmar
* Myanmar Extended-A
* Phags-pa
* Supplemental Mathematical Operators
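The emoji/text presentation sequences mentioned in this section can be illustrated with a short Python sketch. It assumes, as a fact not stated in the text above, that VS15 (U+FE0E) requests text presentation and VS16 (U+FE0F) requests emoji presentation; the helper names are hypothetical, and U+2764 HEAVY BLACK HEART from the Dingbats block is used only as a sample base character.

```python
TEXT_PRESENTATION = "\uFE0E"   # VS15: request text-style (monochrome) rendering
EMOJI_PRESENTATION = "\uFE0F"  # VS16: request emoji-style rendering


def with_presentation(base: str, emoji: bool) -> str:
    """Return base followed by VS16 (emoji) or VS15 (text).

    Any presentation selector already trailing the base is replaced,
    so the result never carries two selectors in a row.
    """
    bare = base.rstrip(TEXT_PRESENTATION + EMOJI_PRESENTATION)
    return bare + (EMOJI_PRESENTATION if emoji else TEXT_PRESENTATION)


def code_points(text: str) -> str:
    """Format a string as a list of U+XXXX code points for inspection."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


heart = "\u2764"  # HEAVY BLACK HEART (Dingbats block)
print(code_points(with_presentation(heart, emoji=True)))   # U+2764 U+FE0F
print(code_points(with_presentation(heart, emoji=False)))  # U+2764 U+FE0E
```

The sketch does not verify that the base character has a standardized sequence defined for it; a selector appended to a character without one is expected to have no visible effect.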


Blocks with ideographic variation sequences

Ideographic variation sequences are defined for base characters in nine blocks:
* CJK Compatibility Ideographs
* CJK Unified Ideographs
* CJK Unified Ideographs Extension A
* CJK Unified Ideographs Extension B
* CJK Unified Ideographs Extension C
* CJK Unified Ideographs Extension D
* CJK Unified Ideographs Extension E
* CJK Unified Ideographs Extension F
* CJK Unified Ideographs Extension H
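A rough way to spot possible ideographic variation sequences in a string is to look for a CJK ideograph immediately followed by a selector from the Variation Selectors Supplement. The Python sketch below does this under two assumptions that are not stated in the text above: that ideographic variation sequences use only VS17–VS256 (U+E0100–U+E01EF), and that a character counts as an ideograph when its Unicode name starts with "CJK UNIFIED IDEOGRAPH-" or "CJK COMPATIBILITY IDEOGRAPH-". The function names are hypothetical, and coverage of newer blocks depends on the Unicode version bundled with the Python interpreter.

```python
import unicodedata
from typing import List, Tuple

IDEOGRAPH_PREFIXES = ("CJK UNIFIED IDEOGRAPH-", "CJK COMPATIBILITY IDEOGRAPH-")


def is_ideograph(ch: str) -> bool:
    """True if the character's Unicode name marks it as a CJK ideograph."""
    # The empty default avoids ValueError for characters unknown to this Python build.
    return unicodedata.name(ch, "").startswith(IDEOGRAPH_PREFIXES)


def is_supplement_selector(ch: str) -> bool:
    """True for VS17..VS256 in the Variation Selectors Supplement block."""
    return 0xE0100 <= ord(ch) <= 0xE01EF


def find_ivs_candidates(text: str) -> List[Tuple[str, int]]:
    """Return (base ideograph, selector number) for each ideograph + VS17..VS256 pair."""
    found = []
    for base, selector in zip(text, text[1:]):
        if is_ideograph(base) and is_supplement_selector(selector):
            found.append((base, ord(selector) - 0xE0100 + 17))
    return found
```

A match is only a candidate: whether the pair is a registered sequence has to be checked against the Ideographic Variation Database itself.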


See also

* Unicode control characters
* Variant Chinese characters
* List of typographic features

