UCS-PUP16

In the Unicode standard, a plane is a contiguous group of 65,536 (2^16) code points. There are 17 planes, identified by the numbers 0 to 16, which correspond to the possible values 00–10 (hexadecimal) of the first two positions in the six-position hexadecimal format (U+''hhhhhh''). Plane 0 is the Basic Multilingual Plane (BMP), which contains most commonly used characters. The higher planes 1 through 16 are called "supplementary planes". The last code point in Unicode is the last code point in plane 16, U+10FFFF. As of the current Unicode version, five of the planes have assigned code points (characters), and seven are named.

The limit of 17 planes is due to UTF-16, which can encode 2^20 code points (16 planes) as pairs of 16-bit words, plus the BMP as a single word. UTF-8 was designed with a much larger limit of 2^31 (2,147,483,648) code points (32,768 planes), and would still be able to encode 2^21 (2,097,152) code points (32 planes) even under the current limit of 4 bytes.

The 17 planes can accommodate 1,114,112 code points. Of these, 2,048 are surrogates (used to make the pairs in UTF-16), 66 are non-characters, and 137,468 are reserved for private use, leaving 974,530 for public assignment.

Planes are further subdivided into Unicode blocks, which, unlike planes, do not have a fixed size. The 338 blocks defined in Unicode cover 27% of the possible code point space, and range in size from a minimum of 16 code points (sixteen blocks) to a maximum of 65,536 code points (Supplementary Private Use Area-A and -B, which constitute the entirety of planes 15 and 16). For future usage, ranges of characters have been tentatively mapped out for most known current and ancient writing systems.
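The plane arithmetic described above can be checked with a short sketch. The Python below is illustrative only: `plane_of` and the constant names are assumptions introduced here, not identifiers from the Unicode Standard. The text gives only the totals for non-characters and private-use code points; the range breakdown used in the comments (U+FDD0–U+FDEF plus the last two code points of every plane, and U+E000–U+F8FF plus planes 15 and 16) is taken from the standard's definitions rather than from this article.

```python
PLANE_SIZE = 1 << 16        # 65,536 code points per plane
PLANE_COUNT = 17            # planes 0 through 16
MAX_CODE_POINT = 0x10FFFF   # last code point of plane 16


def plane_of(code_point: int) -> int:
    """Return the plane number (0..16) that contains a code point."""
    if not 0 <= code_point <= MAX_CODE_POINT:
        raise ValueError(f"not a Unicode code point: {code_point:#x}")
    # The plane is everything above the low 16 bits, i.e. the first two
    # positions of the six-position form U+hhhhhh.
    return code_point >> 16


# Code point accounting for the whole code space.
total = PLANE_SIZE * PLANE_COUNT

# Surrogates: U+D800..U+DFFF.
surrogates = 0xDFFF - 0xD800 + 1

# Non-characters: U+FDD0..U+FDEF plus the last two code points of each plane.
noncharacters = (0xFDEF - 0xFDD0 + 1) + 2 * PLANE_COUNT

# Private use: U+E000..U+F8FF in the BMP, plus planes 15 and 16
# without their two trailing non-characters each.
private_use = (0xF8FF - 0xE000 + 1) + 2 * (PLANE_SIZE - 2)

public = total - surrogates - noncharacters - private_use

print(f"total={total:,} surrogates={surrogates:,} "
      f"noncharacters={noncharacters} private_use={private_use:,} "
      f"public={public:,}")
```

For example, `plane_of(0x0041)` gives 0 (the BMP), while `plane_of(0x10FFFF)` gives 16.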

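The encoding limits mentioned in the introduction follow from simple bit counting. The sketch below is a hedged illustration, not a reference implementation: the function names are hypothetical, and the UTF-8 lead-byte layout (a lead byte carrying `7 - length` payload bits, followed by continuation bytes of 6 bits each) is assumed from the usual description of the encoding rather than stated in this article.

```python
SUPPLEMENTARY_BASE = 0x10000  # first code point outside the BMP


def to_surrogate_pair(code_point: int) -> tuple[int, int]:
    """Split a supplementary-plane code point into (high, low) surrogates."""
    if not SUPPLEMENTARY_BASE <= code_point <= 0x10FFFF:
        raise ValueError("only planes 1..16 are encoded as surrogate pairs")
    offset = code_point - SUPPLEMENTARY_BASE   # a 20-bit value
    high = 0xD800 + (offset >> 10)             # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)            # bottom 10 bits
    return high, low


def from_surrogate_pair(high: int, low: int) -> int:
    """Recombine a (high, low) surrogate pair into one code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid high/low surrogate pair")
    return SUPPLEMENTARY_BASE + ((high - 0xD800) << 10) + (low - 0xDC00)


# 1,024 high surrogates x 1,024 low surrogates = 2^20 code points = 16 planes.
utf16_pair_capacity = (0xDBFF - 0xD800 + 1) * (0xDFFF - 0xDC00 + 1)


def utf8_payload_bits(length: int) -> int:
    """Payload bits carried by a multi-byte UTF-8 sequence of 2..6 bytes."""
    if not 2 <= length <= 6:
        raise ValueError("multi-byte sequences are 2 to 6 bytes long")
    lead_bits = 7 - length                     # e.g. 11110xxx holds 3 bits
    return lead_bits + 6 * (length - 1)        # each 10xxxxxx holds 6 bits


print(utf16_pair_capacity == 2 ** 20)          # 16 planes via surrogate pairs
print(utf8_payload_bits(4), utf8_payload_bits(6))
```

Adding the BMP (one plane, encoded as a single 16-bit word) to the 16 planes reachable through surrogate pairs gives the 17-plane limit; a 4-byte UTF-8 sequence carries 21 payload bits and the original 6-byte form carries 31, matching the 2^21 and 2^31 figures above.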

Overview


Assigned characters


Basic Multilingual Plane

The first plane, plane 0, the Basic Multilingual Plane (BMP), contains characters for almost all modern languages, and a large number of symbols. A primary objective for the BMP is to support the unification of prior character sets as well as characters for writing. Most of the assigned code points in the BMP are used to encode Chinese, Japanese, and Korean (CJK) characters.

The High Surrogate (U+D800–U+DBFF) and Low Surrogate (U+DC00–U+DFFF) codes are reserved for encoding non-BMP characters in UTF-16 by using a ''pair'' of 16-bit codes: one High Surrogate and one Low Surrogate. A single surrogate code point will never be assigned a character.

65,520 of the 65,536 code points in this plane have been allocated to a Unicode block, leaving just 16 code points in a single unallocated range (2FE0..2FEF). The BMP comprises the following 164 blocks:
* Alphabetic left-to-right scripts:
** Basic Latin (Lower half of ISO/IEC 8859-1: ISO/IEC 646:1991-IRV aka ASCII) (0000–007F)
** Latin-1 Supplement (Upper half of ISO/IEC 8859-1) (0080–00FF)
** Latin Extended-A (0100–017F)
** Latin Extended-B (0180–024F)
** IPA Extensions (0250–02AF)
** Spacing Modifier Letters (02B0–02FF)
** Combining Diacritical Marks (0300–036F)
** Greek and Coptic (0370–03FF)
** Cyrillic (0400–04FF)
** Cyrillic Supplement (0500–052F)
** Armenian (0530–058F)
* Semitic abjads and other right-to-left scripts:
** Hebrew (0590–05FF)
** Arabic (0600–06FF)
** Syriac (0700–074F)
** Arabic Supplement (0750–077F)
** Thaana (0780–07BF)
** N'Ko (07C0–07FF)
** Samaritan (0800–083F)
** Mandaic (0840–085F)
** Syriac Supplement (0860–086F)
** Arabic Extended-B (0870–089F)
** Arabic Extended-A (08A0–08FF)
* Brahmic scripts:
** Devanagari (0900–097F)
** Bengali (0980–09FF)
** Gurmukhi (0A00–0A7F)
** Gujarati (0A80–0AFF)
** Oriya (0B00–0B7F)
** Tamil (0B80–0BFF)
** Telugu (0C00–0C7F)
** Kannada (0C80–0CFF)
** Malayalam (0D00–0D7F)
** Sinhala (0D80–0DFF)
** Thai (0E00–0E7F)
** Lao (0E80–0EFF)
** Tibetan (0F00–0FFF)
** Myanmar (1000–109F)
* Other alphabetic or syllabic left-to-right scripts:
** Georgian (10A0–10FF)
** Hangul Jamo (1100–11FF)
** Ethiopic (1200–137F)
** Ethiopic Supplement (1380–139F)
** Cherokee (13A0–13FF)
** Unified Canadian Aboriginal Syllabics (1400–167F)
** Ogham (1680–169F)
** Runic (16A0–16FF)
* Philippine scripts:
** Tagalog (1700–171F)
** Hanunoo (1720–173F)
** Buhid (1740–175F)
** Tagbanwa (1760–177F)
* Khmer (1780–17FF)
* Mongolian (1800–18AF)
* Unified Canadian Aboriginal Syllabics Extended (18B0–18FF)
* Brahmic scripts:
** Limbu (1900–194F)
* Tai scripts:
** Tai Le (1950–197F)
** New Tai Lue (1980–19DF)
** Khmer Symbols (19E0–19FF)
** Buginese (1A00–1A1F)
** Tai Tham (1A20–1AAF)
* Combining Diacritical Marks Extended (1AB0–1AFF)
* Indonesian scripts:
** Balinese (1B00–1B7F)
** Sundanese (1B80–1BBF)
** Batak (1BC0–1BFF)
* Lepcha (1C00–1C4F)
* Ol Chiki (1C50–1C7F)
* Other left-to-right alphabetic or syllabic supplements:
** Cyrillic Extended-C (1C80–1C8F)
** Georgian Extended (1C90–1CBF)
* Sundanese Supplement (1CC0–1CCF)
* Vedic Extensions (1CD0–1CFF)
* Other left-to-right alphabetic supplements:
** Phonetic Extensions (1D00–1D7F)
** Phonetic Extensions Supplement (1D80–1DBF)
** Combining Diacritical Marks Supplement (1DC0–1DFF)
** Latin Extended Additional (1E00–1EFF)
** Greek Extended (1F00–1FFF)
* Symbols:
** General Punctuation (2000–206F)
** Superscripts and Subscripts (2070–209F)
** Currency Symbols (20A0–20CF)
** Combining Diacritical Marks for Symbols (20D0–20FF)
** Letterlike Symbols (2100–214F)
** Number Forms (2150–218F)
** Arrows (2190–21FF)
** Mathematical Operators (2200–22FF)
** Miscellaneous Technical (2300–23FF)
** Control Pictures (2400–243F)
** Optical Character Recognition (2440–245F)
** Enclosed Alphanumerics (2460–24FF)
** Box Drawing (2500–257F)
** Block Elements (2580–259F)
** Geometric Shapes (25A0–25FF)
** Miscellaneous Symbols (2600–26FF)
** Dingbats (2700–27BF)
** Miscellaneous Mathematical Symbols-A (27C0–27EF)
** Supplemental Arrows-A (27F0–27FF)
** Braille Patterns (2800–28FF)
** Supplemental Arrows-B (2900–297F)
** Miscellaneous Mathematical Symbols-B (2980–29FF)
** Supplemental Mathematical Operators (2A00–2AFF)
** Miscellaneous Symbols and Arrows (2B00–2BFF)
* Other left-to-right alphabetic scripts or supplements:
** Glagolitic (2C00–2C5F)
** Latin Extended-C (2C60–2C7F)
** Coptic (2C80–2CFF)
** Georgian Supplement (2D00–2D2F)
* African scripts:
** Tifinagh (2D30–2D7F)
** Ethiopic Extended (2D80–2DDF)
* Other left-to-right alphabetic supplements:
** Cyrillic Extended-A (2DE0–2DFF)
** Supplemental Punctuation (2E00–2E7F)
* CJK scripts and symbols:
** CJK Radicals Supplement (2E80–2EFF)
** Kangxi Radicals (2F00–2FDF)
** Ideographic Description Characters (2FF0–2FFF)
** CJK Symbols and Punctuation (3000–303F)
** Hiragana (3040–309F)
** Katakana (30A0–30FF)
** Bopomofo (3100–312F)
** Hangul Compatibility Jamo (3130–318F)
** Kanbun (3190–319F)
** Bopomofo Extended (31A0–31BF)
** CJK Strokes (31C0–31EF)
** Katakana Phonetic Extensions (31F0–31FF)
** Enclosed CJK Letters and Months (3200–32FF)
** CJK Compatibility (3300–33FF)
** CJK Unified Ideographs Extension A
(3400–4DBF) **
Yijing Hexagram Symbols Yijing Hexagram Symbols is a Unicode block containing the 64 hexagrams from the ''I Ching''. History The following Unicode-related documents record the purpose and process of defining specific characters in the Yijing Hexagram Symbols block: ...
(4DC0–4DFF) **
CJK Unified Ideographs The Chinese, Japanese and Korean (CJK) scripts share a common background, collectively known as CJK characters. During the process called Han unification, the common (shared) characters were identified and named CJK Unified Ideographs. As of Uni ...
(4E00–9FFF) * Yi Syllables (A000–A48F) *
Yi Radicals Yi Radicals is a Unicode block A Unicode block is one of several contiguous ranges of numeric character codes (code points) of the Unicode character set that are defined by the Unicode Consortium for administrative and documentation purposes. ...
(A490–A4CF) *
Lisu Lisu may refer to: *Lisu people, an ethnic group of the mountainous regions of Yunnan (China), Arunachal Pradesh (India), northern Myanmar and Thailand *Lisu language, Tibeto-Burman language spoken by the Lisu people **Fraser script or Old Lisu A ...
(A4D0–A4FF) * African scripts: ** Vai (A500–A63F) * Other left-to-right alphabetic supplements: **
Cyrillic Extended-B Cyrillic Extended-B is a Unicode block containing Cyrillic characters for writing Old Cyrillic and Old Abkhazian, and combining numeric signs for Cyrillic numerals used in early Slavic or Church Slavonic Church Slavonic is the conservative ...
(A640–A69F) * African scripts: ** Bamum (A6A0–A6FF) * Other left-to-right alphabetic supplements: **
Modifier Tone Letters Modifier Tone Letters is a Unicode block containing tone markings for Chinese, Chinantec, Africanist, and other phonetic transcriptions. It does not contain the standard IPA tone marks, which are found in Spacing Modifier Letters. are used to ...
(A700–A71F) **
Latin Extended-D Latin Extended-D is a Unicode block containing Latin (script), Latin characters for phonetic, Mayanist, and Medieval transcription and notation systems. 89 of the characters in this block are for medieval characters proposed by the Medieval Unic ...
(A720–A7FF) *
Brahmic The Brahmic scripts, also known as Indic scripts, are a family of abugida writing systems. They are used throughout South Asia, Southeast Asia and parts of East Asia. They are descended from the Brahmi script of ancient India and are used b ...
scripts: **
Syloti Nagri Sylheti Nagri or Sylheti Nāgarī (, , ), known in classical manuscripts as Sylhet Nagri () as well as by #Etymology and names, many other names, is an Indic script. The script was historically used in the regions of Bengal and Assam, that were ...
(A800–A82F) ** Common Indic Number Forms (A830–A83F) ** Phags-pa (A840–A87F) **
Saurashtra Saurashtra, Sourashtra, or variants may refer to: ** Kathiawar, also called Saurashtra Peninsula, a peninsula in western India ** Saurashtra (state), alias United State of Kathiawar, a former Indian state, merged into Bombay State and since its d ...
(A880–A8DF) ** Devanagari Extended (A8E0–A8FF) ** Kayah Li (A900–A92F) ** Rejang (A930–A95F) *
Hangul Jamo Extended-A Hangul Jamo Extended-A is a Unicode block containing ''choseong'' (initial consonant) forms of archaic Hangul consonant clusters. They can be used to dynamically compose syllables that are not available as precomposed Hangul syllables in Unic ...
(A960–A97F) *
Brahmic The Brahmic scripts, also known as Indic scripts, are a family of abugida writing systems. They are used throughout South Asia, Southeast Asia and parts of East Asia. They are descended from the Brahmi script of ancient India and are used b ...
scripts: ** Javanese (A980–A9DF) **
Myanmar Extended-B Myanmar Extended-B is a Unicode block containing Burmese script characters for writing Pali and Tai Laing. History The following Unicode-related documents record the purpose and process of defining specific characters in the Myanmar Extended-B ...
(A9E0–A9FF) **
Cham Cham or CHAM may refer to: Ethnicities and languages *Chams, people in Vietnam and Cambodia **Cham language, the language of the Cham people ***Cham script *** Cham (Unicode block), a block of Unicode characters of the Cham script * Cham Albani ...
(AA00–AA5F) **
Myanmar Extended-A Myanmar Extended-A is a Unicode block containing Myanmar characters for writing the Khamti Shan and Aiton languages. Block The block has eleven variation sequences defined for standardized variants. They use (VS01) to denote the dotted let ...
(AA60–AA7F) **
Tai Viet The Tai Viet script ( Tai Dam: ("Tai script"), , , ) is a Brahmic script used by the Tai Dam people and various other Thai people in Vietnam and Thailand.Meetei Mayek Extensions (AAE0–AAFF) * Ethiopic Extended-A (AB00–AB2F) *
Latin Extended-E Latin Extended-E is a Unicode block containing Latin script characters used in German dialectology (Teuthonista), Anthropos (journal), Anthropos alphabet, Yakut scripts, Sakha and Americanist phonetic notation, Americanist usage. Block Histo ...
(AB30–AB6F) *
Cherokee Supplement Cherokee Supplement is a Unicode block containing the syllabic characters for writing the Cherokee language. When Cherokee was first added to Unicode in version 3.0 it was treated as a unicameral alphabet, but in version 8.0 it was redefined as ...
(AB70–ABBF) *
Meetei Mayek The Meitei script (), also known as the Kanglei script () or the Kok Sam Lai script (), after its first three letters is an abugida in the Brahmic scripts family used to write the Meitei language, the official language of Manipur, Assam an ...
(ABC0–ABFF) *
Hangul Syllables Hangul Syllables is a Unicode block containing precomposed Hangul syllable blocks for modern Korean. The syllables Korean language and computers#Hangul in Unicode, can be directly mapped by algorithm to sequences of two or three characters in th ...
(AC00–D7AF) *
Hangul Jamo Extended-B Hangul Jamo Extended-B is a Unicode block A Unicode block is one of several contiguous ranges of numeric character codes (code points) of the Unicode character set that are defined by the Unicode Consortium for administrative and documentati ...
(D7B0–D7FF) *
Surrogates ''Surrogates'' is a 2009 American science fiction action film based on the 2005–2006 comic book series ''The Surrogates''. Directed by Jonathan Mostow, it stars Bruce Willis as Tom Greer, an FBI agent who ventures out into the real world to ...
: ** High Surrogates (D800–DB7F) ** High Private Use Surrogates (DB80–DBFF) ** Low Surrogates (DC00–DFFF) *
Private Use Area In Unicode, a Private Use Area (PUA) is a range of code points that, by definition, will not be assigned characters by the standard. Three Private Use Areas are defined: one in the Basic Multilingual Plane (), and one each in, and nearly covering ...
(E000–F8FF) * CJK Compatibility Ideographs (F900–FAFF) *
Alphabetic Presentation Forms Alphabetic Presentation Forms is a Unicode block containing standard ligatures for the Latin, Armenian, and Hebrew scripts. Block History The following Unicode-related documents record the purpose and process of defining specific characters in ...
(FB00–FB4F) *
Arabic Presentation Forms-A Arabic Presentation Forms-A is a Unicode block encoding contextual forms and ligatures of letter variants needed for Persian, Urdu, Sindhi and Central Asian languages. This block also allocates 32 noncharacters in Unicode, designed specifically ...
(FB50–FDFF) *
Variation Selectors Variation Selectors is a Unicode block containing 16 variation selectors used to specify a Variant form (Unicode), glyph variant for a preceding character. They are currently used to specify standardized variation sequences for mathematical symb ...
(FE00–FE0F) * Vertical Forms (FE10–FE1F) *
Combining Half Marks Combining Half Marks is a Unicode block containing diacritical combining characters for spanning multiple characters. Block History The following Unicode-related documents record the purpose and process of defining specific characters in the C ...
(FE20–FE2F) *
CJK Compatibility Forms CJK Compatibility Forms is a Unicode block containing vertical glyph variants for east Asian compatibility. Its block name in Unicode 1.0 was CNS 11643 Compatibility, in reference to CNS 11643. History The following Unicode-related documents ...
(FE30–FE4F) *
Small Form Variants Small Form Variants is a Unicode block containing small punctuation characters for compatibility with the Chinese National Standard CNS 11643 The CNS 11643 character set (Chinese National Standard 11643), also officially known as the Chinese Sta ...
(FE50–FE6F) *
Arabic Presentation Forms-B Arabic Presentation Forms-B is a Unicode block encoding spacing forms of Arabic diacritics, and contextual letter forms. The special codepoint ZWNBSP (''zero width no-break space'') is also here, which is only meant for a byte order mark (that ma ...
(FE70–FEFF) *
Halfwidth and Fullwidth Forms In CJK characters, CJK (Chinese, Japanese, and Korean) computing, graphic characters are traditionally classed into fullwidth and halfwidth characters. Unlike monospaced fonts, a halfwidth character occupies half the width of a fullwidth characte ...
(FF00–FFEF) * Specials (FFF0–FFFF)
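
Because every block is a contiguous range and every plane spans 65,536 code points, both can be looked up with simple arithmetic. The following is a minimal Python sketch, not an official API: the names `BLOCKS`, `plane_of` and `block_of` are hypothetical, and the table holds only a handful of the BMP ranges listed above rather than the full block list.

```python
from bisect import bisect_right

# Illustrative subset of the BMP blocks listed above: (first, last, name).
# Entries are sorted by their first code point; this is NOT the full table.
BLOCKS = [
    (0x3040, 0x309F, "Hiragana"),
    (0x30A0, 0x30FF, "Katakana"),
    (0x4E00, 0x9FFF, "CJK Unified Ideographs"),
    (0xAC00, 0xD7AF, "Hangul Syllables"),
    (0xD800, 0xDB7F, "High Surrogates"),
    (0xDB80, 0xDBFF, "High Private Use Surrogates"),
    (0xDC00, 0xDFFF, "Low Surrogates"),
    (0xE000, 0xF8FF, "Private Use Area"),
    (0xFFF0, 0xFFFF, "Specials"),
]
STARTS = [first for first, _, _ in BLOCKS]


def plane_of(code_point):
    """Return the plane number (0 to 16) that contains code_point."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("not a Unicode code point")
    # Each plane holds 0x10000 code points, so the plane is the value
    # of the bits above the low 16.
    return code_point >> 16


def block_of(code_point):
    """Return the block name from the sample table, or None if not covered."""
    # Find the last block whose first code point is <= code_point.
    i = bisect_right(STARTS, code_point) - 1
    if i >= 0:
        _, last, name = BLOCKS[i]
        if code_point <= last:
            return name
    return None
```

For example, `block_of(ord("あ"))` finds the Hiragana entry, while a code point outside the sample table, such as U+D7B0, yields `None` because Hangul Jamo Extended-B was left out of this sketch. A sorted table with binary search is one reasonable choice here because block ranges never overlap.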


Supplementary Multilingual Plane

Plane 1, the Supplementary Multilingual Plane (SMP), contains historic scripts (except CJK ideographic), as well as symbols and notation used within certain fields. Scripts include Linear B, Egyptian hieroglyphs, and cuneiform scripts. It also includes English reform orthographies like Shavian and Deseret, and some modern scripts like Osage, Warang Citi, Adlam, Wancho and Toto. Symbols and notations include historic and modern musical notation; mathematical alphanumerics; shorthands; emoji and other pictographic sets; and game symbols for playing cards, mahjong, and dominoes. The SMP comprises the following 161 blocks: *
Archaic Greek and other left-to-right scripts:
** Linear B Syllabary (10000–1007F)
** Linear B Ideograms (10080–100FF)
** Aegean Numbers (10100–1013F)
** Ancient Greek Numbers (10140–1018F)
** Ancient Symbols (10190–101CF)
** Phaistos Disc (101D0–101FF)
** Lycian (10280–1029F)
** Carian (102A0–102DF)
** Coptic Epact Numbers (102E0–102FF)
** Old Italic (10300–1032F)
** Gothic (10330–1034F)
** Old Permic (10350–1037F)
** Ugaritic (10380–1039F)
** Old Persian (103A0–103DF)
** Deseret (10400–1044F)
** Shavian (10450–1047F)
** Osmanya (10480–104AF)
** Osage (104B0–104FF)
** Elbasan (10500–1052F)
** Caucasian Albanian (10530–1056F)
** Vithkuqi (10570–105BF)
** Todhri (105C0–105FF)
** Linear A (10600–1077F)
** Latin Extended-F (10780–107BF)
* Right-to-left scripts:
** Cypriot Syllabary (10800–1083F)
** Imperial Aramaic (10840–1085F)
** Palmyrene (10860–1087F)
** Nabataean (10880–108AF)
** Hatran (108E0–108FF)
** Phoenician (10900–1091F)
** Lydian (10920–1093F)
** Meroitic Hieroglyphs (10980–1099F)
** Meroitic Cursive (109A0–109FF)
** Kharoshthi (10A00–10A5F)
** Old South Arabian (10A60–10A7F)
** Old North Arabian (10A80–10A9F)
** Manichaean (10AC0–10AFF)
** Avestan (10B00–10B3F)
** Inscriptional Parthian (10B40–10B5F)
** Inscriptional Pahlavi (10B60–10B7F)
** Psalter Pahlavi (10B80–10BAF)
** Old Turkic (10C00–10C4F)
** Old Hungarian (10C80–10CFF)
** Hanifi Rohingya (10D00–10D3F)
** Garay (10D40–10D8F)
** Rumi Numeral Symbols (10E60–10E7F)
** Yezidi (10E80–10EBF)
** Arabic Extended-C (10EC0–10EFF)
** Old Sogdian (10F00–10F2F)
** Sogdian (10F30–10F6F)
** Old Uyghur (10F70–10FAF)
** Chorasmian (10FB0–10FDF)
** Elymaic (10FE0–10FFF)
* Brahmic scripts:
** Brahmi (11000–1107F)
** Kaithi (11080–110CF)
** Sora Sompeng (110D0–110FF)
** Chakma (11100–1114F)
** Mahajani (11150–1117F)
** Sharada (11180–111DF)
** Sinhala Archaic Numbers (111E0–111FF)
** Khojki (11200–1124F)
** Multani (11280–112AF)
** Khudawadi (112B0–112FF)
** Grantha (11300–1137F)
** Tulu-Tigalari (11380–113FF)
** Newa (11400–1147F)
** Tirhuta (11480–114DF)
** Siddham (11580–115FF)
** Modi (11600–1165F)
** Mongolian Supplement (11660–1167F)
** Takri (11680–116CF)
** Myanmar Extended-C (116D0–116FF)
** Ahom (11700–1174F)
** Dogra (11800–1184F)
** Warang Citi (118A0–118FF)
** Dives Akuru (11900–1195F)
** Nandinagari (119A0–119FF)
** Zanabazar Square (11A00–11A4F)
** Soyombo (11A50–11AAF)
* Unified Canadian Aboriginal Syllabics Extended-A (11AB0–11ABF)
* Brahmic scripts:
** Pau Cin Hau (11AC0–11AFF)
** Devanagari Extended-A (11B00–11B5F)
** Sunuwar (11BC0–11BFF)
** Bhaiksuki (11C00–11C6F)
** Marchen (11C70–11CBF)
** Masaram Gondi (11D00–11D5F)
** Gunjala Gondi (11D60–11DAF)
** Makasar (11EE0–11EFF)
** Kawi (11F00–11F5F)
* Lisu Supplement (11FB0–11FBF)
* Tamil Supplement (11FC0–11FFF)
* Cuneiform scripts:
** Cuneiform (12000–123FF)
** Cuneiform Numbers and Punctuation (12400–1247F)
** Early Dynastic Cuneiform (12480–1254F)
* Cypro-Minoan (12F90–12FFF)
* Hieroglyphic scripts:
** Egyptian Hieroglyphs (13000–1342F)
** Egyptian Hieroglyph Format Controls (13430–1345F)
** Egyptian Hieroglyphs Extended-A (13460–143FF)
** Anatolian Hieroglyphs (14400–1467F)
* Gurung Khema (16100–1613F)
* Bamum Supplement (16800–16A3F)
* Mro (16A40–16A6F)
* Tangsa
(16A70–16ACF) * Bassa Vah (16AD0–16AFF) *
Pahawh Hmong Pahawh Hmong (Romanized Popular Alphabet, RPA: Phaj hauj Hmoob , Pahawh: ; known also as ''Ntawv Pahawh, Ntawv Keeb, Ntawv Caub Fab, Ntawv Soob Lwj'') is an indigenous Semi-syllabary, semi-syllabic writing system, script, invented in 1959 by Sh ...
(16B00–16B8F) *
Kirat Rai Kirat Rai (also called Khambu Rai, Rai Barṇamālā and Kirat Khambu Rai) is a left-to-right abugida (a type of segmental writing system), based on the Sumhung Lipi of 1920s, used to write the Bantawa language in the Indian state of Sikkim. Kir ...
(16D40–16D7F) *
Medefaidrin Medefaidrin (Medefidrin), or ', is a constructed language and script created as a Christian sacred language by an Ibibio congregation in 1930s Nigeria. It has its roots in glossolalia ('speaking in tongues'). History Speakers consider Medefa ...
(16E40–16E9F) *
Miao Miao may refer to: * Miao people, linguistically and culturally related group of people, recognized as such by the government of the People's Republic of China * Miao script or Pollard script, writing system used for Miao languages * Miao (Unicode ...
(16F00–16F9F) * East Asian scripts: **
Ideographic Symbols and Punctuation Ideographic Symbols and Punctuation is a Unicode block containing symbols and punctuation marks used by ideographic scripts such as Tangut and Nüshu. History The following Unicode-related documents record the purpose and process of defining ...
(16FE0–16FFF) **
Tangut Tangut may refer to: *Tangut people, an ancient ethnic group in Northwest China *Tangut language, the extinct language spoken by the Tangut people *Tangut script, the writing system used to write the Tangut language *Tangut (Unicode block) *Wester ...
(17000–187FF) **
Tangut Components Tangut Components is a Unicode block containing components and radicals used in the modern study of the Tangut script The Tangut script ( Tangut: ; ) is a logographic writing system, formerly used for writing the extinct Tangut language of th ...
(18800–18AFF) **
Khitan Small Script The Khitan small script () was one of two writing systems used for the now-extinct Khitan language. It was used during the 10th–12th century by the Khitan people, who had created the Liao Empire in present-day northeastern China. In addition to ...
(18B00–18CFF) **
Tangut Supplement Tangut Supplement is a Unicode block containing characters from the Tangut script, which was used for writing the Tangut language spoken by the Tangut people in the Western Xia Empire, and in China during the Yuan dynasty and early Ming dynasty ...
(18D00–18D7F) **
Kana Extended-B Kana Extended-B is a Unicode block containing Taiwanese kana (that is, kana originally created by Japanese linguists to write Taiwanese Hokkien). Block History The following Unicode-related documents record the purpose and process of defining ...
(1AFF0–1AFFF) **
Kana Supplement Kana Supplement is a Unicode block containing one archaic katakana character and 255 hentaigana (non-standard Hiragana) characters. Additional hentaigana characters are encoded in the Kana Extended-A block. Block History The following Unicode- ...
(1B000–1B0FF) **
Kana Extended-A Kana Extended-A is a Unicode block containing hentaigana (non-standard hiragana) and historic kana characters. Additional hentaigana characters are encoded in the Kana Supplement block. Block History The following Unicode-related documents reco ...
(1B100–1B12F) **
Small Kana Extension Small Kana Extension is a Unicode block containing additional small variants for the Hiragana and Katakana syllabaries, in addition to those in the Hiragana, Katakana and Katakana Phonetic Extensions blocks. Block Unassigned code points in the U ...
(1B130–1B16F) ** Nushu (1B170–1B2FF) * Notational writing systems: ** Duployan (1BC00–1BC9F) ** Shorthand Format Controls (1BCA0–1BCAF) *
Symbols for Legacy Computing Supplement Symbols for Legacy Computing Supplement is a Unicode block containing additional graphic characters that were used for various home computers from the 1970s and 1980s, extending the set of characters provided by the Symbols for Legacy Computing b ...
(1CC00–1CEBF) *
Symbols A symbol is a mark, sign, or word that indicates, signifies, or is understood as representing an idea, object, or relationship. Symbols allow people to go beyond what is known or seen by creating linkages between otherwise different concep ...
and numerals: **
Musical notation Musical notation is any system used to visually represent music. Systems of notation generally represent the elements of a piece of music that are considered important for its performance in the context of a given musical tradition. The proce ...
: *** Znamenny Musical Notation (1CF00–1CFCF) ***
Byzantine Musical Symbols Byzantine Musical Symbols is a Unicode block containing characters for representing Byzantine music in ekphonetic notation. Block History The following Unicode-related documents record the purpose and process of defining specific characters in ...
(1D000–1D0FF) ***
Musical Symbols Musical symbols are marks and symbols in musical notation that indicate various aspects of how a piece of music is to be performed. There are symbols to communicate information about many musical elements, including Pitch (music), pitch, Duration ...
(1D100–1D1FF) ***
Ancient Greek Musical Notation Ancient Greek Musical Notation is a Unicode block containing symbols representing Musical system of ancient Greece, musical notations used in ancient Greece. Block History The following Unicode-related documents record the purpose and proces ...
(1D200–1D24F) **
Kaktovik Numerals The Kaktovik numerals or Kaktovik Iñupiaq numerals are a base-20 system of numerical digits created by Alaskan Iñupiat. They are visually iconic, with shapes that indicate the number being represented. The Iñupiaq language has a base ...
(1D2C0–1D2DF) ** Mayan Numerals (1D2E0–1D2FF) **
Mathematical symbols A mathematical symbol is a figure or a combination of figures that is used to represent a mathematical object, an action on mathematical objects, a relation between mathematical objects, or for structuring the other symbols that occur in a mathemat ...
: *** Tai Xuan Jing Symbols (1D300–1D35F) ***
Counting Rod Numerals Counting rods (筭) are small bars, typically 3–14 cm (1" to 6") long, that were used by mathematicians for calculation in ancient East Asia. They are placed either horizontally or vertically to represent any integer or rational number. ...
(1D360–1D37F) ***
Mathematical Alphanumeric Symbols Mathematical Alphanumeric Symbols is a Unicode block comprising styled forms of Latin alphabet, Latin and Greek alphabet, Greek letters and decimal numerical digit, digits that enable mathematicians to denote different notions with different l ...
(1D400–1D7FF) * Notational writing systems: **
Sutton SignWriting Sutton SignWriting, or simply SignWriting, is a system of written sign languages. It is highly featural and visually iconic: the shapes of the characters are abstract pictures of the hands, face, and body; and unlike most written words, which ...
(1D800–1DAAF) * Other left-to-right scripts: **
Latin Extended-G Latin Extended-G is a Unicode block containing additional characters for phonetic transcription. The Latin Extended-F and -G blocks contain the first Latin characters defined outside of the Basic Multilingual Plane In the Unicode standard, a p ...
(1DF00–1DFFF) ** Glagolitic Supplement (1E000–1E02F) **
Cyrillic Extended-D Cyrillic Extended-D is a Unicode block containing superscript and subscript Cyrillic characters used in Cyrillic-based phonetic transcription, as well as a combining character. The block contains the first Cyrillic characters defined outside of t ...
(1E030–1E08F) *
Nyiakeng Puachue Hmong Nyiakeng Puachue Hmong (Hmong: ; RPA: ''Ntawv Nyiajkeeb Puajtxwm Hmoob'') is an alphabet script devised for White Hmong and Green Hmong in the 1980s by Reverend Chervang Kong for use within his United Christians Liberty Evangelical Church. Th ...
(1E100–1E14F) * Toto (1E290–1E2BF) * Wancho (1E2C0–1E2FF) * Nag Mundari (1E4D0–1E4FF) *
Ol Onal The Ol Onal, also known as also known as Bhumij Lipi or Bhumij Onal, is an alphabetic writing system for the Bhumij language. Ol Onal script was created between 1981 and 1992 by ''Ol Guru'' Mahendra Nath Sardar. Ol Onal script is used to write ...
(1E5D0–1E5FF) * African scripts: **
Ethiopic Extended-B Ethiopic Extended-B is a Unicode block containing additional Geʽez characters for the Gurage languages Gurage (, Gurage: ጉራጌ) are a Semitic-speaking ethnic group inhabiting Ethiopia.G. W. E. Huntingford, "William A. Shack: The Gurage: ...
(1E7E0–1E7FF) **
Mende Kikakui The Mende Kikakui script is a syllabary used for writing the Mende language of Sierra Leone. History The script was devised by Mohamed Turay (ca. 1850-1923), an Islamic scholar, at a town called Maka (Barri Chiefdom, southern Sierra Leone) aro ...
(1E800–1E8DF) ** Adlam (1E900–1E95F) *
Symbols A symbol is a mark, sign, or word that indicates, signifies, or is understood as representing an idea, object, or relationship. Symbols allow people to go beyond what is known or seen by creating linkages between otherwise different concep ...
and numerals: ** Indic Siyaq Numbers (1EC70–1ECBF) ** Ottoman Siyaq Numbers (1ED00–1ED4F) **
Arabic Mathematical Alphabetic Symbols Arabic Mathematical Alphabetic Symbols is a Unicode block A Unicode block is one of several contiguous ranges of numeric character codes (code points) of the Unicode character set that are defined by the Unicode Consortium for administrative ...
(1EE00–1EEFF) ** Game tiles and cards: ***
Mahjong Tiles Mahjong tiles () are tiles of Chinese origin that are used to play mahjong as well as mahjong solitaire and other games. Although they are most commonly tiles, they may refer to playing cards with similar contents as well. Development The ...
(1F000–1F02F) ***
Domino Tiles Domino Tiles is a Unicode block containing characters for representing game situations in dominoes. The block includes symbols for the standard six dot tile set and backs in horizontal and vertical orientations. History The following Unicode ...
(1F030–1F09F) ***
Playing Cards A playing card is a piece of specially prepared card stock, heavy paper, thin cardboard, plastic-coated paper, cotton-paper blend, or thin plastic that is marked with distinguishing motifs. Often the front (face) and back of each card has a Pap ...
(1F0A0–1F0FF) **
Enclosed Alphanumeric Supplement Enclosed Alphanumeric Supplement is a Unicode block consisting of Latin alphabet characters and Arabic numerals enclosed in circles, ovals or boxes, used for a variety of purposes. It is encoded in the range U+1F100–U+1F1FF in the Supple ...
(1F100–1F1FF) **
Enclosed Ideographic Supplement Enclosed Ideographic Supplement is a Unicode block containing forms of characters and words from Chinese, Japanese and Korean enclosed within or stylised as squares, brackets, or circles. It contains three such characters containing one or more ...
(1F200–1F2FF) **
Miscellaneous Symbols and Pictographs Miscellaneous Symbols and Pictographs is a Unicode block containing meteorological and astronomical symbols, emoji characters largely for compatibility with Japanese telephone carriers' implementations of Shift JIS, and characters originally from ...
(1F300–1F5FF) **
Emoticons An emoticon (, , rarely , ), short for emotion icon, is a pictorial representation of a facial expression using characters—usually punctuation marks, numbers and letters—to express a person's feelings, mood or reaction, without needin ...
(1F600–1F64F) ** Ornamental Dingbats (1F650–1F67F) **
Transport and Map Symbols Transport and Map Symbols is a Unicode block containing transportation and map icons, largely for compatibility with Japanese telephone carriers' emoji implementations of Shift JIS, and to encode characters in the Wingdings and Wingdings 2 char ...
(1F680–1F6FF) **
Alchemical Symbols Alchemical symbols were used to denote chemical elements and compounds, as well as alchemy, alchemical apparatus and processes, until the 18th century. Although notation was partly standardized, style and symbol varied between alchemists. Lüdy ...
(1F700–1F77F) **
Geometric Shapes Extended Geometric Shapes Extended is a Unicode block containing Webdings/ Wingdings symbols, mostly different weights of squares, crosses, and saltires, and different weights of variously spoked asterisks, stars, and various color squares and circles fo ...
(1F780–1F7FF) **
Supplemental Arrows-C Supplemental Arrows-C is a Unicode block A Unicode block is one of several contiguous ranges of numeric character codes (code points) of the Unicode character set that are defined by the Unicode Consortium for administrative and documentation ...
(1F800–1F8FF) **
Supplemental Symbols and Pictographs Supplemental Symbols and Pictographs is a Unicode block containing emoji characters. It extends the set of symbols included in the Miscellaneous Symbols and Pictographs block. It also includes Typikon symbols. Emoji The Unicode 14.0 Supplemental ...
(1F900–1F9FF) **
Chess Symbols Chess Symbols is a Unicode block containing characters for fairy chess and related notations beyond the basic Western chess symbols (U+2654 to U+265F) in the Miscellaneous Symbols block, as well as symbols representing game pieces for xiangqi ...
(1FA00–1FA6F) **
Symbols and Pictographs Extended-A Symbols and Pictographs Extended-A is a Unicode block containing emoji characters. It extends the set of symbols included in the Supplemental Symbols and Pictographs block. All of the characters in the Symbols and Pictographs Extended-A block a ...
(1FA70–1FAFF) **
Symbols for Legacy Computing Symbols for Legacy Computing is a Unicode block containing graphic characters that were used for various home computers from the 1970s and 1980s and in teletext broadcasting standards. It includes characters from the Amstrad CPC, MSX, Mattel Aqua ...
(1FB00–1FBFF)


Supplementary Ideographic Plane

Plane 2, the Supplementary Ideographic Plane (SIP), is used for CJK Ideographs, mostly CJK Unified Ideographs, that were not included in earlier character encoding standards. The SIP comprises the following seven blocks:
* CJK Unified Ideographs Extension B (20000–2A6DF)
* CJK Unified Ideographs Extension C (2A700–2B73F)
* CJK Unified Ideographs Extension D (2B740–2B81F)
* CJK Unified Ideographs Extension E (2B820–2CEAF)
* CJK Unified Ideographs Extension F (2CEB0–2EBEF)
* CJK Unified Ideographs Extension I (2EBF0–2EE5F)
* CJK Compatibility Ideographs Supplement (2F800–2FA1F)
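
The block ranges listed above lend themselves to a simple table lookup. The following Python sketch is illustrative only: the names `SIP_BLOCKS` and `sip_block` are hypothetical, and the ranges are copied from the list above rather than read from the Unicode Character Database.

```python
# Block ranges of the Supplementary Ideographic Plane, copied from the list above.
# Each entry is (first code point, last code point, block name).
SIP_BLOCKS = [
    (0x20000, 0x2A6DF, "CJK Unified Ideographs Extension B"),
    (0x2A700, 0x2B73F, "CJK Unified Ideographs Extension C"),
    (0x2B740, 0x2B81F, "CJK Unified Ideographs Extension D"),
    (0x2B820, 0x2CEAF, "CJK Unified Ideographs Extension E"),
    (0x2CEB0, 0x2EBEF, "CJK Unified Ideographs Extension F"),
    (0x2EBF0, 0x2EE5F, "CJK Unified Ideographs Extension I"),
    (0x2F800, 0x2FA1F, "CJK Compatibility Ideographs Supplement"),
]


def sip_block(code_point):
    """Return the name of the SIP block containing code_point, or None."""
    for first, last, name in SIP_BLOCKS:
        if first <= code_point <= last:
            return name
    return None
```

Code points that fall in the gaps between blocks, such as U+2A6E0, belong to no block, so the function returns `None` for them.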


Tertiary Ideographic Plane

Plane 3 is the Tertiary Ideographic Plane (TIP). CJK Unified Ideographs Extension G was added to the TIP in Unicode 13.0, released in March 2020. The plane is also tentatively allocated for Oracle Bone script and Small Seal Script. The TIP comprises the following two blocks:
* CJK Unified Ideographs Extension G (30000–3134F)
* CJK Unified Ideographs Extension H (31350–323AF)


Unassigned planes

Planes 4 to 13 (planes 4 to D in hexadecimal): No characters have yet been assigned, or proposed for assignment, to Planes 4 through 13.
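
Because each plane spans 65,536 (0x10000) code points, the plane number of a code point is its value divided by 0x10000 with the remainder discarded. A minimal Python sketch of that arithmetic follows; the function names `plane_of` and `in_unassigned_plane` are hypothetical.

```python
def plane_of(code_point):
    """Return the plane number (0 to 16) of a Unicode code point."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("not a Unicode code point")
    # Shifting right by 16 bits is the same as integer division by 0x10000.
    return code_point >> 16


def in_unassigned_plane(code_point):
    """Return True if the code point lies in planes 4 through 13."""
    return 4 <= plane_of(code_point) <= 13
```

For example, `plane_of(0x1F600)` is 1 and `plane_of(0x10FFFF)` is 16, while `in_unassigned_plane(0x40000)` is true.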


Supplementary Special-purpose Plane

Plane 14 (E in hexadecimal) is designated as the Supplementary Special-purpose Plane (SSP). It comprises the following two blocks:
* Tags (E0000–E007F)
* Variation Selectors Supplement (E0100–E01EF), used to indicate alternate glyphs for characters.
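
The characters in the Variation Selectors Supplement are numbered from variation selector-17 at U+E0100, so the selector number follows from the offset within the block. A hedged Python sketch, in which `supplementary_vs_number` is a hypothetical helper name:

```python
VS_SUPPLEMENT_FIRST = 0xE0100  # variation selector-17
VS_SUPPLEMENT_LAST = 0xE01EF   # last code point of the block


def supplementary_vs_number(code_point):
    """Return the variation selector number for a code point in the
    Variation Selectors Supplement block, or None if it is outside it."""
    if not VS_SUPPLEMENT_FIRST <= code_point <= VS_SUPPLEMENT_LAST:
        return None
    # The block starts at selector 17, so add 17 to the offset.
    return code_point - VS_SUPPLEMENT_FIRST + 17
```

With this numbering the last code point of the block, U+E01EF, corresponds to selector 256.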


Private Use Area Planes

Planes 15 and 16 (planes F and 10 in hexadecimal) each contain a "Private Use Area". They contain the blocks named Supplementary Private Use Area-A (PUA-A) and Supplementary Private Use Area-B (PUA-B), respectively. The Private Use Areas are available for use by parties outside ISO and Unicode (private use character encoding).
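
A membership test for the two supplementary Private Use Areas can be sketched in Python. The sketch assumes, beyond what the text above states, that the last two code points of each plane (those ending in FFFE and FFFF) are noncharacters rather than private-use characters; the function name is hypothetical.

```python
def is_supplementary_private_use(code_point):
    """Return True if code_point is a private-use code point in plane 15 or 16."""
    plane = code_point >> 16       # plane number
    offset = code_point & 0xFFFF   # position within the plane
    # Assumption: offsets FFFE and FFFF are noncharacters, not private use.
    return plane in (15, 16) and offset <= 0xFFFD
```

The private-use range in the Basic Multilingual Plane is deliberately not covered here, so a code point such as U+E000 yields `False`.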

