Over a thousand characters from the Latin script are encoded in the Unicode Standard, grouped in several basic and extended Latin blocks. The extended ranges contain mainly precomposed combinations of letters and diacritics, which can equivalently be encoded with combining diacritics, as well as some ligatures and distinct letters, used for example in the orthographies of various African languages (including click symbols in Latin Extended-B) and the Vietnamese alphabet (Latin Extended Additional). Latin Extended-C contains additions for Uighur and the Claudian letters. Latin Extended-D comprises characters that are mostly of interest to medievalists. Latin Extended-E mostly comprises characters used for German dialectology (Teuthonista). Latin Extended-F and -G contain characters for phonetic transcription.
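The equivalence between precomposed letters and sequences of a base letter plus combining diacritics can be checked with Unicode normalization. The following is a minimal sketch using Python's standard `unicodedata` module; the sample characters (U+00E9 and U+1EBF) and the helper name `decompose` are illustrative choices, not taken from the article.

```python
import unicodedata

def decompose(ch):
    """Return the code points of the canonical decomposition (NFD) of ch."""
    return [f"U+{ord(c):04X}" for c in unicodedata.normalize("NFD", ch)]

# U+00E9 (Latin-1 Supplement) is canonically equivalent to e + combining acute.
e_acute = "\u00e9"
print(decompose(e_acute))              # ['U+0065', 'U+0301']

# U+1EBF (Latin Extended Additional, used in Vietnamese) carries two marks.
e_circumflex_acute = "\u1ebf"
print(decompose(e_circumflex_acute))   # ['U+0065', 'U+0302', 'U+0301']

# Recomposing (NFC) the combining sequence gives back the precomposed letter.
assert unicodedata.normalize("NFC", "e\u0301") == e_acute
```

Software that compares or searches Latin text usually normalizes both sides to the same form first, so that the two encodings of the same letter match.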
Blocks
As of version 16.0 of the Unicode Standard, 1,487 characters in the following 19 blocks are classified as belonging to the Latin script.
* Basic Latin, 0000–007F. This block corresponds to ASCII.
* Latin-1 Supplement, 0080–00FF. Together, this block and Basic Latin (the ASCII part) correspond to IANA Latin-1.
* Latin Extended-A, 0100–017F
* Latin Extended-B, 0180–024F
* IPA Extensions, 0250–02AF
* Spacing Modifier Letters, 02B0–02FF
* Phonetic Extensions, 1D00–1D7F
* Phonetic Extensions Supplement, 1D80–1DBF
* Latin Extended Additional, 1E00–1EFF
* Superscripts and Subscripts, 2070–209F
* Letterlike Symbols, 2100–214F
* Number Forms, 2150–218F
* Latin Extended-C, 2C60–2C7F
* Latin Extended-D, A720–A7FF
* Latin Extended-E, AB30–AB6F
* Alphabetic Presentation Forms (Latin ligatures), FB00–FB4F
* Halfwidth and Fullwidth Forms, FF00–FFEF
* Latin Extended-F, 10780–107BF
* Latin Extended-G, 1DF00–1DFFF
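The block ranges listed above can be turned into a small lookup table. The sketch below is only an illustration: the names `LATIN_BLOCKS` and `latin_block` are hypothetical, and the ranges are copied from the list rather than read from the Unicode Character Database. It also checks the Latin-1 correspondence using Python's built-in `latin-1` codec.

```python
from bisect import bisect_right

# (start, end, name) for the 19 blocks listed above, sorted by start.
LATIN_BLOCKS = [
    (0x0000, 0x007F, "Basic Latin"),
    (0x0080, 0x00FF, "Latin-1 Supplement"),
    (0x0100, 0x017F, "Latin Extended-A"),
    (0x0180, 0x024F, "Latin Extended-B"),
    (0x0250, 0x02AF, "IPA Extensions"),
    (0x02B0, 0x02FF, "Spacing Modifier Letters"),
    (0x1D00, 0x1D7F, "Phonetic Extensions"),
    (0x1D80, 0x1DBF, "Phonetic Extensions Supplement"),
    (0x1E00, 0x1EFF, "Latin Extended Additional"),
    (0x2070, 0x209F, "Superscripts and Subscripts"),
    (0x2100, 0x214F, "Letterlike Symbols"),
    (0x2150, 0x218F, "Number Forms"),
    (0x2C60, 0x2C7F, "Latin Extended-C"),
    (0xA720, 0xA7FF, "Latin Extended-D"),
    (0xAB30, 0xAB6F, "Latin Extended-E"),
    (0xFB00, 0xFB4F, "Alphabetic Presentation Forms"),
    (0xFF00, 0xFFEF, "Halfwidth and Fullwidth Forms"),
    (0x10780, 0x107BF, "Latin Extended-F"),
    (0x1DF00, 0x1DFFF, "Latin Extended-G"),
]
_STARTS = [start for start, _, _ in LATIN_BLOCKS]

def latin_block(ch):
    """Return the listed block containing ch, or None if it is in none of them."""
    cp = ord(ch)
    # Find the last block whose start is <= cp, then check its end.
    i = bisect_right(_STARTS, cp) - 1
    if i >= 0 and cp <= LATIN_BLOCKS[i][1]:
        return LATIN_BLOCKS[i][2]
    return None

print(latin_block("A"))        # Basic Latin
print(latin_block("\u1ebf"))   # Latin Extended Additional
print(latin_block("\u0416"))   # None (a Cyrillic letter)

# Every Latin-1 byte decodes to the code point with the same value, so the
# first two blocks together cover the whole encoding.
assert bytes(range(256)).decode("latin-1") == "".join(map(chr, range(256)))
```

Block membership is not the same as the script property: these blocks also contain characters such as digits and punctuation that are not classified as Latin, so a lookup like this only narrows down where Latin-script characters can occur.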
In addition, a number of Latin-like characters are encoded in the Currency Symbols, Control Pictures, CJK Compatibility, Enclosed Alphanumerics, Enclosed CJK Letters and Months, Mathematical Alphanumeric Symbols, and Enclosed Alphanumeric Supplement blocks. Although they are graphically Latin letters, they have the script property ''Common'' and so do not belong to the Latin script in Unicode terms.
The Lisu script also consists almost entirely of Latin forms, but it has its own script property.
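Python's standard library does not expose the Unicode script property, so the sketch below does not read it directly. Instead, as a rough illustration, it prints the character name and the NFKC compatibility form of a few Latin-like characters. The three sample characters and the helper name `describe` are illustrative choices, not taken from the article.

```python
import unicodedata

# Sample Latin-like characters and where each one is encoded.
samples = {
    "\u24b6": "Enclosed Alphanumerics",
    "\U0001d400": "Mathematical Alphanumeric Symbols",
    "\ua4d0": "Lisu",
}

def describe(ch):
    """Return (character name, NFKC-normalized form) for ch."""
    return unicodedata.name(ch), unicodedata.normalize("NFKC", ch)

for ch, source in samples.items():
    name, folded = describe(ch)
    print(f"U+{ord(ch):04X} {name} ({source}) -> {folded!r}")
```

The circled and mathematical letters fold to a plain Basic Latin `A` under NFKC, while the Lisu letter has no decomposition and stays unchanged. This is consistent with Lisu being a separate script rather than a styled variant of Latin.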
Table of characters
In this table, characters with the Unicode script property Latin are highlighted in colour, indicating the version of Unicode in which they were introduced. Reserved code points (which may be assigned as characters at a future date) have a grey background. All characters that do not belong to the Latin script have a white background, and the version of Unicode in which they were introduced is therefore not indicated.
See also
* Universal Character Set characters
* Letterlike Symbols (Unicode block)
* List of Latin-script letters
* List of Latin letters by shape
* Mathematical Alphanumeric Symbols
* European Unicode subset (DIN 91379)