Unicode font

A Unicode font is a computer font that maps glyphs to code points defined in the Unicode Standard. The vast majority of modern computer fonts use Unicode mappings, even fonts which only include glyphs for a single writing system or only support the basic Latin alphabet. Fonts which support a wide range of Unicode scripts and Unicode symbols are sometimes referred to as "pan-Unicode fonts". However, because the maximum number of glyphs that can be defined in a TrueType font is 65,535, no single font can provide individual glyphs for all defined Unicode characters. This article lists some widely used Unicode fonts (shipped with an operating system or produced by a well-known commercial font company) that support a comparatively large number and broad range of Unicode characters.
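The 65,535 limit follows from the glyph count being stored as an unsigned 16-bit integer (the numGlyphs field of the TrueType/OpenType maxp table). The sketch below is a simplification that ignores the rest of the font file: it shows that such a field cannot count even one more glyph, and compares it with the size of Unicode's code space (17 planes of 65,536 code points). The function names are hypothetical.

```python
import struct

# A font's glyph count is stored as an unsigned 16-bit big-endian integer
# (numGlyphs in the 'maxp' table), so it can never exceed 0xFFFF.
MAX_GLYPHS = 0xFFFF  # 65,535

# Unicode's code space: 17 planes of 65,536 code points each.
UNICODE_CODE_SPACE = 17 * 0x10000  # 1,114,112


def pack_num_glyphs(n: int) -> bytes:
    """Encode a glyph count the way a 16-bit font header field would."""
    return struct.pack(">H", n)  # raises struct.error outside 0..65,535


def fits_in_one_font(n: int) -> bool:
    """Return True if n glyphs can be counted by a 16-bit field."""
    try:
        pack_num_glyphs(n)
        return True
    except struct.error:
        return False


print(fits_in_one_font(MAX_GLYPHS))      # True
print(fits_in_one_font(MAX_GLYPHS + 1))  # False
print(UNICODE_CODE_SPACE)                # 1114112
```

Note that the code space size is an upper bound, not the number of characters actually assigned; the point is only that a 16-bit glyph count is far smaller than the space it would need to cover.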


Background

The Unicode standard does not itself specify or create any font (typeface), that is, a collection of graphical shapes called glyphs. Rather, it defines abstract characters as specific numbers (known as ''code points'') and also defines the required changes of shape depending on the context in which a glyph is used (e.g., combining characters, precomposed characters and letter-diacritic combinations). The choice of font, which governs how the abstract characters in the Universal Coded Character Set (UCS) are converted into a bitmap or vector output that can then be viewed on a screen or printed, is left up to the user. If the chosen font does not contain a glyph for a code point used in the document, the software typically displays a question mark, a box, or some other substitute character.
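In TrueType and OpenType fonts, the character map assigns a glyph ID to each supported code point, and glyph ID 0 is reserved for the ".notdef" glyph, which many fonts draw as an empty box. A minimal sketch of that lookup, assuming a toy in-memory character map whose glyph IDs are made up for illustration:

```python
# Toy model of a font's character map: code point -> glyph ID.
# Glyph ID 0 is reserved for ".notdef", the glyph many fonts draw as an
# empty box when a character has no glyph of its own.
NOTDEF = 0

toy_cmap = {
    ord("A"): 36,   # hypothetical glyph IDs, for illustration only
    ord("B"): 37,
    ord("é"): 102,
}


def glyph_ids(text: str, cmap: dict[int, int]) -> list[int]:
    """Map each character of text to a glyph ID, using .notdef when missing."""
    return [cmap.get(ord(ch), NOTDEF) for ch in text]


print(glyph_ids("AéΩ", toy_cmap))  # [36, 102, 0]: this font has no glyph for Ω
```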

Computer fonts use various techniques to display characters or glyphs. A bitmap font contains a grid of dots known as pixels, forming an image of each glyph in each face and size. Outline fonts (also known as vector fonts) use drawing instructions or mathematical formulæ to describe each glyph. Stroke fonts use a series of specified lines (for the glyph's border) and additional information to define the ''profile'', or ''size'' and shape of the line in a specific face and size, which together describe the appearance of the glyph. Fonts may also include embedded orthographic rules that cause certain combinations of letterforms (alternative symbols for the same letter) to be combined into special ligature forms (mixed characters).
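The ligature rules mentioned above can be illustrated with a simplified sketch. It assumes a hypothetical rule table and applies it with greedy longest-match substitution; real OpenType fonts express such rules as glyph-substitution (GSUB) lookups applied to glyph IDs, not to the characters of the text, so the string output here is only a stand-in for the glyphs a renderer would draw.

```python
# Hypothetical ligature rules: letter sequence -> single ligature character.
# Strings are used instead of glyph IDs purely for readability.
LIGATURES = {
    "ffi": "\uFB03",  # LATIN SMALL LIGATURE FFI
    "ffl": "\uFB04",  # LATIN SMALL LIGATURE FFL
    "ff": "\uFB00",   # LATIN SMALL LIGATURE FF
    "fi": "\uFB01",   # LATIN SMALL LIGATURE FI
    "fl": "\uFB02",   # LATIN SMALL LIGATURE FL
}
LONGEST = max(len(k) for k in LIGATURES)


def apply_ligatures(text: str) -> str:
    """At each position, replace the longest matching sequence, if any."""
    out = []
    i = 0
    while i < len(text):
        # Try the longest candidate first so "ffi" wins over "ff".
        for size in range(min(LONGEST, len(text) - i), 1, -1):
            chunk = text[i:i + size]
            if chunk in LIGATURES:
                out.append(LIGATURES[chunk])
                i += size
                break
        else:
            # No rule matched: keep the single character unchanged.
            out.append(text[i])
            i += 1
    return "".join(out)


print(apply_ligatures("office flow"))  # prints: oﬃce ﬂow
```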

Operating systems, web browsers (user agents), and other software that make extensive use of typography use a font to display text on the screen or in print, and can be programmed to use those embedded rules. Alternatively, they may use external script-shaping technologies (rendering technology or “smart font” engines). They can also be programmed to use either one large Unicode font or multiple different fonts for different characters or languages.

No single "Unicode font" includes all the characters defined in the present revision of the ISO 10646 (Unicode) standard, because more and more languages and characters are continually added to it, and common font formats cannot contain more than 65,535 glyphs (about half the number of characters encoded in Unicode). As a result, font developers and foundries incorporate new characters in newer versions or revisions of a font, or in separate auxiliary fonts intended specifically for particular languages. The UCS has over 1.1 million code points, but only the first 65,536 (Plane 0, the Basic Multilingual Plane, or BMP) had entered into common use before 2000.

:''See the Unicode planes article for more information on other planes, including: Plane 1: Supplementary Multilingual Plane (SMP), Plane 2: Supplementary Ideographic Plane (SIP), Plane 14: Supplementary Special-purpose Plane (SSP), Planes 15 and 16: reserved for Private Use Areas (PUA).''

The first Unicode fonts (with very large character sets and supporting many Unicode blocks) were Lucida Sans Unicode (released March 1993), Unihan font (1993), and Everson Mono (1995).
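Because every plane holds 65,536 code points, the plane of a code point is its value divided by 65,536 (equivalently, shifted right by 16 bits), and the 17 planes together give 1,114,112 code points, the "over 1.1 million" mentioned above. A minimal sketch, using only the plane names listed in the note above; the function names are illustrative:

```python
PLANE_SIZE = 0x10000  # 65,536 code points per plane

# Plane names as listed in the note above; other planes are not named here.
PLANE_NAMES = {
    0: "Basic Multilingual Plane (BMP)",
    1: "Supplementary Multilingual Plane (SMP)",
    2: "Supplementary Ideographic Plane (SIP)",
    14: "Supplementary Special-purpose Plane (SSP)",
    15: "Private Use Area (PUA)",
    16: "Private Use Area (PUA)",
}


def plane_of(code_point: int) -> int:
    """Return the plane number (0 to 16) of a Unicode code point."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError(f"not a Unicode code point: {code_point:#x}")
    return code_point // PLANE_SIZE  # same as code_point >> 16


def describe(ch: str) -> str:
    cp = ord(ch)
    plane = plane_of(cp)
    name = PLANE_NAMES.get(plane, "plane not listed here")
    return f"U+{cp:04X} is in plane {plane}: {name}"


print(17 * PLANE_SIZE)          # 1114112
print(describe("A"))            # U+0041 is in plane 0: Basic Multilingual Plane (BMP)
print(describe("\U0001F600"))   # U+1F600 is in plane 1: Supplementary Multilingual Plane (SMP)
```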


Issues

There are typographical ambiguities in Unicode, so that some of the unified Han characters (used in Chinese, Japanese, and Korean) are typographically different in different regions. For example, the glyph for a given Unicode code point can be typographically different between simplified Chinese and traditional Chinese. This has implications for the idea that a single typeface can satisfy the needs of all locales (Ken Lunde, ''CJKV Information Processing'', O'Reilly Inc, 1999, p. 128, "CJKV character form differences"). The design of Unicode ensures that such differences do not create semantic ambiguity, but the use of incorrect forms is often considered visually awkward or aesthetically inappropriate by native readers of East Asian languages.
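One common way for software to handle this is to choose a region-specific face from the language of the text rather than from the code point alone. The sketch below assumes BCP 47-style language tags and uses hypothetical font family names; it falls back to shorter tag prefixes (so "zh-Hant-TW" matches "zh-Hant") and then to a default face. Tag matching here is case-sensitive, a simplification that real implementations usually avoid.

```python
# Hypothetical region-specific faces for the same unified Han code points.
FACE_BY_LOCALE = {
    "zh-Hans": "Example Han SC",  # simplified Chinese forms
    "zh-Hant": "Example Han TC",  # traditional Chinese forms
    "ja": "Example Han JP",       # Japanese forms
    "ko": "Example Han KR",       # Korean forms
}
DEFAULT_FACE = "Example Han SC"


def face_for(lang_tag: str) -> str:
    """Pick a face for a language tag, trying progressively shorter prefixes."""
    parts = lang_tag.split("-")
    while parts:
        candidate = "-".join(parts)
        if candidate in FACE_BY_LOCALE:
            return FACE_BY_LOCALE[candidate]
        parts.pop()  # drop the last subtag, e.g. "zh-Hant-TW" -> "zh-Hant"
    return DEFAULT_FACE


print(face_for("zh-Hant-TW"))  # Example Han TC
print(face_for("ja-JP"))       # Example Han JP
```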


Application of Unicode fonts

Unicode is now the standard encoding for many new standards and protocols, and is built into the architecture of operating systems (Microsoft Windows, Apple Mac OS, and many versions of Unix and Linux), programming languages (Ada, Perl, Python, Java, Common LISP, APL), libraries (IBM International Components for Unicode (ICU), along with the Pango, Graphite, Scribe, Uniscribe, and ATSUI rendering engines), font formats (TrueType and OpenType) and so on. Many other standards are also being upgraded to be Unicode-compliant.
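In a language such as Python, whose strings are sequences of Unicode code points, the distinction between an abstract character, its code point and one of its byte encodings can be seen directly. A minimal sketch using only the standard library; the character names come from the Unicode database bundled with the running Python version, and the helper name is illustrative:

```python
import unicodedata


def describe_chars(text: str) -> list[tuple[str, str, str]]:
    """Return (code point, character name, UTF-8 bytes) for each character."""
    rows = []
    for ch in text:
        rows.append((
            f"U+{ord(ch):04X}",                  # the abstract code point
            unicodedata.name(ch, "<unnamed>"),   # name from Python's Unicode data
            ch.encode("utf-8").hex(" "),         # one possible byte encoding
        ))
    return rows


for row in describe_chars("Aé中😀"):
    print(*row)
# U+0041 LATIN CAPITAL LETTER A 41
# U+00E9 LATIN SMALL LETTER E WITH ACUTE c3 a9
# U+4E2D CJK UNIFIED IDEOGRAPH-4E2D e4 b8 ad
# U+1F600 GRINNING FACE f0 9f 98 80
```

Which glyph is finally drawn for each of these code points still depends on the font, as described above.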


Utility software

Here is a selection of utility software that can identify the characters present in a font file:
* Character Map, applet included with Microsoft Windows
* Font Book, application included with Mac OS
* GNOME Character Map, application included with the GNOME desktop environment
* BabelMap, third-party software for Windows
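Underneath, a tool of this kind has to read the font's character map (the cmap table in TrueType/OpenType files). As a hedged sketch, the function below decodes a cmap subtable in format 12, which stores sequential ranges of code points and can therefore cover characters beyond the BMP. It assumes it is given the raw bytes of that subtable; locating the cmap table and choosing a subtable inside a real font file are omitted, and the example data is synthetic.

```python
import struct


def parse_cmap_format12(data: bytes) -> dict[int, int]:
    """Decode a 'cmap' format 12 subtable into {code point: glyph ID}."""
    # Header: format, reserved (uint16); length, language, numGroups (uint32).
    fmt, _reserved, _length, _language, num_groups = struct.unpack_from(">HHIII", data, 0)
    if fmt != 12:
        raise ValueError(f"expected format 12, got {fmt}")
    mapping = {}
    offset = 16  # size of the header unpacked above
    for _ in range(num_groups):
        # Each group maps a run of consecutive code points to consecutive glyphs.
        start, end, start_glyph = struct.unpack_from(">III", data, offset)
        offset += 12
        for cp in range(start, end + 1):
            mapping[cp] = start_glyph + (cp - start)
    return mapping


# Synthetic subtable: U+0041..U+0043 -> glyphs 36..38, U+1F600 -> glyph 500.
groups = [(0x41, 0x43, 36), (0x1F600, 0x1F600, 500)]
header = struct.pack(">HHIII", 12, 0, 16 + 12 * len(groups), 0, len(groups))
body = b"".join(struct.pack(">III", *g) for g in groups)
cmap = parse_cmap_format12(header + body)
print(sorted(f"U+{cp:04X}" for cp in cmap))  # ['U+0041', 'U+0042', 'U+0043', 'U+1F600']
```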


List of Unicode fonts

Of the many Unicode fonts available, those listed below are the most commonly used worldwide on mainstream computing platforms.
; Note
:OTF+TTO: OpenType font with TrueType outlines.
:OpenType fonts sometimes contain not a one-by-one kern-pair table but a kern-by-classes table, in which groups of similar characters are treated as one kerning group. For instance, ''V'' and ''W'' have nearly the same left and right geometry. So a count of “0” does not mean that no kerning is supported.
:Register after a "reasonable" period (author's words).
:Includes more than 27,000 Hanzi glyphs from the WenQuanYi Bitmap Song font.
:Han Nom A covers mainly CJK Unified Ideographs Extension A, and Han Nom B covers mostly Extension B.
:Sun-ExtA covers 102 blocks of different languages. Sun-ExtB covers mostly CJK Supplement, CJK Unified Ideographs Extensions B and C, and Tai Xuan Jing Symbols.
:Zen Hei, Zen Hei Mono and Zen Hei Sharp coexist in a single TTC file, also with embedded bitmaps. Latin and Hangul glyphs are derived from UnDotum, Bopomofo from cwTeX, and monospaced Latin from M+ M2 Light. Full CJK coverage. Included with Fedora Linux and Ubuntu Linux.
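The kern-by-classes approach described in the notes can be sketched as a two-step lookup: each glyph is first mapped to a class, and a single kerning value is then stored per pair of classes (OpenType expresses this with class definitions in its GPOS pair-adjustment lookups). The class names and kerning values below are invented for illustration; note that no individual glyph pair is stored at all, yet ''V'' and ''W'' are both kerned.

```python
# Hypothetical class assignments: glyphs with similar side shapes share a class.
LEFT_CLASS = {"V": "diag", "W": "diag", "T": "bar"}
RIGHT_CLASS = {"a": "round", "o": "round", "e": "round", "A": "apex"}

# One kerning value per (left class, right class) pair, in font units.
CLASS_KERNING = {
    ("diag", "round"): -60,
    ("diag", "apex"): -80,
    ("bar", "round"): -90,
}


def kern(left: str, right: str) -> int:
    """Look up the kerning adjustment between two glyphs via their classes."""
    lc = LEFT_CLASS.get(left)
    rc = RIGHT_CLASS.get(right)
    return CLASS_KERNING.get((lc, rc), 0)  # 0 means no adjustment


print(kern("V", "a"), kern("W", "o"))  # -60 -60
```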


Comparison of fonts

The numbers of characters included in the above versions of the fonts are listed below for different Unicode blocks. ''Basic Latin (128)'' means that the range called 'Basic Latin' has 128 assigned codes, numbered 0 to 7F. The cells then show how many of those codes are covered by each font. Unicode blocks listed are valid for Unicode version 8.0.
:Cells shaded green indicate complete coverage.
:Cells shaded blue are not complete, but are the most complete of the fonts listed.
:Empty cells indicate that no character exists in that block.
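Coverage numbers like these can be computed by intersecting a font's set of code points with the assigned code points of each block. A minimal sketch, assuming a hypothetical font that covers only the printable ASCII range plus é, and treating Basic Latin as the 128 codes from 0 to 7F (every code in that block is assigned, which is not true of all blocks):

```python
# Hypothetical set of code points covered by a font (e.g. read from its cmap).
font_code_points = set(range(0x20, 0x7F)) | {0xE9}


def block_coverage(covered: set[int], assigned: range) -> tuple[int, int, str]:
    """Count how many assigned code points of a block a font covers."""
    total = len(assigned)
    count = sum(1 for cp in assigned if cp in covered)
    if count == total:
        status = "complete"  # the case a table would shade green
    elif count == 0:
        status = "none"
    else:
        status = "partial"
    return count, total, status


BASIC_LATIN = range(0x00, 0x80)  # 128 assigned codes, 0 to 7F
print(block_coverage(font_code_points, BASIC_LATIN))  # (95, 128, 'partial')
```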


0000–077F


0780–139F


13A0–1DBF


1DC0–257F


2580–2DFF


2E00–4DBF


4DC0–FAFF


FB00–FFFF


List of SMP Unicode fonts


10000–1F9FF

Unicode blocks listed are valid for Unicode version 8.0.


List of SIP Unicode fonts


20000–2FFFF

Unicode blocks listed are valid for Unicode version 8.0.


List of SSP Unicode fonts


E0000–EFFFF

Unicode blocks listed are valid for Unicode version 8.0.


See also


References


External links


ISO/IEC JTC1/SC2/WG2
the working group in charge of ISO 10646

at Unicode.org
Unicode Font Guide For Free/Libre Open Source Operating Systems
— A huge index of high quality free fonts.

— Index of free and commercial Unicode fonts.

— Enable Unicode for applications.
Microsoft Typography – Fonts and Products
— Reference for determining which fonts are supplied with Microsoft products.