GNU Unifont is a free Unicode bitmap font created by Roman Czyborra. The main Unifont covers all of the Basic Multilingual Plane (BMP). The "upper" companion covers significant parts of the Supplementary Multilingual Plane (SMP). The "Unifont JP" companion contains the Japanese kanji present in the JIS X 0213 character set.
It is present in most free operating systems and windowing systems, such as Linux, XFree86 or the X.Org Server, in some embedded firmware such as Rockbox, and in Minecraft: Java Edition. The source code is released under the GPL-2.0-or-later license. The font is released under the GPL-2.0-or-later license with the Font-exception-2.0 (embedding the font in a document does not require the document to be placed under the same license) and, since version 13.0.04, is dual-licensed under the SIL Open Font License 1.1. The manual is released under the GFDL-1.3-or-later license.
It became a GNU package in October 2013. The current maintainer is Paul Hardy.
Status
The Unicode Basic Multilingual Plane covers 2¹⁶ (65,536) code points. Of these, 2,048 are reserved for forming UTF-16 surrogate pairs and 6,400 are reserved for private use, which leaves 57,088 code points to which glyphs can be assigned. Some of these code points are special values that have no assigned glyph, but most do.
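The subtraction above can be checked from the range boundaries. This is a minimal Python sketch; the boundaries it uses (surrogates at U+D800 to U+DFFF, the BMP Private Use Area at U+E000 to U+F8FF) are standard Unicode values that the text does not spell out.

```python
# Ranges of the Basic Multilingual Plane that cannot receive ordinary glyphs.
BMP_SIZE = 2 ** 16                   # U+0000 .. U+FFFF
SURROGATES = range(0xD800, 0xE000)   # UTF-16 high and low surrogates
PRIVATE_USE = range(0xE000, 0xF900)  # BMP Private Use Area (U+E000 .. U+F8FF)


def assignable_bmp_code_points() -> int:
    """Code points left after removing surrogates and private-use points."""
    return BMP_SIZE - len(SURROGATES) - len(PRIVATE_USE)


print(len(SURROGATES), len(PRIVATE_USE), assignable_bmp_code_points())
# 2048 6400 57088
```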
The GNU Unifont has complete coverage of the Basic Multilingual Plane as defined in Unicode 12.1.0. Its companion fonts, Unifont Upper and Unifont CSUR, have significant coverage of the Supplementary Multilingual Plane and the ConScript Unicode Registry, respectively.
Unifont JP, released with version 12.1.02, covers 10,000 Japanese kanji from the JIS X 0213 character set, some of which are in the Supplementary Ideographic Plane. It is derived from Jiskan16, a public-domain font.
Any contributor can add glyphs to scripts that are still incomplete.
Most of the CJK ideographs in the font were copied, with permission, from WenQuanYi's Unibit font.
Unifont stores only one glyph per printable Unicode code point. Because of this, it lacks the OpenType features needed to render scripts with complex layouts correctly: it does not position combining diacritics over base letters unless the combination is encoded in Unicode in precomposed form, and it does not handle contextual forms (including joining types and subjoined clusters). Supporting these would greatly increase the number of glyphs in the font, and because an OpenType font is limited to 65,535 glyphs, it is not currently possible to include every glyph needed for all the combinations that can occur within a single Unicode plane. Chinese fonts face the same limit: none can completely cover the ideographs currently encoded across three planes. Unifont is therefore intended only as a "last resort" fallback font, suitable for simple alphabetic scripts or for displaying isolated characters; for running text it can make reading difficult or sometimes impossible. To render Indic abugidas correctly (and Semitic abjads when they are written with their optional combining diacritics), other fonts should be specified before this one, and additional fonts are needed to cover Han ideographs encoded in supplementary planes and most historic (or minority modern) scripts not encoded in the BMP.
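The diacritic limitation can be illustrated with Python's standard `unicodedata` module. A minimal sketch, assuming that a renderer drawing one glyph per code point can show an accented letter correctly only when NFC normalization yields a single precomposed code point; the helper name `has_precomposed_form` is hypothetical, and whether a given renderer normalizes text before glyph lookup depends on the software.

```python
import unicodedata

COMBINING_ACUTE = "\u0301"


def has_precomposed_form(base: str, mark: str) -> bool:
    """True if NFC folds base + combining mark into a single code point."""
    return len(unicodedata.normalize("NFC", base + mark)) == 1


print(has_precomposed_form("e", COMBINING_ACUTE))  # True: becomes U+00E9
print(has_precomposed_form("q", COMBINING_ACUTE))  # False: stays two code points
```

In the second case a one-glyph-per-code-point font would draw the base letter and the accent as separate cells, without adjusting the mark's position.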
Distribution
Unifont, as of version 15.0.6, is available in TTF (and OTF), BDF, PCF, .hex, and PSF formats for the "standard build". Only the TrueType build is split into two fonts.
A few "specialized versions" have been built by request and made available by Paul Hardy. These include a bitmap TTF (SBIT) whose empty glyphs are filled with their code-point values for FontForge users to read, a PSF bitmap with glyphs for APL programmers, and single-file versions in Roman Czyborra's .hex format (see below). The source itself is organized as smaller .hex files that are stitched together and converted to other formats during a build.
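The stitching step can be sketched as a merge of fragments into one file sorted by code point. This is a minimal sketch, assuming each fragment is a sequence of `CODEPOINT:BITMAP` lines and that no code point appears in two fragments; `merge_hex_fragments` is a hypothetical helper, not part of the Unifont build tools, which have their own scripts. It needs Python 3.9 or later for the built-in generic type hints.

```python
from typing import Iterable


def merge_hex_fragments(fragments: Iterable[Iterable[str]]) -> list[str]:
    """Combine .hex fragments into one list of lines sorted by code point.

    Hypothetical helper: assumes every non-blank line is "CODEPOINT:BITMAP"
    and that each code point appears in only one fragment.
    """
    glyphs: dict[int, str] = {}
    for fragment in fragments:
        for line in fragment:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            code_hex, bitmap = line.split(":", 1)
            code = int(code_hex, 16)
            if code in glyphs:
                raise ValueError(f"duplicate glyph for U+{code:04X}")
            glyphs[code] = f"{code_hex.upper()}:{bitmap.upper()}"
    # Sort numerically so that 4- and 6-digit code points order correctly.
    return [glyphs[code] for code in sorted(glyphs)]


print("\n".join(merge_hex_fragments([["0042:aa", ""], ["0041:BB"]])))
```

Open text files can be passed directly as fragments, since iterating a file yields its lines.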
Vectorization
Luis Alejandro González Miranda wrote scripts that use FontForge to vectorize the BDF font and convert it to TrueType format. Paul Hardy later adjusted these scripts to handle combining characters (accents, etc.) in recent TrueType releases.
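A generic way to vectorize a bitmap glyph is to turn each horizontal run of "on" pixels into a filled rectangle. The sketch below illustrates only that technique; it is not the FontForge-based scripts described above, and the function name `bitmap_to_rectangles` and the default of 64 font units per pixel are assumptions.

```python
def bitmap_to_rectangles(rows: list[str], units_per_pixel: int = 64):
    """Turn rows of '0'/'1' characters into rectangles (x0, y0, x1, y1).

    Each horizontal run of 'on' pixels in a row becomes one rectangle.
    y grows upward from the bottom row, as in font outline coordinates.
    """
    rects = []
    height = len(rows)
    for row_index, row in enumerate(rows):
        y0 = (height - 1 - row_index) * units_per_pixel
        x = 0
        while x < len(row):
            if row[x] == "1":
                start = x
                while x < len(row) and row[x] == "1":
                    x += 1  # extend the run of 'on' pixels
                rects.append((start * units_per_pixel, y0,
                              x * units_per_pixel, y0 + units_per_pixel))
            else:
                x += 1
    return rects


print(bitmap_to_rectangles(["0110", "1001"], 10))
```

Each rectangle would then become one closed contour in the outline font. Runs never overlap, so no overlap removal is needed, although rectangles in adjacent rows may share edges.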
.hex format
The GNU Unifont .hex format defines its glyphs as either 8 or 16 pixels in width by 16 pixels in height. Most Western script glyphs can be defined as 8 pixels wide, while other glyphs (notably the Chinese–Japanese–Korean, or CJK set) are typically defined as 16 pixels wide.
The unifont.hex file contains one line for each glyph. Each line consists of a four-digit hexadecimal Unicode code point, a colon, and the bit string. The bit string is 32 hexadecimal digits (8 × 16 = 128 bits) for an 8-pixel-wide glyph, or 64 hexadecimal digits (16 × 16 = 256 bits) for a 16-pixel-wide glyph. The format was designed as an intermediate format that makes it easy to add new glyphs.
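Reading a line amounts to splitting at the colon and inferring the width from the length of the bit string. A minimal sketch, assuming 16-row glyphs as described above; `parse_hex_line` is a hypothetical name, and it accepts a code point of any number of hex digits rather than enforcing four.

```python
def parse_hex_line(line: str) -> tuple[int, int, str]:
    """Split a .hex line into (code point, pixel width, bitmap digits).

    Assumes 16-row glyphs: 32 hex digits means 8 pixels wide, 64 means 16.
    """
    code_hex, bitmap = line.strip().split(":", 1)
    widths = {32: 8, 64: 16}
    if len(bitmap) not in widths:
        raise ValueError(f"unexpected bitmap length: {len(bitmap)} digits")
    return int(code_hex, 16), widths[len(bitmap)], bitmap


code, width, bitmap = parse_hex_line("0041:0000000018242442427E424242420000")
print(hex(code), width)  # 0x41 8
```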
The bit string is converted from hexadecimal to binary. A 1 bit in the binary string corresponds to an "on" pixel. The bits are stored row by row, from top to bottom, and each row is in big-endian order, so its most significant bit is the leftmost pixel.
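The decoding rule can be written as a short function. A sketch under the same assumptions (16 rows, width 8 or 16); `decode_rows` is a hypothetical name. Each row occupies `width // 4` hex digits and is expanded to a zero-padded bit string whose first character is the leftmost pixel.

```python
def decode_rows(bitmap: str, width: int) -> list[str]:
    """Expand a .hex bitmap into rows of '0'/'1' characters, top row first.

    Each row uses width // 4 hex digits; the most significant bit of the row
    is its leftmost pixel (big-endian order).
    """
    digits_per_row = width // 4
    rows = []
    for start in range(0, len(bitmap), digits_per_row):
        value = int(bitmap[start:start + digits_per_row], 16)
        rows.append(format(value, f"0{width}b"))  # zero-padded to full width
    return rows


for row in decode_rows("0000000018242442427E424242420000", 8):
    print(row.replace("1", "#").replace("0", "."))
```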
Example
This is an example font containing one glyph, for ASCII capital 'A'.
0041:0000000018242442427E424242420000
The first number is the hexadecimal Unicode code point, with range 0000 through FFFF. Hexadecimal 0041 is decimal 65, the code point for the letter 'A'. The colon separates the code point from the bitmap. In this example, the glyph is 8 pixels wide, so the bit string is 32 hexadecimal digits long.
The bit string begins with 8 zeros, so the top 4 rows are empty (2 hexadecimal digits per 8-bit byte, and 8 bits per row for an 8-pixel-wide glyph). The bit string also ends with 4 zeros, so the bottom 2 rows are empty. This implies that the default font descender is 2 rows below the baseline and the capital height is 10 rows above it, which is the case for Latin glyphs in GNU Unifont.
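The row counts in this paragraph can be verified mechanically. This sketch assumes an 8-pixel-wide glyph, so each row is two hex digits; `blank_rows` is a hypothetical helper.

```python
EXAMPLE = "0041:0000000018242442427E424242420000"


def blank_rows(hex_line: str) -> tuple[int, int]:
    """Count empty rows at the top and bottom of an 8-pixel-wide .hex glyph."""
    bitmap = hex_line.split(":", 1)[1]
    # Two hex digits per 8-pixel row.
    rows = [bitmap[i:i + 2] for i in range(0, len(bitmap), 2)]
    top = 0
    while top < len(rows) and int(rows[top], 16) == 0:
        top += 1
    bottom = 0
    while bottom < len(rows) - top and int(rows[-1 - bottom], 16) == 0:
        bottom += 1
    return top, bottom


top, bottom = blank_rows(EXAMPLE)
print(top, bottom, 16 - top - bottom)  # 4 2 10
```

The third number is the count of rows the letter occupies, matching the 10-row capital height.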
Over time, several ways of editing the format have been developed. The earliest is a Perl script that converts the bit string into an ASCII-art representation that can be edited in a text editor. Another method generates a bitmap image of the glyph grid for an entire range of code points, which is then edited in an image editor. In either case, the edited glyphs are afterwards converted back into .hex files for storage.
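The text-editor workflow reduces to a pair of conversions. A sketch of a round trip in that style, assuming "#" for an "on" pixel and "-" for an "off" pixel; these characters and the function names `hex_to_art` and `art_to_hex` are chosen for illustration and need not match any particular Unifont tool.

```python
ON, OFF = "#", "-"  # pixel characters chosen for this sketch


def hex_to_art(bitmap: str, width: int = 8) -> list[str]:
    """Render a .hex bitmap as text rows, one character per pixel."""
    step = width // 4  # hex digits per row
    rows = []
    for i in range(0, len(bitmap), step):
        bits = format(int(bitmap[i:i + step], 16), f"0{width}b")
        rows.append(bits.replace("1", ON).replace("0", OFF))
    return rows


def art_to_hex(rows: list[str]) -> str:
    """Convert edited text rows back into an uppercase .hex bitmap string."""
    width = len(rows[0])
    out = []
    for row in rows:
        bits = row.replace(ON, "1").replace(OFF, "0")
        out.append(format(int(bits, 2), f"0{width // 4}X"))
    return "".join(out)


art = hex_to_art("0000000018242442427E424242420000")
print("\n".join(art))
```

Because `art_to_hex` emits uppercase digits, an uppercase input survives the round trip unchanged.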
Decoded hexdraw representation of the example (columns: hex byte, binary, pixels):
00 00000000 □□□□□□□□
00 00000000 □□□□□□□□
00 00000000 □□□□□□□□
00 00000000 □□□□□□□□
18 00011000 □□□██□□□
24 00100100 □□█□□█□□
24 00100100 □□█□□█□□
42 01000010 □█□□□□█□
42 01000010 □█□□□□█□
7E 01111110 □██████□
42 01000010 □█□□□□█□
42 01000010 □█□□□□█□
42 01000010 □█□□□□█□
42 01000010 □█□□□□█□
00 00000000 □□□□□□□□
00 00000000 □□□□□□□□
History
Roman Czyborra created the Unifont format in 1998 after earlier efforts dating to 1994.
In 2008, Luis Alejandro González Miranda wrote a program to convert Unifont into a TrueType font. Paul Hardy modified it later to support combining characters in the TrueType version.
Later, Richard Stallman published Unifont as a GNU package in October 2013, with Paul Hardy as its maintainer.
See also
*Fallback font