JBIG2

JBIG2 is an image compression standard for bi-level images, developed by the Joint Bi-level Image Experts Group. It is suitable for both lossless and lossy compression. According to a press release from the Group, in its lossless mode JBIG2 typically generates files 3–5 times smaller than Fax Group 4 and 2–4 times smaller than JBIG, the previous bi-level compression standard released by the Group. JBIG2 was published in 2000 as the international standard ITU-T T.88, and in 2001 as ISO/IEC 14492.


Functionality

Ideally, a JBIG2 encoder will segment the input page into regions of text, regions of halftone images, and regions of other data. Regions that are neither text nor halftones are typically compressed using a context-dependent arithmetic coding algorithm called the MQ coder. Textual regions are compressed as follows: the foreground pixels in the regions are grouped into symbols. A dictionary of symbols is then created and encoded, typically also using context-dependent arithmetic coding, and the regions are encoded by describing which symbols appear where. Typically, a symbol will correspond to a character of text, but this is not required by the compression method. For lossy compression, the difference between similar symbols (e.g., slightly different impressions of the same letter) can be neglected; for lossless compression, this difference is taken into account by compressing one similar symbol using another as a template. Halftone images may be compressed by reconstructing the grayscale image used to generate the halftone and then sending this image together with a dictionary of halftone patterns.

Overall, the algorithm used by JBIG2 to compress text is very similar to the JB2 compression scheme used in the DjVu file format for coding binary images.
PDF files of version 1.4 and above may contain JBIG2-compressed data. Open-source decoders for JBIG2 include jbig2dec (AGPL), the Java-based jbig2-imageio (Apache-2), the JavaScript-based jbig2.js (Apache-2), and the decoder by Glyph & Cog LLC found in Xpdf and Poppler (both GPL). An open-source encoder is jbig2enc (Apache-2).
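The text-region procedure described above (group foreground pixels into symbols, build a symbol dictionary, then record which symbol appears where) can be sketched in Python. This is a toy model under stated assumptions, not the JBIG2 bitstream: the page is a list of strings with `#` for black pixels, a "symbol" is taken to be an 8-connected component of black pixels (so a character such as "i" would yield two symbols), only identical bitmaps share a dictionary entry, and no entropy coding is performed. All function names are hypothetical.

```python
from collections import deque

def connected_components(page):
    """Group black pixels ('#') into 8-connected components.

    Returns (left, top, bitmap) per component; bitmap is a tuple of row
    strings cropped to the component's bounding box.
    """
    h, w = len(page), len(page[0])
    seen = [[False] * w for _ in range(h)]
    components = []
    for y0 in range(h):
        for x0 in range(w):
            if page[y0][x0] != '#' or seen[y0][x0]:
                continue
            # Breadth-first flood fill from an unvisited black pixel.
            seen[y0][x0] = True
            queue, pixels = deque([(x0, y0)]), []
            while queue:
                x, y = queue.popleft()
                pixels.append((x, y))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        nx, ny = x + dx, y + dy
                        if (0 <= nx < w and 0 <= ny < h
                                and page[ny][nx] == '#' and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((nx, ny))
            left = min(px for px, _ in pixels)
            top = min(py for _, py in pixels)
            right = max(px for px, _ in pixels)
            bottom = max(py for _, py in pixels)
            grid = [['.'] * (right - left + 1) for _ in range(bottom - top + 1)]
            for px, py in pixels:
                grid[py - top][px - left] = '#'
            components.append((left, top, tuple(''.join(r) for r in grid)))
    return components

def encode_text_region(page):
    """Return a symbol dictionary and a list of (symbol_index, x, y) placements."""
    dictionary, index_of, placements = [], {}, []
    for x, y, bitmap in connected_components(page):
        if bitmap not in index_of:          # first occurrence: new dictionary entry
            index_of[bitmap] = len(dictionary)
            dictionary.append(bitmap)
        placements.append((index_of[bitmap], x, y))
    return dictionary, placements

def decode_text_region(dictionary, placements, width, height):
    """Paint every placed dictionary symbol onto a blank page."""
    grid = [['.'] * width for _ in range(height)]
    for idx, x, y in placements:
        for dy, row in enumerate(dictionary[idx]):
            for dx, c in enumerate(row):
                if c == '#':
                    grid[y + dy][x + dx] = '#'
    return [''.join(r) for r in grid]

page = ["#.#...#.#..",
        ".#.....#...",
        "#.#...#.#.."]
dictionary, placements = encode_text_region(page)
print(len(dictionary), placements)  # 1 [(0, 0, 0), (0, 6, 0)]
```

The two identical shapes on the page are stored once in the dictionary and referenced twice, which is where the compression gain for repeated characters comes from.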


Technical details

A typical bi-level page consists largely of text and halftone data in which the same shapes appear repeatedly. JBIG2 segments the image into three region types: text, halftone, and generic regions. Each type is coded differently, as described in the following sections.


Text image data

Text coding is based on the nature of human visual perception. A human observer cannot tell the difference between two instances of the same character in a bi-level image even though they may not match exactly pixel by pixel. Therefore, only the bitmap of one representative instance of each character needs to be coded, instead of coding the bitmap of every occurrence individually. These representative bitmaps are stored in a "symbol dictionary" (F. Ono, W. Rucklidge, R. Arps, and C. Constantinescu, "JBIG2-the ultimate bi-level image coding standard", Proceedings of the 2000 International Conference on Image Processing, vol. 1, pp. 140–143, 2000). There are two encoding methods for text image data: pattern matching and substitution (PM&S) and soft pattern matching (SPM).

Pattern matching and substitution (PM&S) is the more classic coding method. The encoder performs image segmentation to isolate character-sized chunks. For each chunk, the encoder looks for a match in the bitmap dictionary. If a match exists, it codes the index of the corresponding representative bitmap in the dictionary together with the position of the character on the page; the position is usually relative to another, previously coded character. If no match is found, the segmented pixel block is coded directly and added to the dictionary. The typical steps of the pattern matching and substitution algorithm are shown in the left block diagram of the figure above. Although PM&S can achieve excellent compression, it can introduce substitution errors when the image resolution is low.

JBIG2 improves on PM&S with optional soft pattern matching (SPM). The same segmentation and search are performed, but for each match found, the encoder saves not only the corresponding dictionary entry but also refinement data describing the difference between the actual chunk and the dictionary chunk. This greatly reduces substitution errors. Because a dictionary match already requires the actual character and the dictionary character to be highly similar, SPM adds only a small amount of data.
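The difference between PM&S and SPM can be illustrated with a small Python sketch. It assumes the symbols have already been segmented into equal-sized bitmaps and uses a plain pixel-mismatch count with a hypothetical `threshold` as the match criterion; real encoders use their own matching rules. In JBIG2 the refinement data is itself entropy-coded using the dictionary bitmap as context, whereas here it is represented simply as the list of differing pixel positions. The function names are hypothetical.

```python
def hamming(a, b):
    """Count differing pixels between two equal-sized bitmaps (tuples of row strings)."""
    return sum(ca != cb for ra, rb in zip(a, b) for ca, cb in zip(ra, rb))

def diff_positions(actual, ref):
    """Refinement data in this sketch: (x, y) of every pixel where actual != ref."""
    return [(x, y) for y, (ra, rr) in enumerate(zip(actual, ref))
            for x, (ca, cr) in enumerate(zip(ra, rr)) if ca != cr]

def encode_symbols(symbols, threshold, soft=False):
    """Encode symbol bitmaps against a growing dictionary.

    PM&S (soft=False): a matched symbol is replaced by its dictionary entry (lossy).
    SPM  (soft=True):  a match also stores refinement data, so decoding is exact here.
    """
    dictionary, stream = [], []
    for sym in symbols:
        best = min(range(len(dictionary)),
                   key=lambda i: hamming(sym, dictionary[i]), default=None)
        if best is not None and hamming(sym, dictionary[best]) <= threshold:
            refinement = diff_positions(sym, dictionary[best]) if soft else []
            stream.append(("match", best, refinement))
        else:
            dictionary.append(sym)
            stream.append(("new", len(dictionary) - 1, []))
    return dictionary, stream

def decode_symbols(dictionary, stream):
    """Rebuild each symbol from its dictionary entry plus any refinement data."""
    out = []
    for _, idx, refinement in stream:
        rows = [list(r) for r in dictionary[idx]]
        for x, y in refinement:  # flip every pixel recorded as different
            rows[y][x] = '#' if rows[y][x] == '.' else '.'
        out.append(tuple(''.join(r) for r in rows))
    return out

a = ("###", "#.#", "###")
a2 = ("###", "#.#", "##.")  # noisy copy of a: one pixel differs
for soft in (False, True):
    d, s = encode_symbols([a, a2], threshold=1, soft=soft)
    print(soft, decode_symbols(d, s)[1] == a2)  # "False False", then "True True"
```

With `soft=False` the noisy copy is silently replaced by the dictionary entry, which is the source of substitution errors; with `soft=True` the stored difference lets the decoder recover it.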


Halftones

Halftone images can be compressed in two ways. The first is similar to the context-based arithmetic coding algorithm used elsewhere in JBIG2, which adaptively positions the template pixels to capture correlations between neighboring pixels. In the second, the halftone image is descreened, that is, converted back to grayscale. The resulting gray values are then used as indexes into a halftone dictionary of small, fixed-size bitmap patterns. The decoder renders the halftone image by placing the indexed dictionary patterns next to one another.
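The second (descreening) method can be sketched as follows. Assumptions for illustration only: the halftone grid is aligned with fixed 2×2 cells, the image dimensions are multiples of the cell size, the gray value of a cell is simply its number of black pixels (0 to 4), and the gray value is used directly as an index into a hypothetical dictionary of five 2×2 patterns. Real halftone regions also describe the grid geometry and entropy-code the gray-scale image, both of which this sketch omits.

```python
# Hypothetical pattern dictionary: pattern g has g black pixels in a 2x2 cell.
PATTERNS = [
    ("..", ".."),
    ("#.", ".."),
    ("#.", ".#"),
    ("##", ".#"),
    ("##", "##"),
]
CELL = 2

def descreen(halftone):
    """Encoder side: convert a halftone bitmap into one gray value per cell."""
    h, w = len(halftone), len(halftone[0])
    gray = []
    for cy in range(0, h, CELL):
        row = []
        for cx in range(0, w, CELL):
            black = sum(halftone[y][x] == '#'
                        for y in range(cy, cy + CELL)
                        for x in range(cx, cx + CELL))
            row.append(black)  # the gray value doubles as the pattern index
        gray.append(row)
    return gray

def render(gray):
    """Decoder side: tile the indexed dictionary patterns next to each other."""
    out = []
    for row in gray:
        for py in range(CELL):
            out.append(''.join(PATTERNS[g][py] for g in row))
    return out

halftone = [".###",
            "#.##"]
gray = descreen(halftone)
print(gray)          # [[2, 4]]
print(render(gray))  # ['#.##', '.###']
```

The reconstruction keeps each cell's gray level but not its exact dot arrangement, which is why this method is lossy at the pixel level.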


Entropy coding

All three region types (text, halftone, and generic) may use either arithmetic coding or Huffman coding. For arithmetic coding, JBIG2 uses the MQ coder, the same entropy coder employed by JPEG 2000.
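The MQ coder is a table-driven adaptive binary arithmetic coder whose full state table is too long to reproduce here. The sketch below illustrates only the context-modeling side of context-dependent arithmetic coding: each pixel's context is formed from a few already-coded neighboring pixels, an adaptive probability is kept per context, and the ideal code length (the sum of −log2 p) is computed. The four-pixel template and the count-based estimator are assumptions for illustration; JBIG2 generic regions use larger templates with adaptively positionable pixels, and the MQ coder estimates probabilities with a state machine rather than counts.

```python
import math

def pixel(img, x, y):
    """1 for a black pixel, 0 for white; pixels outside the image count as white."""
    if 0 <= y < len(img) and 0 <= x < len(img[0]):
        return 1 if img[y][x] == '#' else 0
    return 0

def context(img, x, y):
    """Hypothetical 4-pixel causal template: left, upper-left, above, upper-right.

    Only pixels the decoder has already reconstructed are used, so it can
    form the same context.
    """
    return (pixel(img, x - 1, y) << 3 | pixel(img, x - 1, y - 1) << 2
            | pixel(img, x, y - 1) << 1 | pixel(img, x + 1, y - 1))

def ideal_code_length(img, use_context=True):
    """Bits an ideal arithmetic coder would spend with adaptive per-context counts."""
    counts = {}  # context -> [white count, black count], both starting at 1
    bits = 0.0
    for y in range(len(img)):
        for x in range(len(img[0])):
            ctx = context(img, x, y) if use_context else 0
            c = counts.setdefault(ctx, [1, 1])
            b = pixel(img, x, y)
            bits -= math.log2(c[b] / (c[0] + c[1]))  # cost of coding this pixel
            c[b] += 1  # update after coding; the decoder can mirror this step
    return bits

stripes = ["#.#.#.#."] * 8  # 8x8 image of vertical stripes
print(round(ideal_code_length(stripes, use_context=False), 1))  # 66.7
print(round(ideal_code_length(stripes), 1))                     # 17.0
```

Without context the stripes look like random half-black noise, costing about one bit per pixel; with the neighbor template each context becomes almost perfectly predictable.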


Patents

Patents for JBIG2 are owned by IBM and Mitsubishi. Free licenses should be available on request. JBIG and JBIG2 patents are not the same.


Character substitution errors in scanned documents

Some implementations of JBIG2 using lossy compression can potentially alter the characters in documents that are scanned to PDF. Unlike some other algorithms, where compression artifacts such as blurring or mosquito noise are obvious, JBIG2's "pattern matching" merges similar-looking symbols. If the matching is implemented poorly, especially in low-resolution scans where characters are less clearly defined, similar characters may be erroneously swapped. However, as noted by computer scientist David Kriesel, who discovered such a problem as described below, "the error cause is not JBIG2 itself".

In 2013, various substitutions (e.g., replacing "6" with "8") were reported to occur on many Xerox WorkCentre photocopiers and printers. Numbers printed on scanned (but not OCR-ed) documents had potentially been altered. This was demonstrated on construction blueprints and some tables of numbers; the potential impact of such substitution errors in documents such as medical prescriptions was briefly mentioned. Kriesel and Xerox investigated the problem. Xerox subsequently acknowledged that this was a long-standing software defect and that its initial statements, suggesting that only non-factory settings could introduce the substitutions, were incorrect. No attempt was made to recall the affected devices or to mandate updates, although the defect was acknowledged to affect more than a dozen product families. However, in August 2013 a software patch was made available that, when installed, automatically disabled pattern matching. Documents scanned before the patch may still contain errors, making their veracity difficult to substantiate.

Following publicity about the potential for errors, authorities in some countries issued statements against the use of JBIG2. In Germany, the Federal Office for Information Security (BSI) has issued a technical guideline stating that JBIG2 encoding "MUST NOT be used" for "replacement scanning". In Switzerland, the Coordination Office for the Permanent Archiving of Electronic Documents (Koordinationsstelle für die dauerhafte Archivierung elektronischer Unterlagen) has recommended against the use of JBIG2 when creating PDF documents.
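The failure mode described in this section can be illustrated with two hypothetical 5×7 glyphs for "6" and "8" that differ in only 3 of 35 pixels. The sketch assumes an encoder that reuses an earlier glyph whenever the fraction of differing pixels is at most some tolerance and stores no refinement data. The tolerance values (10% and 0%) are arbitrary choices for illustration and do not model the settings or code of any actual device.

```python
# Hypothetical 5x7 glyphs; they differ in 3 of 35 pixels.
SIX = (".###.",
       "#....",
       "#....",
       "####.",
       "#...#",
       "#...#",
       ".###.")
EIGHT = (".###.",
         "#...#",
         "#...#",
         ".###.",
         "#...#",
         "#...#",
         ".###.")

def mismatch(a, b):
    """Number of differing pixels between two equal-sized glyph bitmaps."""
    return sum(ca != cb for ra, rb in zip(a, b) for ca, cb in zip(ra, rb))

def lossy_substitute(glyphs, tolerance):
    """Pattern matching without refinement: reuse the first earlier glyph whose
    fraction of differing pixels is at most `tolerance`."""
    dictionary, output = [], []
    for g in glyphs:
        size = len(g) * len(g[0])
        match = next((d for d in dictionary if mismatch(g, d) / size <= tolerance), None)
        if match is None:
            dictionary.append(g)
            match = g
        output.append(match)
    return output

print(mismatch(SIX, EIGHT))                              # 3
print(lossy_substitute([EIGHT, SIX], 0.10)[1] == EIGHT)  # True: the "6" is rendered as "8"
print(lossy_substitute([EIGHT, SIX], 0.0)[1] == SIX)     # True: no substitution
```

Because the output is a clean, plausible glyph rather than a blurred one, nothing in the decoded image signals that a substitution took place.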


Exploit

A vulnerability in the Xpdf implementation of JBIG2, reused in Apple's iOS mobile operating system, was exploited by the Pegasus spyware to carry out a zero-click attack on iPhones by constructing an emulated computer architecture inside a JBIG2 stream. Apple fixed this "FORCEDENTRY" vulnerability in iOS 14.8 in September 2021.


See also

* JBIG




External links


T.88: Lossy/lossless coding of bi-level images