PGP Word List
   HOME

TheInfoList



OR:

The PGP Word List ("
Pretty Good Privacy Pretty Good Privacy (PGP) is an encryption software, encryption program that provides cryptographic privacy and authentication for data communication. PGP is used for digital signature, signing, encrypting, and decrypting texts, Email, e-mail ...
word list", also called a biometric word list for reasons explained below) is a list of
word A word is a basic element of language that carries semantics, meaning, can be used on its own, and is uninterruptible. Despite the fact that language speakers often have an intuitive grasp of what a word is, there is no consensus among linguist ...
s for conveying data
bytes The byte is a unit of digital information that most commonly consists of eight bits. Historically, the byte was the number of bits used to encode a single character of text in a computer and for this reason it is the smallest addressable un ...
in a clear unambiguous way via a voice channel. They are analogous in purpose to the
NATO phonetic alphabet The International Radiotelephony Spelling Alphabet or simply the Radiotelephony Spelling Alphabet, commonly known as the NATO phonetic alphabet, is the most widely used set of clear-code words for communicating the letters of the Latin/Roman ...
, except that a longer list of words is used, each word corresponding to one of the 256 distinct numeric byte values.


History and structure

The PGP Word List was designed in 1995 by Patrick Juola, a
computational linguist Computational linguistics is an interdisciplinary field concerned with the computational modelling of natural language, as well as the study of appropriate computational approaches to linguistic questions. In general, computational linguistics ...
, and Philip Zimmermann, creator of PGP. The words were carefully chosen for their
phonetic Phonetics is a branch of linguistics that studies how humans produce and perceive sounds or, in the case of sign languages, the equivalent aspects of sign. Linguists who specialize in studying the physical properties of speech are phoneticians ...
distinctiveness, using
genetic algorithms In computer science and operations research, a genetic algorithm (GA) is a metaheuristic inspired by the process of natural selection that belongs to the larger class of evolutionary algorithms (EA). Genetic algorithms are commonly used to g ...
to select lists of words that had optimum separations in
phoneme A phoneme () is any set of similar Phone (phonetics), speech sounds that are perceptually regarded by the speakers of a language as a single basic sound—a smallest possible Phonetics, phonetic unit—that helps distinguish one word fr ...
space. The candidate word lists were randomly drawn from
Grady Ward William Grady Ward (born April 4, 1951) is an American software engineer, lexicographer, and Internet activist who has been prominent in the Scientology versus the Internet controversy. Biography Grady Ward created the Moby Project, an extens ...
's Moby Pronunciator list as raw material for the search, successively refined by the genetic algorithms. The automated search converged to an optimized solution in about 40 hours on a
DEC Alpha Alpha (original name Alpha AXP) is a 64-bit reduced instruction set computer (RISC) instruction set architecture (ISA) developed by Digital Equipment Corporation (DEC). Alpha was designed to replace 32-bit VAX complex instruction set computers ( ...
, a particularly fast machine in that era. The Zimmermann–Juola list was originally designed to be used in PGPfone, a secure VoIP application, to allow the two parties to verbally compare a short authentication string to detect a
man-in-the-middle attack In cryptography and computer security, a man-in-the-middle (MITM) attack, or on-path attack, is a cyberattack where the attacker secretly relays and possibly alters the communications between two parties who believe that they are directly communi ...
(MiTM). It was called a
biometric Biometrics are body measurements and calculations related to human characteristics and features. Biometric authentication (or realistic authentication) is used in computer science as a form of identification and access control. It is also used t ...
word list because the authentication depended on the two human users recognizing each other's distinct voices as they read and compared the words over the voice channel, binding the identity of the speaker with the words, which helped protect against the MiTM attack. The list can be used in many other situations where a biometric binding of identity is not needed, so calling it a biometric word list may be imprecise. Later, it was used in PGP to compare and verify PGP
public key Public-key cryptography, or asymmetric cryptography, is the field of cryptographic systems that use pairs of related keys. Each key pair consists of a public key and a corresponding private key. Key pairs are generated with cryptographic alg ...
fingerprints A fingerprint is an impression left by the friction ridges of a human finger. The recovery of partial fingerprints from a crime scene is an important method of forensic science. Moisture and grease on a finger result in fingerprints on surf ...
over a voice channel. This is known in PGP applications as the "biometric" representation. When it was applied to PGP, the list of words was further refined, with contributions by
Jon Callas Jon Callas is an American computer security expert, software engineer, user experience designer, and technologist who is the co-founder and former CTO of the global encrypted communications service Silent Circle.http://www.linkedin.com/in/joncal ...
. More recently, it has been used in Zfone and the
ZRTP ZRTP (composed of Z and Real-time Transport Protocol) is a cryptographic key-agreement protocol to negotiate the keys for encryption between two end points in a Voice over IP (VoIP) phone telephony call based on the Real-time Transport Protocol ...
protocol, the successor to PGPfone. The list is actually composed of two lists, each containing 256 phonetically distinct words, in which each word represents a different byte value between 0 and 255. Two lists are used because reading aloud long random sequences of human words usually risks three kinds of errors: 1) transposition of two consecutive words, 2) duplicate words, or 3) omitted words. To detect all three kinds of errors, the two lists are used alternately for the even-offset bytes and the odd-offset bytes in the byte sequence. Each byte value is actually represented by two different words, depending on whether that byte appears at an even or an odd offset from the beginning of the byte sequence. The two lists are readily distinguished by the number of
syllables A syllable is a basic unit of organization within a sequence of speech sounds, such as within a word, typically defined by linguists as a ''nucleus'' (most often a vowel) with optional sounds before or after that nucleus (''margins'', which are ...
; the even list has words of two syllables, the odd list has three. The two lists have a maximum word length of 9 and 11 letters, respectively. Using a two-list scheme was suggested by Zhahai Stewart.


Word lists

Here are the two lists of words as presented in the PGPfone Owner's Manual.


Examples

Each byte in a bytestring is encoded as a single word. A sequence of bytes is rendered in
network byte order '' Jonathan_Swift.html" ;"title="Gulliver's Travels'' by Jonathan Swift">Gulliver's Travels'' by Jonathan Swift, the novel from which the term was coined In computing, endianness is the order in which bytes within a word (data type), word of d ...
, from left to right. For example, the leftmost (i.e. byte 0) is considered "even" and is encoded using the PGP Even Word table. The next byte to the right (i.e. byte 1) is considered "odd" and is encoded using the PGP Odd Word table. This process repeats until all bytes are encoded. Thus, "E582" produces "topmost Istanbul", whereas "82E5" produces "miser travesty". A PGP public key fingerprint that displayed in
hexadecimal Hexadecimal (also known as base-16 or simply hex) is a Numeral system#Positional systems in detail, positional numeral system that represents numbers using a radix (base) of sixteen. Unlike the decimal system representing numbers using ten symbo ...
as :E582 94F2 E9A2 2748 6E8B :061B 31CC 528F D7FA 3F19 would display in PGP Words (the "biometric" fingerprint) as :topmost Istanbul Pluto vagabond treadmill Pacific brackish dictator goldfish Medusa :afflict bravado chatter revolver Dupont midsummer stopwatch whimsical cowbell bottomless The order of bytes in a bytestring depends on
endianness file:Gullivers_travels.jpg, ''Gulliver's Travels'' by Jonathan Swift, the novel from which the term was coined In computing, endianness is the order in which bytes within a word (data type), word of digital data are transmitted over a data comm ...
.


Other word lists for data

There are several other word lists for conveying data in a clear unambiguous way via a voice channel: * the
NATO phonetic alphabet The International Radiotelephony Spelling Alphabet or simply the Radiotelephony Spelling Alphabet, commonly known as the NATO phonetic alphabet, is the most widely used set of clear-code words for communicating the letters of the Latin/Roman ...
maps individual letters and digits to individual words * the S/KEY system maps 64 bit numbers to 6 short words of 1 to 4 characters each from a publicly accessible 2048-word dictionary. The same dictionary is used in RFC 1760 and RFC 2289. * the
Diceware Diceware is a method for creating passphrases, passwords, and other cryptographic variables using ordinary dice as a hardware random number generator. For each word in the passphrase, five rolls of a six-sided die are required. The numbers f ...
system maps five base-6 random digits (almost 13 bits of entropy) to a word from a dictionary of 7,776 distinct words. ** the
Electronic Frontier Foundation The Electronic Frontier Foundation (EFF) is an American international non-profit digital rights group based in San Francisco, California. It was founded in 1990 to promote Internet civil liberties. It provides funds for legal defense in court, ...
has published a set of improved word lists based on the same concept * FIPS 181: Automated Password Generator converts random numbers into somewhat pronounceable "words". * mnemonic encoding converts 32 bits of data into 3 words from a vocabulary of 1626 words.mnemonic encoding
an
updated code
/ref> * what3words encodes geographic coordinates in 3 dictionary words. * the BIP39 standard permits encoding a cryptographic key of fixed size (128 or 256 bits, usually the unencrypted master key of a
Cryptocurrency wallet A cryptocurrency wallet is a device, physical medium, program or an online service which stores the Public-key cryptography, public and/or private keys for cryptocurrency transactions. In addition to this basic function of storing the keys, a cr ...
) into a short sequence of readable words known as the
seed phrase A cryptocurrency wallet is a device, physical medium, program or an online service which stores the public and/or private keys for cryptocurrency transactions. In addition to this basic function of storing the keys, a cryptocurrency wallet more ...
, for the purpose of storing the key offline. This is used in cryptocurrencies such as
Bitcoin Bitcoin (abbreviation: BTC; Currency symbol, sign: ₿) is the first Decentralized application, decentralized cryptocurrency. Based on a free-market ideology, bitcoin was invented in 2008 when an unknown entity published a white paper under ...
or Monero. * Like the PGP word list, th
Bytewords
standard maps each possible byte to a word. There is only one list, rather than two. The words are uniformly four letters long and can be uniquely identified by their first and last letters


References

:''This article incorporates material that is copyrighted by PGP Corporation and has been licensed under the GNU Free Documentation License. (per Jon Callas, CTO, CSO PGP Corporation, 4-Jan-2007)'' {{reflist Spelling alphabets Binary-to-text encoding formats Military communications Cryptography OpenPGP