In philology and linguistics, decipherment is the discovery of the meaning of the symbols found in extinct languages and/or alphabets. Decipherment is possible with respect to both languages and scripts. One can also try to decipher how spoken languages that no longer exist were once pronounced, or how living languages were pronounced in earlier eras. Notable examples include the decipherment of ancient Egyptian scripts and the decipherment of cuneiform. A notable decipherment in recent years is that of the Linear Elamite script. Today, at least a dozen languages remain undeciphered. Historically speaking, decipherments do not come suddenly through single individuals who "crack" ancient scripts. Instead, they emerge from the incremental progress made by a broader community of researchers.

Decipherment should not be confused with cryptanalysis, which aims to decipher special written codes or ciphers used in intentionally concealed secret communication (especially during war). It should also not be confused with determining the meaning of ambiguous text in a known language (interpretation).


Categories

Gelb and Whiting distinguish four situations for an undeciphered language and indicate how difficult decipherment will be in each of them:
* Type O: known writing and known language. Although decipherment in this case is trivial, useful information can be gleaned when a known language is written in an alphabet other than the one it is commonly written in. Studying the Phoenician or Sumerian languages as written in the Greek alphabet reveals information about pronunciation and vocalization that cannot be obtained from these languages in their normal writing systems.
* Type I: unknown writing and known language. Scripts deciphered in this category include Phoenician, Ugaritic, Cypriot, and Linear B. In this situation, alphabetic systems are the easiest to decipher, followed by syllabic systems, with logo-syllabic systems the most difficult.
* Type II: known writing and unknown language. An example is Linear A. Strictly speaking, this situation is not one of decipherment but of linguistic analysis. Decipherment in this category is considered extremely difficult to achieve on the basis of internal information only.
* Type III: unknown writing and unknown language. Examples include the Archanes script and the Archanes formula, Phaistos disk, Cretan hieroglyphs, and Cypro-Minoan syllabary. When this situation occurs in an isolated culture and no outside information is available, decipherment is typically considered impossible.
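The classification above depends on only two yes/no questions, so it can be written as a small lookup. The following Python sketch is purely illustrative; the names `Situation` and `classify` are hypothetical and are not taken from Gelb and Whiting.

```python
from enum import Enum


class Situation(Enum):
    """The four situations, labelled as in the list above."""
    TYPE_O = "known writing, known language"
    TYPE_I = "unknown writing, known language"
    TYPE_II = "known writing, unknown language"
    TYPE_III = "unknown writing, unknown language"


def classify(writing_known: bool, language_known: bool) -> Situation:
    """Map the two yes/no questions onto one of the four situations."""
    if writing_known and language_known:
        return Situation.TYPE_O
    if language_known:          # writing unknown, language known
        return Situation.TYPE_I
    if writing_known:           # writing known, language unknown
        return Situation.TYPE_II
    return Situation.TYPE_III   # both unknown


# Linear A is given above as an example of known writing, unknown language.
print(classify(writing_known=True, language_known=False).name)  # TYPE_II
```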


Methods

There is no single recipe or linear method for decipherment. Instead, philologists and linguists rely on a set of established heuristic devices. Broadly, a decipherer needs familiarity with the texts in which the script or language occurs, access to accurate drawings or photographs of these texts, information about their relative chronology, and background information on where the texts were found (their geography, or an archaeological context such as a funerary monument). The methods can be divided into approaches that use external information and approaches that use internal information.


External information

Many successful decipherments have proceeded from the discovery of external information. A common example is the use of multilingual inscriptions, such as the Rosetta Stone (with the same text in three scripts: Demotic, hieroglyphic, and Greek), which enabled the decipherment of Egyptian hieroglyphic. In principle, a multilingual text may be insufficient for a decipherment, as translation is not a linear and reversible process but an encoding of the message in a different symbolic system. Translating a text from one language into a second, and then from the second back into the first, rarely reproduces the original wording exactly. Likewise, unless the multilingual text contains a significant number of words, only limited information can be gleaned from it.
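One way a multilingual text can be exploited is through proper names, which tend to be spelled out sound by sound rather than translated. The Python sketch below is a toy illustration under strong assumptions: the names have already been located in both versions, the unknown script writes exactly one sign per letter of the known spelling, and the sign identifiers (`S1`, `S2`, ...) and names are invented. Real bilingual work is far less tidy. The function `propose_sign_values` is hypothetical.

```python
def propose_sign_values(name_pairs):
    """Propose sign values from names that occur in both versions of a text.

    name_pairs: list of (unknown_signs, known_spelling), where unknown_signs
    is a list of sign identifiers and known_spelling is a string.
    Returns (values, conflicts): consistent sign-to-letter proposals, and the
    set of signs that received contradictory proposals.
    """
    proposals = {}
    conflicts = set()
    for signs, spelling in name_pairs:
        if len(signs) != len(spelling):
            continue  # toy assumption: only one-to-one alignments are used
        for sign, letter in zip(signs, spelling):
            if sign in proposals and proposals[sign] != letter:
                conflicts.add(sign)
            proposals.setdefault(sign, letter)
    values = {s: v for s, v in proposals.items() if s not in conflicts}
    return values, conflicts


# Invented data: two names sharing three signs.
pairs = [
    (["S1", "S2", "S3", "S4"], "tola"),
    (["S3", "S1", "S4"], "lta"),
]
values, conflicts = propose_sign_values(pairs)
print(values)     # {'S1': 't', 'S2': 'o', 'S3': 'l', 'S4': 'a'}
print(conflicts)  # set()
```

Signs that appear in several names with the same proposed value gain support, while contradictory proposals are set aside rather than guessed. With only a handful of names, very few signs receive any value at all.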


Internal information

Internal approaches proceed in several steps. One must first establish that the material under study is real writing, as opposed to a grouping of pictorial representations or a modern forgery without further meaning. This is commonly approached with methods from the field of grammatology. Before attempting to decipher meaning, one can then determine the number of distinct graphemes, the direction of writing (left to right, right to left, top to bottom, etc.), and whether individual words are segmented in writing (for example by a space or a special mark). The grapheme count indicates whether the writing system is alphabetic, syllabic, or logo-syllabic, because these types of system typically do not overlap in the number of graphemes they use.

If a repetitive schematic arrangement can be identified, this can help in decipherment. For example, if the last line of a text contains a small number, it can reasonably be guessed to give the date, in which case one of the words means "year" and, sometimes, a royal name also appears. Another case is a text that contains many small numbers, followed by a word, followed by a larger number; here, the word likely means "total" or "sum".

After exhausting the information that can be inferred from probable content, one must turn to the systematic application of statistical tools. These include methods concerning the frequency of each symbol, the order in which symbols typically appear, whether some symbols appear at the beginning or end of words, etc. In some situations, orthographic features of a writing system make it difficult if not impossible to decipher specific features (especially without outside information), such as when an alphabet does not express double consonants. Additional, more complex methods also exist. Eventually, applying such statistical methods by hand becomes exceedingly laborious, at which point computers may be used to apply them automatically.
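The counting steps described above are straightforward to automate. The following Python sketch is a minimal illustration, assuming the texts have already been transcribed into sign identifiers and segmented into words. The function names and the cut-off values in `guess_system_type` are assumptions made for this example, not established thresholds, and a small corpus will undercount the true sign inventory.

```python
from collections import Counter


def sign_statistics(words):
    """Count overall, word-initial, word-final and adjacent-pair frequencies.

    words: list of words, each a list of sign identifiers.
    """
    freq = Counter(sign for word in words for sign in word)
    initial = Counter(word[0] for word in words if word)
    final = Counter(word[-1] for word in words if word)
    bigrams = Counter(pair for word in words for pair in zip(word, word[1:]))
    return freq, initial, final, bigrams


def guess_system_type(n_signs, alphabet_max=40, syllabary_max=120):
    """Rough guess from inventory size; both cut-offs are illustrative."""
    if n_signs <= alphabet_max:
        return "alphabetic"
    if n_signs <= syllabary_max:
        return "syllabic"
    return "logo-syllabic"


def total_word_candidates(tokens):
    """Find words that follow several numbers and precede their sum.

    tokens: list mixing ints (numerals already read) and str (unread words).
    """
    candidates = []
    for i in range(1, len(tokens) - 1):
        word, after = tokens[i], tokens[i + 1]
        if not isinstance(word, str) or not isinstance(after, int):
            continue
        earlier = [t for t in tokens[:i] if isinstance(t, int)]
        if len(earlier) >= 2 and sum(earlier) == after:
            candidates.append(word)
    return candidates


# Invented toy corpus of three words over three signs.
words = [["A", "B", "C"], ["A", "C"], ["B", "C", "A"]]
freq, initial, final, bigrams = sign_statistics(words)
print(freq["A"], initial["A"], final["C"], bigrams[("B", "C")])  # 3 2 2 2
print(guess_system_type(len(freq)))  # alphabetic

# Invented accounting line: three entries, then a word and the sum 10.
print(total_word_candidates(["W1", 3, "W2", 5, "W3", 2, "W9", 10]))  # ['W9']
```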


Computational approaches

Computational approaches to the decipherment of unknown languages began to appear in the late 1990s. Typically, two types of computational approach are used in language decipherment: approaches meant to produce translations into known languages, and approaches meant to detect new information that might enable future efforts at translation. The second type is more common, and includes tasks such as the detection of cognates or related words, discovery of the closest known language, and word alignment.
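As a rough illustration of cognate detection, the Python sketch below ranks words of a known language by normalized edit distance to each word of the undeciphered text. This is a deliberately simple stand-in, not the method of any published system. It assumes the unknown words have already been transliterated into the same symbols as the known lexicon, which is itself a strong assumption. The word lists, the function names, and the `max_ratio` threshold are invented for the example.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences (row-by-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cognate_candidates(unknown_words, known_lexicon, max_ratio=0.34):
    """For each unknown word, list known words within max_ratio normalized distance."""
    result = {}
    for word in unknown_words:
        scored = []
        for known in known_lexicon:
            ratio = edit_distance(word, known) / max(len(word), len(known), 1)
            if ratio <= max_ratio:
                scored.append((ratio, known))
        result[word] = [known for _, known in sorted(scored)]
    return result


print(edit_distance("kitten", "sitting"))  # 3
# Invented word forms, for illustration only.
print(cognate_candidates(["tarun"], ["taron", "mirek", "tarn"]))
# {'tarun': ['tarn', 'taron']}
```

Surface similarity of this kind only proposes candidates. Chance resemblances and regular sound changes between related languages both defeat a plain edit distance.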


Artificial intelligence

In recent years, there has been a growing emphasis on methods that use artificial intelligence for the decipherment of lost languages, especially natural language processing (NLP) methods. Proof-of-concept methods have independently re-deciphered Ugaritic and Linear B using data from similar languages, in this case Hebrew and Ancient Greek respectively.


Deciphering pronunciation

Related to efforts to decipher the meaning of languages and alphabets are efforts to determine how texts in extinct writing systems, or in older versions of contemporary writing systems (such as English in the 1600s), were pronounced. Several methods and criteria have been developed for this purpose. Important criteria include (1) rhymes and the testimony of poetry; (2) evidence from occasional spellings and misspellings; (3) interpretations of material in one language by authors writing in foreign languages; (4) information obtained from related languages; and (5) grammatical changes in spelling over time.

For example, analysis of poetry focuses on wordplay or literary techniques that depend on words sounding alike. Shakespeare's play ''Romeo and Juliet'' contains wordplay that relies on a similar sound between the words "soul" and "soles", allowing confidence that the similar pronunciation of the two words today also existed in Shakespeare's time. Another common source of information is rhyme in earlier texts, such as when consecutive lines of poetry end in the same or a similar sound. This method has limitations, however: texts may use rhymes that rely on visual rather than auditory similarity between words (such as 'love' and 'remove'), and rhymes can be imperfect.

A further source of information is explicit description of pronunciation in earlier texts. The ''Grammatica Anglicana'', for example, comments on the letter o: "In the long time it naturally soundeth sharp, and high; as in chósen, hósen, hóly, fólly ... In the short time more flat, and a kin to u; as còsen, dòsen, mòther, bròther, lòve, pròve". Another example comes from detailed comments on the pronunciation of Sanskrit in the surviving works of Sanskrit grammarians.
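Rhyme evidence can be pooled mechanically: if one word rhymes with a second and the second with a third, all three are candidates for having shared a sound. The Python sketch below groups words into such classes with a simple union-find structure. It is only an illustration; the function name `rhyme_classes` and the placeholder words are invented, and because of visual and imperfect rhymes each resulting class is a hypothesis to be checked against other evidence, not a conclusion.

```python
def rhyme_classes(rhyme_pairs):
    """Group words into classes connected by attested rhymes.

    rhyme_pairs: list of (word_a, word_b) taken from rhyming line ends.
    Returns a list of alphabetically sorted word lists, one per class.
    """
    parent = {}

    def find(word):
        # Follow parent links to the class representative, shortening paths.
        parent.setdefault(word, word)
        while parent[word] != word:
            parent[word] = parent[parent[word]]
            word = parent[word]
        return word

    for a, b in rhyme_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two classes

    groups = {}
    for word in list(parent):
        groups.setdefault(find(word), set()).add(word)
    return sorted((sorted(group) for group in groups.values()),
                  key=lambda group: group[0])


# Placeholder words standing in for line-final words of a poem.
pairs = [("w1", "w2"), ("w2", "w3"), ("w4", "w5")]
print(rhyme_classes(pairs))  # [['w1', 'w2', 'w3'], ['w4', 'w5']]
```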


Challenges

Many challenges arise in the decipherment of languages, including the following situations:
* When it is not known which known language is closest to the one under study.
* When the words in the script are not clearly segmented, as in some Iberian languages.
* When the writing system is not known. Specifically, if there is little certainty about the number of graphemes in a writing system, it cannot be determined whether that system is an alphabet, a syllabary, a logosyllabary, or something else.
* When the reading direction is not known. For example, it may not be clear whether a writing system is meant to be read from left to right or from right to left.
* When it is not known whether a script uses punctuation or spaces between words.
* When the language of a script subject to decipherment efforts is not known.
* When only a small dataset is available from which to learn about the properties of a script. This can lead to issues such as an incomplete vocabulary being known for the script.
* When the typical order of subjects, objects, and verbs is not known.
* When it is not known whether or how certain words can change their form.
* When it is not known whether multiple symbols are used to represent the same sound, syllable, word, concept, or idea (allographs).
* When it is not clear how the penmanship or style of one scribe relates to that of another scribe working on the same text (the same letters or words might be written in ways that look different), in which case it is difficult to correlate information across multiple examples of the writing system.
* When it is not known whether certain words change their meaning depending on the context in which they appear (homonyms).
* When the context of discovery of a piece of writing is not known. Information about the location a writing system came from can provide valuable evidence about its relationship to known languages.
* When adequate digital datasets for documented writing systems are not available, limiting the ability to use computational methods for decipherment.
* When sufficient hardware resources, such as high-performance computing, are not available (which might be necessary for more energy-intensive computational methods).


Relationship to cryptanalysis

Decipherment overlaps with another technical field known as cryptanalysis, which aims to decipher writings used in secret communication, known as ciphertext. A famous case was the cryptanalysis of the Enigma during World War II. Many other ciphers from past wars have only recently been cracked. Unlike in language decipherment, however, actors using ciphertext intentionally lay obstacles to prevent outsiders from uncovering the meaning of the communication.


History

Interest in ancient scripts and dead languages began to arise by the Renaissance, if not earlier. Extensive information about these scripts was collected in the 16th and 17th centuries, and a typology of writing was established in the 17th century. The first serious decipherments, however, did not take place until the 18th century. In 1754, Swinton and Barthélemy independently deciphered the Aramaic script as represented in Palmyrene inscriptions from the 3rd century AD. In 1787, Silvestre de Sacy deciphered the Sasanian script, which was used in Ancient Persia to write the Middle Iranian language of the Sasanian Empire. Both decipherments relied on bilingual texts in which Greek was the second script.

It was also in the 18th century that the methodological framework for deciphering scripts and languages began to be established. For example, in 1714, Leibniz proposed that parallel content in bilingual inscriptions could be identified by correlating the positions of personal names in both inscriptions. By the 19th century, the prerequisites for decipherment were becoming widely available. These included extensive knowledge about the scripts themselves, adequate editions of known texts in each script, philological skills, and the ability to reconstruct linguistic forms from the limited available evidence. The 19th century saw two major successes in decipherment: that of Egyptian hieroglyphic and that of cuneiform.


Notable decipherers


See also


Deciphered scripts

* Cuneiform
* Egyptian hieroglyphs
* Kharoshthi
* Linear B
* Mayan
* Staveless Runes
* Cypriot Syllabary


Undeciphered scripts

* Rongorongo (see Decipherment of rongorongo)
* Indus script
* Cretan hieroglyphs
* Byblos syllabary
* Linear A
* Cypro-Minoan syllabary
* Espanca
* Numidian language


Undeciphered texts

* Phaistos Disc
* Rohonc Codex
* Voynich Manuscript

