Secure voice (alternatively secure speech or ciphony) is a term in

cryptography Cryptography, or cryptology (from grc, , translit=kryptós "hidden, secret"; and ''graphein'', "to write", or ''-logia'', "study", respectively), is the practice and study of techniques for secure communication in the presence of adver ...

for the encryption of voice communication over a range of communication types such as radio,

telephone A telephone is a telecommunications device that permits two or more users to conduct a conversation when they are too far apart to be easily heard directly. A telephone converts sound, typically and most efficiently the human voice, into e ...

or IP.

History

The implementation of voice encryption dates back to

World War II World War II or the Second World War, often abbreviated as WWII or WW2, was a world war that lasted from 1939 to 1945. It involved the vast majority of the world's countries—including all of the great powers—forming two opposing ...

when secure communication was paramount to the US armed forces. During that time, noise was simply added to a voice signal to prevent enemies from listening to the conversations. Noise was added by playing a record of noise in sync with the voice signal and when the voice signal reached the receiver, the noise signal was subtracted out, leaving the original voice signal. In order to subtract out the noise, the receiver need to have exactly the same noise signal and the noise records were only made in pairs; one for the transmitter and one for the receiver. Having only two copies of records made it impossible for the wrong receiver to decrypt the signal. To implement the system, the army contracted

Bell Laboratories Nokia Bell Labs, originally named Bell Telephone Laboratories (1925–1984), then AT&T Bell Laboratories (1984–1996) and Bell Labs Innovations (1996–2007), is an American industrial research and scientific development company owned by mult ...

and they developed a system called SIGSALY. With SIGSALY, ten channels were used to sample the

voice frequency A voice frequency (VF) or voice band is the range of audio frequencies used for the transmission of speech. Frequency band In telephony, the usable voice frequency band ranges from approximately 300 to 3400 Hz. It is for this reason tha ...

spectrum from 250 Hz to 3 kHz and two channels were allocated to sample voice pitch and background hiss. In the time of SIGSALY, the transistor had not been developed and the digital sampling was done by circuits using the model 2051

Thyratron A thyratron is a type of gas-filled tube used as a high-power electrical switch and controlled rectifier. Thyratrons can handle much greater currents than similar hard-vacuum tubes. Electron multiplication occurs when the gas becomes ionized, p ...

vacuum tube. Each SIGSALY terminal used 40 racks of equipment weighing 55 tons and filled a large room. This equipment included radio transmitters and receivers and large phonograph turntables. The voice was keyed to two vinyl phonograph records that contained a

Frequency Shift Keying Frequency-shift keying (FSK) is a frequency modulation scheme in which digital information is transmitted through discrete frequency changes of a carrier signal. The technology is used for communication systems such as telemetry, weather ball ...

(FSK) audio tone. The records were played on large precise turntables in sync with the voice transmission. From the introduction of voice encryption to today, encryption techniques have evolved drastically. Digital technology has effectively replaced old analog methods of voice encryption and by using complex algorithms, voice encryption has become much more secure and efficient. One relatively modern voice encryption method is Sub-band coding. With Sub-band Coding, the voice signal is split into multiple frequency bands, using multiple bandpass filters that cover specific frequency ranges of interest. The output signals from the bandpass filters are then lowpass translated to reduce the bandwidth, which reduces the sampling rate. The lowpass signals are then quantized and encoded using special techniques like,

pulse-code modulation Pulse-code modulation (PCM) is a method used to digitally represent sampled analog signals. It is the standard form of digital audio in computers, compact discs, digital telephony and other digital audio applications. In a PCM Stream (comp ...

(PCM). After the encoding stage, the signals are multiplexed and sent out along the communication network. When the signal reaches the receiver, the inverse operations are applied to the signal to get it back to its original state. A speech scrambling system was developed at

in the 1970s by

Subhash Kak Subhash Kak is an Indian-American computer scientist and historical revisionist. He is the Regents Professor of Computer Science Department at Oklahoma State University–Stillwater, an honorary visiting professor of engineering at Jawaharlal N ...

and

Nikil Jayant Nikil S. Jayant (1945 -- ) is an Indian-American communications engineer. He was a prominent long-term researcher at Bell Laboratories and subsequently a professor at Georgia Institute of Technology. He received his Ph.D. in Electrical Communicatio ...

. In this system permutation matrices were used to scramble coded representations (such as

and variants) of the speech data.

Motorola Motorola, Inc. () was an American multinational telecommunications company based in Schaumburg, Illinois, United States. After having lost $4.3 billion from 2007 to 2009, the company split into two independent public companies, Motorol ...

developed a voice encryption system called Digital Voice Protection (DVP) as part of their first generation of voice encryption techniques. DVP uses a self-synchronizing encryption technique known as cipher feedback (CFB). The extremely high number of possible keys associated with the early DVP algorithm, makes the algorithm very robust and gives a high level of security. As with other symmetric keyed encryption systems, the encryption key is required to decrypt the signal with a special decryption algorithm.

Digital

A digital secure voice usually includes two components, a digitizer to convert between speech and digital signals and an

encryption In cryptography, encryption is the process of encoding information. This process converts the original representation of the information, known as plaintext, into an alternative form known as ciphertext. Ideally, only authorized parties can de ...

system to provide confidentiality. It is difficult in practice to send the encrypted signal over the same

voiceband A voice frequency (VF) or voice band is the range of audio frequencies used for the transmission of speech. Frequency band In telephony, the usable voice frequency band ranges from approximately 300 to 3400 Hz. It is for this reason th ...

communication circuits used to transmit unencrypted voice, e.g. analog

telephone line A telephone line or telephone circuit (or just line or circuit industrywide) is a single-user circuit on a telephone communication system. It is designed to reproduce speech of a quality that is understandable. It is the physical wire or ot ...

s or

mobile radio Mobile radio or mobiles refer to wireless communications systems and devices which are based on radio frequencies(using commonly UHF or VHF frequencies), and where the path of communications is movable on either end. There are a variety of view ...

s, due to bandwidth expansion. This has led to the use of Voice Coders (

vocoder A vocoder (, a portmanteau of ''voice'' and ''encoder'') is a category of speech coding that analyzes and synthesizes the human voice signal for audio data compression, multiplexing, voice encryption or voice transformation. The vocoder was ...

s) to achieve tight bandwidth compression of the speech signals. NSA's

STU-III STU-III (Secure Telephone Unit - third generation) is a family of secure telephones introduced in 1987 by the NSA for use by the United States government, its contractors, and its allies. STU-III desk units look much like typical office telephone ...

, KY-57 and SCIP are examples of systems that operate over existing voice circuits. The STE system, by contrast, requires wide bandwidth

ISDN Integrated Services Digital Network (ISDN) is a set of communication standards for simultaneous digital transmission of voice, video, data, and other network services over the digitalised circuits of the public switched telephone network. Work ...

lines for its normal mode of operation. For encrypting

GSM The Global System for Mobile Communications (GSM) is a standard developed by the European Telecommunications Standards Institute (ETSI) to describe the protocols for second-generation ( 2G) digital cellular networks used by mobile devices such ...

and

VoIP Voice over Internet Protocol (VoIP), also called IP telephony, is a method and group of technologies for the delivery of voice communications and multimedia sessions over Internet Protocol (IP) networks, such as the Internet. The terms Internet t ...

, which are natively digital, the standard protocol

ZRTP ZRTP (composed of Z and Real-time Transport Protocol) is a cryptographic key-agreement protocol to negotiate the keys for encryption between two end points in a Voice over IP (VoIP) phone telephony call based on the Real-time Transport Protocol. ...

could be used as an

end-to-end encryption End-to-end encryption (E2EE) is a system of communication where only the communicating users can read the messages. In principle, it prevents potential eavesdroppers – including telecommunications service providers, telecom providers, Internet ...

technology. Secure voice's robustness greatly benefits from having the voice data compressed into very low bit-rates by special component called

speech coding Speech coding is an application of data compression of digital audio signals containing speech. Speech coding uses speech-specific parameter estimation using audio signal processing techniques to model the speech signal, combined with generic d ...

, voice compression or voice coder (also known as

). The old secure voice compression standards include (

CVSD Continuously variable slope delta modulation (CVSD or CVSDM) is a voice coding method. It is a delta modulation with variable step size (i.e., special case of adaptive delta modulation), first proposed by Greefkes and Riemens in 1970. CVSD encode ...

CELP Code-excited linear prediction (CELP) is a linear predictive speech coding algorithm originally proposed by Manfred R. Schroeder and Bishnu S. Atal in 1985. At the time, it provided significantly better quality than existing low bit-rate algori ...

LPC-10e FIPS 137, originally issued as FED-STD-1015, is a secure telephony speech encoding standard for Linear Predictive Coding vocoder developed by the United States Department of Defense and finished on November 28, 1984. It was based on the earlier STA ...

and

MELP Mixed-excitation linear prediction (MELP) is a United States Department of Defense speech coding standard used mainly in military applications and satellite communications, secure voice, and secure radio devices. Its standardization and later devel ...

, where the latest standard is the state of the art MELPe algorithm.

Digital methods using voice compression: MELP or MELPe

The

MELPe Mixed-excitation linear prediction (MELP) is a United States Department of Defense speech coding standard used mainly in military applications and satellite communications, secure voice, and secure radio devices. Its standardization and later deve ...

or enhanced-

(Mixed Excitation Linear Prediction) is a

United States Department of Defense The United States Department of Defense (DoD, USDOD or DOD) is an executive branch department of the federal government charged with coordinating and supervising all agencies and functions of the government directly related to national sec ...

speech coding standard used mainly in military applications and satellite communications, secure voice, and secure radio devices. Its development was led and supported by NSA, and NATO. The US government's MELPe secure voice standard is also known as MIL-STD-3005, and the NATO's

secure voice standard is also known as

STANAG In NATO, a standardization agreement (STANAG, redundantly: STANAG agreement) defines processes, procedures, terms, and conditions for common military or technical procedures or equipment between the member countries of the alliance. Each NATO st ...

-4591. The initial MELP was invented by Alan McCree around 1995. That initial speech coder was standardized in 1997 and was known as MIL-STD-3005. It surpassed other candidate vocoders in the US DoD competition, including: (a) Frequency Selective Harmonic Coder (FSHC), (b) Advanced Multi-Band Excitation (AMBE), (c) Enhanced Multiband Excitation (EMBE), (d) Sinusoid Transform Coder (STC), and (e) Subband LPC Coder (SBC). Due to its lower complexity than Waveform Interpolative (WI) coder, the MELP vocoder won the DoD competition and was selected for

MIL-STD A United States defense standard, often called a military standard, "MIL-STD", "MIL-SPEC", or (informally) "MilSpecs", is used to help achieve standardization objectives by the U.S. Department of Defense. Standardization is beneficial in achievin ...

-3005. Between 1998 and 2001, a new MELP-based vocoder was created at half the rate (i.e. 1200 bit/s) and substantial enhancements were added to the MIL-STD-3005 by SignalCom (later acquired by

Microsoft Microsoft Corporation is an American multinational technology corporation producing computer software, consumer electronics, personal computers, and related services headquartered at the Microsoft Redmond campus located in Redmond, Washin ...

AT&T Corporation AT&T Corporation, originally the American Telephone and Telegraph Company, is the subsidiary of AT&T Inc. that provides voice, video, data, and Internet telecommunications and professional services to businesses, consumers, and government agen ...

, and Compandent which included (a) additional new vocoder at half the rate (i.e. 1200 bit/s), (b) substantially improved encoding (analysis), (c) substantially improved decoding (synthesis), (d) Noise-Preprocessing for removing background noise, (e) transcoding between the 2400 bit/s and 1200 bit/s bitstreams, and (f) new postfilter. This fairly significant development was aimed to create a new coder at half the rate and have it interoperable with the old MELP standard. This enhanced-MELP (also known as MELPe) was adopted as the new MIL-STD-3005 in 2001 in form of annexes and supplements made to the original MIL-STD-3005, enabling the same quality as the old 2400 bit/s MELP's at half the rate. One of the greatest advantages of the new 2400 bit/s MELPe is that it shares the same bit format as MELP, and hence can interoperate with legacy MELP systems, but would deliver better quality at both ends. MELPe provides much better quality than all older military standards, especially in noisy environments such as battlefield and vehicles and aircraft. In 2002, following extensive competition and testing, the 2400 and 1200 bit/s US DoD MELPe was adopted also as

NATO The North Atlantic Treaty Organization (NATO, ; french: Organisation du traité de l'Atlantique nord, ), also called the North Atlantic Alliance, is an intergovernmental military alliance between 30 member states – 28 European and two No ...

standard, known as

-4591. As part of NATO testing for new NATO standard, MELPe was tested against other candidates such as

France France (), officially the French Republic ( ), is a country primarily located in Western Europe. It also comprises of overseas regions and territories in the Americas and the Atlantic, Pacific and Indian Oceans. Its metropolitan area ...

's HSX (Harmonic Stochastic eXcitation) and

Turkey Turkey ( tr, Türkiye ), officially the Republic of Türkiye ( tr, Türkiye Cumhuriyeti, links=no ), is a transcontinental country located mainly on the Anatolian Peninsula in Western Asia, with a small portion on the Balkan Peninsula in ...

's SB-LPC (Split-Band Linear Predictive Coding), as well as the old secure voice standards such as FS1015

(2.4 kbit/s),

FS1016 FS-1016 (also called FED-STD-1016) is a deprecated secure telephony speech encoding standard for Code-excited linear prediction (CELP) developed by the United States Department of Defense and finalized February 14, 1991. Unlike the vocoder used ...

(4.8 kbit/s) and

(16 kbit/s). Subsequently, the MELPe won also the NATO competition, surpassing the quality of all other candidates as well as the quality of all old secure voice standards (CVSD,

and

). The

competition concluded that MELPe substantially improved performance (in terms of speech quality, intelligibility, and noise immunity), while reducing throughput requirements. The NATO testing also included interoperability tests, used over 200 hours of speech data, and was conducted by 3 test laboratories worldwide. Compandent Inc, as a part of MELPe-based projects performed for NSA and

, provided NSA and NATO with special test-bed platform known as MELCODER device that provided the golden reference for real-time implementation of MELPe. The low-cost FLEXI-232 Data Terminal Equipment (DTE) made by Compandent, which are based on the MELCODER golden reference, are very popular and widely used for evaluating and testing MELPe in real-time, various channels & networks, and field conditions. The

competition concluded that

substantially improved performance (in terms of speech quality, intelligibility, and noise immunity), while reducing throughput requirements. The NATO testing also included interoperability tests, used over 200 hours of speech data, and was conducted by 3 test laboratories worldwide. In 2005, a new 600 bit/s rate MELPe variation by

Thales Group Thales Group () is a French multinational company that designs, develops and manufactures electrical systems as well as devices and equipment for the aerospace, defence, transportation and security sectors. The company is headquartered in Paris' ...

(

) was added (without extensive competition and testing as performed for the 2400/1200 bit/s MELPe) to the NATO standard STANAG-4591, and there are more advanced efforts to lower the bitrates to 300 bit/s and even 150 bit/s. In 2010 Lincoln Labs., Compandent, BBN, and General Dynamics also developed for DARPA a 300 bit/s MELP device .Alan McCree, “A scalable phonetic vocoder framework using joint predictive vector quantization of MELP parameters,” in Proc. IEEE Int. Conf. Acoust., Speech, Signal Processing, 2006, pp. I 705–708, Toulouse, France Its quality was better than the 600 bit/s MELPe, but its delay was longer.

Other

Aleksandr Solzhenitsyn Aleksandr Isayevich Solzhenitsyn. (11 December 1918 – 3 August 2008) was a Russian novelist. One of the most famous Soviet dissidents, Solzhenitsyn was an outspoken critic of communism and helped to raise global awareness of political repres ...

's novel First Circle the character Volodin's recorded phone call is traced to him since it is not properly encrypted. Its decipherment makes use of spectral analysis.

References

{{Cryptography navbox , machines Cryptography Secure communication Speech codecs

History

Digital

Digital methods using voice compression: MELP or MELPe

Other

See also

References