{{More citations needed, date=May 2025 The Speex project is an attempt to create a

free software Free software, libre software, libreware sometimes known as freedom-respecting software is computer software distributed open-source license, under terms that allow users to run the software for any purpose as well as to study, change, distribut ...

speech codec, unencumbered by

patent A patent is a type of intellectual property that gives its owner the legal right to exclude others from making, using, or selling an invention for a limited period of time in exchange for publishing an sufficiency of disclosure, enabling discl ...

restrictions. Speex is licensed under the

BSD License BSD licenses are a family of permissive free software licenses, imposing minimal restrictions on the use and distribution of covered software. This is in contrast to copyleft licenses, which have share-alike requirements. The original BSD lic ...

and is used with the Xiph.org Foundation's Ogg container format. The Speex coder uses the Ogg

bitstream A bitstream (or bit stream), also known as binary sequence, is a sequence of bits. A bytestream is a sequence of bytes. Typically, each byte is an 8-bit quantity, and so the term octet stream is sometimes used interchangeably. An octet may ...

format, and the Speex designers see their project as complementary to the Vorbis general-purpose audio compression project.

Description

Unlike many other speech codecs, Speex is not targeted at cell phones but rather at

voice over IP Voice over Internet Protocol (VoIP), also known as IP telephony, is a set of technologies used primarily for voice communication sessions over Internet Protocol (IP) networks, such as the Internet. VoIP enables voice calls to be transmitted as ...

(VoIP) and file-based compression. The design goals have been to make a codec that would allow both very good quality speech and low bit rate, which led to the development of a codec with multiple bit rates. Very good quality also meant the support of

wideband In communications, a system is wideband when the message bandwidth significantly exceeds the coherence bandwidth of the channel. Some communication links have such a high data rate that they are forced to use a wide bandwidth; other links ma ...

(16 kHz

sampling rate In signal processing, sampling is the reduction of a continuous-time signal to a discrete-time signal. A common example is the conversion of a sound wave to a sequence of "samples". A sample is a value of the signal at a point in time and/or s ...

) in addition to narrowband (telephone quality, 8 kHz sampling rate). Designing for VoIP instead of cell phone use means that Speex must be robust to lost packets, but not to corrupted ones since UDP ensures that packets either arrive unaltered or don't arrive. All this led to the choice of Code Excited Linear Prediction ( CELP) as the encoding technique to use for Speex. One of the main reasons is that CELP has long proved that it could do the job and scale well to both low

bit rate In telecommunications and computing, bit rate (bitrate or as a variable ''R'') is the number of bits that are conveyed or processed per unit of time. The bit rate is expressed in the unit bit per second (symbol: bit/s), often in conjunction ...

s (as evidenced by DoD CELP @ 4.8 kbit/s) and high bit rates (as with G.728 @ 16 kbit/s). The main characteristics can be summarized as follows: *

Free software Free software, libre software, libreware sometimes known as freedom-respecting software is computer software distributed open-source license, under terms that allow users to run the software for any purpose as well as to study, change, distribut ...

/open-source,

and

royalty Royalty may refer to: * the mystique/prestige bestowed upon monarchs ** one or more monarchs, such as kings, queens, emperors, empresses, princes, princesses, etc. *** royal family, the immediate family of a king or queen-regnant, and sometimes h ...

-free * Integration of narrowband and wideband in the same bit-stream * Wide range of bit rates available (from 2 kbit/s to 44 kbit/s) * Dynamic bit rate switching and Variable bit-rate (VBR) * Voice Activity Detection (VAD, integrated with VBR) * Variable complexity * Ultra-wideband mode at 32 kHz (up to 48 kHz) * Intensity

stereo Stereophonic sound, commonly shortened to stereo, is a method of sound reproduction that recreates a multi-directional, 3-dimensional audible perspective. This is usually achieved by using two independent audio channels through a configurat ...

encoding option

Feature description

Sampling rate

Speex is mainly designed for 3 different sampling rates: 8 kHz, 16 kHz, and 32 kHz. These are respectively referred to as narrowband, wideband and ultra-wideband.

Quality

Speex encoding is controlled most of the time by a quality parameter that ranges from 0 to 10. In constant bit-rate (CBR) operation, the quality parameter is an

integer An integer is the number zero (0), a positive natural number (1, 2, 3, ...), or the negation of a positive natural number (−1, −2, −3, ...). The negations or additive inverses of the positive natural numbers are referred to as negative in ...

, while for variable bit-rate (VBR), the parameter is a float.

Complexity (variable)

With Speex, it is possible to vary the complexity allowed for the encoder. This is done by controlling how the search is performed with an integer ranging from 1 to 10 in a way that's similar to the -1 to -9 options to

gzip gzip is a file format and a software application used for file compression and decompression. The program was created by Jean-loup Gailly and Mark Adler as a free software replacement for the compress program used in early Unix systems, and ...

and

bzip2 bzip2 is a free and open-source file compression program that uses the Burrows–Wheeler algorithm. It only compresses single files and is not a file archiver. It relies on separate external utilities such as tar for tasks such as handli ...

compression utilities. For normal use, the noise level at complexity 1 is between 1 and 2 dB higher than at complexity 10, but the CPU requirements for complexity 10 is about 5 times higher than for complexity 1. In practice, the best trade-off is between complexity 2 and 4, though higher settings are often useful when encoding non-speech sounds like DTMF tones.

Variable Bit-Rate (VBR)

Variable bit-rate (VBR) allows a codec to change its bit rate dynamically to adapt to the "difficulty" of the audio being encoded. In the example of Speex, sounds like

vowel A vowel is a speech sound pronounced without any stricture in the vocal tract, forming the nucleus of a syllable. Vowels are one of the two principal classes of speech sounds, the other being the consonant. Vowels vary in quality, in loudness a ...

s and high-energy transients require a higher bit rate to achieve good quality, while

fricative A fricative is a consonant produced by forcing air through a narrow channel made by placing two articulators close together. These may be the lower lip against the upper teeth, in the case of ; the back of the tongue against the soft palate in ...

s (e.g. s and f sounds) can be coded adequately with fewer bits. For this reason, VBR can achieve lower bit rate for the same quality, or a better quality for a certain bit rate. Despite its advantages, VBR has two main drawbacks: first, by only specifying quality, there's no guarantee about the final average bit-rate. Second, for some real-time applications like

(VoIP), what counts is the maximum bit-rate, which must be low enough for the communication channel.

Average Bit-Rate (ABR)

Average bit-rate solves one of the problems of VBR, as it dynamically adjusts VBR quality in order to meet a specific target bit-rate. Because the quality/bit-rate is adjusted in real-time (open-loop), the global quality will be slightly lower than that obtained by encoding in VBR with exactly the right quality setting to meet the target average bitrate.

Voice Activity Detection (VAD)

When enabled, voice activity detection detects whether the audio being encoded is speech or silence/background noise. VAD is always implicitly activated when encoding in VBR, so the option is only useful in non-VBR operation. In this case, Speex detects non-speech periods and encode them with just enough bits to reproduce the background noise. This is called "comfort noise generation" (CNG).

Discontinuous Transmission (DTX)

Discontinuous transmission is an addition to VAD/VBR operation, that allows to stop transmitting completely when the background noise is stationary. In file-based operation, since we cannot just stop writing to the file, only 5 bits are used for such frames (corresponding to 250 bit/s).

Perceptual enhancement

Perceptual enhancement is a part of the decoder which, when turned on, tries to reduce (the perception of) the noise produced by the coding/decoding process. In most cases, perceptual enhancement makes the sound further from the original objectively (signal-to-noise ratio), but in the end it still sounds better (subjective improvement).

Algorithmic delay

Every speech codec introduces a delay in the transmission. For Speex, this delay is equal to the frame size, plus some amount of "look-ahead" required to process each frame. In narrowband operation (8 kHz), the delay is 30 ms, while for wideband (16 kHz), the delay is 34 ms. These values don't account for the CPU time it takes to encode or decode the frames.

Large application base

There is already a large base of applications supporting the speex codec, from

teleconference A teleconference or telecon is a live exchange of information among several people remote from one another but linked by a communications system. Terms such as audio conferencing, telephone conferencing, and phone conferencing are also someti ...

software to

streaming Streaming media refers to multimedia delivered through a network for playback using a media player. Media is transferred in a ''stream'' of packets from a server to a client and is rendered in real-time; this contrasts with file downl ...

, P2P, and audio processing applications. Most of these are based on the DirectDS filter, OpenACM codec -- Netmeeting on

Microsoft Windows Windows is a Product lining, product line of Proprietary software, proprietary graphical user interface, graphical operating systems developed and marketed by Microsoft. It is grouped into families and subfamilies that cater to particular sec ...

, or OpenH323 on

Linux Linux ( ) is a family of open source Unix-like operating systems based on the Linux kernel, an kernel (operating system), operating system kernel first released on September 17, 1991, by Linus Torvalds. Linux is typically package manager, pac ...

( GnomeMeeting), for example. There are also plugins for the

Winamp Winamp is a media player (software), media player for Microsoft Windows originally developed by Justin Frankel and Dmitry Boldyrev by their company Nullsoft, which they later sold to AOL in 1999 for $80 million. It was then acquired by Rad ...

and XMMS players. The

MIME A mime artist, or simply mime (from Greek language, Greek , , "imitator, actor"), is a person who uses ''mime'' (also called ''pantomime'' outside of Britain), the acting out of a story through body motions without the use of speech, as a the ...

type for Speex is audio/x-speex. The type audio/speex will be applied for in the near future. See the plugin and software page on speex.org site for more details.

References

External links

Official Speex homepage

Linux.conf.au, a Linux conference, publishes its talks in Speex format
* Speex is an ideal

codec A codec is a computer hardware or software component that encodes or decodes a data stream or signal. ''Codec'' is a portmanteau of coder/decoder. In electronic communications, an endec is a device that acts as both an encoder and a decoder o ...

for church

sermon A sermon is a religious discourse or oration by a preacher, usually a member of clergy. Sermons address a scriptural, theological, or moral topic, usually expounding on a type of belief, law, or behavior within both past and present context ...

s. A variety of examples can be found a
Redeemer Lutheran
an
Grace Lutheran
churches. ''This article uses material from th
Speex Codec Manual
which is copyright (c) Jean-Marc Valin and licensed under the terms of the GFDL'' Audio codecs Speech codecs