Cepstral Normalization
Cepstral mean and variance normalization (CMVN) is a computationally efficient normalization technique for robust speech recognition. The performance of CMVN is known to degrade for short utterances, both because there is insufficient data for parameter estimation and because discriminative information is lost when every utterance is forced to have zero mean and unit variance. CMVN minimizes distortion caused by noise contamination during robust feature extraction by linearly transforming the cepstral coefficients so that all utterances share the same segmental statistics. Cepstral normalization has been effective in CMU Sphinx for maintaining a high level of recognition accuracy over a wide variety of acoustical environments (Liu, F., Stern, R., Huang, X., and Acero ...).
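To make the transformation concrete, the sketch below applies per-utterance CMVN to a feature matrix in Python with NumPy. The matrix layout (frames by cepstral coefficients), the random test data, and the epsilon guard against division by zero are illustrative assumptions, not part of any particular recognizer's implementation.

```python
import numpy as np

def cmvn(features: np.ndarray, eps: float = 1e-10) -> np.ndarray:
    """Per-utterance cepstral mean and variance normalization.

    features: array of shape (num_frames, num_coefficients), e.g. MFCCs.
    Each coefficient track is shifted and scaled so that, over the
    utterance, it has zero mean and unit variance.
    """
    mean = features.mean(axis=0)             # per-coefficient mean over time
    std = features.std(axis=0)               # per-coefficient std over time
    return (features - mean) / (std + eps)   # eps guards against constant tracks

# Example: 200 frames of 13 cepstral coefficients with arbitrary offset/scale.
utterance = 3.0 * np.random.randn(200, 13) + 1.5
normalized = cmvn(utterance)
print(normalized.mean(axis=0).round(6))  # ~0 for every coefficient
print(normalized.std(axis=0).round(6))   # ~1 for every coefficient
```

The short-utterance weakness mentioned above is visible in this sketch: with only a handful of frames, the estimated mean and standard deviation are themselves noisy, so the normalization can distort rather than clean the features.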


Cepstrum
In Fourier analysis, the cepstrum (/ˈkɛpstrʌm, ˈsɛp-/; plural ''cepstra'', adjective ''cepstral'') is the result of computing the inverse Fourier transform (IFT) of the logarithm of the estimated signal spectrum. The method is a tool for investigating periodic structures in frequency spectra. The ''power cepstrum'' has applications in the analysis of human speech. The term ''cepstrum'' was derived by reversing the first four letters of ''spectrum''. Operations on cepstra are labelled ''quefrency analysis'' (or ''quefrency alanysis''), ''liftering'', or ''cepstral analysis'' (B. P. Bogert, M. J. R. Healy, and J. W. Tukey, "The Quefrency Alanysis of Time Series for Echoes: Cepstrum, Pseudo-Autocovariance, Cross-Cepstrum and Saphe Cracking", in ''Proceedings of the Symposium on Time Series Analysis'' (M. Rosenblatt, ed.), ch. 15, pp. 209–243, New York: Wiley, 1963). It may be pronounced in the two ways given, the second having the advantage of avoiding confusion with ''kepstrum''. Origin: The concept of the ce ...
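As a concrete illustration, here is a minimal NumPy sketch of the real cepstrum (inverse FFT of the log magnitude spectrum). The harmonic test signal, sampling rate, and search window are illustrative assumptions.

```python
import numpy as np

def real_cepstrum(signal: np.ndarray) -> np.ndarray:
    """Inverse FFT of the log magnitude spectrum of a signal."""
    spectrum = np.fft.fft(signal)
    log_magnitude = np.log(np.abs(spectrum) + 1e-12)  # guard against log(0)
    return np.fft.ifft(log_magnitude).real

# Illustrative signal: five harmonics of a 100 Hz fundamental at fs = 8 kHz.
fs = 8000
t = np.arange(fs) / fs
signal = sum(np.sin(2 * np.pi * 100 * k * t) for k in range(1, 6))

cepstrum = real_cepstrum(signal)
# The periodic structure in the spectrum (harmonics every 100 Hz) appears
# as a cepstral peak near quefrency 1/100 s = 10 ms, i.e. index fs // 100 = 80.
window = cepstrum[40:120]                 # search around the expected peak
peak = int(np.argmax(window)) + 40
print(f"cepstral peak at index {peak} (~{1000 * peak / fs:.1f} ms quefrency)")
```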


Computation
A computation is any type of arithmetic or non-arithmetic calculation that is well-defined. Common examples of computation are mathematical equation solving and the execution of computer algorithms. Mechanical or electronic devices (or, historically, people) that perform computations are known as ''computers''. Computer science is an academic field that involves the study of computation. Introduction: The notion that mathematical statements should be 'well-defined' had been argued by mathematicians since at least the 1600s, but agreement on a suitable definition proved elusive. A candidate definition was proposed independently by several mathematicians in the 1930s. The best-known variant was formalised by the mathematician Alan Turing, who defined a well-defined statement or calculation as any statement that could be expressed in terms of the initialisation parameters of a Turing machine. Other (mathematically equivalent) definitions include Alonzo Church's ''lambda-defin ...
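Because the Turing-machine definition above is what makes "computation" precise, a tiny simulator helps make it concrete. The sketch below is a minimal single-tape Turing machine in Python; the transition-table format and the two-state example machine are illustrative assumptions.

```python
def run_turing_machine(transitions, start_state, halt_state, max_steps=10_000):
    """Simulate a single-tape Turing machine.

    transitions: dict mapping (state, symbol) -> (new_symbol, move, new_state),
    where move is -1 (left) or +1 (right). The tape is a dict from head
    position to symbol; unwritten cells read as 0, the blank symbol.
    """
    tape, state, head = {}, start_state, 0
    for _ in range(max_steps):
        if state == halt_state:
            return tape                     # the machine has halted
        symbol = tape.get(head, 0)
        new_symbol, move, state = transitions[(state, symbol)]
        tape[head] = new_symbol
        head += move
    raise RuntimeError("machine did not halt within max_steps")

# Illustrative two-state machine: write two 1s on a blank tape, then halt.
transitions = {
    ("A", 0): (1, +1, "B"),
    ("B", 0): (1, -1, "HALT"),
}
print(sorted(run_turing_machine(transitions, "A", "HALT").items()))
# [(0, 1), (1, 1)]
```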


Audio Normalization
Audio normalization is the application of a constant amount of gain to an audio recording to bring the amplitude to a target level (the norm). Because the same amount of gain is applied across the entire recording, the signal-to-noise ratio and relative dynamics are unchanged. Normalization is one of the functions commonly provided by a digital audio workstation. Two principal types of audio normalization exist. Peak normalization adjusts the recording based on the highest signal level present in the recording. Loudness normalization adjusts the recording based on perceived loudness. Normalization differs from dynamic range compression, which applies varying levels of gain over a recording to fit the level within a minimum and maximum range. Normalization adjusts the gain by a constant value across the entire recording. Peak normalization One type of normalization is peak normalization, wherein the gain is changed to bring the highest PCM sample value or analog signal peak ...
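The paragraph above translates directly into code. The following is a minimal sketch of peak normalization in Python with NumPy; the floating-point sample convention (full scale at 1.0) and the -1 dBFS target are assumptions chosen for illustration.

```python
import numpy as np

def peak_normalize(samples: np.ndarray, target_dbfs: float = -1.0) -> np.ndarray:
    """Apply one constant gain so the recording's peak hits target_dbfs.

    samples: float array in [-1.0, 1.0], where 0 dBFS corresponds to 1.0.
    Because a single gain is applied everywhere, relative dynamics and
    the signal-to-noise ratio are unchanged.
    """
    peak = np.max(np.abs(samples))
    if peak == 0.0:
        return samples                        # silence: nothing to normalize
    target_amplitude = 10.0 ** (target_dbfs / 20.0)
    return samples * (target_amplitude / peak)

# Example: raise a quiet 440 Hz tone so its peak sits at -1 dBFS.
t = np.linspace(0.0, 1.0, 48000, endpoint=False)
quiet = 0.1 * np.sin(2 * np.pi * 440 * t)
loud = peak_normalize(quiet)
print(f"peak before: {np.abs(quiet).max():.3f}, after: {np.abs(loud).max():.3f}")
```

A loudness-normalization variant would replace the peak measurement with a perceived-loudness estimate, but the constant-gain structure stays the same.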




Speech Recognition
Speech recognition is an interdisciplinary subfield of computer science and computational linguistics that develops methodologies and technologies enabling the recognition and translation of spoken language into text by computers. It is also known as automatic speech recognition (ASR), computer speech recognition, or speech-to-text (STT). It incorporates knowledge and research in the computer science, linguistics, and computer engineering fields. The reverse process is speech synthesis. Some speech recognition systems require "training" (also called "enrollment"), where an individual speaker reads text or isolated vocabulary into the system. The system analyzes the person's specific voice and uses it to fine-tune the recognition of that person's speech, resulting in increased accuracy. Systems that do not use training are called "speaker-independent" systems; systems that use training are called "speaker-dependent". Speech recognition applications include voice user interfaces ...


Degradation (telecommunications)
In telecommunication, degradation is the loss of quality of an electronic signal, which may be categorized as either ''graceful'' or ''catastrophic'', and has the following meanings:
1. The deterioration in quality, level, or standard of performance of a functional unit.
2. In communications, a condition in which one or more of the required performance parameters fall outside predetermined limits, resulting in a lower quality of service.
There are several forms and causes of degradation in electric signals, both in the time domain and in the physical domain, including runt pulse, voltage spike, jitter, wander, swim, drift, glitch, ringing, crosstalk, antenna effect (not the same antenna effect as in IC manufacturing), and phase noise. Degradation usually refers to a reduction in the quality of an analog or digital signal: when a signal is transmitted or received, it undergoes undesirable changes, and these changes are called degradation. Degradation is usually caused ...


Utterance
In spoken language analysis, an utterance is a continuous piece of speech by one person, before and after which there is silence on the part of that person. In the case of oral languages, an utterance is generally, but not always, bounded by silence. In written language, utterances exist only indirectly, through their representations or portrayals, and they can be represented and delineated in many ways. In spoken language, utterances have several characteristics, such as paralinguistic features, which are aspects of speech such as facial expression, gesture, and posture. Prosodic features include stress, intonation, and tone of voice, as well as ellipsis, the words a listener inserts in spoken language to fill gaps. Moreover, other aspects of utterances found in spoken languages are non-fluency features, including voiced/unvoiced pauses (i.e. "umm") and tag quest ...


Parameter Estimation
Estimation theory is a branch of statistics that deals with estimating the values of parameters based on measured empirical data that has a random component. The parameters describe an underlying physical setting in such a way that their value affects the distribution of the measured data. An ''estimator'' attempts to approximate the unknown parameters using the measurements. In estimation theory, two approaches are generally considered: * The probabilistic approach (described in this article) assumes that the measured data is random with probability distribution dependent on the parameters of interest * The set-membership approach assumes that the measured data vector belongs to a set which depends on the parameter vector. Examples For example, it is desired to estimate the proportion of a population of voters who will vote for a particular candidate. That proportion is the parameter sought; the estimate is based on a small random sample of voters. Alternatively, it is ...
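The voter example above can be sketched in a few lines of Python. Everything concrete here, including the simulated population, the sample size, and the normal-approximation confidence interval, is an illustrative assumption layered on top of the source's example.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# The unknown parameter: the true proportion of voters backing the candidate.
true_p = 0.52  # hidden from the analyst in practice; assumed here to simulate

# Measured data with a random component: a small random sample of voters,
# recorded as 1 (will vote for the candidate) or 0 (will not).
n = 1000
sample = (rng.random(n) < true_p).astype(int)

# The estimator: the sample proportion approximates the unknown parameter.
p_hat = sample.mean()

# A normal-approximation 95% confidence interval around the estimate.
se = np.sqrt(p_hat * (1.0 - p_hat) / n)
low, high = p_hat - 1.96 * se, p_hat + 1.96 * se
print(f"estimate: {p_hat:.3f}, 95% CI: [{low:.3f}, {high:.3f}]")
```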


Mean (mathematics)
A mean is a quantity representing the "center" of a collection of numbers and is intermediate to the extreme values of the set of numbers. There are several kinds of means (or "measures of central tendency") in mathematics, especially in statistics. Each attempts to summarize or typify a given group of data, illustrating the magnitude and sign of the data set. Which of these measures is most illuminating depends on what is being measured, and on context and purpose. The ''arithmetic mean'', also known as the "arithmetic average", is the sum of the values divided by the number of values. The arithmetic mean of a set of numbers ''x''1, ''x''2, ..., ''x''''n'' is typically denoted using an overhead bar, \bar{x}. If the numbers are from observing a sample of a larger group, the arithmetic mean is termed the ''sample mean'' (\bar{x}) to distinguish it from the group mean (or expected value) of the underlying distribution, denoted \mu or \mu_x. Outside probability and statistics, a wide range of ...
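Written out, the arithmetic mean described above is:

```latex
\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{1}{n} \sum_{i=1}^{n} x_i
```

For example, the arithmetic mean of 2, 3, and 7 is (2 + 3 + 7) / 3 = 4.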


Variance
In probability theory and statistics, variance is the expected value of the squared deviation from the mean of a random variable. The standard deviation (SD) is obtained as the square root of the variance. Variance is a measure of dispersion, meaning it is a measure of how far a set of numbers is spread out from their average value. It is the second central moment of a distribution, and the covariance of the random variable with itself, and it is often represented by \sigma^2, s^2, \operatorname{Var}(X), V(X), or \mathbb{V}(X). An advantage of variance as a measure of dispersion is that it is more amenable to algebraic manipulation than other measures of dispersion such as the expected absolute deviation; for example, the variance of a sum of uncorrelated random variables is equal to the sum of their variances. A disadvantage of the variance for practical applications is that, unlike the standard deviation, its units differ from the random variable, which is why the standard devi ...
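In symbols, the definition above, together with the standard expansion used in computation, reads:

```latex
\operatorname{Var}(X) = \sigma^2
  = \mathbb{E}\!\left[(X - \mu)^2\right]
  = \mathbb{E}\!\left[X^2\right] - \mu^2,
\qquad \mu = \mathbb{E}[X]
```

The second form follows by expanding the square and using the linearity of expectation.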


Feature Extraction
Feature may refer to:

Computing
* Feature recognition, which could be a hole, pocket, or notch
* Feature (computer vision), which could be an edge, corner, or blob
* Feature (machine learning), in statistics: individual measurable properties of the phenomena being observed
* Software feature, a distinguishing characteristic of a software program

Science and analysis
* Feature data, in geographic information systems, comprising information about an entity with a geographic location
* Features, in audio signal processing, which aim to capture specific aspects of audio signals in a numeric way
* Feature (archaeology), any dug, built, or dumped evidence of human activity

Media
* Feature film, a film with a running time long enough to be considered the principal or sole film to fill a program
** Feature length, the standardized length of such films
* Feature story, a piece of non-fiction writing about news
* Radio documentary (feature), a radio program devoted to covering a particular topic in so ...




CMU Sphinx
CMU Sphinx, also called Sphinx for short, is the general term for a group of speech recognition systems developed at Carnegie Mellon University. These include a series of speech recognizers (Sphinx 2 through 4) and an acoustic model trainer (SphinxTrain). In 2000, the Sphinx group at Carnegie Mellon committed to open-sourcing several speech recognizer components, including Sphinx 2 and, in 2001, Sphinx 3. The speech decoders come with acoustic models and sample applications. In addition, the available resources include software for acoustic model training, language model compilation, and a public-domain pronunciation dictionary, cmudict. Sphinx encompasses a number of software systems, described below. Sphinx: Sphinx is a continuous-speech, speaker-independent recognition system that makes use of hidden Markov acoustic models (HMMs) and an n-gram statistical language model. It was developed by Kai-Fu Lee. Sphinx demonstrated the feasibility of continuous-speech, speaker-independ ...
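To make the HMM component concrete, the sketch below scores an observation sequence with the forward algorithm, the core recursion behind HMM-based recognizers. The toy two-state model, its probabilities, and the discrete output alphabet are illustrative assumptions and do not reflect Sphinx's actual acoustic models.

```python
import numpy as np
from scipy.special import logsumexp

def forward_log_likelihood(log_pi, log_A, log_B, observations):
    """Return log P(observations | model) for a discrete-output HMM.

    log_pi: (S,)   log initial-state probabilities
    log_A:  (S, S) log transition probabilities, log_A[i, j] = log P(j | i)
    log_B:  (S, V) log emission probabilities over V output symbols
    """
    alpha = log_pi + log_B[:, observations[0]]            # t = 0
    for obs in observations[1:]:
        # alpha[j] = logsum_i(alpha[i] + log_A[i, j]) + log_B[j, obs]
        alpha = logsumexp(alpha[:, None] + log_A, axis=0) + log_B[:, obs]
    return logsumexp(alpha)

# Toy two-state model over a binary output alphabet (all values assumed).
log_pi = np.log([0.6, 0.4])
log_A = np.log([[0.7, 0.3],
                [0.4, 0.6]])
log_B = np.log([[0.9, 0.1],   # state 0 mostly emits symbol 0
                [0.2, 0.8]])  # state 1 mostly emits symbol 1
print(forward_log_likelihood(log_pi, log_A, log_B, [0, 1, 1, 0]))
```

In a recognizer such as Sphinx, acoustic scores of this kind are combined with n-gram language model probabilities to rank candidate word sequences.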