Connectionist temporal classification (CTC) is a type of neural network output and associated scoring function for training recurrent neural networks (RNNs), such as LSTM networks, on sequence problems where the timing is variable. It can be used for tasks like on-line handwriting recognition or recognizing phonemes in speech audio. CTC refers to the outputs and scoring, and is independent of the underlying neural network structure. It was introduced in 2006.
The input is a sequence of observations, and the output is a sequence of labels, which can include blank outputs. The difficulty of training arises because there are many more observations than labels; for example, in speech audio several consecutive time slices can correspond to a single phoneme. Since the alignment of the observed sequence with the target labels is unknown, the network predicts a probability distribution over labels at each time step.
A CTC network has a continuous output (e.g. softmax), which is fitted through training to model the probability of a label. CTC does not attempt to learn boundaries and timings: label sequences are considered equivalent if they differ only in alignment, ignoring blanks. Because equivalent label sequences can arise in many ways, scoring is a non-trivial task, but it can be computed efficiently with the forward–backward algorithm.
CTC scores can then be used with the back-propagation algorithm to update the neural network weights.
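The forward pass of this scoring can be sketched as a dynamic program over an extended label sequence with blanks interleaved between labels. This is a minimal illustrative version (function name, blank index, and the omission of log-space stabilization are simplifications, not from the source):

```python
import numpy as np

def ctc_forward(probs, labels, blank=0):
    """Total probability of `labels` given per-frame output
    distributions `probs` (shape T x K), summed over all
    alignments via the CTC forward recursion."""
    # Interleave blanks: labels (a, b) become (-, a, -, b, -).
    ext = [blank]
    for l in labels:
        ext += [l, blank]
    T, S = probs.shape[0], len(ext)
    alpha = np.zeros((T, S))
    # A valid path starts at the initial blank or the first label.
    alpha[0, 0] = probs[0, ext[0]]
    if S > 1:
        alpha[0, 1] = probs[0, ext[1]]
    for t in range(1, T):
        for s in range(S):
            a = alpha[t - 1, s]            # stay on the same symbol
            if s > 0:
                a += alpha[t - 1, s - 1]   # advance by one
            # Skipping a blank is allowed only between distinct labels.
            if s > 1 and ext[s] != blank and ext[s] != ext[s - 2]:
                a += alpha[t - 1, s - 2]
            alpha[t, s] = a * probs[t, ext[s]]
    # A valid path ends on the final blank or the final label.
    return alpha[T - 1, S - 1] + (alpha[T - 1, S - 2] if S > 1 else 0.0)
```

For instance, with two frames, a uniform 0.5/0.5 distribution over one label and the blank, the three paths mapping to that single label (label–label, blank–label, label–blank) each have probability 0.25, so the total score is 0.75. Differentiating this score with respect to the per-frame probabilities is what yields the gradients used in back-propagation.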
Alternative approaches to a CTC-fitted neural network include a hidden Markov model (HMM).
In 2009, a CTC-trained LSTM network became the first RNN to win pattern recognition contests, winning several competitions in connected handwriting recognition.
In 2014, the Chinese company Baidu used a bidirectional RNN (not an LSTM) trained with the CTC loss function to break the Switchboard Hub5'00 speech recognition benchmark without using any traditional speech processing methods.
In 2015, it was used in Google voice search and dictation on Android devices.
CTC is limited to monotonic alignments. This is not a problem for speech recognition, but it can be for machine translation, where later words in one language may correspond to earlier words in another, since word order differs between languages.
References
External links
Section 16.4, "CTC", in Jurafsky and Martin's ''Speech and Language Processing'', 3rd edition
* {{Cite journal , last=Hannun , first=Awni , date=2017-11-27 , title=Sequence Modeling with CTC , url=https://distill.pub/2017/ctc , journal=Distill , language=en , volume=2 , issue=11 , pages=e8 , doi=10.23915/distill.00008 , issn=2476-0757 , doi-access=free }}