Julius is a speech recognition engine, specifically a high-performance, two-pass large vocabulary continuous speech recognition (LVCSR) decoder for speech-related researchers and developers. It can perform near real-time decoding on most current personal computers (PCs) in a 60k-word dictation task using a word trigram (3-gram) language model and context-dependent hidden Markov models (HMMs). Major search methods are fully incorporated.
It is also carefully modularized to be independent of model structures, and it supports various HMM types such as shared-state triphones and tied-mixture models, with any number of mixtures, states, or phones. Standard formats are adopted for compatibility with other free modeling toolkits. The main platforms are Linux and other Unix workstations, and it also runs on Windows. Julius is free and open-source software, released under a revised BSD-style software license.
Julius has been developed as part of a free software toolkit for Japanese LVCSR research since 1997, and the work was continued at the Continuous Speech Recognition Consortium (CSRC), Japan, from 2000 to 2003.
From rev. 3.4, a grammar-based recognition parser named ''Julian'' is integrated into Julius. Julian is a modified version of Julius that uses a hand-designed finite-state machine (FSM) grammar, termed a deterministic finite automaton (DFA) grammar, as its language model. It can be used to build small-vocabulary voice command systems or to handle various spoken dialog system tasks.
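To illustrate how a DFA grammar can act as a language model for a small voice command task, the following is a minimal sketch in Python. The states, the vocabulary, and the `accepts` helper are hypothetical and chosen only for illustration; this is not Julius's grammar file format or its API.

```python
# Toy DFA over words for a small voice-command grammar.
# All states and words here are illustrative assumptions,
# not taken from any Julius grammar file.
TRANSITIONS = {
    (0, "turn"): 1,
    (1, "on"): 2,
    (1, "off"): 2,
    (2, "the"): 3,
    (3, "light"): 4,
    (3, "radio"): 4,
}
START_STATE = 0
ACCEPT_STATES = {4}


def accepts(words):
    """Return True if the word sequence is a sentence of the grammar."""
    state = START_STATE
    for word in words:
        key = (state, word)
        if key not in TRANSITIONS:
            # No outgoing arc for this word: the sequence is rejected.
            return False
        state = TRANSITIONS[key]
    # The sequence is valid only if it ends in an accepting state.
    return state in ACCEPT_STATES


print(accepts("turn on the light".split()))
print(accepts("turn the light".split()))
```

Unlike an n-gram model, such a grammar assigns no probabilities: a word sequence is either allowed or not, which restricts the search space to the sentences the designer wrote down.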
About models
To run, the Julius recognizer needs a language model and an acoustic model for each language.
Julius adopts acoustic models in Hidden Markov Model Toolkit (HTK) ASCII format, a pronunciation dictionary in HTK-like format, and word 3-gram language models in ARPA standard format: a forward 2-gram and a reverse 3-gram, the latter trained from a speech corpus with reversed word order.
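The idea of training a reverse 3-gram from a corpus with reversed word order can be sketched as follows. This minimal Python example only counts raw n-grams; the toy corpus and the helper names `ngram_counts` and `reversed_corpus` are assumptions made for illustration, and a real toolkit would also apply smoothing, handle sentence boundary symbols, and write the result in ARPA format.

```python
from collections import Counter


def ngram_counts(sentences, n):
    """Count all n-grams of order n in a list of tokenized sentences."""
    counts = Counter()
    for words in sentences:
        for i in range(len(words) - n + 1):
            counts[tuple(words[i:i + n])] += 1
    return counts


def reversed_corpus(sentences):
    """Return a copy of the corpus with each sentence's word order reversed."""
    return [list(reversed(words)) for words in sentences]


# Hypothetical toy corpus (assumption for illustration only).
corpus = [
    ["turn", "on", "the", "light"],
    ["turn", "off", "the", "light"],
]

# Forward 2-gram counts come from the corpus as written.
forward_bigrams = ngram_counts(corpus, 2)

# Reverse 3-gram counts come from the same corpus read right to left.
backward_trigrams = ngram_counts(reversed_corpus(corpus), 3)

print(forward_bigrams[("the", "light")])
print(backward_trigrams[("light", "the", "on")])
```

Reading the corpus right to left means each counted trigram conditions a word on the words that follow it in normal order, which is what a right-to-left second decoding pass needs.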
Although Julius is distributed only with Japanese models, the VoxForge project is working to create English acoustic models for use with the Julius speech recognition engine.
In April 2018, thanks to the effort of the Mozilla Foundation, a 350-hour audio corpus of spoken English was made available. The new English ENVR-v5.4 open-source speech model was released along with the Polish PLPL-v7.1 models; both are available from SourceForge.
See also
* List of speech recognition software
External links
* , at osdn.jp