Since the early 2000s, several speech recognition (SR) software packages have existed for Linux. Some of them are free and open-source software and others are proprietary software. Speech recognition usually refers to software that attempts to distinguish thousands of words in a human language. Voice control may refer to software used for communicating operational commands to a computer.


Linux native speech recognition


History

In the late 1990s, a Linux version of IBM's ViaVoice was made available to users at no charge. In 2002, the developer withdrew the free software development kit (SDK).


Development status

In the early 2000s, there was a push to develop a high-quality native speech recognition engine for Linux. Several projects dedicated to creating Linux speech recognition programs have since been started, such as Mycroft, an open-source counterpart to Microsoft Cortana.


Speech sample crowdsourcing

Compiling a speech corpus is essential for producing the acoustic models used in speech recognition projects. VoxForge is a free speech corpus and acoustic model repository built to collect transcribed speech for use in speech recognition projects. VoxForge accepts crowdsourced speech samples and corrections of recognized speech sequences. It is licensed under the GNU General Public License (GPL).
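At its core, a speech corpus pairs audio recordings with their transcriptions. The minimal Python sketch below assumes a hypothetical prompts file in which each line holds an utterance ID followed by its transcription (loosely modeled on the prompt lists shipped with VoxForge submissions; the exact layout is an assumption), and pairs each entry with a WAV path. The function name `load_prompts` and the `wav/` directory layout are illustrative only.

```python
from pathlib import PurePosixPath

def load_prompts(text, audio_dir="wav"):
    """Pair each utterance ID with its transcription and an audio file path.

    Assumed (hypothetical) line format: "<utterance-id> <transcription words...>".
    Blank lines are skipped; IDs may carry a directory prefix such as "mfc/a0001".
    """
    corpus = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue
        parts = line.split(maxsplit=1)
        if len(parts) != 2:
            raise ValueError(f"line {line_no}: missing transcription")
        utt_id, transcript = parts
        # Use only the last path component of the ID as the WAV file stem.
        stem = PurePosixPath(utt_id).name
        corpus.append({
            "id": utt_id,
            "audio": str(PurePosixPath(audio_dir) / f"{stem}.wav"),
            "text": transcript,
        })
    return corpus

# Usage: two transcribed utterances become (audio, text) training pairs.
prompts = "mfc/a0001 HELLO WORLD\nmfc/a0002 OPEN THE DOOR\n"
for entry in load_prompts(prompts):
    print(entry["audio"], "->", entry["text"])
```

An acoustic-model trainer would then consume these audio/text pairs; failing loudly on a line with no transcription keeps unlabeled audio out of the training set.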


Speech recognition concept

The first step is to begin recording an audio stream on a computer. The user then has two main processing options:
* Discrete speech recognition (DSR): processing happens entirely on the local machine. This refers to self-contained systems in which every aspect of SR is performed within the user's computer. As of 2018, this was becoming critical for protecting intellectual property (IP) and avoiding unwanted surveillance.
* Remote or server-based SR: the audio is transmitted as a file to a remote server, which converts it into a text string. Because of recent cloud storage schemes and data mining, this method more readily allows surveillance, theft of information, and the insertion of malware.

Remote recognition was formerly used by smartphones because they lacked the performance, working memory, or storage to run speech recognition on the phone itself. These limits have largely been overcome, although server-based SR remains widespread on mobile devices.
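The two processing options differ mainly in where the recorded audio goes. The minimal Python sketch below assumes 16 kHz, 16-bit mono PCM input (a common choice for SR engines, not something stated above) and that samples have already been captured; recording itself needs an audio library and is not shown. The samples are packaged as an in-memory WAV file, then either handed to a recognizer on the same machine (local) or wrapped in an HTTP POST to a server (remote). The `local_recognizer` callable and the `server_url` endpoint are hypothetical placeholders, and the remote request is only built, never sent.

```python
import io
import struct
import urllib.request
import wave

SAMPLE_RATE = 16000  # assumption: 16 kHz mono, 16-bit PCM

def to_wav_bytes(samples, rate=SAMPLE_RATE):
    """Package signed 16-bit PCM samples as an in-memory WAV file."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)   # mono
        w.setsampwidth(2)   # 2 bytes = 16 bits per sample
        w.setframerate(rate)
        w.writeframes(struct.pack(f"<{len(samples)}h", *samples))
    return buf.getvalue()

def dispatch(wav_bytes, mode, local_recognizer=None,
             server_url="https://sr.example.invalid/recognize"):
    """Route audio to an on-machine recognizer or build a request for a server.

    "local":  audio never leaves the computer (the self-contained option).
    "remote": audio would be transmitted; only the request object is returned.
    """
    if mode == "local":
        if local_recognizer is None:
            raise ValueError("local mode needs a recognizer callable")
        return local_recognizer(wav_bytes)
    if mode == "remote":
        return urllib.request.Request(
            server_url,
            data=wav_bytes,
            headers={"Content-Type": "audio/wav"},
            method="POST",
        )
    raise ValueError(f"unknown mode: {mode!r}")
```

The privacy trade-off described above is visible in the code: only the `"remote"` branch puts the audio bytes into an object destined for another machine.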


Speech recognition in browser

Discrete speech recognition can be performed within a web browser and works well with supported browsers. Remote SR does not require installing software on a desktop computer or mobile device, since it is mainly a server-based system, but it carries the inherent security issues noted above.
* Remote: the dictation service records an audio track of the user through a web browser.
* DSR: some solutions run entirely on the client, without sending data to servers.


Free speech recognition engines

The following is a list of projects dedicated to implementing speech recognition in Linux, including the major native solutions. These are not end-user applications but programming libraries that may be used to develop end-user applications.
* CMU Sphinx is a general term for a group of speech recognition systems developed at Carnegie Mellon University.
* HTK (Hidden Markov Model Toolkit) was the best-known and most widely used speech recognition toolkit before Kaldi.
* Julius is a high-performance, two-pass large vocabulary continuous speech recognition (LVCSR) decoder for speech-related researchers and developers.
* Kaldi is a speech recognition toolkit released under the Apache License.
* Mozilla DeepSpeech is an open-source speech-to-text engine based on Baidu's Deep Speech research paper.
* VoxForge is a free speech corpus and acoustic model repository for open-source speech recognition engines.
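Because these engines are libraries rather than applications, an end-user program typically wraps whichever one it uses behind its own interface. The Python sketch below illustrates that design with `typing.Protocol`. The `Recognizer` interface, the `Dictation` class, and the `EchoEngine` stand-in are hypothetical and do not reproduce the real API of CMU Sphinx, Julius, Kaldi, or DeepSpeech; a real adapter would call the chosen engine's own bindings inside `transcribe`.

```python
from typing import Protocol

class Recognizer(Protocol):
    """Hypothetical interface an application might define over any SR engine."""
    def transcribe(self, wav_bytes: bytes) -> str: ...

class EchoEngine:
    """Stand-in engine for testing: always returns a fixed transcript.

    A real adapter would invoke an engine library here instead.
    """
    def __init__(self, transcript: str) -> None:
        self.transcript = transcript

    def transcribe(self, wav_bytes: bytes) -> str:
        return self.transcript

class Dictation:
    """End-user feature built on top of whichever engine is plugged in."""
    def __init__(self, engine: Recognizer) -> None:
        self.engine = engine
        self.document: list[str] = []

    def dictate(self, wav_bytes: bytes) -> str:
        """Transcribe one utterance, append it, and return the whole document."""
        text = self.engine.transcribe(wav_bytes).strip()
        if text:  # skip empty results such as silence
            self.document.append(text)
        return " ".join(self.document)
```

Keeping the engine behind a small interface lets the same application switch between, say, a Sphinx-based and a Kaldi-based backend without touching its dictation logic.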


Proprietary speech recognition engines

* The Janus Recognition Toolkit (JRTk) is a closed-source speech recognition toolkit, mainly targeted at Linux, developed by the Interactive Systems Laboratories at Carnegie Mellon University and the Karlsruhe Institute of Technology. Commercial and research licenses are available.


Voice control and keyboard shortcuts

Speech recognition usually refers to software that attempts to distinguish thousands of words in a human language, whereas voice control may refer to software used for sending operational commands to a computer or appliance. Voice control typically requires a much smaller vocabulary and is thus much easier to implement. Simple software combined with keyboard shortcuts has the earliest potential for practically accurate voice control in Linux.
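Voice control can work with a small vocabulary because each recognized phrase only has to select one action. A minimal Python sketch of the keyboard-shortcut approach, assuming the recognizer has already produced a text string: the phrase is normalized and looked up in a small table, and the result is an `xdotool key` command line that would synthesize the shortcut on an X11 desktop. The phrase table and function names are illustrative, and the command is returned rather than executed so the sketch has no side effects.

```python
import re

# Illustrative command vocabulary: spoken phrase -> X11 key combination.
COMMANDS = {
    "save file": "ctrl+s",
    "copy": "ctrl+c",
    "paste": "ctrl+v",
    "undo": "ctrl+z",
    "close window": "alt+F4",
}

def normalize(phrase):
    """Lower-case and drop punctuation/extra spaces so 'Save file!' matches."""
    return " ".join(re.findall(r"[a-z0-9]+", phrase.lower()))

def command_for(phrase):
    """Return an xdotool argument list for a recognized phrase, or None.

    Passing the list to subprocess.run() would actually send the keystroke.
    """
    combo = COMMANDS.get(normalize(phrase))
    if combo is None:
        return None  # out-of-vocabulary: ignore rather than guess
    return ["xdotool", "key", combo]
```

Restricting input to a handful of phrases is what makes this approach robust: anything outside the table is simply ignored instead of being mapped to the nearest-sounding command.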


Running Windows speech recognition software with Linux


Via compatibility layer

It is possible to run programs such as Dragon NaturallySpeaking on Linux by using Wine, though some problems may arise depending on which version is used.


Via virtualized Windows

Windows speech recognition software can also be used under Linux by running Windows and NaturallySpeaking inside no-cost virtualization software. VMware Server and VirtualBox support copying and pasting to and from a virtual machine, so dictated text can easily be transferred between the virtual machine and the host.


See also

* List of speech recognition software


External links


Accessibility, SpeechRecognition – Ubuntu Help