Loquendo was an Italian multinational computer software technology corporation, headquartered in Turin, Italy, that provided speech recognition, speech synthesis, speaker verification and identification applications. Loquendo, which was founded in 2001 under the Telecom Italia Lab (formerly CSELT), also had offices in the United Kingdom, Spain, Germany, France, and the United States. Its products can be found in portable and in-car navigation devices, assistive devices for the differently abled, smartphones, ebook readers, talking ATMs, computer games, voice-controlled domestic appliances and others. Its voice synthesis and speech recognition systems are used in an e-health application as part of the virtual assistant of Spain's Junta de Andalucía Government Health Service. Loquendo's products received several awards, including being named a Speech Technologies Speech Engine Leader in 2007, 2008, and 2009. The company was rated 'Market Leader' by Speech Technologies in 2009 and 2010. On 30 September 2011, Nuance announced that it had acquired Loquendo.

History

Loquendo was originally a research group created in the mid-seventies by managers at IRI-STET in the CSELT laboratories in Turin, before becoming a company in its own right in 2001.

Speech synthesis

Building on the recommendations of the University of Padua, and applying the technique of so-called diphones (the union of a consonant and a vowel, 150 in total for Italian), the voice technology group led by Giulio Modena created in 1975 the first highly intelligible speech synthesizer able to speak (and sing) Italian. It was called MUSA (MUltichannel Speaking Automaton) and demonstrated what was possible with the technology of the time. The results achieved in those years were condensed into a 45 rpm audio disc published in 1978 and distributed in thousands of copies through the mass media. The audio track, after a short spoken self-presentation of the system, contained a humorous Italian version of the song ''Frère Jacques'' performed in polyphony (''a cappella'') by several singing voices (MUSA could manage up to 8 synthesis channels in parallel). The evolution of this prototype, with an increase in the number of diphones (to about 1000), the refinement of the language analysis tools, and improved waveform management, led to a marked improvement of the synthetic voice. This led to the creation of the first "voice synthesizer" integrated circuit, developed internally at CSELT and manufactured by SGS (cataloged as a peripheral of the Zilog Z80 microprocessor, with the code M8950).

Later, in the nineties, "ELOQUENS" was born: a multi-platform software speech synthesizer aimed at various operating systems (including DOS, Windows, System 7, Unix and OS/2) and at telephone boards with very large numbers of channels, such as those used by the Italian telephone operator to build the reverse directory information service (used to obtain a subscriber's identity and address from their telephone number). Towards the end of the 1990s speech synthesis took a new approach: instead of relying on diphones, it used the selection and concatenation of acoustic units of variable length, an approach made possible by the increased power of computers and especially by the increasing capacity of mass storage systems. This resulted in "ACTOR" ("The human sounding voice"), which began to reach a large audience thanks to the number of telephone services and applications created by Loquendo-related companies. In 2000, the synthesizer was released from the research labs as a commercial product, including a number of editing tools to produce synthetic audio enriched with emotions. It was also released as a software library for use in various products, from small portable devices such as mobile phones, navigators and palm computers, to multichannel, multilingual telephone servers for (semi)automatic call centers.

Loquendo speech synthesis has become an internet meme on YouTube, though it is more common in Spanish-language videos. It is often used in creepypastas and parody dubbings (often with vulgar language).

Speech recognition

Shortly after the start of the research into speech synthesis, the group began research on speech recognition and at the beginning of the eighties produced the first prototype, able to recognize the ten digits and a few simple commands. Applying hidden Markov models in 1984 led to the development of a speech recognizer that could recognize connected words and sentences, created in collaboration with ELSAG, another company in the IRI-STET group. Also in collaboration with ELSAG, RIPAC (''RIconoscimento PArlato Connesso'') was presented in 1986: an early microprocessor designed to recognize connected speech. This processor had VLSI levels of integration and was composed of 70,000 transistors.

The need to produce speaker-independent speech recognizers for telephone applications led to the creation of speech databases with the recorded voices of hundreds of different people. In 1987 the first large database, obtained by recording the voices of more than 1000 people calling from all over Italy through an automatic procedure, was used in the creation of a specially crafted phone server at CSELT labs. This recorded material allowed the training of Markov models and, through sophisticated algorithms, led to the development of "AURIS", the first commercial recognizer that could run on a variety of devices with digital signal processors (DSPs).

In the nineties, a large collaboration began with a dozen other companies and universities across Europe, and a very large speech database was collected throughout the continent, with the voices of more than 65,000 people. This material, combined with a new mixed approach of hidden Markov models and neural networks, led to "FLEXUS", the first flexible-vocabulary speech recognizer, which allowed many varied telephone services to use automatic speech recognition in their human interfaces. Merging "FLEXUS" and "ACTOR" into a single system created "Dialogos", allowing the creation of cutting-edge telephone services.

The birth of Loquendo as a company led to the development of support for many languages and to the release of the recognizer as a software library for building various telephony applications. The company also introduced several systems for writing finite-state grammars and natural language models. The speech database recording campaigns continued, moving on from Europe to Mediterranean countries, to South, Central and North America, and finally to countries in the Far East. Overall, countless hours of speech have been recorded by contacting hundreds of thousands of people in these regions. The recordings have been collected over fixed telephone networks, in moving vehicles over mobile phones, and with high-quality microphones in domestic environments for consumer applications such as video games, appliances, and home automation in general.

Speaker recognition

(Image: CSELT portable mobile phone with speech recogniser prototype)

Research activities into speaker recognition were initiated in the early eighties. Later, in the mid-2000s, speech databases tailored for this task became available. In collaboration with the Politecnico di Torino, the group began experiments on two different fronts: speaker ''identification'' and speaker ''verification''. The success of the research also pushed the company to develop products specifically for these tasks through the enabling platforms described below.

Speech coding

Research into speech coding started even before the work on speech recognition and synthesis, with the aim of building equipment such as codecs and echo cancellers to maximize the number of telephone conversations that can flow through a single cable (or satellite connection) without losing voice intelligibility. In the late seventies, studies and experiments led to the creation of algorithms to encode the telephone speech signal and to the establishment of the European CCITT standard known as A-law encoding (8-bit logarithmic encoding, law "A", for a band-limited audio signal sampled at 8 kHz). This standard was then used for 64 kbit/s ISDN telephone lines. In subsequent years the group built more robust codecs (used in telephone exchanges) and, within the pan-European GSM consortium, the codec to use in second-generation mobile phones. At the same time they built equipment to transmit high-quality signals in spite of the 8 kHz band limit of the telephone cables, which was useful for audio and video conference applications.
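The A-law encoding mentioned above can be illustrated with a short sketch. It is not CSELT's implementation and not the bit-exact segmented table of the standard: it only shows the continuous companding curve, assuming the conventional parameter A = 87.6, samples normalized to the range [-1, 1], and a plain uniform 8-bit quantizer. The function names are hypothetical and chosen for this illustration.

```python
import math

A = 87.6  # assumed: the compression parameter conventionally used for A-law


def alaw_compress(x: float, a: float = A) -> float:
    """Map a normalized sample in [-1, 1] to its companded value in [-1, 1]."""
    ax = abs(x)
    if ax < 1.0 / a:
        # Small amplitudes: linear segment
        y = a * ax / (1.0 + math.log(a))
    else:
        # Larger amplitudes: logarithmic segment
        y = (1.0 + math.log(a * ax)) / (1.0 + math.log(a))
    return math.copysign(y, x)


def alaw_expand(y: float, a: float = A) -> float:
    """Inverse of alaw_compress."""
    ay = abs(y)
    if ay < 1.0 / (1.0 + math.log(a)):
        x = ay * (1.0 + math.log(a)) / a
    else:
        x = math.exp(ay * (1.0 + math.log(a)) - 1.0) / a
    return math.copysign(x, y)


def quantize_8bit(y: float) -> int:
    """Uniformly quantize a companded value to a signed 8-bit code (illustrative only)."""
    return max(-128, min(127, round(y * 127)))


def bit_rate(sample_rate_hz: int = 8000, bits_per_sample: int = 8) -> int:
    """Bit rate of a single voice channel in bit/s."""
    return sample_rate_hz * bits_per_sample
```

Because quiet samples are stretched before quantization (an input of 0.01 is mapped to roughly 0.16), 8 bits are enough to keep low-level speech intelligible. With 8,000 samples per second at 8 bits each, `bit_rate()` gives 64,000 bit/s, which matches the 64 kbit/s channel mentioned in the text.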

Enabling platforms

In the late nineties, the development of the Internet in the form known today (hypertext resident on different servers that span the planet in one big network) led to the need to make these texts available by voice over the phone. At the same time, Interactive Voice Response (IVR) systems became increasingly popular, using hardware and software tools to quickly develop new telephony applications. It became evident that the previous development models, which had produced complex systems such as the automation of the directory inquiry service or the Automatic Information Service Stations, were too rigid and would not easily allow the development of new applications. There was therefore a need for enabling platforms for automatic voice telephone systems that were both scalable and easily programmable. To this end a special working group was created to develop a voice browser prototype, shown to the public at SMAU 2000 under the name "VoxNauta". It was such a success that Telecom Italia decided to close its original research labs and create Loquendo on 1 February 2001.

Over the years "VoxNauta" was further developed in various scalable forms, from small servers to large enterprise systems with thousands of lines, and has been installed in hundreds of companies around the world. The emergence of standards for writing telephone services (VoiceXML) and of protocols (MRCP) for connecting servers hosting the speech technologies to servers hosting the telephone boards pushed the development of software-only solutions and led to the creation of Speech Server software, hosting text-to-speech and speech-recognizer engines from Loquendo. This continuing research and development has made Loquendo one of the most widely known brands in the field of speech synthesis and recognition.

The brand

The name Loquendo was devised by the wife of the founding CEO, Silvano Giorcelli, while the logo was created by the graphic department. When displayed as an animated GIF, the three ripples above the "O" light up in sequence, giving the sense of the emission of sound. The brand has not been protected by the company, and there are other Italian companies whose names derive directly from Loquendo. This has contributed to its widespread use, even at the expense of competing brands.

Sale of the company

Over the years there have been rumors of the sale of Loquendo to other companies. The most recent was in the summer of 2011, when it was announced that two US-based multinational companies, Nuance and Avaya, were looking into the possibility of a takeover. As Nuance was a direct competitor of the Italian company, Loquendo workers were worried about the possible dismemberment of research and development and the disappearance from Italy of an excellent brand with forty years of experience. A purchase by Avaya seemed more desirable, as its activities were complementary to those carried on by Loquendo: Avaya did not own any speech technology and therefore could have been very interested in in-house development rather than purchasing it from outside companies. These reports were followed with great interest by the workers, by local authorities in Turin and Piedmont, and by the entire international scientific community. On 13 August 2011, Telecom Italia publicly announced the sale of its entire stake in Loquendo to Nuance for 53 million euros.

Awards and Recognitions

* CSELT won the «Telework Award», the first prize of the European Telework Week 1998, for the experimental demonstration of the usefulness of CSELT technologies for disabled users, such as quadriplegics or blind people, through the combination of different voice technologies (remarkable for their high quality).

Products

* speech synthesis
* speaker verification

