SpeechBot was a web search engine for streaming media content developed at Compaq's (later HP's) research laboratories in Cambridge, MA and Australia. Compaq launched the website at Streaming Media West 1999 in San Jose, CA. The internet radio shows indexed by SpeechBot included The Motley Fool, Fresh Air, Talk of the Nation, The Dr. Laura Program, and Dreamland with Art Bell. By June 2003, the service had indexed over 17,000 hours of multimedia content. The website was taken offline in 2005, after HP closed its Cambridge research lab.
The SpeechBot indexing workflow involved a farm of Windows workstations that retrieved the streaming content and a Linux cluster that ran speech recognition to transcribe the spoken audio. The web server, search index, and metadata library were hosted on AlphaServers running Tru64 UNIX.
If transcripts were already available, they were aligned to the audio stream; otherwise, an approximate transcript was produced using speech recognition. The recognizer used, Calista, was derived from Sphinx-3. Because streaming audio was of low quality at the time, the word error rate was quite high, but most searches still retrieved relevant hits.
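Word error rate is conventionally computed as WER = (S + D + I) / N, where S, D and I are the substitutions, deletions and insertions in a minimum-edit word alignment between a reference transcript and the recognizer output, and N is the number of words in the reference. The following Python sketch illustrates that standard definition only; it is not SpeechBot's or Calista's evaluation code, and the example sentences are made up.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (S + D + I) / N via a word-level Levenshtein alignment."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # dist[i][j] = minimum number of edits turning ref[:i] into hyp[:j]
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dist[0][j] = j  # insert all j hypothesis words

    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub_cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,             # deletion
                dist[i][j - 1] + 1,             # insertion
                dist[i - 1][j - 1] + sub_cost,  # substitution or match
            )
    return dist[len(ref)][len(hyp)] / len(ref)


if __name__ == "__main__":
    # One substitution ("fool" -> "full") and one deletion ("show"): 2 / 5
    print(word_error_rate("the motley fool radio show", "the motley full radio"))  # 0.4
```

Because insertions are counted, WER can exceed 1.0 when the recognizer outputs many spurious words.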
The search results linked to the offset in the stream that corresponded to the search phrase, so that users did not need to listen to the entire program to find the section of interest.
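Linking a hit to a position in the stream requires each indexed word to keep its time offset. One minimal way to do this, assuming the transcript is available as (word, start time) pairs from the recognizer or aligner, is an inverted index whose postings record word positions, with a phrase match resolved to the start time of its first word. This is a hypothetical sketch: `TimedIndex`, `Hit` and the stream identifiers are illustrative names, not SpeechBot's actual design.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    stream_id: str         # hypothetical identifier for an indexed stream
    offset_seconds: float  # where playback should start for this hit


class TimedIndex:
    """Inverted index from words to (stream, position), plus per-word start times."""

    def __init__(self) -> None:
        self.postings: dict[str, list[tuple[str, int]]] = defaultdict(list)
        self.words: dict[str, list[str]] = {}
        self.times: dict[str, list[float]] = {}

    def add_transcript(self, stream_id: str, timed_words: list[tuple[str, float]]) -> None:
        # timed_words holds (word, start time in seconds) pairs for one stream
        self.words[stream_id] = [w.lower() for w, _ in timed_words]
        self.times[stream_id] = [t for _, t in timed_words]
        for pos, word in enumerate(self.words[stream_id]):
            self.postings[word].append((stream_id, pos))

    def search(self, phrase: str) -> list[Hit]:
        terms = phrase.lower().split()
        if not terms:
            return []
        hits = []
        # Start from every occurrence of the first term, then check that the
        # remaining terms follow it consecutively in the same transcript.
        for stream_id, pos in self.postings.get(terms[0], []):
            if self.words[stream_id][pos:pos + len(terms)] == terms:
                hits.append(Hit(stream_id, self.times[stream_id][pos]))
        return hits


if __name__ == "__main__":
    index = TimedIndex()
    index.add_transcript(
        "show-001", [("welcome", 0.0), ("to", 0.4), ("fresh", 0.6), ("air", 1.0)]
    )
    print(index.search("fresh air"))  # [Hit(stream_id='show-001', offset_seconds=0.6)]
```

This exact-phrase matching does not try to tolerate recognition errors; a search over high-WER transcripts would more plausibly rank partial matches as well.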