
Lingua Libre is an online collaborative project and tool by the Wikimedia France association, which aims to build a collaborative, multilingual, audiovisual corpus under a free license.


Description

Lingua Libre makes it possible to record words, phrases or sentences in any language, whether oral (audio recording) or signed (video recording). Words are presented to the speaker as a list, which can be created on the spot, prepared in advance, or drawn from an existing Wikimedia category. The speaker simply reads the word displayed on the screen, and the software moves on to the next word when it detects a silence after the word has been read. This principle, borrowed from the open source software Shtooka Recorder with the help of its creator, Nicolas Vion, makes it possible to record several hundred words per hour. The recordings are then uploaded automatically from the web client to the Wikimedia Commons media library. In spring 2021, Lingua Libre was offline due to a fire in Strasbourg, but no audio recordings were lost.
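The silence-triggered principle described above can be illustrated with a minimal sketch. This is not Lingua Libre's or Shtooka Recorder's actual code: the function names, the frame size, the loudness threshold and the number of quiet frames required are all assumptions chosen for illustration. The sketch marks a frame as loud when its root-mean-square amplitude reaches the threshold, and closes an utterance once enough consecutive quiet frames follow it.

```python
import math


def rms(frame):
    """Root-mean-square amplitude of one frame of samples."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def split_on_silence(samples, frame_size=4, threshold=0.1, min_silent_frames=2):
    """Return (start, end) sample-index pairs, one per detected utterance.

    All parameters are hypothetical defaults for this sketch. An utterance
    starts at the first loud frame and is closed once `min_silent_frames`
    consecutive quiet frames have followed it.
    """
    segments = []
    start = None      # sample index where the current utterance began
    silent_run = 0    # consecutive quiet frames since the last loud frame

    for i in range(0, len(samples), frame_size):
        frame = samples[i:i + frame_size]
        if rms(frame) >= threshold:
            if start is None:
                start = i
            silent_run = 0
        elif start is not None:
            silent_run += 1
            if silent_run >= min_silent_frames:
                # The utterance ends where the run of quiet frames began.
                end = i - (silent_run - 1) * frame_size
                segments.append((start, end))
                start = None
                silent_run = 0

    # Audio stopped before enough silence was seen: keep what was recorded.
    if start is not None:
        segments.append((start, len(samples)))
    return segments
```

In a recorder built along these lines, closing a segment would be the moment at which the next word of the list is displayed. A real implementation would work on a live microphone stream and would likely need a noise-adaptive threshold rather than a fixed one.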


Use of the recordings

The recordings can be consulted either on Lingua Libre or on Commons. They are mainly used on other Wikimedia projects, for example to illustrate entries on Wiktionaries or proper nouns in Wikipedia articles. Reuse of the recordings in language teaching is also envisaged. Language learners can freely download pronunciations and use them in GoldenDict, a popular dictionary software. The audio recordings can thus serve as "pronunciation dictionaries" in GoldenDict without an internet connection. The recordings are also reused in natural language processing projects, for example to train Mozilla's DeepSpeech speech recognition engines.


Versions

Lingua Libre was initiated on January 23, 2015 and has had three successive versions:


Lingua Libre v.1 (2016)

As part of the ''Languages of France'' project, which aims to document and promote the regional languages of France on Wikimedia projects and on the Internet in general, the design of Lingua Libre started in November 2015, partly funded by the DGLFLF (General Delegation for the French language and the languages of France). The first version of the project was launched in August 2016. Suitable only for audio recording, Lingua Libre was shown during a workshop on the Occitan language in December 2016, and then presented to the online Wikimedia community and at international events in 2017.


Lingua Libre v.2 (2018)

A complete rebuild was launched at the end of 2017. The new version of Lingua Libre is based on MediaWiki and uses Wikibase and OAuth to integrate better into the Wikimedia environment. The interface is translated via translatewiki.net so that the project can be used by a large number of communities. The new version of the site was ready in June 2018 and opened to the public in August 2018.


Lingua Libre v.2.2 (2020)

In 2020, important changes were made to the platform: a new look was developed especially for the site, and the .org domain replaced the .fr domain used until then. Lingua Libre now supports signed languages through video recording.

Screenshots of the Recording Studio: September 2017 (v.1), December 2018 (v.2), October 2020 (v.2.2).


Statistics

In the two years following the project's launch, approximately 10,000 recordings were made. The transition to v.2 was accompanied by a sharp increase in contributions: the number of recordings was multiplied by 10 in less than a year, exceeding the 100,000 threshold in May 2019. These recordings were made by 127 speakers in almost 50 languages. By September 2020, the platform had more than 300,000 recordings in 90 languages from more than 350 speakers. The milestone of 500,000 recordings was reached in June 2021, thanks to 540 speakers of 120 languages.


See also

* Forvo
* Common Voice
* GoldenDict
* Tatoeba

