Lingua Libre

Lingua Libre is an online collaborative project and tool created by the Wikimédia France association. It aims to build a multilingual, audiovisual speech corpus under a free license.

Lingua Libre
(Image: the website's homepage in December 2020)
Type of site: Language recording tool, online linguistic media library
Available in: Multilingual
Owner: Wikimédia France
Created by: Wikimédia France and the Wikimedia community
URL: lingualibre.org
Advertising: No
Commercial: No
Registration: Optional, but required for recording
Launched: August 2016
Current status: Active
Content license: Creative Commons Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)

Description

Lingua Libre enables the recording of words, phrases or sentences in any language, whether spoken (audio recording) or signed (video recording).

(Image: a recording session with a speaker of the Atikamekw language in Montreal in 2017)

Words are presented to the speaker as a list, which can be created on the spot or in advance, or built by reusing an existing Wikimedia category. The speaker simply reads the word displayed on the screen, and the software moves on to the next word when it detects a silence after the word has been read.[1] This approach, borrowed from the open-source Shtooka recorder with the help of its creator, Nicolas Vion, makes it possible to record several hundred words per hour. The recordings are then uploaded automatically from the web client to the Wikimedia Commons media library.
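The silence-triggered advance described above can be illustrated with a simple energy threshold. The sketch below is a hypothetical illustration, not the actual code of Lingua Libre or Shtooka: it assumes mono samples scaled to [-1.0, 1.0], and the function names (`frame_rms`, `detect_word_end`) and parameter values (`frame_len`, `threshold`, `min_silence_frames`) are made up for the example.

```python
import math
from typing import List, Optional


def frame_rms(frame: List[float]) -> float:
    """Root-mean-square energy of one frame of samples in [-1.0, 1.0]."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def detect_word_end(
    samples: List[float],
    frame_len: int = 160,           # hypothetical: 10 ms frames at 16 kHz
    threshold: float = 0.02,        # hypothetical RMS level separating speech from silence
    min_silence_frames: int = 30,   # hypothetical: about 300 ms of silence ends a word
) -> Optional[int]:
    """Return the sample index where the spoken word ends, or None.

    A word "ends" once speech has been heard and is then followed by at least
    `min_silence_frames` consecutive frames whose RMS is below `threshold`.
    The returned index is the start of that silent run, so trailing silence
    can be trimmed before the clip is saved and the next word is shown.
    """
    heard_speech = False  # leading silence before the word is ignored
    silent_run = 0        # number of consecutive quiet frames after speech
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        loud = frame_rms(samples[start:start + frame_len]) >= threshold
        if loud:
            heard_speech = True
            silent_run = 0
        elif heard_speech:
            silent_run += 1
            if silent_run >= min_silence_frames:
                # Index of the first frame in the silent run.
                return start - (silent_run - 1) * frame_len
    return None  # no word yet, or the word has not been followed by enough silence


if __name__ == "__main__":
    rate = 16000
    silence = [0.0] * 8000  # 0.5 s of silence
    tone = [0.5 * math.sin(2 * math.pi * 440 * i / rate) for i in range(8000)]  # 0.5 s "word"
    print(detect_word_end(silence + tone + silence))
```

A fixed threshold keeps the example short; a real recorder would more likely adapt the threshold to the background noise of each session and process audio buffers as they arrive rather than a complete list of samples.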

In spring 2021, Lingua Libre was taken offline by a fire at the OVHcloud datacenter in Strasbourg,[2] but no audio recordings were lost.[3]

Use of the recordings

The recordings can be consulted either on Lingua Libre or on Commons. They are mainly used on other Wikimedia projects, for example to illustrate entries in the various Wiktionaries or the pronunciation of proper nouns in Wikipedia articles.[1]

Reuse of the recordings for language teaching is also envisaged. Language learners can freely download pronunciations and use them in GoldenDict, a popular dictionary application,[4] where the recordings serve as "pronunciation dictionaries" that work without an internet connection.

The recordings are also reused in natural language processing projects, for example to train Mozilla's DeepSpeech speech recognition engine.[5]

Versions

Lingua Libre was initiated on January 23, 2015[6] and has had three successive versions:

Lingua Libre v.1 (2016)

Design work on Lingua Libre began in November 2015 as part of the Languages of France project, which aims to document and promote the regional languages of France on Wikimedia projects and on the Internet in general. The work was partly funded by the DGLFLF (General Delegation for the French Language and the Languages of France). The first version was launched in August 2016. Supporting audio recording only, it was demonstrated at a workshop on the Occitan language in December 2016,[7][8] and then presented to the online Wikimedia community[9] and at international events in 2017.

Lingua Libre v.2 (2018)

A complete rebuild began at the end of 2017. The new version of Lingua Libre is based on MediaWiki and uses Wikibase and OAuth to integrate better with the Wikimedia environment. Its interface is translated via Translatewiki.net so that the tool can be used by many language communities. The new version of the site was ready in June 2018[10] and opened to the public in August 2018.

Lingua Libre v.2.2 (2020)

In 2020, the platform underwent major changes: it received a new visual design developed specifically for the site, the .org domain replaced the .fr domain used until then,[11] and support for sign languages was added through video recording.

Statistics

In the first two years after the project's launch, approximately 10,000 recordings were made. The transition to v.2 was accompanied by a sharp increase in contributions: the number of recordings grew tenfold in less than a year, passing 100,000 in May 2019. These recordings were made by 127 speakers in almost 50 languages.[12] By September 2020, the platform had more than 300,000 recordings in 90 languages from more than 350 speakers. The 500,000-recording milestone was reached in June 2021, with contributions from 540 speakers of 120 languages.[13]


References

  1. Sabine Buchwald (2019-08-04). "Wie Wikipedia Bairisch lernt". Süddeutsche Zeitung (in German).
  2. "France : un incendie se déclare au datacenter OVHcloud de Strasbourg". French Wikinews (in French). 2021-03-11.
  3. "Lingua Libre 2.3 - Phoenix Edition ǃ". Meta Wikimedia. 2021-03-19.
  4. "LinguaLibre:Apps - Lingua Libre". lingualibre.org. Retrieved 2023-06-12.
  5. "Modèle français 0.4 pour DeepSpeech v0.6". Mozilla Discourse. 2020-03-10.
  6. Rémy Gerbet (2018-05-14). "Lingua Libre : un nouvel outil collaboratif pour le public et les chercheurs". Culture et Recherche (in French) (137): 52. ISSN 1950-6295.
  7. "Oc-a-thon 2016 : deux journées contributives sur l'occitan les 9 et 10 décembre". French Ministry of Culture (in French). 2016-11-16.
  8. Mathieu Denel (2016-12-21). "L'oc-a-thon, un edit-a-thon pour enrichir les projets Wikimedia et Lingua Libre en langue occitane". Wikimédia France Web Blog (in French). Retrieved 2020-12-03.
  9. French-speaking Wiktionarists (2017-08-01). "Lingua Libre". Actualités du Wiktionnaire (in French). Retrieved 2020-12-02.
  10. French-speaking Wiktionarists (2018-07-01). "Lingua Libre". Actualités du Wiktionnaire (in French). Retrieved 2020-12-02.
  11. Sara Krichen (2020-06-02). "Lingua Libre fait peau neuve !". Wikimédia France Web Blog (in French). Retrieved 2020-12-02.
  12. Miguel Trancozo Trevino (2020-04-15). "The many languages missing from the internet". BBC.
  13. "Statistics". Lingua Libre.
This article is issued from Wikipedia. The text is licensed under Creative Commons - Attribution - Sharealike. Additional terms may apply for the media files.