FAME! - The Frisian Audio Mining Enterprise

Title: FAME! - The Frisian Audio Mining Enterprise
Publication Type: Presentation
Year of Publication: 2015
Conference Name: Dag van de Fonetiek 2015
Authors: Yılmaz, Emre, Maaike Andringa, Sigrid Kingma, Frits van der Kuip, Hans Van de Velde, Frederik Kampstra, Jouke Algra, Henk van den Heuvel, and David van Leeuwen
Publisher: Nederlandse Vereniging voor Fonetische Wetenschappen
Conference Location: Utrecht, The Netherlands
Abstract

We have recently presented a new speech database containing 18.5 hours of annotated radio broadcasts in the Frisian language. Frisian is mostly spoken in the province of Fryslân and is the second official language of the Netherlands. Native speakers of Frisian are mostly bilingual and often code-switch in daily conversation because of the extensive influence of Dutch.

Given the longitudinal and code-switching nature of the data, an appropriate annotation protocol has been designed, and the data has been manually annotated with orthographic transcriptions, speaker identities, dialect information, code-switching details, and background noise/music information. The data was collected within the scope of the FAME! (Frisian Audio Mining Enterprise) project, which aims to build a spoken document retrieval system for the disclosure of the archives of Omrop Fryslân (Frisian Broadcast), which cover a large time span from the 1950s to the present and a wide variety of topics. Omrop Fryslân is the regional public broadcaster of the province of Fryslân and the main data provider for this project, with a radio broadcast archive containing more than 2600 hours of recordings.

In this presentation, we will address both the disclosure of this "big data", especially its phonetic aspects, and the rich potential of this new database for code-switching research.