2015 FAME! - The Frisian Audio Mining Enterprise

Authors
Emre Yılmaz, Maaike Andringa, Sigrid Kingma, Frits van der Kuip, Hans Van de Velde, Frederik Kampstra, Jouke Algra, Henk van den Heuvel & David van Leeuwen
Abstract
We have recently presented a new speech database containing 18.5 hours of annotated radio broadcasts in the Frisian language. Frisian is spoken mostly in the province of Fryslân and is the second official language of the Netherlands. Native speakers of Frisian are mostly bilingual and often code-switch in daily conversation due to the extensive influence of the Dutch language.

Given the longitudinal and code-switching nature of the data, an appropriate annotation protocol was designed, and the data were manually annotated with orthographic transcriptions, speaker identities, dialect information, code-switching details, and background noise/music information. The data were collected within the FAME! (Frisian Audio Mining Enterprise) project, which aims to build a spoken document retrieval system for the disclosure of the archives of Omrop Fryslân (Frisian Broadcast), covering a large time span from the 1950s to the present and a wide variety of topics. Omrop Fryslân is the regional public broadcaster of the province of Fryslân and the main data provider for this project, with a radio broadcast archive containing more than 2600 hours of recordings.

In this presentation, we will address both the disclosure of this "big data", especially its phonetic aspects, and the rich potential for code-switching research using this new database.
Publication type
Presentation
Year of publication
2015
Conference location
Utrecht
Conference name
Dag van de Fonetiek 2015
Publisher
Nederlandse Vereniging voor Fonetische Wetenschappen