JASMIN-CGN: Uitbreiding van het CGN met spraak van Jongeren, Anderstaligen en Senioren
|Title||JASMIN-CGN: Uitbreiding van het CGN met spraak van Jongeren, Anderstaligen en Senioren|
|Year of Publication||2006|
|Conference Name||Dag van de Fonetiek 2006|
|Authors||van Herwijnen, Olga, and Catia Cucchiarini|
|Publisher||Nederlandse Vereniging voor Fonetische Wetenschappen|
|Conference Location||Utrecht, The Netherlands|
Large speech corpora are an indispensable resource for research in speech processing and for developing real-life speech applications. In 2004 the Spoken Dutch Corpus (Corpus Gesproken Nederlands, CGN), a corpus of standard Dutch as spoken by adult native speakers in the Netherlands and Flanders, became available. Owing to budget constraints, CGN does not include speech of children, non-natives or elderly people, nor recordings of speech produced in human-machine interaction. Since such recordings would be extremely useful for research and for developing human language technology (HLT) applications for these specific groups of speakers of Dutch, a project was started to extend CGN with a corpus of contemporary Dutch as spoken by children of different age groups, non-natives with different mother tongues and elderly people in the Netherlands and Flanders (JASMIN-CGN). The project will also collect speech material in a communication setting that was not envisaged in CGN: human-machine interaction. One third of the data will be collected in Flanders and two thirds in the Netherlands. In this talk I will discuss the rationale of the project, the corpus design, the speech material, the procedure and the uses that can be made of the project's results.