Methodologies for improving the g2p conversion of Dutch names

Authors

Henk van den Heuvel

Abstract

Names pose particular problems for grapheme-to-phoneme (g2p) converters. This is due to their non-standard orthography, which results from foreign origin or the fossilisation of older spelling forms. In the Autonomata project a variety of techniques is studied to improve the g2p conversion of Dutch names, more specifically: first names, second names, street names and town names. In Autonomata, a standard g2p converter is augmented with a name-specific phoneme-to-phoneme (p2p) converter that captures the peculiarities of names. The p2p converter is trained on large collections of names with manually verified phonetic transcriptions, which supply the specific information it requires. Various inductive and deductive approaches are studied to achieve this goal. We will exemplify our approach by showing results on the g2p conversion of Dutch first names.

Autonomata is carried out in the framework of the STEVIN-programme.

Partners in the project are Radboud University Nijmegen, Ghent University, Utrecht University, Nuance, and TeleAtlas.

Publication type

Presentation

Year of publication

2006

Conference location

Nijmegen

Conference name

Summer Meeting on Corpus-based Research 2006

Publisher

Nederlandse Vereniging voor Fonetische Wetenschappen