Automatic phonetic transcription of large speech corpora: a comparative study
Title | Automatic phonetic transcription of large speech corpora: a comparative study |
Publication Type | Presentation |
Year of Publication | 2006 |
Conference Name | Summer Meeting on Corpus-based Research |
Authors | Van Bael, Christophe |
Publisher | Nederlandse Vereniging voor Fonetische Wetenschappen |
Conference Location | Nijmegen, The Netherlands |
Abstract | In a recent study, we investigated whether automatic transcription procedures can approximate the manually verified phonetic transcriptions typically delivered with contemporary large speech corpora. Ten automatic procedures were used to generate a broad phonetic transcription of well-prepared speech (read-aloud texts) and spontaneous speech (telephone dialogues) from the Spoken Dutch Corpus. The resulting transcriptions were compared to manually verified phonetic transcriptions from the same corpus. We found that signal-based procedures could not approximate the manually verified phonetic transcriptions, and a knowledge-based procedure did not give optimal results either. Quite surprisingly, the best results came from a procedure in which a canonical transcription was modelled towards the target transcription by means of decision trees and a small sample of manually verified phonetic transcriptions. The number and the nature of the remaining discrepancies were comparable to the inter-labeller disagreements reported in the literature. This implies that future corpus designers should consider automatic transcription procedures a valid and cheap alternative to expensive human experts. |
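The abstract does not state how the automatic and manually verified transcriptions were compared. A common approach, assumed here and not necessarily the one used in the study, is to align the two phone strings with a unit-cost Levenshtein alignment and count substitutions, deletions and insertions relative to the reference transcription. The sketch below illustrates that idea; the names `compare_transcriptions` and `Disagreement` and the example phone symbols are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Disagreement:
    substitutions: int
    deletions: int
    insertions: int
    reference_length: int

    @property
    def percentage(self) -> float:
        # Total edits relative to the number of phones in the reference transcription
        errors = self.substitutions + self.deletions + self.insertions
        return 100.0 * errors / self.reference_length if self.reference_length else 0.0


def compare_transcriptions(reference: list[str], hypothesis: list[str]) -> Disagreement:
    """Hypothetical helper: align two phone sequences (unit-cost Levenshtein)
    and count each type of edit. Assumed metric, not taken from the study."""
    n, m = len(reference), len(hypothesis)
    # cost[i][j] = minimal number of edits turning reference[:i] into hypothesis[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = int(reference[i - 1] != hypothesis[j - 1])
            cost[i][j] = min(
                cost[i - 1][j - 1] + sub,  # match or substitution
                cost[i - 1][j] + 1,        # deletion: reference phone missing in hypothesis
                cost[i][j - 1] + 1,        # insertion: extra phone in hypothesis
            )

    # Trace back through the table to attribute each edit to a type
    s = d = ins = 0
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = int(reference[i - 1] != hypothesis[j - 1])
            if cost[i][j] == cost[i - 1][j - 1] + sub:
                s += sub
                i, j = i - 1, j - 1
                continue
        if i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            d += 1
            i -= 1
        else:
            ins += 1
            j -= 1
    return Disagreement(s, d, ins, n)
```

For example, comparing the hypothetical reference `d @ m A n` with the automatic output `d m A m` yields one deletion and one substitution, i.e. 40% disagreement relative to the five reference phones.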