Automatic assessment of native, normally formed, read or repeated speech

Title: Automatic assessment of native, normally formed, read or repeated speech
Publication Type: Presentation
Year of Publication: 2010
Authors: Van hamme, Hugo
Publisher: Nederlandse Vereniging voor Fonetische Wetenschappen
Conference Location: Nijmegen, The Netherlands

In reading education and speech therapy, teachers and therapists often need to assess whether a known utterance is pronounced up to the expected standard. While training their reading skills, both regular pupils and persons with a reading disorder may produce reading miscues. One of the tasks of the teacher or therapist is to detect these miscues (reading skill evaluation) and give corrective feedback (training). In another setting, persons who have lost their hearing and have received a cochlear implant need to be trained to use their new bionic ear: a therapist reads a sentence which the patient is to repeat as accurately as possible.

In the therapy and evaluation scenarios above, a one-on-one setting is used in practice. This is an expensive solution, both in terms of labour cost and in terms of the logistics of bringing patient and therapist together. Reading training is often done collectively in today’s classrooms, but more personalized training is in demand. The result is that the number of one-on-one practice hours falls short of the ideal. This calls for computer programs that incorporate automated methods of speech assessment and that the pupil or patient can use in addition to the scheduled contact hours. Automated methods also have the advantage of endless patience and freedom from examiner bias: they apply the same metrics to everyone, irrespective of examiner, place, time and history.

In this contribution, we show how speech recognition technology can be applied to arrive at an automated assessment. We describe a method that deals with imperfect phone recognition while exploiting acoustic, lexical and phonotactic knowledge as well as knowledge of the intended sentence. Finally, by reporting performance data from real settings, we show what can and cannot be expected from automated speech assessment.
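To make the idea concrete, one common building block in such systems is an alignment between the phone sequence of the intended sentence and the (possibly error-laden) output of a phone recognizer; substitutions, insertions and deletions in that alignment are candidate miscues. The sketch below, in Python, is an illustration of this general technique only, not the method of the presentation, and all names and phone symbols are hypothetical.

```python
def align_phones(expected, recognized):
    """Levenshtein alignment of two phone sequences.

    Returns a list of (operation, expected_phone, recognized_phone)
    tuples, where operation is one of "match", "substitution",
    "deletion" (expected phone not realized) or "insertion"
    (extra phone in the recognizer output).
    """
    n, m = len(expected), len(recognized)
    # dp[i][j] = edit distance between expected[:i] and recognized[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dp[i][0] = i
    for j in range(m + 1):
        dp[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if expected[i - 1] == recognized[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j - 1] + cost,  # (mis)match
                           dp[i - 1][j] + 1,          # deletion
                           dp[i][j - 1] + 1)          # insertion
    # Backtrace from the bottom-right corner to recover the alignment.
    ops, i, j = [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] + (
                0 if expected[i - 1] == recognized[j - 1] else 1):
            op = "match" if expected[i - 1] == recognized[j - 1] else "substitution"
            ops.append((op, expected[i - 1], recognized[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and dp[i][j] == dp[i - 1][j] + 1:
            ops.append(("deletion", expected[i - 1], None))
            i -= 1
        else:
            ops.append(("insertion", None, recognized[j - 1]))
            j -= 1
    return ops[::-1]

# Hypothetical example: the intended word /k a t/ read as /k a p/.
expected = ["k", "a", "t"]
recognized = ["k", "a", "p"]
miscues = [op for op in align_phones(expected, recognized) if op[0] != "match"]
# miscues → [("substitution", "t", "p")]
```

In a realistic system the unit costs above would be replaced by acoustic confidence scores, and lexical and phonotactic knowledge would constrain which recognizer outputs are considered in the first place, so that not every recognition error is flagged as a reading miscue.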