ASR-based CALL: integrating automatic speech recognition (ASR) in computer-assisted language learning (CALL)

Abstract
More and more computer-assisted language learning (CALL) applications have 'speech inside'. In most cases, however, the speech is produced by the system, i.e. speech is output: the CALL system reads utterances aloud, avatars or movies are shown, and the student has to listen and respond (usually by means of a mouse or keyboard). In some of these CALL systems the student is also asked to speak. What these systems do with the utterances spoken by the students differs: some do nothing at all; in others the speech is recorded so that the teacher can listen to it afterwards; in still others the student can immediately listen to (and/or look at a display of) the recorded utterance, and possibly compare it with an example of a correctly pronounced utterance.

In a few systems automatic speech recognition (ASR) is used to give more detailed feedback. ASR can be briefly described as the conversion of speech into text by a computer. The performance of ASR systems has gradually improved over the last decades, but ASR is certainly not error-free, and probably never will be, especially for so-called atypical speech (the speech of non-natives or of people with communicative disabilities). An important question, then, is when and how ASR can usefully be incorporated in applications such as CALL applications. In my presentation, I will make clear what ASR can and cannot (yet) do within the context of CALL and atypical speech. Although ASR is not error-free, it can be applied successfully in many applications if its limitations are carefully taken into account. The best-known application at the moment is probably the reading tutor, but there are other possibilities. I will present some examples of such applications.