@article {368, title = {Audiovisual cues to uncertainty}, year = {2003}, publisher = {Nederlandse Vereniging voor Fonetische Wetenschappen}, address = {Tilburg, The Netherlands}, abstract = {

Uncertainty is an inherent element of human-machine communication. Uncertainty is often not made explicit in the system{\textquoteright}s output, nor have there been many efforts to detect uncertainty in users{\textquoteright} reactions. Our work is concerned with the expression of (degree of) uncertainty in spoken interactions. It focuses on the production and perception of auditory and visual cues, in isolation and in combination. The ultimate goal is to implement possible audiovisual cues to uncertainty in a synthetic talking head of an embodied conversational agent (ECA). We conjecture that a user{\textquoteright}s acceptance of incorrect system output is higher if the system makes clear in its self-presentation that it is not sure about the answer. Our approach builds on previous studies on the so-called Feeling-of-Knowing (FOK) (e.g. Smith and Clark, 1993), except that we also include possible visual cues to FOK.

Following earlier procedures, our Study A consists of three parts. First, in an individually performed test, subjects are instructed to answer 40 factual questions (e.g. what is the capital of Switzerland?, who wrote Hamlet?); questions are posed by an experimenter whom the subjects cannot see, and the subjects{\textquoteright} responses are videotaped (front view of the head). After this test, the same sequence of questions is presented to them again, but now they have to express on a 7-point scale how sure they are that they would recognize the correct answer if they had to find it in a multiple-choice test. The final part consists of this actual multiple-choice test. All utterances from the first test of Study A (800 in total) were transcribed orthographically and manually labelled for a number of auditory and visual features by four independent transcribers on the basis of an explicit labelling protocol, which included various double-checks.

On average, subjects knew the answer to 30 of the 40 questions. When they did not know or could not remember the answer, they sometimes made a guess or gave a non-answer. It appears that their FOK scores correlate not only with their performance in the third, multiple-choice test, but also with particular features of the utterances of the first test: lower scores, in line with previous results, correlate with long delays, the occurrence of filled pauses and question intonation. In addition, speakers appear to use more words when they have a lower FOK. Regarding the visual cues, low FOK is reflected in averted gaze, more head movements, eyebrow frowning, and overall more body movement. Also, a puzzled look appears to correlate with low FOK, whereas a self-congratulatory expression is more typical of high-FOK answers.

The goal of our Study B is to explore whether observers of the speakers{\textquoteright} answers from Study A are able to guess these speakers{\textquoteright} FOK scores. In particular, we are interested in whether a bimodal presentation of stimuli leads to better FOK predictions than the unimodal components in isolation. To test this, we are currently preparing a perception test in which a subset of the utterances of Study A will be presented to subjects, who are instructed to guess what the speaker{\textquoteright}s FOK was when s/he gave an answer (cf. Brennan and Williams, 1995). Stimuli will be presented in three conditions: image only, sound only, and both image and sound. From the original 800 responses, we select 60 utterances, with an equal number of answers and non-answers, and an even distribution of high and low FOK scores. While we expect the best performance for the bimodal stimuli, it remains an interesting empirical question whether the auditory or the visual features of the unimodal stimuli will turn out to be more informative for FOK predictions. The experiment is currently taking place. Results of this additional test will also be discussed in my talk.

}, author = {Marc Swerts} }