We are developing a dialogue manager (DM) for a multimodal interactive Question Answering (QA)
system. Our QA system presents answers using text and pictures, and the user may pose follow-up
questions using text or speech, while indicating screen elements with the mouse. We developed a
corpus of multimodal follow-up questions for this system. This paper presents a detailed analysis of
this corpus and its impact on the implementation of our system.
We found that users pose two major types of follow-up question: regular questions, which can be
reformulated so that they are answerable by a QA system, and questions about a specific picture. We
found that users often indicate screen elements with the mouse, even in cases where this may be
considered redundant, and that these mouse gestures correspond closely to regular anaphors in the
utterance. We also found that users employ only a limited number of ways to indicate screen elements
with the mouse.
We argue that our QA system will need to annotate its pictures with information about the visual
elements that make up each picture. This enables appropriate anaphor resolution and the answering of
identity questions about these elements. We present our first results on follow-up question handling
and deictic reference resolution, using annotations we made for the pictures in the corpus.
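The annotation scheme and resolution step described above can be illustrated with a minimal sketch. The representation below is an assumption for illustration only (the paper does not specify its annotation format): each picture carries labelled bounding-box regions for its visual elements, and a mouse gesture is resolved to the label of the smallest region under the pointer, which can then stand in for a deictic anaphor in the follow-up question.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical annotation: one labelled, axis-aligned region per visual element.
@dataclass
class VisualElement:
    label: str  # e.g. "tower", "bridge" -- illustrative labels
    x: int
    y: int
    w: int
    h: int

    def contains(self, px: int, py: int) -> bool:
        return self.x <= px < self.x + self.w and self.y <= py < self.y + self.h

@dataclass
class AnnotatedPicture:
    elements: List[VisualElement]

    def element_at(self, px: int, py: int) -> Optional[VisualElement]:
        # Smallest containing region wins, so an element nested inside a
        # larger one (e.g. a tower on a church) can still be indicated.
        hits = [e for e in self.elements if e.contains(px, py)]
        return min(hits, key=lambda e: e.w * e.h) if hits else None

def resolve_deictic(picture: AnnotatedPicture,
                    mouse_pos: Tuple[int, int]) -> Optional[str]:
    # Map a mouse gesture to the indicated element's label; the label can
    # then substitute for the anaphor ("this", "that building") in the
    # reformulated follow-up question.
    elem = picture.element_at(*mouse_pos)
    return elem.label if elem else None
```

For example, with a picture annotated as `AnnotatedPicture([VisualElement("church", 10, 10, 100, 80), VisualElement("tower", 20, 20, 30, 40)])`, a click at `(25, 30)` resolves to `"tower"`, while a click outside every region resolves to `None`, signalling that the gesture cannot be grounded.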
Publisher: Centro de Linguistica Aplicada
Conference: Tenth International Symposium on Social Communication, Cuba
City: Santiago de Cuba
Period: 26/01/07 → …
- HMI-SLT: Speech and Language Technology
- HMI-MR: Multimedia Retrieval
- HMI-MI: Multimodal Interactions