Recognition of meeting actions using information obtained from different modalities -a semantic approach-

N. Jovanovic

    Research output: Book/Report › Report › Other research output



    Meetings play an important role in everyday life. Meeting minutes can serve as a summary of a meeting, but they cannot provide a trustworthy representation of it. One solution is to provide audio and video recordings of the meeting. Recordings give a more realistic representation, but finding particular information, such as "What was discussed?", then requires replaying them several times from the beginning. It is therefore very important to develop a system that enables easy and efficient access to archived meetings. It is also important to support searching a meeting archive by criteria such as topics, dates, participants and specific actions during the meeting, and to retrieve a summary according to the user's specification. These are the objectives of the M4 (Multimodal Meeting Manager) project. Meetings take place in smart rooms: environments equipped with multimodal sensors and computers that can automatically identify attendants, transcribe and identify what they say, and so on. The M4 project builds on the ideas of smart rooms. It is concerned with the construction of a demonstration system to enable structuring, browsing and querying of an archive of automatically analysed meetings, using the outputs of a set of multimodal sensors. Several ongoing projects address similar issues. The ICSI project is also concerned with the development of a system for recording and browsing meetings, but it is based only on audio data [23]. The project closest to M4 is the Meeting Room project at Carnegie Mellon University [24], which is concerned with the recording and browsing of meetings using audio and video data.
The M4 project proposes several innovations: multimodal localization and tracking of meeting focus, automatic multimodal emotion and intent recognition, gesture and action recognition, textual and multimodal summarization, and a framework for the integration of multimodal data. A meeting is a dynamic process that consists of group interactions between meeting participants. These group interactions are called meeting actions. Human-human interaction uses several communication channels: humans communicate not only with words but also through facial expressions, gaze, and body and hand gestures. These verbal and non-verbal signals are closely connected, and together they convey the complete information. In this report we describe our semantic approach to modelling a meeting as a sequence of meeting actions. The approach is based on representing the meaning of a meeting participant's multimodal behaviour using information obtained from different sources, and on recognizing meeting actions using semantic features (dialog acts, topics, participants' activities, states, roles, etc.) extracted from the participants' multimodal behaviour.
    Original language: Undefined
    Place of Publication: Enschede
    Publisher: Centre for Telematics and Information Technology (CTIT)
    Number of pages: 40
    Publication status: Published - Oct 2003

    Publication series

    Name: CTIT technical report series
    Publisher: University of Twente, Centre for Telematics and Information Technology (CTIT)


    • IR-56961
    • EWI-5806
