The Development of the AMI System for the Transcription of Speech in Meetings

Thomas Hain, Lukas Burget, John Dines, Iain McCowan, Giulia Garau, Martin Karafiat, Mike Lincoln, Roeland J.F. Ordelman, Darren Moore, Vincent Wan, Steve Renals

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Academic › peer-review

    The automatic processing of speech collected in conference-style meetings has attracted considerable interest, with several large-scale projects devoted to this area. This paper describes the development of a baseline automatic speech transcription system for meetings in the context of the AMI (Augmented Multiparty Interaction) project. We present several techniques important for processing this data and report performance in terms of word error rates (WERs). An important aspect of transcribing this data is the flexibility required in audio pre-processing. Real-world systems have to deal with flexible input, for example from microphone arrays or from microphones placed randomly in a room. Automatic segmentation and microphone array processing techniques are described, and their effect on WERs is discussed. The system and its components presented in this paper yield competitive performance and form a baseline for future research in this domain.
    Original language: Undefined
    Title of host publication: Proceedings 2nd Workshop on Multimodal Interaction and Related Machine Learning Algorithms
    Editors: Steve Renals, Samy Bengio
    Place of Publication: Berlin
    Number of pages: 13
    ISBN (Print): 978-3-540-32549-9
    Publication status: Published - 2005
    Event: 2nd International Workshop on Machine Learning for Multimodal Interaction, MLMI 2005 - Edinburgh, United Kingdom
    Duration: 11 Jul 2005 to 13 Jul 2005
    Conference number: MLMI

    Publication series

    Name: Lecture Notes in Computer Science
    Publisher: Springer Verlag
    ISSN (Print): 0302-9743
    ISSN (Electronic): 1611-3349


    Workshop: 2nd International Workshop on Machine Learning for Multimodal Interaction, MLMI 2005
    Country/Territory: United Kingdom


    • EWI-1830
    • METIS-227320
    • IR-89653
