The Development of the AMI System for the Transcription of Speech in Meetings

Thomas Hain, Lukas Burget, John Dines, Iain McCowan, Giulia Garau, Martin Karafiat, Mike Lincoln, Roeland J.F. Ordelman, Darren Moore, Vincent Wan, Steve Renals

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Academic › peer-review

10 Citations (Scopus)
148 Downloads (Pure)

Abstract

The automatic processing of speech collected in conference style meetings has attracted considerable interest with several large scale projects devoted to this area. This paper describes the development of a baseline automatic speech transcription system for meetings in the context of the AMI (Augmented Multiparty Interaction) project. We present several techniques important to processing of this data and show the performance in terms of word error rates (WERs). An important aspect of transcription of this data is the necessary flexibility in terms of audio pre-processing. Real world systems have to deal with flexible input, for example by using microphone arrays or randomly placed microphones in a room. Automatic segmentation and microphone array processing techniques are described and the effect on WERs is discussed. The system and its components presented in this paper yield competitive performance and form a baseline for future research in this domain.
Original language: Undefined
Title of host publication: Proceedings 2nd Workshop on Multimodal Interaction and Related Machine Learning Algorithms
Editors: Steve Renals, Samy Bengio
Place of Publication: Berlin
Publisher: Springer
Pages: 344-356
Number of pages: 13
ISBN (Print): 978-3-540-32549-9
DOIs: https://doi.org/10.1007/11677482_30
Publication status: Published - 2005

Publication series

Name: Lecture Notes in Computer Science
Publisher: Springer Verlag
Number: 3869
Volume: 3869
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Keywords

  • EWI-1830
  • METIS-227320
  • IR-89653

Cite this

Hain, T., Burget, L., Dines, J., McCowan, I., Garau, G., Karafiat, M., ... Renals, S. (2005). The Development of the AMI System for the Transcription of Speech in Meetings. In S. Renals, & S. Bengio (Eds.), Proceedings 2nd Workshop on Multimodal Interaction and Related Machine Learning Algorithms (pp. 344-356). (Lecture Notes in Computer Science; Vol. 3869, No. 3869). Berlin: Springer. https://doi.org/10.1007/11677482_30
Hain, Thomas ; Burget, Lukas ; Dines, John ; McCowan, Iain ; Garau, Giulia ; Karafiat, Martin ; Lincoln, Mike ; Ordelman, Roeland J.F. ; Moore, Darren ; Wan, Vincent ; Renals, Steve. / The Development of the AMI System for the Transcription of Speech in Meetings. Proceedings 2nd Workshop on Multimodal Interaction and Related Machine Learning Algorithms. editor / Steve Renals ; Samy Bengio. Berlin : Springer, 2005. pp. 344-356 (Lecture Notes in Computer Science; 3869).
@inproceedings{44c0b02554d44b6b9acc39f05e9d29f6,
title = "The Development of the AMI System for the Transcription of Speech in Meetings",
abstract = "The automatic processing of speech collected in conference style meetings has attracted considerable interest with several large scale projects devoted to this area. This paper describes the development of a baseline automatic speech transcription system for meetings in the context of the AMI (Augmented Multiparty Interaction) project. We present several techniques important to processing of this data and show the performance in terms of word error rates (WERs). An important aspect of transcription of this data is the necessary flexibility in terms of audio pre-processing. Real world systems have to deal with flexible input, for example by using microphone arrays or randomly placed microphones in a room. Automatic segmentation and microphone array processing techniques are described and the effect on WERs is discussed. The system and its components presented in this paper yield competitive performance and form a baseline for future research in this domain.",
keywords = "EWI-1830, METIS-227320, IR-89653",
author = "Thomas Hain and Lukas Burget and John Dines and Iain McCowan and Giulia Garau and Martin Karafiat and Mike Lincoln and Ordelman, {Roeland J.F.} and Darren Moore and Vincent Wan and Steve Renals",
note = "Imported from HMI",
year = "2005",
doi = "10.1007/11677482_30",
language = "Undefined",
isbn = "978-3-540-32549-9",
series = "Lecture Notes in Computer Science",
publisher = "Springer",
number = "3869",
pages = "344--356",
editor = "Steve Renals and Samy Bengio",
booktitle = "Proceedings 2nd Workshop on Multimodal Interaction and Related Machine Learning Algorithms",
}

Hain, T, Burget, L, Dines, J, McCowan, I, Garau, G, Karafiat, M, Lincoln, M, Ordelman, RJF, Moore, D, Wan, V & Renals, S 2005, The Development of the AMI System for the Transcription of Speech in Meetings. in S Renals & S Bengio (eds), Proceedings 2nd Workshop on Multimodal Interaction and Related Machine Learning Algorithms. Lecture Notes in Computer Science, no. 3869, vol. 3869, Springer, Berlin, pp. 344-356. https://doi.org/10.1007/11677482_30

The Development of the AMI System for the Transcription of Speech in Meetings. / Hain, Thomas; Burget, Lukas; Dines, John; McCowan, Iain; Garau, Giulia; Karafiat, Martin; Lincoln, Mike; Ordelman, Roeland J.F.; Moore, Darren; Wan, Vincent; Renals, Steve.

Proceedings 2nd Workshop on Multimodal Interaction and Related Machine Learning Algorithms. ed. / Steve Renals; Samy Bengio. Berlin : Springer, 2005. p. 344-356 (Lecture Notes in Computer Science; Vol. 3869, No. 3869).


TY  - GEN
T1  - The Development of the AMI System for the Transcription of Speech in Meetings
AU  - Hain, Thomas
AU  - Burget, Lukas
AU  - Dines, John
AU  - McCowan, Iain
AU  - Garau, Giulia
AU  - Karafiat, Martin
AU  - Lincoln, Mike
AU  - Ordelman, Roeland J.F.
AU  - Moore, Darren
AU  - Wan, Vincent
AU  - Renals, Steve
N1  - Imported from HMI
PY  - 2005
Y1  - 2005
N2  - The automatic processing of speech collected in conference style meetings has attracted considerable interest with several large scale projects devoted to this area. This paper describes the development of a baseline automatic speech transcription system for meetings in the context of the AMI (Augmented Multiparty Interaction) project. We present several techniques important to processing of this data and show the performance in terms of word error rates (WERs). An important aspect of transcription of this data is the necessary flexibility in terms of audio pre-processing. Real world systems have to deal with flexible input, for example by using microphone arrays or randomly placed microphones in a room. Automatic segmentation and microphone array processing techniques are described and the effect on WERs is discussed. The system and its components presented in this paper yield competitive performance and form a baseline for future research in this domain.
AB  - The automatic processing of speech collected in conference style meetings has attracted considerable interest with several large scale projects devoted to this area. This paper describes the development of a baseline automatic speech transcription system for meetings in the context of the AMI (Augmented Multiparty Interaction) project. We present several techniques important to processing of this data and show the performance in terms of word error rates (WERs). An important aspect of transcription of this data is the necessary flexibility in terms of audio pre-processing. Real world systems have to deal with flexible input, for example by using microphone arrays or randomly placed microphones in a room. Automatic segmentation and microphone array processing techniques are described and the effect on WERs is discussed. The system and its components presented in this paper yield competitive performance and form a baseline for future research in this domain.
KW  - EWI-1830
KW  - METIS-227320
KW  - IR-89653
U2  - 10.1007/11677482_30
DO  - 10.1007/11677482_30
M3  - Conference contribution
SN  - 978-3-540-32549-9
T3  - Lecture Notes in Computer Science
SP  - 344
EP  - 356
BT  - Proceedings 2nd Workshop on Multimodal Interaction and Related Machine Learning Algorithms
A2  - Renals, Steve
A2  - Bengio, Samy
PB  - Springer
CY  - Berlin
ER  -

Hain T, Burget L, Dines J, McCowan I, Garau G, Karafiat M et al. The Development of the AMI System for the Transcription of Speech in Meetings. In Renals S, Bengio S, editors, Proceedings 2nd Workshop on Multimodal Interaction and Related Machine Learning Algorithms. Berlin: Springer. 2005. p. 344-356. (Lecture Notes in Computer Science; 3869). https://doi.org/10.1007/11677482_30