The 2005 AMI System for the Transcription of Speech in Meetings

Thomas Hain, Lukas Burget, John Dines, Giulia Garau, Martin Karafiat, Mike Lincoln, Iain McCowan, Roeland J.F. Ordelman, Darren Moore, Vincent Wan, Steve Renals

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > Academic

7 Citations (Scopus)
57 Downloads (Pure)

Abstract

In this paper we describe the 2005 AMI system for the transcription of speech in meetings used for participation in the 2005 NIST RT evaluations. The system was designed for participation in the speech to text part of the evaluations, in particular for transcription of speech recorded with multiple distant microphones and independent headset microphones. System performance was tested on both conference room and lecture style meetings. Although input sources are processed using different front-ends, the recognition process is based on a unified system architecture. The system operates in multiple passes and makes use of state of the art technologies such as discriminative training, vocal tract length normalisation, heteroscedastic linear discriminant analysis, speaker adaptation with maximum likelihood linear regression and minimum word error rate decoding. In this paper we describe the system performance on the official development and test sets for the NIST RT05s evaluations. The system was jointly developed in less than 10 months by a multi-site team and was shown to achieve very competitive performance.
Original language: Undefined
Title of host publication: 2nd International Workshop on Machine Learning for Multimodal Interaction, MLMI 2005
Place of Publication: Berlin
Publisher: Springer
Pages: 450-462
ISBN (Print): 978-3-540-32549-9
DOIs: https://doi.org/10.1007/11677482_38
Publication status: Published - 2005

Publication series

Name: Lecture Notes in Computer Science
Publisher: Springer Verlag
Number: 3869
Volume: 3869
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Keywords

  • IR-65564
  • METIS-227319
  • EC Grant Agreement nr.: FP6/506811
  • EWI-1829

Cite this

Hain, T., Burget, L., Dines, J., Garau, G., Karafiat, M., Lincoln, M., ... Renals, S. (2005). The 2005 AMI System for the Transcription of Speech in Meetings. In 2nd International Workshop on Machine Learning for Multimodal Interaction, MLMI 2005 (pp. 450-462). (Lecture Notes in Computer Science; Vol. 3869, No. 3869). Berlin: Springer. https://doi.org/10.1007/11677482_38
Hain, Thomas ; Burget, Lukas ; Dines, John ; Garau, Giulia ; Karafiat, Martin ; Lincoln, Mike ; McCowan, Iain ; Ordelman, Roeland J.F. ; Moore, Darren ; Wan, Vincent ; Renals, Steve. / The 2005 AMI System for the Transcription of Speech in Meetings. 2nd International Workshop on Machine Learning for Multimodal Interaction, MLMI 2005. Berlin : Springer, 2005. pp. 450-462 (Lecture Notes in Computer Science; 3869).
@inproceedings{f06315d162444519b96c4dde9b96d259,
title = "The 2005 AMI System for the Transcription of Speech in Meetings",
abstract = "In this paper we describe the 2005 AMI system for the transcription of speech in meetings used for participation in the 2005 NIST RT evaluations. The system was designed for participation in the speech to text part of the evaluations, in particular for transcription of speech recorded with multiple distant microphones and independent headset microphones. System performance was tested on both conference room and lecture style meetings. Although input sources are processed using different front-ends, the recognition process is based on a unified system architecture. The system operates in multiple passes and makes use of state of the art technologies such as discriminative training, vocal tract length normalisation, heteroscedastic linear discriminant analysis, speaker adaptation with maximum likelihood linear regression and minimum word error rate decoding. In this paper we describe the system performance on the official development and test sets for the NIST RT05s evaluations. The system was jointly developed in less than 10 months by a multi-site team and was shown to achieve very competitive performance.",
keywords = "IR-65564, METIS-227319, EC Grant Agreement nr.: FP6/506811, EWI-1829",
author = "Thomas Hain and Lukas Burget and John Dines and Giulia Garau and Martin Karafiat and Mike Lincoln and Iain McCowan and Ordelman, {Roeland J.F.} and Darren Moore and Vincent Wan and Steve Renals",
note = "Imported from HMI",
year = "2005",
doi = "10.1007/11677482_38",
language = "Undefined",
isbn = "978-3-540-32549-9",
series = "Lecture Notes in Computer Science",
publisher = "Springer",
number = "3869",
pages = "450--462",
booktitle = "2nd International Workshop on Machine Learning for Multimodal Interaction, MLMI 2005",

}

Hain, T, Burget, L, Dines, J, Garau, G, Karafiat, M, Lincoln, M, McCowan, I, Ordelman, RJF, Moore, D, Wan, V & Renals, S 2005, The 2005 AMI System for the Transcription of Speech in Meetings. in 2nd International Workshop on Machine Learning for Multimodal Interaction, MLMI 2005. Lecture Notes in Computer Science, no. 3869, vol. 3869, Springer, Berlin, pp. 450-462. https://doi.org/10.1007/11677482_38

The 2005 AMI System for the Transcription of Speech in Meetings. / Hain, Thomas; Burget, Lukas; Dines, John; Garau, Giulia; Karafiat, Martin; Lincoln, Mike; McCowan, Iain; Ordelman, Roeland J.F.; Moore, Darren; Wan, Vincent; Renals, Steve.

2nd International Workshop on Machine Learning for Multimodal Interaction, MLMI 2005. Berlin : Springer, 2005. p. 450-462 (Lecture Notes in Computer Science; Vol. 3869, No. 3869).

TY - GEN

T1 - The 2005 AMI System for the Transcription of Speech in Meetings

AU - Hain, Thomas

AU - Burget, Lukas

AU - Dines, John

AU - Garau, Giulia

AU - Karafiat, Martin

AU - Lincoln, Mike

AU - McCowan, Iain

AU - Ordelman, Roeland J.F.

AU - Moore, Darren

AU - Wan, Vincent

AU - Renals, Steve

N1 - Imported from HMI

PY - 2005

Y1 - 2005

AB - In this paper we describe the 2005 AMI system for the transcription of speech in meetings used for participation in the 2005 NIST RT evaluations. The system was designed for participation in the speech to text part of the evaluations, in particular for transcription of speech recorded with multiple distant microphones and independent headset microphones. System performance was tested on both conference room and lecture style meetings. Although input sources are processed using different front-ends, the recognition process is based on a unified system architecture. The system operates in multiple passes and makes use of state of the art technologies such as discriminative training, vocal tract length normalisation, heteroscedastic linear discriminant analysis, speaker adaptation with maximum likelihood linear regression and minimum word error rate decoding. In this paper we describe the system performance on the official development and test sets for the NIST RT05s evaluations. The system was jointly developed in less than 10 months by a multi-site team and was shown to achieve very competitive performance.

KW - IR-65564

KW - METIS-227319

KW - EC Grant Agreement nr.: FP6/506811

KW - EWI-1829

U2 - 10.1007/11677482_38

DO - 10.1007/11677482_38

M3 - Conference contribution

SN - 978-3-540-32549-9

T3 - Lecture Notes in Computer Science

SP - 450

EP - 462

BT - 2nd International Workshop on Machine Learning for Multimodal Interaction, MLMI 2005

PB - Springer

CY - Berlin

ER -

Hain T, Burget L, Dines J, Garau G, Karafiat M, Lincoln M et al. The 2005 AMI System for the Transcription of Speech in Meetings. In 2nd International Workshop on Machine Learning for Multimodal Interaction, MLMI 2005. Berlin: Springer. 2005. p. 450-462. (Lecture Notes in Computer Science; 3869). https://doi.org/10.1007/11677482_38