Evaluation of Noisy Transcripts for Spoken Document Retrieval

Laurens Bastiaan van der Werff

Research output: Thesis › PhD Thesis - Research UT, graduation UT › Academic


Abstract

This thesis introduces a novel framework for the evaluation of Automatic Speech Recognition (ASR) transcripts in a Spoken Document Retrieval (SDR) context. The basic premise is that ASR transcripts must be evaluated by measuring the impact of transcript noise on the search results of a traditional retrieval task. In this framework, ASR-for-SDR performance is calculated through a direct comparison between the ranked result lists produced by IR tasks on a reference and on a hypothesis transcript. After demonstrating the theoretical viability of the proposed framework, we investigated its practical aspects, in particular how much reference transcript data is necessary to achieve the high correlations with MAP that were found. This was done by testing on a large number of subsets of various sizes drawn from a 400-hour collection, which necessitated the use of artificial queries. We developed an automatic query generation algorithm whose artificial queries gave our measures as high a correlation with MAP as real queries did. If we allow a relative standard deviation of the linear correlations of at most a (somewhat arbitrary) 3%, ASR-for-SDR performance can be estimated using as little as three hours of reference transcripts. This amount is roughly equal to what is required for traditional intrinsic evaluation using WER. We therefore concluded that extrinsic evaluation of ASR-for-SDR performance can be done as easily as intrinsic evaluation, without needing more or different resources. Although we strongly recommend using human-generated queries that truly reflect the envisaged use of the SDR system, artificial queries were found to be a reasonable alternative when real queries cannot easily be obtained.
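The core idea — scoring a hypothesis transcript's ranked result list against the result list obtained from the reference transcript — can be sketched as follows. This is a minimal illustration, not the thesis's exact measure: it treats the reference run's top-k results as pseudo-relevance judgments and computes average precision of the hypothesis run against them (function names and the choice of k are this sketch's own assumptions).

```python
def average_precision(ranked, relevant):
    """AP of a ranked list of document ids against a set of relevant ids."""
    hits, precision_sum = 0, 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            precision_sum += hits / rank  # precision at each relevant hit
    return precision_sum / len(relevant) if relevant else 0.0

def asr_for_sdr_score(ref_runs, hyp_runs, k=10):
    """Mean AP of hypothesis runs, scored against the top-k of the
    corresponding reference runs (one ranked list per query)."""
    scores = []
    for ref_list, hyp_list in zip(ref_runs, hyp_runs):
        pseudo_relevant = set(ref_list[:k])
        scores.append(average_precision(hyp_list, pseudo_relevant))
    return sum(scores) / len(scores)
```

Under this sketch, identical reference and hypothesis result lists yield a score of 1.0, and the score drops as recognition errors push reference-retrieved documents down or out of the hypothesis ranking.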
Original language: Undefined
Awarding Institution:
  • University of Twente
Supervisors/Advisors:
  • de Jong, Franciska M.G., Supervisor
Award date: 5 Jul 2012
Place of Publication: Enschede
Publisher: University of Twente
Series: SIKS Dissertation Series; no. 2012-24 (Continuous Access to Cultural Heritage, CATCH)
Number of pages: 136
Print ISBN: 978-94-6203-066-4
Publication status: Published - 5 Jul 2012

Keywords

  • IR-80673
  • EWI-22067
  • METIS-287935
  • CATCH

Cite this

van der Werff, L. B. (2012). Evaluation of Noisy Transcripts for Spoken Document Retrieval. Enschede: University of Twente.