This thesis introduces a novel framework for the evaluation of Automatic Speech Recognition (ASR) transcripts in a Spoken Document Retrieval (SDR) context. The basic premise is that ASR transcripts should be evaluated by measuring the impact of transcript noise on the search results of a traditional retrieval task. In this framework, ASR-for-SDR performance is calculated through a direct comparison between the ranked result lists produced by IR tasks on a reference transcript and on a hypothesis transcript.

After demonstrating the theoretical viability of the proposed framework, we investigated its practical aspects, in particular how many reference transcripts are necessary to achieve the high correlations with Mean Average Precision (MAP) that were found. This was done by testing on a large number of subsets of various sizes drawn from a 400-hour collection, which necessitated the use of artificial queries. We developed an automatic query generation algorithm whose artificial queries gave our measures correlations with MAP as high as those obtained with real queries.

If we allow the relative standard deviation of the linear correlations to reach a (somewhat arbitrary) maximum of 3%, ASR-for-SDR performance can be estimated using as little as three hours of reference transcripts. This amount is roughly equal to what is required for traditional intrinsic evaluation using Word Error Rate (WER). We therefore concluded that extrinsic evaluation of ASR-for-SDR performance can be done as easily as intrinsic evaluation, without needing more or different resources. Although we strongly recommend using human-generated queries that truly reflect the envisaged use of the SDR system, artificial queries were found to be a reasonable alternative when real queries cannot easily be obtained.
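The core idea of comparing ranked result lists from reference and hypothesis transcripts can be illustrated with a minimal sketch. The scoring function, the toy documents, and the use of Kendall's tau as the list-comparison measure are all illustrative assumptions; the thesis's actual retrieval models and ASR-for-SDR measures are not reproduced here.

```python
from collections import Counter

def rank_docs(docs, query):
    """Rank documents by a simple term-frequency overlap score (illustrative only)."""
    terms = query.lower().split()
    scores = []
    for i, doc in enumerate(docs):
        tf = Counter(doc.lower().split())
        scores.append((sum(tf[t] for t in terms), i))
    # Sort by score descending; ties broken by document id for a stable ranking.
    return [i for _, i in sorted(scores, key=lambda x: (-x[0], x[1]))]

def kendall_tau(r1, r2):
    """Kendall rank correlation between two rankings of the same documents."""
    pos2 = {d: k for k, d in enumerate(r2)}
    n = len(r1)
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            diff = pos2[r1[i]] - pos2[r1[j]]
            if diff < 0:
                concordant += 1
            elif diff > 0:
                discordant += 1
    return (concordant - discordant) / (n * (n - 1) // 2)

# Hypothetical reference transcripts and their (noisier) ASR hypotheses.
ref_docs = ["the quick brown fox", "retrieval of spoken documents", "speech recognition errors"]
hyp_docs = ["the quick brown fox", "retrieval of spoken documents", "peach wreck ignition errors"]

query = "speech recognition"
rank_ref = rank_docs(ref_docs, query)   # ranking on the reference transcript
rank_hyp = rank_docs(hyp_docs, query)   # ranking on the hypothesis transcript
tau = kendall_tau(rank_ref, rank_hyp)   # tau below 1.0 signals retrieval degradation
```

A tau of 1.0 would mean the ASR errors left the ranking untouched; lower values indicate that transcript noise reordered the search results, which is exactly the kind of extrinsic degradation the framework quantifies.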
Award date: 5 Jul 2012
Place of publication: Enschede
Publication status: Published - 5 Jul 2012