Much work on spoken document retrieval has been done in recent years, both in international benchmarks and in collection-specific projects. In 2000 the issue of automatic speech recognition for spoken document retrieval was declared 'solved' for the broadcast news domain. Many collections, however, fall outside this domain, and automatic speech recognition for these collections may pose specific new challenges. This calls for a method to evaluate automatic speech recognition optimization schemes for these application areas. Traditional measures such as word error rate and story word error rate are not ideal for this purpose. In this paper, three new metrics are proposed. Their behaviour is investigated on a cultural heritage collection, and their performance is compared with that of the traditional measures on TREC broadcast news data.
|Publisher||Centre for Telematics and Information Technology, University of Twente|
|Workshop||ACM/SIGIR Workshop on Searching Spontaneous Conversational Speech, SSCS 2007|
|Period||27/07/07 → 27/07/07|
- Spoken Document Retrieval
- Automatic Speech Recognition