Abstract
This paper addresses compound splitting for Dutch in the context of broadcast news transcription. Language models were built from the original texts and from versions of those texts that were decomposed using a data-driven compound splitting algorithm. The models were compared in terms of out-of-vocabulary rates and word error rates on a real-world broadcast news transcription task. Compound splitting was found to improve ASR performance, with the best results obtained when frequent compounds were not decomposed.
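As an illustration of the out-of-vocabulary comparison mentioned in the abstract, the following minimal sketch computes an OOV rate before and after decomposition. It is not the paper's algorithm: `split_compound` is a hypothetical two-part lexicon lookup standing in for the data-driven splitter, and the `keep` set is an assumed way of leaving frequent compounds intact. The Dutch words and the tiny vocabulary are toy data, not results from the paper.

```python
def oov_rate(tokens, vocabulary):
    """Fraction of running tokens that are not in the vocabulary."""
    if not tokens:
        return 0.0
    missing = sum(1 for t in tokens if t not in vocabulary)
    return missing / len(tokens)


def split_compound(word, lexicon, min_len=3):
    """Hypothetical splitter: return two parts if both are in the lexicon.

    This is a stand-in for a real data-driven algorithm; it tries each
    split point and accepts the first one where both halves are known.
    """
    for i in range(min_len, len(word) - min_len + 1):
        head, tail = word[:i], word[i:]
        if head in lexicon and tail in lexicon:
            return [head, tail]
    return [word]


def decompose(tokens, lexicon, keep=frozenset()):
    """Split every token except those listed in `keep` (assumed frequent)."""
    out = []
    for t in tokens:
        out.extend([t] if t in keep else split_compound(t, lexicon))
    return out


# Toy data (assumption): a small vocabulary and a two-word test text.
vocabulary = {"de", "het", "nieuws", "uitzending"}
test_tokens = ["de", "nieuwsuitzending"]

rate_original = oov_rate(test_tokens, vocabulary)
rate_split = oov_rate(decompose(test_tokens, vocabulary), vocabulary)
```

With this toy data the unsplit text has one unknown token out of two, while the decomposed text contains only known words. Passing `keep={"nieuwsuitzending"}` leaves that compound whole, which only makes sense if the compound is also added to the vocabulary.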
Original language | Undefined |
---|---|
Title of host publication | Eurospeech 2003 |
Place of Publication | Geneva |
Publisher | ISCA |
Number of pages | 4 |
Publication status | Published - 2003 |
Event | 8th European Conference on Speech Communication and Technology, EUROSPEECH 2003 - Geneva, Switzerland; Duration: 1 Sept 2003 → 4 Sept 2003; Conference number: 8 |
Publication series
Name | |
---|---|
Publisher | ISCA |
ISSN (Print) | 1018-4074 |
Conference
Conference | 8th European Conference on Speech Communication and Technology, EUROSPEECH 2003 |
---|---|
Abbreviated title | EUROSPEECH 2003 |
Country/Territory | Switzerland |
City | Geneva |
Period | 1/09/03 → 4/09/03 |
Keywords
- Spoken Document Retrieval
- Audio search
- HMI-MR: MULTIMEDIA RETRIEVAL
- HMI-SLT: Speech and Language Technology
- METIS-217551
- IR-63377
- EWI-6705