Compound Decomposition in Dutch Large Vocabulary Speech Recognition

    Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > Academic > peer-review

    Abstract

    This paper addresses compound splitting for Dutch in the context of broadcast news transcription. Language models were built from the original texts and from versions of those texts that were decomposed using a data-driven compound splitting algorithm. The language models were compared in terms of out-of-vocabulary rates and word error rates on a real-world broadcast news transcription task. It was concluded that compound splitting improves ASR performance. The best results were obtained when frequent compounds were not decomposed.
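
The two ideas in the abstract, splitting compounds while leaving frequent ones intact and measuring the out-of-vocabulary (OOV) rate, can be illustrated with a minimal Python sketch. This is not the paper's algorithm: the two-part dictionary lookup, the function names, the `keep_threshold` parameter, and the toy Dutch word list are all assumptions made for illustration.

```python
from collections import Counter


def split_compound(word, lexicon, min_part=3):
    """Return the first two-part split whose parts are both in `lexicon`, else None.

    Hypothetical helper: a simple lookup, not the data-driven method of the paper.
    """
    for i in range(min_part, len(word) - min_part + 1):
        head, tail = word[:i], word[i:]
        if head in lexicon and tail in lexicon:
            return [head, tail]
    return None


def decompose(tokens, lexicon, counts, keep_threshold):
    """Split compounds, but keep words seen at least `keep_threshold` times intact."""
    out = []
    for tok in tokens:
        parts = None
        if counts[tok] < keep_threshold:
            parts = split_compound(tok, lexicon)
        out.extend(parts if parts else [tok])
    return out


def oov_rate(tokens, vocabulary):
    """Fraction of tokens that do not occur in `vocabulary`."""
    if not tokens:
        return 0.0
    return sum(t not in vocabulary for t in tokens) / len(tokens)


# Toy data (assumed): "nieuwsuitzending" is frequent, "spraakherkenning" is unseen.
train = ["nieuws", "uitzending", "spraak", "herkenning",
         "nieuwsuitzending", "nieuwsuitzending"]
test = ["nieuwsuitzending", "spraakherkenning"]

counts = Counter(train)
vocabulary = set(train)
lexicon = {"nieuws", "uitzending", "spraak", "herkenning"}

split_test = decompose(test, lexicon, counts, keep_threshold=2)
print(oov_rate(test, vocabulary), oov_rate(split_test, vocabulary))
```

In this toy setting the unseen compound is covered by its parts after splitting, while the frequent compound stays a single vocabulary entry. Real OOV and word error rate figures depend on the corpus and recognizer and are not reproduced here.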
    Original language: Undefined
    Title of host publication: Eurospeech 2003
    Place of Publication: Geneva
    Publisher: ISCA
    Pages: -
    Number of pages: 4
    Publication status: Published - 2003
    Event: Eurospeech 2003 - Geneva, Switzerland
    Duration: 1 Sept 2003 to 4 Sept 2003

    Publication series

    Name
    Publisher: ISCA
    ISSN (Print): 1018-4074

    Conference

    Conference: Eurospeech 2003
    Period: 1/09/03 to 4/09/03
    Other: September 1-4, 2003

    Keywords

    • Spoken Document Retrieval
    • Audio search
    • HMI-MR: MULTIMEDIA RETRIEVAL
    • HMI-SLT: Speech and Language Technology
    • METIS-217551
    • IR-63377
    • EWI-6705
