Compound Decomposition in Dutch Large Vocabulary Speech Recognition

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Academic › peer-review

    28 Citations (Scopus)
    11 Downloads (Pure)

    Abstract

    This paper addresses compound splitting for Dutch in the context of broadcast news transcription. Language models were built from the original texts and from versions of those texts decomposed with a data-driven compound splitting algorithm. The language models were compared in terms of out-of-vocabulary rates and word error rates on a real-world broadcast news transcription task. Compound splitting was found to improve ASR performance, with the best results obtained when frequent compounds were left intact.
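To make the abstract's comparison concrete, the sketch below shows how an out-of-vocabulary (OOV) rate can be computed before and after compound decomposition, with frequent compounds left intact. It is a minimal illustration under stated assumptions, not the paper's algorithm: the function names (`split_compound`, `decompose`, `oov_rate`), the two-part split rule, the `min_part` and `keep_threshold` parameters, and the toy Dutch word lists are all hypothetical.

```python
from collections import Counter


def split_compound(word, lexicon, min_part=3):
    """Split `word` into two parts that both occur in `lexicon`.

    Hypothetical two-part rule for illustration only; returns [word]
    unchanged when no such split exists.
    """
    for i in range(min_part, len(word) - min_part + 1):
        head, tail = word[:i], word[i:]
        if head in lexicon and tail in lexicon:
            return [head, tail]
    return [word]


def decompose(tokens, lexicon, counts, keep_threshold):
    """Decompose tokens, leaving compounds seen at least `keep_threshold` times intact."""
    out = []
    for tok in tokens:
        if counts[tok] >= keep_threshold:
            out.append(tok)  # frequent word: keep as a single unit
        else:
            out.extend(split_compound(tok, lexicon))
    return out


def oov_rate(test_tokens, vocabulary):
    """Fraction of test tokens that are missing from the vocabulary."""
    if not test_tokens:
        return 0.0
    missing = sum(1 for tok in test_tokens if tok not in vocabulary)
    return missing / len(test_tokens)


# Toy data (assumed for illustration, not taken from the paper).
train = ["nieuws", "uitzending", "weer", "bericht",
         "nieuwsuitzending", "nieuwsuitzending", "nieuwsuitzending"]
test = ["weerbericht", "nieuwsuitzending"]

counts = Counter(train)
vocabulary = set(train)

oov_original = oov_rate(test, vocabulary)
oov_decomposed = oov_rate(
    decompose(test, vocabulary, counts, keep_threshold=3), vocabulary
)
print(oov_original, oov_decomposed)
```

In this toy setup the unseen compound "weerbericht" is split into two known words, while the frequent compound "nieuwsuitzending" stays whole, so the OOV rate on the test tokens drops after decomposition. Word error rate, the other measure named in the abstract, requires recognizer output and is not sketched here.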
    Original language: Undefined
    Title of host publication: Eurospeech 2003
    Place of Publication: Geneva
    Publisher: ISCA
    Pages: -
    Number of pages: 4
    Publication status: Published - 2003
    Event: 8th European Conference on Speech Communication and Technology, EUROSPEECH 2003 - Geneva, Switzerland
    Duration: 1 Sept 2003 – 4 Sept 2003
    Conference number: 8

    Publication series

    Name
    Publisher: ISCA
    ISSN (Print): 1018-4074

    Conference

    Conference: 8th European Conference on Speech Communication and Technology, EUROSPEECH 2003
    Abbreviated title: EUROSPEECH 2003
    Country/Territory: Switzerland
    City: Geneva
    Period: 1/09/03 – 4/09/03

    Keywords

    • Spoken Document Retrieval
    • Audio search
    • HMI-MR: MULTIMEDIA RETRIEVAL
    • HMI-SLT: Speech and Language Technology
    • METIS-217551
    • IR-63377
    • EWI-6705
