Robust Speech/Non-Speech Classification in Heterogeneous Multimedia Content

M.A.H. Huijbregts, Franciska M.G. de Jong

    Research output: Contribution to journal › Article › Academic › peer-review



    In this paper we present a speech/non-speech classification method that achieves high-quality classification without knowing in advance which kinds of audible non-speech events are present in an audio recording, and without tuning a single parameter on in-domain data. Because no parameter tuning is needed and no training data is required to train models for specific sounds, the classifier can process a wide range of audio types recorded under varying conditions. It thereby contributes to the development of a more robust automatic speech recognition framework. Our speech/non-speech classification system does not attempt to classify all audible non-speech in a single run. Instead, it first obtains a bootstrap speech/silence classification using a standard speech/non-speech classifier. Next, it trains models for speech, silence and audible non-speech on the target audio itself, using the bootstrap classification. The experiments show that the proposed system performs 83% and 44% (relative) better than a common broadcast news speech/non-speech classifier when applied, respectively, to a collection of meetings recorded with table-top microphones and to a collection of Dutch television broadcasts used for TRECVID 2007.
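    The two-pass idea described above (bootstrap first, then train models on the target audio) can be illustrated with a small toy sketch. This is not the SHoUT implementation. The two-dimensional features, the single diagonal-covariance Gaussian per class, the energy rule that seeds the audible non-speech class from bootstrap "silence" frames, and all names (`seed_labels`, `refine`, `sound_energy`, `VAR_FLOOR`) are hypothetical assumptions chosen only to show how models trained on the recording itself can correct bootstrap errors.

```python
from __future__ import annotations

import math
from statistics import fmean, pvariance

VAR_FLOOR = 1e-3  # hypothetical floor: keeps a near-constant feature from giving a degenerate Gaussian


def fit_diag_gaussian(frames: list[tuple[float, ...]]) -> tuple[list[float], list[float]]:
    """Fit per-dimension means and (floored) variances to a list of feature vectors."""
    dims = list(zip(*frames))
    means = [fmean(d) for d in dims]
    variances = [max(pvariance(d), VAR_FLOOR) for d in dims]
    return means, variances


def log_likelihood(x, model) -> float:
    """Log density of one frame under a diagonal-covariance Gaussian."""
    means, variances = model
    return -0.5 * sum(
        math.log(2 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, means, variances)
    )


def seed_labels(frames, bootstrap, sound_energy):
    """Hypothetical seeding rule: bootstrap 'silence' frames whose log-energy
    (feature 0) exceeds sound_energy start out as audible non-speech ('sound')."""
    return [
        "sound" if label == "silence" and frame[0] > sound_energy else label
        for frame, label in zip(frames, bootstrap)
    ]


def refine(frames, bootstrap, sound_energy, iterations=5):
    """Train one model per class on the target audio and relabel until stable."""
    labels = seed_labels(frames, bootstrap, sound_energy)
    for _ in range(iterations):
        models = {}
        for cls in sorted(set(labels)):  # sorted: deterministic tie-breaking
            members = [f for f, lab in zip(frames, labels) if lab == cls]
            models[cls] = fit_diag_gaussian(members)
        # Maximum-likelihood relabeling of every frame with the in-domain models.
        new_labels = [
            max(models, key=lambda c: log_likelihood(f, models[c])) for f in frames
        ]
        if new_labels == labels:
            break
        labels = new_labels
    return labels


if __name__ == "__main__":
    # Toy frames: (log-energy, second hypothetical feature).
    frames = [
        (0.0, 0.1), (0.2, 0.1), (0.1, 0.2),  # silence
        (5.0, 0.3), (5.2, 0.2), (4.8, 0.3),  # speech
        (6.0, 0.9), (6.2, 0.8), (5.8, 0.9),  # audible non-speech
    ]
    # The bootstrap misses the first speech frame and calls the loud sounds silence.
    bootstrap = ["silence"] * 4 + ["speech"] * 2 + ["silence"] * 3
    print(refine(frames, bootstrap, sound_energy=3.0))
    # → ['silence', 'silence', 'silence', 'speech', 'speech', 'speech', 'sound', 'sound', 'sound']
```

    In this toy run the mislabeled speech frame is first seeded as "sound" and then moved to "speech" once the models are trained on the recording itself, which is the kind of correction the in-domain retraining step is meant to provide.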
    Original language: Undefined
    Pages (from-to): 143-153
    Number of pages: 11
    Journal: Speech Communication
    Issue number: 2
    Publication status: Published - Feb 2011


    • EWI-18833
    • SHoUT toolkit
    • Speech/non-speech classification
    • rich transcription
    • IR-75066
    • EC Grant Agreement nr.: FP6/506811
    • EC Grant Agreement nr.: FP6/027413
    • METIS-277450
    • EC Grant Agreement nr.: FP6/027685
