Detection of nonverbal vocalizations using Gaussian Mixture Models: looking for fillers and laughter in conversational speech

Teun F. Krikke, Khiet Phuong Truong

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Academic › peer-review

    13 Citations (Scopus)
    182 Downloads (Pure)

    Abstract

    In this paper, we analyze acoustic profiles of fillers (i.e., filled pauses, FPs) and laughter with the aim of automatically localizing these nonverbal vocalizations in a stream of audio. Among other features, we use voice quality features to capture the distinctive production modes of laughter, and spectral similarity measures to capture the stability of the oral tract that is characteristic of FPs. We perform classification experiments with Gaussian Mixture Models and various feature sets. We find that Mel-Frequency Cepstrum Coefficients perform relatively well compared with other features for both FPs and laughter. To address the large variation in the frame-wise decision scores (e.g., log-likelihood ratios) observed across sequences of frames, we apply a median filter to these scores, which yields large performance improvements. Our analyses and results are presented within the framework of this year’s Interspeech Computational Paralinguistics sub-Challenge on Social Signals.
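The smoothing step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the scores below are made up, and the window width, edge handling, and decision threshold are assumptions chosen only for illustration. It shows how a median filter over frame-wise log-likelihood ratios can bridge a brief dip inside a segment and suppress an isolated spike outside it.

```python
from statistics import median


def median_filter(scores, width=3):
    """Replace each frame score by the median of a window centred on it.

    Windows are truncated at the sequence edges (an assumption; padding
    the sequence would be an equally reasonable choice).
    """
    if width < 1 or width % 2 == 0:
        raise ValueError("width must be a positive odd integer")
    half = width // 2
    return [
        median(scores[max(0, i - half): i + half + 1])
        for i in range(len(scores))
    ]


def detect(scores, threshold=0.0):
    """Label a frame as target (True) when its score exceeds the threshold."""
    return [s > threshold for s in scores]


# Hypothetical frame-wise log-likelihood ratios: a target segment spanning
# frames 3-7 with one spuriously low frame (index 5), followed by one
# isolated spike (index 10) in a non-target region.
llr = [-2.0, -1.5, -1.8, 2.0, 2.5, -0.5, 1.8, 2.2, -1.7, -1.6, 3.0, -2.1, -1.9]

smoothed = median_filter(llr, width=3)

raw_decisions = detect(llr)            # frame 5 missed, frame 10 false alarm
smoothed_decisions = detect(smoothed)  # frames 3-7 detected as one segment
```

With these made-up scores, thresholding the raw values splits the segment in two and raises a false alarm at frame 10, whereas thresholding the filtered values gives one contiguous detection over frames 3 to 7. A wider window smooths more aggressively but can also erase short genuine events.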
    Original language: Undefined
    Title of host publication: Proceedings of the 14th Annual Conference of the International Speech Communication Association, Interspeech 2013
    Place of Publication: Baixas, France
    Publisher: International Speech Communication Association
    Pages: 163-167
    Number of pages: 5
    ISBN (Print): 2308-457X
    Publication status: Published - Aug 2013
    Event: 14th Annual Conference of the International Speech Communication Association, INTERSPEECH 2013 - Lyon, France
    Duration: 25 Aug 2013 - 29 Aug 2013
    Conference number: 14
    http://www.interspeech2013.org/

    Publication series

    Publisher: International Speech Communication Association
    ISSN (Print): 2308-457X

    Conference

    Conference: 14th Annual Conference of the International Speech Communication Association, INTERSPEECH 2013
    Abbreviated title: INTERSPEECH
    Country/Territory: France
    City: Lyon
    Period: 25/08/13 - 29/08/13

    Keywords

    • EWI-24531
    • Nonverbal vocalizations
    • filled pauses
    • IR-89700
    • Detection
    • Laughter
    • METIS-302882
    • Social Signal Processing
