Online Detection Of Vocal Listener Responses With Maximum Latency Constraints

Daniel Neiberg, Khiet Phuong Truong

    Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, Academic, peer-review

    10 Citations (Scopus)
    61 Downloads (Pure)

    Abstract

    When human listeners utter Listener Responses (e.g. back-channels or acknowledgments) such as 'yeah' and 'mmhmm', interlocutors commonly continue to speak or resume their speech even before the listener has finished his/her response. This type of speech interactivity results in frequent speech overlap which is common in human-human conversation. To allow for this type of speech interactivity to occur between humans and spoken dialog systems, which will result in more human-like continuous and smoother human-machine interaction, we propose an on-line classifier which can classify incoming speech as Listener Responses. We show that it is possible to detect vocal Listener Responses using maximum latency thresholds of 100-500 ms, thereby obtaining equal error rates ranging from 34% to 28% by using an energy-based voice activity detector.
    Original language: Undefined
    Title of host publication: Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)
    Place of Publication: USA
    Publisher: IEEE
    Pages: 5836-5839
    Number of pages: 4
    ISBN (Print): 978-1-4577-0538-0
    Publication status: Published - May 2011
    Event: IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2011 - Prague, Czech Republic
    Duration: 22 May 2011 - 27 May 2011

    Publication series

    Publisher: IEEE Signal Processing Society

    Conference

    Conference: IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2011
    Abbreviated title: ICASSP
    Country/Territory: Czech Republic
    City: Prague
    Period: 22/05/11 - 27/05/11

    Keywords

    • METIS-277647
    • EC Grant Agreement nr.: FP7/231287
    • EWI-20186
    • IR-77316
