Online Detection Of Vocal Listener Responses With Maximum Latency Constraints

Daniel Neiberg, Khiet Phuong Truong

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, Academic, peer-review

6 Citations (Scopus)
9 Downloads (Pure)

Abstract

When human listeners utter Listener Responses (e.g. back-channels or acknowledgments) such as 'yeah' and 'mmhmm', interlocutors commonly continue to speak or resume their speech even before the listener has finished his/her response. This type of speech interactivity results in frequent speech overlap which is common in human-human conversation. To allow for this type of speech interactivity to occur between humans and spoken dialog systems, which will result in more human-like continuous and smoother human-machine interaction, we propose an on-line classifier which can classify incoming speech as Listener Responses. We show that it is possible to detect vocal Listener Responses using maximum latency thresholds of 100-500 ms, thereby obtaining equal error rates ranging from 34% to 28% by using an energy based voice activity detector.
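
The abstract reports its results as equal error rates (EER), the operating point at which the false-acceptance rate equals the false-rejection rate. The following is a minimal sketch of how an EER can be estimated from classifier scores. It is not the authors' evaluation code; the function name, the score convention (higher means "more likely a Listener Response"), and the toy data are all assumptions made for illustration.

```python
def equal_error_rate(scores, labels):
    """Estimate the EER by sweeping a threshold over the observed scores.

    scores: floats, higher = more likely target (assumed convention).
    labels: 1 for target (e.g. Listener Response), 0 for non-target.
    Returns (eer, threshold) at the point where the false-acceptance
    and false-rejection rates are closest to each other.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes must be present")

    best = None  # (gap, eer_estimate, threshold)
    for t in sorted(set(scores)):
        # A sample is accepted as target when its score is >= t.
        false_accepts = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= t)
        false_rejects = sum(1 for s, y in zip(scores, labels) if y == 1 and s < t)
        far = false_accepts / n_neg
        frr = false_rejects / n_pos
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]


# Toy data (hypothetical, not from the paper).
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
labels = [1, 1, 0, 1, 0, 1, 0, 0]
eer, threshold = equal_error_rate(scores, labels)
print(f"EER = {eer:.2f} at threshold {threshold}")
```

With a finite set of scores the two error rates rarely cross exactly, so the sketch reports their average at the closest threshold. Interpolating between neighbouring thresholds is a common refinement.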
Original language: Undefined
Title of host publication: Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)
Place of Publication: USA
Publisher: IEEE Signal Processing Society
Pages: 5836-5839
Number of pages: 4
ISBN (Print): 978-1-4577-0538-0
DOIs: https://doi.org/10.1109/ICASSP.2011.5947688
Publication status: Published - May 2011
Event: IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2011 - Prague, Czech Republic
Duration: 22 May 2011 to 27 May 2011

Publication series

Name
Publisher: IEEE Signal Processing Society

Conference

Conference: IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2011
Abbreviated title: ICASSP
Country: Czech Republic
City: Prague
Period: 22/05/11 to 27/05/11

Keywords

  • METIS-277647
  • EC Grant Agreement nr.: FP7/231287
  • EWI-20186
  • IR-77316

Cite this

Neiberg, D., & Truong, K. P. (2011). Online Detection Of Vocal Listener Responses With Maximum Latency Constraints. In Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP) (pp. 5836-5839). USA: IEEE Signal Processing Society. https://doi.org/10.1109/ICASSP.2011.5947688
Neiberg, Daniel ; Truong, Khiet Phuong. / Online Detection Of Vocal Listener Responses With Maximum Latency Constraints. Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP). USA : IEEE Signal Processing Society, 2011. pp. 5836-5839
@inproceedings{dadd51c204e3461ab7a28116fc34df91,
title = "Online Detection Of Vocal Listener Responses With Maximum Latency Constraints",
abstract = "When human listeners utter Listener Responses (e.g. back-channels or acknowledgments) such as 'yeah' and 'mmhmm', interlocutors commonly continue to speak or resume their speech even before the listener has finished his/her response. This type of speech interactivity results in frequent speech overlap which is common in human-human conversation. To allow for this type of speech interactivity to occur between humans and spoken dialog systems, which will result in more human-like continuous and smoother human-machine interaction, we propose an on-line classifier which can classify incoming speech as Listener Responses. We show that it is possible to detect vocal Listener Responses using maximum latency thresholds of 100-500 ms, thereby obtaining equal error rates ranging from 34{\%} to 28{\%} by using an energy based voice activity detector.",
keywords = "METIS-277647, EC Grant Agreement nr.: FP7/231287, EWI-20186, IR-77316",
author = "Daniel Neiberg and Truong, {Khiet Phuong}",
year = "2011",
month = "5",
doi = "10.1109/ICASSP.2011.5947688",
language = "Undefined",
isbn = "978-1-4577-0538-0",
publisher = "IEEE Signal Processing Society",
pages = "5836--5839",
booktitle = "Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)",

}

Neiberg, D & Truong, KP 2011, Online Detection Of Vocal Listener Responses With Maximum Latency Constraints. in Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP). IEEE Signal Processing Society, USA, pp. 5836-5839, IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2011, Prague, Czech Republic, 22/05/11. https://doi.org/10.1109/ICASSP.2011.5947688

Online Detection Of Vocal Listener Responses With Maximum Latency Constraints. / Neiberg, Daniel; Truong, Khiet Phuong.

Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP). USA : IEEE Signal Processing Society, 2011. p. 5836-5839.


TY - GEN

T1 - Online Detection Of Vocal Listener Responses With Maximum Latency Constraints

AU - Neiberg, Daniel

AU - Truong, Khiet Phuong

PY - 2011/5

Y1 - 2011/5

N2 - When human listeners utter Listener Responses (e.g. back-channels or acknowledgments) such as 'yeah' and 'mmhmm', interlocutors commonly continue to speak or resume their speech even before the listener has finished his/her response. This type of speech interactivity results in frequent speech overlap which is common in human-human conversation. To allow for this type of speech interactivity to occur between humans and spoken dialog systems, which will result in more human-like continuous and smoother human-machine interaction, we propose an on-line classifier which can classify incoming speech as Listener Responses. We show that it is possible to detect vocal Listener Responses using maximum latency thresholds of 100-500 ms, thereby obtaining equal error rates ranging from 34% to 28% by using an energy based voice activity detector.


KW - METIS-277647

KW - EC Grant Agreement nr.: FP7/231287

KW - EWI-20186

KW - IR-77316

U2 - 10.1109/ICASSP.2011.5947688

DO - 10.1109/ICASSP.2011.5947688

M3 - Conference contribution

SN - 978-1-4577-0538-0

SP - 5836

EP - 5839

BT - Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)

PB - IEEE Signal Processing Society

CY - USA

ER -

Neiberg D, Truong KP. Online Detection Of Vocal Listener Responses With Maximum Latency Constraints. In Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP). USA: IEEE Signal Processing Society. 2011. p. 5836-5839 https://doi.org/10.1109/ICASSP.2011.5947688