TY - CONF
T1 - Audiovisual laughter detection based on temporal features
AU - Petridis, Stavros
AU - Pantic, Maja
N1 - Conference code: 10
PY - 2008/10
Y1 - 2008/10
AB - Previous research on automatic laughter detection has mainly focused on audio-based detection. In this study we present an audiovisual approach to distinguishing laughter from speech based on temporal features, and we show that integrating information from the audio and video channels leads to improved performance over single-modal approaches. Static features are extracted on an audio/video frame basis and then combined with temporal features computed over a temporal window, which describe the evolution of the static features over time. Several different temporal features have been investigated, and we show that adding temporal information improves performance over using static information alone. It is common to use a fixed set of temporal features, which implies that all static features exhibit the same behaviour over a temporal window. However, this does not always hold, and we show that when AdaBoost is used as a feature selector, different temporal features are selected for each static feature, i.e., the temporal evolution of each static feature is described by different statistical measures. When tested in a person-independent way on 96 audiovisual sequences depicting spontaneously displayed (as opposed to posed) laughter and speech episodes, the proposed audiovisual approach achieves an F1 rate of over 89%.
KW - EC Grant Agreement nr.: FP7/211486
KW - EC Grant Agreement nr.: FP6/0027787
KW - HMI-MI: MULTIMODAL INTERACTIONS
KW - Audiovisual data processing
KW - Laughter detection
KW - Computing methodologies
KW - Non-linguistic information processing
DO - 10.1145/1452392.1452402
M3 - Conference contribution
SN - 978-1-60558-198-9
SP - 37
EP - 44
BT - ICMI '08: Proceedings of the 10th International Conference on Multimodal Interfaces
PB - Association for Computing Machinery
CY - New York
T2 - 10th International Conference on Multimodal Interfaces, ICMI 2008
Y2 - 20 October 2008 through 22 October 2008
ER -