Abstract

Laughter is a highly variable signal and can express a spectrum of emotions. This makes automatic laughter detection a challenging but interesting task. We perform automatic laughter detection using audio-visual data from the AMI Meeting Corpus. Audio-visual laughter detection is performed by combining (fusing) the outputs of separate audio and video classifiers at the decision level. The video classifier uses features based on the principal components of 20 tracked facial points; for audio, we use the widely used perceptual linear prediction (PLP) and RASTA-PLP features. Our results indicate that RASTA-PLP features outperform PLP features for laughter detection in audio. We compared classifiers based on hidden Markov models (HMMs), Gaussian mixture models (GMMs) and support vector machines (SVMs), and found that RASTA-PLP combined with a GMM gave the best performance for the audio modality. The video features classified with an SVM gave the best single-modality performance. Decision-level fusion resulted in laughter detection with significantly better performance than single-modality classification.
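The decision-level fusion step described in the abstract can be sketched as follows. This is a minimal illustration, not the method used in the report: it assumes that each single-modality classifier outputs a per-segment laughter probability, and the weighted-sum rule, the weight w_audio, the 0.5 threshold and the function names fuse_scores and detect_laughter are all hypothetical choices made for illustration.

```python
"""Sketch of decision-level (late) fusion of two binary laughter detectors.

Assumption: the audio and video classifiers each output a laughter
probability per segment. The weighted-sum rule is a common fusion rule
chosen for illustration; it is not necessarily the rule used in the report.
"""
from __future__ import annotations

from typing import Sequence


def fuse_scores(
    p_audio: Sequence[float],
    p_video: Sequence[float],
    w_audio: float = 0.5,
) -> list[float]:
    """Combine per-segment laughter probabilities with a weighted sum."""
    if len(p_audio) != len(p_video):
        raise ValueError("audio and video must score the same segments")
    if not 0.0 <= w_audio <= 1.0:
        raise ValueError("w_audio must lie in [0, 1]")
    w_video = 1.0 - w_audio  # weights sum to 1, so fused scores stay in [0, 1]
    return [w_audio * a + w_video * v for a, v in zip(p_audio, p_video)]


def detect_laughter(fused: Sequence[float], threshold: float = 0.5) -> list[bool]:
    """Label a segment as laughter when its fused score reaches the threshold."""
    return [score >= threshold for score in fused]


if __name__ == "__main__":
    # Hypothetical classifier outputs for three segments (not data from the report).
    audio = [0.9, 0.4, 0.2]
    video = [0.7, 0.8, 0.1]
    fused = fuse_scores(audio, video, w_audio=0.4)
    print(fused)
    print(detect_laughter(fused))
```

Setting w_audio to 1 or 0 reduces the rule to the audio-only or video-only detector, so single-modality classification is a special case of this fusion rule; in practice the weight and threshold would be tuned on held-out data.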
Original language: Undefined
Place of publication: Enschede
Publisher: Centre for Telematics and Information Technology (CTIT)
Number of pages: 41
State: Published - 3 Dec 2007

Publication series

Name: CTIT Technical Report Series
Publisher: Centre for Telematics and Information Technology, University of Twente
No.: TR-CTIT-07-84
ISSN (Print): 1381-3625

Keywords

  • HMI-MI: MULTIMODAL INTERACTIONS
  • METIS-247035
  • IR-64474
  • HMI-CI: Computational Intelligence
  • HMI-MR: MULTIMEDIA RETRIEVAL
  • EWI-11435

Cite this

Reuderink, B. (2007). Fusion for Audio-Visual Laughter Detection. (CTIT Technical Report Series; No. TR-CTIT-07-84). Enschede: Centre for Telematics and Information Technology (CTIT).

Reuderink, B. / Fusion for Audio-Visual Laughter Detection.

Enschede : Centre for Telematics and Information Technology (CTIT), 2007. 41 p. (CTIT Technical Report Series; No. TR-CTIT-07-84).

Research output: Professional › Report

@book{0b4bfa25db954f32aa4f67cc4cdfe352,
title = "Fusion for Audio-Visual Laughter Detection",
keywords = "HMI-MI: MULTIMODAL INTERACTIONS, METIS-247035, IR-64474, HMI-CI: Computational Intelligence, HMI-MR: MULTIMEDIA RETRIEVAL, EWI-11435",
author = "B. Reuderink",
year = "2007",
month = "12",
series = "CTIT Technical Report Series",
publisher = "Centre for Telematics and Information Technology (CTIT)",
number = "TR-CTIT-07-84",
address = "Netherlands",

}

Reuderink, B 2007, Fusion for Audio-Visual Laughter Detection. CTIT Technical Report Series, no. TR-CTIT-07-84, Centre for Telematics and Information Technology (CTIT), Enschede.

TY  - BOOK
T1  - Fusion for Audio-Visual Laughter Detection
AU  - Reuderink, B.
PY  - 2007/12/3
Y1  - 2007/12/3
KW  - HMI-MI: MULTIMODAL INTERACTIONS
KW  - METIS-247035
KW  - IR-64474
KW  - HMI-CI: Computational Intelligence
KW  - HMI-MR: MULTIMEDIA RETRIEVAL
KW  - EWI-11435
M3  - Report
T3  - CTIT Technical Report Series
BT  - Fusion for Audio-Visual Laughter Detection
PB  - Centre for Telematics and Information Technology (CTIT)
ER  -

Reuderink B. Fusion for Audio-Visual Laughter Detection. Enschede: Centre for Telematics and Information Technology (CTIT), 2007. 41 p. (CTIT Technical Report Series; TR-CTIT-07-84).