Abstract
The automatic detection and classification of social signals is an important task, given the fundamental role nonverbal behavioral cues play in human communication. We present the first cross-lingual study on the detection of laughter and fillers in conversational and spontaneous speech collected 'in the wild' over IP (Internet Protocol). Further, this is the first comparison of LSTM and GRU networks to shed light on their performance differences. We report frame-based results in terms of the unweighted-average area-under-the-curve (UAAUC) measure and briefly discuss its suitability for this task. In the mono-lingual setup, our best deep BLSTM system achieves 87.0% and 86.3% UAAUC for English and German, respectively. Interestingly, the cross-lingual results are only slightly lower, yielding 83.7% for a system trained on English but tested on German, and 85.0% in the opposite case. We show that LSTM and GRU architectures are valid alternatives for, e.g., on-line and compute-sensitive applications, since their application incurs a relative UAAUC decrease of only approximately 5% with respect to our best systems. Finally, we apply additional smoothing to correct for erroneous spikes and drops in the posterior trajectories, which yields an additional gain in all setups.
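As a rough illustration of the frame-based UAAUC measure named in the abstract, the sketch below computes a one-vs-rest AUC per target class and takes the unweighted mean. It is only a minimal sketch under stated assumptions, not the authors' evaluation code: the abstract does not say which classes are averaged, so averaging over "laughter" and "filler" is an assumption, as are the "garbage" label for all other frames, the function names `binary_auc` and `uaauc`, and the toy posteriors.

```python
from typing import Dict, Sequence


def binary_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the probability that a positive frame outscores a negative one.

    Ties count as one half (Mann-Whitney U formulation).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative frame")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def uaauc(posteriors: Dict[str, Sequence[float]],
          frame_labels: Sequence[str]) -> float:
    """Unweighted mean of the one-vs-rest AUCs of the classes in `posteriors`."""
    aucs = []
    for cls, scores in posteriors.items():
        # One-vs-rest: frames of this class are positives, all others negatives.
        binary = [1 if y == cls else 0 for y in frame_labels]
        aucs.append(binary_auc(scores, binary))
    return sum(aucs) / len(aucs)


# Toy data invented for illustration: six frames, three hypothetical classes.
# Assumption: only the two social-signal classes enter the average.
frame_labels = ["laughter", "garbage", "filler", "garbage", "laughter", "filler"]
posteriors = {
    "laughter": [0.9, 0.2, 0.1, 0.4, 0.3, 0.1],
    "filler": [0.05, 0.1, 0.8, 0.2, 0.1, 0.7],
}
print(uaauc(posteriors, frame_labels))
```

Because every class contributes equally to the mean regardless of how many frames it covers, a measure of this kind is not dominated by the far more frequent non-event frames. The pairwise loop is quadratic in the number of frames, so a rank-based AUC would be preferable for real recordings.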
Original language | English |
---|---|
Title of host publication | Proceedings Interspeech 2017 |
Pages | 2371-2375 |
Number of pages | 5 |
Publication status | Published - 2017 |
Event | 18th Annual Conference of the International Speech Communication Association, INTERSPEECH 2017: Situated interaction, Stockholm, Sweden; Duration: 20 Aug 2017 → 24 Aug 2017; Conference number: 18; http://www.interspeech2017.org/ |
Conference
Conference | 18th Annual Conference of the International Speech Communication Association, INTERSPEECH 2017 |
---|---|
Abbreviated title | INTERSPEECH |
Country/Territory | Sweden |
City | Stockholm |
Period | 20/08/17 → 24/08/17 |
Internet address | http://www.interspeech2017.org/ |
Keywords
- Computational paralinguistics
- Cross-lingual
- Deep neural networks
- GRU
- LSTM
- Social signal classification