Abstract
In this paper we discuss the speech activity detection system that we used to detect speech regions in the Dutch TRECVID video collection. The system is designed to filter non-speech, such as music or sound effects, out of the signal without the use of predefined non-speech models. Because the system trains its models on-line, it is robust to out-of-domain data. The speech activity error rate on an out-of-domain test set, recordings of English conference meetings, was 4.4%. The overall error rate on twelve randomly selected five-minute TRECVID fragments was 11.5%.
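This record does not define how the error rates quoted in the abstract are computed. As a rough illustration only, the sketch below assumes a frame-level definition: the fraction of frames whose hypothesized speech/non-speech label disagrees with the reference (missed speech plus false alarms, divided by all frames). The function name and the 0/1 label encoding are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def speech_activity_error_rate(
    reference: Sequence[int], hypothesis: Sequence[int]
) -> float:
    """Return the fraction of frames labeled differently from the reference.

    Assumption (not from the paper): labels are per-frame, 1 = speech,
    0 = non-speech, and the error rate is (missed + false alarm) / total.
    """
    if len(reference) != len(hypothesis):
        raise ValueError("reference and hypothesis must have the same length")
    if len(reference) == 0:
        raise ValueError("at least one frame is required")

    # Speech frames that the hypothesis labeled as non-speech.
    missed = sum(1 for r, h in zip(reference, hypothesis) if r == 1 and h == 0)
    # Non-speech frames that the hypothesis labeled as speech.
    false_alarm = sum(1 for r, h in zip(reference, hypothesis) if r == 0 and h == 1)
    return (missed + false_alarm) / len(reference)


# Toy example: 10 frames with one false alarm (frame 1) and one miss (frame 4).
ref = [0, 0, 1, 1, 1, 1, 0, 0, 1, 1]
hyp = [0, 1, 1, 1, 0, 1, 0, 0, 1, 1]
rate = speech_activity_error_rate(ref, hyp)
```

Other definitions are common, for example normalizing by total reference speech time instead of total duration, so the numbers in the abstract should not be assumed to follow this exact formula.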
Original language | English |
---|---|
Title of host publication | Proceedings of Interspeech 2007 |
Place of Publication | Antwerp |
Publisher | International Speech Communication Association (ISCA) |
Pages | FrC.P3-4 |
Number of pages | 4 |
ISSN (Print) | 1990-9772 |
Publication status | Published - 27 Aug 2007 |
Event | 8th Annual Conference of the International Speech Communication Association, INTERSPEECH 2007 - Antwerp, Belgium; Duration: 27 Aug 2007 → 31 Aug 2007; Conference number: 8; https://www.interspeech2007.org/ |
Publication series
Name | |
---|---|
Publisher | International Speech Communication Association |
Number | LNCS4549 |
ISSN (Print) | 1990-9772 |
Conference
Conference | 8th Annual Conference of the International Speech Communication Association, INTERSPEECH 2007 |
---|---|
Abbreviated title | INTERSPEECH |
Country | Belgium |
City | Antwerp |
Period | 27/08/07 → 31/08/07 |
Keywords
- IR-64329
- Speech activity detection
- EC Grant Agreement nr.: FP6/027685
- METIS-241881
- EC Grant Agreement nr.: FP6/027413
- EWI-11003
- EC Grant Agreement nr.: FP6/506811