Abstract
Accent is an important biometric characteristic that is defined by the presence of specific traits in the speaking style of an individual. These are identified by patterns in the speech production system, such as those present in the vocal tract or in lip movements. Evidence from linguistics and speech processing research suggests that visual information enhances speech recognition. Intrigued by these findings, along with the assumption that visually perceivable accent-related patterns are transferred from the mother tongue to a foreign language, we investigate the task of discriminating native from non-native speech in English, employing visual features only. Training and evaluation are performed on segments of continuous visual speech, captured by mobile phones, where all speakers read the same text. We apply various appearance descriptors to represent the mouth region in each video frame. Vocabulary-based histograms, which serve as the final representation of the dynamic features of each utterance, are used for recognition. Binary classification experiments, discriminating native from non-native speakers, are conducted in a subject-independent manner. Our results show that this task can be addressed by means of an automated approach that uses visual features only.
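The pipeline outlined in the abstract (per-frame mouth descriptors, a vocabulary-based histogram per utterance, subject-independent binary classification) can be illustrated with a minimal sketch. The abstract does not name the descriptors, the vocabulary size, the clustering method, or the classifier, so everything below is an assumption: plain k-means stands in for vocabulary learning, a nearest-class-mean rule stands in for the classifier, and the function names `build_vocabulary`, `utterance_histogram`, and `leave_one_subject_out` are hypothetical. Frame descriptors are taken as given NumPy arrays.

```python
import numpy as np


def build_vocabulary(descriptors, k, n_iter=20, seed=0):
    """Learn k codewords with plain k-means (an assumed stand-in, not the paper's method)."""
    rng = np.random.default_rng(seed)
    start = rng.choice(len(descriptors), size=k, replace=False)
    centers = descriptors[start].astype(float)
    for _ in range(n_iter):
        # Assign every frame descriptor to its nearest codeword.
        dists = ((descriptors[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        nearest = dists.argmin(axis=1)
        for j in range(k):
            members = descriptors[nearest == j]
            if len(members) > 0:  # keep the old center if a cluster empties
                centers[j] = members.mean(axis=0)
    return centers


def utterance_histogram(frames, vocabulary):
    """Quantize each frame and return an L1-normalized codeword histogram."""
    dists = ((frames[:, None, :] - vocabulary[None, :, :]) ** 2).sum(axis=2)
    counts = np.bincount(dists.argmin(axis=1), minlength=len(vocabulary))
    return counts / counts.sum()


def leave_one_subject_out(utterances, labels, subjects, k=8):
    """Subject-independent evaluation: the held-out speaker is never seen in training.

    utterances: list of (n_frames, dim) arrays; labels: 0 or 1 per utterance;
    subjects: speaker id per utterance. Returns overall accuracy.
    """
    labels = np.asarray(labels)
    subjects = np.asarray(subjects)
    correct = 0
    for held_out in np.unique(subjects):
        train = np.flatnonzero(subjects != held_out)
        test = np.flatnonzero(subjects == held_out)
        # The vocabulary is learned from training speakers only.
        vocab = build_vocabulary(np.vstack([utterances[i] for i in train]), k)
        hists = np.array([utterance_histogram(u, vocab) for u in utterances])
        # Nearest-class-mean classifier (assumed; the abstract names no classifier).
        means = {c: hists[train][labels[train] == c].mean(axis=0) for c in (0, 1)}
        for i in test:
            pred = min(means, key=lambda c: np.linalg.norm(hists[i] - means[c]))
            correct += int(pred == labels[i])
    return correct / len(utterances)
```

Calling `leave_one_subject_out(utterances, labels, subjects, k=8)` returns the fraction of utterances classified correctly when each speaker is held out in turn. Learning the vocabulary inside the loop, from training speakers only, keeps the held-out speaker from influencing the representation.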
Original language | Undefined |
---|---|
Title of host publication | Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2014 |
Place of Publication | USA |
Publisher | IEEE |
Pages | 4828-4832 |
Number of pages | 5 |
ISBN (Print) | 978-1-4799-2892-7 |
Publication status | Published - May 2014 |
Event | IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2014 - Fortezza dal Basso, Florence, Italy. Duration: 4 May 2014 → 9 May 2014. http://www.icassp2014.org/home.html |
Publication series
Publisher | IEEE Computer Society |
---|---|
Conference
Conference | IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2014 |
---|---|
Abbreviated title | ICASSP |
Country/Territory | Italy |
City | Florence |
Period | 4/05/14 → 9/05/14 |
Keywords
- HMI-HF: Human Factors
- EWI-25820
- Non-Native Speech Identification
- METIS-309946
- IR-95227
- Accent Classification
- Visual Speech Processing