Visual-only discrimination between native and non-native speech

Christos Georgakis, Stavros Petridis, Maja Pantic

    Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, Academic, peer-review



    Accent is an important biometric characteristic that is defined by the presence of specific traits in the speaking style of an individual. These are identified by patterns in the speech production system, such as those present in the vocal tract or in lip movements. Evidence from linguistics and speech processing research suggests that visual information enhances speech recognition. Intrigued by these findings, along with the assumption that visually perceivable accent-related patterns are transferred from the mother tongue to a foreign language, we investigate the task of discriminating native from non-native speech in English, employing visual features only. Training and evaluation are performed on segments of continuous visual speech, captured by mobile phones, where all speakers read the same text. We apply various appearance descriptors to represent the mouth region at each video frame. Vocabulary-based histograms, being the final representation of dynamic features for all utterances, are used for recognition. Binary classification experiments, discriminating between native and non-native speakers, are conducted in a subject-independent manner. Our results show that this task can be addressed by means of an automated approach that uses visual features only.
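
The vocabulary-based histogram and the subject-independent evaluation mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say which appearance descriptors, codebook-learning method, or classifier were used, so the sketch assumes a codebook of codewords is already given, uses Euclidean nearest-codeword assignment on raw per-frame descriptors, and uses leave-one-subject-out splitting as one possible subject-independent protocol. All function names are hypothetical.

```python
from collections import defaultdict
from math import dist


def nearest_codeword(descriptor, codebook):
    """Index of the codebook entry closest (Euclidean) to one frame descriptor."""
    return min(range(len(codebook)), key=lambda k: dist(descriptor, codebook[k]))


def utterance_histogram(frames, codebook):
    """L1-normalised histogram of codeword assignments over one utterance.

    `frames` is a list of per-frame mouth-region descriptors (assumed to be
    fixed-length lists of floats); `codebook` is an assumed, pre-learned
    visual vocabulary of the same dimensionality.
    """
    counts = [0] * len(codebook)
    for descriptor in frames:
        counts[nearest_codeword(descriptor, codebook)] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(codebook)


def leave_one_subject_out(subject_ids):
    """Yield (train_indices, test_indices) so no subject is on both sides."""
    by_subject = defaultdict(list)
    for i, subject in enumerate(subject_ids):
        by_subject[subject].append(i)
    for held_out, test_idx in by_subject.items():
        train_idx = [i for s, idx in by_subject.items() if s != held_out for i in idx]
        yield train_idx, test_idx
```

Each utterance then becomes one fixed-length vector regardless of its duration, which any binary classifier could consume; the splitter keeps all utterances from a given speaker out of the training set whenever that speaker is being tested.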
    Original language: Undefined
    Title of host publication: Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2014
    Place of Publication: USA
    Number of pages: 5
    ISBN (Print): 978-1-4799-2892-7
    Publication status: Published - May 2014
    Event: IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2014 - Fortezza da Basso, Florence, Italy
    Duration: 4 May 2014 – 9 May 2014

    Publication series

    Publisher: IEEE Computer Society


    Conference: IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2014
    Abbreviated title: ICASSP


    • HMI-HF: Human Factors
    • EWI-25820
    • Non-Native Speech Identification
    • METIS-309946
    • IR-95227
    • Accent Classification
    • Visual Speech Processing
