Visual-only discrimination between native and non-native speech

Christos Georgakis, Stavros Petridis, Maja Pantic

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Academic › peer-review

    4 Citations (Scopus)
    29 Downloads (Pure)

    Abstract

    Accent is an important biometric characteristic, defined by the presence of specific traits in an individual's speaking style. These traits are identified by patterns in the speech production system, such as those present in the vocal tract or in lip movements. Evidence from linguistics and speech processing research suggests that visual information enhances speech recognition. Motivated by these findings, along with the assumption that visually perceivable accent-related patterns are transferred from the mother tongue to a foreign language, we investigate the task of discriminating native from non-native speech in English using visual features only. Training and evaluation are performed on segments of continuous visual speech, captured by mobile phones, in which all speakers read the same text. We apply various appearance descriptors to represent the mouth region at each video frame. Vocabulary-based histograms, which form the final representation of the dynamic features of each utterance, are used for recognition. Binary classification experiments discriminating native from non-native speakers are conducted in a subject-independent manner. Our results show that this task can be addressed by an automated approach that uses visual features only.
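    The vocabulary-based histogram representation described in the abstract can be sketched as follows. This is a minimal illustration only: synthetic random vectors stand in for the paper's per-frame appearance descriptors of the mouth region, and the plain k-means clustering and toy data here are assumptions for demonstration, not the authors' exact pipeline.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)

    def build_vocabulary(frames, k, iters=20):
        """Cluster per-frame descriptors into k visual 'words' (plain k-means)."""
        centers = frames[rng.choice(len(frames), size=k, replace=False)]
        for _ in range(iters):
            # assign each frame descriptor to its nearest center
            dist = np.linalg.norm(frames[:, None, :] - centers[None, :, :], axis=2)
            labels = dist.argmin(axis=1)
            for j in range(k):
                if np.any(labels == j):
                    centers[j] = frames[labels == j].mean(axis=0)
        return centers

    def histogram(utterance, centers):
        """Encode a variable-length utterance as a fixed-length histogram of word counts."""
        dist = np.linalg.norm(utterance[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        counts = np.bincount(labels, minlength=len(centers)).astype(float)
        return counts / counts.sum()

    # toy data: two 'utterances' whose frame descriptors differ in distribution
    native = rng.normal(0.0, 1.0, size=(200, 16))
    nonnative = rng.normal(2.0, 1.0, size=(200, 16))

    vocab = build_vocabulary(np.vstack([native, nonnative]), k=8)
    h_nat = histogram(native, vocab)
    h_non = histogram(nonnative, vocab)
    ```

    The point of the encoding is that utterances of any length map to histograms of the same fixed dimension, which a standard binary classifier can then consume for the native/non-native decision.
    
    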
    Original language: Undefined
    Title of host publication: Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2014
    Place of publication: USA
    Publisher: IEEE Computer Society
    Pages: 4828-4832
    Number of pages: 5
    ISBN (Print): 978-1-4799-2892-7
    DOI: 10.1109/ICASSP.2014.6854519
    Publication status: Published - May 2014
    Event: IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2014 - Fortezza da Basso, Florence, Italy
    Duration: 4 May 2014 - 9 May 2014
    http://www.icassp2014.org/home.html

    Conference

    Conference: IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2014
    Abbreviated title: ICASSP
    Country: Italy
    City: Florence
    Period: 4/05/14 - 9/05/14
    Internet address: http://www.icassp2014.org/home.html

    Keywords

    • Non-Native Speech Identification
    • Accent Classification
    • Visual Speech Processing

    Cite this

    Georgakis, C., Petridis, S., & Pantic, M. (2014). Visual-only discrimination between native and non-native speech. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2014 (pp. 4828-4832). USA: IEEE Computer Society. https://doi.org/10.1109/ICASSP.2014.6854519