Classifying visemes for automatic lipreading

Michiel Visser, Mannes Poel, Antinus Nijholt

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Academic › peer-review

    10 Citations (Scopus)

    Abstract

    Automatic lipreading is automatic speech recognition that uses only visual information. The relevant data in a video signal is isolated and features are extracted from it. From a sequence of feature vectors, where every vector represents one video image, a sequence of higher-level semantic elements is formed. These semantic elements are "visemes", the visual equivalent of "phonemes". The developed prototype uses a Time Delayed Neural Network to classify the visemes.
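
The classification step described in the abstract can be illustrated with a minimal sketch of a time-delay network: a layer that slides a short window over the sequence of per-frame feature vectors and applies the same weights at every time step. This is not the authors' prototype. The function names, the two-layer layout, the window lengths (3 and 5 frames), the feature dimension (8), the hidden size (16), and the number of viseme classes (6) are all assumptions chosen for illustration, and the weights are random rather than trained.

```python
import math
import random


def time_delay_layer(frames, weights, bias):
    """Apply one time-delay layer with weights shared across time.

    frames:  list of T vectors, each of length D (one vector per video image)
    weights: weights[k][d][h] for delay k, input feature d, output unit h
    bias:    list of H output biases
    Returns T - K + 1 output vectors of length H, where K = len(weights).
    """
    n_delays = len(weights)
    outputs = []
    for t in range(len(frames) - n_delays + 1):
        out = list(bias)
        for k in range(n_delays):
            for d, value in enumerate(frames[t + k]):
                for h in range(len(out)):
                    out[h] += value * weights[k][d][h]
        outputs.append(out)
    return outputs


def classify_visemes(frames, w1, b1, w2, b2):
    """Return one (hypothetical) viseme class index per output time step."""
    hidden = [[math.tanh(v) for v in row]
              for row in time_delay_layer(frames, w1, b1)]
    scores = time_delay_layer(hidden, w2, b2)
    # Pick the highest-scoring class at each time step.
    return [max(range(len(row)), key=row.__getitem__) for row in scores]


def random_weights(rng, n_delays, n_in, n_out):
    """Untrained placeholder weights, for demonstration only."""
    return [[[rng.gauss(0.0, 0.1) for _ in range(n_out)]
             for _ in range(n_in)]
            for _ in range(n_delays)]


# Assumed sizes; none of these values come from the paper.
N_FRAMES, N_FEATURES, N_HIDDEN, N_VISEMES = 30, 8, 16, 6

rng = random.Random(0)
frames = [[rng.gauss(0.0, 1.0) for _ in range(N_FEATURES)]
          for _ in range(N_FRAMES)]
w1 = random_weights(rng, 3, N_FEATURES, N_HIDDEN)
w2 = random_weights(rng, 5, N_HIDDEN, N_VISEMES)

labels = classify_visemes(frames, w1, [0.0] * N_HIDDEN, w2, [0.0] * N_VISEMES)
print(len(labels))  # 30 frames -> 28 hidden steps -> 24 output steps
```

Because each layer only sees a window of consecutive frames, the output sequence is shorter than the input (30 frames yield 24 label positions here). With trained weights, each output position would correspond to a viseme decision for the frames inside its receptive field.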
    Original language: Undefined
    Title of host publication: International Workshop Text, Speech and Dialogue (TSD'99)
    Editors: Vaclav Matousek, Pavel Mautner, Jana Ocelikova, Petr Sojka
    Place of Publication: Berlin
    Publisher: Springer
    Pages: 349-352
    Number of pages: 4
    ISBN (Print): 3-540-66494-7
    DOIs: https://doi.org/10.1007/3-540-48239-3_65
    Publication status: Published - 1 Sep 1999

    Publication series

    Name: Lecture Notes in Computer Science
    Publisher: Springer Verlag
    Volume: 1692
    ISSN (Print): 0302-9743

    Keywords

    • EWI-9759
    • IR-64013
    • METIS-119592
    • HMI-MI: MULTIMODAL INTERACTIONS

    Cite this

    Visser, M., Poel, M., & Nijholt, A. (1999). Classifying visemes for automatic lipreading. In V. Matousek, P. Mautner, J. Ocelikova, & P. Sojka (Eds.), International Workshop Text, Speech and Dialogue (TSD'99) (pp. 349-352). (Lecture Notes in Computer Science; Vol. 1692). Berlin: Springer. https://doi.org/10.1007/3-540-48239-3_65