
Abstract

Automatic lipreading is automatic speech recognition that uses only visual information. The relevant data in a video signal is isolated and features are extracted from it. From a sequence of feature vectors, where every vector represents one video image, a sequence of higher-level semantic elements is formed. These semantic elements are "visemes", the visual equivalent of "phonemes". The developed prototype uses a Time Delayed Neural Network to classify the visemes.
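
The abstract describes a pipeline in which per-frame feature vectors from the video signal are fed to a time-delay network that outputs a viseme class for every frame. The sketch below illustrates that idea only; it is not the authors' prototype. The feature dimension, the number of viseme classes, and the temporal context widths are assumptions, and a 1-D convolutional stack in PyTorch stands in for the original TDNN.

# Minimal sketch (illustrative assumptions, not the paper's prototype) of per-frame
# viseme classification with a time-delay style network: 1-D convolutions over a
# sequence of per-frame lip feature vectors.
import torch
import torch.nn as nn

class TDNNVisemeClassifier(nn.Module):
    def __init__(self, feat_dim=20, num_visemes=12):  # illustrative sizes only
        super().__init__()
        # Each convolution looks at a small window of neighbouring frames,
        # which is the "time delay" in a TDNN.
        self.net = nn.Sequential(
            nn.Conv1d(feat_dim, 64, kernel_size=5, padding=2),  # +/-2 frames of context
            nn.ReLU(),
            nn.Conv1d(64, 64, kernel_size=3, padding=1),        # +/-1 frame of context
            nn.ReLU(),
            nn.Conv1d(64, num_visemes, kernel_size=1),          # per-frame class scores
        )

    def forward(self, frames):
        # frames: (batch, time, feat_dim) -> scores: (batch, time, num_visemes)
        x = frames.transpose(1, 2)              # Conv1d expects (batch, channels, time)
        return self.net(x).transpose(1, 2)

# Usage: score a 30-frame clip of 20-dimensional lip features, then pick a viseme per frame.
model = TDNNVisemeClassifier()
scores = model(torch.randn(1, 30, 20))          # shape (1, 30, 12)
viseme_ids = scores.argmax(dim=-1)              # most likely viseme for each frame
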
Original language: Undefined
Title of host publication: International Workshop Text, Speech and Dialogue (TSD'99)
Editors: Vaclav Matousek, Pavel Mautner, Jana Ocelikovi, Petr Sojka
Place of Publication: Berlin
Publisher: Springer
Pages: 349-352
Number of pages: 4
ISBN (Print): 3-540-66494-7
DOIs: 10.1007/3-540-48239-3_65
State: Published - 1 Sep 1999

Publication series

Name: Lecture Notes in Computer Science
Publisher: Springer Verlag
Volume: 1692
ISSN (Print): 0302-9743

Fingerprint

  • Semantics
  • Speech recognition
  • Neural networks

Keywords

  • EWI-9759
  • IR-64013
  • METIS-119592
  • HMI-MI: MULTIMODAL INTERACTIONS

Cite this

Visser, M., Poel, M., & Nijholt, A. (1999). Classifying visemes for automatic lipreading. In V. Matousek, P. Mautner, J. Ocelikovi, & P. Sojka (Eds.), International Workshop Text, Speech and Dialogue (TSD'99) (pp. 349-352). (Lecture Notes in Computer Science; Vol. 1692). Berlin: Springer. DOI: 10.1007/3-540-48239-3_65

Visser, Michiel; Poel, Mannes; Nijholt, Antinus / Classifying visemes for automatic lipreading.

International Workshop Text, Speech and Dialogue (TSD'99). ed. / Vaclav Matousek; Pavel Mautner; Jana Ocelikovi; Petr Sojka. Berlin : Springer, 1999. p. 349-352 (Lecture Notes in Computer Science; Vol. 1692).

Research output: Scientific - peer-review › Conference contribution

@inbook{ffebb10e45cc47529eebf3059c5dde69,
title = "Classifying visemes for automatic lipreading",
abstract = "Automatic lipreading is automatic speech recognition that uses only visual information. The relevant data in a video signal is isolated and features are extracted from it. From a sequence of feature vectors, where every vector represents one video image, a sequence of higher level semantic elements is formed. These semantic elements are {"}visemes{"} the visual equivalent of {"}phonemes{"} The developed prototype uses a Time Delayed Neural Network to classify the visemes.",
keywords = "EWI-9759, IR-64013, METIS-119592, HMI-MI: MULTIMODAL INTERACTIONS",
author = "Michiel Visser and Mannes Poel and Antinus Nijholt",
year = "1999",
month = "9",
doi = "10.1007/3-540-48239-3_65",
isbn = "3-540-66494-7",
series = "Lecture Notes in Computer Science",
publisher = "Springer",
pages = "349--352",
editor = "Vaclav Matousek and Pavel Mautner and Jana Ocelikovi and Petr Sojka",
booktitle = "International Workshop Text, Speech and Dialogue (TSD'99)",

}

Visser, M, Poel, M & Nijholt, A 1999, Classifying visemes for automatic lipreading. in V Matousek, P Mautner, J Ocelikovi & P Sojka (eds), International Workshop Text, Speech and Dialogue (TSD'99). Lecture Notes in Computer Science, vol. 1692, Springer, Berlin, pp. 349-352. DOI: 10.1007/3-540-48239-3_65

Classifying visemes for automatic lipreading. / Visser, Michiel; Poel, Mannes; Nijholt, Antinus.

International Workshop Text, Speech and Dialogue (TSD'99). ed. / Vaclav Matousek; Pavel Mautner; Jana Ocelikovi; Petr Sojka. Berlin : Springer, 1999. p. 349-352 (Lecture Notes in Computer Science; Vol. 1692).

Research output: Scientific - peer-review › Conference contribution

TY - CHAP

T1 - Classifying visemes for automatic lipreading

AU - Visser, Michiel

AU - Poel, Mannes

AU - Nijholt, Antinus

PY - 1999/9/1

Y1 - 1999/9/1

N2 - Automatic lipreading is automatic speech recognition that uses only visual information. The relevant data in a video signal is isolated and features are extracted from it. From a sequence of feature vectors, where every vector represents one video image, a sequence of higher-level semantic elements is formed. These semantic elements are "visemes", the visual equivalent of "phonemes". The developed prototype uses a Time Delayed Neural Network to classify the visemes.

AB - Automatic lipreading is automatic speech recognition that uses only visual information. The relevant data in a video signal is isolated and features are extracted from it. From a sequence of feature vectors, where every vector represents one video image, a sequence of higher-level semantic elements is formed. These semantic elements are "visemes", the visual equivalent of "phonemes". The developed prototype uses a Time Delayed Neural Network to classify the visemes.

KW - EWI-9759

KW - IR-64013

KW - METIS-119592

KW - HMI-MI: MULTIMODAL INTERACTIONS

U2 - 10.1007/3-540-48239-3_65

DO - 10.1007/3-540-48239-3_65

M3 - Conference contribution

SN - 3-540-66494-7

T3 - Lecture Notes in Computer Science

SP - 349

EP - 352

BT - International Workshop Text, Speech and Dialogue (TSD'99)

PB - Springer

ER -

Visser M, Poel M, Nijholt A. Classifying visemes for automatic lipreading. In Matousek V, Mautner P, Ocelikovi J, Sojka P, editors, International Workshop Text, Speech and Dialogue (TSD'99). Berlin: Springer. 1999. p. 349-352. (Lecture Notes in Computer Science). DOI: 10.1007/3-540-48239-3_65