Abstract
Automatic lipreading is automatic speech recognition that uses only visual information. The relevant data in the video signal are isolated and features are extracted from them. From a sequence of feature vectors, where each vector represents one video frame, a sequence of higher-level semantic elements is formed. These semantic elements are "visemes", the visual equivalent of "phonemes". The developed prototype uses a Time-Delay Neural Network (TDNN) to classify the visemes.
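The classification step described in the abstract can be illustrated with a minimal time-delay network: a 1-D convolution over the per-frame feature vectors, so that each output depends on a short window of consecutive video frames. The sketch below is illustrative only, not the authors' implementation; the feature dimension, delay window, layer sizes, and number of viseme classes are assumed.

```python
import numpy as np

def tdnn_layer(x, w, b, relu=True):
    """One time-delay layer: a 1-D convolution over time.
    x: (T, d_in) sequence of per-frame feature vectors
    w: (delay, d_in, d_out) weights spanning `delay` consecutive frames
    b: (d_out,) bias
    Returns (T - delay + 1, d_out) activations."""
    delay, d_in, d_out = w.shape
    T = x.shape[0]
    out = np.empty((T - delay + 1, d_out))
    for t in range(T - delay + 1):
        # Each output frame sees a window of `delay` input frames.
        window = x[t:t + delay]                       # (delay, d_in)
        out[t] = np.tensordot(window, w, axes=2) + b  # contract delay and d_in
    return np.maximum(out, 0.0) if relu else out

def classify_visemes(x, params):
    """Two stacked time-delay layers, then a per-frame softmax over viseme classes."""
    h = tdnn_layer(x, params["w1"], params["b1"])
    logits = tdnn_layer(h, params["w2"], params["b2"], relu=False)
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)           # (T', n_classes)

# Toy example: 20 frames of 12-dim lip features, 6 hypothetical viseme classes.
rng = np.random.default_rng(0)
x = rng.normal(size=(20, 12))
params = {
    "w1": rng.normal(scale=0.1, size=(3, 12, 16)), "b1": np.zeros(16),
    "w2": rng.normal(scale=0.1, size=(3, 16, 6)),  "b2": np.zeros(6),
}
probs = classify_visemes(x, params)
print(probs.shape)  # one viseme distribution per time-aligned output frame
```

Because each layer consumes a 3-frame window, the 20 input frames yield 16 aligned outputs; in a full system these per-frame viseme distributions would then be decoded into a viseme sequence.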
Original language | Undefined |
---|---|
Title of host publication | International Workshop Text, Speech and Dialogue (TSD'99) |
Editors | Václav Matoušek, Pavel Mautner, Jana Ocelíková, Petr Sojka |
Place of Publication | Berlin |
Publisher | Springer |
Pages | 349-352 |
Number of pages | 4 |
ISBN (Print) | 3-540-66494-7 |
Publication status | Published - 1 Sept 1999 |
Event | 2nd Text, Speech & Dialogue Workshop, TSD 1999, Plzeň, Czech Republic, 13 Sept 1999 → 17 Sept 1999 (Conference number: 2) |
Publication series
Name | Lecture Notes in Computer Science |
---|---|
Publisher | Springer Verlag |
Volume | 1692 |
ISSN (Print) | 0302-9743 |
Workshop
Workshop | 2nd Text, Speech & Dialogue Workshop, TSD 1999 |
---|---|
Abbreviated title | TSD 1999 |
Country/Territory | Czech Republic |
City | Plzeň |
Period | 13/09/99 → 17/09/99 |
Keywords
- EWI-9759
- IR-64013
- METIS-119592
- HMI-MI: MULTIMODAL INTERACTIONS