Automated Metadata Extraction for Semantic Access to Spoken Word Archives

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Academic


Abstract

Archival practice is shifting from the analogue to the digital world. Spoken word archives are a subset of heritage collections that pose interesting challenges for language and speech technology. Audiovisual archives hold an enormous backlog of unannotated material, and their items are generally described only at a global level. As a result, both collection disclosure and item access are at risk, and (semi-)automated methods for analysis and annotation may help to increase the use and reuse of these rich collections. Several HMI projects have investigated the interplay between two factors. The first is evolving user scenarios and user requirements for spoken audio collections. The second is the potential of automatic annotation and search technology for improved accessibility and search paradigms. This paper presents an overview of the state of the art in metadata generation for audio content. It also explains why it is crucial to involve user groups in designing research agendas and road maps for novel applications in this domain.
Original language: Undefined
Title of host publication: Proceedings 12th International Symposium on Social Communication
Editors: L. Ruiz Miyares, M.R. Alvarez Silva
Place of publication: Santiago de Cuba, Cuba
Publisher: Centre for Applied Linguistics
Pages: 896-905
Number of pages: 10
ISBN (Print): 978-959-7174-19-6
Publication status: Published - 17 Jan 2011
Event: 12th International Symposium on Social Communication 2011 - Santiago de Cuba, Cuba
Duration: 17 Jan 2011 → 21 Jan 2011
Conference number: 12

Publication series

Publisher: Centro de Lingüística Aplicada

Conference

Conference: 12th International Symposium on Social Communication 2011
Country: Cuba
City: Santiago de Cuba
Period: 17/01/11 → 21/01/11

Keywords

  • METIS-277425
  • IR-75826
  • EWI-18431
  • HMI-MR: MULTIMEDIA RETRIEVAL
  • HMI-SLT: Speech and Language Technology

Cite this

de Jong, F. M. G., Heeren, W. F. L., van Hessen, A. J., Ordelman, R. J. F., & Nijholt, A. (2011). Automated Metadata Extraction for Semantic Access to Spoken Word Archives. In L. Ruiz Miyares, & M. R. Alvarez Silva (Eds.), Proceedings 12th International Symposium on Social Communication (pp. 896-905). Santiago de Cuba, Cuba: Centre for Applied Linguistics.
@inproceedings{b797656a3c6842afb5165e83d6e03d8d,
title = "Automated Metadata Extraction for Semantic Access to Spoken Word Archives",
keywords = "METIS-277425, IR-75826, EWI-18431, HMI-MR: MULTIMEDIA RETRIEVAL, HMI-SLT: Speech and Language Technology",
author = "{de Jong}, {Franciska M.G.} and W.F.L. Heeren and {van Hessen}, {Adrianus J.} and Ordelman, {Roeland J.F.} and Antinus Nijholt",
note = "cultural heritage, spoken audio collection, automatic annotation, speech technology, information retrieval",
year = "2011",
month = "1",
day = "17",
language = "Undefined",
isbn = "978-959-7174-19-6",
publisher = "Centre for Applied Linguistics",
pages = "896--905",
editor = "{Ruiz Miyares}, L. and {Alvarez Silva}, M.R.",
booktitle = "Proceedings 12th International Symposium on Social Communication",

}
