Automated Metadata Extraction for Semantic Access to Spoken Word Archives

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Academic


    Abstract

    Archival practice is shifting from the analogue to the digital world. Spoken word archives are a specific subset of heritage collections that poses interesting challenges for the field of language and speech technology. Given the enormous backlog of unannotated materials at audiovisual archives and the generally global level of item description, collection disclosure and item access are both at risk, and (semi-)automated methods for analysis and annotation may help to increase the use and reuse of these rich content collections. Several HMI projects have investigated the interplay between evolving user scenarios and user requirements for spoken audio collections on the one hand, and the potential of automatic annotation and search technology to improve accessibility and search paradigms on the other. In this paper we present an overview of the state of the art in metadata generation for audio content and explain the crucial importance of involving user groups in the design of research agendas and road maps for novel applications in this domain.
    Original language: Undefined
    Title of host publication: Proceedings 12th International Symposium on Social Communication
    Editors: L. Ruiz Miyares, M.R. Alvarez Silva
    Place of Publication: Santiago de Cuba, Cuba
    Publisher: Centre for Applied Linguistics
    Pages: 896-905
    Number of pages: 10
    ISBN (Print): 978-959-7174-19-6
    Publication status: Published - 17 Jan 2011
    Event: 12th International Symposium on Social Communication 2011 - Santiago de Cuba, Cuba
    Duration: 17 Jan 2011 to 21 Jan 2011
    Conference number: 12

    Publication series

    Publisher: Centro de Lingüística Aplicada

    Conference

    Conference: 12th International Symposium on Social Communication 2011
    Country: Cuba
    City: Santiago de Cuba
    Period: 17/01/11 to 21/01/11

    Keywords

    • METIS-277425
    • IR-75826
    • EWI-18431
    • HMI-MR: MULTIMEDIA RETRIEVAL
    • HMI-SLT: Speech and Language Technology

    Cite this

    de Jong, F. M. G., Heeren, W. F. L., van Hessen, A. J., Ordelman, R. J. F., & Nijholt, A. (2011). Automated Metadata Extraction for Semantic Access to Spoken Word Archives. In L. Ruiz Miyares, & M. R. Alvarez Silva (Eds.), Proceedings 12th International Symposium on Social Communication (pp. 896-905). Santiago de Cuba, Cuba: Centre for Applied Linguistics.
    @inproceedings{b797656a3c6842afb5165e83d6e03d8d,
    title = "Automated Metadata Extraction for Semantic Access to Spoken Word Archives",
    keywords = "METIS-277425, IR-75826, EWI-18431, HMI-MR: MULTIMEDIA RETRIEVAL, HMI-SLT: Speech and Language Technology",
    author = "{de Jong}, {Franciska M.G.} and W.F.L. Heeren and {van Hessen}, {Adrianus J.} and Ordelman, {Roeland J.F.} and Antinus Nijholt",
    note = "cultural heritage, spoken audio collection, automatic annotation, speech technology, information retrieval",
    year = "2011",
    month = "1",
    day = "17",
    language = "Undefined",
    isbn = "978-959-7174-19-6",
    publisher = "Centre for Applied Linguistics",
    pages = "896--905",
    editor = "{Ruiz Miyares}, L. and {Alvarez Silva}, M.R.",
    booktitle = "Proceedings 12th International Symposium on Social Communication",

    }


