Enriching Textual Documents with Timecodes from Video Fragments

Ielka van der Sluis, Franciska de Jong

    Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; Academic; peer-review

    Abstract

    The OLIVE project aims at the development of a multilingual indexing tool for broadcast material based on speech recognition, which automatically produces indexes from the sound track of a program (television or radio). Such a tool allows multimedia archives to be searched by keyword and the corresponding fragments to be retrieved. This paper reports on the alignment module, one of the components of the retrieval environment to be developed in OLIVE. It assigns time-codes to non-time-coded textual documents that describe the content of the video. Time-coding these textual documents will increase the overall level of disclosure. The assignment is based on a similarity measure between the non-time-coded texts and subtitle files or the transcripts from speech recognition. The core of the alignment module is a generic algorithm for generating the links that are the basis for the insertion of time-codes into non-time-coded texts. An additional step combines similarity values with locality information. The data used during testing are closed-caption files of Dutch news broadcasts and the autocue files of these broadcasts. Adaptations to the initial algorithm for which improved performance figures were found involved a threshold related to sentence length and the application of a high- and low-frequency term stoplist compiled from the time-coded text under consideration.
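
    The alignment idea summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the Jaccard term overlap, the linear locality penalty, the cutoff values, and all function names and sample sentences below are assumptions chosen for illustration only.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def build_stoplist(timed_segments, high_fraction=0.5, low_count=0):
    """Stop terms that occur in too many or too few time-coded segments.

    Both cutoffs are assumed values, not figures from the paper.
    """
    doc_freq = Counter()
    for _, text in timed_segments:
        doc_freq.update(set(tokenize(text)))
    n = len(timed_segments)
    return {t for t, df in doc_freq.items()
            if df / n > high_fraction or df <= low_count}


def similarity(a_terms, b_terms):
    """Jaccard overlap of two term sets (one possible similarity measure)."""
    if not a_terms or not b_terms:
        return 0.0
    return len(a_terms & b_terms) / len(a_terms | b_terms)


def align(untimed_sentences, timed_segments, min_terms=2, locality_weight=0.1):
    """Link each untimed sentence to the time-code of its best-matching segment."""
    stoplist = build_stoplist(timed_segments)
    seg_terms = [set(tokenize(text)) - stoplist for _, text in timed_segments]
    links = []
    last = 0  # index of the most recently linked segment
    for sentence in untimed_sentences:
        terms = set(tokenize(sentence)) - stoplist
        if len(terms) < min_terms:  # sentence-length threshold
            links.append((sentence, None))
            continue
        best_idx, best_score = None, 0.0
        for i, candidate in enumerate(seg_terms):
            sim = similarity(terms, candidate)
            if sim == 0.0:
                continue
            # Locality: penalize segments far from the previous link.
            score = sim - locality_weight * abs(i - last) / len(seg_terms)
            if best_idx is None or score > best_score:
                best_idx, best_score = i, score
        if best_idx is None:
            links.append((sentence, None))
        else:
            links.append((sentence, timed_segments[best_idx][0]))
            last = best_idx
    return links


# Hypothetical sample data: (time-code, caption text) pairs and untimed sentences.
timed = [
    ("00:00:05", "the prime minister visited the flooded region today"),
    ("00:00:42", "the stock market closed higher after the announcement"),
    ("00:01:10", "the weather forecast predicts heavy rain in the north"),
]
untimed = [
    "Prime minister visits flooded region",
    "Stock market closes higher",
    "Rain",
]
links = align(untimed, timed)
```

    In this sketch the first two sentences are linked to the first two caption segments, and the one-word sentence stays unlinked because it falls below the assumed length threshold.
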
    Original language: English
    Title of host publication: RIAO 2000
    Subtitle of host publication: Conference proceedings Content-Based Multimedia Information Access
    Editors: Joseph-Jean Mariani
    Place of Publication: Paris, France
    Publisher: Centre de Hautes Etudes Internationales d'Informatique Documentaire (CID)
    Pages: 431-440
    ISBN (Print): 2-905450-07-X
    Publication status: Published - 14 Apr 2000
    Event: 6th International Conference on Computer-Assisted Information Retrieval, RIAO 2000 (Recherche d'Information et ses Applications), Paris, France
    Duration: 12 Apr 2000 - 14 Apr 2000
    Conference number: 6

    Conference

    Conference: 6th International Conference on Computer-Assisted Information Retrieval, RIAO 2000
    Abbreviated title: RIAO
    Country: France
    City: Paris
    Period: 12/04/00 - 14/04/00

  • Cite this

    van der Sluis, I., & de Jong, F. (2000). Enriching Textual Documents with Timecodes from Video Fragments. In J-J. Mariani (Ed.), RIAO 2000: Conference proceedings Content-Based Multimedia Information Access (pp. 431-440). Paris, France: Centre de Hautes Etudes Internationales d'Informatique Documentaire (CID).