The OLIVE project aims to develop a multilingual indexing tool for broadcast material based on speech recognition, which automatically produces indexes from the soundtrack of a television or radio program. Such a tool allows multimedia archives to be searched by keyword and the corresponding fragments to be retrieved. This paper reports on the alignment module, one of the components of the retrieval environment to be developed in OLIVE. The module assigns time-codes to non-time-coded textual documents that describe the content of the video. Time-coding these textual documents will increase the overall level of disclosure. The assignment is based on a similarity measure between the non-time-coded texts and either subtitle files or transcripts from speech recognition. The core of the alignment module is a generic algorithm for generating the links on which the insertion of time-codes into non-time-coded texts is based. An additional step combines similarity values with locality information. The data used during testing are closed-caption files of Dutch news broadcasts and the autocue files of these broadcasts. Adaptations to the initial algorithm that yielded improved performance figures involved a threshold related to sentence length and the application of a high- and low-frequency term stoplist compiled from the time-coded text under consideration.
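The linking idea described in the abstract can be illustrated with a minimal sketch. It is not the paper's algorithm: the abstract does not specify the similarity measure, so plain term overlap is assumed here, the sentence-length threshold is approximated by normalising the overlap by the number of terms in the untimed sentence, and the locality step is left out. All function names and parameters (`build_stoplist`, `align`, `min_df`, `max_df_ratio`, `min_overlap_ratio`) are hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def build_stoplist(timed_segments, min_df=1, max_df_ratio=0.8):
    """Collect terms that are too rare or too common in the time-coded text.

    Assumption: frequency is counted as the number of segments a term occurs in.
    """
    doc_freq = Counter()
    for _, text in timed_segments:
        doc_freq.update(set(tokenize(text)))
    n = len(timed_segments)
    return {t for t, c in doc_freq.items() if c < min_df or c / n > max_df_ratio}


def align(untimed_sentences, timed_segments, min_overlap_ratio=0.3,
          min_df=1, max_df_ratio=0.8):
    """Link each untimed sentence to the time-code of its most similar segment.

    A link is kept only if the share of the sentence's terms found in the
    segment reaches min_overlap_ratio (an assumed, length-relative threshold).
    """
    stop = build_stoplist(timed_segments, min_df, max_df_ratio)
    links = []
    for sentence in untimed_sentences:
        terms = set(tokenize(sentence)) - stop
        best_time, best_score = None, 0.0
        for timecode, text in timed_segments:
            seg_terms = set(tokenize(text)) - stop
            score = len(terms & seg_terms) / len(terms) if terms else 0.0
            if score > best_score:
                best_time, best_score = timecode, score
        keep = best_score >= min_overlap_ratio
        links.append((sentence, best_time if keep else None))
    return links
```

Calling `align(sentences, segments)` with `segments` given as `(timecode, text)` pairs returns one `(sentence, timecode)` pair per input sentence, with `None` as the time-code where no segment passes the threshold. Normalising by sentence length means a long sentence needs more shared terms than a short one before it receives a time-code.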
|Title of host publication||RIAO 2000|
|Subtitle of host publication||Conference proceedings Content-Based Multimedia Information Access|
|Place of Publication||Paris, France|
|Publisher||Centre de Hautes Etudes Internationales d'Informatique Documentaire (CID)|
|Publication status||Published - 14 Apr 2000|
|Event||6th International Conference on Computer-Assisted Information Retrieval, RIAO 2000: (Recherche d'Information et ses Applications) - Paris, France|
Duration: 12 Apr 2000 → 14 Apr 2000
Conference number: 6
van der Sluis, I., & de Jong, F. (2000). Enriching Textual Documents with Timecodes from Video Fragments. In J-J. Mariani (Ed.), RIAO 2000: Conference proceedings Content-Based Multimedia Information Access (pp. 431-440). Paris, France: Centre de Hautes Etudes Internationales d'Informatique Documentaire (CID).