Exploiting Speech Recognition Transcripts for Narrative Peak Detection in Short-Form Documentaries

Martha Larson, Bart Jochems, Ewine Smits, Roeland J.F. Ordelman

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Academic › peer-review

    Abstract

    Narrative peaks are points at which the viewer perceives a spike in the level of dramatic tension within the narrative flow of a video. This paper reports on four approaches to narrative peak detection in television documentaries that were developed by a joint team consisting of members from Delft University of Technology and the University of Twente within the framework of the VideoCLEF 2009 Affect Detection task. The approaches make use of speech recognition transcripts and seek to exploit various sources of evidence in order to automatically identify narrative peaks. These sources include speaker style (word choice), stylistic devices (use of repetitions), strategies strengthening viewers’ feelings of involvement (direct audience address) and emotional speech. These approaches are compared to a challenging baseline that predicts the presence of narrative peaks at fixed points in the video, presumed to be dictated by natural narrative rhythm or production convention. Two approaches deliver top narrative peak detection results. One uses counts of personal pronouns to identify points in the video where viewers feel most directly involved. The other uses affective word ratings to calculate scores reflecting emotional language.
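The pronoun-count approach mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the transcript format (a list of `(start_time_seconds, word)` pairs), the pronoun list, the window and step sizes, and the function name `pronoun_peaks` are all assumptions made for illustration. The sketch slides a fixed-length window over the transcript, counts pronouns in each window, and returns the highest-scoring windows that do not overlap.

```python
from typing import List, Tuple

# Hypothetical pronoun list for illustration; the word list used in the
# paper is not given in this record.
PRONOUNS = {"you", "your", "we", "our", "us"}


def pronoun_peaks(
    words: List[Tuple[float, str]],
    window: float = 10.0,
    step: float = 5.0,
    top_k: int = 3,
) -> List[Tuple[float, int]]:
    """Return up to top_k (window_start, pronoun_count) candidate peaks.

    `words` is assumed to be a time-ordered list of (start_time, word)
    pairs taken from a speech recognition transcript.
    """
    if not words:
        return []

    end_time = words[-1][0]
    scored = []
    start = 0.0
    while start <= end_time:
        # Count pronouns whose start time falls inside the current window.
        count = sum(
            1
            for t, w in words
            if start <= t < start + window and w.lower() in PRONOUNS
        )
        scored.append((count, start))
        start += step

    # Highest counts first; ties go to the earlier window.
    scored.sort(key=lambda cs: (-cs[0], cs[1]))

    peaks: List[Tuple[float, int]] = []
    for count, s in scored:
        if count == 0:
            break  # remaining windows contain no pronouns
        # Keep a window only if it does not overlap an already chosen peak.
        if all(abs(s - p) >= window for p, _ in peaks):
            peaks.append((s, count))
        if len(peaks) == top_k:
            break
    return peaks
```

The affective-word approach could follow the same windowing scheme, replacing the pronoun count with a sum of per-word affect ratings from a lexicon; that substitution is likewise an assumption, not a description of the paper's method.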
    Original language: Undefined
    Title of host publication: 10th Workshop of the Cross-Language Evaluation Forum, CLEF 2009
    Editors: Carol Peters, Barbara Caputo, Julio Gonzalo
    Place of Publication: Berlin
    Publisher: Springer
    Pages: 385-392
    Number of pages: 8
    ISBN (Print): 978-3-642-15750-9
    DOI: https://doi.org/10.1007/978-3-642-15751-6_50
    Publication status: Published - 2010
    Event: 10th Workshop of the Cross-Language Evaluation Forum, CLEF 2009 - Corfu, Greece
    Duration: 30 Sep 2009 to 2 Oct 2009
    Conference number: 10

    Publication series

    Name: Lecture Notes in Computer Science
    Publisher: Springer Verlag
    Volume: 6242
    ISSN (Print): 0302-9743
    ISSN (Electronic): 1611-3349

    Workshop

    Workshop: 10th Workshop of the Cross-Language Evaluation Forum, CLEF 2009
    Abbreviated title: CLEF
    Country: Greece
    City: Corfu
    Period: 30/09/09 to 2/10/09

    Keywords

    • METIS-278696
    • IR-78253
    • EWI-19371
    • HMI-SLT: Speech and Language Technology
    • HMI-MR: MULTIMEDIA RETRIEVAL

    Cite this

    Larson, M., Jochems, B., Smits, E., & Ordelman, R. J. F. (2010). Exploiting Speech Recognition Transcripts for Narrative Peak Detection in Short-Form Documentaries. In C. Peters, B. Caputo, & J. Gonzalo (Eds.), 10th Workshop of the Cross-Language Evaluation Forum, CLEF 2009 (pp. 385-392). (Lecture Notes in Computer Science; Vol. 6242). Berlin: Springer. https://doi.org/10.1007/978-3-642-15751-6_50