An implicit spatiotemporal shape model for human activity localization and recognition

A. Oikonomopoulos, I. Patras, Maja Pantic

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Academic › peer-review

    22 Citations (Scopus)
    161 Downloads (Pure)


    In this paper we address the problem of localization and recognition of human activities in unsegmented image sequences. The main contribution of the proposed method is the use of an implicit representation of the spatiotemporal shape of the activity, which relies on the spatiotemporal localization of characteristic, sparse 'visual words' and 'visual verbs'. Evidence for the spatiotemporal localization of the activity is accumulated in a probabilistic spatiotemporal voting scheme. The local nature of our voting framework allows us to recover multiple activities that take place in the same scene, as well as activities in the presence of clutter and occlusions. We construct class-specific codebooks using the descriptors in the training set, taking the spatial co-occurrences of pairs of codewords into account. The positions of the codeword pairs with respect to the object centre, as well as the frame in the training set in which they occur, are subsequently stored in order to create a spatiotemporal model of codeword co-occurrences. During the testing phase, we use mean shift mode estimation to spatially segment the subject that performs the activities in every frame, and the Radon transform to extract the most probable hypotheses concerning the temporal segmentation of the activities within the continuous stream.
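The mean shift mode estimation step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the synthetic votes, the uniform weights, the Gaussian kernel, the bandwidth, and the function name `mean_shift_mode` are all assumptions made for illustration. The idea shown is only that votes for a subject centre, cast from local features, cluster around the true centre, and that mean shift climbs to that cluster even when clutter votes are present.

```python
import numpy as np

def mean_shift_mode(votes, weights, start, bandwidth=5.0, tol=1e-3, max_iter=100):
    """Climb from `start` to a nearby mode of the weighted vote density.

    Hypothetical helper: uses a Gaussian kernel, which is an assumption,
    not something stated in the abstract.
    """
    x = np.asarray(start, dtype=float)
    for _ in range(max_iter):
        # Kernel weight of every vote relative to the current estimate.
        d2 = np.sum((votes - x) ** 2, axis=1)
        k = weights * np.exp(-0.5 * d2 / bandwidth ** 2)
        # Mean shift update: kernel-weighted average of the votes.
        x_new = (k[:, None] * votes).sum(axis=0) / k.sum()
        if np.linalg.norm(x_new - x) < tol:
            return x_new
        x = x_new
    return x

# Synthetic data (assumed): 200 votes near the true centre plus 50 clutter votes.
rng = np.random.default_rng(0)
true_centre = np.array([40.0, 60.0])
votes = np.vstack([
    true_centre + rng.normal(scale=2.0, size=(200, 2)),
    rng.uniform(0, 100, size=(50, 2)),
])
weights = np.ones(len(votes))  # a real system would use per-vote probabilities

centre = mean_shift_mode(votes, weights, start=votes.mean(axis=0))
print("estimated centre:", centre)
```

In a scene with several subjects, one would typically start the iteration from multiple seeds and keep the distinct modes; that extension is omitted here.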
    Original language: Undefined
    Title of host publication: IEEE International Conference on Computer Vision and Pattern Recognition
    Place of Publication: Los Alamitos
    Number of pages: 7
    ISBN (Print): 978-1-4244-3994-2
    Publication status: Published - 2009
    Event: IEEE International Conference on Computer Vision and Pattern Recognition, CVPR '09 - Miami, FL, USA
    Duration: 20 Jun 2009 to 25 Jun 2009

    Publication series

    Publisher: IEEE Computer Society Press


    Conference: IEEE International Conference on Computer Vision and Pattern Recognition, CVPR '09
    Other: 20-25 June 2009


    • METIS-264326
    • HMI-HF: Human Factors
    • Temporal segmentation
    • Radon transform
    • activities recovering
    • EWI-17213
    • EC Grant Agreement nr.: FP7/231287
    • visual verbs
    • visual words
    • probabilistic spatiotemporal voting scheme
    • spatial co-occurrences
    • human activity localization
    • human activity recognition
    • mean shift mode estimation
    • training set
    • unsegmented image sequences
    • class-specific codebooks
    • implicit representation
    • implicit spatiotemporal shape model
    • IR-69561
