An implicit spatiotemporal shape model for human activity localization and recognition

A. Oikonomopoulos, I. Patras, Maja Pantic

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Academic › peer-review

    22 Citations (Scopus)
    27 Downloads (Pure)

    Abstract

    In this paper we address the problem of localization and recognition of human activities in unsegmented image sequences. The main contribution of the proposed method is the use of an implicit representation of the spatiotemporal shape of the activity, which relies on the spatiotemporal localization of characteristic, sparse 'visual words' and 'visual verbs'. Evidence for the spatiotemporal localization of the activity is accumulated in a probabilistic spatiotemporal voting scheme. The local nature of our voting framework allows us to recover multiple activities that take place in the same scene, as well as activities in the presence of clutter and occlusions. We construct class-specific codebooks using the descriptors in the training set, where we take the spatial co-occurrences of pairs of codewords into account. The positions of the codeword pairs with respect to the object centre, as well as the frame in the training set in which they occur, are subsequently stored in order to create a spatiotemporal model of codeword co-occurrences. During the testing phase, we use mean shift mode estimation in order to spatially segment the subject that performs the activities in every frame, and the Radon transform in order to extract the most probable hypotheses concerning the temporal segmentation of the activities within the continuous stream.
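
    The abstract describes a voting pipeline: matched codeword pairs cast probabilistic votes for the position of the activity centre, and mean shift mode estimation then picks out the subject's location in each frame. The sketch below is not the authors' implementation; it is a minimal, hypothetical Python illustration of that voting-plus-mean-shift step, assuming that codeword-pair matches, their stored offsets to the object centre, and probabilistic match weights have already been computed. The function names, the vote discretisation, and the bandwidth value are illustrative choices, and the Radon-based temporal segmentation step is not covered.

        # Hypothetical sketch (not the authors' code) of spatiotemporal voting plus
        # mean shift mode estimation for per-frame localization of the subject.
        import numpy as np
        from sklearn.cluster import MeanShift


        def accumulate_votes(matches, offsets, weights):
            """Cast one weighted vote per codeword-pair match.

            matches : (N, 3) observed (x, y, t) positions of matched codeword pairs.
            offsets : (N, 3) displacements to the object centre stored at training time.
            weights : (N,)  probabilistic match scores.
            Returns an (N, 4) array of votes (x, y, t, weight) for the activity centre.
            """
            centres = matches - offsets          # predicted centre per match
            return np.column_stack([centres, weights])


        def localize_per_frame(votes, frame, bandwidth=15.0):
            """Estimate the subject's spatial position in one frame via mean shift."""
            in_frame = votes[np.round(votes[:, 2]).astype(int) == frame]
            if len(in_frame) == 0:
                return None
            # Replicate points in proportion to their (discretised) weights so the
            # mode estimate reflects the probabilistic voting scores.
            reps = np.maximum(1, np.round(in_frame[:, 3] * 10).astype(int))
            points = np.repeat(in_frame[:, :2], reps, axis=0)
            ms = MeanShift(bandwidth=bandwidth).fit(points)
            # Take the dominant mode as the activity centre for this frame.
            labels, counts = np.unique(ms.labels_, return_counts=True)
            return ms.cluster_centers_[labels[np.argmax(counts)]]


        # Synthetic usage example: votes scattered around a true centre at (120, 80) in frame 0.
        rng = np.random.default_rng(0)
        matches = np.column_stack([rng.normal([130, 90], 5, (200, 2)), np.zeros(200)])
        offsets = np.tile([10.0, 10.0, 0.0], (200, 1))
        weights = rng.uniform(0.5, 1.0, 200)
        votes = accumulate_votes(matches, offsets, weights)
        print(localize_per_frame(votes, frame=0))   # approximately [120, 80]

    Replicating votes by weight before running mean shift is one simple way to make the mode estimate respect the voting scores; a weighted kernel density estimate would be an equally valid choice under the same assumptions.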
    Original language: Undefined
    Title of host publication: IEEE International Conference on Computer Vision and Pattern Recognition
    Place of publication: Los Alamitos
    Publisher: IEEE Computer Society
    Pages: 27-33
    Number of pages: 7
    ISBN (Print): 978-1-4244-3994-2
    DOIs: 10.1109/CVPR.2009.5204262
    Publication status: Published - 2009

    Publication series

    Name:
    Publisher: IEEE Computer Society Press
    Volume: 3

    Keywords

    • METIS-264326
    • HMI-HF: Human Factors
    • Temporal segmentation
    • Radon transform
    • activities recovering
    • EWI-17213
    • EC Grant Agreement nr.: FP7/231287
    • visual verbs
    • visual words
    • probabilistic spatiotemporal voting scheme
    • spatial co-occurrences
    • human activity localization
    • human activity recognition
    • mean shift mode estimation
    • training set
    • unsegmented image sequences
    • class-specific codebooks
    • implicit representation
    • implicit spatiotemporal shape model
    • HMI-MI: MULTIMODAL INTERACTIONS
    • IR-69561

    Cite this

    Oikonomopoulos, A., Patras, I., & Pantic, M. (2009). An implicit spatiotemporal shape model for human activity localization and recognition. In IEEE International Conference on Computer Vision and Pattern Recognition (pp. 27-33). Los Alamitos: IEEE Computer Society. https://doi.org/10.1109/CVPR.2009.5204262