TY - GEN
T1 - An implicit spatiotemporal shape model for human activity localization and recognition
AU - Oikonomopoulos, A.
AU - Patras, I.
AU - Pantic, M.
PY - 2009
Y1 - 2009
AB - In this paper we address the problem of localization and recognition of human activities in unsegmented image sequences. The main contribution of the proposed method is the use of an implicit representation of the spatiotemporal shape of the activity, which relies on the spatiotemporal localization of characteristic, sparse 'visual words' and 'visual verbs'. Evidence for the spatiotemporal localization of the activity is accumulated in a probabilistic spatiotemporal voting scheme. The local nature of our voting framework allows us to recover multiple activities that take place in the same scene, as well as activities in the presence of clutter and occlusions. We construct class-specific codebooks using the descriptors in the training set, taking the spatial co-occurrences of pairs of codewords into account. The positions of the codeword pairs with respect to the object centre, as well as the frames of the training set in which they occur, are subsequently stored in order to create a spatiotemporal model of codeword co-occurrences. During the testing phase, we use mean shift mode estimation to spatially segment the subject performing the activities in every frame, and the Radon transform to extract the most probable hypotheses concerning the temporal segmentation of the activities within the continuous stream.
KW - METIS-264326
KW - HMI-HF: Human Factors
KW - Temporal segmentation
KW - Radon transform
KW - activity recovery
KW - EWI-17213
KW - EC Grant Agreement nr.: FP7/231287
KW - visual verbs
KW - visual words
KW - probabilistic spatiotemporal voting scheme
KW - spatial co-occurrences
KW - human activity localization
KW - human activity recognition
KW - mean shift mode estimation
KW - training set
KW - unsegmented image sequences
KW - class-specific codebooks
KW - implicit representation
KW - implicit spatiotemporal shape model
KW - HMI-MI: MULTIMODAL INTERACTIONS
KW - IR-69561
DO - 10.1109/CVPR.2009.5204262
M3 - Conference contribution
SN - 978-1-4244-3994-2
SP - 27
EP - 33
BT - IEEE Conference on Computer Vision and Pattern Recognition
PB - IEEE
CY - Los Alamitos
T2 - IEEE Conference on Computer Vision and Pattern Recognition, CVPR '09
Y2 - 20 June 2009 through 25 June 2009
ER -