Abstract
While wearable cameras are becoming increasingly popular, locating relevant information in large unstructured collections of egocentric images is still a tedious and time-consuming process. This paper addresses the problem of organizing egocentric photo streams acquired by a wearable camera into semantically meaningful segments, hence making an important step towards the goal of automatically annotating these photos for browsing and retrieval. In the proposed method, contextual and semantic information is first extracted for each image by employing a Convolutional Neural Network (CNN) approach. Then, a vocabulary of concepts is defined in a semantic space by relying on linguistic information. Finally, by exploiting the temporal coherence of concepts in photo streams, images that share contextual and semantic attributes are grouped together. The resulting temporal segmentation is particularly suited for further analysis, ranging from event recognition to semantic indexing and summarization. Experimental results over an egocentric dataset of nearly 31,000 images show that the proposed approach outperforms state-of-the-art methods. © 2016 Published by Elsevier Inc.
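The following is a minimal sketch, in Python, of the temporal-grouping idea outlined in the abstract: consecutive images whose concept representations remain coherent stay in the same segment, and a drop in similarity starts a new one. The `concept_vectors` input, the cosine-similarity criterion, and the threshold value are illustrative assumptions, not the authors' actual algorithm or parameters.

```python
# Hypothetical sketch: cut a photo stream wherever consecutive images'
# concept vectors (e.g. CNN-derived concept scores) stop being similar.
import numpy as np

def segment_photo_stream(concept_vectors, similarity_threshold=0.7):
    """Group consecutive images whose concept vectors have cosine
    similarity above a threshold (temporal coherence of concepts)."""
    segments, current = [], [0]
    for i in range(1, len(concept_vectors)):
        a, b = concept_vectors[i - 1], concept_vectors[i]
        cos_sim = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8)
        if cos_sim >= similarity_threshold:
            current.append(i)          # concepts remain coherent: same segment
        else:
            segments.append(current)   # concept shift detected: close segment
            current = [i]
    segments.append(current)
    return segments

# Toy usage: 6 images described by 3 hypothetical concept scores
# (e.g. "office", "street", "food"), yielding two segments.
stream = np.array([
    [0.90, 0.10, 0.00], [0.80, 0.20, 0.00], [0.85, 0.10, 0.05],  # office-like
    [0.10, 0.90, 0.00], [0.00, 0.80, 0.20], [0.05, 0.90, 0.05],  # street-like
])
print(segment_photo_stream(stream))  # -> [[0, 1, 2], [3, 4, 5]]
```

A per-boundary threshold on consecutive similarities is only one possible segmentation rule; the paper combines contextual and semantic attributes, so a richer clustering over the whole stream would be closer in spirit.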
| Original language | English |
|---|---|
| Pages (from-to) | 55-69 |
| Number of pages | 15 |
| Journal | Computer Vision and Image Understanding |
| Volume | 155 |
| DOIs | |
| Publication status | Published - Feb 2017 |
| Externally published | Yes |
Keywords
- Temporal segmentation
- Egocentric vision
- Photo streams clustering
- Wearable cameras
- Database
- Health
- Video