Description

A large amount of the visual material available on the web and in digital archives has textual information associated with it. Many images are accompanied by captions, and videos often have subtitles or at least a soundtrack. These collateral texts are very useful for indexing and disclosing visual material using standard text retrieval techniques. When, for example, we want to find images of Prime Minister Kok, simply searching newspaper captions for these terms will do, since most images of the prime minister will have at least one of the words "prime minister" and "Kok" in their captions.
However, not all images have textual descriptions. We present a method to disclose these images using the descriptions of other images. Starting from a set of images that do have descriptions, we build a multimodal space in which meaningful relations between textual and visual features exist. Then we project new images without textual descriptions onto this space using their visual features. These newly added images then automatically inherit the textual descriptions from visually similar images.
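The projection-and-inheritance step described above can be sketched in a few lines. This is a minimal illustration only: the abstract does not specify how the multimodal space is built, so the use of a truncated SVD over concatenated visual and textual feature matrices, the single-nearest-neighbour inheritance, and all function names are assumptions made here for clarity, not the authors' exact method.

```python
import numpy as np

def build_latent_space(visual, textual, k=2):
    """Build a joint multimodal space from captioned training images.

    visual  : (n_images, n_visual) matrix of visual features
    textual : (n_images, n_terms)  matrix of caption-term features
    """
    # Concatenate the two modalities so that correlated visual and
    # textual dimensions end up close together in the latent space.
    X = np.hstack([visual, textual])
    # Truncated SVD (an assumed construction): keep the k strongest
    # latent dimensions as the basis of the multimodal space.
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    return Vt[:k]

def most_similar_captioned(basis, visual, n_terms, new_visual):
    """Project a caption-less image into the space using its visual
    features alone (textual part zero-padded) and return the index of
    the most similar captioned image, whose caption terms it inherits."""
    pad = np.zeros((visual.shape[0], n_terms))
    train = np.hstack([visual, pad]) @ basis.T
    query = np.hstack([new_visual, np.zeros((1, n_terms))]) @ basis.T
    # Cosine similarity in the latent space picks the nearest neighbour.
    sims = (train @ query.T).ravel() / (
        np.linalg.norm(train, axis=1) * np.linalg.norm(query) + 1e-12)
    return int(np.argmax(sims))
```

A caption-less photograph then inherits the caption terms of the returned training image; in practice one would pool terms over several near neighbours rather than a single best match.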
We experimented with this approach using a training set of captioned newspaper photographs and a test set of newspaper photographs from which we removed the captions. Several experiments evaluating the usefulness of the inherited descriptions show that our approach can provide valuable textual indexing terms that support the retrieval of images without captions.
|Period||3 Nov 2000|
|Event title||11th Meeting on Computational Linguistics in the Netherlands, CLIN 2000|
|Degree of Recognition||International|