Content-based video retrieval is becoming an important part of making effective use of multimedia documents. In this report we present a novel system for the automatic indexing and content-based retrieval of multimedia documents. We chose the domain of Formula 1 race videos because manual annotation of Formula 1 races is complicated and time-consuming. Our system uses multimodal cues obtained from three components of the multimedia stream: audio, video, and superimposed text. The audio and video feature extraction subsystems extract the relevant parameters from the multimedia documents. We also performed text detection and recognition to extract semantic information from the text superimposed on the Formula 1 race video. To fuse the audio and video cues we employed dynamic Bayesian networks. We also present the experiments we carried out, together with their results and the conclusions drawn from them.
Series name: CTIT Technical Report Series