Semantic Multimedia Information Analysis for Retrieval Applications

Abstract

Most research in multimedia retrieval applications has focused on retrieval by content or retrieval by example. Since the classical review by Smeulders, Worring, Santini, Gupta, and Jain (2000), interest in retrieval by semantics has grown considerably in the multimedia information retrieval community. This new research area arises as a combination of multimedia understanding, information extraction, information retrieval, and digital libraries. This chapter presents a comprehensive review of analysis algorithms that extract semantic information from multimedia content. We discuss statistical approaches to analyzing image and video content and conclude with a discussion of the described methods.

Magalhães & Rüger. Copyright © 2007, Idea Group Inc. Copying or distributing in print or electronic forms without written permission of Idea Group Inc. is prohibited.

Introduction: Multimedia Analysis

The growing interest in managing multimedia collections effectively and efficiently has created a new research area that arises as a combination of multimedia understanding, information extraction, information retrieval, and digital libraries. This interest has resulted in the creation of a video retrieval track in the TREC conference series, in parallel with the text retrieval track (TRECVID, 2004).

Figure 1 illustrates a simplified multimedia information retrieval application composed of a multimedia database, analysis algorithms, a description database, and a user interface application. Analysis algorithms extract features from multimedia content and store them as descriptions of that content. A user then uses these indexing descriptions to search the multimedia database.
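The components just described (multimedia database, analysis algorithm, description database, search) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the chapter: the function names are hypothetical, each item is reduced to a flat list of gray levels in the range 0 to 255, and the analysis step is a toy normalized histogram rather than any of the algorithms the chapter reviews.

```python
from math import sqrt


def extract_features(pixels, bins=4):
    """Toy analysis algorithm (assumed): normalized histogram of 0-255 gray levels."""
    hist = [0] * bins
    for p in pixels:
        hist[min(p * bins // 256, bins - 1)] += 1
    total = len(pixels) or 1
    return [h / total for h in hist]


def index_collection(database):
    """Run the analysis over every item and store the result as its description."""
    return {item_id: extract_features(pixels) for item_id, pixels in database.items()}


def search(descriptions, query_pixels, top_k=2):
    """Rank items by Euclidean distance between the query and stored descriptions."""
    query = extract_features(query_pixels)

    def distance(description):
        return sqrt(sum((a - b) ** 2 for a, b in zip(query, description)))

    ranked = sorted(descriptions, key=lambda item_id: distance(descriptions[item_id]))
    return ranked[:top_k]


# Hypothetical multimedia database: item id -> raw content (gray levels).
multimedia_db = {
    "dark_image": [10, 20, 30, 40],
    "bright_image": [220, 230, 240, 250],
    "mixed_image": [10, 100, 180, 250],
}

descriptions = index_collection(multimedia_db)          # description database
results = search(descriptions, [5, 15, 25, 35], top_k=2)  # query by example
print(results)
```

With this toy data the dark query ranks `dark_image` first and `mixed_image` second. The search only ever touches the stored descriptions, never the raw content, which is the role the description database plays in the application outlined above.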
A semantic multimedia information retrieval application (Figure 1) differs from traditional retrieval applications chiefly in its low-level analysis algorithms: they are responsible for extracting the semantic information used to index multimedia content by its semantics. Multimedia content can be indexed in many ways, and each index can refer to different modalities and/or parts of the multimedia piece. Multimedia content is composed of the visual track, sound track, speech track, and text. All these modalities are arranged temporally to provide a meaningful way to transmit information and/or entertainment.

The temporal structure of video documents can be distinguished at two levels: semantic and syntactic (Figure 2). At the syntactic level, the video is segmented into shots (visual or audio), each forming a uniform segment (e.g., visually similar frames); representative key-frames are extracted from each shot, and scenes group neighboring similar shots into a single segment. The segmentation of video into its syntactic structure has been studied widely (Brunelli, Mich, & Modena, 1999; Wang, Liu, & Huang, 2000).

Figure 1. A typical multimedia information retrieval application
(Labels in the figure: Low-level features; Semantic features; Human decision; Color; Shapes; Textures; Motion; Multimedia content descriptions)
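The syntactic segmentation described above (shots made of visually similar frames, one key-frame per shot) can be sketched as follows. This is a simplified illustration under stated assumptions and not the method of the cited works: each frame is assumed to be already summarized as a normalized histogram, a shot boundary is declared when the L1 distance between consecutive frames exceeds a hypothetical `threshold`, and the key-frame is simply the middle frame of the shot.

```python
def histogram_distance(h1, h2):
    """L1 distance between two normalized histograms (0 = identical, 2 = disjoint)."""
    return sum(abs(a - b) for a, b in zip(h1, h2))


def segment_shots(frame_histograms, threshold=0.5):
    """Split a frame sequence into shots, returned as (first, last) frame indices.

    A new shot starts wherever two consecutive frames differ by more
    than `threshold` (an assumed, hand-picked value).
    """
    if not frame_histograms:
        return []
    shots, start = [], 0
    for i in range(1, len(frame_histograms)):
        if histogram_distance(frame_histograms[i - 1], frame_histograms[i]) > threshold:
            shots.append((start, i - 1))
            start = i
    shots.append((start, len(frame_histograms) - 1))
    return shots


def key_frames(shots):
    """Pick the middle frame of each shot as its representative key-frame."""
    return [(first + last) // 2 for first, last in shots]


# Hypothetical two-bin histograms for five frames.
frames = [
    [0.9, 0.1], [0.85, 0.15], [0.9, 0.1],  # visually similar frames
    [0.1, 0.9], [0.15, 0.85],              # abrupt change: a new shot begins
]

shots = segment_shots(frames)
keys = key_frames(shots)
print(shots, keys)
```

On this toy input the sequence splits into two shots, frames 0 to 2 and frames 3 to 4, with key-frames 1 and 3. Grouping neighboring similar shots into scenes could reuse the same distance on the key-frame histograms, although the chapter excerpt does not specify how that grouping is done.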
