Detecting image near-duplicates (IND) is an important problem in a variety of applications, such as copyright infringement detection and multimedia linking. Traditional image similarity models often have difficulty identifying IND because they cannot capture scene composition and semantics. We present a part-based image similarity measure derived from(More)
In this paper, we describe the IBM Research system for analysis, indexing, and retrieval of video, which was applied to the TREC-2002 video retrieval benchmark. The system explores methods for fully automatic content analysis, shot boundary detection, multi-modal feature extraction, statistical modeling for semantic concept detection, and speech(More)
We have developed generic and domain-specific video algorithms for caption text extraction and recognition in digital video. Our system includes several unique features: for caption box location, we combine the compressed-domain features derived from DCT coefficients and motion vectors. Long-term temporal consistency is employed to enhance localization(More)
We have developed a novel system for baseball video event detection and summarization using superimposed caption text detection and recognition. The system detects different types of semantic level events in baseball video including scoring and last pitch of each batter. The system has two components: event detection and event boundary detection. Event(More)
Data clustering is an important technique for visual data management. Most previous work focuses on clustering video data within single sources. In this paper, we address the problem of clustering across sources, and propose novel spectral clustering algorithms for multi-source clustering problems. Spectral clustering is a new discriminative method(More)
Videotext recognition is challenging due to low resolution, diverse fonts/styles, and cluttered background. Past methods enhanced recognition by using multiple frame averaging, image interpolation and lexicon correction, but recognition using multi-modality language models has not been explored. In this paper, we present a formal Bayesian framework for(More)
We present a novel discriminative-generative hybrid approach in this paper, with emphasis on application in multiview object detection. Our method includes a novel generative model called Random Attributed Relational Graph (RARG) which is able to capture the structural and appearance characteristics of parts extracted from objects. We develop new(More)
This paper presents a system to detect and extract overlay text in digital video. To overcome the problems of previous approaches, we employ a multiple hypothesis filtering approach: the sub-images in the regions of interest (ROIs) detected by a localization procedure are decomposed into several hypothetical binary images using color space(More)
In this paper, classifier fusion is adopted to demonstrate improved performance for our text overlay detection in the NIST TREC-2002 Video Retrieval Benchmark. A normalized ensemble fusion is explored to combine two text overlay detection models. The fusion incorporates normalization of confidence scores, aggregation via a combiner function, and an optimize(More)
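The first two fusion stages named in the abstract above (score normalization and aggregation via a combiner function) can be illustrated with a minimal Python sketch. Everything here is an assumption for illustration: min-max normalization, the averaging and `max` combiners, the function names, and the sample scores are not taken from the paper, and the truncated third stage is not modeled.

```python
from typing import Callable, List, Sequence


def min_max_normalize(scores: Sequence[float]) -> List[float]:
    """Rescale one detector's confidence scores to [0, 1] (assumed scheme)."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        # All scores identical: no ranking information, return a neutral value.
        return [0.5 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def average(values: Sequence[float]) -> float:
    """Hypothetical combiner: arithmetic mean of the normalized scores."""
    return sum(values) / len(values)


def fuse(
    score_lists: Sequence[Sequence[float]],
    combiner: Callable[[Sequence[float]], float] = average,
) -> List[float]:
    """Normalize each detector's scores, then combine them item by item."""
    normalized = [min_max_normalize(scores) for scores in score_lists]
    return [combiner(per_item) for per_item in zip(*normalized)]


# Made-up scores from two detectors over the same three video shots.
model_a = [1.0, 3.0, 2.0]
model_b = [10.0, 30.0, 40.0]

fused_avg = fuse([model_a, model_b])
fused_max = fuse([model_a, model_b], combiner=max)
```

Normalizing first puts the two detectors on a common scale, so that a combiner such as the mean or the maximum is not dominated by whichever model happens to emit larger raw values.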
IMS was introduced into the R5 core network by 3GPP, and SIP was chosen as the call/session control protocol. However, the size of SIP messages is becoming a performance bottleneck. Based on SigComp, this paper proposes a new compression algorithm that combines LZSS and Huffman coding with a pretreatment step. The paper also gives a SigComp session delay model. Simulation results(More)