In this paper, we present a novel system and effective algorithms for soccer video segmentation. The output, which indicates whether the ball is in play, reveals the high-level structure of the content. The first step is to classify each sampled frame into one of three kinds of view using a unique domain-specific feature, the grass-area ratio. Here the grass value and classification…
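A minimal sketch of how a grass-area ratio could be computed and mapped to three view types. Everything specific here is an assumption for illustration: pixels are given as HSV triples, the grass hue/saturation range is hard-coded rather than learned, and the view labels ("global", "zoom-in", "close-up") and the thresholds 0.6 and 0.25 are hypothetical, not taken from the paper.

```python
from typing import Sequence, Tuple

# A pixel is (hue in degrees [0, 360), saturation in [0, 1], value in [0, 1]).
Pixel = Tuple[float, float, float]

# Hypothetical grass color range; a real system would adapt it per video.
GRASS_HUE = (60.0, 160.0)
MIN_SATURATION = 0.2


def is_grass(pixel: Pixel) -> bool:
    """Return True if the pixel falls inside the assumed grass color range."""
    hue, sat, _val = pixel
    return GRASS_HUE[0] <= hue <= GRASS_HUE[1] and sat >= MIN_SATURATION


def grass_area_ratio(frame: Sequence[Sequence[Pixel]]) -> float:
    """Fraction of the frame's pixels that are classified as grass."""
    total = sum(len(row) for row in frame)
    if total == 0:
        return 0.0
    grass = sum(is_grass(p) for row in frame for p in row)
    return grass / total


def classify_view(gar: float, hi: float = 0.6, lo: float = 0.25) -> str:
    """Map a grass-area ratio to one of three view labels (illustrative thresholds)."""
    if gar >= hi:
        return "global"   # mostly field visible: wide shot
    if gar >= lo:
        return "zoom-in"  # partial field visible
    return "close-up"     # little or no grass: player or crowd shot
```

For example, a frame in which 6 of 8 pixels fall in the grass range has a ratio of 0.75 and is labeled "global". Thresholding a single scalar keeps per-frame classification cheap, which matters when every sampled frame must be labeled.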
We present a technique for denoising speech using nonnegative matrix factorization (NMF) in combination with statistical speech and noise models. We compare our new technique to standard NMF and to a state-of-the-art Wiener filter implementation and show improvements in speech quality across a range of interfering noise types.
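For context, the "standard NMF" baseline mentioned above can be sketched as supervised NMF: learn a speech basis and a noise basis offline, then fit only the activations on the noisy magnitude spectrogram and keep the speech share via a soft mask. This sketch assumes magnitude spectrograms are already computed and uses the Lee and Seung multiplicative updates for the KL divergence; it does not include the statistical speech and noise models that distinguish the proposed technique.

```python
import numpy as np

EPS = 1e-12


def nmf_kl(V, rank, n_iter=200, seed=0):
    """Learn W, H >= 0 with V ~ W @ H via multiplicative updates for KL divergence."""
    rng = np.random.default_rng(seed)
    F, T = V.shape
    W = rng.random((F, rank)) + EPS
    H = rng.random((rank, T)) + EPS
    for _ in range(n_iter):
        WH = W @ H + EPS
        H *= (W.T @ (V / WH)) / (W.T.sum(axis=1, keepdims=True) + EPS)
        WH = W @ H + EPS
        W *= ((V / WH) @ H.T) / (H.sum(axis=1, keepdims=True).T + EPS)
    return W, H


def denoise(V_noisy, W_speech, W_noise, n_iter=200, seed=0):
    """Estimate the speech magnitude spectrogram using fixed, pretrained bases."""
    W = np.hstack([W_speech, W_noise])
    rng = np.random.default_rng(seed)
    H = rng.random((W.shape[1], V_noisy.shape[1])) + EPS
    col_sums = W.T.sum(axis=1, keepdims=True) + EPS
    for _ in range(n_iter):
        # Only the activations are updated; the bases stay fixed.
        WH = W @ H + EPS
        H *= (W.T @ (V_noisy / WH)) / col_sums
    ks = W_speech.shape[1]
    speech_part = W_speech @ H[:ks]
    # Wiener-style soft mask: the fraction of the model attributed to speech.
    mask = speech_part / (W @ H + EPS)
    return mask * V_noisy
```

Usage: train `W_speech` with `nmf_kl` on clean speech and `W_noise` on noise-only recordings, then call `denoise` on the noisy spectrogram. Because the mask lies in [0, 1], the estimate never exceeds the noisy magnitude in any time-frequency bin.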
Low-level appearance as well as spatio-temporal features, appropriately quantized and aggregated into Bag-of-Words (BoW) descriptors, have been shown to be effective in many detection and recognition tasks. However, their efficacy for complex event recognition in unconstrained videos has not been systematically evaluated. In this paper, we use the NIST…
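The quantize-and-aggregate step that produces a BoW descriptor can be sketched as follows. This assumes a codebook has already been learned (typically by k-means, not shown) and uses hard assignment with L1 normalization; soft assignment and other normalizations are common alternatives.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def nearest_codeword(x: Vector, codebook: Sequence[Vector]) -> int:
    """Index of the codeword closest to x in squared Euclidean distance."""
    best, best_d = 0, math.inf
    for k, c in enumerate(codebook):
        d = sum((xi - ci) ** 2 for xi, ci in zip(x, c))
        if d < best_d:
            best, best_d = k, d
    return best


def bow_histogram(features: Sequence[Vector], codebook: Sequence[Vector]) -> List[float]:
    """Hard-assign each local descriptor to a codeword; return an L1-normalized histogram."""
    hist = [0.0] * len(codebook)
    for x in features:
        hist[nearest_codeword(x, codebook)] += 1.0
    total = sum(hist)
    return [h / total for h in hist] if total > 0 else hist
```

The resulting fixed-length vector is independent of how many local features a video produces, which is what lets videos of different lengths be fed to the same classifier.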
In this paper, we present algorithms for parsing the structure of produced soccer programs. The problem is important in the context of a personalized video streaming and browsing system. While prior work focuses on the detection of special events such as goals or corner kicks, this paper is concerned with generic structural elements of the game. We begin by…
This paper describes tools and techniques for representing motion information in the context of MPEG-7 standardization for multimedia description interfaces. It first gives an overview of the current organization of the set of MPEG-7 motion descriptions, then illustrates this by presenting two of them, motion activity and motion trajectory, in more…
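As an illustration of the motion activity idea, the sketch below computes an intensity level from a frame's motion vectors as the standard deviation of their magnitudes, quantized into five levels from 1 (very low) to 5 (very high). The threshold values are hypothetical placeholders, not the quantization table defined by the standard, and the input format (a list of (dx, dy) block motion vectors) is assumed.

```python
import math
from typing import Sequence, Tuple

# Hypothetical quantization thresholds on the standard deviation of
# motion-vector magnitudes; crossing each one raises the level by 1.
THRESHOLDS = (4.0, 10.0, 17.0, 32.0)


def motion_activity_intensity(vectors: Sequence[Tuple[float, float]]) -> int:
    """Return an intensity level in 1..5 for one frame's (dx, dy) motion vectors."""
    mags = [math.hypot(dx, dy) for dx, dy in vectors]
    if not mags:
        return 1  # no motion information: treat as very low activity
    mean = sum(mags) / len(mags)
    std = math.sqrt(sum((m - mean) ** 2 for m in mags) / len(mags))
    level = 1
    for t in THRESHOLDS:
        if std >= t:
            level += 1
    return level
```

Using the spread of magnitudes rather than their mean means a uniform camera pan, where all blocks move alike, scores as low activity, while scattered player motion scores high.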
The problem of adaptively selecting pooling regions for the classification of complex video events is considered. Complex events are defined as events composed of several characteristic behaviors, whose temporal configuration can change from sequence to sequence. A dynamic pooling operator is defined so as to enable a unified solution to the problems of…
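To make the contrast with fixed pooling concrete, here is a simplified stand-in for dynamic pooling: given per-segment features and a linear classifier `w`, pool only the `k` segments that score highest. This top-k selection is our assumption for illustration, not the operator defined in the paper; the names `dynamic_pool` and `k` are hypothetical.

```python
from typing import List, Sequence, Tuple


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def dynamic_pool(
    segments: Sequence[Sequence[float]], w: Sequence[float], k: int
) -> Tuple[List[float], List[int]]:
    """Average the k segments that score highest under w.

    Returns the pooled feature and the selected segment indices in temporal order.
    """
    if not segments or k <= 0:
        raise ValueError("need at least one segment and k > 0")
    k = min(k, len(segments))
    ranked = sorted(range(len(segments)), key=lambda i: dot(w, segments[i]), reverse=True)
    chosen = sorted(ranked[:k])
    dim = len(segments[0])
    pooled = [sum(segments[i][d] for i in chosen) / k for d in range(dim)]
    return pooled, chosen
```

Unlike averaging over a fixed temporal grid, the selected segments can sit anywhere in the video, so the pooled feature is insensitive to where the characteristic behaviors happen to occur.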
We developed a unified framework to extract highlights from three sports (baseball, golf, and soccer) by detecting some of the common audio events that are directly indicative of highlights. We used MPEG-7 audio features and entropic-prior hidden Markov models (HMMs) as the audio features and the classifier, respectively, to recognize these common audio events.
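HMM-based audio event recognition typically scores a sequence under one model per event class and picks the most likely class. The sketch below shows that step with a scaled forward algorithm. It assumes the MPEG-7 audio features have already been quantized into discrete symbols, uses hypothetical class names, and does not cover training with an entropic prior.

```python
import math
from typing import Dict, Sequence


class DiscreteHMM:
    def __init__(self, start, trans, emit):
        self.start = start  # start[i]: P(state i at t = 0)
        self.trans = trans  # trans[i][j]: P(state j at t + 1 | state i at t)
        self.emit = emit    # emit[i][o]: P(symbol o | state i)

    def log_likelihood(self, obs: Sequence[int]) -> float:
        """Scaled forward algorithm; returns log P(obs | model)."""
        if not obs:
            return 0.0
        n = len(self.start)
        alpha = [self.start[i] * self.emit[i][obs[0]] for i in range(n)]
        log_p = 0.0
        for t in range(len(obs)):
            if t > 0:
                alpha = [
                    sum(alpha[i] * self.trans[i][j] for i in range(n)) * self.emit[j][obs[t]]
                    for j in range(n)
                ]
            c = sum(alpha)
            if c == 0.0:
                return -math.inf
            log_p += math.log(c)  # accumulate scale factors to avoid underflow
            alpha = [a / c for a in alpha]
        return log_p


def classify(obs: Sequence[int], models: Dict[str, DiscreteHMM]) -> str:
    """Return the name of the model with the highest log-likelihood."""
    return max(models, key=lambda name: models[name].log_likelihood(obs))
```

Usage: with hypothetical models such as `{"applause": ..., "speech": ...}`, `classify(symbols, models)` labels a clip; highlight candidates are then the clips labeled with highlight-indicative events.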
We present a system that improves the accuracy of food intake assessment using computer vision techniques. Traditional dietetic methods suffer from the drawback of either inaccurate assessment or complex laboratory measurement. Our solution is to use a mobile phone to capture images of foods, recognize food types, estimate their respective volumes, and finally return…
In our past work we have used temporal patterns of motion activity to extract sports highlights. We have also used audio classification based approaches to develop a common audio-based platform for feature extraction that works across three different sports. In this paper, we combine the two aforementioned complementary approaches so as to get higher…
We propose to use action, scene, and object concepts as semantic attributes for classifying video events in in-the-wild content, such as YouTube videos. We model events using a variety of complementary semantic attribute features developed in a semantic concept space. Our contribution is to systematically demonstrate the advantages of this concept-based…
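One way to picture a semantic attribute feature: run a bank of concept detectors (grouped as action, scene, and object concepts) on a video's low-level feature, concatenate their probabilities, and train the event classifier in that concept space. In this sketch the detectors are hypothetical linear scorers with a sigmoid, and the event classifier is a plain linear function; the paper's actual detectors and classifiers may differ.

```python
import math
from typing import Dict, List, Sequence


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def concept_scores(x: Sequence[float], detectors: Dict[str, Sequence[float]]) -> List[float]:
    """Concept probabilities for feature x, ordered by concept name."""
    return [sigmoid(sum(w * xi for w, xi in zip(detectors[name], x))) for name in sorted(detectors)]


def semantic_feature(
    x: Sequence[float], concept_groups: Dict[str, Dict[str, Sequence[float]]]
) -> List[float]:
    """Concatenate concept scores across groups (e.g. "action", "object", "scene")."""
    feat: List[float] = []
    for group in sorted(concept_groups):
        feat.extend(concept_scores(x, concept_groups[group]))
    return feat


def event_score(feat: Sequence[float], weights: Sequence[float], bias: float = 0.0) -> float:
    """Linear event classifier applied in concept space."""
    return bias + sum(w * f for w, f in zip(weights, feat))
```

Each dimension of the resulting feature has a human-readable meaning (for example, "beach scene present"), which is one practical advantage of working in a concept space over raw low-level features.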