Ajay Divakaran

In this paper, we present a novel system and effective algorithms for soccer video segmentation. The output, which indicates whether the ball is in play, reveals the high-level structure of the content. The first step is to classify each sample frame into 3 kinds of view using a unique domain-specific feature, grass-area-ratio. Here the grass value and classification(More)
We present a technique for denoising speech using nonnegative matrix factorization (NMF) in combination with statistical speech and noise models. We compare our new technique to standard NMF and to a state-of-the-art Wiener filter implementation and show improvements in speech quality across a range of interfering noise types.
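As background for the abstract above, the following is a minimal sketch of the standard NMF baseline it compares against, not the authors' technique with statistical speech and noise models. It factors a non-negative matrix V (for example, a magnitude spectrogram) into non-negative factors W and H using the well-known Lee-Seung multiplicative updates for squared error. The function names, the toy matrix, the rank, and the iteration count are all illustrative assumptions.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(r) for r in zip(*m)]


def squared_error(v, w, h):
    """Squared Frobenius distance between V and the product W H."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0])))


def nmf(v, rank, iterations=200, seed=0, eps=1e-9):
    """Plain NMF via multiplicative updates (hypothetical helper, assumed API)."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    # Random positive initialization keeps every later update non-negative.
    w = [[rng.random() + eps for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + eps for _ in range(m)] for _ in range(rank)]
    for _ in range(iterations):
        # H <- H * (W^T V) / (W^T W H), element-wise
        wt = transpose(w)
        num = matmul(wt, v)
        den = matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(rank)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num = matmul(v, ht)
        den = matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)]
             for i in range(n)]
    return w, h


# Toy non-negative "spectrogram" (made-up values): 4 frequency bins x 5 frames.
V = [
    [1.0, 0.0, 2.0, 0.0, 1.0],
    [2.0, 0.0, 4.0, 0.0, 2.0],
    [0.0, 3.0, 0.0, 1.5, 0.0],
    [0.0, 1.0, 0.0, 0.5, 0.0],
]
W, H = nmf(V, rank=2)
print("reconstruction error:", squared_error(V, W, H))
```

In a denoising setting, one common arrangement (again an assumption here, not a detail taken from the abstract) is to learn separate basis columns of W for speech and for noise, then reconstruct the signal from the speech columns only.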
Low-level appearance as well as spatio-temporal features, appropriately quantized and aggregated into Bag-of-Words (BoW) descriptors, have been shown to be effective in many detection and recognition tasks. However, their efficacy for complex event recognition in unconstrained videos has not been systematically evaluated. In this paper, we use the NIST(More)
In this paper, we present statistical techniques for parsing the structure of produced soccer programs. The problem is important for applications such as personalized video streaming and browsing systems, in which videos are segmented into different states and important states are selected based on user preferences. While prior work focuses on the detection(More)
In this paper, we present algorithms for parsing the structure of produced soccer programs. The problem is important in the context of a personalized video streaming and browsing system. While prior work focuses on the detection of special events such as goals or corner kicks, this paper is concerned with generic structural elements of the game. We begin by(More)
This paper describes tools and techniques for representing motion information in the context of MPEG-7 standardization for multimedia description interfaces. It first gives an overview of the current organization of the set of MPEG-7 motion descriptions, then illustrates this by presenting two of them, motion activity and motion trajectory, in more detail.(More)
The problem of adaptively selecting pooling regions for the classification of complex video events is considered. Complex events are defined as events composed of several characteristic behaviors, whose temporal configuration can change from sequence to sequence. A dynamic pooling operator is defined so as to enable a unified solution to the problems of(More)
We developed a unified framework to extract highlights from three sports (baseball, golf and soccer) by detecting some of the common audio events that are directly indicative of highlights. We used MPEG-7 audio features and entropic prior Hidden Markov Models (HMMs) as the audio features and classifier, respectively, to recognize these common audio events.(More)
Structure elements in a time sequence (e.g. video) are repetitive segments with consistent deterministic or stochastic characteristics. While most existing work in detecting structures follows a supervised paradigm, we propose a fully unsupervised statistical solution in this paper. We present a unified approach to structure discovery from long video(More)
We present a system that improves the accuracy of food intake assessment using computer vision techniques. Traditional dietetic methods suffer from the drawback of either inaccurate assessment or complex lab measurement. Our solution is to use a mobile phone to capture images of foods, recognize food types, estimate their respective volumes and finally return(More)