Learn More
In this work we introduced SnooperTrack, an algorithm for the automatic detection and tracking of text objects — such as store names, traffic signs, license plates, and advertisements — in videos of outdoor scenes. The purpose is to improve the performances of text detection process in still images by taking advantage of the temporal coherence(More)
This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may(More)
This paper introduces a regularization method to explicitly control the rank of a learned symmetric positive semidefinite distance matrix in distance metric learning. To this end, we propose to incorporate in the objective function a linear regularization term that minimizes the k smallest eigenvalues of the distance matrix. It is equivalent to minimizing(More)
—Automatic detection of a falling person in video sequences has interesting applications in video-surveillance and is an important part of future pervasive home monitoring systems. In this paper, we propose a multiview approach to achieve this goal, where motion is modeled using a layered hidden Markov model (LHMM). The posture classification is performed(More)
Automatic detection of a falling person in video sequences is an important part of future pervasive home monitoring systems. We propose here a robust method to achieve this goal. Motion is modeled by a hierarchical hidden Markov model (HHMM) whose first layer states are related to the orientation of the tracked person. Finding a consistent way for robustly(More)
In this paper, we propose a hybrid architecture that combines the image modeling strengths of the bag of words framework with the representational power and adaptability of learning deep architectures. Local gradient-based descriptors, such as SIFT, are encoded via a hierarchical coding scheme composed of spatial aggregating restricted Boltzmann machines(More)
We discuss the use of histogram of oriented gradients (HOG) descriptors as an effective tool for text description and recognition. Specifically, we propose a HOG-based texture descriptor (T-HOG) that uses a partition of the image into overlapping horizontal cells with gradual boundaries, to characterize single-line texts in outdoor scenes. The input of our(More)
This paper presents an extension of the HMAX model, a neural network model for image classification. The HMAX model can be described as a four-level architecture, with the first level consisting of multiscale and multiorientation local filters. We introduce two main contributions to this model. First, we improve the way the local filters at the first level(More)
In this paper, we address the challenging problem of categorizing video sequences composed of dynamic natural scenes. Contrarily to previous methods that rely on handcrafted descriptors, we propose here to represent videos using unsupervised learning of motion features. Our method encompasses three main contributions: 1) Based on the Slow Feature Analysis(More)