Natalia Neverova

We present a method for gesture detection and localization based on multi-scale and multi-modal deep learning. Each visual modality captures spatial information at a particular spatial scale (such as motion of the upper body or a hand), and the whole system operates at two temporal scales. Key to our technique is a training strategy which exploits i) …
We present a method for gesture detection and localisation based on multi-scale and multi-modal deep learning. Each visual modality captures spatial information at a particular spatial scale (such as motion of the upper body or a hand), and the whole system operates at three temporal scales. Key to our technique is a training strategy which exploits: i) …
We propose a generalized approach to human gesture recognition based on multiple data modalities such as depth video, articulated pose and speech. In our system, each gesture is decomposed into large-scale body motion and local subtle movements such as hand articulation. The idea of learning at multiple scales is also applied to the temporal dimension, such …
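As a rough illustration of the multi-stream idea running through the three gesture abstracts above, the PyTorch sketch below feeds each modality through its own stream and fuses them at a shared hidden layer before classification. All input shapes, layer sizes, and the gesture count are assumptions chosen for illustration, not the architectures from the papers.

```python
import torch
import torch.nn as nn

class MultiModalGestureNet(nn.Module):
    """Illustrative multi-stream network: one stream per modality,
    fused at a shared hidden layer (all shapes are assumptions)."""
    def __init__(self, num_classes=21):
        super().__init__()
        # Stream 1: depth crop of the hand, e.g. one 1x64x64 frame
        self.hand_stream = nn.Sequential(
            nn.Conv2d(1, 16, 5), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 5), nn.ReLU(), nn.MaxPool2d(2),
            nn.Flatten(), nn.Linear(32 * 13 * 13, 128), nn.ReLU(),
        )
        # Stream 2: articulated-pose vector, e.g. 3D coords of 11 joints
        self.pose_stream = nn.Sequential(
            nn.Linear(33, 128), nn.ReLU(), nn.Linear(128, 128), nn.ReLU(),
        )
        # Shared fusion layer followed by a per-gesture classifier
        self.fusion = nn.Sequential(
            nn.Linear(128 + 128, 256), nn.ReLU(), nn.Linear(256, num_classes),
        )

    def forward(self, hand_depth, pose):
        h = self.hand_stream(hand_depth)
        p = self.pose_stream(pose)
        return self.fusion(torch.cat([h, p], dim=1))

net = MultiModalGestureNet()
logits = net(torch.randn(4, 1, 64, 64), torch.randn(4, 33))
print(logits.shape)  # torch.Size([4, 21])
```

In the papers the streams also operate at several spatial and temporal scales; running copies of such streams on differently sized crops and frame strides, then concatenating all of them at the fusion layer, would follow the same pattern.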
Estimating lighting conditions is crucial in many applications. In this paper, we show that combining color images with corresponding depth maps (provided by modern depth sensors) makes it possible to improve the estimation of the positions and colors of multiple lights in a scene. Since such devices usually provide low-quality images, for many steps of our framework …
We present a large-scale study exploring the capability of temporal deep neural networks to interpret natural human kinematics and introduce the first method for active biometric authentication with mobile inertial sensors. At Google, we have created a first-of-its-kind dataset of human movements, passively collected by 1500 volunteers using their …
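A minimal sketch of how a temporal network can turn inertial sensor windows into authentication decisions: encode each window into a unit-norm embedding and compare it against an enrollment embedding by cosine similarity. The 6-channel accelerometer-plus-gyroscope layout, the GRU encoder, and the threshold are assumptions, not the paper's actual model.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MotionEncoder(nn.Module):
    """Illustrative temporal encoder for inertial windows: each window is
    (T, 6) -- 3-axis accelerometer + 3-axis gyroscope (an assumption)."""
    def __init__(self, emb_dim=64):
        super().__init__()
        self.gru = nn.GRU(input_size=6, hidden_size=128, batch_first=True)
        self.head = nn.Linear(128, emb_dim)

    def forward(self, x):            # x: (batch, T, 6)
        _, h = self.gru(x)           # h: (1, batch, 128), final hidden state
        return F.normalize(self.head(h[-1]), dim=1)  # unit-norm embedding

enc = MotionEncoder()
enrolled = enc(torch.randn(1, 200, 6))   # embedding from enrollment data
probe    = enc(torch.randn(1, 200, 6))   # embedding from a new session
score = (enrolled * probe).sum()         # cosine similarity in [-1, 1]
accept = score > 0.8                     # threshold tuned on held-out data
```

Because authentication here is passive, such a model would score a stream of sliding windows in the background rather than a single probe.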
The availability of cheap and effective depth sensors has resulted in recent advances in human pose estimation and tracking. Detailed estimation of hand pose, however, remains a challenge, since fingers are often occluded and may cover only a few pixels. Moreover, labelled data is difficult to obtain. We propose a deep learning based approach for …
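For concreteness, one common deep learning formulation of hand pose estimation regresses 3D joint positions directly from a depth crop of the hand. The sketch below follows that generic recipe; the 14-joint skeleton, 96x96 crop, and layer sizes are illustrative assumptions, not the paper's network.

```python
import torch
import torch.nn as nn

class HandPoseNet(nn.Module):
    """Illustrative regressor from a depth crop of the hand to 3D joint
    positions; the 14-joint skeleton and 96x96 crop are assumptions."""
    def __init__(self, num_joints=14):
        super().__init__()
        self.num_joints = num_joints
        self.net = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),   # 48x48
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),  # 24x24
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),  # 12x12
            nn.Flatten(),
            nn.Linear(64 * 12 * 12, 512), nn.ReLU(),
            nn.Linear(512, num_joints * 3),     # (x, y, z) per joint
        )

    def forward(self, depth_crop):              # (batch, 1, 96, 96)
        return self.net(depth_crop).view(-1, self.num_joints, 3)

joints = HandPoseNet()(torch.randn(2, 1, 96, 96))
print(joints.shape)                             # torch.Size([2, 14, 3])
```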
This paper presents a method for extracting blur/sharp regions of interest (ROI) that benefits from combining edge-based and region-based approaches. It can serve as a preliminary step for many vision applications that focus on the most salient areas of low depth-of-field images. To localize focused regions, we first classify each …
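The edge-based half of such a pipeline can be illustrated with a classic sharpness cue: blocks whose Laplacian response has high variance tend to be in focus. The block size, threshold, and use of `scipy.ndimage.laplace` below are assumptions for the sketch; the paper's actual classifier and region-based refinement are not shown.

```python
import numpy as np
from scipy.ndimage import laplace

def sharp_mask(gray, block=16, thresh=25.0):
    """Illustrative edge-based cue for blur/sharp segmentation: score each
    block by the variance of its Laplacian response (high variance = sharp).
    Block size and threshold are assumptions, to be tuned per dataset."""
    resp = laplace(gray.astype(np.float64))
    h, w = gray.shape
    mask = np.zeros((h // block, w // block), dtype=bool)
    for i in range(mask.shape[0]):
        for j in range(mask.shape[1]):
            patch = resp[i*block:(i+1)*block, j*block:(j+1)*block]
            mask[i, j] = patch.var() > thresh
    return mask  # True = likely in-focus block; a region-based step
                 # (e.g. connected components) would then clean the map

img = np.random.rand(128, 128) * 255     # stand-in for a grayscale image
print(sharp_mask(img).shape)             # (8, 8) block-level decision map
```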
The ability to predict and therefore to anticipate the future is an important attribute of intelligence. It is also of utmost importance in real-time systems, e.g. in robotics or autonomous driving, which depend on visual scene understanding for decision making. While prediction of the raw RGB pixel values in future video frames has been studied in previous …
Using the Manhattan world assumption, we propose a new method for global 2½D geometry estimation of indoor environments from single low-quality RGB-D images. This method exploits color and depth information jointly and allows us to obtain a full representation of an indoor scene from only a single shot of the Kinect sensor. The main novelty of …
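To make the Manhattan-world step concrete, the sketch below back-projects a depth map through a pinhole model, estimates per-pixel normals from local tangents, and snaps each normal to one of three orthogonal directions. The intrinsics stand in for a Kinect calibration, and snapping to the camera axes replaces the clustering a real scene would need; both are assumptions, not the paper's method.

```python
import numpy as np

def backproject(depth, fx, fy, cx, cy):
    """Pinhole back-projection of a depth map (meters) to a 3D point map.
    The intrinsics are placeholder values standing in for a calibration."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.stack([x, y, depth], axis=-1)           # (h, w, 3)

def manhattan_labels(depth, fx=525.0, fy=525.0, cx=319.5, cy=239.5):
    """Illustrative Manhattan-world step: per-pixel normals from local
    tangents, each snapped to the nearest of three orthogonal axes."""
    pts = backproject(depth, fx, fy, cx, cy)
    du = np.gradient(pts, axis=1)                     # tangent along columns
    dv = np.gradient(pts, axis=0)                     # tangent along rows
    n = np.cross(du, dv)
    n /= np.linalg.norm(n, axis=-1, keepdims=True) + 1e-9
    # A real scene would cluster the normals to find the three dominant
    # orthogonal directions; snapping to camera axes is a stand-in here.
    axes = np.eye(3)
    return np.abs(n @ axes.T).argmax(axis=-1)         # 0/1/2 per pixel

depth = np.full((480, 640), 2.0)                      # synthetic frontal wall
print(np.bincount(manhattan_labels(depth).ravel()))   # all pixels -> axis 2 (z)
```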