Katerina Fragkiadaki

Learn More
We propose the Encoder-Recurrent-Decoder (ERD) model for recognition and prediction of human body pose in videos and motion capture. The ERD model is a recurrent neural network that incorporates nonlinear encoder and de-coder networks before and after recurrent layers. We test instantiations of ERD architectures in the tasks of motion capture (mocap)(More)
We propose a detection-free system for segmenting multiple interacting and deforming people in a video. People detectors often fail under close agent interaction, limiting the performance of detection based tracking methods. Motion information often fails to separate similarly moving agents or to group distinctly moving articulated body parts. We formulate(More)
We propose a tracking framework that mediates grouping cues from two levels of tracking granularities, detection tracklets and point trajectories, for segmenting objects in crowded scenes. Detection tracklets capture objects when they are mostly visible. They may be sparse in time, may miss partially occluded or deformed objects, or contain false positives.(More)
Our goal is to segment a video sequence into moving objects and the world scene. In recent work, spectral embedding of point trajectories based on 2D motion cues accumulated from their lifespans, has shown to outperform factor-ization and per frame segmentation methods for video seg-mentation. The scale and kinematic nature of the moving objects and the(More)
Hierarchical feature extractors such as Convolutional Networks (ConvNets) have achieved impressive performance on a variety of classification tasks using purely feed-forward processing. Feedforward architectures can learn rich representations of the input space but do not explicitly model dependencies in the output spaces, that are quite structured for(More)
We segment moving objects in videos by ranking spatio-temporal segment proposals according to " moving object-ness " ; how likely they are to contain a moving object. In each video frame, we compute segment proposals using multiple figure-ground segmentations on per frame motion boundaries. We rank them with a Moving Objectness Detector trained on image and(More)
Extracting 3D shape of deforming objects in monocular videos, a task known as non-rigid structure-from-motion (NRSfM), has so far been studied only on synthetic datasets and controlled environments. Typically, the objects to reconstruct are pre-segmented, they exhibit limited rotations and occlusions, or full-length trajectories are assumed. In order to(More)
The ability to plan and execute goal specific actions in varied, unexpected settings is a central requirement of intelligent agents. In this paper, we explore how an agent can be equipped with an internal model of the dynamics of the external world, and how it can use this model to plan novel actions by running multiple internal simulations (" visual(More)
We present a method that segments moving objects in monocular uncalibrated videos using a combination of moving segment proposal generation and moving object-ness ranking. We compute segment proposals in each frame by multiple figure-ground segmentations on optical flow boundaries, we call them Moving Object Proposals (MOPs). MOPs increase the object(More)