• Corpus ID: 16991828

R-CNNs for Pose Estimation and Action Detection

@article{Gkioxari2014RCNNsFP,
  title={R-CNNs for Pose Estimation and Action Detection},
  author={Georgia Gkioxari and Bharath Hariharan and Ross B. Girshick and Jitendra Malik},
  journal={ArXiv},
  year={2014},
  volume={abs/1406.5212}
}
We present convolutional neural networks for the tasks of keypoint (pose) prediction and action classification of people in unconstrained images. Our approach involves training an R-CNN detector with loss functions depending on the task being tackled. We evaluate our method on the challenging PASCAL VOC dataset and compare it to previous leading approaches. Our method gives state-of-the-art results for keypoint and action prediction. Additionally, we introduce a new dataset for action detection… 

Viewpoints and keypoints
TLDR
The problem of pose estimation for rigid objects in terms of determining viewpoint to explain coarse pose and keypoint prediction to capture the finer details is characterized and it is demonstrated that leveraging viewpoint estimates can substantially improve local appearance based keypoint predictions.
LCR-Net: Localization-Classification-Regression for Human Pose
TLDR
This work proposes an end-to-end architecture for joint 2D and 3D human pose estimation in natural images that significantly outperforms the state of the art in 3D pose estimation on Human3.6M, a controlled environment.
Single Image Action Recognition by Predicting Space-Time Saliency
TLDR
This work uses the predicted future motion in the static image as a means of compensating for the missing temporal information, while using the saliency map to represent the spatial information in the form of location and shape of what is predicted as significant.
Monocular human pose estimation: A survey of deep learning-based methods
3D CNN for Human Action Recognition
TLDR
This paper proposes a HAR approach based on a 3D CNN model, applies the developed model to recognize human actions in the KTH and J-HMDB datasets, and achieves state-of-the-art performance in comparison to baseline methods.
Convolutional Models for Joint Object Categorization and Pose Estimation
TLDR
This paper investigates and analyzes the layers of various CNN models and extensively compares them, with the goal of discovering how the layers of distributed representations of CNNs represent object pose information and how this contradicts object category representations.
DeepCAMP: Deep Convolutional Action & Attribute Mid-Level Patterns
TLDR
A novel convolutional neural network that mines mid-level image patches that are sufficiently dedicated to resolve the corresponding subtleties and trains a newly designed CNN (DeepPattern) that learns discriminative patch groups.
Understanding holistic human pose using class-specific convolutional neural network
TLDR
This paper presents a method to capture human pose from individual real-world RGB images using a deep learning technique, and introduces a classification scheme for this problem, which reasons about the pose holistically.
A Multi-Modal Approach to Infer Image Affect
TLDR
This paper combines three additional modalities, namely human pose, text-based tagging, and CNN-extracted features/predictions; for the first time, all of the modalities were extracted using deep neural networks.
Region-Based Convolutional Networks for Accurate Object Detection and Segmentation
TLDR
A simple and scalable detection algorithm that improves mean average precision (mAP) by more than 50 percent relative to the previous best result on VOC 2012, achieving a mAP of 62.4 percent.

References

SHOWING 1-10 OF 26 REFERENCES
Using k-Poselets for Detecting People and Localizing Their Keypoints
TLDR
A k-poselet is a deformable part model with k parts, where each of the parts is a poselet, aligned to a specific configuration of keypoints based on ground-truth annotations, which enables a unified approach to person detection and keypoint prediction.
DeepPose: Human Pose Estimation via Deep Neural Networks
TLDR
The pose estimation is formulated as a DNN-based regression problem towards body joints and a cascade of such DNN regressors which results in high precision pose estimates.
Action recognition from a distributed representation of pose and appearance
TLDR
This work presents a distributed representation of pose and appearance of people called the “poselet activation vector”, which can be used to estimate the pose of people defined by the 3D orientations of the head and torso in the challenging PASCAL VOC 2010 person detection dataset.
Articulated Pose Estimation Using Discriminative Armlet Classifiers
TLDR
This work proposes a rich representation which, in addition to standard HOG features, integrates the information of strong contours, skin color and contextual cues in a principled manner, and outperforms Yang and Ramanan [26], the state-of-the-art technique.
Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation
TLDR
This paper proposes a simple and scalable detection algorithm that improves mean average precision (mAP) by more than 30% relative to the previous best result on VOC 2012 -- achieving a mAP of 53.3%.
Learning and Transferring Mid-level Image Representations Using Convolutional Neural Networks
TLDR
This work designs a method to reuse layers trained on the ImageNet dataset to compute mid-level image representation for images in the PASCAL VOC dataset, and shows that despite differences in image statistics and tasks in the two datasets, the transferred representation leads to significantly improved results for object and action classification.
DeCAF: A Deep Convolutional Activation Feature for Generic Visual Recognition
TLDR
DeCAF, an open-source implementation of deep convolutional activation features, along with all associated network parameters, are released to enable vision researchers to be able to conduct experimentation with deep representations across a range of visual concept learning paradigms.
Clustered Pose and Nonlinear Appearance Models for Human Pose Estimation
TLDR
A new annotated database of challenging consumer images is introduced, an order of magnitude larger than currently available datasets, and over 50% relative improvement in pose estimation accuracy over a state-of-the-art method is demonstrated.
Cascaded Models for Articulated Pose Estimation
TLDR
This work proposes to learn a sequence of structured models at different pose resolutions, where coarse models filter the pose space for the next level via their max-marginals, and trains the cascade to prune as much as possible while preserving true poses for the final level pictorial structure model.
Learning hierarchical poselets for human parsing
TLDR
A structured model that organizes poselets in a hierarchical way and learns the model parameters in a max-margin framework, demonstrating the superior performance of the proposed approach on two datasets with aggressive pose variations.