Deep Reinforcement Learning for Active Human Pose Estimation

@article{Grtner2020DeepRL,
  title={Deep Reinforcement Learning for Active Human Pose Estimation},
  author={Erik G{\"a}rtner and Aleksis Pirinen and Cristian Sminchisescu},
  journal={ArXiv},
  year={2020},
  volume={abs/2001.02024}
}
Most 3d human pose estimation methods assume that input – be it images of a scene collected from one or several viewpoints, or from a video – is given. Consequently, they focus on estimates leveraging prior knowledge and measurement by fusing information spatially and/or temporally, whenever available. In this paper we address the problem of an active observer with freedom to move and explore the scene spatially – in ‘time-freeze’ mode – and/or temporally, by selecting informative viewpoints… 
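To make the active setting concrete, below is a minimal sketch of a viewpoint-selection loop in the spirit of the paper: an agent picks a limited budget of viewpoints, the per-view pose estimates are fused, and the episode is scored by the resulting reconstruction error. The toy environment, averaging fusion rule and random baseline policy are illustrative assumptions, not the authors' implementation.

import numpy as np

class ToyActivePoseEnv:
    """Illustrative stand-in: each camera viewpoint yields a noisy 3D pose estimate."""
    def __init__(self, n_views=8, n_joints=17, seed=0):
        self.rng = np.random.default_rng(seed)
        self.gt_pose = self.rng.normal(size=(n_joints, 3))      # ground-truth joints
        self.noise = self.rng.uniform(0.05, 0.5, size=n_views)  # per-view noise level
        self.n_views = n_views

    def observe(self, view):
        """Return a noisy per-view estimate of the 3D pose."""
        return self.gt_pose + self.rng.normal(scale=self.noise[view],
                                              size=self.gt_pose.shape)

def run_episode(env, policy, budget=4):
    """Actively pick `budget` viewpoints, fuse estimates, return mean joint error."""
    estimates, visited = [], set()
    for _ in range(budget):
        view = policy(visited, env.n_views)   # the agent's decision at each step
        visited.add(view)
        estimates.append(env.observe(view))
    fused = np.mean(estimates, axis=0)        # simple average; the paper fuses more carefully
    return np.linalg.norm(fused - env.gt_pose, axis=1).mean()

def random_policy(visited, n_views):
    """Baseline: any unvisited view; a trained policy would prefer informative ones."""
    return int(np.random.choice([v for v in range(n_views) if v not in visited]))

print(run_episode(ToyActivePoseEnv(), random_policy))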

Citations

Image Classification by Reinforcement Learning with Two-State Q-Learning

  • A. M. Hafiz, G. M. Bhat
  • Computer Science
    Handbook of Intelligent Computing and Optimization for Sustainable Development
  • 2022
TLDR
Because the proposed Hybrid Classifier uses only two Q-states, it is straightforward, has far fewer optimization parameters, and consequently also has a simple reward function.
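As a rough illustration of what a two-Q-state formulation implies, the snippet below shows a plain tabular Q-learning update over two states with one action per class label; the state/action/reward encoding is an assumption for illustration, not the chapter's exact design.

import numpy as np

N_STATES, N_CLASSES = 2, 10               # two Q-states, one action per class label
Q = np.zeros((N_STATES, N_CLASSES))
ALPHA, GAMMA, EPS = 0.1, 0.9, 0.1

def select_action(state):
    """Epsilon-greedy action (class prediction) from the tiny Q-table."""
    if np.random.rand() < EPS:
        return np.random.randint(N_CLASSES)
    return int(Q[state].argmax())

def q_update(state, action, reward, next_state):
    """Standard Q-learning step; with only two states the table stays tiny."""
    target = reward + GAMMA * Q[next_state].max()
    Q[state, action] += ALPHA * (target - Q[state, action])

# Example transition: predict a class, get +1 if correct, 0 otherwise (simple reward).
a = select_action(0)
q_update(state=0, action=a, reward=1.0, next_state=1)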

Deep Q-Network Based Multi-agent Reinforcement Learning with Binary Action Agents.

TLDR
This work proposes a simple but efficient DQN-based multi-agent system (MAS) for RL in which each agent is a DQN; the agents share state and rewards but take agent-specific actions when updating the shared experience replay pool of the DQNs.
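A minimal sketch of the replay layout this describes, assuming a shared global state and reward with one binary action slot per agent (the buffer structure is a guess, not the paper's code):

import random
from collections import deque

class SharedReplayPool:
    """One replay pool shared by all agent DQNs."""
    def __init__(self, capacity=10_000):
        self.buffer = deque(maxlen=capacity)

    def push(self, state, actions, reward, next_state):
        """Global state and reward; `actions` holds one binary action per agent."""
        self.buffer.append((state, tuple(actions), reward, next_state))

    def sample_for_agent(self, agent_idx, batch_size):
        """Each agent trains on the shared pool but reads only its own action slot."""
        batch = random.sample(self.buffer, batch_size)
        return [(s, a[agent_idx], r, s2) for (s, a, r, s2) in batch]

pool = SharedReplayPool()
pool.push(state=[0.1, 0.2], actions=[0, 1, 1], reward=1.0, next_state=[0.3, 0.1])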

Meta Agent Teaming Active Learning for Pose Estimation

TLDR
A novel Meta Agent Teaming Active Learning (MATAL) framework to actively select and label informative images for effective learning, which can save around 40% of labeling effort on average compared to state-of-the-art active learning frameworks.

Real-Time Multitarget Tracking for Panoramic Video Based on Dual Neural Networks for Multisensor Information Fusion

  • Qing Lin
  • Computer Science
    Mathematical Problems in Engineering
  • 2022
TLDR
The proposed dual-neural-network-based real-time multitarget tracking algorithm for panoramic video effectively improves tracking accuracy on degraded frames, and multiframe feature fusion improves the stability of target localization and category detection.

Efficient Virtual View Selection for 3D Hand Pose Estimation

TLDR
A new virtual view selection and fusion module for 3D hand pose estimation from a single depth image is proposed; it automatically selects multiple virtual viewpoints for pose estimation and fuses their results, which empirically delivers accurate and robust pose estimation.

Multi-Agent Deep Reinforcement Learning for Online 3D Human Poses Estimation

TLDR
This paper addresses online view selection with a fixed number of cameras for actively estimating multi-person 3D poses, and is the first to tackle online active multi-view 3D pose estimation with multi-agent reinforcement learning.

Glimpse-Attend-and-Explore: Self-Attention for Active Visual Exploration

TLDR
The Glimpse-Attend-and-Explore model employs self-attention to guide visual exploration instead of task-specific uncertainty maps, can be used for both dense and sparse prediction tasks, and uses a contrastive stream to further improve the learned representations.

Deep Full-Body HPE for Activity Recognition from RGB Frames Only

TLDR
A Deep Full-Body HPE (DFB-HPE) approach from RGB images only based on ConvNets and SVM, which achieves the best HPE performance, as well as the best activity recognition precision on the CAD-60 dataset.

Embodied Visual Active Learning for Semantic Segmentation

TLDR
This work extensively evaluates the proposed models using the photorealistic Matterport3D simulator and shows that a fully learnt method outperforms comparable pre-specified counterparts, even when requesting fewer annotations.

References

Showing 1-10 of 41 references

Domes to Drones: Self-Supervised Active Triangulation for 3D Human Pose Reconstruction

TLDR
This work introduces ACTOR, an active triangulation agent for 3d human pose reconstruction: a fully trainable agent consisting of a 2d pose estimation network and a deep reinforcement learning-based policy for camera viewpoint selection, which produces significantly more accurate 3d pose reconstructions.
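For context on the triangulation step referenced above, here is a standard linear (DLT) triangulation of a single joint from 2D detections in the selected views; ACTOR's actual contribution, the learned viewpoint-selection policy, is omitted in this sketch.

import numpy as np

def triangulate_joint(cameras, points_2d):
    """DLT triangulation: `cameras` are 3x4 projection matrices, `points_2d` the
    matching (x, y) detections of one joint in each selected view."""
    rows = []
    for P, (x, y) in zip(cameras, points_2d):
        rows.append(x * P[2] - P[0])      # each view contributes two linear constraints
        rows.append(y * P[2] - P[1])
    A = np.stack(rows)
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]                            # null-space solution in homogeneous coordinates
    return X[:3] / X[3]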

Monocular 3D Pose and Shape Estimation of Multiple People in Natural Scenes: The Importance of Multiple Scene Constraints

TLDR
This paper leverages state-of-the-art deep multi-task neural networks and parametric human and scene modeling towards a fully automatic monocular visual sensing system for multiple interacting people, which infers the 2d and 3d pose and shape of multiple people from a single image.

Target-driven visual navigation in indoor scenes using deep reinforcement learning

TLDR
This paper proposes an actor-critic model whose policy is a function of the goal as well as the current state, which allows better generalization and proposes the AI2-THOR framework, which provides an environment with high-quality 3D scenes and a physics engine.

LCR-Net: Localization-Classification-Regression for Human Pose

TLDR
This work proposes an end-to-end architecture for joint 2D and 3D human pose estimation in natural images that significantly outperforms the state of the art in 3D pose estimation on Human3.6M, a controlled environment.

Action-Driven Visual Object Tracking With Deep Reinforcement Learning

TLDR
The fast version of the proposed method, which operates in real time on a graphics processing unit, outperforms state-of-the-art real-time trackers with an accuracy improvement of more than 8%.

Ordinal Depth Supervision for 3D Human Pose Estimation

TLDR
This work proposes to use a weaker supervision signal provided by the ordinal depths of human joints, which achieves new state-of-the-art performance on the relevant benchmarks and validates the effectiveness of ordinal depth supervision for 3D human pose.
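As an illustration of ordinal depth supervision, a pairwise ranking loss over predicted joint depths can look roughly like the sketch below; the exact relation encoding and weighting are assumptions, not the paper's precise formulation.

import numpy as np

def ordinal_depth_loss(pred_depth, pairs, relations):
    """pred_depth: per-joint predicted depths; pairs: (i, j) joint index pairs;
    relations: +1 if joint i is annotated closer than j, -1 if farther, 0 if similar."""
    loss = 0.0
    for (i, j), r in zip(pairs, relations):
        d = pred_depth[i] - pred_depth[j]
        if r == 0:
            loss += d ** 2                       # similar depths should stay close
        else:
            loss += np.log1p(np.exp(r * d))      # ranking term penalizes wrong ordering
    return loss / max(len(pairs), 1)

depths = np.array([1.2, 0.8, 1.5])
print(ordinal_depth_loss(depths, pairs=[(0, 1), (1, 2)], relations=[-1, +1]))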

Deep Reinforcement Learning for Visual Object Tracking in Videos

TLDR
This paper introduces a fully end-to-end approach for visual tracking in videos that learns to predict the bounding box locations of a target object at every frame and is the first neural-network tracker that combines convolutional and recurrent networks with RL algorithms.

Convolutional Pose Machines

TLDR
This work designs a sequential architecture composed of convolutional networks that directly operate on belief maps from previous stages, producing increasingly refined estimates for part locations, without the need for explicit graphical model-style inference in structured prediction tasks such as articulated pose estimation.
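A minimal sketch of this stagewise belief-map refinement idea, where each stage operates on image features concatenated with the previous stage's beliefs (layer sizes and stage counts are placeholders, not the published architecture):

import torch
import torch.nn as nn

class CPMStage(nn.Module):
    """One refinement stage: sees image features plus the previous belief maps."""
    def __init__(self, feat_ch=32, n_joints=17):
        super().__init__()
        self.refine = nn.Sequential(
            nn.Conv2d(feat_ch + n_joints, 64, kernel_size=7, padding=3), nn.ReLU(),
            nn.Conv2d(64, n_joints, kernel_size=1))

    def forward(self, feats, beliefs):
        return self.refine(torch.cat([feats, beliefs], dim=1))

# Stacking stages progressively refines the part beliefs without explicit
# graphical-model inference.
feats = torch.randn(1, 32, 46, 46)
beliefs = torch.zeros(1, 17, 46, 46)
for stage in (CPMStage(), CPMStage()):
    beliefs = stage(feats, beliefs)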

Geometry-Aware Recurrent Neural Networks for Active Visual Recognition

We present recurrent geometry-aware neural networks that integrate visual information across multiple views of a scene into 3D latent feature tensors, while maintaining a one-to-one mapping…

Recovering Accurate 3D Human Pose in the Wild Using IMUs and a Moving Camera

TLDR
This work proposes a method that combines a single hand-held camera and a set of Inertial Measurement Units (IMUs) attached to the body limbs to estimate accurate 3D poses in the wild, and obtains an accuracy of 26 mm, which makes it accurate enough to serve as a benchmark for image-based 3D pose estimation in the wild.