Test-Time Personalization with a Transformer for Human Pose Estimation
@article{Hao2021TestTimePW,
  title={Test-Time Personalization with a Transformer for Human Pose Estimation},
  author={Miao Hao and Yizhuo Li and Zonglin Di and Nitesh B. Gundavarapu and Xiaolong Wang},
  journal={ArXiv},
  year={2021},
  volume={abs/2107.02133}
}
We propose to personalize a 2D human pose estimator given a set of test images of a person without using any manual annotations. Although human pose estimation has advanced significantly, it is still very challenging for a model to generalize to unknown environments and unseen persons. Instead of using a fixed model for every test case, we adapt our pose estimator during test time to exploit person-specific information. We first train our model on diverse data with both a…
13 Citations
Deep Learning-Based Human Pose Estimation: A Survey
- Computer Science
- Tsinghua Science and Technology
- 2019
A comprehensive survey of deep learning-based human pose estimation methods that analyzes the methodologies employed and summarizes and discusses recent work using a methodology-based taxonomy.
A new benchmark for group distribution shifts in hand grasp regression for object manipulation. Can meta-learning raise the bar?
- Computer Science
- ArXiv
- 2022
A novel benchmark for object group distribution shifts in hand and object pose regression for object grasping is proposed and the hypothesis that meta-learning a baseline pose regression neural network can adapt to these shifts and generalize better to unknown objects is tested.
Boost Test-Time Performance with Closed-Loop Inference
- Computer Science
- ArXiv
- 2022
A general Closed-Loop Inference (CLI) method is proposed, which first devises a filtering criterion to identify hard-to-classify test samples that need additional inference loops and then constructs looped inference, so that the original erroneous predictions on these hard test samples can be corrected with little additional effort.
A Survey on Vision Transformer
- Computer Science
- IEEE Transactions on Pattern Analysis and Machine Intelligence
- 2022
This paper reviews vision transformer models by categorizing them by task and analyzing their advantages and disadvantages, and takes a brief look at the self-attention mechanism in computer vision, as it is the base component of the transformer.
Degradation-Aware Unfolding Half-Shuffle Transformer for Spectral Compressive Imaging
- Computer Science
- ArXiv
- 2022
A principled Degradation-Aware Unfolding Framework (DAUF) is proposed that estimates parameters from the compressed image and physical mask and then uses these parameters to control each iteration, together with a custom Half-Shuffle Transformer (HST) that simultaneously captures local contents and non-local dependencies.
MST++: Multi-stage Spectral-wise Transformer for Efficient Spectral Reconstruction
- Environmental Science, Computer Science
- 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)
- 2022
This work proposes a novel Transformer-based method, Multi-stage Spectral-wise Transformer (MST++), for efficient spectral reconstruction that significantly outperforms other state-of-the-art methods.
Improving ProtoNet for Few-Shot Video Object Recognition: Winner of ORBIT Challenge 2022
- Computer Science
- ArXiv
- 2022
This work refactors and re-implements the official codebase to encourage modularity, compatibility, and improved performance, and accelerates data loading in both training and testing.
Coarse-to-Fine Sparse Transformer for Hyperspectral Image Reconstruction
- Computer Science
- ECCV
- 2022
A novel Transformer-based method, the coarse-to-fine sparse Transformer (CST), is proposed that, for the first time, embeds HSI sparsity into deep learning for HSI reconstruction; comprehensive experiments show that CST significantly outperforms state-of-the-art methods while requiring lower computational cost.
Towards Robust and Adaptive Motion Forecasting: A Causal Representation Perspective
- Computer Science
- 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
- 2022
This work introduces a causal formalism of motion forecasting, which casts the problem as a dynamic process with three groups of latent variables, namely invariant variables, style confounders, and spurious features, and introduces a learning framework that treats each group separately.
Skeleton2Humanoid: Animating Simulated Characters for Physically-plausible Motion In-betweening
- Computer Science
- ACM Multimedia
- 2022
Experiments on the challenging LaFAN1 dataset show the proposed Skeleton2Humanoid system can outperform prior methods significantly in terms of both physical plausibility and accuracy.
References
SHOWING 1-10 OF 75 REFERENCES
Personalizing Human Video Pose Estimation
- Computer Science
- 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
- 2016
A personalized ConvNet pose estimator that automatically adapts itself to the uniqueness of a person's appearance to improve pose estimation in long videos and outperforms the state of the art (including top ConvNet methods) by a large margin on three standard benchmarks, as well as on a new challenging YouTube video dataset.
Towards 3D Human Pose Estimation in the Wild: A Weakly-Supervised Approach
- Computer Science
- 2017 IEEE International Conference on Computer Vision (ICCV)
- 2017
A weakly-supervised transfer learning method that uses mixed 2D and 3D labels in a unified deep neural network that presents a two-stage cascaded structure to regularize the 3D pose prediction, which is effective in the absence of ground truth depth labels.
End-to-End Trainable Multi-Instance Pose Estimation with Transformers
- Computer Science
- ArXiv
- 2021
This model is the first end-to-end trainable multi-instance pose estimation method, and it is hoped that it will serve as a simple and promising alternative to other bottom-up and top-down approaches.
2D Human Pose Estimation: New Benchmark and State of the Art Analysis
- Computer Science
- 2014 IEEE Conference on Computer Vision and Pattern Recognition
- 2014
A novel benchmark "MPII Human Pose" is introduced that makes a significant advance in terms of diversity and difficulty, a contribution that is required for future developments in human body models.
OpenPose: Realtime Multi-Person 2D Pose Estimation Using Part Affinity Fields
- Computer Science
- IEEE Transactions on Pattern Analysis and Machine Intelligence
- 2021
OpenPose is released, the first open-source realtime system for multi-person 2D pose detection, including body, foot, hand, and facial keypoints, and the first combined body and foot keypoint detector, based on an internal annotated foot dataset.
TFPose: Direct Human Pose Estimation with Transformers
- Computer Science
- ArXiv
- 2021
A human pose estimation framework that solves the task in a regression-based fashion and can inherently take advantage of the structured relationship between keypoints, bypassing the drawbacks of heatmap-based pose estimation methods.
Towards Accurate Multi-person Pose Estimation in the Wild
- Computer Science
- 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
- 2017
This work proposes a method for multi-person detection and 2-D pose estimation that achieves state-of-the-art results on the challenging COCO keypoints task by using a novel form of keypoint-based Non-Maximum-Suppression (NMS), instead of the cruder box-level NMS, and by introducing a novel aggregation procedure to obtain highly localized keypoint predictions.
3D Human Pose Estimation in Video With Temporal Convolutions and Semi-Supervised Training
- Computer Science
- 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
- 2019
In this work, we demonstrate that 3D poses in video can be effectively estimated with a fully convolutional model based on dilated temporal convolutions over 2D keypoints. We also introduce…
Learning Feature Pyramids for Human Pose Estimation
- Computer Science
- 2017 IEEE International Conference on Computer Vision (ICCV)
- 2017
This work designs Pyramid Residual Modules (PRMs) to enhance the scale invariance of DCNNs and provides a theoretical derivation to extend the current weight initialization scheme to multi-branch network structures.
Exemplar Fine-Tuning for 3D Human Model Fitting Towards In-the-Wild 3D Human Pose Estimation
- Computer Science
- 2021 International Conference on 3D Vision (3DV)
- 2021
This paper augments existing 2D datasets with high-quality 3D pose fits obtained with Exemplar Fine-Tuning (EFT), and shows that EFT produces 3D annotations that result in better downstream performance and are qualitatively preferable in an extensive human-based assessment.