• Corpus ID: 235731535

Test-Time Personalization with a Transformer for Human Pose Estimation

Miao Hao, Yizhuo Li, Zonglin Di, Nitesh B. Gundavarapu, Xiaolong Wang

We propose to personalize a 2D human pose estimator given a set of test images of a person without using any manual annotations. While there has been significant progress in human pose estimation, it is still very challenging for a model to generalize to unknown environments and unseen persons. Instead of using a fixed model for every test case, we adapt our pose estimator during test time to exploit person-specific information. We first train our model on diverse data with both a… 


Deep Learning-Based Human Pose Estimation: A Survey

A comprehensive survey of deep learning-based human pose estimation methods that analyzes the methodologies employed and summarizes and discusses recent works using a methodology-based taxonomy.

A new benchmark for group distribution shifts in hand grasp regression for object manipulation. Can meta-learning raise the bar?

A novel benchmark for object group distribution shifts in hand and object pose regression for object grasping is proposed, and it is used to test the hypothesis that meta-learning a baseline pose regression neural network lets it adapt to these shifts and generalize better to unknown objects.

Boost Test-Time Performance with Closed-Loop Inference

A general Closed-Loop Inference (CLI) method is proposed, which first devises a filtering criterion to identify hard-classified test samples that need additional inference loops and then constructs looped inference, so that the original erroneous predictions on these hard test samples can be corrected with little additional effort.
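
The filtering step described above can be illustrated with a minimal Python sketch. The criterion used here, flagging a sample as hard when its maximum softmax probability falls below a threshold, is an assumption chosen for illustration and is not necessarily the criterion CLI uses; `select_hard_samples` and its `threshold` default are hypothetical names and values.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one vector of class logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def select_hard_samples(
    batch_logits: Sequence[Sequence[float]], threshold: float = 0.6
) -> List[int]:
    """Return indices of test samples flagged for an extra inference loop.

    Hypothetical criterion: a sample is "hard" when its top-class softmax
    probability is below `threshold`.
    """
    return [
        i for i, logits in enumerate(batch_logits)
        if max(softmax(logits)) < threshold
    ]


if __name__ == "__main__":
    batch = [[5.0, 0.0, 0.0], [1.0, 0.9, 0.8]]
    print(select_hard_samples(batch))  # [1]: the second sample is low-confidence
```

Only the flagged indices would then be routed through the additional inference loops; confident samples keep their first-pass prediction.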

A Survey on Vision Transformer

  • Kai Han, Yunhe Wang, … D. Tao
  • Computer Science
    IEEE Transactions on Pattern Analysis and Machine Intelligence
  • 2022
This paper reviews vision transformer models, categorizing them by task and analyzing their advantages and disadvantages; it also takes a brief look at the self-attention mechanism in computer vision, since self-attention is the base component of the transformer.

Degradation-Aware Unfolding Half-Shuffle Transformer for Spectral Compressive Imaging

A principled Degradation-Aware Unfolding Framework (DAUF) that estimates parameters from the compressed image and physical mask and then uses these parameters to control each iteration, paired with a novel, customized Half-Shuffle Transformer (HST) that simultaneously captures local contents and non-local dependencies.

MST++: Multi-stage Spectral-wise Transformer for Efficient Spectral Reconstruction

  • Yuanhao Cai, Jing Lin, … L. Gool
  • Environmental Science, Computer Science
    2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)
  • 2022
This work proposes a novel Transformer-based method, Multi-stage Spectral-wise Transformer (MST++), for efficient spectral reconstruction that significantly outperforms other state-of-the-art methods.

Improving ProtoNet for Few-Shot Video Object Recognition: Winner of ORBIT Challenge 2022

This work refactors and re-implements the official codebase to encourage modularity, compatibility, and improved performance, and accelerates data loading in both training and testing.

Coarse-to-Fine Sparse Transformer for Hyperspectral Image Reconstruction

A novel Transformer-based method, the coarse-to-fine sparse Transformer (CST), which is the first to embed hyperspectral image (HSI) sparsity into deep learning for HSI reconstruction; comprehensive experiments show that CST significantly outperforms state-of-the-art methods while requiring lower computational cost.

Towards Robust and Adaptive Motion Forecasting: A Causal Representation Perspective

This work introduces a causal formalism of motion forecasting, which casts the problem as a dynamic process with three groups of latent variables, namely invariant variables, style confounders, and spurious features, and introduces a learning framework that treats each group separately.

Skeleton2Humanoid: Animating Simulated Characters for Physically-plausible Motion In-betweening

Experiments on the challenging LaFAN1 dataset show the proposed Skeleton2Humanoid system can outperform prior methods significantly in terms of both physical plausibility and accuracy.



Personalizing Human Video Pose Estimation

A personalized ConvNet pose estimator that automatically adapts itself to the uniqueness of a person's appearance to improve pose estimation in long videos and outperforms the state of the art (including top ConvNet methods) by a large margin on three standard benchmarks, as well as on a new challenging YouTube video dataset.

Towards 3D Human Pose Estimation in the Wild: A Weakly-Supervised Approach

A weakly-supervised transfer learning method that uses mixed 2D and 3D labels in a unified deep neural network with a two-stage cascaded structure to regularize the 3D pose prediction, which is effective in the absence of ground-truth depth labels.

End-to-End Trainable Multi-Instance Pose Estimation with Transformers

This model is the first end-to-end trainable multi-instance pose estimation method, and the authors hope it will serve as a simple and promising alternative to other bottom-up and top-down approaches.

2D Human Pose Estimation: New Benchmark and State of the Art Analysis

A novel benchmark "MPII Human Pose" is introduced that makes a significant advance in terms of diversity and difficulty, a contribution that is required for future developments in human body models.

OpenPose: Realtime Multi-Person 2D Pose Estimation Using Part Affinity Fields

OpenPose is released, the first open-source realtime system for multi-person 2D pose detection, including body, foot, hand, and facial keypoints, and the first combined body and foot keypoint detector, based on an internal annotated foot dataset.

TFPose: Direct Human Pose Estimation with Transformers

A human pose estimation framework that solves the task in a regression-based fashion and can inherently take advantage of the structured relationships between keypoints, bypassing the drawbacks of heatmap-based pose estimation methods.

Towards Accurate Multi-person Pose Estimation in the Wild

This work proposes a method for multi-person detection and 2D pose estimation that achieves state-of-the-art results on the challenging COCO keypoints task by using a novel form of keypoint-based Non-Maximum Suppression (NMS) instead of the cruder box-level NMS, and by introducing a novel aggregation procedure to obtain highly localized keypoint predictions.
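
The idea of keypoint-based NMS can be sketched in Python as greedy suppression driven by a pose-to-pose similarity instead of box IoU. This sketch assumes Object Keypoint Similarity (OKS), the similarity behind the COCO keypoint metric, treats every keypoint as visible, and uses the already-kept pose's area as the scale; the paper's exact formulation may differ, and `oks`, `keypoint_nms`, and the per-keypoint constants `kappas` are illustrative names.

```python
import math
from typing import List, Sequence, Tuple

Pose = Sequence[Tuple[float, float]]  # one (x, y) per keypoint


def oks(pose_a: Pose, pose_b: Pose, area: float, kappas: Sequence[float]) -> float:
    """Object Keypoint Similarity, treating every keypoint as visible.

    Each keypoint contributes exp(-d^2 / (2 * area * kappa^2)); the result is
    their mean, so identical poses score 1.0.
    """
    total = 0.0
    for (xa, ya), (xb, yb), k in zip(pose_a, pose_b, kappas):
        d2 = (xa - xb) ** 2 + (ya - yb) ** 2
        total += math.exp(-d2 / (2.0 * area * k * k))
    return total / len(kappas)


def keypoint_nms(
    poses: Sequence[Pose],
    scores: Sequence[float],
    areas: Sequence[float],
    kappas: Sequence[float],
    oks_threshold: float = 0.5,
) -> List[int]:
    """Greedy NMS: visit poses in descending score order and drop any whose
    OKS with an already-kept pose exceeds `oks_threshold`."""
    order = sorted(range(len(poses)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        if all(oks(poses[i], poses[j], areas[j], kappas) <= oks_threshold for j in keep):
            keep.append(i)
    return keep


if __name__ == "__main__":
    p0 = [(10.0, 10.0), (20.0, 20.0)]
    p1 = [(10.5, 10.0), (20.0, 20.5)]      # near-duplicate of p0
    p2 = [(100.0, 100.0), (110.0, 110.0)]  # a different person
    print(keypoint_nms([p0, p1, p2], [0.9, 0.8, 0.7], [100.0] * 3, [0.1, 0.1]))  # [0, 2]
```

Unlike box IoU, OKS can separate two people whose boxes overlap heavily but whose joints are far apart, which is the usual motivation for suppressing at the keypoint level.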

3D Human Pose Estimation in Video With Temporal Convolutions and Semi-Supervised Training

In this work, we demonstrate that 3D poses in video can be effectively estimated with a fully convolutional model based on dilated temporal convolutions over 2D keypoints. We also introduce…
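
The benefit of dilated temporal convolutions can be made concrete with receptive-field arithmetic: a stack of stride-1 1D convolutions with kernel sizes k_l and dilations d_l sees 1 + sum over l of (k_l - 1) * d_l input frames. The Python sketch below computes this and includes a plain "valid" dilated convolution over a single keypoint-coordinate track. The kernel size 3 with dilations 1, 3, 9, 27, 81 is a hypothetical configuration for illustration, not a claim about the paper's architecture.

```python
from typing import List, Sequence


def receptive_field(kernel_sizes: Sequence[int], dilations: Sequence[int]) -> int:
    """Frames seen by one output of stacked stride-1 dilated 1D convolutions."""
    if len(kernel_sizes) != len(dilations):
        raise ValueError("kernel_sizes and dilations must have the same length")
    return 1 + sum((k - 1) * d for k, d in zip(kernel_sizes, dilations))


def dilated_conv1d(signal: Sequence[float], weights: Sequence[float], dilation: int) -> List[float]:
    """'Valid' dilated 1D convolution (cross-correlation, as in deep-learning layers).

    Output t combines signal[t], signal[t + dilation], ..., so each layer
    shortens the sequence by (len(weights) - 1) * dilation frames.
    """
    span = (len(weights) - 1) * dilation
    return [
        sum(w * signal[t + j * dilation] for j, w in enumerate(weights))
        for t in range(len(signal) - span)
    ]


if __name__ == "__main__":
    dilations = [1, 3, 9, 27, 81]               # hypothetical configuration
    print(receptive_field([3] * 5, dilations))  # 243
    print(receptive_field([3] * 5, [1] * 5))    # 11 without dilation

    # A 243-frame track of one keypoint coordinate collapses to a single output.
    x: List[float] = [float(t) for t in range(243)]
    for d in dilations:
        x = dilated_conv1d(x, [1 / 3, 1 / 3, 1 / 3], d)
    print(len(x))  # 1
```

Under that assumed configuration, five dilated layers cover 243 frames, whereas five undilated kernel-3 layers cover only 11; growing the dilation geometrically is what lets a shallow fully convolutional model use long temporal context.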

Learning Feature Pyramids for Human Pose Estimation

This work designs Pyramid Residual Modules (PRMs) to enhance the scale invariance of DCNNs and provides a theoretical derivation that extends the current weight initialization scheme to multi-branch network structures.

Exemplar Fine-Tuning for 3D Human Model Fitting Towards In-the-Wild 3D Human Pose Estimation

This paper augments existing 2D datasets with high-quality 3D pose fits obtained via Exemplar Fine-Tuning (EFT), and shows that EFT produces 3D annotations that yield better downstream performance and are qualitatively preferable in an extensive human-based assessment.