• Corpus ID: 235731535

Test-Time Personalization with a Transformer for Human Pose Estimation

  • Miao Hao, Yizhuo Li, Zonglin Di, Nitesh B. Gundavarapu, Xiaolong Wang
  • Neural Information Processing Systems
We propose to personalize a 2D human pose estimator given a set of test images of a person, without using any manual annotations. Although human pose estimation has advanced significantly, it is still very challenging for a model to generalize to unknown environments and unseen persons. Instead of using a fixed model for every test case, we adapt our pose estimator during test time to exploit person-specific information. We first train our model on diverse data with both a…
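The idea of adapting a model at test time on unlabeled images of one person can be sketched as below. This is a minimal illustration under stated assumptions, not the paper's method: the abstract is truncated before it names the training objective, so a simple reconstruction loss stands in for whatever label-free objective is used, and the names `self_supervised_loss`, `self_supervised_grad`, and `personalize` are hypothetical.

```python
# Toy test-time adaptation: shared encoder weights `w` are updated on
# unlabeled test inputs by gradient descent on a label-free loss.
# ASSUMPTION: the reconstruction loss is a stand-in objective chosen for
# illustration; it is not taken from the paper.

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def self_supervised_loss(w, d, x):
    """Squared error when reconstructing x from the scalar feature z = w . x."""
    z = dot(w, x)
    return sum((di * z - xi) ** 2 for di, xi in zip(d, x))

def self_supervised_grad(w, d, x):
    """Gradient of the loss with respect to the encoder weights w."""
    z = dot(w, x)
    c = sum(2.0 * (di * z - xi) * di for di, xi in zip(d, x))
    return [c * xj for xj in x]

def personalize(w, d, images, lr=0.01, steps=50):
    """Return a copy of w adapted to one person's unlabeled images.

    The decoder d stays frozen; only the shared encoder is updated.
    """
    w = list(w)  # copy so the generic model is left untouched
    for _ in range(steps):
        for x in images:
            g = self_supervised_grad(w, d, x)
            w = [wi - lr * gi for wi, gi in zip(w, g)]
    return w

if __name__ == "__main__":
    images = [[1.0, 2.0], [0.9, 2.1]]  # unlabeled "test images" of one person
    w_generic = [0.0, 0.0]             # encoder before adaptation
    d = [0.5, 1.0]                     # frozen decoder
    w_person = personalize(w_generic, d, images)
    before = sum(self_supervised_loss(w_generic, d, x) for x in images)
    after = sum(self_supervised_loss(w_person, d, x) for x in images)
    print(before, after)
```

The generic weights are copied rather than modified in place, so each new person can start from the same fixed model.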

Deep Learning-Based Human Pose Estimation: A Survey

A comprehensive survey of deep learning-based human pose estimation methods that analyzes the methodologies employed and summarizes and discusses recent work using a methodology-based taxonomy.

A new benchmark for group distribution shifts in hand grasp regression for object manipulation. Can meta-learning raise the bar?

A novel benchmark for object group distribution shifts in hand and object pose regression for object grasping is proposed and the hypothesis that meta-learning a baseline pose regression neural network can adapt to these shifts and generalize better to unknown objects is tested.

MM-TTA: Multi-Modal Test-Time Adaptation for 3D Semantic Segmentation

To design a framework that can take full advantage of multi-modality, where each modality provides regularized self-supervisory signals to other modalities, two complementary modules within and across the modalities are proposed.

Boost Test-Time Performance with Closed-Loop Inference

A general Closed-Loop Inference (CLI) method is proposed, which first devises a filtering criterion to identify hard-to-classify test samples that need additional inference loops and then constructs looped inference, so that the original erroneous predictions on these hard test samples can be corrected with little additional effort.

Meta-DMoE: Adapting to Domain Shift by Meta-Distillation from Mixture-of-Experts

This work proposes a novel framework for unsupervised test-time adaptation, which is formulated as a knowledge distillation process to address domain shift and incorporates Mixture-of-Experts (MoE) as teachers, where each expert is separately trained on different source domains to maximize their speciality.

A Survey on Vision Transformer

  • Kai Han, Yunhe Wang, D. Tao
  • Computer Science
    IEEE Transactions on Pattern Analysis and Machine Intelligence
  • 2022
This paper reviews vision transformer models by categorizing them by task and analyzing their advantages and disadvantages, and takes a brief look at the self-attention mechanism in computer vision, as it is the base component of the transformer.

Degradation-Aware Unfolding Half-Shuffle Transformer for Spectral Compressive Imaging

A principled Degradation-Aware Unfolding Framework (DAUF) that estimates parameters from the compressed image and physical mask and then uses these parameters to control each iteration, together with a novel Half-Shuffle Transformer (HST) that simultaneously captures local contents and non-local dependencies.

MST++: Multi-stage Spectral-wise Transformer for Efficient Spectral Reconstruction

  • Yuanhao Cai, Jing Lin, L. Gool
  • Environmental Science, Computer Science
    2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)
  • 2022
This work proposes a novel Transformer-based method, Multi-stage Spectral-wise Transformer (MST++), for efficient spectral reconstruction that significantly outperforms other state-of-the-art methods.

Improving ProtoNet for Few-Shot Video Object Recognition: Winner of ORBIT Challenge 2022

This work refactors and reimplements the official codebase to encourage modularity, compatibility, and improved performance, and accelerates data loading in both training and testing.

Coarse-to-Fine Sparse Transformer for Hyperspectral Image Reconstruction

A novel Transformer-based method, coarse-to-fine sparse Transformer (CST), is the first to embed HSI sparsity into deep learning for HSI reconstruction; comprehensive experiments show that CST significantly outperforms state-of-the-art methods at a lower computational cost.



Personalizing Human Video Pose Estimation

A personalized ConvNet pose estimator that automatically adapts itself to the uniqueness of a person's appearance to improve pose estimation in long videos and outperforms the state of the art (including top ConvNet methods) by a large margin on three standard benchmarks, as well as on a new challenging YouTube video dataset.

Towards 3D Human Pose Estimation in the Wild: A Weakly-Supervised Approach

A weakly-supervised transfer learning method that uses mixed 2D and 3D labels in a unified deep neural network with a two-stage cascaded structure to regularize the 3D pose prediction, which is effective in the absence of ground truth depth labels.

End-to-End Trainable Multi-Instance Pose Estimation with Transformers

This model is the first end-to-end trainable multi-instance pose estimation method, and it is hoped that it will serve as a simple and promising alternative to other bottom-up and top-down approaches.

2D Human Pose Estimation: New Benchmark and State of the Art Analysis

A novel benchmark "MPII Human Pose" is introduced that makes a significant advance in terms of diversity and difficulty, a contribution that is required for future developments in human body models.

OpenPose: Realtime Multi-Person 2D Pose Estimation Using Part Affinity Fields

OpenPose is released, the first open-source realtime system for multi-person 2D pose detection, including body, foot, hand, and facial keypoints, and the first combined body and foot keypoint detector, based on an internal annotated foot dataset.

TFPose: Direct Human Pose Estimation with Transformers

A human pose estimation framework that solves the task in a regression-based fashion and can inherently take advantage of the structured relationship between keypoints, bypassing the drawbacks of heatmap-based pose estimation methods.

Towards Accurate Multi-person Pose Estimation in the Wild

This work proposes a method for multi-person detection and 2-D pose estimation that achieves state-of-the-art results on the challenging COCO keypoints task by using a novel form of keypoint-based Non-Maximum-Suppression (NMS), instead of the cruder box-level NMS, and by introducing a novel aggregation procedure to obtain highly localized keypoint predictions.

3D Human Pose Estimation in Video With Temporal Convolutions and Semi-Supervised Training

In this work, we demonstrate that 3D poses in video can be effectively estimated with a fully convolutional model based on dilated temporal convolutions over 2D keypoints. We also introduce

Learning Feature Pyramids for Human Pose Estimation

This work designs Pyramid Residual Modules (PRMs) to enhance the scale invariance of DCNNs and provides a theoretical derivation that extends the current weight initialization scheme to multi-branch network structures.

Exemplar Fine-Tuning for 3D Human Model Fitting Towards In-the-Wild 3D Human Pose Estimation

This paper augments existing 2D datasets with high-quality 3D pose fits obtained via Exemplar Fine-Tuning (EFT), and shows that EFT produces 3D annotations that result in better downstream performance and are qualitatively preferable in an extensive human-based assessment.