Test-Time Training with Masked Autoencoders

Yossi Gandelsman, Yu Sun, Xinlei Chen, Alexei A. Efros

Test-time training adapts to a new test distribution on the fly by optimizing a model for each test input using self-supervision. In this paper, we use masked autoencoders for this one-sample learning problem. Empirically, our simple method improves generalization on many visual benchmarks for distribution shifts. Theoretically, we characterize this improvement in terms of the bias-variance trade-off.

Test-Time Adaptation via Conjugate Pseudo-labels

This paper analyzes test-time adaptation through the lens of the training loss's convex conjugate function and shows that, under natural conditions, this (unsupervised) conjugate function can be viewed as a good local approximation to the original supervised loss; indeed, it recovers the “best” losses found by meta-learning.

ActMAD: Activation Matching to Align Distributions for Test-Time-Training

This work proposes to perform test-time adaptation via Activation Matching (ActMAD), which analyzes the activations of the model and aligns the activation statistics of the OOD test data to those of the training data, modeling the distribution of each feature in multiple layers across the network.

Masked Unsupervised Self-training for Label-free Image Classification

This paper proposes Masked Unsupervised Self-Training (MUST), a new unsupervised adaptation method that leverages two different and complementary sources of training signals: pseudo-labels and raw images. MUST jointly optimizes three objectives to learn both class-level global features and pixel-level local features, and enforces a regularization between the two.

EcoTTA: Memory-Efficient Continual Test-time Adaptation via Self-distilled Regularization

This paper presents a simple yet effective approach that improves continual test-time adaptation (TTA) in a memory-efficient manner and outperforms other state-of-the-art methods on various benchmarks for image classification and semantic segmentation tasks.

MATE: Masked Autoencoders are Online 3D Test-Time Learners

MATE is the first test-time training method designed for 3D data. It makes deep networks trained for point cloud classification robust to distribution shifts occurring in test data and can adapt effectively given as few as 5% of the tokens of each test sample, making it extremely lightweight.

Test-time adaptation with slot-centric models

This work proposes Slot-TTA, a semi-supervised instance segmentation model equipped with a slot-centric image rendering component that is adapted per scene at test time through gradient descent on reconstruction or novel view synthesis objectives, and shows that test-time adaptation greatly improves segmentation in out-of-distribution scenes.

Surgical Fine-Tuning Improves Adaptation to Distribution Shifts

Theoretically, it is proved that for two-layer neural networks in an idealized setting, first-layer tuning can outperform fine-tuning all layers, and it is shown that in such settings, selectively fine-tuning a subset of layers matches or outperforms commonly used fine-tuning approaches.

Video Test-Time Adaptation for Action Recognition

This work proposes an approach tailored to spatio-temporal models that can adapt on a single video sample per step. It consists of a feature distribution alignment technique that aligns online estimates of test set statistics towards the training statistics.

Towards Understanding GD with Hard and Conjugate Pseudo-labels for Test-Time Adaptation

This work considers a setting in which a model needs to adapt to a new domain under distribution shifts, given that only unlabeled test samples from the new domain are accessible at test time, and shows that for square loss, GD with conjugate labels converges to an $\epsilon$-optimal predictor under a Gaussian model for any arbitrarily small $\epsilon$, while GD with hard pseudo-labels fails in this task.

Project and Probe: Sample-Efficient Domain Adaptation by Interpolating Orthogonal Features

A lightweight, sample-efficient approach that learns a diverse set of features and adapts to a target distribution by interpolating these features with a small target dataset, resulting in better generalization due to a favorable bias-variance tradeoff.



Test-Time Training with Self-Supervision for Generalization under Distribution Shifts

This work turns a single unlabeled test sample into a self-supervised learning problem, on which the model parameters are updated before making a prediction, which leads to improvements on diverse image classification benchmarks aimed at evaluating robustness to distribution shifts.

Tent: Fully Test-Time Adaptation by Entropy Minimization

Tent optimizes the model for confidence, as measured by the entropy of its predictions. It reduces generalization error for image classification on corrupted ImageNet and CIFAR-10/100 and reaches a new state-of-the-art error on ImageNet-C.

TTT++: When Does Self-Supervised Test-Time Training Fail or Thrive?

This work introduces a test-time feature alignment strategy that uses offline feature summarization and online moment matching to regularize adaptation without revisiting training data. The results indicate that storing and exploiting extra information, in addition to model parameters, can be a promising direction towards robust test-time adaptation.

Extracting and composing robust features with denoising autoencoders

This work introduces and motivates a new training principle for unsupervised learning of a representation based on the idea of making the learned representations robust to partial corruption of the input pattern.

RandAugment: Practical data augmentation with no separate search

RandAugment can be used uniformly across different tasks and datasets and works out of the box, matching or surpassing all previous learned augmentation approaches on CIFAR-10, CIFAR-100, SVHN, and ImageNet.

Online domain adaptation of a pre-trained cascade of classifiers

This work presents an on-line approach for rapidly adapting a “black box” classifier to a new test data set without retraining the classifier or examining the original optimization criterion.

Co-Training for Domain Adaptation

This work presents CODA (Co-training for domain adaptation), an algorithm that bridges the gap between source and target domains by slowly adding to the training set both the target features and the instances on which the current algorithm is most confident.

Learning Transferable Features with Deep Adaptation Networks

A new Deep Adaptation Network (DAN) architecture is proposed, which generalizes deep convolutional neural networks to the domain adaptation scenario. It can learn transferable features with statistical guarantees and can scale linearly via an unbiased estimate of the kernel embedding.

Improving the Robustness of Deep Neural Networks via Stability Training

This paper presents a general stability training method to stabilize deep networks against small input distortions that result from various types of common image processing, such as compression, rescaling, and cropping.

Self-Supervised Policy Adaptation during Deployment

This work explores the use of self-supervision to allow the policy to continue training after deployment without using any rewards. The method improves generalization in 25 out of 30 environments across various tasks and outperforms domain randomization on a majority of environments.