# Test-Time Training with Masked Autoencoders

```bibtex
@article{Gandelsman2022TestTimeTW,
  title={Test-Time Training with Masked Autoencoders},
  author={Yossi Gandelsman and Yu Sun and Xinlei Chen and Alexei A. Efros},
  journal={ArXiv},
  year={2022},
  volume={abs/2209.07522}
}
```
• Published 15 September 2022 (ArXiv)
Test-time training adapts to a new test distribution on the fly by optimizing a model for each test input using self-supervision. In this paper, we use masked autoencoders for this one-sample learning problem. Empirically, our simple method improves generalization on many visual benchmarks for distribution shifts. Theoretically, we characterize this improvement in terms of the bias-variance trade-off.
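The one-sample learning loop the abstract describes, adapting the model on each test input via a masked-reconstruction loss before predicting, can be sketched as follows. This is a toy NumPy stand-in: a single linear map plays the role of the paper's ViT encoder and MAE decoder, and all sizes and hyperparameters are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for the paper's ViT encoder + MAE decoder: a single linear
# map W that reconstructs a feature vector from a masked copy of itself.
W = rng.normal(scale=0.1, size=(8, 8))

def masked(x, ratio=0.75):
    """Zero out a random ~75% of the entries (stand-in for patch masking)."""
    return x * (rng.random(x.shape) > ratio)

def ttt_step(W, x, lr=0.2):
    """One gradient step on the self-supervised reconstruction loss,
    computed from this single test input only."""
    xm = masked(x)
    err = xm @ W - x                    # reconstruction residual
    grad = 2.0 * xm.T @ err / x.size    # d/dW of the mean squared error
    return W - lr * grad

def avg_recon_loss(W, x, trials=200):
    """Reconstruction loss averaged over random masks."""
    return float(np.mean([np.mean((masked(x) @ W - x) ** 2)
                          for _ in range(trials)]))

x = rng.normal(size=(1, 8))             # one "test image"
before = avg_recon_loss(W, x)
for _ in range(50):                     # test-time training on this sample
    W = ttt_step(W, x)
after = avg_recon_loss(W, x)            # lower than `before`
```

The adapted weights are then used to predict on this one sample; in the paper the classifier head shares the adapted encoder, which this sketch omits.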

## Citations (15)

• (ArXiv, 2022) This paper analyzes test-time adaptation through the lens of the training loss's convex conjugate function, and shows that under natural conditions this (unsupervised) conjugate can be viewed as a good local approximation to the original supervised loss; indeed, it recovers the “best” losses found by meta-learning.
• (ArXiv, 2022) This work proposes to perform adaptation via Activation Matching (ActMAD), which aligns the activation statistics of OOD test data to those of the training data, modeling the distribution of each feature across multiple layers of the network.
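The activation-statistics alignment idea in the ActMAD summary can be illustrated with a minimal NumPy sketch. Note the closed-form re-standardization below is a simplification: ActMAD itself reaches the training statistics through gradient updates to the model weights, and all data here is synthetic.

```python
import numpy as np

rng = np.random.default_rng(1)

# Per-channel statistics recorded offline on (synthetic) training activations.
train_acts = rng.normal(loc=0.0, scale=1.0, size=(1000, 16))
mu_train, sd_train = train_acts.mean(axis=0), train_acts.std(axis=0)

# OOD test activations: shifted and rescaled relative to training.
test_acts = rng.normal(loc=2.0, scale=3.0, size=(200, 16))

def align(acts, mu_ref, sd_ref):
    """Match the per-channel mean and std of `acts` to reference statistics.
    (Closed form; ActMAD instead minimizes the statistic mismatch by
    gradient descent on the network parameters.)"""
    mu, sd = acts.mean(axis=0), acts.std(axis=0)
    return (acts - mu) / sd * sd_ref + mu_ref

aligned = align(test_acts, mu_train, sd_train)  # moments now match training
```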
• (2022) This paper proposes Masked Unsupervised Self-Training (MUST), a new unsupervised adaptation method that leverages two complementary sources of training signal, pseudo-labels and raw images, jointly optimizing three objectives to learn both class-level global features and pixel-level local features while enforcing a regularization between the two.
• (ArXiv, 2023) This paper presents a simple yet effective approach that improves continual test-time adaptation (TTA) in a memory-efficient manner and outperforms other state-of-the-art methods on various image classification and semantic segmentation benchmarks.
• (ArXiv, 2022) MATE is the first test-time-training method designed for 3D data; it makes deep networks trained for point cloud classification robust to distribution shifts in test data, and can adapt effectively given as few as 5% of the tokens of each test sample, making it extremely lightweight.
• (2022) Slot-TTA is a semi-supervised instance segmentation model equipped with a slot-centric image rendering component that is adapted per scene at test time through gradient descent on reconstruction or novel view synthesis objectives; test-time adaptation is shown to greatly improve segmentation in out-of-distribution scenes.
• (ArXiv, 2022) Theoretically, it is proved that for two-layer neural networks in an idealized setting, first-layer tuning can outperform fine-tuning all layers, and it is shown that in such settings selectively fine-tuning a subset of layers matches or outperforms commonly used fine-tuning approaches.
• (ArXiv, 2022) This work proposes an approach tailored to spatio-temporal models that can adapt on a single video sample at each step, consisting of a feature distribution alignment technique that aligns online estimates of test-set statistics towards the training statistics.
• (ArXiv, 2022) This work considers a setting in which a model must adapt to a new domain under distribution shift, given only unlabeled test samples from the new domain at test time, and shows that for the square loss, gradient descent with conjugate labels converges to an $\epsilon$-optimal predictor under a Gaussian model for any arbitrarily small $\epsilon$, while gradient descent with hard pseudo-labels fails at this task.
• (ArXiv, 2023) A lightweight, sample-efficient approach that learns a diverse set of features and adapts to a target distribution by interpolating these features with a small target dataset, resulting in better generalization due to a favorable bias-variance tradeoff.

## References

Showing 1–10 of 49 references.

• (ICML, 2020) This work turns a single unlabeled test sample into a self-supervised learning problem, on which the model parameters are updated before making a prediction, leading to improvements on diverse image classification benchmarks aimed at evaluating robustness to distribution shifts.
• (ICLR, 2021) Tent optimizes the model for confidence, as measured by the entropy of its predictions; it reduces generalization error for image classification on corrupted ImageNet and CIFAR-10/100 and reaches a new state-of-the-art error on ImageNet-C.
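Tent's core objective, minimizing the entropy of the model's own predictions at test time, can be sketched in a few lines of NumPy. As a simplification, the sketch descends on the logits of a single made-up prediction directly, whereas Tent updates the network's normalization (affine) parameters.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def entropy(z):
    p = softmax(z)
    return float(-(p * np.log(p)).sum())

def entropy_grad(z):
    """Analytic gradient of H(softmax(z)) w.r.t. the logits:
    dH/dz_k = -p_k (log p_k + H)."""
    p = softmax(z)
    H = -(p * np.log(p)).sum()
    return -p * (np.log(p) + H)

z = np.array([0.2, 0.1, -0.1])   # an uncertain prediction (made-up logits)
h_before = entropy(z)
for _ in range(100):             # confidence maximization by gradient descent
    z = z - 1.0 * entropy_grad(z)
h_after = entropy(z)             # lower entropy: a more confident prediction
```

Descending the entropy sharpens the predicted distribution around the class that was already most likely, which is why Tent needs no labels at test time.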
• (NeurIPS, 2021) A test-time feature alignment strategy using offline feature summarization and online moment matching, which regularizes adaptation without revisiting the training data; the results indicate that storing and exploiting extra information beyond the model parameters can be a promising direction towards robust test-time adaptation.
• (ICML, 2008) This work introduces and motivates a new training principle for unsupervised learning of a representation, based on the idea of making the learned representations robust to partial corruption of the input pattern.
• (ArXiv, 2019) RandAugment can be used uniformly across different tasks and datasets and works out of the box, matching or surpassing all previous learned augmentation approaches on CIFAR-10, CIFAR-100, SVHN, and ImageNet.
• (CVPR, 2011) This work presents an online approach for rapidly adapting a “black box” classifier to a new test data set without retraining the classifier or examining the original optimization criterion.
• (NIPS, 2011) An algorithm named CODA (Co-training for Domain Adaptation) that bridges the gap between source and target domains by slowly adding to the training set both target features and the instances on which the current algorithm is most confident.
• (ICML, 2015) A new Deep Adaptation Network (DAN) architecture is proposed, which generalizes deep convolutional neural networks to the domain adaptation scenario, can learn transferable features with statistical guarantees, and scales linearly via an unbiased estimate of the kernel embedding.
• (CVPR, 2016) This paper presents a general stability training method to stabilize deep networks against small input distortions resulting from common image processing operations such as compression, rescaling, and cropping.
• (ICLR, 2021) This work explores the use of self-supervision to allow a policy to continue training after deployment without using any rewards; it improves generalization in 25 out of 30 environments across various tasks and outperforms domain randomization on a majority of environments.