• Publications
  • Influence
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
Vision Transformer (ViT) attains excellent results compared to state-of-the-art convolutional networks while requiring substantially fewer computational resources to train. Expand
Bayesian Active Learning for Classification and Preference Learning
This work proposes an approach that expresses information gain in terms of predictive entropies, and applies this method to the Gaussian Process Classier (GPC), and makes minimal approximations to the full information theoretic objective. Expand
Parameter-Efficient Transfer Learning for NLP
To demonstrate adapter's effectiveness, the recently proposed BERT Transformer model is transferred to 26 diverse text classification tasks, including the GLUE benchmark, and adapter attain near state-of-the-art performance, whilst adding only a few parameters per task. Expand
Big Transfer (BiT): General Visual Representation Learning
By combining a few carefully selected components, and transferring using a simple heuristic, Big Transfer achieves strong performance on over 20 datasets and performs well across a surprisingly wide range of data regimes -- from 1 example per class to 1M total examples. Expand
Self-Supervised GANs via Auxiliary Rotation Loss
This work allows the networks to collaborate on the task of representation learning, while being adversarial with respect to the classic GAN game, and takes a step towards bridging the gap between conditional and unconditional GANs. Expand
MLP-Mixer: An all-MLP Architecture for Vision
It is shown that while convolutions and attention are both sufficient for good performance, neither of them are necessary, and MLP-Mixer, an architecture based exclusively on multi-layer perceptrons (MLPs), attains competitive scores on image classification benchmarks. Expand
Probabilistic Matrix Factorization with Non-random Missing Data
A probabilistic matrix factorization model for collaborative filtering that learns from data that is missing not at random (MNAR) to obtain improved performance over state-of-the-art methods when predicting the ratings and when modeling the data observation process. Expand
Collaborative Gaussian Processes for Preference Learning
A new model based on Gaussian processes for learning pair-wise preferences expressed by multiple users is presented which allows for supervised GP learning of user preferences with unsupervised dimensionality reduction for multi-user systems. Expand
Ask the Right Questions: Active Question Reformulation with Reinforcement Learning
This work proposes an agent that sits between the user and a black box QA system and learns to reformulate questions to elicit the best possible answers, and finds that successful question reformulations look quite different from natural language paraphrases. Expand
A Large-scale Study of Representation Learning with the Visual Task Adaptation Benchmark
Representation learning promises to unlock deep learning for the long tail of vision tasks without expensive labelled datasets. Yet, the absence of a unified evaluation for general visualExpand