An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
Vision Transformer (ViT) attains excellent results compared to state-of-the-art convolutional networks while requiring substantially fewer computational resources to train.
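As an illustration of the idea in the title, here is a minimal NumPy sketch of the patch-embedding step: the image is cut into 16x16 patches, each flattened and linearly projected into a token, so a 224x224 image becomes a sequence of 196 "words". The function name, dimensions, and random projection weights below are illustrative, not the paper's exact configuration.

```python
import numpy as np

def patchify_and_embed(image, patch_size=16, embed_dim=64, rng=None):
    """Split an image into non-overlapping patches and project each to a token.

    image: (H, W, C) array with H and W divisible by patch_size.
    Returns an array of shape (num_patches, embed_dim).
    """
    rng = np.random.default_rng(rng)
    h, w, c = image.shape
    gh, gw = h // patch_size, w // patch_size

    # Cut the image into a (gh * gw) grid of patch_size x patch_size patches
    # and flatten each patch into a vector.
    patches = image.reshape(gh, patch_size, gw, patch_size, c)
    patches = patches.transpose(0, 2, 1, 3, 4).reshape(gh * gw, -1)

    # Shared linear projection (illustrative random weights; in a trained
    # model these are learned parameters).
    proj = rng.normal(0, 0.02, size=(patches.shape[1], embed_dim))
    return patches @ proj

tokens = patchify_and_embed(np.zeros((224, 224, 3)), patch_size=16, embed_dim=64)
print(tokens.shape)  # (196, 64): a 14 x 14 grid of patches, each a 64-d "word"
```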
Object-Centric Learning with Slot Attention
This paper presents an architectural component that interfaces with perceptual representations, such as the output of a convolutional neural network, and produces a set of task-dependent abstract representations ("slots") that are exchangeable and can bind to any object in the input by specializing through a competitive procedure over multiple rounds of attention.
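A heavily simplified sketch of that competitive procedure, assuming plain dot-product attention and omitting the learned projections, GRU update, and layer normalization of the actual Slot Attention module; all names and sizes are illustrative.

```python
import numpy as np

def softmax(x, axis):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def slot_attention(inputs, num_slots=4, iters=3, rng=None):
    """Simplified Slot Attention: slots compete for input features via a
    softmax over the slot axis, then each slot is updated with the weighted
    mean of the inputs it won.

    inputs: (N, D) array of perceptual features (e.g. CNN feature-map cells).
    Returns (num_slots, D) slot representations.
    """
    rng = np.random.default_rng(rng)
    n, d = inputs.shape
    # Slots start from random noise, so they are exchangeable and only
    # specialize through the competitive attention rounds below.
    slots = rng.normal(size=(num_slots, d))

    for _ in range(iters):
        # Dot-product attention: queries = slots, keys/values = inputs.
        logits = slots @ inputs.T / np.sqrt(d)          # (num_slots, N)
        # Softmax over the *slot* axis: slots compete for each input.
        attn = softmax(logits, axis=0)
        # Normalize per slot, then take a weighted mean of the inputs.
        weights = attn / (attn.sum(axis=1, keepdims=True) + 1e-8)
        slots = weights @ inputs                        # (num_slots, D)
    return slots

slots = slot_attention(np.random.default_rng(0).normal(size=(64, 32)))
print(slots.shape)  # (4, 32)
```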
An overview of the BIOASQ large-scale biomedical semantic indexing and question answering competition
- G. Tsatsaronis, Georgios Balikas, G. Paliouras
- Computer Science, Medicine
- BMC Bioinformatics
- 30 April 2015
Overall, BioASQ helped obtain a unified view of how techniques from text classification, semantic indexing, document and passage retrieval, question answering, and text summarization can be combined to allow biomedical experts to obtain concise, user-understandable answers to questions reflecting their real information needs.
Axial Attention in Multidimensional Transformers
The Axial Transformer is proposed: a self-attention-based autoregressive model for images and other data organized as high-dimensional tensors that maintains both full expressiveness over joint data distributions and ease of implementation with standard deep learning frameworks, while requiring reasonable memory and computation.
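A minimal NumPy sketch of the axial idea on a 2-D feature map: attend along rows, then along columns, instead of over all positions jointly. It omits the causal masking and the learned query/key/value projections of the actual autoregressive Axial Transformer; names and shapes are illustrative.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention_1d(x):
    """Plain dot-product self-attention over the second-to-last axis of x."""
    d = x.shape[-1]
    scores = softmax(x @ np.swapaxes(x, -1, -2) / np.sqrt(d), axis=-1)
    return scores @ x

def axial_attention(x):
    """Attend along rows, then along columns, instead of over all H*W
    positions at once: cost drops from O((H*W)^2) to O(H*W*(H+W)).

    x: (H, W, D) feature map. Returns an array of the same shape.
    """
    # Row attention: each row of W positions attends within itself.
    x = attention_1d(x)
    # Column attention: transpose so columns become the attended axis.
    x = np.swapaxes(attention_1d(np.swapaxes(x, 0, 1)), 0, 1)
    return x

out = axial_attention(np.random.default_rng(0).normal(size=(8, 8, 16)))
print(out.shape)  # (8, 8, 16)
```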
Making Neural QA as Simple as Possible but not Simpler
This work proposes a simple heuristic that guides the development of neural baseline systems for the extractive QA task and finds that there are two ingredients necessary for building a high-performing neural QA system: the awareness of question words while processing the context and a composition function that goes beyond simple bag-of-words modeling, such as recurrent neural networks.
Scaling Autoregressive Video Models
It is shown that conceptually simple autoregressive video generation models based on a three-dimensional self-attention mechanism achieve competitive results across multiple metrics on popular benchmark datasets, for which they produce continuations of high fidelity and realism.
FastQA: A Simple and Efficient Neural Architecture for Question Answering
This work proposes a simple heuristic that guided the development of FastQA, an efficient end-to-end neural model for question answering that is very competitive with existing models, and demonstrates that an extended version (FastQAExt) achieves state-of-the-art results on recent benchmark datasets, outperforming most existing models.
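As a toy illustration of the "question awareness" ingredient named above, here is a sketch of a binary word-in-question feature that such a model could concatenate to each context word embedding before the recurrent encoder; FastQA itself uses richer variants of this feature, and the function below is a hypothetical simplification.

```python
def word_in_question_features(context_tokens, question_tokens):
    """Binary word-in-question feature: 1.0 if a context token also appears
    in the question, else 0.0. Concatenated to each context word embedding,
    it gives the encoder awareness of question words while processing the
    context.
    """
    question = {t.lower() for t in question_tokens}
    return [1.0 if t.lower() in question else 0.0 for t in context_tokens]

ctx = "The BioASQ challenge started in 2013".split()
q = "When did the BioASQ challenge start ?".split()
print(word_in_question_features(ctx, q))  # [1.0, 1.0, 1.0, 0.0, 0.0, 0.0]
```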
Neural Domain Adaptation for Biomedical Question Answering
This work adapts a neural QA system trained on a large open-domain dataset to a biomedical dataset by employing various transfer learning techniques and achieves state-of-the-art results on factoid questions and competitive results on list questions.
Colorization Transformer
The Colorization Transformer is presented, a novel approach for diverse high-fidelity image colorization based on self-attention that outperforms the previous state of the art on colorizing ImageNet, both on FID and in a human evaluation on Mechanical Turk.
Multi-Objective Optimization for the Joint Disambiguation of Nouns and Named Entities
A novel approach to joint word sense disambiguation (WSD) and entity linking (EL) that combines a set of complementary objectives in an extensible multi-objective formalism is presented.