Publications
Linformer: Self-Attention with Linear Complexity
TLDR
This paper demonstrates that the self-attention mechanism of the Transformer can be approximated by a low-rank matrix, and proposes a new self-attention mechanism which reduces the overall self-attention complexity from $O(n^2)$ to $O(n)$ in both time and space.
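As a rough illustration of the low-rank idea (a sketch, not the authors' exact implementation), the snippet below projects the length-$n$ key and value sequences down to a fixed length $k$ before attention, so the attention map is $n \times k$ rather than $n \times n$; the module name, dimensions, and initialization are illustrative assumptions.

```python
import torch
import torch.nn as nn

class LowRankSelfAttention(nn.Module):
    """Sketch of Linformer-style attention: project keys/values from
    sequence length n down to a fixed length k, so the attention map
    is n x k instead of n x n (linear in n for fixed k)."""
    def __init__(self, d_model: int, n: int, k: int):
        super().__init__()
        self.q = nn.Linear(d_model, d_model)
        self.kv = nn.Linear(d_model, 2 * d_model)
        # E, F: learned (k x n) projections applied along the sequence axis.
        self.E = nn.Parameter(torch.randn(k, n) / n ** 0.5)
        self.F = nn.Parameter(torch.randn(k, n) / n ** 0.5)
        self.scale = d_model ** -0.5

    def forward(self, x):                               # x: (batch, n, d_model)
        q = self.q(x)                                   # (batch, n, d)
        keys, values = self.kv(x).chunk(2, dim=-1)      # each (batch, n, d)
        keys = torch.einsum('kn,bnd->bkd', self.E, keys)      # (batch, k, d)
        values = torch.einsum('kn,bnd->bkd', self.F, values)  # (batch, k, d)
        attn = torch.softmax(q @ keys.transpose(-2, -1) * self.scale, dim=-1)  # (batch, n, k)
        return attn @ values                            # (batch, n, d)

x = torch.randn(2, 128, 64)
out = LowRankSelfAttention(d_model=64, n=128, k=32)(x)
print(out.shape)  # torch.Size([2, 128, 64])
```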
CLEAR: Contrastive Learning for Sentence Representation
TLDR
This paper proposes Contrastive LEArning for sentence Representation (CLEAR), which employs multiple sentence-level augmentation strategies in order to learn a noise-invariant sentence representation and investigates the key reasons that make contrastive learning effective through numerous experiments.
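A minimal sketch of the kind of sentence-level contrastive objective described here (an NT-Xent/InfoNCE-style loss over two augmented views of each sentence in a batch); the temperature and the assumption that sentences are already encoded are placeholders, not the paper's exact setup.

```python
import torch
import torch.nn.functional as F

def contrastive_loss(z1, z2, temperature=0.5):
    """NT-Xent-style loss: z1[i] and z2[i] are embeddings of two augmented
    views of sentence i; all other sentences in the batch act as negatives."""
    z1, z2 = F.normalize(z1, dim=-1), F.normalize(z2, dim=-1)
    n = z1.size(0)
    z = torch.cat([z1, z2], dim=0)                     # (2n, d)
    sim = z @ z.t() / temperature                      # (2n, 2n) cosine similarities
    sim.fill_diagonal_(float('-inf'))                  # a view is not its own positive
    # the positive of row i is row i+n (and vice versa)
    targets = torch.cat([torch.arange(n) + n, torch.arange(n)])
    return F.cross_entropy(sim, targets)

# two augmented "views" of a batch of 8 sentences, already encoded to 128-d
loss = contrastive_loss(torch.randn(8, 128), torch.randn(8, 128))
```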
Entailment as Few-Shot Learner
TLDR
The key idea of this approach is to reformulate a potential NLP task as an entailment task and then fine-tune the model with as little as 8 examples, which improves on various existing SOTA few-shot learning methods by 12% and yields few-shot performance competitive with models 500 times larger, such as GPT-3.
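To make the reformulation concrete, the sketch below recasts a topic-classification example as entailment by pairing the input sentence (premise) with one label-describing hypothesis per class and picking the class whose hypothesis scores highest; the label templates and the `entailment_score` helper are hypothetical stand-ins for a fine-tuned entailment model, not an API from the paper.

```python
# Hypothetical sketch: classification recast as entailment.
# `entailment_score(premise, hypothesis)` stands in for a fine-tuned
# entailment model returning P(entailment).

LABEL_HYPOTHESES = {          # one natural-language hypothesis per class
    "sports":   "This text is about sports.",
    "politics": "This text is about politics.",
    "science":  "This text is about science.",
}

def classify_via_entailment(sentence, entailment_score):
    scores = {
        label: entailment_score(premise=sentence, hypothesis=hyp)
        for label, hyp in LABEL_HYPOTHESES.items()
    }
    return max(scores, key=scores.get)   # class whose hypothesis is most entailed
```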
Blockwise Self-Attention for Long Document Understanding
TLDR
This model extends BERT by introducing sparse block structures into the attention matrix to reduce both memory consumption and training/inference time, which also enables attention heads to capture either short- or long-range contextual information.
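A toy illustration of the block-sparse idea (not the paper's implementation, which also varies the block pattern across heads): build a block mask over the $n \times n$ attention matrix so each query attends only within its own block, which caps the number of scored pairs; the block size and layout here are arbitrary choices.

```python
import torch

def block_diagonal_mask(n: int, block_size: int) -> torch.Tensor:
    """True where attention is allowed: queries attend only to keys
    in the same block, so only n * block_size entries are computed."""
    blocks = torch.arange(n) // block_size             # block id of each position
    return blocks.unsqueeze(0) == blocks.unsqueeze(1)  # (n, n) boolean mask

mask = block_diagonal_mask(n=8, block_size=4)
scores = torch.randn(8, 8)
scores = scores.masked_fill(~mask, float('-inf'))      # disallowed pairs get -inf
attn = torch.softmax(scores, dim=-1)                   # each row sums over its block only
```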
Language Models as Fact Checkers?
TLDR
This paper uses implicit knowledge from language models to create an effective end-to-end fact checker using solely a language model, without any external knowledge or explicit retrieval components, and shows that this method is viable and has much room for exploration.
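One simplified way to picture the "language model as fact checker" idea (a sketch, not the paper's pipeline) is to mask the object of a claim and check whether the masked language model ranks the claimed entity among its top predictions; the model name, claim template, and thresholding are illustrative assumptions, and pipeline argument names may vary across transformers versions.

```python
from transformers import pipeline

# Illustrative only: mask the object of a simple claim and check whether
# the masked language model ranks the claimed answer among its top fills.
fill = pipeline("fill-mask", model="bert-base-uncased")

def claim_supported(subject_relation: str, claimed_object: str, top_k: int = 5) -> bool:
    predictions = fill(f"{subject_relation} [MASK].", top_k=top_k)
    return claimed_object.lower() in {p["token_str"].strip().lower() for p in predictions}

print(claim_supported("The capital of France is", "Paris"))   # likely True
print(claim_supported("The capital of France is", "Berlin"))  # likely False
```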
To Pretrain or Not to Pretrain: Examining the Benefits of Pretraining on Resource Rich Tasks
TLDR
It is shown that as the number of training examples grows into the millions, the accuracy gap between finetuning a BERT-based model and training a vanilla LSTM from scratch narrows to within 1%.
On Unifying Misinformation Detection
TLDR
UnifiedM2 is introduced, a general-purpose misinformation model that jointly models multiple domains of misinformation with a single, unified setup; its learned representation is shown to help few-shot learning of unseen misinformation tasks/datasets and to improve the model's generalizability to unseen events.
Studying Strategically: Learning to Mask for Closed-book QA
TLDR
This paper aims to learn the optimal masking strategy for the intermediate pretraining stage: it first trains the masking policy to extract spans that are likely to be tested, using supervision from the downstream task itself, and then deploys the learned policy during intermediate pretraining.
Luna: Linear Unified Nested Attention
TLDR
Luna is proposed, a linear unified nested attention mechanism that approximates softmax attention with two nested linear attention functions, yielding only linear time and space complexity.
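A rough sketch of the nested idea, using plain softmax attention at both steps for simplicity (the paper's actual attention functions and normalizations differ): a fixed-length auxiliary sequence `p` first attends over the length-$n$ input ("pack", cost $O(l \cdot n)$), and the input then attends over that packed result ("unpack", also $O(l \cdot n)$), so the overall cost is linear in $n$ for fixed $l$.

```python
import torch

def attend(q, k, v):
    """Plain softmax attention: (Lq, d), (Lk, d), (Lk, d) -> (Lq, d)."""
    scores = q @ k.t() / k.size(-1) ** 0.5
    return torch.softmax(scores, dim=-1) @ v

n, l, d = 512, 16, 64
x = torch.randn(n, d)                # input sequence, length n
p = torch.randn(l, d)                # fixed-length auxiliary ("pack") sequence

packed = attend(p, x, x)             # pack:   l x n scores -> (l, d)
output = attend(x, packed, packed)   # unpack: n x l scores -> (n, d)
print(output.shape)                  # torch.Size([512, 64])
```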
Revisiting Knowledge Base Embedding as Tensor Decomposition
TLDR
This work theoretically analyzes the neural embedding framework and connects it with tensor-based embedding, and presents a tensor-decomposition-based framework, KBTD, to directly approximate the derived closed-form tensor.
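For readers unfamiliar with the tensor view the summary alludes to: a knowledge base can be written as a binary third-order tensor $\mathcal{X} \in \{0,1\}^{|E| \times |R| \times |E|}$ with $\mathcal{X}_{h,r,t} = 1$ iff the triple $(h, r, t)$ holds, and a generic CP-style decomposition scores triples as

$$\mathcal{X}_{h,r,t} \approx \sum_{m=1}^{d} A_{hm}\, B_{rm}\, C_{tm},$$

where the rows of $A$, $B$, and $C$ act as entity and relation embeddings. This is only the standard formulation; the specific closed-form target tensor that KBTD approximates is derived in the paper itself.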