Publications
A Primer in BERTology: What We Know About How BERT Works
TLDR
This paper is the first survey of over 150 studies of the popular BERT model, reviewing the current state of knowledge about how BERT works, what kind of information it learns and how it is represented, common modifications to its training objectives and architecture, the overparameterization issue, and approaches to compression.
Revealing the Dark Secrets of BERT
TLDR
It is shown that manually disabling attention in certain heads improves performance over the regular fine-tuned BERT models, indicating that the model is overparameterized.
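One way to reproduce this kind of ablation is the `head_mask` argument of the HuggingFace `transformers` library, which zeroes out chosen attention heads at inference time. A minimal sketch, assuming a fine-tuned checkpoint is available; the model name and the choice of head below are placeholders:

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# In a real replication this would be a task-specific fine-tuned checkpoint;
# "bert-base-uncased" is only a stand-in here.
name = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name)
model.eval()

inputs = tokenizer("The movie was surprisingly good.", return_tensors="pt")

# head_mask has shape (num_layers, num_heads): 1.0 keeps a head, 0.0 disables it.
mask = torch.ones(model.config.num_hidden_layers, model.config.num_attention_heads)
mask[10, 3] = 0.0  # arbitrary illustrative choice: silence head 3 in layer 10

with torch.no_grad():
    baseline = model(**inputs).logits
    ablated = model(**inputs, head_mask=mask).logits

print("baseline:", baseline)
print("with head (10, 3) disabled:", ablated)
```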
Analogy-based detection of morphological and semantic relations with word embeddings: what works and what doesn’t.
TLDR
This study applies the widely used vector offset method to four types of linguistic relations (inflectional and derivational morphology, lexicographic and encyclopedic semantics) and systematically examines how accuracy for each category is affected by the window size and dimensionality of the SVD-based word embeddings.
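For reference, the vector offset method answers an analogy a : a* :: b : ? by returning the vocabulary word closest (by cosine) to a* − a + b. A minimal sketch over a toy embedding matrix, where `vectors` (an n×d float array) and `vocab` (a list of words) are hypothetical stand-ins for the SVD-based embeddings:

```python
import numpy as np

def vector_offset(a, a_star, b, vectors, vocab):
    """Answer a : a* :: b : ? -- return the word closest to a* - a + b."""
    unit = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    idx = {w: i for i, w in enumerate(vocab)}
    target = unit[idx[a_star]] - unit[idx[a]] + unit[idx[b]]
    sims = unit @ (target / np.linalg.norm(target))
    for w in (a, a_star, b):  # standard practice: exclude the query words
        sims[idx[w]] = -np.inf
    return vocab[int(np.argmax(sims))]

# e.g. vector_offset("man", "king", "woman", vectors, vocab) -> "queen", ideally
```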
Word Embeddings, Analogies, and Machine Learning: Beyond king - man + woman = queen
TLDR
It is shown that simple averaging over multiple word pairs improves on the state of the art, and that accuracy improves further when cosine similarity is combined with an estimate of how strongly a candidate answer belongs to the correct word class.
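The averaging idea replaces the single-pair offset with the mean offset over several known pairs of the same relation; the word-class estimation component is not sketched here. A minimal illustration, reusing the same hypothetical `vectors`/`vocab` setup as above:

```python
import numpy as np

def avg_offset_answer(pairs, b, vectors, vocab):
    """Answer a : a* :: b : ? using the mean offset over known (a, a*) pairs."""
    unit = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    idx = {w: i for i, w in enumerate(vocab)}
    offset = np.mean([unit[idx[t]] - unit[idx[s]] for s, t in pairs], axis=0)
    target = unit[idx[b]] + offset
    sims = unit @ (target / np.linalg.norm(target))
    sims[idx[b]] = -np.inf  # exclude the query word itself
    return vocab[int(np.argmax(sims))]
```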
When BERT Plays the Lottery, All Tickets Are Winning
TLDR
It is shown that the "bad" subnetworks can be fine-tuned separately to achieve only slightly worse performance than the "good" ones, indicating that most weights in the pre-trained BERT are potentially useful.
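One common way to carve out such subnetworks is unstructured magnitude pruning (this line of work also considers structured pruning of whole heads and MLPs, not shown here). A minimal sketch, with the checkpoint and the 50% sparsity level chosen only for illustration:

```python
import torch
from transformers import BertModel

model = BertModel.from_pretrained("bert-base-uncased")

# Collect the 2-D weight matrices (attention and MLP projections),
# leaving embeddings and LayerNorm parameters untouched.
weights = [p for n, p in model.named_parameters()
           if n.endswith("weight") and p.dim() == 2 and "embeddings" not in n]

# Global magnitude threshold at 50% sparsity (illustrative choice).
mags = torch.cat([w.detach().abs().flatten() for w in weights])
threshold = mags.kthvalue(int(0.5 * mags.numel())).values

with torch.no_grad():
    for w in weights:
        w.mul_((w.abs() >= threshold).float())  # zero out low-magnitude weights
```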
RuSentiment: An Enriched Sentiment Analysis Dataset for Social Media in Russian
TLDR
This paper presents RuSentiment, a new dataset for sentiment analysis of social media posts in Russian, along with a new set of comprehensive annotation guidelines that are extensible to other languages.
Getting Closer to AI Complete Question Answering: A Set of Prerequisite Real Tasks
TLDR
This paper presents QuAIL, the first reading comprehension (RC) dataset to combine text-based, world-knowledge, and unanswerable questions, and to provide question-type annotation that enables diagnostics of the reasoning strategies used by a given QA system.
What’s in Your Embedding, And How It Predicts Task Performance
TLDR
This work presents a new approach based on scaled-up qualitative analysis of word vector neighborhoods. It quantifies interpretable characteristics of a given model, enabling multi-faceted evaluation, parameter search, and, more generally, a more principled, hypothesis-driven approach to developing distributional semantic representations.
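A toy version of such neighborhood profiling might fetch each word's nearest neighbors and compute interpretable statistics over them. The single prefix-overlap feature below is a crude stand-in for the paper's linguistically informed features, and `vectors`/`vocab` are again hypothetical:

```python
import numpy as np

def neighbor_profile(word, vectors, vocab, k=10):
    """Top-k cosine neighbors of `word`, plus one crude interpretable
    statistic: the share of neighbors sharing a 3-character prefix,
    a rough proxy for morphological relatedness."""
    unit = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    idx = {w: i for i, w in enumerate(vocab)}
    sims = unit @ unit[idx[word]]
    sims[idx[word]] = -np.inf  # a word is not its own neighbor
    top = [vocab[i] for i in np.argsort(-sims)[:k]]
    shared_prefix = sum(w[:3] == word[:3] for w in top) / k
    return top, shared_prefix
```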
Adversarial Decomposition of Text Representation
TLDR
The proposed method for adversarial decomposition of text representations uses adversarial-motivational training and includes a special motivational loss, which acts in opposition to the discriminator and encourages a better decomposition.
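The paper specifies the exact architecture and motivational loss; the sketch below shows only the generic adversarial part of such a setup, where a discriminator tries to recover a form label from the "meaning" half of a split representation while the encoder learns to fool it. All dimensions and components are illustrative:

```python
import torch
import torch.nn as nn

# Hypothetical components: an encoder whose output is split into a
# "meaning" half and a "form" half, and a discriminator that tries to
# recover a binary form label from the meaning half alone.
enc = nn.Linear(768, 512)
disc = nn.Linear(256, 2)
opt_enc = torch.optim.Adam(enc.parameters(), lr=1e-4)
opt_disc = torch.optim.Adam(disc.parameters(), lr=1e-4)
ce = nn.CrossEntropyLoss()

x = torch.randn(32, 768)                  # dummy batch of text representations
form_labels = torch.randint(0, 2, (32,))  # dummy form (e.g., register) labels

h = enc(x)
meaning, form = h[:, :256], h[:, 256:]

# Step 1: train the discriminator to predict form from the meaning half.
d_loss = ce(disc(meaning.detach()), form_labels)
opt_disc.zero_grad()
d_loss.backward()
opt_disc.step()

# Step 2: train the encoder to fool it, pushing form information out of
# the meaning half (in a real loop the two steps alternate over batches).
adv_loss = -ce(disc(meaning), form_labels)
opt_enc.zero_grad()
adv_loss.backward()
opt_enc.step()
```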
Intrinsic Evaluations of Word Embeddings: What Can We Do Better?
TLDR
It is argued that the field should shift from abstract ratings of word embedding “quality” to exploration of their specific strengths and weaknesses, in order to do justice to distributional meaning representations.