• Publications
A Primer in BERTology: What We Know About How BERT Works
TLDR
This paper is the first survey of over 150 studies of the popular BERT model, reviewing the current state of knowledge about how BERT works, what kind of information it learns and how it is represented, common modifications to its training objectives and architecture, the overparameterization issue, and approaches to compression.
Revealing the Dark Secrets of BERT
TLDR
It is shown that manually disabling attention in certain heads leads to a performance improvement over the regular fine-tuned BERT models, indicating that the model is overparameterized.
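A minimal sketch of the head-ablation idea, using the `head_mask` argument of HuggingFace Transformers (a real forward-pass argument of `BertModel`); the specific heads disabled here are arbitrary illustrations, not the ones identified in the paper.

```python
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")
model.eval()

# head_mask has shape (num_layers, num_heads); 1.0 keeps a head, 0.0 disables it.
num_layers = model.config.num_hidden_layers   # 12 for bert-base
num_heads = model.config.num_attention_heads  # 12 for bert-base
head_mask = torch.ones(num_layers, num_heads)
head_mask[0, 3] = 0.0   # silence head 3 in layer 0 (arbitrary choice)
head_mask[5, 7] = 0.0   # silence head 7 in layer 5 (arbitrary choice)

inputs = tokenizer("Attention heads can be ablated one by one.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs, head_mask=head_mask)
print(outputs.last_hidden_state.shape)  # (1, seq_len, 768)
```

Evaluating the fine-tuned model on a downstream task with and without such a mask is what reveals whether a given head is helping or hurting.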
Evaluating temporal relations in clinical text: 2012 i2b2 Challenge
TLDR
A corpus of discharge summaries annotated with temporal information was provided for the development and evaluation of temporal reasoning systems; the best-performing systems overwhelmingly adopted a rule-based approach to value normalization.
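A minimal sketch of what rule-based temporal value normalization looks like: mapping surface date expressions to ISO-8601 values, with relative expressions anchored to a document date. The rules and their coverage here are illustrative toys, not those of any i2b2 submission.

```python
import re
from datetime import date, timedelta

MONTHS = {m: i + 1 for i, m in enumerate(
    ["january", "february", "march", "april", "may", "june",
     "july", "august", "september", "october", "november", "december"])}

def normalize(expr: str, doc_date: date) -> str | None:
    expr = expr.strip().lower()
    # Rule 1: numeric dates like "03/15/2012"
    m = re.fullmatch(r"(\d{1,2})/(\d{1,2})/(\d{4})", expr)
    if m:
        month, day, year = map(int, m.groups())
        return f"{year:04d}-{month:02d}-{day:02d}"
    # Rule 2: written dates like "march 15, 2012"
    m = re.fullmatch(r"([a-z]+)\s+(\d{1,2}),?\s+(\d{4})", expr)
    if m and m.group(1) in MONTHS:
        return f"{int(m.group(3)):04d}-{MONTHS[m.group(1)]:02d}-{int(m.group(2)):02d}"
    # Rule 3: relative expressions anchored to the document date
    if expr == "yesterday":
        return (doc_date - timedelta(days=1)).isoformat()
    if expr in ("today", "admission"):
        return doc_date.isoformat()
    return None  # expression not covered by these rules

print(normalize("March 15, 2012", date(2012, 3, 16)))  # 2012-03-15
print(normalize("yesterday", date(2012, 3, 16)))       # 2012-03-15
```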
Unfolding physiological state: mortality modelling in intensive care units
TLDR
This work examined the use of latent variable models to decompose free-text hospital notes into meaningful features, and found that latent topic-derived features were effective in determining patient mortality under three timelines: in-hospital, 30-day post-discharge, and 1-year post-discharge mortality.
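A minimal sketch of that pipeline, with scikit-learn's LDA standing in for the paper's latent variable model: decompose notes into topic proportions, then use them as features for a mortality classifier. The notes and labels below are toy placeholders, not clinical data.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.linear_model import LogisticRegression

notes = [  # toy stand-ins for de-identified free-text hospital notes
    "patient intubated overnight, vasopressors started",
    "discharged home in stable condition, follow up in two weeks",
    "worsening renal function, dialysis initiated",
    "ambulating well, pain controlled on oral medication",
]
died = [1, 0, 1, 0]  # toy outcome labels (e.g., in-hospital mortality)

counts = CountVectorizer(stop_words="english").fit_transform(notes)
lda = LatentDirichletAllocation(n_components=2, random_state=0)
topic_features = lda.fit_transform(counts)  # per-note topic proportions

clf = LogisticRegression().fit(topic_features, died)
print(clf.predict_proba(topic_features)[:, 1])  # predicted mortality risk
```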
Automating Temporal Annotation with TARSQI
We present an overview of TARSQI, a modular system for automatic temporal annotation that adds time expressions, events and temporal relations to news texts.
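A minimal sketch of the modular pipeline shape this describes: each component adds one layer of TimeML-style annotation to a shared document. The component logic below is a toy stand-in, not TARSQI's actual modules.

```python
import re

def tag_timex(doc):
    """Mark time expressions (here: just 4-digit years)."""
    doc["timex"] = [(m.start(), m.end())
                    for m in re.finditer(r"\b(19|20)\d{2}\b", doc["text"])]
    return doc

def tag_events(doc):
    """Mark events (here: a tiny verb list as a placeholder)."""
    doc["events"] = [(m.start(), m.end())
                     for m in re.finditer(r"\b(announced|resigned|met)\b", doc["text"])]
    return doc

def link_relations(doc):
    """Link each event to the nearest time expression (toy TLINK heuristic)."""
    doc["tlinks"] = [(e, min(doc["timex"], key=lambda t: abs(t[0] - e[0])))
                     for e in doc["events"] if doc["timex"]]
    return doc

doc = {"text": "The minister resigned in 2004 and met reporters in 2005."}
for component in (tag_timex, tag_events, link_relations):
    doc = component(doc)
print(doc["tlinks"])
```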
When BERT Plays the Lottery, All Tickets Are Winning
TLDR
It is shown that the "bad" subnetworks can be fine-tuned separately to achieve only slightly worse performance than the "good" ones, indicating that most weights in the pre-trained BERT are potentially useful.
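A minimal sketch of selecting a "lottery ticket" subnetwork by weight magnitude: keep the largest-magnitude weights and zero the rest. This is illustrative only; the paper also studies structured pruning of whole BERT heads and MLPs.

```python
import torch

def magnitude_mask(weight: torch.Tensor, keep_fraction: float) -> torch.Tensor:
    """Return a 0/1 mask keeping the top `keep_fraction` of weights by |w|."""
    k = max(1, int(weight.numel() * keep_fraction))
    # k-th largest magnitude = (n - k + 1)-th smallest
    threshold = weight.abs().flatten().kthvalue(weight.numel() - k + 1).values
    return (weight.abs() >= threshold).float()

layer = torch.nn.Linear(768, 768)
mask = magnitude_mask(layer.weight.data, keep_fraction=0.5)
layer.weight.data *= mask  # apply the mask; fine-tuning would keep it fixed
print(f"kept {int(mask.sum())} of {mask.numel()} weights")
```

Fine-tuning the masked ("good") and inversely masked ("bad") subnetworks separately is what lets one compare their downstream performance.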
RuSentiment: An Enriched Sentiment Analysis Dataset for Social Media in Russian
TLDR
RuSentiment, a new dataset for sentiment analysis of social media posts in Russian, is presented, along with a new set of comprehensive annotation guidelines that are extensible to other languages.
GhostWriter: Using an LSTM for Automatic Rap Lyric Generation
TLDR
This paper demonstrates the effectiveness of a Long Short-Term Memory language model in initial efforts to generate unconstrained rap lyrics; the model produces better “ghostwritten” lyrics than a baseline model.
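A minimal sketch of the underlying technique: a character-level LSTM language model sampled autoregressively. The model below is untrained, so its output is gibberish; training on a lyrics corpus is the step that makes generation meaningful.

```python
import torch
import torch.nn as nn

class CharLSTM(nn.Module):
    def __init__(self, vocab_size, hidden=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.lstm = nn.LSTM(hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab_size)

    def forward(self, x, state=None):
        h, state = self.lstm(self.embed(x), state)
        return self.out(h), state

vocab = sorted(set("abcdefghijklmnopqrstuvwxyz "))
stoi = {c: i for i, c in enumerate(vocab)}
model = CharLSTM(len(vocab))
model.eval()

# Sample one character at a time, feeding each prediction back in.
x = torch.tensor([[stoi["y"]]])  # seed character
state, generated = None, "y"
with torch.no_grad():
    for _ in range(40):
        logits, state = model(x, state)
        probs = torch.softmax(logits[0, -1], dim=-1)
        idx = torch.multinomial(probs, 1).item()
        generated += vocab[idx]
        x = torch.tensor([[idx]])
print(generated)
```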
Here’s My Point: Joint Pointer Architecture for Argument Mining
TLDR
This work presents the first neural network-based approach to link extraction in argument mining: a novel architecture that applies Pointer Network sequence-to-sequence attention modeling to structural prediction in discourse parsing, and a joint model that achieves state-of-the-art results.
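A minimal sketch of the Pointer Network attention at the core of such an approach: the decoder "points" at an input position, which for link extraction selects the argument component a given component links to. The dimensions and single-step decode are illustrative simplifications, not the paper's full joint model.

```python
import torch
import torch.nn as nn

class PointerAttention(nn.Module):
    def __init__(self, hidden):
        super().__init__()
        self.w_enc = nn.Linear(hidden, hidden, bias=False)
        self.w_dec = nn.Linear(hidden, hidden, bias=False)
        self.v = nn.Linear(hidden, 1, bias=False)

    def forward(self, enc_states, dec_state):
        # enc_states: (batch, src_len, hidden); dec_state: (batch, hidden)
        scores = self.v(torch.tanh(self.w_enc(enc_states)
                                   + self.w_dec(dec_state).unsqueeze(1)))
        return scores.squeeze(-1)  # (batch, src_len): logits over input positions

# Five encoded argument components; where does component 2 link?
enc = torch.randn(1, 5, 64)      # encoder outputs (random stand-ins)
dec = torch.randn(1, 64)         # decoder state for component 2
pointer = PointerAttention(64)
link_logits = pointer(enc, dec)
print(link_logits.softmax(-1))   # distribution over the 5 candidate targets
```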
...