Corpus ID: 226306666

Augmenting BERT Carefully with Underrepresented Linguistic Features

@article{Balagopalan2020AugmentingBC,
  title={Augmenting BERT Carefully with Underrepresented Linguistic Features},
  author={Aparna Balagopalan and Jekaterina Novikova},
  journal={ArXiv},
  year={2020},
  volume={abs/2011.06153}
}
Fine-tuned Bidirectional Encoder Representations from Transformers (BERT)-based sequence classification models have proven to be effective for detecting Alzheimer's Disease (AD) from transcripts of human speech. However, previous research shows it is possible to improve BERT's performance on various tasks by augmenting the model with additional information. In this work, we use probing tasks as introspection techniques to identify linguistic information not well-represented in various layers of… 
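
The layer-wise probing the abstract describes can be pictured with a minimal sketch: train one simple probe per BERT layer and see how recoverable a linguistic feature is from each layer's representations. The snippet below assumes the HuggingFace transformers and scikit-learn libraries; the sentences and the binary label are illustrative placeholders, not the paper's data or feature set.

    # Minimal layer-wise probing sketch (illustrative, not the paper's setup):
    # train one linear probe per BERT layer and compare how recoverable a
    # linguistic feature is from each layer's representations.
    import numpy as np
    import torch
    from sklearn.linear_model import LogisticRegression
    from transformers import BertModel, BertTokenizer

    tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
    model = BertModel.from_pretrained("bert-base-uncased",
                                      output_hidden_states=True)
    model.eval()

    # Toy picture-description-style sentences with a made-up binary label
    # (say, presence of a progressive verb form).
    sentences = [
        "the boy is reaching for the cookie jar",
        "the stool tips over",
        "the water is overflowing in the sink",
        "mother dries a dish",
        "she is standing at the window",
        "the girl laughs",
    ]
    labels = np.array([1, 0, 1, 0, 1, 0])

    with torch.no_grad():
        enc = tokenizer(sentences, padding=True, return_tensors="pt")
        # hidden_states: embedding layer plus one tensor per transformer layer
        hidden_states = model(**enc).hidden_states

    for layer, states in enumerate(hidden_states):
        feats = states.mean(dim=1).numpy()  # mean-pool tokens (padding incl.)
        probe = LogisticRegression(max_iter=1000).fit(feats, labels)
        print(f"layer {layer:2d}: probe accuracy {probe.score(feats, labels):.2f}")

Layers where such a probe scores poorly would be candidates for the kind of careful augmentation the title refers to.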

Citations

Comparing Acoustic-based Approaches for Alzheimer's Disease Detection
TLDR
This paper studies the performance and generalizability of three approaches for AD detection from speech on the recent ADReSSo challenge dataset, and finds that while feature-based approaches have higher precision, classification approaches relying on the combination of embeddings and features prove to have higher and more balanced performance across multiple metrics.

References

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
TLDR
A new language representation model, BERT, designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers, which can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.
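
The "one additional output layer" in this summary is what BertForSequenceClassification adds on top of the pre-trained encoder. A minimal fine-tuning sketch under that assumption, with placeholder texts, labels, and hyperparameters:

    # Fine-tuning sketch: BERT plus a single classification head.
    # Texts, labels, and hyperparameters are placeholders.
    import torch
    from transformers import BertForSequenceClassification, BertTokenizer

    tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
    model = BertForSequenceClassification.from_pretrained(
        "bert-base-uncased", num_labels=2)  # e.g. two diagnostic classes

    batch = tokenizer(["first transcript", "second transcript"],
                      padding=True, return_tensors="pt")
    labels = torch.tensor([0, 1])

    optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
    loss = model(**batch, labels=labels).loss  # cross-entropy from the head
    loss.backward()
    optimizer.step()
    print(f"one fine-tuning step done, loss = {loss.item():.3f}")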
What Does BERT Learn about the Structure of Language?
TLDR
This work provides novel support for the possibility that BERT networks capture structural information about language by performing a series of experiments to unpack the elements of English language structure learned by BERT.
Fine-grained Analysis of Sentence Embeddings Using Auxiliary Prediction Tasks
TLDR
This work proposes a framework that facilitates better understanding of the encoded representations of sentence vectors and demonstrates the potential contribution of the approach by analyzing different sentence representation mechanisms.
What you can cram into a single vector: Probing sentence embeddings for linguistic properties
TLDR
10 probing tasks designed to capture simple linguistic features of sentences are introduced and used to study embeddings generated by three different encoders trained in eight distinct ways, uncovering intriguing properties of both encoders and training methods.
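
For a flavor of these tasks, one of them (SentLen) asks a classifier to recover a sentence's length bucket from its embedding alone. A small sketch of how such a probing label can be built; the bucket edges here are illustrative, not the paper's exact bins:

    # Constructing a probing label in the style of the SentLen task:
    # bucket each sentence by word count; a probe must then recover the
    # bucket from the sentence embedding alone. Edges are illustrative.
    import numpy as np

    def sentlen_bucket(sentence, edges=(5, 8, 12, 16, 20)):
        """Return the length bucket (0..len(edges)) for a sentence."""
        return int(np.searchsorted(edges, len(sentence.split())))

    print(sentlen_bucket("the girl laughs"))                         # 0
    print(sentlen_bucket("the boy is reaching for the cookie jar"))  # 1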
Dynamic Meta-Embeddings for Improved Sentence Representations
While one of the first steps in many NLP systems is selecting what pre-trained word embeddings to use, we argue that such a step is better left for neural networks to figure out by themselves…
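
A rough sketch of that idea: project each pre-trained embedding set to a shared size and let learned gates weight the mixture per token, so the network decides the combination itself. The dimensions and gating form below are simplifying assumptions, not the paper's exact architecture.

    # Dynamic meta-embedding sketch (simplified, assumed gating form):
    # project several embedding sets to a common size and mix them with
    # learned, token-wise softmax weights.
    import torch
    import torch.nn as nn

    class DynamicMetaEmbedding(nn.Module):
        """Attention-weighted mixture of several embedding sets (a sketch)."""
        def __init__(self, dims=(300, 200), out_dim=256):
            super().__init__()
            self.projs = nn.ModuleList(nn.Linear(d, out_dim) for d in dims)
            self.gates = nn.ModuleList(nn.Linear(out_dim, 1) for _ in dims)

        def forward(self, embeddings):
            # embeddings: one (batch, seq, dim_i) tensor per embedding set
            projected = [p(e) for p, e in zip(self.projs, embeddings)]
            scores = torch.cat([g(h) for g, h in zip(self.gates, projected)],
                               dim=-1)                    # (batch, seq, n)
            weights = scores.softmax(dim=-1)
            stacked = torch.stack(projected, dim=-1)      # (b, s, out, n)
            return (stacked * weights.unsqueeze(2)).sum(dim=-1)

    dme = DynamicMetaEmbedding()
    mixed = dme([torch.randn(2, 7, 300), torch.randn(2, 7, 200)])
    print(mixed.shape)  # torch.Size([2, 7, 256])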
The Effect of Heterogeneous Data for Alzheimer's Disease Detection from Speech
TLDR
Visualization of decision boundaries reveals that models trained on a combination of structured picture descriptions and unstructured conversational speech have the least out-of-task error and show the most potential to generalize to multiple tasks.
The INESC-ID Multi-Modal System for the ADReSS 2020 Challenge
TLDR
A multi-modal approach for the automatic detection of Alzheimer's disease, proposed in the context of the INESC-ID Human Language Technology Laboratory's participation in the ADReSS 2020 challenge, demonstrates the importance of linguistic features in the classification of dementia: they outperform acoustic features in terms of accuracy.
Detecting cognitive impairments by agreeing on interpretations of linguistic features
TLDR
This paper proposes Consensus Networks (CNs), a framework that classifies after reaching agreement between modalities and significantly outperforms the traditional classifiers used by state-of-the-art papers.
Information-Theoretic Probing for Linguistic Structure
TLDR
This work gives an information-theoretic operationalization of probing as estimating mutual information, and it contradicts received wisdom: one should always select the highest-performing probe one can, even if it is more complex, since it will result in a tighter estimate and thus reveal more of the linguistic information inherent in the representation.
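
The argument rests on the identity I(T; R) = H(T) - H(T | R), where a probe's cross-entropy upper-bounds H(T | R); a stronger probe (lower loss) therefore gives a tighter, larger lower bound on the mutual information. A toy numeric illustration with made-up probe losses:

    # I(T; R) >= H(T) - probe cross-entropy (all quantities in nats).
    # The probe losses below are invented for illustration.
    import numpy as np

    def mi_lower_bound(label_probs, probe_cross_entropy):
        """Lower bound on mutual information between labels T and reps R."""
        entropy_t = -np.sum(label_probs * np.log(label_probs))
        return entropy_t - probe_cross_entropy

    p_t = np.array([0.5, 0.5])            # a balanced binary property
    print(mi_lower_bound(p_t, 0.60))      # weaker probe   -> ~0.093 nats
    print(mi_lower_bound(p_t, 0.35))      # stronger probe -> ~0.343 nats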
Detecting Linguistic Characteristics of Alzheimer’s Dementia by Interpreting Neural Models
TLDR
This work uses NLP techniques to classify and analyze the linguistic characteristics of AD patients using the DementiaBank dataset, and shows that first derivative saliency can not only rediscover previous language patterns of AD patients, but also shed light on the limitations of neural models.
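
First derivative saliency scores each token by the gradient of a class score with respect to the input embeddings. A hedged sketch using the HuggingFace transformers API; the classification head here is untrained and the sentence is a placeholder, so the printed scores are only a structural illustration:

    # First-derivative saliency sketch: gradient of a class logit with
    # respect to the input word embeddings, one norm per token.
    import torch
    from transformers import BertForSequenceClassification, BertTokenizer

    tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
    model = BertForSequenceClassification.from_pretrained(
        "bert-base-uncased", num_labels=2)  # head is untrained here
    model.eval()

    enc = tokenizer("the boy is reaching for the cookie jar",
                    return_tensors="pt")
    emb = model.bert.embeddings.word_embeddings(enc["input_ids"])
    emb.retain_grad()
    logits = model(inputs_embeds=emb,
                   attention_mask=enc["attention_mask"]).logits
    logits[0, 1].backward()                      # score of one class
    saliency = emb.grad.norm(dim=-1).squeeze(0)  # one score per token
    tokens = tokenizer.convert_ids_to_tokens(enc["input_ids"][0])
    for token, score in zip(tokens, saliency):
        print(f"{token:12s} {score:.4f}")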
…