Pretrained Language Models for Sequential Sentence Classification

@inproceedings{Cohan2019PretrainedLM,
  title={Pretrained Language Models for Sequential Sentence Classification},
  author={Arman Cohan and Iz Beltagy and Daniel King and Bhavana Dalvi and Daniel S. Weld},
  booktitle={EMNLP},
  year={2019}
}
As a step toward better document-level understanding, we explore classification of a sequence of sentences into their corresponding categories, a task that requires understanding sentences in the context of the document. Recent successful models for this task have used hierarchical models to contextualize sentence representations, and Conditional Random Fields (CRFs) to incorporate dependencies between subsequent labels. In this work, we show that pretrained language models, BERT (Devlin et al. …
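To make the setup concrete, below is a minimal sketch of sequential sentence classification with a pretrained transformer, assuming the sentences of a document are packed into a single input and each sentence is pooled at a delimiter token. The checkpoint name, label set, delimiter-based pooling, and untrained classification head are illustrative assumptions, not the authors' released implementation.

import torch
from torch import nn
from transformers import AutoTokenizer, AutoModel

# Hypothetical label set and checkpoint, chosen only for illustration.
LABELS = ["background", "objective", "method", "result", "conclusion"]

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = AutoModel.from_pretrained("bert-base-uncased")
classifier = nn.Linear(encoder.config.hidden_size, len(LABELS))  # untrained head, shapes only

sentences = [
    "We study sequential sentence classification.",
    "All sentences of the abstract are encoded in one transformer pass.",
    "Each sentence is labeled from the hidden state at its delimiter.",
]

# Pack the whole document as: [CLS] s1 [SEP] s2 [SEP] s3 [SEP]
packed = tokenizer.cls_token + " " + " ".join(
    s + " " + tokenizer.sep_token for s in sentences
)
inputs = tokenizer(packed, add_special_tokens=False, return_tensors="pt")

with torch.no_grad():
    hidden = encoder(**inputs).last_hidden_state          # (1, seq_len, hidden_size)

# One vector per sentence: the hidden state at each trailing [SEP] token.
sep_positions = (inputs["input_ids"][0] == tokenizer.sep_token_id).nonzero(as_tuple=True)[0]
sentence_vectors = hidden[0, sep_positions]               # (num_sentences, hidden_size)
logits = classifier(sentence_vectors)                     # (num_sentences, num_labels)
print(logits.argmax(dim=-1).tolist())                     # one predicted label index per sentence

In this sketch each sentence vector is conditioned on every other sentence in the document through self-attention, providing the document-level contextualization that the hierarchical models mentioned in the abstract obtain with a separate sentence-level encoder.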

Citations

Sequential Span Classification with Neural Semi-Markov CRFs for Biomedical Abstracts
TLDR: This work proposes sequential span classification, which assigns a rhetorical label not to a single sentence but to a span of consecutive sentences, and introduces neural semi-Markov Conditional Random Fields to assign labels to such spans by considering all possible spans of various lengths.
UPSTAGE: Unsupervised Context Augmentation for Utterance Classification in Patient-Provider Communication
TLDR: UPSTAGE uses transformer-based pretrained language models with a joint sentence representation to classify health topics in patient-provider conversations, and leverages unlabeled corpora for pretraining and data augmentation to provide additional context, which leads to improved classification performance.
Sequential Sentence Classification in Research Papers using Cross-Domain Multi-Task Learning
TLDR: It is demonstrated that models trained on datasets from different scientific domains benefit from one another when using the proposed multi-task learning architecture, and that the approach outperforms the state of the art on three benchmark datasets.
Don’t Stop Pretraining: Adapt Language Models to Domains and Tasks
TLDR: It is consistently found that multi-phase adaptive pretraining offers large gains in task performance, and it is shown that adapting to a task corpus augmented using simple data selection strategies is an effective alternative, especially when resources for domain-adaptive pretraining might be unavailable.
Enhancing Automated Essay Scoring Performance via Fine-tuning Pre-trained Language Models with Combination of Regression and Ranking
TLDR: A new way to fine-tune pre-trained language models with multiple losses of the same task is found to improve AES performance, and the model outperforms not only state-of-the-art neural models by nearly 3 percent but also the latest statistical model.
Enhancing Automated Essay Scoring Performance via Cohesion Measurement and Combination of Regression and Ranking
TLDR: A new way to fine-tune pre-trained language models with multiple losses of the same task is found to improve AES performance, and the model outperforms not only state-of-the-art neural models by nearly 3 percent but also the latest statistical model.
On Generating Extended Summaries of Long Documents
TLDR: This paper exploits the hierarchical structure of documents and incorporates it into an extractive summarization model through a multi-task learning approach, showing that the multi-task approach can adjust the extraction probability distribution in favor of summary-worthy sentences across diverse sections.
Improving Document-Level Sentiment Classification Using Importance of Sentences
TLDR: A document-level sentiment classification model based on deep neural networks, in which the importance of each sentence in a document is automatically determined through gate mechanisms; the model outperformed previous state-of-the-art models that do not consider differences in sentence importance within a document.
Self-Attention Guided Copy Mechanism for Abstractive Summarization
TLDR: A Transformer-based model is proposed to enhance the copy mechanism by identifying the importance of each source word based on its degree centrality in a directed graph built from the self-attention layer of the Transformer.
Unsupervised Extractive Summarization by Human Memory Simulation
TLDR: This paper introduces a wide range of heuristics that leverage cognitive representations of content units and how these are retained or forgotten in human memory, and finds that properties of these representations can be exploited to capture the relevance of content units in scientific articles.

References

Showing 1–10 of 27 references
Language Model Pre-training for Hierarchical Document Representations
TLDR: This work proposes algorithms for pre-training hierarchical document representations from unlabeled data, including fixed-length sentence/paragraph representations that integrate contextual information from the entire document.
Deep Contextualized Word Representations
TLDR: A new type of deep contextualized word representation is introduced that models both complex characteristics of word use and how these uses vary across linguistic contexts, allowing downstream models to mix different types of semi-supervision signals.
SciBERT: A Pretrained Language Model for Scientific Text
TLDR: SciBERT leverages unsupervised pretraining on a large multi-domain corpus of scientific publications to improve performance on downstream scientific NLP tasks and demonstrates statistically significant improvements over BERT.
Improving Language Understanding by Generative Pre-Training
TLDR: The general task-agnostic model outperforms discriminatively trained models that use architectures specifically crafted for each task, significantly improving upon the state of the art in 9 of the 12 tasks studied.
Unified Language Model Pre-training for Natural Language Understanding and Generation
TLDR: A new Unified pre-trained Language Model (UniLM) is presented that can be fine-tuned for both natural language understanding and generation tasks, and that compares favorably with BERT on the GLUE benchmark and on the SQuAD 2.0 and CoQA question answering tasks.
Neural Summarization by Extracting Sentences and Words
TLDR: This work develops a general framework for single-document summarization composed of a hierarchical document encoder and an attention-based extractor that allows for different classes of summarization models which can extract sentences or words.
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
TLDR: A new language representation model, BERT, designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers, which can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.
Sequence to Sequence Learning with Neural Networks
TLDR: This paper presents a general end-to-end approach to sequence learning that makes minimal assumptions about the sequence structure, and finds that reversing the order of the words in all source sentences markedly improved the LSTM's performance, because doing so introduced many short-term dependencies between the source and target sentences that made the optimization problem easier.
Universal Language Model Fine-tuning for Text Classification
TLDR: This work proposes Universal Language Model Fine-tuning (ULMFiT), an effective transfer learning method that can be applied to any task in NLP, and introduces techniques that are key for fine-tuning a language model.
A Supervised Approach to Extractive Summarisation of Scientific Papers
TLDR: This paper introduces a new dataset for summarisation of computer science publications by exploiting a large resource of author-provided summaries, and develops models on the dataset that make use of both neural sentence encoding and traditionally used summarisation features.