Pretrained Language Models for Sequential Sentence Classification

@article{Cohan2019PretrainedLM,
  title={Pretrained Language Models for Sequential Sentence Classification},
  author={Arman Cohan and Iz Beltagy and Daniel King and Bhavana Dalvi and Daniel S. Weld},
  journal={ArXiv},
  year={2019},
  volume={abs/1909.04054}
}
As a step toward better document-level understanding, we explore classification of a sequence of sentences into their corresponding categories, a task that requires understanding sentences in context of the document. Recent successful models for this task have used hierarchical models to contextualize sentence representations, and Conditional Random Fields (CRFs) to incorporate dependencies between subsequent labels. In this work, we show that pretrained language models, BERT (Devlin et al… 
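
The approach sketched in the abstract lends itself to a short illustration: the sentences of a document are packed into one BERT input separated by [SEP] tokens, and the hidden state at each [SEP] is classified. The snippet below is a minimal sketch using the Hugging Face transformers library; the checkpoint name, label count, and untrained classification head are placeholders, not the authors' released code.

import torch
from transformers import AutoTokenizer, AutoModel

# Sketch: pack all sentences of a document into one BERT input, separated by
# [SEP], and classify each sentence from the hidden state at its [SEP] token.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = AutoModel.from_pretrained("bert-base-uncased")

sentences = [
    "We study sequential sentence classification.",
    "Prior work relies on hierarchical encoders and CRFs.",
    "We instead feed all sentences to BERT jointly.",
]
num_labels = 5   # e.g., rhetorical roles in a structured abstract (illustrative)

# The tokenizer adds [CLS] at the start and a final [SEP], so joining with
# " [SEP] " yields exactly one [SEP] per sentence.
inputs = tokenizer(" [SEP] ".join(sentences), return_tensors="pt")

with torch.no_grad():
    hidden = encoder(**inputs).last_hidden_state.squeeze(0)   # (seq_len, hidden_size)

sep_id = tokenizer.sep_token_id
sep_positions = (inputs["input_ids"].squeeze(0) == sep_id).nonzero(as_tuple=True)[0]
sentence_vectors = hidden[sep_positions]                      # one vector per sentence

# Untrained linear head, shown only to make the prediction step concrete.
classifier = torch.nn.Linear(encoder.config.hidden_size, num_labels)
print(classifier(sentence_vectors).shape)                     # (num_sentences, num_labels)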

Citations

Sequential Span Classification with Neural Semi-Markov CRFs for Biomedical Abstracts

TLDR
This work proposes sequential span classification, which assigns a rhetorical label not to a single sentence but to a span of consecutive sentences, and introduces neural semi-Markov Conditional Random Fields to assign labels to such spans by considering all possible spans of various lengths.
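
As a rough illustration of the span idea, the dynamic program below finds the highest-scoring segmentation of a sentence sequence into labeled spans up to a maximum length. It is a generic semi-Markov Viterbi sketch with a stub scoring function and made-up labels, not the neural scoring model described in the paper.

# Sketch: semi-Markov (span-level) Viterbi decoding over a sentence sequence.
# span_score(i, j, y) scores labeling sentences i..j-1 as one span with label y;
# here it is a stub, whereas a neural semi-Markov CRF would learn it.
LABELS = ["background", "method", "result"]   # hypothetical rhetorical labels
MAX_SPAN = 3                                  # longest span considered

def span_score(i, j, label):
    # Stub scorer: mildly prefers spans of length 2 and the "method" label.
    return -abs((j - i) - 2) + (0.1 if label == "method" else 0.0)

def semi_markov_viterbi(n):
    best = [float("-inf")] * (n + 1)   # best[j]: best score for sentences 0..j-1
    best[0] = 0.0
    back = [None] * (n + 1)            # back[j]: (start, label) of the span ending at j
    for j in range(1, n + 1):
        for i in range(max(0, j - MAX_SPAN), j):
            for label in LABELS:
                score = best[i] + span_score(i, j, label)
                if score > best[j]:
                    best[j], back[j] = score, (i, label)
    spans, j = [], n                   # walk backpointers to recover labeled spans
    while j > 0:
        i, label = back[j]
        spans.append((i, j, label))
        j = i
    return list(reversed(spans)), best[n]

print(semi_markov_viterbi(5))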

UPSTAGE: Unsupervised Context Augmentation for Utterance Classification in Patient-Provider Communication

TLDR
UPSTAGE uses pretrained transformer language models with a joint sentence representation to classify health topics in patient-provider conversations, and leverages unlabeled corpora for pretraining and data augmentation to provide additional context, which improves classification performance.

Sequential Sentence Classification in Research Papers using Cross-Domain Multi-Task Learning

TLDR
It is demonstrated that models trained on datasets from different scientific domains benefit from one another when using the proposed multi-task learning architecture, and the approach outperforms the state of the art on three benchmark datasets.

Don’t Stop Pretraining: Adapt Language Models to Domains and Tasks

TLDR
It is consistently found that multi-phase adaptive pretraining offers large gains in task performance, and it is shown that adapting to a task corpus augmented using simple data selection strategies is an effective alternative, especially when resources for domain-adaptive pretraining might be unavailable.

Enhancing Automated Essay Scoring Performance via Fine-tuning Pre-trained Language Models with Combination of Regression and Ranking

TLDR
A new way to fine-tune pre-trained language models with multiple losses for the same task is found to improve AES performance, and the model outperforms state-of-the-art neural models by nearly 3 percent as well as the latest statistical model.
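
The multiple-loss idea can be sketched briefly: score essays with one regression head and train it with a weighted sum of a regression loss and a pairwise ranking loss. The loss weight, margin, and toy scores below are illustrative, not the paper's configuration.

# Sketch: combining a regression loss and a pairwise ranking loss on the same scores.
import torch

mse = torch.nn.MSELoss()
rank = torch.nn.MarginRankingLoss(margin=0.1)

pred = torch.tensor([0.62, 0.35, 0.80], requires_grad=True)   # model scores for 3 essays
gold = torch.tensor([0.70, 0.30, 0.90])                       # normalized gold scores

# All essay pairs (a, b), with target +1 when essay a should outrank essay b.
idx_a, idx_b = torch.triu_indices(3, 3, offset=1)
target = torch.where(gold[idx_a] > gold[idx_b], torch.tensor(1.0), torch.tensor(-1.0))

loss = mse(pred, gold) + 0.5 * rank(pred[idx_a], pred[idx_b], target)   # 0.5: arbitrary weight
loss.backward()
print(loss.item())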

Enhancing Automated Essay Scoring Performance via Cohesion Measurement and Combination of Regression and Ranking

TLDR
A new way to fine-tune pre-trained language models with multiple losses for the same task is found to improve AES performance, and the model outperforms state-of-the-art neural models by nearly 3 percent as well as the latest statistical model.

An Empirical Study on Explainable Prediction of Text Complexity: Preliminaries for Text Simplification

TLDR
It is shown that the general problem of text simplification can be formally decomposed into a compact pipeline of tasks to ensure the transparency and explainability of the process.

On Generating Extended Summaries of Long Documents

TLDR
This paper exploits the hierarchical structure of documents and incorporates it into an extractive summarization model through a multi-task learning approach, and shows that multi-tasking can adjust the extraction probability distribution in favor of summary-worthy sentences across diverse sections.

Improving Document-Level Sentiment Classification Using Importance of Sentences

TLDR
A document-level sentiment classification model based on deep neural networks is proposed, in which the importance of each sentence in a document is automatically determined through gate mechanisms; it outperforms previous state-of-the-art models that do not consider differences in sentence importance within a document.
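
The gating idea reduces to a few lines: compute an importance weight for each sentence vector with a small gating network and pool the weighted vectors into a document vector. The dimensions, the sigmoid gate, and the random sentence vectors below are assumptions for illustration, not the paper's architecture.

# Sketch: weighting sentence vectors by learned importance before document-level pooling.
import torch

hidden = 768                                  # assumed sentence-vector size
sentence_vecs = torch.randn(6, hidden)        # stand-ins for 6 encoded sentences of one document

gate = torch.nn.Sequential(                   # importance gate: one scalar in (0, 1) per sentence
    torch.nn.Linear(hidden, 1),
    torch.nn.Sigmoid(),
)
weights = gate(sentence_vecs)                 # (6, 1)

doc_vec = (weights * sentence_vecs).sum(dim=0) / weights.sum()   # importance-weighted average
sentiment_logits = torch.nn.Linear(hidden, 2)(doc_vec)           # e.g., positive vs. negative
print(weights.squeeze(-1), sentiment_logits)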

Self-Attention Guided Copy Mechanism for Abstractive Summarization

TLDR
A Transformer-based model is proposed to enhance the copy mechanism by identifying the importance of each source word based on its degree centrality in a directed graph built from the self-attention layer of the Transformer.
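
A rough sketch of the centrality computation: treat the encoder's self-attention matrix as a weighted directed graph over source tokens, threshold it into edges, and use in-degree as an importance score that could bias a copy distribution. The random attention matrix and the threshold are placeholders.

# Sketch: degree centrality of source tokens derived from a self-attention matrix.
import numpy as np

tokens = ["the", "model", "copies", "salient", "words"]
rng = np.random.default_rng(0)

attn = rng.random((len(tokens), len(tokens)))     # attn[i, j]: attention from token i to token j
attn = attn / attn.sum(axis=1, keepdims=True)     # row-normalize, like a softmax output

edges = attn > 0.25                               # keep only strong links (arbitrary threshold)
in_degree = edges.sum(axis=0)                     # how many tokens attend strongly to each token
centrality = in_degree / max(in_degree.sum(), 1)  # normalized importance score per token

for tok, c in zip(tokens, centrality):
    print(f"{tok}: {c:.2f}")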
...

References

Showing 1-10 of 27 references

Language Model Pre-training for Hierarchical Document Representations

TLDR
This work proposes algorithms for pre-training hierarchical document representations from unlabeled data, including fixed-length sentence and paragraph representations that integrate contextual information from the entire document.

Deep Contextualized Word Representations

TLDR
A new type of deep contextualized word representation is introduced that models both complex characteristics of word use and how these uses vary across linguistic contexts, allowing downstream models to mix different types of semi-supervision signals.

SciBERT: A Pretrained Language Model for Scientific Text

TLDR
SciBERT leverages unsupervised pretraining on a large multi-domain corpus of scientific publications to improve performance on downstream scientific NLP tasks and demonstrates statistically significant improvements over BERT.
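
For reference, the released SciBERT weights load directly through the transformers library; the two lines below only show loading, not any downstream fine-tuning.

# Sketch: loading the SciBERT checkpoint released by AllenAI.
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("allenai/scibert_scivocab_uncased")
model = AutoModel.from_pretrained("allenai/scibert_scivocab_uncased")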

Improving Language Understanding by Generative Pre-Training

TLDR
The general task-agnostic model outperforms discriminatively trained models that use architectures specifically crafted for each task, improving upon the state of the art in 9 out of the 12 tasks studied.

Unified Language Model Pre-training for Natural Language Understanding and Generation

TLDR
A new Unified pre-trained Language Model (UniLM) that can be fine-tuned for both natural language understanding and generation tasks that compares favorably with BERT on the GLUE benchmark, and the SQuAD 2.0 and CoQA question answering tasks.

Neural Summarization by Extracting Sentences and Words

TLDR
This work develops a general framework for single-document summarization composed of a hierarchical document encoder and an attention-based extractor that allows for different classes of summarization models which can extract sentences or words.

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

TLDR
A new language representation model, BERT, designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers, which can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.
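
BERT's masked-token pretraining objective is easy to demo with an off-the-shelf checkpoint; the snippet below only illustrates the objective, not the fine-tuning used for sequential sentence classification.

# Sketch: querying BERT's masked language model via the fill-mask pipeline.
from transformers import pipeline

unmasker = pipeline("fill-mask", model="bert-base-uncased")
for pred in unmasker("Each sentence in the abstract plays a rhetorical [MASK]."):
    print(pred["token_str"], round(pred["score"], 3))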

Sequence to Sequence Learning with Neural Networks

TLDR
This paper presents a general end-to-end approach to sequence learning that makes minimal assumptions about the sequence structure, and finds that reversing the order of the words in all source sentences markedly improved the LSTM's performance, because doing so introduced many short-term dependencies between the source and the target sentence that made the optimization problem easier.
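
The source-reversal trick is purely a preprocessing step; a minimal illustration with a made-up token sequence is below.

# Sketch: reversing the source token order before feeding it to a seq2seq encoder.
source = ["the", "abstract", "has", "five", "sentences"]
reversed_source = list(reversed(source))   # encoder input; the target sequence stays in order
print(reversed_source)                     # ['sentences', 'five', 'has', 'abstract', 'the']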

Universal Language Model Fine-tuning for Text Classification

TLDR
This work proposes Universal Language Model Fine-tuning (ULMFiT), an effective transfer learning method that can be applied to any task in NLP, and introduces techniques that are key for fine-tuning a language model.
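
One of the techniques ULMFiT introduces is the slanted triangular learning-rate schedule; the function below is a sketch of that schedule, and the default values of lr_max, cut_frac, and ratio are illustrative placeholders rather than recommended settings.

# Sketch: slanted triangular learning-rate schedule (short warm-up, long decay).
import math

def slanted_triangular_lr(t, T, lr_max=0.01, cut_frac=0.1, ratio=32):
    """t: current training step, T: total number of steps (hyperparameters are illustrative)."""
    cut = math.floor(T * cut_frac)
    if t < cut:
        p = t / cut                                     # steep linear increase
    else:
        p = 1 - (t - cut) / (cut * (1 / cut_frac - 1))  # gradual linear decay
    return lr_max * (1 + p * (ratio - 1)) / ratio

print([round(slanted_triangular_lr(t, 100), 5) for t in (0, 5, 10, 50, 99)])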

A Supervised Approach to Extractive Summarisation of Scientific Papers

TLDR
This paper introduces a new dataset for summarisation of computer science publications, built by exploiting a large resource of author-provided summaries, and develops models on the dataset using both neural sentence encoding and traditional summarisation features.