Transformer Encoder for Social Science

Haosen Ge, In Young Park, Xuancheng Qian, Grace Zeng
High-quality text data has become an important data source for social scientists. We have witnessed the success of pretrained deep neural network models, such as BERT and RoBERTa, in recent social science research. In this paper, we propose a compact pretrained deep neural network, Transformer Encoder for Social Science (TESS), explicitly designed to tackle text processing tasks in social science research. Using two validation tests, we demonstrate that TESS outperforms BERT and RoBERTa by 16…

Introduction to Neural Transfer Learning with Transformers for Social Science Text Analysis

This paper explains how Transformer-based models for transfer learning work, why they might be advantageous, and what their limitations are, and demonstrates the benefits these models can bring to text-based social science research.

SciBERT: A Pretrained Language Model for Scientific Text

SciBERT leverages unsupervised pretraining on a large multi-domain corpus of scientific publications to improve performance on downstream scientific NLP tasks and demonstrates statistically significant improvements over BERT.

ALBERT: A Lite BERT for Self-supervised Learning of Language Representations

This work presents two parameter-reduction techniques to lower memory consumption and increase the training speed of BERT, and uses a self-supervised loss that focuses on modeling inter-sentence coherence.

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

This work introduces BERT, a new language representation model designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers; the pre-trained model can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.

Attention is All you Need

This work proposes a new simple network architecture, the Transformer, based solely on attention mechanisms and dispensing with recurrence and convolutions entirely; it generalizes well to other tasks, as shown by its successful application to English constituency parsing with both large and limited training data.

Using Word Order in Political Text Classification with Long Short-term Memory Models

This work investigates the conditions under which long short-term memory (LSTM) models are useful for political science text classification tasks, with applications to Chinese social media posts as well as US newspaper articles, and provides guidance for the use of LSTM models.

RoBERTa: A Robustly Optimized BERT Pretraining Approach

This work finds that BERT was significantly undertrained and can match or exceed the performance of every model published after it; the best model achieves state-of-the-art results on GLUE, RACE and SQuAD.

BillSum: A Corpus for Automatic Summarization of US Legislation

BillSum is introduced, the first dataset for summarization of US Congressional and California state bills, and it is demonstrated that models built on Congressional bills can be used to summarize California bills, showing that methods developed on this dataset can transfer to states without human-written summaries.

Don’t Stop Pretraining: Adapt Language Models to Domains and Tasks

It is consistently found that multi-phase adaptive pretraining offers large gains in task performance, and it is shown that adapting to a task corpus augmented using simple data selection strategies is an effective alternative, especially when resources for domain-adaptive pretraining might be unavailable.

Corpus-Level Evaluation for Event QA: The IndiaPoliceEvents Corpus Covering the 2002 Gujarat Violence

The INDIAPOLICEEVENTS corpus is introduced, comprising all 21,391 sentences from 1,257 English-language Times of India articles about events in the state of Gujarat during March 2002; trained annotators read and label every document for mentions of police activity events, allowing for unbiased recall evaluations.