A Dataset of Peer Reviews (PeerRead): Collection, Insights and NLP Applications
The first public dataset of scientific peer reviews available for research purposes (PeerRead v1) is presented, and simple models are shown to predict whether a paper is accepted with up to 21% error reduction compared to the majority baseline.
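The "error reduction compared to the majority baseline" above is a relative measure. The majority baseline always predicts the most frequent label (accept or reject), and the reduction is the fraction of that baseline's errors a model removes. The minimal sketch below shows the arithmetic only. The function names and the 60/40 label split are hypothetical illustrations, not PeerRead figures or the paper's evaluation code.

```python
from collections import Counter


def majority_baseline_error(labels):
    """Error rate of a classifier that always predicts the most frequent label."""
    majority_count = Counter(labels).most_common(1)[0][1]
    return 1 - majority_count / len(labels)


def relative_error_reduction(baseline_error, model_error):
    """Fraction of the baseline's errors that the model removes."""
    return (baseline_error - model_error) / baseline_error


# Hypothetical split (not PeerRead data): 60 rejected and 40 accepted papers.
labels = ["reject"] * 60 + ["accept"] * 40
baseline = majority_baseline_error(labels)  # error rate of 0.4 for this split
model_error = 0.316                         # hypothetical model error rate
print(relative_error_reduction(baseline, model_error))
```

With these made-up numbers, a model with a 31.6% error rate removes about 21% of the baseline's 40% error. The same relative figure can correspond to very different absolute accuracies depending on the class balance.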
Recommendation as a Communication Game: Self-Supervised Bot-Play for Goal-oriented Dialogue
- Dongyeop Kang, Anusha Balakrishnan, Pararth Shah, Paul A. Crook, Y-Lan Boureau, J. Weston
- Computer Science, EMNLP
- 9 September 2019
This work collects a goal-driven recommendation dialogue dataset (GoRecDial), which consists of 9,125 dialogue games and 81,260 conversation turns between pairs of human workers recommending movies to each other, and uses the dataset to develop an end-to-end dialogue system that can simultaneously converse and recommend.
INSPIRED: Toward Sociable Recommendation Dialog Systems
- Shirley Anugrah Hayati, Dongyeop Kang, Qingxiaoyang Zhu, Weiyan Shi, Zhou Yu
- Computer Science, EMNLP
- 29 September 2020
This work designs an annotation scheme for recommendation strategies based on social science theories, annotates these dialogs with it, and shows that sociable recommendation strategies, such as sharing personal opinions or communicating with encouragement, lead to successful recommendations more frequently.
AdvEntuRe: Adversarial Training for Textual Entailment with Knowledge-Guided Examples
This work proposes knowledge-guided adversarial example generators that incorporate large lexical resources into entailment models via only a handful of rule templates, and introduces the first GAN-style approach for training the entailment model, using a natural language example generator that iteratively adjusts to the discriminator's weaknesses.
GenAug: Data Augmentation for Finetuning Text Generators
- Steven Y. Feng, Varun Gangal, Dongyeop Kang, T. Mitamura, E. Hovy
- Computer Science, DEELIO
- 5 October 2020
This paper proposes and evaluates various augmentation methods, including some that incorporate external knowledge, for finetuning GPT-2 on a subset of Yelp Reviews, and examines the relationship between the amount of augmentation and the quality of the generated text.
Earlier Isn’t Always Better: Sub-aspect Analysis on Corpus and System Biases in Summarization
While position exhibits substantial bias in news articles, this is not the case for other domains such as academic papers and meeting minutes, and the empirical study shows that different types of summarization systems draw on the sub-aspects to different degrees.
Detecting and Explaining Causes From Text For a Time Series Event
This work proposes a novel method that detects causal features in text using Granger causality between time series of features extracted from the text, such as N-grams, topics, sentiments, and their compositions.
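Granger causality asks whether the past values of one series improve the prediction of another series beyond what that series' own past already provides. The sketch below is the standard bivariate Granger F-test, written with NumPy least squares. It assumes a text feature has already been turned into a numeric time series (for example, a hypothetical daily count of one N-gram) and says nothing about how the paper extracts or composes features. The function name `granger_f_stat` and the synthetic data are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def granger_f_stat(x, y, lag=1):
    """F-statistic testing whether `lag` past values of x help predict y
    beyond y's own `lag` past values (standard bivariate Granger test)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    target = y[lag:]
    # Column k holds the series shifted back by k + 1 steps, aligned with target.
    y_lags = np.column_stack([y[lag - k - 1 : n - k - 1] for k in range(lag)])
    x_lags = np.column_stack([x[lag - k - 1 : n - k - 1] for k in range(lag)])
    ones = np.ones((n - lag, 1))
    restricted = np.hstack([ones, y_lags])            # y's own history only
    unrestricted = np.hstack([ones, y_lags, x_lags])  # plus x's history

    def rss(design):
        # Residual sum of squares of an ordinary least-squares fit.
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        resid = target - design @ coef
        return float(resid @ resid)

    rss_r, rss_u = rss(restricted), rss(unrestricted)
    df_num = lag
    df_den = (n - lag) - unrestricted.shape[1]
    return ((rss_r - rss_u) / df_num) / (rss_u / df_den)


# Synthetic example: x drives y with a one-step delay, not the other way round.
rng = np.random.default_rng(0)
x = rng.normal(size=300)
y = np.zeros(300)
y[1:] = 0.8 * x[:-1] + 0.1 * rng.normal(size=299)
print(granger_f_stat(x, y, lag=2), granger_f_stat(y, x, lag=2))
```

A large F-statistic in the x-to-y direction and a small one in the reverse direction is the pattern that marks x as a candidate Granger cause of y. Converting the statistic to a p-value (for instance with an F-distribution from SciPy) is omitted here to keep the sketch NumPy-only.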
(Male, Bachelor) and (Female, Ph.D) have different connotations: Parallelly Annotated Stylistic Language Dataset with Multiple Personas
PASTEL, the parallel and annotated stylistic language dataset containing ~41K parallel sentences (8.3K parallel stories) annotated across different personas, is released, and a simple supervised model trained on the authors' parallel text outperforms unsupervised models trained on non-parallel text in style transfer.
Style is NOT a single variable: Case Studies for Cross-Stylistic Language Understanding
This paper provides a benchmark corpus (XSLUE) for sentence-level cross-style language understanding and evaluation, which combines existing datasets with a newly collected one, and finds that combinations of some contradictory styles are likely to generate stylistically less appropriate text.
Self-Supervised Text Planning for Paragraph Completion Task
A self-supervised text planner, SSPlanner, is proposed that first predicts what to say (content prediction) and then guides a pretrained language model (surface realization) using the predicted content; a combination of noun and verb keywords is found to be the most effective for content selection.