LexRank: Graph-based Lexical Centrality as Salience in Text Summarization
A new approach, LexRank, computes sentence importance from eigenvector centrality in a graph representation of sentences; the LexRank with threshold method outperforms the other degree-based techniques, including continuous LexRank.
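The centrality idea can be illustrated with a minimal sketch. It assumes whitespace tokenization and plain term-frequency cosine similarity (a simplification of the idf-weighted similarity a full implementation would use). The threshold of 0.1 and damping of 0.15 are illustrative values, and the function names `cosine` and `lexrank` are hypothetical. Sentences become nodes. With `continuous=False`, an unweighted edge joins any pair whose similarity exceeds the threshold (the thresholded variant). With `continuous=True`, each edge is weighted by the similarity itself. Scores come from power iteration on the row-normalized, damped graph.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Term-frequency cosine similarity (simplification: no idf weighting)."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def lexrank(sentences, threshold=0.1, damping=0.15, continuous=False,
            tol=1e-10, max_iter=1000):
    """Score sentences by damped eigenvector centrality in a similarity graph.

    continuous=False: binary edges where similarity > threshold.
    continuous=True: edges above the threshold weighted by the similarity.
    """
    bags = [Counter(s.lower().split()) for s in sentences]
    n = len(bags)
    weights = []
    for i in range(n):
        row = []
        for j in range(n):
            sim = cosine(bags[i], bags[j])
            row.append((sim if continuous else 1.0) if sim > threshold else 0.0)
        weights.append(row)
    # Each node spreads its score over its edges in proportion to edge weight.
    row_sums = [sum(row) for row in weights]
    scores = [1.0 / n] * n
    for _ in range(max_iter):
        new = [
            damping / n
            + (1 - damping) * sum(
                weights[v][u] * scores[v] / row_sums[v]
                for v in range(n) if row_sums[v] > 0
            )
            for u in range(n)
        ]
        converged = max(abs(x - y) for x, y in zip(new, scores)) < tol
        scores = new
        if converged:
            break
    return scores
```

For example, `lexrank(["red apples grow here", "red apples and yellow bananas", "yellow bananas taste sweet"])` gives the middle sentence the highest score, because it is the only one similar to both others. The damping term keeps the iteration well defined even when the graph is disconnected.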
TimeML: Robust Specification of Event and Temporal Expressions in Text
This paper describes TimeML, a rich specification language for event and temporal expressions in natural language text. TimeML was developed in the context of the AQUAINT program on Question Answering Systems, and the paper shows how it supports a delayed (underspecified) interpretation of partially determined temporal expressions.
Spider: A Large-Scale Human-Labeled Dataset for Complex and Cross-Domain Semantic Parsing and Text-to-SQL Task
This work defines a new complex, cross-domain semantic parsing and text-to-SQL task in which different complex SQL queries and databases appear in the train and test sets. Experiments with various state-of-the-art models show that Spider presents a strong challenge for future research.
Multi-News: A Large-Scale Multi-Document Summarization Dataset and Abstractive Hierarchical Model
This work introduces Multi-News, the first large-scale multi-document summarization (MDS) news dataset. It also proposes an end-to-end model that combines a traditional extractive summarization model with a standard single-document summarization (SDS) model and achieves competitive results on MDS datasets.
How to Analyze Political Attention with Minimal Assumptions and Costs
Previous methods of analyzing the substance of political attention have had to make several restrictive assumptions or been prohibitively costly when applied to large-scale political texts. Here, we
The ACL Anthology Network Corpus
We introduce the ACL Anthology Network (AAN), a comprehensive manually curated networked database of citations, collaborations, and summaries in the field of Computational Linguistics. We also
TypeSQL: Knowledge-Based Type-Aware Neural Text-to-SQL Generation
This paper presents TypeSQL, a novel approach that formulates the problem as a slot-filling task, which the authors argue is a more reasonable formulation. TypeSQL uses type information to better understand rare entities and numbers in the questions.
Rumor has it: Identifying Misinformation in Microblogs
This paper addresses rumor detection in microblogs and explores how effectively three categories of features (content-based, network-based, and microblog-specific memes) identify rumors. The authors believe their dataset is the first large-scale dataset for rumor detection.
SyntaxSQLNet: Syntax Tree Networks for Complex and Cross-Domain Text-to-SQL Task
Experimental results show that SyntaxSQLNet can handle a significantly greater number of complex SQL examples than prior work, outperforming the previous state-of-the-art model by 9.5% in exact matching accuracy.