• Publications
Extractive Summarization of Long Documents by Combining Global and Local Context
TLDR
A novel neural single-document extractive summarization model for long documents that incorporates both the global context of the whole document and the local context within the current topic, outperforming previous extractive and abstractive models.
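A minimal sketch of the underlying idea (scoring each sentence with both a document-level and a topic-segment-level context), assuming pre-computed sentence embeddings; the module names and dimensions are illustrative, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class GlobalLocalScorer(nn.Module):
    def __init__(self, sent_dim=128, hidden=128):
        super().__init__()
        self.doc_rnn = nn.GRU(sent_dim, hidden, batch_first=True, bidirectional=True)
        # score = f([sentence ; local topic context ; global document context])
        self.scorer = nn.Sequential(
            nn.Linear(sent_dim + 2 * hidden + 2 * hidden, hidden),
            nn.Tanh(),
            nn.Linear(hidden, 1),
        )

    def forward(self, sent_embs, topic_ids):
        # sent_embs: (num_sents, sent_dim) pre-computed sentence embeddings
        # topic_ids: (num_sents,) topic-segment index of each sentence
        states, _ = self.doc_rnn(sent_embs.unsqueeze(0))
        states = states.squeeze(0)                              # contextualized sentences
        global_ctx = states.mean(dim=0, keepdim=True).expand_as(states)
        # local context: mean of contextualized states within each topic segment
        local_ctx = torch.stack(
            [states[topic_ids == topic_ids[i]].mean(dim=0) for i in range(len(states))]
        )
        feats = torch.cat([sent_embs, local_ctx, global_ctx], dim=-1)
        return self.scorer(feats).squeeze(-1)                   # one extraction score per sentence

scores = GlobalLocalScorer()(torch.randn(12, 128), torch.tensor([0] * 4 + [1] * 4 + [2] * 4))
```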
PRIMER: Pyramid-based Masked Sentence Pre-training for Multi-document Summarization
TLDR
A pre-trained model for multi-document representation with a focus on summarization that reduces the need for dataset-specific architectures and large amounts of labeled fine-tuning data, and outperforms current state-of-the-art models in most settings by large margins.
Systematically Exploring Redundancy Reduction in Summarizing Long Documents
TLDR
This work systematically explores and compares different ways to deal with redundancy when summarizing long documents, and proposes three additional methods that balance non-redundancy and importance in a general and flexible way.
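For reference, a generic selection loop illustrating the importance/redundancy trade-off this line of work explores; this is the classic MMR formulation under assumed inputs, not one of the paper's three proposed methods.

```python
import numpy as np

def mmr_select(importance, sim, k=3, lam=0.7):
    """importance: (n,) relevance scores; sim: (n, n) sentence similarity; lam in [0, 1]."""
    selected, candidates = [], list(range(len(importance)))
    while candidates and len(selected) < k:
        def mmr(i):
            # penalize similarity to anything already in the summary
            redundancy = max(sim[i][j] for j in selected) if selected else 0.0
            return lam * importance[i] - (1 - lam) * redundancy
        best = max(candidates, key=mmr)
        selected.append(best)
        candidates.remove(best)
    return selected

rng = np.random.default_rng(0)
scores = rng.random(6)
sims = rng.random((6, 6)); sims = (sims + sims.T) / 2
print(mmr_select(scores, sims, k=3, lam=0.6))
```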
Do We Really Need That Many Parameters In Transformer For Extractive Summarization? Discourse Can Help !
TLDR
This paper presents a novel parameter-lean self-attention mechanism using discourse priors that achieves competitive ROUGE scores on extractive summarization and significantly outperforms the 8-head transformer model at the sentence level under a more balanced hyper-parameter setting.
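A minimal sketch of what a "parameter-lean" attention layer biased by a discourse prior can look like: a fixed prior matrix (e.g., derived from a discourse tree) replaces or is mixed with learned attention weights. The mixing scheme and names are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def prior_biased_attention(values, discourse_prior, scores=None, alpha=0.5):
    # values: (n, d) sentence representations
    # discourse_prior: (n, n) non-negative weights from a discourse structure
    # scores: optional (n, n) learned attention logits; with scores=None the layer
    # uses the prior alone and needs no attention parameters at all.
    prior = discourse_prior / discourse_prior.sum(dim=-1, keepdim=True)
    if scores is None:
        attn = prior
    else:
        attn = alpha * F.softmax(scores, dim=-1) + (1 - alpha) * prior
    return attn @ values

n, d = 5, 16
out = prior_biased_attention(torch.randn(n, d), torch.rand(n, n) + 1e-6)
```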
Predicting Discourse Trees from Transformer-based Neural Summarizers
TLDR
Experiments across models and datasets reveal that the summarizer learns both dependency- and constituency-style discourse information, typically encoded in a single head, covering long- and short-distance discourse dependencies.
Demoting the Lead Bias in News Summarization via Alternating Adversarial Learning
TLDR
Experiments show that the novel technique introduced can effectively demote the model's learned lead bias and improve its generality on out-of-distribution data, with little to no performance loss on in-distribution data.
KW-ATTN: Knowledge Infused Attention for Accurate and Interpretable Text Classification
TLDR
It is shown that KW-ATTN outperforms baseline models that use only words, as well as other concept-based approaches, in classification accuracy, indicating that high-level concepts help model prediction.
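An illustrative sketch of the general idea of augmenting a text representation with attention over knowledge-base concept embeddings; the pooling scheme and function names are assumptions, not the exact KW-ATTN architecture.

```python
import torch
import torch.nn.functional as F

def concept_augmented_repr(word_vecs, concept_vecs):
    # word_vecs: (num_words, d) token embeddings of the input text
    # concept_vecs: (num_concepts, d) embeddings of concepts linked to the text
    text = word_vecs.mean(dim=0)                       # simple pooled text vector
    attn = F.softmax(concept_vecs @ text, dim=0)       # attention of the text over concepts
    concept_ctx = attn @ concept_vecs                  # weighted concept summary
    # classifier input; attn itself can be inspected for interpretability
    return torch.cat([text, concept_ctx])

rep = concept_augmented_repr(torch.randn(20, 64), torch.randn(7, 64))
```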
Implicit Semantic Response Alignment for Partial Domain Adaptation
Partial Domain Adaptation (PDA) addresses the unsupervised domain adaptation problem where the target label space is a subset of the source label space. Most state-of-the-art PDA methods tackle the
...