Unsupervised Reference-Free Summary Quality Evaluation via Contrastive Learning

@article{Wu2020UnsupervisedRS,
  title={Unsupervised Reference-Free Summary Quality Evaluation via Contrastive Learning},
  author={Hanlu Wu and Tengfei Ma and Lingfei Wu and Tariro Manyumwa and Shouling Ji},
  journal={ArXiv},
  year={2020},
  volume={abs/2010.01781}
}
Evaluation of a document summarization system has long been critical to the success of the summarization task. Previous approaches, such as ROUGE, mainly consider the informativeness of the assessed summary and require human-generated references for each test summary. In this work, we propose to evaluate summary quality without reference summaries via unsupervised contrastive learning. Specifically, we design a new metric which covers both linguistic qualities and semantic…
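The contrastive setup described in the abstract can be illustrated with a minimal margin-ranking sketch. The function names, the toy token-deletion corruption, and the cosine scorer below are illustrative assumptions, not the paper's actual model, which learns a neural scorer from richer negative samples:

```python
import math
import random

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def corrupt(summary, seed=0):
    """Build a negative sample by deleting random tokens.

    This is one of many possible corruptions; shuffling word order or
    swapping in off-topic sentences are alternatives that damage
    linguistic or semantic quality respectively.
    """
    toks = summary.split()
    rng = random.Random(seed)
    kept = [t for t in toks if rng.random() > 0.5]
    return " ".join(kept) if kept else (toks[0] if toks else "")

def margin_ranking_loss(pos_score, neg_score, margin=0.1):
    """Hinge loss that pushes the intact summary's quality score above
    the corrupted summary's score by at least `margin`."""
    return max(0.0, margin - (pos_score - neg_score))
```

Training on many (intact, corrupted) pairs with this objective yields a scorer that needs no human reference at test time, which is the reference-free property the abstract emphasizes.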

Citations

SimCLS: A Simple Framework for Contrastive Learning of Abstractive Summarization
TLDR
Results of the proposed models have been deployed into the ExplainaBoard (Liu et al., 2021a) platform, which allows researchers to understand the systems in a more fine-grained way.
A Training-free and Reference-free Summarization Evaluation Metric via Centrality-weighted Relevance and Self-referenced Redundancy
TLDR
A training-free and reference-free summarization evaluation metric that consists of a centrality-weighted relevance score and a self-referenced redundancy score, which can significantly outperform existing methods on both multi-document and single-document summarization evaluation.
TransSum: Translating Aspect and Sentiment Embeddings for Self-Supervised Opinion Summarization
  • Ke Wang, Xiaojun Wan · Findings, 2021
TLDR
Experimental results on three different domains show that TransSum outperforms several strong baselines in generating informative, relevant and low-redundant summaries, unveiling the effectiveness of the approach.
Contrastive Aligned Joint Learning for Multilingual Summarization
  • Danqing Wang, Jiaze Chen, Hao Zhou, Xipeng Qiu, Lei Li · Findings, 2021
TLDR
This paper develops a unified summarization model to understand the document and generate summaries in different languages, and uses the contrastive learning strategy to train the multilingual summarization system (CALMS), which achieves significant improvement over monolingual models in all languages.
Generating Negative Samples by Manipulating Golden Responses for Unsupervised Learning of a Response Evaluation Model
TLDR
It is found that generating negative samples by manipulating a golden response, so that the new response is inappropriate within the context while maintaining high similarity with the original golden response, can increase the evaluation model's correlation with human judgments.
FFCI: A Framework for Interpretable Automatic Evaluation of Summarization
TLDR
This study reveals three key findings: calculating BERTScore between the summary and article sentences yields a higher correlation score than recently-proposed QA-based evaluation methods for faithfulness evaluation; GPT2Score has the best Pearson's correlation for focus and coverage; and a simple NSP model is effective at evaluating inter-sentential coherence.
Self-supervised Document Clustering Based on BERT with Data Augment
TLDR
This paper proposes two learning methods for document clustering: one is partial contrastive learning with unsupervised data augmentation, and the other is self-supervised contrastive learning; the approach achieves state-of-the-art clustering accuracy compared to recently proposed unsupervised clustering approaches.

References

Showing 1–10 of 30 references
SUPERT: Towards New Frontiers in Unsupervised Evaluation Metrics for Multi-Document Summarization
TLDR
This work proposes SUPERT, which rates the quality of a summary by measuring its semantic similarity with a pseudo reference summary, i.e. selected salient sentences from the source documents, using contextualized embeddings and soft token alignment techniques.
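SUPERT's soft token alignment can be sketched as a greedy best-match over token embeddings. This assumes the embeddings are already computed; SUPERT itself extracts salient source sentences as the pseudo reference and uses SBERT-style contextualized embeddings:

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def soft_align_score(summary_vecs, pseudo_ref_vecs):
    """Greedy soft alignment: each summary token embedding is matched to
    its most similar pseudo-reference token embedding, and the score is
    the mean of these best matches (a BERTScore-style precision)."""
    if not summary_vecs:
        return 0.0
    return sum(max(cosine(s, r) for r in pseudo_ref_vecs)
               for s in summary_vecs) / len(summary_vecs)
```

Because the pseudo reference is built from the source documents rather than written by annotators, the resulting score is reference-free, which is what places SUPERT in this paper's line of related work.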
The Feasibility of Embedding Based Automatic Evaluation for Single Document Summarization
TLDR
The experimental results show that the max value over each dimension of the summary's ELMo word embeddings is a good representation that results in high correlation with human ratings, and that averaging the cosine similarity of all encoders the authors tested yields high correlation with manual scores in the reference-free setting.
Learning to Score System Summaries for Better Content Selection Evaluation.
TLDR
This work proposes to learn an automatic scoring metric based on the human judgements available as part of classical summarization datasets like TAC-2008 and TAC-2009, and releases the trained metric as an open-source tool.
An Entity-Driven Framework for Abstractive Summarization
TLDR
SENECA is introduced, a novel System for ENtity-drivEn Coherent Abstractive summarization framework that leverages entity information to generate informative and coherent abstracts and significantly outperforms previous state-of-the-art based on ROUGE and proposed coherence measures on New York Times and CNN/Daily Mail datasets.
Objective Function Learning to Match Human Judgements for Optimization-Based Summarization
TLDR
This work learns a summary-level scoring function θ, using human judgments as supervision and automatically generated data as regularization, and extracts summaries with a genetic algorithm using θ as a fitness function.
MoverScore: Text Generation Evaluating with Contextualized Embeddings and Earth Mover Distance
TLDR
This paper investigates strategies to encode system and reference texts to devise a metric that shows a high correlation with human judgment of text quality and validates the new metric, namely MoverScore, on a number of text generation tasks.
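The transport idea behind MoverScore can be sketched with a relaxed (greedy) lower bound on the Earth Mover's Distance: each system token moves all of its mass to the nearest reference token. The real metric solves the full optimal-transport problem over contextualized BERT embeddings with idf-weighted mass; this is only an illustrative simplification:

```python
import math

def euclidean(u, v):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def relaxed_emd(sys_vecs, ref_vecs):
    """Relaxed lower bound on the Earth Mover's Distance between two
    sets of token embeddings, with uniform mass on each token: every
    system token greedily ships its mass to the closest reference token.
    Lower distance indicates a system text closer to the reference."""
    flow = sum(min(euclidean(s, r) for r in ref_vecs) for s in sys_vecs)
    return flow / len(sys_vecs)
```

The greedy relaxation drops the constraint that reference tokens receive bounded mass, which is what makes it cheap to compute and a lower bound on the true transport cost.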
Knowledge Graph-Augmented Abstractive Summarization with Semantic-Driven Cloze Reward
TLDR
ASGARD is presented, a novel framework for Abstractive Summarization with Graph-Augmentation and semantic-driven RewarD, and proposes the use of dual encoders—a sequential document encoder and a graph-structured encoder—to maintain the global context and local characteristics of entities, complementing each other.
Newsroom: A Dataset of 1.3 Million Summaries with Diverse Extractive Strategies
TLDR
The NEWSROOM dataset is presented, a summarization dataset of 1.3 million articles and summaries written by authors and editors in newsrooms of 38 major news publications between 1998 and 2017; the summaries combine abstractive and extractive strategies.
SUM-QE: a BERT-based Summary Quality Estimation Model
TLDR
The model addresses linguistic quality aspects that are only indirectly captured by content-based approaches to summary evaluation, without involving comparison with human references, and achieves very high correlations with human ratings.
Automatically Assessing Machine Summary Content Without a Gold Standard
TLDR
It is shown that quantifying the similarity between the source text and its summary with appropriately chosen measures produces summary scores which replicate human assessments accurately; the feasibility of another measure, similarity between a system summary and the pool of all other system summaries for the same input, is also explored.
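One of the input–summary similarity measures studied in this line of work is the Jensen–Shannon divergence between the unigram distributions of the source and the summary, where a lower divergence suggests the summary's word distribution tracks the source. A minimal sketch:

```python
import math
from collections import Counter

def distribution(text):
    """Unigram probability distribution over lowercased whitespace tokens."""
    counts = Counter(text.lower().split())
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def js_divergence(p, q):
    """Jensen–Shannon divergence (base 2) between two unigram
    distributions; 0 for identical distributions, 1 for disjoint ones."""
    vocab = set(p) | set(q)
    m = {w: 0.5 * (p.get(w, 0.0) + q.get(w, 0.0)) for w in vocab}

    def kl(a, b):
        # KL(a || b), skipping zero-probability terms in a.
        return sum(a.get(w, 0.0) * math.log2(a.get(w, 0.0) / b[w])
                   for w in vocab if a.get(w, 0.0) > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)
```

Real tokenization, stopword handling, and smoothing choices matter in practice; this sketch uses bare whitespace tokens for clarity.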