Corpus ID: 127986044

BERTScore: Evaluating Text Generation with BERT

@article{Zhang2020BERTScoreET,
  title={{BERTScore}: Evaluating Text Generation with {BERT}},
  author={Tianyi Zhang and Varsha Kishore and Felix Wu and Kilian Q. Weinberger and Yoav Artzi},
  journal={ArXiv},
  year={2020},
  volume={abs/1904.09675}
}
We propose BERTScore, an automatic evaluation metric for text generation. [...] However, instead of exact matches, we compute token similarity using contextual embeddings. We evaluate using the outputs of 363 machine translation and image captioning systems. BERTScore correlates better with human judgments and provides stronger model selection performance than existing metrics. Finally, we use an adversarial paraphrase detection task to show that BERTScore is more robust to challenging examples when…
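The abstract's core idea, scoring token similarity with embeddings instead of exact matches, can be illustrated with a minimal sketch. It assumes greedy matching by cosine similarity, averaged into precision, recall, and F1. The function names and the toy 2-d vectors are hypothetical stand-ins: a real setup would take per-token vectors from a contextual encoder such as BERT, and this sketch leaves out any weighting or rescaling the full metric may apply.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def greedy_match_scores(ref_vecs, cand_vecs):
    """Return (precision, recall, f1) from greedy soft token matching.

    ref_vecs and cand_vecs are non-empty lists of token vectors
    (hypothetical inputs; no encoder is called here).
    """
    # Recall: each reference token is paired with its most similar candidate token.
    recall = sum(
        max(cosine(r, c) for c in cand_vecs) for r in ref_vecs
    ) / len(ref_vecs)
    # Precision: each candidate token is paired with its most similar reference token.
    precision = sum(
        max(cosine(c, r) for r in ref_vecs) for c in cand_vecs
    ) / len(cand_vecs)
    # Harmonic mean, guarded against a zero denominator.
    total = precision + recall
    f1 = 2 * precision * recall / total if total != 0 else 0.0
    return precision, recall, f1


# Toy vectors standing in for contextual token embeddings (assumption).
reference = [[1.0, 0.0], [0.0, 1.0]]
candidate = [[1.0, 0.0], [1.0, 1.0]]
precision, recall, f1 = greedy_match_scores(reference, candidate)
print(precision, recall, f1)
```

Because matching is soft, a candidate token that is close to a reference token in embedding space still earns partial credit, which exact n-gram overlap would not give.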
370 Citations
ViLBERTScore: Evaluating Image Caption Using Vision-and-Language BERT
  • 1
  • Highly Influenced
Improving Text Generation Evaluation with Batch Centering and Tempered Word Mover Distance
  • Highly Influenced
MoverScore: Text Generation Evaluating with Contextualized Embeddings and Earth Mover Distance
  • 84
Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks
  • 486
MEE: An Automatic Metric for Evaluation Using Embeddings for Machine Translation
  • 1
WEmbSim: A Simple yet Effective Metric for Image Captioning
KPQA: A Metric for Generative Question Answering Using Word Weights
  • Highly Influenced
Improving Neural Abstractive Summarization via Reinforcement Learning with BERTScore
NUBIA: NeUral Based Interchangeability Assessor for Text Generation
  • 5
