Weisfeiler-Leman in the Bamboo: Novel AMR Graph Metrics and a Benchmark for AMR Graph Similarity

@article{Opitz2021WeisfeilerLemanIT,
  title={Weisfeiler-Leman in the Bamboo: Novel AMR Graph Metrics and a Benchmark for AMR Graph Similarity},
  author={Juri Opitz and Angel Daza and A. E. Frank},
  journal={Transactions of the Association for Computational Linguistics},
  year={2021},
  volume={9},
  pages={1425--1441}
}
  • J. Opitz, Angel Daza, A. Frank
  • Published 26 August 2021
  • Computer Science
  • Transactions of the Association for Computational Linguistics
Abstract: Several metrics have been proposed for assessing the similarity of (abstract) meaning representations (AMRs), but little is known about how they relate to human similarity ratings. Moreover, the current metrics have complementary strengths and weaknesses: Some emphasize speed, while others make the alignment of graph structures explicit, at the price of a costly alignment step. In this work we propose new Weisfeiler-Leman AMR similarity metrics that unify the strengths of previous…

A Dynamic, Interpreted CheckList for Meaning-oriented NLG Metric Evaluation – through the Lens of Semantic Similarity Rating

The usefulness of CheckList is demonstrated by designing a new metric, GraCo, that computes lexical cohesion graphs over AMR concepts; the results suggest that meaning-oriented NLG metrics can profit from graph-based metric components using AMR.

SBERT studies Meaning Representations: Decomposing Sentence Embeddings into Explainable AMR Meaning Features

This work creates similarity metrics that are highly effective, while also providing an interpretable rationale for their rating, and employs these metrics to induce Semantically Structured Sentence BERT embeddings (S³BERT), which are composed of different meaning aspects captured in different sub-spaces.

SMARAGD: Synthesized sMatch for Accurate and Rapid AMR Graph Distance

This work shows the potential of neural networks to approximate SMATCH scores and graph alignments, and shows that the approximation error can be substantially reduced by applying data augmentation and AMR graph anonymization.

FactGraph: Evaluating Factuality in Summarization with Semantic Graph Representations

FactGraph is proposed, a method that decomposes the document and the summary into structured meaning representations (MR), which are more suitable for factuality evaluation; it improves performance on identifying content verifiability errors and better captures subsentence-level factual inconsistencies.

Explainable Unsupervised Argument Similarity Rating with Abstract Meaning Representation and Conclusion Generation

It is shown that Abstract Meaning Representation (AMR) graphs can be useful for representing arguments, and that novel AMR graph metrics can offer explanations for argument similarity ratings, make argument similarity judgements more interpretable, and may even support argument quality judgements.

FastKASSIM: A Fast Tree Kernel-Based Syntactic Similarity Metric

With FastKASSIM, a metric for utterance- and document-level syntactic similarity which pairs and averages the most similar dependency parse trees between a pair of documents based on tree kernels, it is shown that more syntactically similar arguments tend to be more persuasive, and that syntax provides a key indicator of writing style.

Better Smatch = Better Parser? AMR evaluation is not so simple anymore

An analysis of two popular and strong AMR parsers that reach quality levels on par with human IAA, assessing how human quality ratings relate to SMATCH and other AMR metrics.

SBERT studies Meaning Representations: Decomposing Sentence Embeddings into Explainable Semantic Features

This work shows how to learn a decomposition of the sentence embeddings into meaning features, through approximation of a suite of interpretable semantic AMR graph metrics, and preserves the overall power of the neural embeddings by controlling the decomposition learning process with a second objective that enforces consistency with the similarity ratings of an SBERT teacher model.

Structural Adapters in Pretrained Language Models for AMR-to-Text Generation

The benefits of explicitly encoding graph structure into PLMs using StructAdapt are empirically shown, outperforming the state of the art on two AMR-to-text datasets, training only 5.1% of the PLM parameters.

References

AMR Similarity Metrics from Principles

Criteria are established that enable researchers to perform a principled assessment of metrics comparing meaning representations like AMR, and a novel metric, S2match, is proposed that is more benevolent to only very slight meaning deviations and targets the fulfilment of all established criteria.

Wasserstein Weisfeiler-Lehman Graph Kernels

A novel method is proposed that relies on the Wasserstein distance between the node feature vector distributions of two graphs, which allows finding subtler differences in data sets by considering graphs as high-dimensional objects rather than simple means.

Towards a Decomposable Metric for Explainable Evaluation of Text Generation from AMR

This work proposes \mathcal{M}_\beta, a decomposable metric that builds on two pillars, including one that measures the linguistic quality of the generated text, and shows that fulfillment of both principles offers benefits for AMR-to-text evaluation, including explainability of scores.

Results of the WMT19 Metrics Shared Task: Segment-Level and Strong MT Systems Pose Big Challenges

Participants were asked to score the outputs of the translation systems competing in the WMT19 News Translation Task with automatic metrics, and metrics were evaluated on the system level (how well a given metric correlates with the WMT19 official manual ranking) and on the segment level (how well the metric correlates with human judgements of segment quality).

Weisfeiler-Lehman Graph Kernels

A family of efficient kernels for large graphs with discrete node labels based on the Weisfeiler-Lehman test of isomorphism on graphs that outperform state-of-the-art graph kernels on several graph classification benchmark data sets in terms of accuracy and runtime.
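
The idea behind these kernels can be illustrated with a minimal Python sketch of the Weisfeiler-Lehman subtree kernel. It is an illustration only, not the implementation from the cited paper and not the AMR metrics of Opitz et al. The function names (`wl_features`, `wl_subtree_kernel`), the adjacency-dictionary input format, and the use of nested tuples in place of compressed integer labels are all assumptions made for this example, and graphs are treated as undirected.

```python
from collections import Counter

def wl_features(adjacency, labels, iterations=2):
    """Count node labels over `iterations` rounds of WL relabeling."""
    current = dict(labels)
    counts = Counter(current.values())  # round 0: the original labels
    for _ in range(iterations):
        updated = {}
        for node, neighbors in adjacency.items():
            # New label = own label plus the sorted multiset of neighbor labels.
            # (Hypothetical simplification: a nested tuple instead of a hashed id.)
            neighbor_labels = tuple(sorted(current[n] for n in neighbors))
            updated[node] = (current[node], neighbor_labels)
        current = updated
        counts.update(current.values())
    return counts

def wl_subtree_kernel(graph_a, graph_b, iterations=2):
    """Dot product of the two graphs' WL label-count vectors."""
    feats_a = wl_features(*graph_a, iterations=iterations)
    feats_b = wl_features(*graph_b, iterations=iterations)
    shared = feats_a.keys() & feats_b.keys()
    return sum(feats_a[label] * feats_b[label] for label in shared)

# Two toy graphs with identical node labels: a 3-node path and a triangle.
path = ({0: [1], 1: [0, 2], 2: [1]}, {0: "x", 1: "x", 2: "x"})
triangle = ({0: [1, 2], 1: [0, 2], 2: [0, 1]}, {0: "x", 1: "x", 2: "x"})

print(wl_subtree_kernel(path, triangle, iterations=1))
print(wl_subtree_kernel(path, path, iterations=1))
```

After one round, the path and the triangle share only the original label and the "two identically labeled neighbors" label, so the cross-graph kernel value is lower than either graph's value against itself.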

A SICK cure for the evaluation of compositional distributional semantic models

This work aims to help the research community working on compositional distributional semantic models (CDSMs) by providing SICK (Sentences Involving Compositional Knowledge), a large English benchmark tailored for them.

SemBleu: A Robust Metric for AMR Parsing Evaluation

SEMBLEU is a robust metric that extends BLEU to AMRs and punishes situations where a system’s output does not preserve most information from the input, and has slightly higher consistency with human judgments than SMATCH.

SemEval-2017 Task 1: Semantic Textual Similarity Multilingual and Crosslingual Focused Evaluation

The STS Benchmark is introduced as a new shared training and evaluation set carefully selected from the corpus of English STS shared task data (2012-2017), providing insight into the limitations of existing models.

SemEval-2016 Task 8: Meaning Representation Parsing

The evaluation set was quite difficult to parse, particularly due to creative approaches to word representation in the web forum portion, and state-of-the-art baseline systems were a key factor in lowering the bar to entry.

Core Semantic First: A Top-down Approach for AMR Parsing

A novel scheme for parsing a piece of text into its Abstract Meaning Representation (AMR): Graph Spanning based Parsing (GSP), which achieves the state-of-the-art performance in the sense that no heuristic graph re-categorization is adopted.
...