POTATO: exPlainable infOrmation exTrAcTion framewOrk

@article{Kovacs2022POTATOEI,
  title={POTATO: exPlainable infOrmation exTrAcTion framewOrk},
  author={Adam Kovacs and Kinga G{\'e}mes and Eszter Ikl{\'o}di and G{\'a}bor Recski},
  journal={Proceedings of the 31st ACM International Conference on Information \& Knowledge Management},
  year={2022}
}
  • Adam Kovacs, Kinga Gémes, Eszter Iklódi, Gábor Recski
  • Published 31 January 2022
  • Computer Science
  • Proceedings of the 31st ACM International Conference on Information & Knowledge Management
We present POTATO, a task- and language-independent framework for human-in-the-loop (HITL) learning of rule-based text classifiers using graph-based features. POTATO handles any type of directed graph and supports parsing text into Abstract Meaning Representations (AMR), Universal Dependencies (UD), and 4lang semantic graphs. A web-based user interface allows users to build rule systems from graph patterns, provides real-time evaluation based on ground truth data, and suggests rules by ranking… 
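
The idea of classifying text with rules over graph patterns, and scoring those rules against ground truth, can be illustrated with a minimal sketch. This is not POTATO's API: the names `edge_matches`, `matches`, `classify`, and `evaluate`, the `"*"` wildcard, and the toy UD-style graphs are all assumptions made for illustration. The matching here is a simple per-edge check, a deliberate simplification of full subgraph matching.

```python
from typing import Dict, List, Set, Tuple

# A graph is a set of labeled directed edges: (source, edge_label, target).
Edge = Tuple[str, str, str]
Graph = Set[Edge]
# A rule pairs a graph pattern with the class label it assigns.
Rule = Tuple[Graph, str]

WILDCARD = "*"  # hypothetical convention: matches any node or edge label


def edge_matches(pattern_edge: Edge, edge: Edge) -> bool:
    """True if each pattern field is a wildcard or equals the graph field."""
    return all(p == WILDCARD or p == g for p, g in zip(pattern_edge, edge))


def matches(pattern: Graph, graph: Graph) -> bool:
    """True if every pattern edge is matched by at least one graph edge.

    Simplification: wildcards are not bound consistently across edges,
    so this is weaker than true subgraph matching.
    """
    return all(any(edge_matches(p, e) for e in graph) for p in pattern)


def classify(graph: Graph, rules: List[Rule]) -> Set[str]:
    """Return the labels of all rules whose pattern matches the graph."""
    return {label for pattern, label in rules if matches(pattern, graph)}


def evaluate(rules: List[Rule], data: List[Tuple[Graph, str]],
             label: str) -> Dict[str, float]:
    """Compute precision and recall of the rule system for one label."""
    tp = fp = fn = 0
    for graph, gold in data:
        predicted = label in classify(graph, rules)
        if predicted and gold == label:
            tp += 1
        elif predicted:
            fp += 1
        elif gold == label:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {"precision": precision, "recall": recall}


# Toy ground-truth data: hand-written dependency-style graphs with labels.
DATA: List[Tuple[Graph, str]] = [
    ({("treat", "nsubj", "aspirin"), ("treat", "obj", "headache")}, "TREATS"),
    ({("cause", "nsubj", "smoking"), ("cause", "obj", "cancer")}, "CAUSES"),
    ({("treat", "nsubj", "doctor"), ("treat", "obj", "patient")}, "OTHER"),
]

# One rule: any graph where "treat" has a direct object is labeled TREATS.
RULES: List[Rule] = [({("treat", "obj", WILDCARD)}, "TREATS")]

if __name__ == "__main__":
    print(evaluate(RULES, DATA, "TREATS"))
```

On this toy data the single rule over-generates on the third graph, which is the kind of feedback a human-in-the-loop workflow would use to refine the pattern, for example by constraining the subject or object node.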
