POTATO: exPlainable infOrmation exTrAcTion framewOrk
@article{Kovacs2022POTATOEI, title={POTATO: exPlainable infOrmation exTrAcTion framewOrk}, author={Adam Kovacs and Kinga G{\'e}mes and Eszter Ikl{\'o}di and G{\'a}bor Recski}, journal={Proceedings of the 31st ACM International Conference on Information \& Knowledge Management}, year={2022} }
We present POTATO, a task- and language-independent framework for human-in-the-loop (HITL) learning of rule-based text classifiers using graph-based features. POTATO handles any type of directed graph and supports parsing text into Abstract Meaning Representations (AMR), Universal Dependencies (UD), and 4lang semantic graphs. A web-based user interface allows users to build rule systems from graph patterns, provides real-time evaluation based on ground truth data, and suggests rules by ranking…
References
GrASP: A Library for Extracting and Exploring Human-Interpretable Textual Patterns
- Computer Science, LREC
- 2022
The library integrates a first public implementation of the existing GrASP algorithm, and allows users to extract patterns using a number of general-purpose built-in linguistic attributes, as envisaged for the original algorithm.
Odinson: A Fast Rule-based Information Extraction Framework
- Computer Science, LREC
- 2020
Odinson is a rule-based information extraction framework that couples a simple yet powerful pattern language, able to operate over multiple representations of text, with a runtime system that operates in near real time to guarantee the rapid matching of patterns.
HEIDL: Learning Linguistic Expressions with Deep Learning and Human-in-the-Loop
- Computer Science, ACL
- 2019
HEIDL is a prototype HITL-ML system that exposes the machine-learned model through high-level, explainable linguistic expressions formed of predicates representing the semantic structure of text, resulting in improved productivity in the development of text analytics models.
Learning Explainable Linguistic Expressions with Neural Inductive Logic Programming for Sentence Classification
- Computer Science, EMNLP
- 2020
RuleNN is a neural network architecture for learning transparent models for sentence classification in the form of rules expressed in first-order logic, a dialect with well-defined, human-understandable semantics; it outperforms statistical relational learning and other neuro-symbolic methods.
Penman: An Open-Source Library and Tool for AMR Graphs
- Computer Science, ACL
- 2020
The open-source Python library Penman provides a robust parser, functions for graph inspection and manipulation, and functions for formatting graphs into PENMAN notation, thus extending its utility to non-Python setups.
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
- Computer Science, J. Mach. Learn. Res.
- 2020
This systematic study compares pre-training objectives, architectures, unlabeled datasets, transfer approaches, and other factors on dozens of language understanding tasks and achieves state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more.
Right for the Wrong Reasons: Diagnosing Syntactic Heuristics in Natural Language Inference
- Computer Science, ACL
- 2019
There is substantial room for improvement in NLI systems; the HANS dataset, which contains many examples where the heuristics fail, can motivate and measure progress in this area.
Explainable Rule Extraction via Semantic Graphs
- Computer Science, ASAIL/LegalAIIA@ICAIL
- 2021
We present an end-to-end system for extracting deontic logic formulae from legal text using a generic semantic parsing module and task-specific graph grammars, and for performing automated reasoning…
Annotation Artifacts in Natural Language Inference Data
- Computer Science, NAACL
- 2018
It is shown that a simple text categorization model can correctly classify the hypothesis alone in about 67% of SNLI and 53% of MultiNLI, and that specific linguistic phenomena such as negation and vagueness are highly correlated with certain inference classes.
Stanza: A Python Natural Language Processing Toolkit for Many Human Languages
- Computer Science, ACL
- 2020
This work introduces Stanza, an open-source Python natural language processing toolkit supporting 66 human languages that features a language-agnostic fully neural pipeline for text analysis, including tokenization, multi-word token expansion, lemmatization, part-of-speech and morphological feature tagging, dependency parsing, and named entity recognition.