• Corpus ID: 17526435

Rule-Based Information Extraction is Dead! Long Live Rule-Based Information Extraction Systems!

@inproceedings{Chiticariu2013RuleBasedIE,
  title={Rule-Based Information Extraction is Dead! Long Live Rule-Based Information Extraction Systems!},
  author={Laura Chiticariu and Yunyao Li and Frederick Reiss},
  booktitle={EMNLP},
  year={2013}
}
The rise of “Big Data” analytics over unstructured text has led to renewed interest in information extraction (IE). We surveyed the landscape of IE technologies and identified a major disconnect between industry and academia: while rule-based IE dominates the commercial world, it is widely regarded as a dead-end technology by academia. We believe the disconnect stems from the way in which the two communities measure the benefits and costs of IE, as well as academia’s perception that rule-based…

ONTOLOGY-BASED ENHANCEMENT OF RULE LEARNING FOR INFORMATION EXTRACTION
  • Computer Science
  • 2018
TLDR
A novel, generic approach that improves the performance of these extractors by enhancing rule generalization with a domain ontology, so that the systems generate only the rules most likely to be useful.
Towards a Plug-and-Play B2B Marketing Tool Based on Time-Sensitive Information Extraction
TLDR
The experience with LARIAT is used as the basis for the design of a Solution-as-a-Service framework that will enable a richly extensible version of the capability, which could serve multiple B2B companies while affording economies of scale.
Exploratory relation extraction in large multilingual data
TLDR
This thesis presents a method that expands English-language Semantic Role Labeling (SRL) to other languages and uses it to generate multilingual SRL resources for seven distinct languages from different language groups in order to bootstrap semantic parsers for these languages.
UIMA Ruta: Rapid development of rule-based information extraction applications
TLDR
UIMA Ruta is compared to related rule-based systems, especially concerning the compactness of the rule representation, the expressiveness, and the provided tooling support, and the competitiveness of its runtime performance is shown.
Declarative Cleaning of Inconsistencies in Information Extraction
TLDR
The concept of prioritized repairs, which has been recently proposed as an extension of the traditional database repairs to incorporate priorities among conflicting facts, is adopted and it is shown that this framework captures the popular cleaning policies, as well as the POSIX semantics for extraction through regular expressions.
Why Big Data Industrial Systems Need Rules and What We Can Do About It
TLDR
Using rules (together with techniques such as learning and crowdsourcing) is fundamental to building semantics-intensive Big Data systems, and it is increasingly critical to address rule management, given the tens of thousands of rules industrial systems often manage today in an ad-hoc fashion.
Intelligent Document Processing - Methods and Tools in the real world
TLDR
This paper looks specifically at the difficult areas of classifying, extracting information and subsequent integration into business processes with respect to forms and invoices and asks the question whether the objectives and timing of the commercial world and the progress of Computer Science are fully aligned.
An analytical study of information extraction from unstructured and multidimensional big data
TLDR
This research work addresses the competency and limitations of the existing IE techniques related to data pre-processing, data extraction and transformation, and representations for huge volumes of multidimensional unstructured data and presents a systematic literature review of state-of-the-art techniques for a variety of big data.
Migrating a Privacy-Safe Information Extraction System to a Software 2.0 Design
This paper presents a case study of migrating a privacy-safe information extraction system in production for Gmail from a traditional rule-based architecture to a machine-learned Software 2.0
The BigGrams: the semi-supervised information extraction system from HTML: an improvement in the wrapper induction
  • M. Mironczuk
  • Computer Science
    Knowledge and Information Systems
  • 2017
TLDR
It is established that the proposed taxonomy of seeds and the HTML tags level analysis, with appropriate pre-processing, improve information extraction results and the boosting mode of this system works well when certain requirements are met.

References

SHOWING 1-10 OF 26 REFERENCES
Rule-Based Information Extraction for Structured Data Acquisition using TextMarker
TLDR
A semi-automatic approach for structured data acquisition using a rule-based information extraction system that includes the TEXTMARKER system for information extraction and data acquisition from textual documents is presented.
Open Language Learning for Information Extraction
Open Information Extraction (IE) systems extract relational tuples from text, without requiring a pre-specified vocabulary, by identifying relation phrases and associated arguments in arbitrary
SystemT: A Declarative Information Extraction System
TLDR
This paper presents SystemT, a declarative IE system that addresses the challenges of scalability and usability to Information Extraction systems, and facilitates the development of high quality complex annotators by providing a highly expressive language and an advanced development environment.
Probabilistic declarative information extraction
TLDR
This work implements a state-of-the-art statistical IE model - Conditional Random Fields (CRF) - in the setting of a Probabilistic Database that treats statistical models as first-class data objects and shows that the Viterbi algorithm for CRF inference can be specified declaratively in recursive SQL.
SystemT: An Algebraic Approach to Declarative Information Extraction
TLDR
A rule-based IE system whose basic design removes the expressivity and performance limitations of current systems based on cascading grammars, SystemT uses a declarative rule language, AQL, and an optimizer that generates high-performance algebraic execution plans for AQL rules.
Building query optimizers for information extraction: the SQoUT project
TLDR
This paper discusses the SQoUT project, which focuses on processing structured queries over relations extracted from text databases, and shows how, in the extraction-based scenario, query processing can be decomposed into a sequence of basic steps.
Declarative Information Extraction Using Datalog with Embedded Extraction Predicates
TLDR
This paper argues that developing information extraction programs using Datalog with embedded procedural extraction predicates is a good way to proceed, and shows how optimizing such programs raises challenges specific to text data that cannot be accommodated in the current relational optimization framework.
Boosting Unsupervised Relation Extraction by Using NER
TLDR
This paper shows how the introduction of a simple rule-based NER can boost the performance of URES on a variety of relations, and compares its performance to the state-of-the-art KnowItAll system and to the performance of its pattern learning component.
Bootstrapped Named Entity Recognition for Product Attribute Extraction
TLDR
Focusing on listings from eBay's clothing and shoes categories, the bootstrapped NER system is able to identify new brands corresponding to spelling variants and typographical errors of the known brands, as well as identifying novel brands.
TextMarker : A Tool for Rule-Based Information Extraction
This paper presents TEXTMARKER, a powerful toolkit for rule-based information extraction. TEXTMARKER is based on UIMA and provides versatile information processing and advanced extraction techniques.