• Corpus ID: 14954152

VEWRA: An algorithm for Wrapper Verification

Charalampos E. Tsourakakis and Georgios Paliouras
Web wrappers play an important role in extracting information from distributed web sources and subsequently in the integration of heterogeneous data. Changes in the layout of web sources typically break the wrapper, leading to erroneous extraction of information. Monitoring and repairing broken wrappers is an important hurdle for data integration, since it is an expensive and painful procedure. In this paper we present VEWRA, a new approach to wrapper verification, which improves the successful…
MAVE: Multilevel wrApper Verification systEm
MAVE is a novel multilevel wrapper verification system based on one-class classification techniques, designed to overcome previous weaknesses; experimental results show that the proposal outperforms current solutions in accuracy.


Wrapper Maintenance: A Machine Learning Approach
An efficient algorithm is presented that learns structural information about data from positive examples alone; this information can be used for two wrapper maintenance applications: wrapper verification and reinduction.
Regression testing for wrapper maintenance
RAPTURE is a fully-implemented, domain-independent verification algorithm that uses well-motivated heuristics to compute the similarity between a wrapper's expected and observed output.
A hierarchical approach to wrapper induction
This work introduces an inductive algorithm, STALKER, that generates high accuracy extraction rules based on user-labeled training examples that can handle information sources that could not be wrapped by existing techniques.
Hierarchical Wrapper Induction for Semistructured Information Sources
This work introduces an inductive algorithm, STALKER, that generates high accuracy extraction rules based on user-labeled training examples that can wrap information sources that could not be wrapped by existing inductive techniques.
Wrapper verification
RAPTURE is introduced, a fully-implemented, domain-independent wrapper verification algorithm that computes a probabilistic similarity measure between a wrapper's expected and observed output, where similarity is defined in terms of simple numeric features of the extracted strings.
Wrapper Induction for Information Extraction
This work introduces wrapper induction, a method for automatically constructing wrappers, and identifies HLRT, a wrapper class that is efficiently learnable, yet expressive enough to handle 48% of a recently surveyed sample of Internet resources.
Boosted Wrapper Induction
This work describes an algorithm that learns simple, low-coverage wrapper-like extraction patterns, which it then applies to conventional information extraction problems using boosting, resulting in BWI, a trainable information extraction system with a strong precision bias and F1 performance better than state-of-the-art techniques in many domains.
Modeling Web Sources for Information Integration
This work has developed methods for mapping web sources into a simple, uniform representation that makes it efficient to integrate multiple sources and makes it easy to maintain these agents and incorporate new sources as they become available.
Mapping Maintenance for Data Integration Systems
MAVERIC is described, an automatic solution to detecting broken mappings that combines a set of computationally inexpensive modules called sensors, which capture salient characteristics of data sources, and develops three novel improvements: perturbation, multi-source training, and filtering to reduce the number of false alarms.
Web Wrapper Validation
An Adjacency-Weight method is presented for validating web wrappers, for use in the web wrapper extraction process or in a wrapper self-maintenance mechanism.