• Corpus ID: 14954152

WEWRA : An algorithm for Wrapper Verification

@inproceedings{Tsourakakis2009WEWRAA,
  title={WEWRA : An algorithm for Wrapper Verification},
  author={Charalampos E. Tsourakakis and Georgios Paliouras},
  year={2009}
}
Web wrappers play an important role in extracting information from distributed web sources and subsequently in the integration of heterogeneous data. Changes in the layout of web sources typically break the wrapper, leading to erroneous extraction of information. Monitoring and repairing broken wrappers is an important hurdle for data integration, since it is an expensive and painful procedure. In this paper we present VEWRA, a new approach to wrapper verification, which improves the successful… 
MAVE: Multilevel wrApper Verification systEm
TLDR
MAVE is a novel multilevel wrapper verification system based on one-class classification techniques that overcomes previous weaknesses; experimental results show that the proposal outperforms current solutions in accuracy.

References

Wrapper Maintenance: A Machine Learning Approach
TLDR
An efficient algorithm is presented that learns structural information about data from positive examples alone that can be used for two wrapper maintenance applications: wrapper verification and reinduction.
Regression testing for wrapper maintenance
TLDR
RAPTURE is a fully-implemented, domain-independent verification algorithm that uses well-motivated heuristics to compute the similarity between a wrapper's expected and observed output.
A hierarchical approach to wrapper induction
TLDR
This work introduces an inductive algorithm, STALKER, that generates high accuracy extraction rules based on user-labeled training examples that can handle information sources that could not be wrapped by existing techniques.
Hierarchical Wrapper Induction for Semistructured Information Sources
TLDR
This work introduces an inductive algorithm, STALKER, that generates high accuracy extraction rules based on user-labeled training examples that can wrap information sources that could not be wrapped by existing inductive techniques.
Wrapper verification
TLDR
RAPTURE is introduced, a fully-implemented, domain-independent wrapper verification algorithm that computes a probabilistic similarity measure between a wrapper's expected and observed output, where similarity is defined in terms of simple numeric features of the extracted strings.
Wrapper Induction for Information Extraction
TLDR
This work introduces wrapper induction, a method for automatically constructing wrappers, and identifies HLRT, a wrapper class that is efficiently learnable, yet expressive enough to handle 48% of a recently surveyed sample of Internet resources.
Boosted Wrapper Induction
TLDR
This work describes an algorithm that learns simple, low-coverage wrapper-like extraction patterns, which it then applies to conventional information extraction problems using boosting, resulting in BWI, a trainable information extraction system with a strong precision bias and F1 performance better than state-of-the-art techniques in many domains.
Modeling Web Sources for Information Integration
TLDR
This work has developed methods for mapping web sources into a simple, uniform representation that makes it efficient to integrate multiple sources and makes it easy to maintain these agents and incorporate new sources as they become available.
Mapping Maintenance for Data Integration Systems
TLDR
MAVERIC is described, an automatic solution to detecting broken mappings that combines a set of computationally inexpensive modules called sensors, which capture salient characteristics of data sources, and develops three novel improvements: perturbation, multi-source training, and filtering to reduce the number of false alarms.
Web Wrapper Validation
TLDR
An Adjacency-Weight method is presented for validating web wrappers, for use in the web wrapper extraction process or in a wrapper self-maintenance mechanism.