Data Record Extraction using Tag Tree Comparison

@inproceedings{Ansari2015DataRE,
  title={Data Record Extraction using Tag Tree Comparison},
  author={A. Ansari},
  year={2015}
}
This paper presents a robust unsupervised approach for extracting data records from dynamic web pages using tag tree comparison. Extraction proceeds in the following sequence of steps. We first download the web pages of interest to our system. Next, we construct DOM trees for those pages using a parser. We then compare two or more web pages to eliminate noisy, unwanted data such as headers, menu bars, navigation bars, and advertisements, and find the region of…
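
The steps named in the abstract (parse pages into tag trees, then compare pages to discard shared noise) can be illustrated with a minimal sketch. This is not the paper's algorithm: the abstract is truncated before the comparison rule is described, so the sketch assumes a deliberately simple rule, namely that a subtree appearing identically (same tags and text) in both pages is template noise. The names `Node`, `TreeBuilder`, `build_tree`, `subtree_signatures`, and `candidate_regions` are hypothetical, and the download step is omitted; pages are passed in as strings.

```python
from html.parser import HTMLParser

# Tags that never have a closing tag, so the builder must not descend into them.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class Node:
    """One element of a simplified tag tree (attributes are ignored)."""

    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.children = []
        self.text = ""

    def signature(self):
        # Serialize tag structure plus stripped text; equal signatures
        # mean the two subtrees are treated as identical.
        inner = "".join(child.signature() for child in self.children)
        return f"<{self.tag}>{self.text.strip()}{inner}</{self.tag}>"


class TreeBuilder(HTMLParser):
    """Builds a Node tree from HTML using only the standard library."""

    def __init__(self):
        super().__init__()
        self.root = Node("root")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:
            self.current = node

    def handle_endtag(self, tag):
        # Climb to the nearest open ancestor with this tag, if there is one.
        node = self.current
        while node is not None and node.tag != tag:
            node = node.parent
        if node is not None and node.parent is not None:
            self.current = node.parent

    def handle_data(self, data):
        self.current.text += data


def build_tree(html):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


def subtree_signatures(node):
    """Collect the signature of every subtree rooted at or below node."""
    sigs = {node.signature()}
    for child in node.children:
        sigs |= subtree_signatures(child)
    return sigs


def candidate_regions(page, other_page):
    """Return topmost subtrees of `page` that share nothing with `other_page`."""
    shared = subtree_signatures(other_page)

    def has_shared(node):
        return node.signature() in shared or any(has_shared(c) for c in node.children)

    def walk(node):
        if node.signature() in shared:
            return []          # identical in both pages: assumed noise
        if not has_shared(node):
            return [node]      # nothing below is shared: candidate data region
        regions = []
        for child in node.children:
            regions.extend(walk(child))
        return regions

    return walk(page)


page_a = ("<html><body><div id='nav'><a>Home</a></div>"
          "<div><p>Record A1</p><p>Record A2</p></div>"
          "<div>Footer</div></body></html>")
page_b = ("<html><body><div id='nav'><a>Home</a></div>"
          "<div><p>Record B1</p><p>Record B2</p></div>"
          "<div>Footer</div></body></html>")

regions = candidate_regions(build_tree(page_a), build_tree(page_b))
records = [child.text for child in regions[0].children]
```

In this toy input the navigation and footer blocks are identical across the two pages and are discarded, leaving the middle `div` as the only candidate region. Exact signature matching is a simplifying assumption; real pages usually need a tolerant tree similarity measure rather than strict equality.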

References

Publications referenced by this paper.

Hidden Web Data Extraction Using Dynamic Rule Generation

  • A. K. Sharma
  • International Journal on Computer Science…
  • 2011

A Robust Approach of Automatic Web Data Record Extraction

  • Y. Dong, Q. Li
  • Journal of Computer Information Systems
  • 2009

Information extraction from HTML documents by structural matching

  • T. Breuel
  • U.S. Patent Application
  • 2003
