The Automatic Extraction of Web Information Based on Regular Expression

  • J. Li, Guangyu Jiang, Aijun Xu, Yunzhen Wang
  • Published 2017
  • Computer Science
  • J. Softw.
  • Building on a search engine, this paper constructs a model for Web information retrieval matching and structured extraction, and implements an algorithm that locates and automatically extracts Baidu news information across multiple result pages. The authors derive a standard mathematical expression for the result-page URLs by analyzing the URLs of the search results, and design regular expressions for the key tags by analyzing the DOM tree structure of the web pages. Finally, the method of multi-page location retrieval and structured extraction based on…
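
The two ideas in the abstract, a closed-form expression for result-page URLs and regular expressions for key tags, can be illustrated with a minimal Python sketch. This is not the authors' implementation. The endpoint `search.example.com`, the parameter names `word` and `pn`, the page size of 20, and the `<h3 class="c-title">` markup are all hypothetical, chosen only to make the example self-contained.

```python
import re
from typing import Dict, List
from urllib.parse import urlencode

# Hypothetical endpoint, parameter names and page size (illustration only).
BASE_URL = "https://search.example.com/news"
PAGE_SIZE = 20


def page_url(keyword: str, page: int) -> str:
    """Return the URL of result page `page` (0-based).

    The offset parameter follows the expression pn = page * PAGE_SIZE.
    """
    query = urlencode({"word": keyword, "pn": page * PAGE_SIZE})
    return BASE_URL + "?" + query


# Hypothetical markup: each hit is an <h3 class="c-title"> wrapping one link.
ITEM_RE = re.compile(
    r'<h3 class="c-title">\s*'
    r'<a href="(?P<url>[^"]+)"[^>]*>(?P<title>.*?)</a>\s*</h3>',
    re.S,
)
# Strips any inline tags (such as <em>) left inside a matched title.
TAG_RE = re.compile(r"<[^>]+>")


def extract_items(html: str) -> List[Dict[str, str]]:
    """Extract title/url records from one result page with the key-tag regex."""
    items = []
    for match in ITEM_RE.finditer(html):
        title = TAG_RE.sub("", match.group("title")).strip()
        items.append({"title": title, "url": match.group("url")})
    return items


# Inline sample page, so the sketch runs without network access.
SAMPLE = """
<div class="result"><h3 class="c-title">
  <a href="http://example.com/a" target="_blank">First <em>news</em> item</a>
</h3></div>
<div class="result"><h3 class="c-title"><a href="http://example.com/b">Second item</a></h3></div>
"""

if __name__ == "__main__":
    print(page_url("web crawler", 2))
    for item in extract_items(SAMPLE):
        print(item)
```

A real crawler would loop `page_url` over successive page indices, fetch each page, and pass the HTML to `extract_items`. Regular expressions of this kind are brittle when the markup changes, so the pattern has to be derived again from the DOM structure of whichever site is targeted.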
