Automatic information extraction from the Web


The Web is a valuable repository of information. However, its size and lack of structure hinder the search for and extraction of knowledge. In this paper, we propose an automatic and autonomous methodology for retrieving information about a given domain from the Web and representing it in a standard way. It is based on the intensive use of a publicly available search engine and on the analysis of a large number of web resources to assess the relevance of information. Polysemy detection and the discovery of aliases and synonyms for the evaluated domain are also considered. The results can ease user access to Web resources or enable computer processing of the data.

8 Figures and Tables

Cite this paper

@inproceedings{Snchez2004AutomaticIE, title={Automatic information extraction from the Web}, author={David S{\'a}nchez and Antonio Moreno}, year={2004} }