Notice of Violation of IEEE Publication Principles: Study to eliminating noisy information in Web pages based on data mining

@article{Hu2010NoticeOV,
  title={Notice of Violation of IEEE Publication Principles: Study to eliminating noisy information in Web pages based on data mining},
  author={Guohua Hu and Qingshan Zhao},
  journal={2010 Sixth International Conference on Natural Computation},
  year={2010},
  volume={2},
  pages={660--663}
}
In this paper, we propose a noise elimination technique based on the following observation: In a given Web site, noisy blocks usually share some common contents and presentation styles, while the main content blocks of the pages are often diverse in their actual contents and/or presentation styles. Based on this observation, we propose a tree structure, called Style Tree, to capture the common presentation styles and the actual contents of the pages in a given Web site. By sampling the pages of…
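The observation in the abstract can be illustrated with a minimal sketch. It is not the paper's Style Tree: it swaps in a flat frequency count over (tag path, text) pairs, which only approximates the idea that blocks repeated across a site's pages are likely noise. The names `BlockCollector`, `page_blocks`, `noisy_blocks`, `main_content`, and the `threshold` value of 0.8 are all assumptions made for illustration.

```python
from collections import Counter
from html.parser import HTMLParser


class BlockCollector(HTMLParser):
    """Collect a (tag path, text) pair for every non-empty text node."""

    def __init__(self):
        super().__init__()
        self.path = []    # stack of currently open tags (the "style" of a block)
        self.blocks = []  # (tag path, text) pairs in document order

    def handle_starttag(self, tag, attrs):
        self.path.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating unclosed tags.
        if tag in self.path:
            while self.path and self.path.pop() != tag:
                pass

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.blocks.append(("/".join(self.path), text))


def page_blocks(html):
    """Return the (tag path, text) blocks of one page, in document order."""
    collector = BlockCollector()
    collector.feed(html)
    collector.close()
    return collector.blocks


def noisy_blocks(pages, threshold=0.8):
    """Blocks present in at least `threshold` of the sampled pages.

    The threshold is a hypothetical parameter, not a value from the paper.
    """
    counts = Counter()
    for html in pages:
        counts.update(set(page_blocks(html)))  # count each block once per page
    cutoff = threshold * len(pages)
    return {block for block, n in counts.items() if n >= cutoff}


def main_content(html, noise):
    """Text of the blocks on a page that were not flagged as noise."""
    return [text for path, text in page_blocks(html) if (path, text) not in noise]
```

Given a sample of pages from one site, `noisy_blocks` flags navigation links or footers that recur with the same tag path and text, and `main_content` returns what remains on a single page. A tree-based representation, as the abstract describes, would additionally keep the nesting of styles rather than flattening each block to a path string.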
