Web documents whose content is partially or completely duplicated are easy to find on the Internet today. Not only do these documents create redundant information on the Web, which makes filtering out unique information take longer and consumes additional storage space, but they also degrade the efficiency of Web information retrieval. In this paper, we …
Similar Web pages are easy to find on the Internet. This redundancy severely slows down Internet applications such as the crawling module of a search engine, and it can waste storage during indexing. In this paper, we propose a content-based approach for detecting duplicate webpages. The algorithm consists of three parts: i) …
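The abstract above is truncated before its three parts are described, so the following is NOT that paper's algorithm. It is a minimal sketch of one common content-based technique, word shingling compared with Jaccard similarity, that illustrates what "content-based" duplicate detection can mean. The function names, the shingle width `w=3`, and the similarity threshold `0.5` are hypothetical choices for illustration.

```python
import re


def shingles(text: str, w: int = 3) -> set[tuple[str, ...]]:
    """Return the set of contiguous w-word shingles of a document (w is an assumed default)."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < w:
        # Very short documents become a single (possibly empty) shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + w]) for i in range(len(words) - w + 1)}


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity |A ∩ B| / |A ∪ B|; two empty sets are treated as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def near_duplicates(docs: dict[str, str], threshold: float = 0.5,
                    w: int = 3) -> list[tuple[str, str, float]]:
    """Compare every pair of documents and report pairs at or above the (hypothetical) threshold."""
    sigs = {name: shingles(text, w) for name, text in docs.items()}
    names = sorted(sigs)
    pairs = []
    for i, x in enumerate(names):
        for y in names[i + 1:]:
            score = jaccard(sigs[x], sigs[y])
            if score >= threshold:
                pairs.append((x, y, score))
    return pairs


if __name__ == "__main__":
    docs = {
        "a": "the quick brown fox jumps over the lazy dog",
        "b": "the quick brown fox jumps over the lazy cat",
        "c": "completely different content here today",
    }
    print(near_duplicates(docs))  # [('a', 'b', 0.75)]
```

Pages "a" and "b" share 6 of their 8 distinct 3-word shingles, so they score 0.75 and are flagged. The all-pairs loop is quadratic in the number of pages; a crawler working at Web scale would typically replace it with a sketching scheme such as MinHash so that similar pages can be found without comparing every pair.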
GRADUATE COMMITTEE APPROVAL of a thesis submitted by Rajiv Yerra. This thesis has been read by each member of the following graduate committee and, by majority vote, has been found to be satisfactory. As chair of the candidate's graduate committee, I have read the thesis of Rajiv Yerra in its final form and have found that (1) its format, citations, and …