Brian D. Davison

With the increasing importance of search in guiding today's web traffic, more and more effort has been spent on creating search engine spam. Since link analysis is one of the most important factors in the ranking systems of current commercial search engines, new kinds of spam that target links have appeared. Building link farms is one technique that can deteriorate …
People display regularities in almost everything they do. This paper proposes characteristics of an idealized algorithm that, when applied to sequences of user actions, would allow a user interface to adapt over time to an individual’s pattern of use. We describe a simple predictive method with these characteristics and show its predictive accuracy on a …
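The abstract does not spell out the predictive method, so what follows is only a minimal sketch under a stated assumption. It uses a first-order model that keeps, for each previous action, a weight for every action seen to follow it. Old weights decay geometrically, so the ranking adapts to recent habits. The class name `DecayingMarkovPredictor` and the decay parameter `alpha` are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


class DecayingMarkovPredictor:
    """Predicts a user's next action from the previous one.

    Assumption (not from the paper): a first-order model whose per-context
    weights decay geometrically, so recent behavior outweighs old behavior.
    """

    def __init__(self, alpha: float = 0.8):
        self.alpha = alpha  # hypothetical decay rate; higher = slower adaptation
        self.table = defaultdict(dict)  # previous action -> {next action: weight}
        self.prev = None

    def observe(self, action: str) -> None:
        if self.prev is not None:
            weights = self.table[self.prev]
            # Shrink every existing weight, then give the freed share to the observed action.
            for a in weights:
                weights[a] *= self.alpha
            weights[action] = weights.get(action, 0.0) + (1.0 - self.alpha)
        self.prev = action

    def predict(self, k: int = 1) -> list:
        """Return up to k actions ranked by weight, given the last observed action."""
        if self.prev is None or self.prev not in self.table:
            return []
        weights = self.table[self.prev]
        return sorted(weights, key=weights.get, reverse=True)[:k]
```

With `alpha` closer to 1 the model forgets more slowly. Closer to 0, it tracks mainly the most recent transitions.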
Most web pages are linked to others with related content. This idea, combined with a second one, that text in, and possibly around, HTML anchors describes the pages to which they point, is the foundation for a usable World-Wide Web. In this paper, we examine to what extent these ideas hold by empirically testing whether topical locality …
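One way to make a test of topical locality concrete is to compare the average textual similarity of linked page pairs with that of randomly paired pages. The sketch below does this with plain bag-of-words cosine similarity. That measure, the helper names, and the toy data are assumptions for illustration only, because the abstract does not say which similarity measure the paper uses.

```python
import math
import random
from collections import Counter


def cosine(a: str, b: str) -> float:
    """Bag-of-words cosine similarity (assumed measure, raw term counts)."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0


def mean_similarity(pages: dict, pairs: list) -> float:
    scores = [cosine(pages[src], pages[dst]) for src, dst in pairs]
    return sum(scores) / len(scores) if scores else 0.0


def topical_locality(pages: dict, links: list, seed: int = 0) -> tuple:
    """Return (mean similarity of linked pairs, mean similarity of random pairs)."""
    rng = random.Random(seed)
    urls = sorted(pages)
    # Draw as many random pairs as there are links, as a baseline.
    random_pairs = [tuple(rng.sample(urls, 2)) for _ in links]
    return mean_similarity(pages, links), mean_similarity(pages, random_pairs)
```

If linked pairs consistently score higher than random pairs on a real crawl, that would be evidence for topical locality.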
Traditional web link-based ranking schemes use a single score to measure a page's authority without regard to the community from which that authority is derived. As a result, a resource that is highly popular for one topic may dominate the results of another topic in which it is less authoritative. To address this problem, we suggest calculating a score …
Classification of Web page content is essential to many tasks in Web information retrieval, such as maintaining Web directories and focused crawling. The uncontrolled nature of Web content presents additional challenges to Web page classification as compared to traditional text classification, but the interconnected nature of hypertext also provides features …
Web spam is behavior that attempts to deceive search engine ranking algorithms. TrustRank is a recent algorithm that can combat web spam. However, TrustRank is vulnerable in that the seed set it uses may not be representative enough to cover the different topics on the Web well. Also, for a given seed set, TrustRank has a bias …
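TrustRank, as it is generally described, runs a PageRank-style computation in which random jumps land only on a seed set of pages judged trustworthy, so trust flows outward along links from the seeds. Below is a minimal sketch under that reading. The function name, the damping factor of 0.85, the fixed iteration count, and the toy graph are assumptions. Pages with no out-links simply leak their trust, which is only one of several possible conventions.

```python
def trustrank(out_links: dict[str, list[str]], seeds: set[str],
              damping: float = 0.85, iters: int = 50) -> dict[str, float]:
    """Biased PageRank sketch: random jumps land only on trusted seed pages."""
    if not seeds:
        raise ValueError("seed set must be non-empty")
    nodes = set(out_links) | {v for targets in out_links.values() for v in targets}
    # Jump distribution: uniform over the seeds, zero everywhere else.
    jump = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    trust = dict(jump)
    for _ in range(iters):
        nxt = {n: (1.0 - damping) * jump[n] for n in nodes}
        for page, targets in out_links.items():
            if not targets:
                continue  # dangling page: its trust is not redistributed (assumed convention)
            share = damping * trust[page] / len(targets)
            for target in targets:
                nxt[target] += share
        trust = nxt
    return trust
```

In the toy graph used to check this sketch, pages that cannot be reached from any seed end up with zero trust. This is one way to see why a seed set that misses some topics is a weakness.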
Cloaking and redirection are two possible search engine spamming techniques. In order to understand cloaking and redirection on the Web, we downloaded two sets of Web pages while mimicking both a popular Web crawler and a common Web browser. We estimate that 3% of the first data set and 9% of the second data set use cloaking of some kind. By checking …
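Below is a simple heuristic in the spirit of the crawler-versus-browser comparison, offered as a sketch under assumptions. Each page is fetched twice as a crawler and twice as a browser. It is flagged only if the crawler and browser copies differ by more terms than copies fetched by the same agent differ from each other, so that ordinary dynamic content (dates, rotating ads) is not mistaken for cloaking. The function names, the term-set comparison, and the `margin` parameter are hypothetical. Fetching is left out, and the four copies are passed in as text.

```python
def term_set(page_text: str) -> set:
    """Set of lowercase whitespace-separated terms (assumed representation)."""
    return set(page_text.lower().split())


def looks_cloaked(crawler_a: str, crawler_b: str,
                  browser_a: str, browser_b: str, margin: int = 0) -> bool:
    """Flag a page whose crawler and browser views differ beyond normal churn.

    Hypothetical heuristic: the baseline is the largest same-agent difference;
    the cross-agent difference is the smaller of two crawler/browser comparisons.
    """
    c1, c2, b1, b2 = map(term_set, (crawler_a, crawler_b, browser_a, browser_b))
    baseline = max(len(c1 ^ c2), len(b1 ^ b2))  # symmetric difference = changed terms
    cross = min(len(c1 ^ b1), len(c2 ^ b2))
    return cross > baseline + margin
```

Taking the maximum for the baseline and the minimum for the cross-agent difference makes the test conservative: a page is flagged only when even the closest crawler/browser pair differs more than the noisiest same-agent pair.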
The use of link analysis and page popularity in search engines to improve query result rankings has grown recently. Since the number of such links contributes to the value of a document in these calculations, we wish to recognize and eliminate nepotistic links: links between pages that are present for reasons other than merit. This paper explores some of …
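A common baseline for spotting nepotistic links is to discard any link whose source and target are on the same host, since such links often reflect site navigation rather than independent endorsement. It is used here purely as an illustrative assumption and is not the paper's method. The helper names and the choice to treat a leading `www.` as insignificant are hypothetical.

```python
from urllib.parse import urlsplit


def host_key(url: str) -> str:
    """Normalized host for comparison; ignoring a leading 'www.' is an assumption."""
    host = (urlsplit(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def drop_same_host_links(links: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Keep only links whose source and target are on different hosts."""
    return [(src, dst) for src, dst in links if host_key(src) != host_key(dst)]
```

This rule is deliberately crude. It removes legitimate internal links and misses nepotism between cooperating hosts, which is why more careful recognition methods are needed.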