Corpus ID: 10071285

SpamRank -- Fully Automatic Link Spam Detection

@inproceedings{Benczr2005SpamRankF,
  title={SpamRank -- Fully Automatic Link Spam Detection},
  author={Andr{\'a}s A. Bencz{\'u}r and K{\'a}roly Csalog{\'a}ny and Tam{\'a}s Sarl{\'o}s and M{\'a}t{\'e} Uher},
  booktitle={AIRWeb},
  year={2005}
}
Spammers intend to increase the PageRank of certain spam pages by creating a large number of links pointing to them. We propose a novel method based on the concept of personalized PageRank that detects pages with an undeserved high PageRank value without the need of any kind of white or blacklists or other means of human intervention. We assume that spammed pages have a biased distribution of pages that contribute to the undeserved high PageRank value. We define SpamRank by penalizing pages…
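The abstract builds on personalized PageRank and on the idea of pages that "contribute" to another page's PageRank. The block below is a minimal sketch of that background idea only, not the authors' SpamRank algorithm: the penalty step is cut off in the abstract above and is not reproduced here. The function name `personalized_pagerank`, the toy graph, and the damping value 0.85 are assumptions made for illustration. It relies on a standard linearity property: with a uniform teleport vector over N pages, the PageRank of a page equals the sum over all pages u of (personalized PageRank from u) / N, so each term can be read as the contribution of u.

```python
def personalized_pagerank(out_links, teleport, damping=0.85, iterations=100):
    """Power iteration for PageRank with a custom teleport distribution.

    out_links: dict mapping every node to the list of nodes it links to
               (all link targets must also be keys of the dict).
    teleport:  dict mapping nodes to teleport probabilities.
    Assumption of this sketch: every node has at least one out-link.
    A node without out-links would simply lose its mass here.
    """
    nodes = list(out_links)
    rank = {v: teleport.get(v, 0.0) for v in nodes}
    for _ in range(iterations):
        # Teleport step: restart mass goes to the personalization vector.
        nxt = {v: (1.0 - damping) * teleport.get(v, 0.0) for v in nodes}
        # Link-following step: each node splits its rank among its out-links.
        for u in nodes:
            targets = out_links[u]
            if targets:
                share = damping * rank[u] / len(targets)
                for v in targets:
                    nxt[v] += share
        rank = nxt
    return rank


# Hypothetical toy graph: a, b, c all point at target page t;
# x and y form a separate pair that never links to t.
graph = {
    "a": ["t"], "b": ["t"], "c": ["t"],
    "t": ["a"],
    "x": ["y"], "y": ["x"],
}
n = len(graph)

# Ordinary PageRank uses a uniform teleport vector.
uniform = {v: 1.0 / n for v in graph}
pagerank = personalized_pagerank(graph, uniform)

# Contribution of each page u to t: PageRank personalized on u alone,
# scaled by the 1/n weight that u has in the uniform teleport vector.
contributions = {
    u: personalized_pagerank(graph, {u: 1.0})["t"] / n for u in graph
}

for u, c in sorted(contributions.items()):
    print(f"{u} contributes {c:.4f} to PageRank(t) = {pagerank['t']:.4f}")
```

In this toy graph the contributions of `x` and `y` are zero because no path leads from them to `t`. Inspecting how such contributions are distributed over the supporting pages is the kind of signal the abstract refers to when it speaks of a "biased distribution of pages that contribute".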
Link spam detection based on mass estimation
TLDR
This paper introduces the concept of spam mass, a measure of the impact of link spamming on a page's ranking, and discusses how to estimate spam mass and how the estimates can help identify pages that benefit significantly from link spamming.
PageRank optimization applied to spam detection
  • Olivier Fercoq
  • Computer Science, Mathematics
  • 2012 6th International Conference on Network Games, Control and Optimization (NetGCooP)
  • 2012
TLDR
This paper presents MaxRank, a new link spam detection and PageRank demotion algorithm that outperforms both TrustRank and AntiTrustRank for spam and nonspam page detection, and shows that the bias vector of the associated ergodic control problem is a measure of the “spamicity” of each page that can be used to detect spam pages.
Hybrid spamicity score approach to web spam detection
  • S. P. Algur, N. T. Pendari
  • Computer Science
  • International Conference on Pattern Recognition, Informatics and Medical Engineering (PRIME-2012)
  • 2012
TLDR
In the proposed system, link and content spam techniques are used to determine the spamicity score of a web page; a threshold set by empirical analysis classifies the web page as spam or non-spam.
Mining Page Farms and Its Application in Link Spam Detection
TLDR
This thesis proposes the concept of link spamicity based on page farms to evaluate the degree of a Web page being link spam and examines the effectiveness of the spamicity-based link spam detection methods using a newly available real data set of spam pages.
Spam detection with a content-based random-walk algorithm
TLDR
This work introduces the novelty of taking into account the content of the web pages to characterize the web graph and to obtain an a priori estimation of the spam likelihood of the web pages.
Using rank propagation and Probabilistic counting for Link-Based Spam Detection
TLDR
This paper proposes spam detection techniques that consider only the link structure of the Web, regardless of page contents, and computes statistics of the links in the vicinity of every Web page by applying rank propagation and probabilistic counting over the Web graph.
Link spam target detection using page farms
TLDR
This paper develops novel and effective detection methods for link spam target pages using page farms; the methods outperform state-of-the-art methods such as SpamRank and SpamMass in both precision and recall.
A Spamicity Approach to Web Spam Detection
TLDR
This paper introduces the notion of spamicity to measure how likely a page is spam and proposes efficient online link spam and term spam detection methods using spamicity, a more flexible and user-controlling measure than the traditional supervised classification methods.
Combining Textual Content and Hyperlinks in Web Spam Detection
TLDR
This work introduces the novelty of taking into account the content of the web pages to characterize the web graph and to obtain an a priori estimation of the spam likelihood of the web pages.
Detecting Link Hijacking by Web Spammers
TLDR
This paper proposes a link analysis technique for finding link-hijacked sites using modified PageRank algorithms and performs experiments on a large-scale Japanese Web archive to evaluate the accuracy of the method.

References

SHOWING 1-10 OF 44 REFERENCES
Combating Web Spam with TrustRank
TLDR
This paper proposes techniques to semi-automatically separate reputable, good pages from spam, and shows that they can effectively filter out spam from a significant fraction of the web, based on a good seed set of less than 200 sites.
Spam, damn spam, and statistics: using statistical analysis to locate spam web pages
TLDR
This paper proposes that some spam web pages can be identified through statistical analysis, and examines a variety of properties, including linkage structure, page content, and page evolution, and finds that outliers in the statistical distribution of these properties are highly likely to be caused by web spam.
Towards Scaling Fully Personalized PageRank
TLDR
This paper achieves full personalization by a novel algorithm that computes a compact database of simulated random walks; this database can serve arbitrary personal choices of small subsets of web pages.
Inside PageRank
TLDR
A circuit analysis is introduced that makes it possible to understand the distribution of the page score, the way different Web communities interact with each other, the role of dangling pages (pages with no outlinks), and the secrets for promotion of Web pages.
Web Spam Taxonomy
TLDR
This paper presents a comprehensive taxonomy of current spamming techniques, which it is believed can help in developing appropriate countermeasures.
Ranking the web frontier
TLDR
This paper analyzes features of the rapidly growing "frontier" of the web, namely the part of the web that crawlers are unable to cover for one reason or another, and suggests ways to improve the quality of ranking by modeling the growing presence of "link rot" on the web as more sites and pages fall out of maintenance.
The PageRank Citation Ranking : Bringing Order to the Web
TLDR
This paper describes PageRank, a method for rating Web pages objectively and mechanically, effectively measuring the human interest and attention devoted to them, and shows how to efficiently compute PageRank for large numbers of pages.
PageRank, HITS and a unified framework for link analysis
TLDR
This work proves that rankings produced by PageRank and HITS are both highly correlated with the ranking by in-degree and out-degree.
Where to Start Browsing the Web?
TLDR
This work shows how link-based ranking algorithms can assist in qualifying pages as start nodes by calculating dominations and connectivity decay, and compares and analyzes the proposed ranking algorithms, which work solely from the structure of the Web without the need for human interaction.
Downweighting tightly knit communities in world wide web ranking.
TLDR
Two new algorithms for using World Wide Web link structures to determine authority values of web pages from search queries are proposed, one based on Similarity Downweighting, and the other Sequential Clustering, which uses an empirical Bayes approach.