Adversarial Web Search

@article{Castillo2010AdversarialWS,
  title={Adversarial Web Search},
  author={C. Castillo and Brian D. Davison},
  journal={Found. Trends Inf. Retr.},
  year={2010},
  volume={4},
  pages={377--486}
}
Web search engines have become indispensable tools for finding content. As the popularity of the Web has increased, the efforts to exploit the Web for commercial, social, or political advantage have grown, making it harder for search engines to discriminate between truthful signals of content quality and deceptive attempts to game search engines' rankings. This problem is further complicated by the open nature of the Web, which allows anyone to write and publish anything, and by the fact that…
Network Manipulation (with application to Political issues)
We live in an increasingly interconnected world, one in which a growing number of people turn to the web to make important medical, financial and political decisions [1]. As more people use the Web’s…
Fighting against web spam: a novel propagation method based on click-through data
This work proposes a novel method based on click-through data analysis that iteratively propagates a spamicity score between queries and URLs, starting from a few seed pages/sites; the method is both efficient and effective in detecting Web spam.
Survey on web spam detection: principles and algorithms
This paper presents a systematic review of web spam detection techniques with a focus on algorithms and underlying principles, and categorizes all existing algorithms into three categories based on the type of information they use: content-based methods, link-based methods, and methods based on non-traditional data.
Analysis of Web Spam for Non-English Content: Toward More Effective Language-Based Classifiers
This study shows that, unlike for similar English pages, Google's anti-spamming techniques are ineffective against a high proportion of Arabic spam pages, and develops a browser plug-in that uses a language-based web spam classifier to warn users about spam pages after they click on a URL and to filter search engine results.
Using neural network to combat with spam pages
This paper proposes a semiautomatic method using a combinational ranking based on links between pages based on the rank of spam pages using a multilayered neural network trained by a genetic algorithm.
Detecting Promotion Campaigns in Query Auto Completion
This work finds that various queries containing certain promotion intents are submitted multiple times to search engines to promote their rankings in QAC, and proposes an effective promotion query detection framework, extended to promotion target detection, to identify the consistent promotion target which is the inherent goal of the promotion campaign.
Explicit web search result diversification
It is argued that an ambiguous query should be seen as representing not one, but multiple information needs, and a novel probabilistic framework for search result diversification, xQuAD, is proposed, which attains consistent and significant improvements in comparison to the most effective diversification approaches in the literature.
Ranking Robustness Under Adversarial Document Manipulations
This work formally shows that increased regularization of linear ranking functions increases ranking robustness, and conjectures that decreased variance of any ranking function results in increased robustness.
Significant factors for detecting malicious redirections
To design a more robust and reliable approach to spam detection, some new factors that facilitate redirection spam detection are presented, including the operational profile of each identified factor along with the criteria for its selection.
Significant factors for Detecting Redirection Spam
Redirection spam refers to a technique where a genuine search user is befooled and made to pass through a chain of redirections and ultimately presented with a compromised web page that may be an…

References

SHOWING 1-10 OF 284 REFERENCES
Challenges in running a commercial search engine
This talk will show that the world of algorithm and system design for commercial search engines can be described by two of Murphy's Laws: a) If anything can go wrong, it will, and b) if anything cannot go wrong, it will anyway.
Finding and fighting search engine spam
Web surfers rely on search engines to find information from the web. Search engine spam is the attempt to deceive search engine ranking algorithms and is considered by experts from well-known search…
Removing web spam links from search engine results
A classification technique is developed that uses important features to successfully distinguish spam sites from legitimate entries and the threat posed by malicious web sites can be mitigated, reducing the risk for users to get infected by malicious code that spreads via drive-by attacks.
Nullification test collections for web spam and SEO
A need is identified for an adversarial IR collection which is not domain-restricted and which is supported by a set of appropriate query sets and (optimistically) user-behaviour data, and the term nullification is introduced.
Adversarial information retrieval on the web (AIRWeb 2006)
The attraction of hundreds of millions of web searches per day provides significant incentive for many content providers to do whatever is necessary to rank highly in search engine results, while…
Web Spam, Propaganda and Trust
This paper analyzes the influence that web spam has on the evolution of the search engines and identifies the strong relationship of spamming methods to propagandistic techniques in society, which can lead to browser-level web spam filters that work in synergy with the powerful search engines to deliver personalized, trusted web results.
Thwarting the Nigritude Ultramarine: Learning to Identify Link Spam
The problem of identifying link spam is formulated, and a methodology for generating training data is discussed; experiments reveal the effectiveness of classes of intrinsic and relational attributes and shed light on the robustness of classifiers against obfuscation of attributes by an adversarial spammer.
Cleaning search results using term distance features
The method is able to detect many web pages generated by utilizing techniques such as dumping, weaving, or phrase stitching, which are spamming techniques designed to achieve high rankings while still exhibiting many of the individual word frequency (and even bi-gram) properties of natural human text.
Applications of web link analysis
TrustRank is presented, which combines input from human experts with link analysis to semi-automatically separate reputable, good pages from spam, and the experimental results indicate that the proposed spam detection and web categorization techniques work well on actual web data.
A large-scale study of automated web search traffic
This paper investigates automated traffic in the query stream of a large search engine provider, and develops many different features that distinguish between queries generated by people searching for information, and those generated by automated processes.