Wiki-Watchdog: Anomaly Detection in Wikipedia Through a Distributional Lens

  • Chrisil Arackaparambil, Guanhua Yan
  • Published 22 August 2011
  • Computer Science
  • 2011 IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology
Wikipedia has become a standard source of reference online, and many people (some unknowingly) now trust this corpus of knowledge as an authority to fulfil their information requirements. In doing so they task the human contributors of Wikipedia with maintaining the accuracy of articles, a job that these contributors have been performing admirably. We study the problem of monitoring the Wikipedia corpus with the goal of automated, online anomaly detection. We present Wiki-watchdog, an…


Detecting Change in News Feeds Using a Context Based Graph

A graph-cut model that leverages context, content, and sentiment information is proposed and empirically evaluated, and results that improve upon baseline methods in terms of precision, recall, F1, and accuracy are presented.

Anomaly detection in network streams through a distributional lens

This thesis provides a unified distribution-based methodology for online detection of anomalies in network traffic streams, which regards the traffic stream as a time series of distributions (histograms), and monitors metrics of distributions in the time series.
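As a rough illustration of treating a stream as a time series of distributions, the sketch below computes one histogram per time window, reduces each to a single metric (Shannon entropy), and flags windows where the metric jumps. The function names, the toy windows, and the fixed jump threshold are all assumptions made for this example; they are not taken from the thesis.

```python
import math
from collections import Counter


def shannon_entropy(items):
    """Shannon entropy (in bits) of the empirical distribution of `items`."""
    counts = Counter(items)
    total = sum(counts.values())
    # An empty window has no distribution; report zero entropy by convention.
    if total == 0:
        return 0.0
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def entropy_series(windows):
    """Map each time window (a list of observed values) to its entropy."""
    return [shannon_entropy(w) for w in windows]


def flag_jumps(series, threshold=1.0):
    """Return indices whose entropy differs from the previous window's
    entropy by more than `threshold` bits (an assumed, hand-picked value)."""
    return [
        i
        for i in range(1, len(series))
        if abs(series[i] - series[i - 1]) > threshold
    ]


# Hypothetical toy stream: four windows of observed feature values.
windows = [
    ["a", "b", "c", "d"],
    ["a", "b", "c", "d"],
    ["a", "a", "a", "a"],  # the distribution collapses onto one value
    ["a", "b", "c", "d"],
]
series = entropy_series(windows)
flagged = flag_jumps(series)
print(series, flagged)
```

A fixed threshold is the simplest possible detector; a practical system would more likely compare each window against a learned baseline.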

Scalable Algorithms for Mining Dynamic Graphs and Hypergraphs with Applications to Anomaly Detection.

This dissertation develops anomaly detection algorithms for increasingly complex graph edge stream models and shows their effectiveness, both theoretically and empirically.

On Tuning the Knobs of Distribution-Based Methods for Detecting VoIP Covert Channels

A probabilistic model is developed to explain how tuning the knobs affects the rates of false positives and false negatives when popular entropy-based anomaly detection is used to detect covert channels in Voice over IP (VoIP) traffic.

Anomaly detection in dynamic networks: a survey

This work focuses on anomaly detection in static graphs, which do not change and are capable of representing only a single snapshot of data, but as real‐world networks are constantly changing, there has been a shift in focus to dynamic graphs, which evolve over time.



Mining anomalies using traffic feature distributions

It is argued that the distributions of packet features observed in flow traces reveal both the presence and the structure of a wide range of anomalies, and that using feature distributions, anomalies naturally fall into distinct and meaningful clusters that can be used to automatically classify anomalies and to uncover new anomaly types.

An empirical evaluation of entropy-based traffic anomaly detection

This work considers two classes of distributions: flow-header features (IP addresses, ports, and flow sizes) and behavioral features (degree distributions measuring the number of distinct destination/source IPs that each host communicates with). It observes that the time series of entropy values of the address and port distributions are strongly correlated with each other and provide very similar anomaly detection capabilities.

The Evolution of Wikipedia

It is proposed that not only the degree of the destination node, but also its PageRank score can be used to explain the preferential generative process of graph edges, and the effectiveness of PageRank as a predictor of edge destination is evaluated.

Anomaly detection: A survey

This survey tries to provide a structured and comprehensive overview of the research on anomaly detection by grouping existing techniques into different categories based on the underlying approach adopted by each technique.

oddball: Spotting Anomalies in Weighted Graphs

Several new rules in density, weights, ranks and eigenvalues that seem to govern the so-called “neighborhood sub-graphs” are discovered, and it is shown how to use these rules for anomaly detection.

GraphScope: parameter-free mining of large time-evolving graphs

The efficiency and effectiveness of GraphScope, which is designed to operate on large graphs in a streaming fashion, are demonstrated on real datasets from several diverse domains, where it produces meaningful time-evolving patterns that agree with human intuition.

A signal analysis of network traffic anomalies

This paper reports results of signal analysis of four classes of network traffic anomalies: outages, flash crowds, attacks and measurement failures, and shows that wavelet filters are quite effective at exposing the details of both ambient and anomalous traffic.

Dynamics of large networks

This thesis analyzes the world's largest social and communication network of Microsoft Instant Messenger with 240 million people and 255 billion conversations and makes interesting and counterintuitive observations about network community structure that suggest that only small network clusters exist, and that they merge and vanish as they grow.

Distribution‐based anomaly detection in 3G mobile networks: from theory to practice

A statistical-based change detection algorithm for identifying deviations in distribution time series and a novel methodology based on semi‐synthetic traces for tuning and performance assessment of the proposed AD algorithm are proposed.

Information-theoretic measures for anomaly detection

  • Wenke Lee, Dong Xiang
  • Computer Science
  • Proceedings 2001 IEEE Symposium on Security and Privacy. S&P 2001
  • 2001
This work proposes using several information-theoretic measures, namely entropy, conditional entropy, relative conditional entropy, information gain, and information cost, for anomaly detection, an essential component of protection mechanisms against novel attacks.
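Three of these measures can be sketched using their standard textbook definitions, which may differ in detail from the formulation used in the paper. The function names and the toy audit records below are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def entropy(xs):
    """H(X) in bits for the empirical distribution of the sequence `xs`."""
    n = len(xs)
    return sum((c / n) * math.log2(n / c) for c in Counter(xs).values())


def conditional_entropy(xs, ys):
    """H(X | Y) = sum over y of P(y) * H(X | Y = y), from paired samples."""
    n = len(xs)
    groups = {}
    for x, y in zip(xs, ys):
        groups.setdefault(y, []).append(x)
    return sum((len(g) / n) * entropy(g) for g in groups.values())


def information_gain(xs, ys):
    """Reduction in the entropy of X obtained by observing Y."""
    return entropy(xs) - conditional_entropy(xs, ys)


# Hypothetical audit records: a class label and one observed feature each.
labels = ["normal", "normal", "attack", "attack"]
feature = ["low", "low", "high", "high"]

h_labels = entropy(labels)
h_given = conditional_entropy(labels, feature)
gain = information_gain(labels, feature)
print(h_labels, h_given, gain)
```

In this toy data the feature determines the label exactly, so the conditional entropy is zero and the information gain equals the full entropy of the labels.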