Wiki-Watchdog: Anomaly Detection in Wikipedia Through a Distributional Lens

@article{Arackaparambil2011WikiWatchdogAD,
  title={Wiki-Watchdog: Anomaly Detection in Wikipedia Through a Distributional Lens},
  author={Chrisil Arackaparambil and Guanhua Yan},
  journal={2011 IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology},
  year={2011},
  volume={1},
  pages={257--264}
}
  • Chrisil Arackaparambil, Guanhua Yan
  • Published 22 August 2011
  • Computer Science
  • 2011 IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology
Wikipedia has become a standard source of reference online, and many people (some unknowingly) now trust this corpus of knowledge as an authority to fulfil their information requirements. In doing so they task the human contributors of Wikipedia with maintaining the accuracy of articles, a job that these contributors have been performing admirably. We study the problem of monitoring the Wikipedia corpus with the goal of automated, online anomaly detection. We present Wiki-Watchdog, an… 
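The abstract is cut off before it describes Wiki-Watchdog's metrics, so what follows is only a minimal sketch of the general "distributional lens" idea, not the authors' method. It assumes, hypothetically, that the edit stream is reduced to a list of edited-article IDs. The stream is chopped into fixed-size windows, and each window's article histogram is summarized by its Shannon entropy. A sharp drop in entropy would mean that edits suddenly concentrate on a few articles. The function name `window_entropies` and the windowing scheme are illustrative assumptions.

```python
import math
from collections import Counter


def window_entropies(edits, window_size):
    """Split a stream of edited-article IDs into consecutive, non-overlapping
    windows of `window_size` edits and return the Shannon entropy (bits) of
    each window's article histogram. A trailing partial window is dropped.
    Hypothetical helper: the windowing scheme is an assumption."""
    entropies = []
    for start in range(0, len(edits) - window_size + 1, window_size):
        counts = Counter(edits[start:start + window_size])
        total = sum(counts.values())
        # H = sum_i p_i * log2(1 / p_i); equals 0 when every edit hits one article
        entropies.append(sum((c / total) * math.log2(total / c)
                             for c in counts.values()))
    return entropies
```

For example, `window_entropies(["A", "B", "C", "D", "A", "A", "A", "A"], 4)` returns `[2.0, 0.0]`. Every edit in the second window hits the same article.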
Detecting Change in News Feeds Using a Context Based Graph
News feeds have been utilized as a resource for extracting media context, and in particular for the discovery of unusual information within common news articles. In this paper, we present our
Anomaly detection in network streams through a distributional lens
TLDR
This thesis provides a unified distribution-based methodology for online detection of anomalies in network traffic streams, which regards the traffic stream as a time series of distributions (histograms), and monitors metrics of distributions in the time series.
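The TLDR describes monitoring a metric of each histogram over time but not the detector itself. As a hedged illustration only, the sketch below flags a window when its metric (for instance, a per-window entropy) lies more than `k` standard deviations from the mean of the preceding `history` windows. The z-score rule, the name `flag_deviations`, and its defaults are assumptions: a generic stand-in, not the thesis's method.

```python
import statistics


def flag_deviations(series, history, k=3.0):
    """Return the indices i at which series[i] lies more than k population
    standard deviations from the mean of the `history` values before it.
    A generic z-score rule, used here as a hypothetical stand-in detector."""
    flagged = []
    for i in range(history, len(series)):
        past = series[i - history:i]
        mu = statistics.mean(past)
        sigma = statistics.pstdev(past)
        # With sigma == 0 any change at all exceeds the threshold.
        if abs(series[i] - mu) > k * sigma:
            flagged.append(i)
    return flagged
```

On the series `[2.0, 2.1, 1.9, 2.0, 2.1, 1.9, 0.3, 2.0]` with `history=4`, only index 6 (the dip to 0.3) is flagged. With a zero-variance history, any change is flagged, which is usually too sensitive for real metrics.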
Scalable Algorithms for Mining Dynamic Graphs and Hypergraphs with Applications to Anomaly Detection.
TLDR
This dissertation develops anomaly detection algorithms for increasingly complex graph edge stream models and shows their effectiveness, both theoretically and empirically.
On Tuning the Knobs of Distribution-Based Methods for Detecting VoIP Covert Channels
TLDR
A probabilistic model is developed to explain how tuning the knobs of popular entropy-based anomaly detection affects the rates of false positives and false negatives when detecting covert channels in Voice over IP (VoIP) traffic.
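The paper's own probabilistic model is not reproduced here. As a simplified illustration of the same trade-off, assume that the monitored metric is standard normal in benign windows and shifted by `shift` standard deviations in covert-channel windows. Under that assumption, the threshold knob `k` of a two-sided alarm rule fixes both error rates in closed form: FPR = 2(1 - Φ(k)) and FNR = Φ(k - shift) - Φ(-k - shift). The Gaussian model and the `rates` helper are assumptions made for illustration.

```python
from statistics import NormalDist


def rates(k, shift):
    """Error rates of the two-sided rule 'alarm if |x| > k' under an assumed
    model: the metric x is N(0, 1) in benign windows and N(shift, 1) in
    anomalous (covert-channel) windows. Returns (false_positive_rate,
    false_negative_rate)."""
    z = NormalDist()  # standard normal
    fpr = 2 * (1 - z.cdf(k))                    # P(|X| > k), X ~ N(0, 1)
    fnr = z.cdf(k - shift) - z.cdf(-k - shift)  # P(|X| <= k), X ~ N(shift, 1)
    return fpr, fnr
```

Raising `k` trades false positives for false negatives. For example, `rates(2.0, 3.0)` gives roughly (0.046, 0.159), while `rates(3.0, 3.0)` gives roughly (0.0027, 0.5).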
Anomaly detection in dynamic networks: a survey
TLDR
Anomaly detection techniques originally focused on static graphs, which do not change and can represent only a single snapshot of data; because real-world networks are constantly changing, the focus has shifted to dynamic graphs, which evolve over time.

References

Mining anomalies using traffic feature distributions
TLDR
It is argued that the distributions of packet features observed in flow traces reveal both the presence and the structure of a wide range of anomalies, and that, using feature distributions, anomalies naturally fall into distinct and meaningful clusters that can be used to automatically classify anomalies and to uncover new anomaly types.
An empirical evaluation of entropy-based traffic anomaly detection
TLDR
This work considers two classes of distributions: flow-header features (IP addresses, ports, and flow sizes) and behavioral features (degree distributions measuring the number of distinct destination/source IPs that each host communicates with). It observes that the time series of entropy values of the address and port distributions are strongly correlated with each other and provide very similar anomaly detection capabilities.
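The observation about correlated entropy time series can be checked with a few lines of code. The sketch below assumes flow records are dictionaries with the hypothetical keys `src_ip` and `dst_port`. It computes one entropy value per window for a chosen feature and then uses the Pearson correlation coefficient to measure how strongly two such series move together. It illustrates the measurement only and is not the evaluation pipeline used in that work.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (bits) of the empirical distribution of `values`."""
    counts = Counter(values)
    n = len(values)
    return sum((c / n) * math.log2(n / c) for c in counts.values())


def feature_entropy_series(windows, feature):
    """One entropy value per window for a single flow feature. Each window is a
    list of flow records (dicts); key names such as 'src_ip' are hypothetical."""
    return [entropy([flow[feature] for flow in window]) for window in windows]


def pearson(xs, ys):
    """Pearson correlation of two equal-length series (undefined, and raises
    ZeroDivisionError, if either series is constant)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)
```

A correlation near 1 between the `src_ip` and `dst_port` series would mirror the finding that the two features carry largely the same anomaly signal.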
The Evolution of Wikipedia
The evolution of online networks is a topic that has generated much interest in research. In the past, scientists have studied some dynamic properties of network evolution, choosing to focus on
Anomaly detection: A survey
TLDR
This survey tries to provide a structured and comprehensive overview of the research on anomaly detection by grouping existing techniques into different categories based on the underlying approach adopted by each technique.
OddBall: Spotting Anomalies in Weighted Graphs
TLDR
Several new rules in density, weights, ranks and eigenvalues that seem to govern the so-called "neighborhood sub-graphs" are discovered, and the authors show how to use these rules for anomaly detection.
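These rules relate features of each node's egonet, which consists of the node, its neighbours, and the edges among them. The following is a minimal sketch of one such density rule, assuming an undirected graph stored as a dict of neighbour sets. For each node, it counts the neighbours `N` and the egonet edges `E`. It then fits the power law E = C·N^θ by least squares in log-log space and scores each node by its distance from the fitted line, using the distance-to-fit score as we understand it from OddBall. The helper names and the exact scoring are assumptions, not the authors' reference implementation.

```python
import math
from itertools import combinations


def egonet_features(adj):
    """adj maps node -> set of neighbours (undirected, no self-loops).
    Returns node -> (N, E): N = number of neighbours, E = edges in the egonet
    (spokes from the node plus edges among its neighbours)."""
    feats = {}
    for v, nbrs in adj.items():
        among = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        feats[v] = (len(nbrs), len(nbrs) + among)
    return feats


def fit_power_law(points):
    """Least-squares fit of log E = theta * log N + log C over (N, E) pairs.
    Needs at least two distinct N values. Returns (C, theta)."""
    xs = [math.log(n) for n, _ in points]
    ys = [math.log(e) for _, e in points]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    theta = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return math.exp(my - theta * mx), theta


def outlier_scores(adj):
    """Distance-to-fit score per non-isolated node:
    max(E, C*N^theta) / min(E, C*N^theta) * log(|E - C*N^theta| + 1)."""
    feats = {v: f for v, f in egonet_features(adj).items() if f[0] > 0}
    C, theta = fit_power_law(list(feats.values()))
    scores = {}
    for v, (n, e) in feats.items():
        expected = C * n ** theta
        scores[v] = (max(e, expected) / min(e, expected)) * math.log(abs(e - expected) + 1)
    return scores
```

Stars give θ near 1 and cliques give θ near 2, so nodes whose egonets are unusually star-like or clique-like for their degree receive the highest scores.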
Distribution-based anomaly detection in 3G mobile networks: from theory to practice
TLDR
A statistics-based change detection algorithm for identifying deviations in distribution time series is proposed, together with a novel methodology, based on semi-synthetic traces, for tuning the proposed anomaly detection (AD) algorithm and assessing its performance.
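A statistics-based detector of this kind needs a way to measure how far one histogram has drifted from another. The sketch below compares each window against a reference sample using the Kullback-Leibler divergence with add-one smoothing and a fixed threshold. Both choices are our assumptions and serve as a generic stand-in for the paper's algorithm, whose details (including its semi-synthetic tuning methodology) are not reproduced here.

```python
import math
from collections import Counter


def kl_divergence(p_counts, q_counts, alpha=1.0):
    """KL(P || Q) in bits between two histograms (Counters), using add-alpha
    smoothing over the union of their bins so no probability is zero."""
    bins = set(p_counts) | set(q_counts)
    p_total = sum(p_counts.values()) + alpha * len(bins)
    q_total = sum(q_counts.values()) + alpha * len(bins)
    kl = 0.0
    for b in bins:
        p = (p_counts.get(b, 0) + alpha) / p_total
        q = (q_counts.get(b, 0) + alpha) / q_total
        kl += p * math.log2(p / q)
    return kl


def detect_changes(windows, reference, threshold):
    """Indices of windows whose histogram diverges from the reference sample
    by more than `threshold` bits (threshold is a hypothetical tuning knob)."""
    ref = Counter(reference)
    return [i for i, w in enumerate(windows)
            if kl_divergence(Counter(w), ref) > threshold]
```

For a balanced reference such as `["a", "b"] * 50`, a window of ten `"a"` events diverges by about 0.59 bits, whereas a balanced window diverges by 0. The threshold is exactly the kind of parameter that a tuning methodology would have to set.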
GraphScope: parameter-free mining of large time-evolving graphs
TLDR
GraphScope, which is designed to operate on large graphs in a streaming fashion, is shown to be efficient and effective on real datasets from several diverse domains, where it produces meaningful time-evolving patterns that agree with human intuition.
A signal analysis of network traffic anomalies
TLDR
This paper reports results of signal analysis of four classes of network traffic anomalies: outages, flash crowds, attacks and measurement failures, and shows that wavelet filters are quite effective at exposing the details of both ambient and anomalous traffic.
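Wavelet filtering separates slow trends from abrupt changes. As a minimal illustration only (the paper's actual filters and thresholds are not described here), the sketch below applies the orthonormal Haar wavelet, the simplest wavelet, level by level. Approximation coefficients carry the low-frequency trend, and large detail coefficients mark sudden jumps such as an outage or a flash crowd. The function names are hypothetical.

```python
import math


def haar_level(signal):
    """One level of the orthonormal Haar wavelet transform. Returns
    (approximation, detail) coefficient lists; len(signal) must be even."""
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    s = 1 / math.sqrt(2)
    approx = [(signal[i] + signal[i + 1]) * s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) * s for i in range(0, len(signal), 2)]
    return approx, detail


def haar_decompose(signal, levels):
    """Repeatedly apply haar_level to the approximation. Returns the final
    approximation and a list of detail coefficient lists, finest level first."""
    details = []
    approx = list(signal)
    for _ in range(levels):
        approx, d = haar_level(approx)
        details.append(d)
    return approx, details
```

For `[4, 4, 4, 10]`, the first detail coefficient is 0, while the second is about -4.24 (that is, -6/√2), which points at the jump in the second pair. Because the transform is orthonormal, it preserves the signal's total energy.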
Dynamics of large networks
TLDR
This thesis analyzes the world's largest social and communication network of Microsoft Instant Messenger with 240 million people and 255 billion conversations and makes interesting and counterintuitive observations about network community structure that suggest that only small network clusters exist, and that they merge and vanish as they grow.
Information-theoretic measures for anomaly detection
  • Wenke Lee, Dong Xiang
  • Computer Science
  • Proceedings 2001 IEEE Symposium on Security and Privacy. S&P 2001
  • 2001
TLDR
This work proposes using several information-theoretic measures, namely entropy, conditional entropy, relative conditional entropy, information gain, and information cost, for anomaly detection, which serves as a protection mechanism against novel attacks.
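Of the measures listed, conditional entropy is the easiest to make concrete. For an audit-event sequence (such as system calls), it measures how uncertain the next event is given the preceding `k` events, so lower values mean more regular, more predictable data. The sketch below estimates it empirically from sliding windows. The function name and the sliding-window estimator are our assumptions and do not come from the paper.

```python
import math
from collections import Counter


def conditional_entropy(seq, k):
    """Empirical H(X_t | X_{t-k}, ..., X_{t-1}) in bits, estimated from all
    sliding windows of length k + 1: sum of p(ctx, x) * log2(p(ctx) / p(ctx, x))."""
    joint = Counter(tuple(seq[i:i + k + 1]) for i in range(len(seq) - k))
    context = Counter()
    for window, c in joint.items():
        context[window[:-1]] += c  # marginal count of the k-event context
    n = sum(joint.values())
    return sum((c / n) * math.log2(context[w[:-1]] / c) for w, c in joint.items())
```

A strictly periodic sequence such as `"abcabcabc"` has conditional entropy 0 for `k=1`, because each event fully determines the next. With `k=0`, the function reduces to the plain entropy of the event distribution.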