Corpus ID: 220525904

Quantifying and Reducing Bias in Maximum Likelihood Estimation of Structured Anomalies

Uthsav Chitra, Kimberly Ding, Benjamin J. Raphael
Anomaly estimation, or the problem of finding a subset of a dataset that differs from the rest of the dataset, is a classic problem in machine learning and data mining. In both theoretical work and in applications, the anomaly is assumed to have a specific structure defined by membership in an $\textit{anomaly family}$. For example, in temporal data the anomaly family may be time intervals, while in network data the anomaly family may be connected subgraphs. The most prominent approach for…
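As a rough illustration of an anomaly family of time intervals, the sketch below scans every contiguous interval of a series and keeps the one with the highest score. It assumes unit-variance Gaussian observations with an elevated mean inside the anomaly, a modeling choice the abstract does not state; under that assumption the likelihood-maximizing interval is the one maximizing the sum of its values divided by the square root of its length. The function name `best_interval` and the synthetic data are hypothetical and are not taken from the paper.

```python
import math
import random


def best_interval(xs):
    """Return (score, start, end) maximizing sum(xs[start:end]) / sqrt(end - start).

    Assumption: unit-variance Gaussian noise with a positive mean shift inside
    the anomaly, so this score is the Gaussian scan statistic for an interval.
    """
    n = len(xs)
    # Prefix sums let us evaluate any interval sum in constant time.
    prefix = [0.0]
    for x in xs:
        prefix.append(prefix[-1] + x)

    best = (float("-inf"), 0, 0)
    for start in range(n):
        for end in range(start + 1, n + 1):
            score = (prefix[end] - prefix[start]) / math.sqrt(end - start)
            if score > best[0]:
                best = (score, start, end)
    return best


# Hypothetical data: 60 noisy points with an elevated mean planted on [20, 30).
random.seed(0)
xs = [random.gauss(0.0, 1.0) for _ in range(60)]
for i in range(20, 30):
    xs[i] += 3.0

score, start, end = best_interval(xs)
print(f"estimated anomaly: [{start}, {end}) with score {score:.2f}")
```

The exhaustive scan costs quadratic time in the series length, which is fine for a small demonstration. Because the score is maximized over many candidate intervals, the reported interval need not coincide exactly with the planted one.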

Near-Optimal and Practical Algorithms for Graph Scan Statistics with Connectivity Constraints
This work proposes a framework for designing algorithms for optimizing a large class of scan statistics for networks, subject to connectivity constraints, that run in time that scales linearly in the size of the graph and depends on a parameter the authors call the “effective solution size,” while providing rigorous approximation guarantees.
Optimal Sparse Segment Identification With Application in Copy Number Variation Analysis
An efficient likelihood ratio selection (LRS) procedure for identifying the segments is developed, and the asymptotic optimality of this method is presented in the sense that the LRS can separate the signal segments from the noise as long as the signals are in the identifiable regions.
Non-parametric scan statistics for event detection and forecasting in heterogeneous social media graphs
This paper presents Non-Parametric Heterogeneous Graph Scan (NPHGS), a new approach that considers the entire heterogeneous network for event detection and efficiently maximizes a nonparametric scan statistic over connected subgraphs to identify the most anomalous network clusters.
Spatial Scan Statistic
The Flexibly Shaped Spatial Scan Statistic is used to map the subdistricts of Surabaya that are detected as pockets of severely malnourished children under five, so that the subdistricts to prioritize in handling under-five GWNBR cases can be identified.
Near-optimal Anomaly Detection in Graphs using Lovasz Extended Scan Statistic
This work develops from first principles the generalized likelihood ratio test for determining if there is a well connected region of activation over the vertices in the graph in Gaussian noise and provides a relaxation, called the Lovasz extended scan statistic (LESS), that uses submodularity to approximate the intractable generalized likelihood ratios.
Power comparisons for disease clustering tests
A collection of 1,220,000 simulated benchmark data sets, generated under 51 different cluster models and the null hypothesis, is presented for use in power evaluations and to compare the power of the spatial scan statistic, the maximized excess events test, and the nonparametric M statistic.
Computing All Small Cuts in Undirected Networks
It is shown that all cuts of weights less than $k\lambda(N)$ can be enumerated in $O(mn^3 + n^{2k+2})$ time without using the maximum flow algorithm.
A Multiscale Scan Statistic for Adaptive Submatrix Localization
An optimization framework based on a multiscale scan statistic is established, and algorithms to approach the optimizer are developed; the resulting estimator is shown to have superior performance compared to other estimators that do not require prior submatrix knowledge, while being comparatively faster to compute.
Graph Anomaly Detection Based on Steiner Connectivity and Density
This work provides a survey of the various formulations of anomaly detection in dynamic networks with a focus on “window-based” methods, and describes two classes of techniques: 1) generalizations of Steiner connectivity; and 2) dense subgraph mining.
Graph Scan Statistics With Uncertainty
This paper develops the first systematic approach to incorporating uncertainty in scan statistics using two formulations, one based on the sample average approximation and the other using a max-min objective.