# Bump hunting in high-dimensional data

Many data analytic questions can be formulated as (noisy) optimization problems. They explicitly or implicitly involve finding simultaneous combinations of values for a set of (“input”) variables that imply unusually large (or small) values of another designated (“output”) variable. Specifically, one seeks a set of subregions of the input variable space within which the value of the output variable is considerably larger (or smaller) than its average value over the entire input domain. In…
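
The abstract does not name a method, but several entries below compare against PRIM (the Patient Rule Induction Method), the bump-hunting procedure associated with this paper. What follows is a minimal sketch of PRIM's top-down peeling phase. It assumes numeric inputs in a NumPy array `X` and a response `y`. The function name `prim_peel` and its parameters `alpha` (peeling fraction) and `min_support` (minimum share of points kept in the box) are illustrative choices, not a published API. The pasting phase and the choice of a final box along the peeling trajectory are omitted.

```python
import numpy as np


def prim_peel(X, y, alpha=0.05, min_support=0.1):
    """Top-down peeling, the first phase of PRIM (a simplified sketch).

    Starting from a box that contains every point, repeatedly remove an
    alpha-fraction of the points in the box from the low or high end of a
    single input variable, choosing the peel that leaves the largest mean
    of y. Stop when no candidate peel removes points while keeping at
    least min_support * n of them. Pasting and box selection are omitted.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, d = X.shape
    lower = np.full(d, -np.inf)  # box bounds, one [lower, upper] per input
    upper = np.full(d, np.inf)
    inside = np.ones(n, dtype=bool)  # which points lie in the current box

    while True:
        best = None  # (mean, dim, side, cut, keep_mask)
        in_box = X[inside]
        for j in range(d):
            lo_cut = np.quantile(in_box[:, j], alpha)
            hi_cut = np.quantile(in_box[:, j], 1 - alpha)
            candidates = (
                ("low", lo_cut, inside & (X[:, j] >= lo_cut)),
                ("high", hi_cut, inside & (X[:, j] <= hi_cut)),
            )
            for side, cut, keep in candidates:
                count = keep.sum()
                # Skip peels that remove nothing or shrink the box too far.
                if count == inside.sum() or count < min_support * n:
                    continue
                mean = y[keep].mean()
                if best is None or mean > best[0]:
                    best = (mean, j, side, cut, keep)
        if best is None:
            break  # no admissible peel left
        _, j, side, cut, keep = best
        if side == "low":
            lower[j] = cut
        else:
            upper[j] = cut
        inside = keep

    return lower, upper, y[inside].mean(), int(inside.sum())


# Toy example: y is large only where the first input exceeds 0.7.
rng = np.random.default_rng(0)
X = rng.uniform(size=(2000, 2))
y = (X[:, 0] > 0.7).astype(float)
lower, upper, box_mean, support = prim_peel(X, y)
print(lower, upper, box_mean, support)
```

Each step removes only a small fraction `alpha` of the points, so the search is "patient": it avoids committing early to a large cut on any single variable.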

### Local Sparse Bump Hunting

- Computer Science · Journal of Computational and Graphical Statistics: a joint publication of American Statistical Association, Institute of Mathematical Statistics, Interface Foundation of North America
- 2010

This work introduces a novel supervised, multivariate bump hunting strategy for exploring modes or classes of a target function of many continuous variables. In the problem of class discovery, it outperforms naive PRIM as well as competing nonparametric supervised and unsupervised methods.

### Mixtures of Rectangles: Interpretable Soft Clustering

- Computer Science · ICML
- 2001

This work explores a clustering technique that requires no user-supplied parameters except the desired number of clusters, and demonstrates its usefulness for subspace clustering on synthetic data and on real-life datasets.

### Data Exploration by Representative Region Selection: Axioms and Convergence

- Computer Science, Economics · Math. Oper. Res.
- 2021

This work presents a new type of unsupervised learning problem: finding a small set of representative regions that approximates a larger data set, without relying on any cluster structure in the data.

### Conditional Sparse Linear Regression

- Computer Science · ITCS
- 2017

This work considers the problem of jointly identifying a significant segment of a population in which there is a highly sparse linear regression fit, together with the coefficients for the linear fit, and gives algorithms for such problems under the sup norm.

### Real-valued All-Dimensions Search: Low-overhead Rapid Searching over Subsets of Attributes

- Computer Science · UAI
- 2002

This work introduces RADSEARCH (Real-valued All-Dimensions-tree Search), a new, efficient approach to searching the combinatorial space of contingency tables during the inner loop of a nonlinear statistical optimization, which finds the global optimum.

### SuRF: Identification of Interesting Data Regions with Surrogate Models

- Computer Science · 2020 IEEE 36th International Conference on Data Engineering (ICDE)
- 2020

The proposed framework, coined SuRF (SUrrogate Region Finder), leverages historical region evaluations to train surrogate models that learn to approximate the distribution of the statistic of interest and makes use of evolutionary multi-modal optimization to effectively and efficiently identify regions of interest regardless of data size and dimensionality.

### Analysis of large-scale scalar data using hixels

- Computer Science · 2011 IEEE Symposium on Large Data Analysis and Visualization
- 2011

This work introduces hixels, a new representation for scalar data that stores a histogram of values for each sample point of a domain, and proposes new feature detection algorithms that combine topological and statistical methods.

### Comparing Algorithms for Scenario Discovery

- Computer Science
- 2008

This study offers three measures of merit (coverage, density, and interpretability) and uses them to evaluate the capabilities of PRIM, a bump-hunting algorithm, and CART, a classification algorithm. It finds that both algorithms can perform the required task, but often imperfectly.
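
The summary does not define its measures. This sketch assumes the definitions commonly used in the scenario-discovery literature. Coverage is the fraction of all cases of interest that fall inside a box. Density is the fraction of cases inside the box that are of interest. The sketch also counts restricted input dimensions as a rough proxy for interpretability, which is a simplification. The function name `box_stats` and its arguments are hypothetical.

```python
import numpy as np


def box_stats(X, of_interest, lower, upper):
    """Coverage, density, and a restricted-dimension count for one box.

    coverage:     share of all cases of interest that fall inside the box
    density:      share of the cases inside the box that are of interest
    n_restricted: number of inputs with at least one finite bound, used
                  here as a rough stand-in for interpretability (assumption)
    """
    X = np.asarray(X, dtype=float)
    of_interest = np.asarray(of_interest, dtype=bool)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    # A case is in the box if every input lies within its [lower, upper] bounds.
    in_box = np.all((X >= lower) & (X <= upper), axis=1)
    hits = np.sum(in_box & of_interest)
    coverage = hits / max(of_interest.sum(), 1)  # guard against division by 0
    density = hits / max(in_box.sum(), 1)
    n_restricted = int(np.sum(np.isfinite(lower) | np.isfinite(upper)))
    return float(coverage), float(density), n_restricted


# Four cases, three of interest; the box restricts only the first input.
X = np.array([[0.1, 0.5], [0.8, 0.2], [0.9, 0.9], [0.4, 0.6]])
of_interest = np.array([False, True, True, True])
# Here coverage is 2/3, density is 1.0, and one input is restricted.
print(box_stats(X, of_interest, lower=[0.5, -np.inf], upper=[np.inf, np.inf]))
```

The two measures pull against each other: growing a box tends to raise coverage while lowering density. This is why the study reports them side by side rather than as a single score.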

### Subgroup discovery in data sets with multi-dimensional responses

- Computer Science · Intell. Data Anal.
- 2011

This work develops a technique that combines agglomerative clustering, to find subgroup candidates in the space of output attributes, with predictive modeling, to score and describe these candidates in the input attribute space.

### Scenario Discovery via Rule Extraction

- Computer Science · ArXiv
- 2019

This work proposes a new procedure for scenario discovery. It trains an intermediate statistical model that generalizes quickly and uses it to label a large amount of data for PRIM. It then shows that this procedure performs much better than PRIM on its own.


