# Bump hunting in high-dimensional data

@article{Friedman1999BumpHI, title={Bump hunting in high-dimensional data}, author={Jerome H. Friedman and Nicholas I. Fisher}, journal={Statistics and Computing}, year={1999}, volume={9}, pages={123-143} }

Many data analytic questions can be formulated as (noisy) optimization problems. They explicitly or implicitly involve finding simultaneous combinations of values for a set of (“input”) variables that imply unusually large (or small) values of another designated (“output”) variable. Specifically, one seeks a set of subregions of the input variable space within which the value of the output variable is considerably larger (or smaller) than its average value over the entire input domain. In…
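The search described in the abstract can be illustrated with a short sketch. The citing-work summaries on this page refer to the paper's method as PRIM; the code below is only a simplified, greedy "peeling" loop in that spirit, not the authors' full procedure. The function name `peel`, the parameters `alpha` and `min_support`, and the synthetic data are all assumptions made for illustration.

```python
import random

def mean(values):
    return sum(values) / len(values)

def peel(X, y, alpha=0.1, min_support=0.1):
    """Greedily shrink an axis-aligned box to raise the mean of y inside it.

    Hypothetical sketch: alpha is the fraction of remaining points removed
    per step, min_support the smallest allowed fraction of points in the box.
    Ties in X are not handled specially.
    """
    n, d = len(X), len(X[0])
    inside = list(range(n))  # indices of points currently in the box
    box = [[min(r[j] for r in X), max(r[j] for r in X)] for j in range(d)]
    while True:
        k = max(1, int(alpha * len(inside)))  # points removed by one peel
        if len(inside) - k < min_support * n:
            break  # a further peel would drop below the minimum support
        best = None
        for j in range(d):
            order = sorted(inside, key=lambda i: X[i][j])
            # Candidate peels: drop the k lowest or the k highest points on axis j.
            for side, kept in ((0, order[k:]), (1, order[:-k])):
                m = mean([y[i] for i in kept])
                if best is None or m > best[0]:
                    edge = X[kept[0]][j] if side == 0 else X[kept[-1]][j]
                    best = (m, j, side, edge, kept)
        m, j, side, edge, kept = best
        if m <= mean([y[i] for i in inside]):
            break  # no candidate peel improves the in-box mean
        box[j][side] = edge
        inside = kept
    return box, inside

# Synthetic example (assumed data): y is large only inside a small rectangle
# in the first two of three input variables.
random.seed(0)
X = [[random.random() for _ in range(3)] for _ in range(2000)]
y = [(1.0 if 0.4 < r[0] < 0.7 and 0.2 < r[1] < 0.6 else 0.0)
     + random.gauss(0, 0.1) for r in X]

box, inside = peel(X, y)
global_mean = mean(y)
box_mean = mean([y[i] for i in inside])
print("global mean:", round(global_mean, 3), "box mean:", round(box_mean, 3))
print("box bounds:", [[round(lo, 2), round(hi, 2)] for lo, hi in box])
```

Removing only a small fraction of points per step keeps the search "patient": an early poor choice costs little data, so later peels can still home in on the region where the output is large.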

## 581 Citations

### Local Sparse Bump Hunting

- Computer Science
- Journal of computational and graphical statistics: a joint publication of American Statistical Association, Institute of Mathematical Statistics, Interface Foundation of North America
- 2010

This work introduces a novel supervised and multivariate bump hunting strategy for exploring modes or classes of a target function of many continuous variables, which outperforms a naive PRIM as well as competitive nonparametric supervised and unsupervised methods in the problem of class discovery.

### Mixtures of Rectangles: Interpretable Soft Clustering

- Computer Science
- ICML
- 2001

This work explores a clustering technique that requires no user-supplied parameters except for the desired number of clusters, and demonstrates the usefulness of the method in subspace clustering for synthetic data, and in real-life datasets.

### Data Exploration by Representative Region Selection: Axioms and Convergence

- Computer Science, Economics
- Math. Oper. Res.
- 2021

A new type of unsupervised learning problem is presented in which a small set of representative regions is found that approximates a larger data set, without relying on cluster structure in the data.

### Conditional Sparse Linear Regression

- Computer Science
- ITCS
- 2017

This work considers the problem of jointly identifying a significant segment of a population in which there is a highly sparse linear regression fit, together with the coefficients for the linear fit, and gives algorithms for such problems under the sup norm.

### Real-valued All-Dimensions Search: Low-overhead Rapid Searching over Subsets of Attributes

- Computer Science
- UAI
- 2002

This work presents a new, efficient approach to searching the combinatorial space of contingency tables during the inner loop of a nonlinear statistical optimization, called RADSEARCH (Real-valued All-Dimensions-tree Search), which finds the global optimum.

### SuRF: Identification of Interesting Data Regions with Surrogate Models

- Computer Science
- 2020 IEEE 36th International Conference on Data Engineering (ICDE)
- 2020

The proposed framework, coined SuRF (SUrrogate Region Finder), leverages historical region evaluations to train surrogate models that learn to approximate the distribution of the statistic of interest and makes use of evolutionary multi-modal optimization to effectively and efficiently identify regions of interest regardless of data size and dimensionality.

### Analysis of large-scale scalar data using hixels

- Computer Science
- 2011 IEEE Symposium on Large Data Analysis and Visualization
- 2011

A new data representation for scalar data, called hixels, which stores a histogram of values for each sample point of a domain, is introduced, along with new feature detection algorithms that combine topological and statistical methods.

### Comparing Algorithms for Scenario Discovery

- Computer Science
- 2008

This study offers three measures of merit (coverage, density, and interpretability) and uses them to evaluate the capabilities of PRIM, a bump-hunting algorithm, and CART, a classification algorithm; it finds that both algorithms can perform the required task, but often imperfectly.

### Subgroup discovery in data sets with multi-dimensional responses

- Computer Science
- Intell. Data Anal.
- 2011

This work has developed a technique that uses a combination of agglomerative clustering to find subgroup candidates in the space of output attributes, and predictive modeling to score and describe these candidates in the input attribute space.

### Scenario Discovery via Rule Extraction

- Computer Science
- ArXiv
- 2019

This work proposes a new procedure for scenario discovery: it trains an intermediate statistical model which generalizes fast, uses it to label (a lot of) data for PRIM, and shows that this method is much better than PRIM itself.

## References


### Model Search and Inference By Bootstrap "bumping"

- Computer Science
- 1995

A bootstrap-based method is proposed for searching through a space of models; it is well suited to complex, adaptively fitted models and provides a convenient method for finding better local minima, for resistant fitting, and for optimization under constraints.

### Spline Models for Observational Data

- Mathematics
- 1990

Foreword 1. Background 2. More splines 3. Equivalence and perpendicularity, or, what's so special about splines? 4. Estimating the smoothing parameter 5. 'Confidence intervals' 6. Partial spline…

### Projection Pursuit Regression

- Mathematics
- 1981

Abstract A new method for nonparametric multiple regression is presented. The procedure models the regression surface as a sum of general smooth functions of linear combinations of the predictor…

### Approximation of Functions

- Mathematics
- Nature
- 1965

Theory of Approximation of Functions of a Real Variable. By A. F. Timan. Translated by J. Berry. English translation edited and editorial preface by J. Cossar. (International Series of Monographs on…

### Classification and Regression Trees

- Computer Science
- 1983

This chapter discusses tree classification in the context of medicine, where right-sized trees and honest estimates are considered and Bayes rules and partitions are used as guides to optimal pruning.

### Data mining and knowledge discovery: making sense out of data

- Computer Science
- 1996

Without a concerted effort to develop knowledge discovery techniques, organizations stand to forfeit much of the value from the data they currently collect and store.

### Cr-Pyrope Garnets in the Lithospheric Mantle. I. Compositional Systematics and Relations to Tectonic Setting

- Geology
- 1999

Chrome-pyrope garnet is a minor but widespread phase in ultramafic association with Mg. The position and slope of the lherzolite trend vary with temperature and tectonic setting, suggesting that the…

### Pattern Recognition and Neural Networks

- Computer Science
- 1995


### The Nature of Statistical Learning Theory

- Computer Science
- Statistics for Engineering and Information Science
- 2000

Setting of the learning problem; consistency of learning processes; bounds on the rate of convergence of learning processes; controlling the generalization ability of learning processes; constructing…