Bump hunting in high-dimensional data
@article{Friedman1999BumpHI, title={Bump hunting in high-dimensional data}, author={Jerome H. Friedman and Nicholas I. Fisher}, journal={Statistics and Computing}, year={1999}, volume={9}, pages={123-143} }
Many data analytic questions can be formulated as (noisy) optimization problems. They explicitly or implicitly involve finding simultaneous combinations of values for a set of (“input”) variables that imply unusually large (or small) values of another designated (“output”) variable. Specifically, one seeks a set of subregions of the input variable space within which the value of the output variable is considerably larger (or smaller) than its average value over the entire input domain. In…
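The search described in the abstract can be illustrated with a greedy "peeling" loop: start from a box covering all the data and repeatedly trim a small fraction of points from one face, keeping the trim that most raises the mean of the output inside the box. The following is a minimal sketch under stated assumptions, not the authors' algorithm: the function name `peel_box`, the parameters `alpha` and `min_support`, the stopping rule (stop when no trim raises the mean), and the synthetic data are all hypothetical choices made for illustration.

```python
import random

def peel_box(X, y, alpha=0.1, min_support=0.1):
    """Greedy top-down peeling: shrink a box to raise the mean of y inside it.

    alpha is the fraction of in-box points trimmed per step; min_support is
    the smallest allowed fraction of all points left in the box. Both names
    and the stopping rule are illustrative assumptions.
    """
    n, d = len(X), len(X[0])
    # Start with the bounding box of all the data.
    box = [[min(r[j] for r in X), max(r[j] for r in X)] for j in range(d)]
    inside = list(range(n))
    while True:
        current = sum(y[i] for i in inside) / len(inside)
        best = None  # (mean, dimension, side, cut, surviving indices)
        for j in range(d):
            vals = sorted(X[i][j] for i in inside)
            k = max(1, int(alpha * len(vals)))
            if k >= len(vals):
                continue
            # side 0 trims the low face, side 1 trims the high face.
            for side, cut in ((0, vals[k]), (1, vals[-k - 1])):
                if side == 0:
                    keep = [i for i in inside if X[i][j] >= cut]
                else:
                    keep = [i for i in inside if X[i][j] <= cut]
                if len(keep) == len(inside) or len(keep) < min_support * n:
                    continue
                mean = sum(y[i] for i in keep) / len(keep)
                if best is None or mean > best[0]:
                    best = (mean, j, side, cut, keep)
        if best is None or best[0] <= current:
            return box, inside
        _, j, side, cut, inside = best
        box[j][side] = cut

# Synthetic example: the output is large only inside a small rectangle.
random.seed(0)
X = [[random.random(), random.random()] for _ in range(500)]
y = [1.0 if 0.6 < a < 0.9 and 0.2 < b < 0.5 else 0.0 for a, b in X]
box, inside = peel_box(X, y)
print(box, len(inside))
```

Trimming only a small fraction per step keeps each decision cheap to reverse in spirit: a poor early cut removes little data, so later cuts on other variables can still find the region. A fuller procedure would typically also record the whole sequence of boxes and let the analyst trade off box size against mean output.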
574 Citations
Local Sparse Bump Hunting
- Computer Science · Journal of computational and graphical statistics: a joint publication of American Statistical Association, Institute of Mathematical Statistics, Interface Foundation of North America
- 2010
This work introduces a novel supervised and multivariate bump hunting strategy for exploring modes or classes of a target function of many continuous variables, which outperforms a naive PRIM as well as competitive nonparametric supervised and unsupervised methods in the problem of class discovery.
Mixtures of Rectangles: Interpretable Soft Clustering
- Computer Science · ICML
- 2001
This work explores a clustering technique that requires no user-supplied parameters except for the desired number of clusters, and demonstrates the usefulness of the method in subspace clustering for synthetic data, and in real-life datasets.
Data Exploration by Representative Region Selection: Axioms and Convergence
- Computer Science, Economics · Math. Oper. Res.
- 2021
A new type of unsupervised learning problem is presented in which a small set of representative regions is found that approximates a larger data set, without relying on cluster structure in the data.
Conditional Sparse Linear Regression
- Computer Science · ITCS
- 2017
This work considers the problem of jointly identifying a significant segment of a population in which there is a highly sparse linear regression fit, together with the coefficients for the linear fit, and gives algorithms for such problems under the sup norm.
SuRF: Identification of Interesting Data Regions with Surrogate Models
- Computer Science · 2020 IEEE 36th International Conference on Data Engineering (ICDE)
- 2020
The proposed framework, coined SuRF (SUrrogate Region Finder), leverages historical region evaluations to train surrogate models that learn to approximate the distribution of the statistic of interest and makes use of evolutionary multi-modal optimization to effectively and efficiently identify regions of interest regardless of data size and dimensionality.
Comparing Algorithms for Scenario Discovery
- Computer Science
- 2008
This study offers three measures of merit (coverage, density, and interpretability) and uses them to evaluate the capabilities of PRIM, a bump-hunting algorithm, and CART, a classification algorithm. It finds that both algorithms can perform the required task, but often imperfectly.
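Two of the measures named above have simple counting definitions as they are commonly used in scenario discovery: coverage is the share of all cases of interest that fall inside a box, and density is the share of cases inside the box that are of interest. The sketch below assumes those common definitions; the function name `coverage_and_density`, the box representation as (low, high) pairs, and the toy data are hypothetical. Interpretability is not computed here, since it depends on how many boxes and restricted variables a description uses.

```python
def coverage_and_density(X, interesting, box):
    """Return (coverage, density) of an axis-aligned box.

    X: list of points; interesting: list of booleans, one per point;
    box: list of (low, high) bounds, one pair per input variable.
    """
    in_box = [all(lo <= v <= hi for v, (lo, hi) in zip(row, box)) for row in X]
    hits = sum(1 for b, t in zip(in_box, interesting) if b and t)
    n_box = sum(in_box)
    n_interesting = sum(interesting)
    # Guard against empty denominators.
    coverage = hits / n_interesting if n_interesting else 0.0
    density = hits / n_box if n_box else 0.0
    return coverage, density

# Toy one-dimensional example: three cases of interest, two of them in the box.
X = [[0.1], [0.4], [0.6], [0.9]]
interesting = [False, True, True, True]
coverage, density = coverage_and_density(X, interesting, [(0.5, 1.0)])
print(coverage, density)
```

The two measures pull in opposite directions: shrinking a box tends to raise density while lowering coverage, which is why they are reported together.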
Subgroup discovery in data sets with multi-dimensional responses
- Computer Science · Intell. Data Anal.
- 2011
This work has developed a technique that uses a combination of agglomerative clustering to find subgroup candidates in the space of output attributes, and predictive modeling to score and describe these candidates in the input attribute space.
Scenario Discovery via Rule Extraction
- Computer Science · ArXiv
- 2019
This work proposes a new procedure for scenario discovery: an intermediate statistical model, which generalizes fast, is used to label (a lot of) data for PRIM. The work shows that this method is much better than PRIM itself.
Bridging kriging believer and expected improvement using bump hunting for expensive black-box optimization
- Computer Science · GECCO Companion
- 2021
An algorithm incorporating the strengths of the two infill methods is proposed. It achieves competitive performance across a range of problems with diverse characteristics, making it a strong candidate for solving black-box CEOPs.
References
SHOWING 1-10 OF 27 REFERENCES
Machine learning
- Computer Science · CSUR
- 1996
Machine learning addresses many of the same research questions as the fields of statistics, data mining, and psychology, but with differences of emphasis.
Spline Models for Observational Data
- Mathematics
- 1990
Foreword 1. Background 2. More splines 3. Equivalence and perpendicularity, or, what's so special about splines? 4. Estimating the smoothing parameter 5. 'Confidence intervals' 6. Partial spline…
Projection Pursuit Regression
- Mathematics
- 1981
Abstract A new method for nonparametric multiple regression is presented. The procedure models the regression surface as a sum of general smooth functions of linear combinations of the predictor…
Approximation of Functions
- Mathematics · Nature
- 1965
Theory of Approximation of Functions of a Real Variable. By A. F. Timan. Translated by J. Berry. English translation edited and editorial preface by J. Cossar. (International Series of Monographs on…
Classification and Regression Trees
- Computer Science
- 1983
This chapter discusses tree classification in the context of medicine, where right Sized Trees and Honest Estimates are considered and Bayes Rules and Partitions are used as guides to optimal pruning.
Data mining and knowledge discovery: making sense out of data
- Computer Science
- 1996
Without a concerted effort to develop knowledge discovery techniques, organizations stand to forfeit much of the value from the data they currently collect and store.
Cr-Pyrope Garnets in the Lithospheric Mantle. I. Compositional Systematics and Relations to Tectonic Setting
- Geology
- 1999
Chrome-pyrope garnet is a minor but widespread phase in ultramafic association with Mg. The position and slope of the lherzolite trend vary with temperature and tectonic setting, suggesting that the…
Pattern Recognition and Neural Networks
- Computer Science
- 1995