Bump hunting in high-dimensional data

Jerome H. Friedman and Nicholas I. Fisher. Statistics and Computing.
Many data analytic questions can be formulated as (noisy) optimization problems. They explicitly or implicitly involve finding simultaneous combinations of values for a set of (“input”) variables that imply unusually large (or small) values of another designated (“output”) variable. Specifically, one seeks a set of subregions of the input variable space within which the value of the output variable is considerably larger (or smaller) than its average value over the entire input domain. In… 
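One common way to search for such a subregion is top-down peeling of an axis-aligned box, as in PRIM (the bump-hunting algorithm named in several of the citing works listed here). The following is a minimal illustrative sketch under stated assumptions, not Friedman and Fisher's implementation. The function name `prim_peel`, the peeling fraction `alpha`, and the minimum-support rule `min_support` are hypothetical choices made here. At each step the sketch removes a small slice of the box along one input, choosing the slice whose removal gives the largest mean output in what remains, and it stops when no admissible peel is left.

```python
import numpy as np


def prim_peel(X, y, alpha=0.05, min_support=0.1):
    """Hypothetical sketch of PRIM-style top-down peeling (not the authors' code).

    X: (n, p) numeric inputs; y: (n,) outputs to be made large inside the box.
    alpha: fraction of the in-box points removed per peel (assumed default).
    min_support: a peel is rejected if it leaves fewer than this fraction of
    all n points in the box (assumed stopping rule).
    Returns bounds (lower, upper), with the box defined as lower < x < upper
    elementwise, plus the trajectory of (support, mean output) after each peel.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    lower = np.full(p, -np.inf)
    upper = np.full(p, np.inf)
    inside = np.ones(n, dtype=bool)          # points currently in the box
    trajectory = [(1.0, float(y.mean()))]

    while True:
        Xin, yin = X[inside], y[inside]
        best = None                          # (mean after peel, dim, side, cut)
        for j in range(p):
            # Two candidate peels per input: drop the lowest or the highest
            # alpha-fraction of the in-box points along dimension j.
            lo_cut = np.quantile(Xin[:, j], alpha)
            hi_cut = np.quantile(Xin[:, j], 1.0 - alpha)
            for side, cut, keep in (
                ("lower", lo_cut, Xin[:, j] > lo_cut),
                ("upper", hi_cut, Xin[:, j] < hi_cut),
            ):
                kept = int(keep.sum())
                # Skip empty results, no-op peels, and peels below the support floor.
                if kept == 0 or kept == len(yin) or kept < min_support * n:
                    continue
                m = float(yin[keep].mean())
                if best is None or m > best[0]:
                    best = (m, j, side, cut)
        if best is None:                     # no admissible peel remains
            break
        m, j, side, cut = best
        if side == "lower":
            lower[j] = cut
            inside &= X[:, j] > cut
        else:
            upper[j] = cut
            inside &= X[:, j] < cut
        trajectory.append((float(inside.mean()), m))
    return lower, upper, trajectory
```

In PRIM as usually described, the analyst inspects the trajectory of (support, mean) pairs to pick a box that balances a high mean against adequate support, and a bottom-up pasting step can then re-expand the chosen box; neither step is included in this sketch.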
Local Sparse Bump Hunting
  • J. Dazard, J. S. Rao
  • Computer Science
    Journal of computational and graphical statistics : a joint publication of American Statistical Association, Institute of Mathematical Statistics, Interface Foundation of North America
  • 2010
This work introduces a novel supervised, multivariate bump-hunting strategy for exploring modes or classes of a target function of many continuous variables; the authors report that it outperforms naive PRIM as well as competing nonparametric supervised and unsupervised methods for class discovery.
Mixtures of Rectangles: Interpretable Soft Clustering
This work explores a clustering technique that requires no user-supplied parameters other than the desired number of clusters, and demonstrates its usefulness for subspace clustering on synthetic data and on real-life datasets.
Data Exploration by Representative Region Selection: Axioms and Convergence
A new type of unsupervised learning problem is presented in which one seeks a small set of representative regions that approximates a larger data set, without relying on cluster structure in the data.
Conditional Sparse Linear Regression
This work considers the problem of jointly identifying a significant segment of a population in which there is a highly sparse linear regression fit, together with the coefficients for the linear fit, and gives algorithms for such problems under the sup norm.
SuRF: Identification of Interesting Data Regions with Surrogate Models
The proposed framework, coined SuRF (SUrrogate Region Finder), leverages historical region evaluations to train surrogate models that learn to approximate the distribution of the statistic of interest and makes use of evolutionary multi-modal optimization to effectively and efficiently identify regions of interest regardless of data size and dimensionality.
Comparing Algorithms for Scenario Discovery
This study offers three measures of merit (coverage, density, and interpretability) and uses them to evaluate the capabilities of PRIM, a bump-hunting algorithm, and CART, a classification algorithm. It finds that both algorithms can perform the required task, but often imperfectly.
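Coverage and density can be computed directly once a box is fixed. The sketch below assumes the definitions commonly used in scenario discovery: coverage is the fraction of cases of interest that fall inside the box, and density is the fraction of in-box cases that are of interest. The function name `coverage_density` is hypothetical, and interpretability is not computed here.

```python
import numpy as np


def coverage_density(in_box, interesting):
    """Coverage and density of one box, as the terms are commonly used in
    scenario discovery (illustrative sketch, hypothetical helper).

    in_box: boolean array, True where a case lies inside the box.
    interesting: boolean array, True where a case is of interest
    (for example, an output above some threshold).
    """
    in_box = np.asarray(in_box, dtype=bool)
    interesting = np.asarray(interesting, dtype=bool)
    if not in_box.any() or not interesting.any():
        raise ValueError("need a non-empty box and at least one case of interest")
    hits = np.count_nonzero(in_box & interesting)
    coverage = hits / np.count_nonzero(interesting)  # share of interesting cases captured
    density = hits / np.count_nonzero(in_box)        # share of in-box cases that are interesting
    return float(coverage), float(density)


# Toy example: 4 of 5 cases are of interest; the box holds 2 of them and nothing else.
print(coverage_density([True, True, False, False, False],
                       [True, True, True, True, False]))  # (0.5, 1.0)
```

The two measures pull in opposite directions: shrinking a box tends to raise density while lowering coverage, which is why such studies report them together.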
Subgroup discovery in data sets with multi-dimensional responses
This work develops a technique that combines agglomerative clustering, to find subgroup candidates in the space of output attributes, with predictive modeling, to score and describe these candidates in the input attribute space.
Scenario Discovery via Rule Extraction
This work proposes a new procedure for scenario discovery: it trains an intermediate statistical model that generalizes quickly, uses it to label a large amount of data for PRIM, and shows that this method performs much better than PRIM alone.
PRIM analysis
Bridging kriging believer and expected improvement using bump hunting for expensive black-box optimization
An algorithm incorporating the strengths of the two infill methods is proposed; it achieves competitive performance across a range of problems with diverse characteristics, making it a strong candidate for solving expensive black-box optimization problems (CEOPs).
