David I. Hastie

Statistical problems where ‘the number of things you don’t know is one of the things you don’t know’ are ubiquitous in statistical modelling. They arise both in traditional modelling situations such as variable selection in regression, and in more novel methodologies such as object recognition, signal processing, and Bayesian nonparametrics. All such …
Since its introduction by Green (1995), reversible jump MCMC has been recognised as a powerful tool for making posterior inference about a wide range of statistical problems. Despite enjoying considerable application across a variety of disciplines, the method’s popularity has been tempered by the common perception that reversible jump samplers can be …
Genome-wide association studies (GWAS) yielded significant advances in defining the genetic architecture of complex traits and disease. Still, a major hurdle of GWAS is narrowing down multiple genetic associations to a few causal variants for functional studies. This becomes critical in multi-phenotype GWAS where detection and interpretability of complex …
Recently, concerns have centered on how to expand knowledge on the limited science related to the cumulative impact of multiple air pollution exposures and the potential vulnerability of poor communities to their toxic effects. The highly intercorrelated nature of exposures makes application of standard regression-based methods to these questions …
PReMiuM is a recently developed R package for Bayesian clustering using a Dirichlet process mixture model. This model is an alternative to regression models, non-parametrically linking a response vector to covariate data through cluster membership (Molitor, Papathomas, Jerrett, and Richardson 2010). The package allows binary, categorical, count and …
Summary: ESS++ is a C++ implementation of a fully Bayesian variable selection approach for single and multiple response linear regression. ESS++ works well both when the number of observations is larger than the number of predictors and in the 'large p, small n' case. In the current version, ESS++ can handle several hundred observations, thousands of …
We consider the question of Markov chain Monte Carlo sampling from a general stick-breaking Dirichlet process mixture model, with concentration parameter α. This paper introduces a Gibbs sampling algorithm that combines the slice sampling approach of Walker (Communications in Statistics - Simulation and Computation 36:45-54, 2007) and the …
Tumour multiplicity is a frequently measured phenotype in animal studies of cancer biology. Poisson variation of this measurement represents a biological and statistical reference point that is usually violated, even in highly controlled experiments, owing to sources of variation in the stochastic process of tumour formation. A recent experiment on murine …
We construct data exploration tools for recognizing important covariate patterns associated with a phenotype, with particular focus on searching for association with gene-gene patterns. To this end, we propose a new variable selection procedure that employs latent selection weights and compare it to an alternative formulation. The selection procedures are …