Matthew P. Wand

The most important parameter of a histogram is the bin width, since it controls the trade-off between presenting a picture with too much detail ("undersmoothing") or too little detail ("oversmoothing") with respect to the true distribution. Despite this importance, there has been surprisingly little research into estimation of the "optimal" bin width. …
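The sketch below makes the bin-width trade-off concrete. It picks the bin width with Scott's normal-reference rule h = 3.49 s n^(-1/3), which is a simple baseline that assumes roughly Gaussian data. It is not the data-based selector this work studies. The function names and the synthetic data are hypothetical.

```python
import math
import statistics

def normal_reference_bin_width(data):
    """Bin width h = 3.49 * s * n**(-1/3) (normal-reference rule).

    Derived for Gaussian data; a simple baseline, not a plug-in selector.
    """
    n = len(data)
    s = statistics.stdev(data)  # sample standard deviation
    return 3.49 * s * n ** (-1.0 / 3.0)

def histogram_counts(data, h, origin=None):
    """Counts in the bins [origin + k*h, origin + (k + 1)*h), k = 0, 1, ...

    The origin defaults to min(data) so every observation gets a bin index >= 0.
    """
    if origin is None:
        origin = min(data)
    nbins = math.floor((max(data) - origin) / h) + 1
    counts = [0] * nbins
    for x in data:
        counts[math.floor((x - origin) / h)] += 1
    return counts

# Shrinking h gives more, emptier bins (undersmoothing);
# growing it merges structure away (oversmoothing).
data = [math.sin(i) * 3 + i % 7 for i in range(200)]
h = normal_reference_bin_width(data)
for width in (h / 4, h, 4 * h):
    print(round(width, 3), len(histogram_counts(data, width)))
```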
Generalized linear models (Wedderburn and Nelder 1972; McCullagh and Nelder 1988) were introduced as a means of extending the techniques of ordinary parametric regression to several commonly used regression models arising from non-normal likelihoods. Typically these models have a variance that depends on the mean function. However, in many cases the …
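This minimal sketch shows how the mean-dependent variance enters GLM fitting. It assumes a Poisson response with a log link. Iteratively reweighted least squares (IRLS) builds its weights from the variance function, which here is V(mu) = mu. The helper `poisson_irls` and the data are hypothetical. The sketch is the standard GLM fit and not the extension that the truncated abstract goes on to describe.

```python
import math

def poisson_irls(x, y, iters=25):
    """Fit log E[y] = b0 + b1*x by iteratively reweighted least squares.

    Poisson family: V(mu) = mu. With the log link, d(mu)/d(eta) = mu,
    so the IRLS weight (d mu/d eta)^2 / V(mu) simplifies to mu.
    """
    eta = [math.log(yi + 0.1) for yi in y]  # start near the data; +0.1 avoids log(0)
    b0 = b1 = 0.0
    for _ in range(iters):
        mu = [math.exp(e) for e in eta]
        # Working response z = eta + (y - mu) * d(eta)/d(mu).
        z = [e + (yi - m) / m for e, yi, m in zip(eta, y, mu)]
        w = mu
        # Weighted normal equations for the two coefficients.
        s_w = sum(w)
        s_wx = sum(wi * xi for wi, xi in zip(w, x))
        s_wxx = sum(wi * xi * xi for wi, xi in zip(w, x))
        s_wz = sum(wi * zi for wi, zi in zip(w, z))
        s_wxz = sum(wi * xi * zi for wi, xi, zi in zip(w, x, z))
        det = s_w * s_wxx - s_wx ** 2
        b0 = (s_wxx * s_wz - s_wx * s_wxz) / det
        b1 = (s_w * s_wxz - s_wx * s_wz) / det
        eta = [b0 + b1 * xi for xi in x]
    return b0, b1

x = [0, 1, 2, 3, 4, 5]
y = [1, 0, 2, 3, 5, 8]
print(poisson_irls(x, y))
```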
Fully simplified expressions for Multivariate Normal updates in non-conjugate variational message passing approximate inference schemes are obtained. The simplicity of these expressions means that the updates can be achieved very efficiently. Since the Multivariate Normal family is the most common for approximating the joint posterior density function of a …
Multivariate kernel density estimation provides information about structure in data. Feature significance is a technique for deciding whether features – such as local extrema – are statistically significant. This paper proposes a framework for feature significance in d-dimensional data which combines kernel density derivative estimators and hypothesis tests …
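This rough sketch illustrates a kernel density derivative estimator. It differentiates a one-dimensional Gaussian-kernel estimate and flags the grid cells where the estimated slope turns from positive to negative, which are candidate modes. It assumes d = 1, a fixed bandwidth h and a grid scan. It performs no hypothesis test, so it only locates features and says nothing about their significance. All names and data are hypothetical.

```python
import math

def kde_derivative(data, h):
    """Return x -> f_hat'(x) for a Gaussian-kernel density estimate.

    f_hat'(x) = (1 / (n h^2)) * sum_i K'((x - X_i) / h), with K'(u) = -u * phi(u).
    """
    n = len(data)
    norm = 1.0 / (n * h * h * math.sqrt(2.0 * math.pi))

    def df(x):
        total = 0.0
        for xi in data:
            u = (x - xi) / h
            total += -u * math.exp(-0.5 * u * u)
        return norm * total

    return df

def candidate_modes(df, grid):
    """Midpoints of grid cells where the derivative estimate goes from + to -."""
    modes = []
    for a, b in zip(grid, grid[1:]):
        if df(a) > 0 and df(b) <= 0:
            modes.append(0.5 * (a + b))
    return modes

data = [-2.1, -2.0, -1.9, 1.9, 2.0, 2.1]
grid = [-4 + 0.01 * k for k in range(801)]
print(candidate_modes(kde_derivative(data, 0.3), grid))
```

With h = 3.0, the same data yield a single candidate mode. Which features appear therefore depends on the bandwidth.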
Often, the functional form of covariate effects in an additive model varies across groups defined by levels of a categorical variable. This structure represents a factor-by-curve interaction. This article presents penalized spline models that incorporate factor-by-curve interactions into additive models. A mixed model formulation for penalized splines …
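In a factor-by-curve interaction, each level g of the categorical variable has its own smooth curve f_g(x). The sketch below shows the penalized-spline mechanics in plain Python. It assumes a truncated-line spline basis, a fixed penalty `lam` and a separate fit for each level. Only the knot coefficients are penalized. In a mixed-model formulation, those coefficients play the role of random effects, and the penalty would typically be estimated rather than fixed. The function names, knots and data are hypothetical.

```python
def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (M[r][n] - sum(M[r][k] * z[k] for k in range(r + 1, n))) / M[r][r]
    return z

def pspline_fit(x, y, knots, lam):
    """Penalized spline with basis 1, x, (x - kappa_k)_+ for each knot kappa_k.

    Minimizes ||y - X c||^2 + lam * (sum of squared knot coefficients);
    the intercept and slope are left unpenalized.
    """
    def basis(t):
        return [1.0, t] + [max(t - k, 0.0) for k in knots]

    X = [basis(t) for t in x]
    p = 2 + len(knots)
    # (X^T X + lam * D) c = X^T y, with D = diag(0, 0, 1, ..., 1).
    A = [[sum(row[i] * row[j] for row in X) + (lam if i == j >= 2 else 0.0)
          for j in range(p)] for i in range(p)]
    b = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    coef = solve(A, b)
    return lambda t: sum(c * v for c, v in zip(coef, basis(t)))

def factor_by_curve_fit(x, y, group, knots, lam):
    """Fit a separate penalized-spline curve for each level of `group`."""
    fits = {}
    for g in sorted(set(group)):
        idx = [i for i, gi in enumerate(group) if gi == g]
        fits[g] = pspline_fit([x[i] for i in idx], [y[i] for i in idx], knots, lam)
    return fits

xs = [i / 19 for i in range(20)]
x = xs + xs
y = [1 + 2 * t for t in xs] + [3 - t for t in xs]
group = ["a"] * 20 + ["b"] * 20
fits = factor_by_curve_fit(x, y, group, knots=[0.25, 0.5, 0.75], lam=1.0)
print(fits["a"](0.5), fits["b"](0.5))
```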
In data-analytic applications of density estimation, one is usually interested in estimating the density over its whole support. However, common estimators such as the basic kernel estimator use a single smoothing parameter over the whole of the support. While this will be adequate for some densities, there will be other densities that will be very difficult to …
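The basic kernel estimator with a single smoothing parameter is f_hat(x) = (1/(n h)) * sum_i K((x - X_i)/h). The minimal sketch below uses a Gaussian kernel K. The data are made up to combine a tight cluster with a long right tail, which is the kind of shape where one bandwidth struggles. The function name is hypothetical.

```python
import math

def fixed_bandwidth_kde(data, h):
    """Return x -> f_hat(x) = (1 / (n h)) * sum_i phi((x - X_i) / h).

    A single bandwidth h is used at every point of the support.
    """
    n = len(data)
    norm = 1.0 / (n * h * math.sqrt(2.0 * math.pi))

    def f(x):
        return norm * sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in data)

    return f

# A tight cluster plus a long right tail (illustrative values only).
data = [0.0, 0.05, 0.1, 0.12, 0.15, 0.2, 3.0, 6.5, 11.0]
for h in (0.05, 1.0):
    f = fixed_bandwidth_kde(data, h)
    print(h, round(f(0.1), 3), round(f(6.5), 3))
```

With h = 0.05, the estimate resolves the cluster but turns each tail observation into its own spike. With h = 1.0, the tail is smoothed but the cluster is flattened. Neither value suits the whole support.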
Maps depicting cancer incidence rates have become useful tools in public health research, giving valuable information about the spatial variation in rates of disease. Typically, these maps are generated using count data aggregated over areas such as counties or census blocks. However, with the proliferation of geographic information systems and related …
A method for fitting regression models to data that exhibit spatial correlation and heteroskedasticity is proposed. It is well known that ignoring a nonconstant variance does not bias least-squares estimates of regression parameters; thus, data analysts are easily led to the false belief that moderate heteroskedasticity can generally be ignored. …
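The small simulation below illustrates the heteroskedasticity point. It assumes independent errors whose standard deviation grows linearly in x and ignores the spatial correlation mentioned in the abstract. Ordinary least squares stays unbiased for the slope. Inverse-variance weighted least squares, using the known variances, is noticeably less variable. The helper names, the variance model and the settings are hypothetical.

```python
import random
import statistics

def wls_slope(x, y, w):
    """Weighted least-squares slope for y = a + b*x with weights w."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    num = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    den = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    return num / den

def simulate(reps=2000, n=50, seed=1):
    """Repeatedly fit y = 1 + 2x + e, with sd(e) = 0.2 + x (assumed model)."""
    rng = random.Random(seed)
    x = [(i + 1) / n for i in range(n)]
    sd = [0.2 + xi for xi in x]
    ols_w = [1.0] * n                    # equal weights give ordinary least squares
    wls_w = [1.0 / s ** 2 for s in sd]   # inverse-variance weights
    ols, wls = [], []
    for _ in range(reps):
        y = [1.0 + 2.0 * xi + rng.gauss(0.0, s) for xi, s in zip(x, sd)]
        ols.append(wls_slope(x, y, ols_w))
        wls.append(wls_slope(x, y, wls_w))
    return ols, wls

ols, wls = simulate()
print("OLS mean, variance:", statistics.mean(ols), statistics.variance(ols))
print("WLS mean, variance:", statistics.mean(wls), statistics.variance(wls))
```

Both slope estimators average close to the true value 2. The variance gap is the cost of ignoring the nonconstant variance.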
Background: High-throughput flow cytometry experiments produce hundreds of large multivariate samples of cellular characteristics. These samples require specialized processing to obtain clinically meaningful measurements. A major component of this processing is a form of cell subsetting known as gating. Manual gating is time-consuming and subjective. Good …