Jacqueline M. Hughes-Oliver

MOTIVATION: New biological systems technologies give scientists the ability to measure thousands of biomolecules, including genes, proteins, lipids, and metabolites. We use domain knowledge, e.g., the Gene Ontology, to guide the analysis of such data. By focusing on domain-aggregated results at, say, the molecular function level, increased interpretability is…
When similar experimental units are assigned randomly to two groups, one to receive a "treatment" and the other to serve as a control, the homogeneity-of-variance assumption underlying the pooled t test is valid under the null hypothesis of no treatment effect. Thus the power, not the validity, of the pooled t test should be the concern in such experiments, especially…
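For reference, the pooled t statistic discussed above combines the two sample variances into a single estimate, which is exactly where the equal-variance assumption enters. The sketch below is the textbook formula, not code from the paper; the function name `pooled_t` is illustrative.

```python
import math
import statistics
from typing import Sequence, Tuple

def pooled_t(treatment: Sequence[float], control: Sequence[float]) -> Tuple[float, int]:
    """Pooled two-sample t statistic and its degrees of freedom."""
    n1, n2 = len(treatment), len(control)
    if n1 < 2 or n2 < 2:
        raise ValueError("each group needs at least two observations")
    # statistics.variance uses the n - 1 denominator (sample variance)
    v1 = statistics.variance(treatment)
    v2 = statistics.variance(control)
    df = n1 + n2 - 2
    # pooled variance: df-weighted average of the two sample variances,
    # justified when both groups share a common variance
    sp2 = ((n1 - 1) * v1 + (n2 - 1) * v2) / df
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))
    t = (statistics.mean(treatment) - statistics.mean(control)) / se
    return t, df

t, df = pooled_t([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
print(f"t = {t:.3f} on {df} df")  # t = -3.674 on 4 df
```

Under randomization and no treatment effect, both groups are draws from the same population, so the common-variance premise behind `sp2` holds by design.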
A common assumption in the modeling of stochastic processes is that of weak stationarity. Although this is a convenient and sometimes justifiable assumption for many applications, there are other applications for which it is clearly inappropriate. One such application occurs when the process is driven by action at a limited number of sites, or point…
Discovery of a new drug involves screening large chemical libraries to identify active compounds. Screening efficiency can be improved by testing compounds in pools. We consider two criteria to design pools: optimal coverage of the chemical space and minimal collision between compounds. Five pooling designs are applied to a public data set. We evaluate each…
Pooling experiments are used as a cost-effective approach for screening chemical compounds as part of the drug discovery process in pharmaceutical companies. When a biologically potent pool is found, the goal is to decode the pool, i.e., to determine which of the individual compounds are potent. We propose augmenting the data on pooled testing with…
Ensemble methods have become popular for QSAR modeling, but most studies have assumed balanced data, consisting of approximately equal numbers of active and inactive compounds. Cheminformatics data are often far from balanced. We extend the application of ensemble methods to include cases of imbalanced class membership and to more adequately assess…
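The abstract does not say how the ensembles are adapted to imbalance. One common approach, shown here only as an illustrative stand-in and not as the authors' method, is balanced bagging: each ensemble member is trained on a bootstrap sample containing equal numbers of actives and inactives. The function name `balanced_bags` and the 1/0 label coding are assumptions; training the base classifiers is not shown.

```python
import random
from typing import List, Sequence

def balanced_bags(labels: Sequence[int], n_bags: int, seed: int = 0) -> List[List[int]]:
    """Row-index sets for n_bags ensemble members, each with equal numbers
    of actives (label 1) and inactives (label 0)."""
    rng = random.Random(seed)
    actives = [i for i, y in enumerate(labels) if y == 1]
    inactives = [i for i, y in enumerate(labels) if y == 0]
    if not actives or not inactives:
        raise ValueError("need at least one compound of each class")
    m = min(len(actives), len(inactives))
    bags = []
    for _ in range(n_bags):
        # draw m of each class with replacement, so the majority class
        # is undersampled down to the size of the minority class
        bag = rng.choices(actives, k=m) + rng.choices(inactives, k=m)
        rng.shuffle(bag)
        bags.append(bag)
    return bags

labels = [1] * 5 + [0] * 95  # 5% actives, a typically imbalanced screen
bags = balanced_bags(labels, n_bags=10)
print(len(bags), [sum(labels[i] for i in bag) for bag in bags[:3]])  # 10 [5, 5, 5]
```

Each bag would then train one base model, and predictions would be averaged or voted; with imbalanced test data, metrics other than raw accuracy are usually needed to assess the result.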
Testing in groups can lead to great efficiencies in total testing cost when searching for individuals with some characteristic. If the presence of a blocking object can cause a group with a positive object to test negative, there is a need to find optimal pooling strategies to minimize the cost of testing and reduce the number of missed positive…
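As background for the cost savings mentioned above, the classical two-stage Dorfman scheme tests a pool of k items once and retests every item only if the pool is positive. The sketch below assumes a perfect assay with independent positives at prevalence p; blocking, the focus of the paper, is deliberately not modeled, so this is a baseline rather than the authors' model. Function names are illustrative.

```python
def expected_tests_per_item(k: int, p: float) -> float:
    """Expected tests per individual under two-stage Dorfman pooling.

    Assumes a perfect assay (no blocking, no dilution) and independent
    positives with prevalence p. Group size k = 1 means individual testing.
    """
    if k == 1:
        return 1.0
    # one pooled test shared by k items, plus k retests when the pool is
    # positive, which happens with probability 1 - (1 - p)**k
    return 1.0 / k + 1.0 - (1.0 - p) ** k

def best_group_size(p: float, max_k: int = 100) -> int:
    """Group size in 1..max_k that minimizes expected tests per individual."""
    return min(range(1, max_k + 1), key=lambda k: expected_tests_per_item(k, p))

k = best_group_size(0.01)
print(k, round(expected_tests_per_item(k, 0.01), 4))  # 11 0.1956
```

At 1% prevalence, pools of 11 need roughly a fifth of the tests that individual testing would. A blocking object would make some positive pools read negative, which lowers the retest count but produces missed positives.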
A new classification method called the Optimal Bit String Tree is proposed to identify quantitative structure-activity relationships (QSARs). The method introduces the concept of a chromosome to describe the presence/absence context of a combination of descriptors. A descriptor set and its optimal chromosome form the splitting variable. A new stochastic…
In testing product reliability, there is often a critical cutoff level that determines whether a specimen is classified as "failed." One consequence is that the number of degradation measurements collected varies from specimen to specimen. Information about this random sample size should be included in the model, and our study shows that it can be influential in…