A new classification and regression tool, Random Forest, is introduced and investigated for predicting a compound's quantitative or categorical biological activity based on a quantitative description of the compound's molecular structure. Random Forest is an ensemble of unpruned classification or regression trees created by using bootstrap samples of the …
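The idea named in this abstract, an ensemble of unpruned regression trees grown on bootstrap samples, can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract is truncated, so everything beyond "unpruned trees on bootstrap samples" is an assumption. The function names (`grow_tree`, `grow_forest`, `predict_forest`), the squared-error split criterion, and the `n_try` parameter (the number of randomly chosen descriptors tried at each split, as in the standard Random Forest procedure) are all hypothetical choices made for illustration.

```python
import random


def grow_tree(rows, targets, n_try, rng):
    """Grow an unpruned regression tree. A leaf is a float, a node is a dict."""
    leaf = sum(targets) / len(targets)
    if len(set(targets)) == 1:  # node is pure: nothing left to split
        return leaf
    best = None  # (sum of squared errors, feature index, threshold)
    # Assumption: try a random subset of n_try descriptors at each split.
    for f in rng.sample(range(len(rows[0])), n_try):
        values = sorted(set(row[f] for row in rows))
        for thr in values[:-1]:  # "<= thr" vs "> thr"; both sides are non-empty
            left = [t for row, t in zip(rows, targets) if row[f] <= thr]
            right = [t for row, t in zip(rows, targets) if row[f] > thr]
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = sum((t - lm) ** 2 for t in left) + sum((t - rm) ** 2 for t in right)
            if best is None or sse < best[0]:
                best = (sse, f, thr)
    if best is None:  # every sampled descriptor is constant in this node
        return leaf
    _, f, thr = best
    left = [(r, t) for r, t in zip(rows, targets) if r[f] <= thr]
    right = [(r, t) for r, t in zip(rows, targets) if r[f] > thr]
    return {
        "feature": f,
        "threshold": thr,
        "left": grow_tree([r for r, _ in left], [t for _, t in left], n_try, rng),
        "right": grow_tree([r for r, _ in right], [t for _, t in right], n_try, rng),
    }


def predict_tree(node, row):
    """Walk from the root to a leaf and return the leaf value."""
    while isinstance(node, dict):
        go_left = row[node["feature"]] <= node["threshold"]
        node = node["left"] if go_left else node["right"]
    return node


def grow_forest(rows, targets, n_trees=25, n_try=1, seed=0):
    """Grow each tree on its own bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    n = len(rows)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        forest.append(
            grow_tree([rows[i] for i in idx], [targets[i] for i in idx], n_try, rng)
        )
    return forest


def predict_forest(forest, row):
    """Regression prediction: average the predictions of all trees."""
    return sum(predict_tree(tree, row) for tree in forest) / len(forest)


# Toy data (made up): one descriptor, activity jumps from 0 to 10 at x = 10.
toy_rows = [[float(x)] for x in range(20)]
toy_targets = [0.0 if x < 10 else 10.0 for x in range(20)]
toy_forest = grow_forest(toy_rows, toy_targets)
print(predict_forest(toy_forest, [2.0]), predict_forest(toy_forest, [17.0]))
```

For a categorical activity, the same structure would apply with a majority vote over trees in place of the average.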
BACKGROUND: Computational and microarray-based experimental approaches were used to generate a comprehensive transcript index for the human genome. Oligonucleotide probes designed from approximately 50,000 known and predicted transcript sequences from the human genome were used to survey transcription from a diverse set of 60 tissues and cell lines using …
OBJECTIVE: To study the characteristics of unintentional muscle activity in clinical EEG, and to develop a high-throughput method to reduce it so that drug or biological effects on EEG are revealed more clearly. METHODS: Two clinical EEG datasets are used. Pure muscle signals are extracted from EEG using Independent Component Analysis (ICA) for studying their …
Neural networks were widely used for quantitative structure-activity relationships (QSAR) in the 1990s. Because of various practical issues (e.g., slow on large problems, difficult to train, and prone to overfitting), they were superseded by more robust methods such as support vector machines (SVM) and random forests (RF), which arose in the early 2000s. The …
The International Pharmaco-EEG Society (IPEG) presents guidelines summarising the requirements for the recording and computerised evaluation of pharmaco-sleep data in man. Over the past years, technical and data-processing methods have advanced steadily, thus enhancing data quality and expanding the palette of sleep assessment tools that can be used to …
OBJECTIVE: To evaluate the performance of 2 automated systems, Morpheus and Somnolyzer24X7, with various levels of human review/editing, in scoring polysomnographic (PSG) recordings from a clinical trial using zolpidem in a model of transient insomnia. METHODS: 164 all-night PSG recordings from 82 subjects collected during 2 nights of sleep, one under …
A classification and regression tool, J. H. Friedman's Stochastic Gradient Boosting (SGB), is applied to predicting a compound's quantitative or categorical biological activity based on a quantitative description of the compound's molecular structure. Stochastic Gradient Boosting is a procedure for building a sequence of models, for instance regression …
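The "sequence of models" described in this abstract can be sketched for the squared-error case, where each new model is fitted to the current residuals on a random subsample of the training data. This is a simplified illustration, not the authors' code. The abstract is truncated, so the base learner (a one-split regression stump on a single descriptor), the function names (`fit_stump`, `boost`, `predict`), and the default values of `n_rounds`, `learning_rate`, and `subsample` are all assumptions.

```python
import random


def fit_stump(xs, residuals):
    """Least-squares stump on one descriptor: (threshold, left_value, right_value)."""
    mean_r = sum(residuals) / len(residuals)
    # Fallback when xs is constant: predict the mean residual everywhere.
    best = (float("inf"), float("inf"), mean_r, mean_r)
    for thr in sorted(set(xs))[:-1]:  # "<= thr" vs "> thr"; both sides non-empty
        left = [r for x, r in zip(xs, residuals) if x <= thr]
        right = [r for x, r in zip(xs, residuals) if x > thr]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if sse < best[0]:
            best = (sse, thr, lm, rm)
    return best[1:]


def boost(xs, ys, n_rounds=200, learning_rate=0.1, subsample=0.5, seed=0):
    """Build a sequence of stumps, each fitted to residuals on a random subsample."""
    rng = random.Random(seed)
    n = len(xs)
    f0 = sum(ys) / n  # initial model: the mean response
    preds = [f0] * n
    stumps = []
    k = min(n, max(1, int(subsample * n)))
    for _ in range(n_rounds):
        idx = rng.sample(range(n), k)  # subsample drawn without replacement
        # For squared-error loss the negative gradient is the residual y - F(x).
        stump = fit_stump([xs[i] for i in idx], [ys[i] - preds[i] for i in idx])
        stumps.append(stump)
        thr, left_value, right_value = stump
        for i in range(n):  # shrink each step by the learning rate
            preds[i] += learning_rate * (left_value if xs[i] <= thr else right_value)
    return f0, learning_rate, stumps


def predict(model, x):
    """Sum the initial value and the shrunken contributions of all stumps."""
    f0, learning_rate, stumps = model
    return f0 + learning_rate * sum(lv if x <= thr else rv for thr, lv, rv in stumps)


# Toy data (made up): one descriptor, activity jumps from 0 to 10 at x = 10.
toy_xs = [float(x) for x in range(20)]
toy_ys = [0.0 if x < 10 else 10.0 for x in toy_xs]
toy_model = boost(toy_xs, toy_ys)
print(predict(toy_model, 2.0), predict(toy_model, 17.0))
```

Setting `subsample=1.0` removes the stochastic element and gives plain gradient boosting for squared error.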
High-density oligonucleotide arrays allow researchers to measure mRNA transcript abundance for thousands of genes on a single array. The large number of genes, multiple sources of variation, and typically small number of experimental units (EUs) combine to make analysis of data from these arrays challenging. We describe our experience in applying data …
Outlying samples are sought in a very high-dimensional data set, a library of mass spectra. Such samples are considered novel from the chemical structure point of view and are identified for further investigation of their potential biological activity. The support vector machine algorithm for domain description (Tax & Duin 1999; Schölkopf et al. 2000, 2001) …