Analysis of modern biological data often involves ill-posed problems due to high dimensionality and multicollinearity. Partial least squares (PLS) regression has been used as an alternative to ordinary least squares for handling multicollinearity in several areas of scientific research since the 1960s. At the core of the PLS methodology lies a dimension reduction …
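The dimension reduction behind PLS can be illustrated for the single-response case (PLS1) with the classic NIPALS iteration. Each component is the direction in predictor space with maximal covariance with the response, and the regression is carried out on the resulting low-dimensional scores. This is a textbook variant offered as an assumption-laden sketch, not the method developed in the abstract, and the function name `pls1_fit` is hypothetical.

```python
import numpy as np


def pls1_fit(X, y, n_components):
    """Hypothetical helper: single-response PLS via NIPALS.

    Returns (coef, intercept) so that predictions are X @ coef + intercept.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xa, ya = X - x_mean, y - y_mean  # centred working copies, deflated below

    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xa.T @ ya                 # direction of maximal covariance with y
        w /= np.linalg.norm(w)
        t = Xa @ w                    # latent score: the reduced dimension
        tt = t @ t
        p = Xa.T @ t / tt             # X loadings for this component
        c = ya @ t / tt               # regression of y on the score
        Xa = Xa - np.outer(t, p)      # deflate X and y before the next component
        ya = ya - c * t
        W.append(w)
        P.append(p)
        q.append(c)

    W, P, q = np.column_stack(W), np.column_stack(P), np.array(q)
    coef = W @ np.linalg.solve(P.T @ W, q)  # map score coefficients back to X
    intercept = y_mean - x_mean @ coef
    return coef, intercept
```

With `n_components` equal to the number of predictors (and a full-rank X) the fit reproduces ordinary least squares; using fewer components restricts the regression to a low-dimensional set of scores, which is what makes PLS usable when predictors are highly collinear.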
We develop and apply a previously undescribed framework designed to extract information, in the form of a positive definite kernel matrix, from possibly crude, noisy, incomplete, or inconsistent dissimilarity information between pairs of objects, which can be obtained in a variety of contexts. Any positive definite kernel defines a consistent set of distances, and …
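The statement that a positive definite kernel defines a consistent set of distances can be made concrete: d(i, j) = sqrt(K_ii + K_jj - 2 K_ij) is the Euclidean distance between the implicit feature vectors of objects i and j. For the reverse direction, the sketch swaps in classical multidimensional scaling (double centering of squared dissimilarities, then clipping negative eigenvalues so the result is positive semidefinite). That is a standard, simpler technique chosen for illustration, not the framework proposed in the abstract, and both function names are hypothetical.

```python
import numpy as np


def kernel_to_distances(K):
    """Distances induced by a positive (semi)definite kernel matrix K."""
    K = np.asarray(K, dtype=float)
    diag = np.diag(K)
    sq = diag[:, None] + diag[None, :] - 2.0 * K
    return np.sqrt(np.clip(sq, 0.0, None))  # clip tiny negative round-off


def distances_to_kernel(D):
    """Classical MDS: a PSD kernel from a (possibly noisy) dissimilarity matrix D."""
    D2 = np.asarray(D, dtype=float) ** 2
    n = D2.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    B = -0.5 * J @ D2 @ J
    vals, vecs = np.linalg.eigh((B + B.T) / 2.0)
    vals = np.clip(vals, 0.0, None)          # project onto the PSD cone
    return (vecs * vals) @ vecs.T            # V diag(vals) V^T
```

When the dissimilarities are exact Euclidean distances, the round trip recovers them; with noisy or inconsistent inputs the eigenvalue clipping is a crude repair, which is the kind of gap a more careful estimation framework is meant to address.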
Neurons constitute the most diverse cell types and acquire their identity through the activity of particular genetic programs. The GABAergic nervous system in C. elegans consists of 26 neurons that fall into six classes. Animals that are defective in GABAergic neuron function and development display "shrinker" movement, abnormal foraging and abnormal defecation. …
Cawley et al. (2004) have recently mapped the locations of binding sites for three transcription factors along human chromosomes 21 and 22 using ChIP-chip experiments. ChIP-chip experiments are a new approach to the genome-wide identification of transcription factor binding sites and consist of chromatin (Ch) immunoprecipitation (IP) of transcription …
Likelihood-based cross-validation is a statistical tool for selecting a density estimate based on n i.i.d. observations from the true density among a collection of candidate density estimators. General examples are the selection of a model indexing a maximum likelihood estimator, and the selection of a bandwidth indexing a nonparametric (e.g. kernel) …
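As a concrete instance of the bandwidth example, a Gaussian kernel density estimator can be tuned by leave-one-out likelihood cross-validation: each observation is scored under the estimate built from the other n - 1 points, and the candidate bandwidth with the largest total log-likelihood is selected. This is a minimal sketch assuming one-dimensional data and a Gaussian kernel; the function names are hypothetical, and leave-one-out is only one of several cross-validation splits (V-fold splits are another).

```python
import numpy as np


def loo_log_likelihood(x, h):
    """Leave-one-out log-likelihood of a 1-D Gaussian KDE with bandwidth h."""
    x = np.asarray(x, dtype=float)
    n = x.size
    u = (x[:, None] - x[None, :]) / h
    K = np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)
    np.fill_diagonal(K, 0.0)              # leave observation i out of its own estimate
    f_loo = K.sum(axis=1) / ((n - 1) * h)
    with np.errstate(divide="ignore"):
        # A zero leave-one-out density gives -inf, so that bandwidth is rejected.
        return float(np.sum(np.log(f_loo)))


def select_bandwidth(x, candidates):
    """Return the candidate bandwidth with the largest cross-validated likelihood."""
    scores = [loo_log_likelihood(x, h) for h in candidates]
    return candidates[int(np.argmax(scores))]
```

Very small bandwidths are penalized because held-out points in the tails receive almost no density, while very large bandwidths spread mass too thinly everywhere; the criterion trades these off without reference to the unknown true density.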
A number of computational methods have been proposed for identifying transcription factor binding sites from a set of unaligned sequences that are thought to share the motif in question. Here we introduce an algorithm, called cosmo, that allows this search to be supervised by specifying a set of constraints that the position weight matrix of the unknown …
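Once a position weight matrix is available, a common use is to score every window of a sequence. The sketch shows only that basic step and is not the cosmo algorithm, which estimates the matrix itself under constraints. It assumes column-wise base probabilities built from toy aligned sites with a pseudocount of 0.5 and log-odds scores against a uniform background; the sites, pseudocount, background and function names are all hypothetical.

```python
import numpy as np

ALPHABET = "ACGT"


def build_pwm(sites, pseudocount=0.5):
    """Column-wise base probabilities from equal-length aligned sites."""
    width = len(sites[0])
    counts = np.full((width, 4), pseudocount)
    for s in sites:
        for j, base in enumerate(s):
            counts[j, ALPHABET.index(base)] += 1
    return counts / counts.sum(axis=1, keepdims=True)


def scan(seq, pwm, background=(0.25, 0.25, 0.25, 0.25)):
    """Log-odds (bits) score of every window of seq against the PWM."""
    log_odds = np.log2(pwm / np.asarray(background))
    width = pwm.shape[0]
    idx = [ALPHABET.index(b) for b in seq]
    return np.array([
        sum(log_odds[j, idx[i + j]] for j in range(width))
        for i in range(len(seq) - width + 1)
    ])


# Toy usage: the planted consensus starts at position 4.
pwm = build_pwm(["TGACTCA", "TGACTCA", "TGAGTCA", "TGACTAA"])
scores = scan("CCCCTGACTCACCCC", pwm)
best_start = int(np.argmax(scores))
```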
Chromatin immunoprecipitation followed by DNA microarray analysis (ChIP-chip methodology) is an efficient way of mapping genome-wide protein-DNA interactions. Data from tiling arrays encompass DNA-protein interaction measurements on thousands or millions of short oligonucleotides (probes) tiling a whole chromosome or genome. We propose a new model-based …
Sequential modifications of the RNA polymerase II (Pol II) C-terminal domain (CTD) coordinate the stage-specific association and release of cellular machines during transcription. Here we examine the genome-wide distributions of the 'early' (phospho-Ser5 (Ser5-P)), 'mid' (Ser7-P) and 'late' (Ser2-P) CTD marks. We identify gene class-specific patterns and …
The ChIP-seq technique enables genome-wide mapping of in vivo protein-DNA interactions and chromatin states. Current analytical approaches for ChIP-seq analysis are largely geared towards single-sample investigations, and have limited applicability in comparative settings that aim to identify combinatorial patterns of enrichment across multiple datasets. We …