Toby Hocking

We present a new clustering algorithm by proposing a convex relaxation of hierarchical clustering, which results in a family of objective functions with a natural geometric interpretation. We give efficient algorithms for calculating the continuous regularization path of solutions, and discuss relative advantages of the parameters. Our method experimentally…
Many models have been proposed to detect copy number alterations in chromosomal copy number profiles, but it is usually not obvious which is most effective for a given data set. Furthermore, most methods have a smoothing parameter that determines the number of breakpoints and must be chosen using various heuristics. We present three contributions…
BACKGROUND: The recent advent of high-throughput SNP genotyping technologies has opened new avenues of research for population genetics. In particular, a growing interest in the identification of footprints of selection, based on genome scans for adaptive differentiation, has emerged. METHODOLOGY/PRINCIPAL FINDINGS: The purpose of this study is to develop…
In segmentation models, the number of change-points is typically chosen using a penalized cost function. In this work, we propose to learn the penalty and its constants in databases of signals with weak change-point annotations. We propose a convex relaxation for the resulting interval regression problem, and solve it using accelerated proximal gradient…
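As a rough illustration of the first sentence above, the sketch below chooses the number of segments by minimizing cost(K) + penalty * K. It is a minimal sketch under stated assumptions, not the method of the paper: the squared-error cost, the penalty that is linear in K, and the function names segment_costs and select_segments are all assumptions made for illustration, and the penalty is passed in by hand rather than learned from annotations.

```python
import math


def segment_costs(signal, max_segments):
    """Best squared-error cost for 1..max_segments segments (dynamic programming).

    Assumption: squared error around each segment mean is the cost.
    """
    n = len(signal)
    # Prefix sums of x and x^2 give any segment's cost in O(1).
    s1, s2 = [0.0], [0.0]
    for x in signal:
        s1.append(s1[-1] + x)
        s2.append(s2[-1] + x * x)

    def cost(i, j):
        # Squared error of signal[i:j] around its mean.
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    # best[k][j] = minimum cost of splitting signal[:j] into k segments.
    best = [[math.inf] * (n + 1) for _ in range(max_segments + 1)]
    best[0][0] = 0.0
    for k in range(1, max_segments + 1):
        for j in range(k, n + 1):
            best[k][j] = min(best[k - 1][i] + cost(i, j) for i in range(k - 1, j))
    return [best[k][n] for k in range(1, max_segments + 1)]


def select_segments(costs, penalty):
    """Return the number of segments K that minimizes costs[K-1] + penalty * K."""
    scores = [c + penalty * (k + 1) for k, c in enumerate(costs)]
    return scores.index(min(scores)) + 1


if __name__ == "__main__":
    costs = segment_costs([0, 0, 0, 0, 5, 5, 5, 5], max_segments=3)
    # A small penalty keeps the one real change-point; a large one removes it.
    print(select_segments(costs, penalty=1.0))
    print(select_segments(costs, penalty=100.0))
```

A larger penalty yields fewer segments (and thus fewer change-points), which is why the choice of penalty matters for the selected model.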
PURPOSE: The tumor genomic copy number profile is of prognostic significance in neuroblastoma patients. We have studied the genomic copy number profile of cell-free DNA (cfDNA) and compared this with primary tumor arrayCGH (aCGH) at diagnosis. EXPERIMENTAL DESIGN: In 70 patients, cfDNA genomic copy number profiling was performed using the OncoScan platform…
Many peak detection algorithms have been proposed for ChIP-seq data analysis, but it is not obvious which method and which parameters are optimal for any given data set. In contrast, peaks can easily be located by visual inspection of profile data on a genome browser. We thus propose a supervised machine learning approach to ChIP-seq data analysis, using…
Peak detection is a central problem in genomic data analysis, and current algorithms for this task are unsupervised and mostly effective for a single data type and pattern (e.g. H3K4me3 data with a sharp peak pattern). We propose PeakSeg, a new constrained maximum likelihood segmentation model for peak detection with an efficient inference algorithm: …
MOTIVATION: DNA copy number profiles characterize regions of chromosome gains, losses and breakpoints in tumor genomes. Although many models have been proposed to detect these alterations, it is not clear which model is appropriate before visual inspection of the signal, noise and models for a particular profile. RESULTS: We propose SegAnnDB, a Web-based…