We present a new clustering algorithm by proposing a convex relaxation of hierarchical clustering, which results in a family of objective functions with a natural geometric interpretation. We give efficient algorithms for calculating the continuous regularization path of solutions, and discuss relative advantages of the parameters. Our method experimentally…
Many models have been proposed to detect copy number alterations in chromosomal copy number profiles, but it is usually not obvious which is most effective for a given data set. Furthermore, most methods have a smoothing parameter that determines the number of breakpoints and must be chosen using various heuristics. We present three contributions…
BACKGROUND: The recent advent of high-throughput SNP genotyping technologies has opened new avenues of research for population genetics. In particular, a growing interest in the identification of footprints of selection, based on genome scans for adaptive differentiation, has emerged. METHODOLOGY/PRINCIPAL FINDINGS: The purpose of this study is to develop…
In segmentation models, the number of change-points is typically chosen using a penalized cost function. In this work, we propose to learn the penalty and its constants in databases of signals with weak change-point annotations. We propose a convex relaxation for the resulting interval regression problem, and solve it using accelerated proximal gradient…
MOTIVATION: DNA copy number profiles characterize regions of chromosome gains, losses and breakpoints in tumor genomes. Although many models have been proposed to detect these alterations, it is not clear which model is appropriate before visual inspection of the signal, noise and models for a particular profile. RESULTS: We propose SegAnnDB, a Web-based…
Peak detection is a central problem in genomic data analysis, and current algorithms for this task are unsupervised and mostly effective for a single data type and pattern (e.g. H3K4me3 data with a sharp peak pattern). We propose PeakSeg, a new constrained maximum likelihood segmentation model for peak detection with an efficient inference algorithm:…
Many peak detection algorithms have been proposed for ChIP-seq data analysis, but it is not obvious which method and what parameters are optimal for any given data set. In contrast, peaks can easily be located by visual inspection of profile data on a genome browser. We thus propose a supervised machine learning approach to ChIP-seq data analysis, using…
Joint peak detection is a central problem when comparing samples in genomic data analysis, but current algorithms for this task are unsupervised and limited to at most 2 sample types. We propose PeakSegJoint, a new constrained maximum likelihood segmentation model for any number of sample types. To select the number of peaks in the segmentation, we propose…
Clonal heterogeneity in lymphoid malignancies has been recently reported in adult T-cell lymphoma/leukemia, peripheral T-cell lymphoma, not otherwise specified, and mantle cell lymphoma. Our analysis was extended to other types of lymphoma including marginal zone lymphoma, follicular lymphoma and diffuse large B-cell lymphoma. To determine the presence of…