We present a new clustering algorithm by proposing a convex relaxation of hierarchical clustering, which results in a family of objective functions with a natural geometric interpretation. We give efficient algorithms for calculating the continuous regularization path of solutions, and discuss relative advantages of the parameters. Our method experimentally …
Many models have been proposed to detect copy number alterations in chromosomal copy number profiles, but it is usually not obvious which is most effective for a given data set. Furthermore, most methods have a smoothing parameter that determines the number of breakpoints and must be chosen using various heuristics. We present three contributions …
In segmentation models, the number of change-points is typically chosen using a penalized cost function. In this work, we propose to learn the penalty and its constants in databases of signals with weak change-point annotations. We propose a convex relaxation for the resulting interval regression problem, and solve it using accelerated proximal gradient …
Peak detection is a central problem in genomic data analysis, and current algorithms for this task are unsupervised and mostly effective for a single data type and pattern (e.g. H3K4me3 data with a sharp peak pattern). We propose PeakSeg, a new constrained maximum likelihood segmentation model for peak detection with an efficient inference algorithm: …
Many peak detection algorithms have been proposed for ChIP-seq data analysis, but it is not obvious which method and what parameters are optimal for any given data set. In contrast, peaks can easily be located by visual inspection of profile data on a genome browser. We thus propose a supervised machine learning approach to ChIP-seq data analysis, using …
BACKGROUND: The recent advent of high-throughput SNP genotyping technologies has opened new avenues of research for population genetics. In particular, a growing interest in the identification of footprints of selection, based on genome scans for adaptive differentiation, has emerged. METHODOLOGY/PRINCIPAL FINDINGS: The purpose of this study is to develop …
Joint peak detection is a central problem when comparing samples in genomic data analysis, but current algorithms for this task are unsupervised and limited to at most 2 sample types. We propose PeakSegJoint, a new constrained maximum likelihood segmentation model for any number of sample types. To select the number of peaks in the segmentation, we propose …
In ranking problems, the goal is to learn a ranking function r(x) ∈ ℝ from labeled pairs x, x′ of input points. In this paper, we consider the related comparison problem, where the label y ∈ {−1, 0, 1} indicates which element of the pair is better, or if there is no significant difference. We cast the learning problem as a margin maximization, and show that …