• Corpus ID: 51864351

Selective Clustering Annotated using Modes of Projections

Evan Greene, Greg Finak, Raphael Gottardo
Selective clustering annotated using modes of projections (SCAMP) is a new clustering algorithm for data in $\mathbb{R}^p$. SCAMP is motivated from the point of view of non-parametric mixture modeling. Rather than maximizing a classification likelihood to determine cluster assignments, SCAMP casts clustering as a search and selection problem. One consequence of this problem formulation is that the number of clusters is $\textbf{not}$ a SCAMP tuning parameter. The search phase of SCAMP consists… 
A New Robust Multivariate Mode Estimator for Eye-tracking Calibration
BRIL is a new algorithm for identifying the first mode of multivariate distributions. It relies on recursive depth-based filtering, is tested on artificial mixtures of Gaussian and Uniform distributions, and is compared to existing methods.
A new data-driven cell population discovery and annotation method for single-cell data, FAUST, reveals correlates of clinical response to cancer immunotherapy
It is shown that FAUST’s phenotypic annotations enable cross-study data integration and multivariate analysis in the presence of heterogeneous data and diverse immunophenotyping staining panels, demonstrating FAUST is a powerful method for unbiased discovery in single-cell data.
A targeted multi-omic analysis approach measures protein expression and low abundance transcripts on the single cell level
A novel targeted transcriptomics approach that combines analysis of over 400 genes with simultaneous measurement of over 40 proteins on more than 25,000 cells is described, which requires only about 1/10 of the read depth compared to a whole transcriptome approach while retaining high sensitivity for low abundance transcripts.
New interpretable machine learning method for single-cell data reveals correlates of clinical response to cancer immunotherapy
FAUST is a new interpretable machine learning method for single-cell mass and flow cytometry studies. It robustly performs unbiased cell population discovery and annotation, and it enables hypothesis-driven inference about cell sub-population abundance through a multivariate modeling framework called Phenotypic and Functional Differential Abundance (PFDA).


Assessment and pruning of hierarchical model based clustering
A new clustering method is proposed that can be regarded as a hybrid between model-based and nonparametric clustering: it prunes the cluster tree generated by hierarchical model-based clustering.
A Generalized Single Linkage Method for Estimating the Cluster Tree of a Density
A graph-based method is presented that can approximate the cluster tree of any density estimate. Excess mass is proposed as a measure of the size of a branch, reflecting the height of the corresponding peak of the density above the surrounding valley floor as well as its spatial extent.
OPTICS: ordering points to identify the clustering structure
A new algorithm is introduced for the purpose of cluster analysis which does not produce a clustering of a data set explicitly; but instead creates an augmented ordering of the database representing its density-based clustering structure.
Model-Based Clustering, Discriminant Analysis, and Density Estimation
This work reviews a general methodology for model-based clustering that provides a principled statistical approach to important practical questions that arise in cluster analysis, such as how many clusters are there, which clustering method should be used, and how should outliers be handled.
Consistent Procedures for Cluster Tree Estimation and Pruning
A tree pruning procedure is studied that is guaranteed, under milder conditions than usual, to remove spurious clusters while recovering salient ones, and lower bounds on the sample complexity of cluster tree estimation are derived.
A Nonparametric Statistical Approach to Clustering via Mode Identification
A new clustering approach based on mode identification is developed by applying new optimization techniques to a nonparametric density estimator and tends to combine the strengths of linkage and mixture-model-based clustering.
Model-based Gaussian and non-Gaussian clustering
The classification maximum likelihood approach is sufficiently general to encompass many current clustering algorithms, including those based on the sum of squares criterion and on the criterion of Friedman and Rubin (1967), but it is restricted to Gaussian distributions and it does not allow for noise.
Combining Mixture Components for Clustering
  • J. Baudry, A. Raftery, G. Celeux, Kenneth Lo, R. Gottardo
  • Computer Science
    Journal of computational and graphical statistics : a joint publication of American Statistical Association, Institute of Mathematical Statistics, Interface Foundation of North America
  • 2010
This paper proposes first selecting the total number of Gaussian mixture components, K, using BIC and then combining them hierarchically according to an entropy criterion, which yields a unique soft clustering for each number of clusters less than or equal to K.
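The hierarchical combining step summarized above can be sketched as follows. This is a minimal illustration, not the paper's code. It assumes the posterior membership probabilities `post[i][k]` have already been obtained from a Gaussian mixture whose K was chosen by BIC (that fit is not shown), and it assumes the merge rule is to combine, at each step, the pair of components whose merger most reduces the entropy of the soft clustering. The function names `entropy`, `merge_once`, and `merge_hierarchy` are hypothetical.

```python
import math
from itertools import combinations


def xlogx(t):
    """t * log(t), with the usual convention 0 * log(0) = 0."""
    return t * math.log(t) if t > 0.0 else 0.0


def entropy(post):
    """Entropy of a soft clustering: -sum_i sum_k t_ik * log(t_ik)."""
    return -sum(xlogx(t) for row in post for t in row)


def merge_once(post):
    """Merge the pair of components giving the largest entropy decrease.

    Returns the merged posterior matrix (the combined column is appended
    last) and the pair of column indices that was merged.
    """
    k = len(post[0])
    best_pair, best_drop = None, -math.inf
    for a, b in combinations(range(k), 2):
        # Entropy before the merge minus entropy after it, for this pair.
        drop = sum(
            xlogx(r[a] + r[b]) - xlogx(r[a]) - xlogx(r[b]) for r in post
        )
        if drop > best_drop:
            best_pair, best_drop = (a, b), drop
    a, b = best_pair
    merged = [
        [t for j, t in enumerate(r) if j not in (a, b)] + [r[a] + r[b]]
        for r in post
    ]
    return merged, best_pair


def merge_hierarchy(post):
    """Soft clusterings for K, K-1, ..., 1 clusters, starting from post."""
    levels = [post]
    while len(levels[-1][0]) > 1:
        merged, _ = merge_once(levels[-1])
        levels.append(merged)
    return levels


# Illustrative posteriors (made up): components 0 and 1 overlap heavily.
post = [
    [0.50, 0.50, 0.00],
    [0.60, 0.40, 0.00],
    [0.00, 0.00, 1.00],
    [0.05, 0.05, 0.90],
]
levels = merge_hierarchy(post)
print([len(level[0]) for level in levels])
```

Each entry of `levels` is one soft clustering, so the result contains exactly one clustering for every number of clusters from K down to 1. Choosing among those levels is left to the user in this sketch.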
Data clustering: 50 years beyond K-means
A brief overview of clustering is provided, well-known clustering methods are summarized, the major challenges and key issues in designing clustering algorithms are discussed, and some emerging and useful research directions are pointed out.