Corpus ID: 51864351

Selective Clustering Annotated using Modes of Projections

@article{Greene2018SelectiveCA,
  title={Selective Clustering Annotated using Modes of Projections},
  author={Evan Greene and Greg Finak and Raphael Gottardo},
  journal={ArXiv},
  year={2018},
  volume={abs/1807.10328}
}
Selective clustering annotated using modes of projections (SCAMP) is a new clustering algorithm for data in $\mathbb{R}^p$. SCAMP is motivated from the point of view of non-parametric mixture modeling. Rather than maximizing a classification likelihood to determine cluster assignments, SCAMP casts clustering as a search and selection problem. One consequence of this problem formulation is that the number of clusters is $\textbf{not}$ a SCAMP tuning parameter. The search phase of SCAMP consists… 

A New Robust Multivariate Mode Estimator for Eye-tracking Calibration

TLDR
BRIL, a new algorithm that identifies the first mode of multivariate distributions through recursive depth-based filtering, is tested on artificial mixtures of Gaussian and Uniform distributions and compared to existing methods.

A new data-driven cell population discovery and annotation method for single-cell data, FAUST, reveals correlates of clinical response to cancer immunotherapy

TLDR
It is shown that FAUST’s phenotypic annotations enable cross-study data integration and multivariate analysis in the presence of heterogeneous data and diverse immunophenotyping staining panels, demonstrating FAUST is a powerful method for unbiased discovery in single-cell data.

New interpretable machine learning method for single-cell data reveals correlates of clinical response to cancer immunotherapy

TLDR
A new interpretable machine learning method for single-cell mass and flow cytometry studies, FAUST, that robustly performs unbiased cell population discovery and annotation and enables hypothesis-driven inference about cell sub-population abundance through a multivariate modeling framework called Phenotypic and Functional Differential Abundance (PFDA).

A Targeted Multi-omic Analysis Approach Measures Protein Expression and Low-Abundance Transcripts on the Single-Cell Level

TLDR
A novel targeted transcriptomics approach that combines analysis of over 400 genes with simultaneous measurement of over 40 proteins on more than 25,000 cells is described, which requires only about 1/10 of the read depth compared to a whole transcriptome approach while retaining high sensitivity for low abundance transcripts.

References

A Generalized Single Linkage Method for Estimating the Cluster Tree of a Density

TLDR
A graph-based method is presented that can approximate the cluster tree of any density estimate, and excess mass is proposed as a measure for the size of a branch, reflecting the height of the corresponding peak of the density above the surrounding valley floor as well as its spatial extent.

OPTICS: ordering points to identify the clustering structure

TLDR
A new algorithm is introduced for the purpose of cluster analysis which does not produce a clustering of a data set explicitly; but instead creates an augmented ordering of the database representing its density-based clustering structure.

Model-Based Clustering, Discriminant Analysis, and Density Estimation

TLDR
This work reviews a general methodology for model-based clustering that provides a principled statistical approach to important practical questions that arise in cluster analysis, such as how many clusters are there, which clustering method should be used, and how should outliers be handled.

Consistent Procedures for Cluster Tree Estimation and Pruning

TLDR
A tree pruning procedure is studied that guarantees, under milder conditions than usual, to remove clusters that are spurious while recovering those that are salient, and lower bounds on the sample complexity of cluster tree estimation are derived.

A Nonparametric Statistical Approach to Clustering via Mode Identification

TLDR
A new clustering approach based on mode identification is developed by applying new optimization techniques to a nonparametric density estimator and tends to combine the strengths of linkage and mixture-model-based clustering.

Model-based Gaussian and non-Gaussian clustering

TLDR
The classification maximum likelihood approach is sufficiently general to encompass many current clustering algorithms, including those based on the sum of squares criterion and on the criterion of Friedman and Rubin (1967), but it is restricted to Gaussian distributions and it does not allow for noise.

Data clustering: 50 years beyond K-means

TLDR
A brief overview of clustering is provided, well known clustering methods are summarized, the major challenges and key issues in designing clustering algorithms are discussed, and some of the emerging and useful research directions are pointed out.

Segmentation as Maximum-Weight Independent Set

TLDR
Empirical evaluation on the benchmark Berkeley segmentation dataset shows that the new MWIS algorithm eliminates the need for hand-picking optimal input parameters of the state-of-the-art segmenters, and outperforms their best, manually optimized results.

A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise

TLDR
DBSCAN is presented, a new clustering algorithm that relies on a density-based notion of clusters and is designed to discover clusters of arbitrary shape; it requires only one input parameter and supports the user in determining an appropriate value for it.