Topology and data

  • G. Carlsson
  • Published 29 January 2009
  • Computer Science
  • Bulletin of the American Mathematical Society
An important feature of modern science and engineering is that data of various kinds is being produced at an unprecedented rate. This is so in part because of new experimental methods, and in part because of the increase in the availability of high powered computing technology. It is also clear that the nature of the data we are obtaining is significantly different. For example, it is now often the case that we are given data in the form of very long vectors, where all but a few of the… 

Data Analysis using Computational Topology and Geometric Statistics

The purpose of this workshop is to bring together these two research directions and explore their overlap, particularly in the service of statistical data analysis.

The Shape of Things: Topological Data Analysis

Much daunting mathematics lies behind the methods of TDA, but it is possible to gain an idea and understanding of the approach and its potential usefulness even without a deep dive into the intricacies of topology, homology classes, and the like.

Applied Computational Topology for Point Clouds and Sparse Timeseries Data

This work provides a practical, scalable tool for identifying coherent sets from a sparse set of particle trajectories using eigenanalysis. It extends existing tools in topological data analysis and provides a theoretical framework for studying the topological features of a point cloud over a range of resolutions, enabling the analysis of those features with statistical methods.

On Topological Data Mining

  • Andreas Holzinger
  • Computer Science
    Interactive Knowledge Discovery and Data Mining in Biomedical Informatics
  • 2014
Knowing the intrinsic dimensionality of data can be seen as one first step towards understanding its structure, and applying topological techniques to data mining and knowledge discovery is a hot and promising future research area.

Efficient Approximation of Multiparameter Persistence Modules

This article presents the first approximation scheme for computing and decomposing general multiparameter persistence modules; it is based on fibered barcodes and exact matchings, two constructions that stem from the theory of single-parameter persistence.

Clustering by the local intrinsic dimension: the hidden structure of real-world data

The results show that a simple topological feature, the local ID, is sufficient to uncover a rich structure in high-dimensional data landscapes, and many real-world data sets contain regions with widely heterogeneous dimensions.

Data segmentation based on the local intrinsic dimension

This work develops a robust approach to discriminate regions with different local IDs and segment the points accordingly, finding that many real-world data sets contain regions with widely heterogeneous dimensions.
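The two entries above both rest on estimating a local intrinsic dimension from nearest-neighbor distances. As a minimal illustrative sketch (not the authors' exact method), the two-nearest-neighbor maximum-likelihood estimator infers dimension from the ratio of each point's second- and first-neighbor distances:

```python
import numpy as np

def twonn_id(points):
    """Estimate intrinsic dimension from the ratio mu = r2/r1 of each
    point's two nearest-neighbor distances (maximum-likelihood form)."""
    X = np.asarray(points, dtype=float)
    # pairwise distances; inf on the diagonal so a point is not its own neighbor
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    d.sort(axis=1)
    mu = d[:, 1] / d[:, 0]          # ratio of 2nd to 1st neighbor distance
    return len(X) / np.sum(np.log(mu))

rng = np.random.default_rng(0)
# 500 points on a flat 2-D sheet embedded in 10-D: the estimate should be near 2
sheet = np.zeros((500, 10))
sheet[:, :2] = rng.uniform(size=(500, 2))
est = twonn_id(sheet)
```

Because the estimator uses only the two closest neighbors, it stays local, which is what makes per-region (rather than global) dimension estimates possible.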

Topology, Big Data and Optimization

The idea is to extract robust topological features from data and use these summaries for modeling the data, and the coordinate-free nature of topology generates algorithms and viewpoints well suited to highly complex datasets.

Fast Computation of Persistent Homology with Data Reduction and Data Partitioning

A combination of data reduction and data partitioning to compute persistent homology on big data that enables the identification of both large and small topological features from the input data set and reduces the approximation errors that typically accompany data reduction.

A Sober Look at Clustering Stability

It is concluded that stability is not a well-suited tool for determining the number of clusters: stability is governed by the symmetries of the data, which may be unrelated to the clustering parameters.

Topological estimation using witness complexes

This paper tackles the problem of computing topological invariants of geometric objects in a robust manner, using only point cloud data sampled from the object, and produces a nested family of simplicial complexes, which represent the data at different feature scales, suitable for calculating persistent homology.
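The core idea of a witness complex is that a small set of landmark points carries the simplices, while the full point cloud supplies "witnesses" for them. A minimal sketch of the 1-skeleton in the lazy-witness spirit (with relaxation parameter set to zero; the function name and example are illustrative, not from the paper):

```python
import numpy as np

def witness_edges(points, landmarks):
    """1-skeleton of a lazy witness complex with R = 0: connect two
    landmarks whenever some data point has them as its two nearest
    landmarks (that point is the edge's witness)."""
    P = np.asarray(points, float)
    L = np.asarray(landmarks, float)
    d = np.linalg.norm(P[:, None, :] - L[None, :, :], axis=-1)
    nearest_two = np.argsort(d, axis=1)[:, :2]
    edges = {tuple(sorted(int(i) for i in pair)) for pair in nearest_two}
    return sorted(edges)

# points on a circle; 4 landmarks at the compass points
theta = np.linspace(0, 2 * np.pi, 200, endpoint=False)
pts = np.c_[np.cos(theta), np.sin(theta)]
lms = np.array([[1, 0], [0, 1], [-1, 0], [0, -1]])
edges = witness_edges(pts, lms)  # the four edges trace out the circle
```

Varying the relaxation parameter over a range of scales is what produces the nested family of complexes suitable for persistent homology.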

Diffusion maps and coarse-graining: a unified framework for dimensionality reduction, graph partitioning, and data set parameterization

  • S. Lafon, Ann B. Lee
  • Computer Science
    IEEE Transactions on Pattern Analysis and Machine Intelligence
  • 2006
It is shown that the quantization distortion in diffusion space bounds the error of compression of the operator, thus giving a rigorous justification for k-means clustering in diffusion space and a precise measure of the performance of general clustering algorithms.
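The diffusion-map construction itself is short: build a Gaussian kernel, row-normalize it into a Markov transition matrix, and embed using its leading nontrivial eigenvectors. A minimal sketch (function name and parameters are illustrative):

```python
import numpy as np

def diffusion_map(X, eps, n_coords=2):
    """Diffusion-map embedding: Gaussian kernel, row-normalized to a
    Markov matrix, embedded via the leading nontrivial eigenvectors."""
    X = np.asarray(X, float)
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    K = np.exp(-d2 / eps)
    P = K / K.sum(axis=1, keepdims=True)      # row-stochastic transition matrix
    vals, vecs = np.linalg.eig(P)
    order = np.argsort(-vals.real)
    # skip the trivial eigenvalue 1 (constant eigenvector); scale by eigenvalues
    idx = order[1:1 + n_coords]
    return vecs.real[:, idx] * vals.real[idx]

rng = np.random.default_rng(1)
t = rng.uniform(0, 2 * np.pi, 300)
# noisy circle: the first two diffusion coordinates recover the cycle
circle = np.c_[np.cos(t), np.sin(t)] + 0.05 * rng.normal(size=(300, 2))
emb = diffusion_map(circle, eps=0.5)
```

The paper's point is that coarse-graining this Markov matrix (e.g. by k-means in the embedded space) compresses the diffusion operator with controlled error.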

The Nonlinear Statistics of High-Contrast Patches in Natural Images

This study explores the space of data points representing the values of 3 × 3 high-contrast patches from optical and 3D range images and finds that the distribution of data is extremely “sparse” with the majority of the data points concentrated in clusters and non-linear low-dimensional manifolds.
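The preprocessing behind that study is easy to sketch: extract all 3 × 3 patches, keep a high-contrast fraction, and normalize each patch so the data live on a sphere. The following is a simplified illustration (using the Euclidean norm for contrast rather than the paper's D-norm; names are illustrative):

```python
import numpy as np

def high_contrast_patches(image, top_frac=0.2):
    """Extract all 3x3 patches, keep the highest-contrast fraction, and
    normalize each to zero mean and unit norm (simplified preprocessing)."""
    img = np.asarray(image, float)
    h, w = img.shape
    patches = np.array([img[i:i + 3, j:j + 3].ravel()
                        for i in range(h - 2) for j in range(w - 2)])
    patches = patches - patches.mean(axis=1, keepdims=True)  # remove mean intensity
    contrast = np.linalg.norm(patches, axis=1)
    keep = contrast >= np.quantile(contrast, 1 - top_frac)
    return patches[keep] / contrast[keep, None]

rng = np.random.default_rng(3)
patches = high_contrast_patches(rng.normal(size=(32, 32)))
# each row is a point on the unit sphere in R^9
```

The study's finding is about the shape of this point cloud: most of it concentrates near clusters and low-dimensional nonlinear manifolds rather than filling the sphere.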

What is a statistical model?

This paper addresses two closely related questions, What is a statistical model? and What is a parameter? The notions that a model must make sense, and that a parameter must have a well-defined…

A global geometric framework for nonlinear dimensionality reduction.

An approach to solving dimensionality reduction problems that uses easily measured local metric information to learn the underlying global geometry of a data set; it efficiently computes a globally optimal solution and is guaranteed to converge asymptotically to the true structure.
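The pipeline described above (Isomap) has three steps: build a k-nearest-neighbor graph from local distances, estimate global geodesic distances by shortest paths, and embed with classical MDS. A compact numpy-only sketch (Floyd–Warshall is used for clarity; it assumes the neighborhood graph is connected):

```python
import numpy as np

def isomap(X, k=8, n_coords=2):
    """Isomap sketch: k-NN graph -> graph shortest paths (Floyd-Warshall)
    -> classical MDS on the geodesic distance matrix."""
    X = np.asarray(X, float)
    n = len(X)
    d = np.linalg.norm(X[:, None] - X[None, :], axis=-1)
    # keep only each point's k nearest neighbors; symmetrize the graph
    G = np.full((n, n), np.inf)
    nn = np.argsort(d, axis=1)[:, 1:k + 1]
    rows = np.repeat(np.arange(n), k)
    G[rows, nn.ravel()] = d[rows, nn.ravel()]
    G = np.minimum(G, G.T)
    np.fill_diagonal(G, 0.0)
    for m in range(n):                      # all-pairs shortest paths
        G = np.minimum(G, G[:, m, None] + G[None, m, :])
    # classical MDS: double-center the squared geodesic distances
    J = np.eye(n) - 1.0 / n
    B = -0.5 * J @ (G ** 2) @ J
    vals, vecs = np.linalg.eigh(B)
    top = np.argsort(-vals)[:n_coords]
    return vecs[:, top] * np.sqrt(np.abs(vals[top]))

# a 1-D spiral arc embedded in 2-D: one coordinate should recover arclength
t = np.linspace(0, 3 * np.pi, 120)
arc = np.c_[t * np.cos(t), t * np.sin(t)]
Y = isomap(arc, k=4, n_coords=1)
```

The "local metric information" is only the k-NN distances; the shortest-path step is what turns those local measurements into a global geometry.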

Finding the Homology of Submanifolds with High Confidence from Random Samples

This work considers the case where data are drawn from sampling a probability distribution that has support on or near a submanifold of Euclidean space and shows how to “learn” the homology of the sub manifold with high confidence.

Multivariate analysis by data depth: descriptive statistics, graphics and inference, (with discussion and a rejoinder by Liu and Singh)

A data depth can be used to measure the “depth” or “outlyingness” of a given multivariate sample with respect to its underlying distribution. This leads to a natural center-outward ordering of…
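One of the simplest depth functions of this kind is the Mahalanobis depth, which orders points by their Mahalanobis distance to the sample mean. A minimal sketch (the function name is illustrative; the paper surveys several depth notions, of which this is only one):

```python
import numpy as np

def mahalanobis_depth(x, sample):
    """Mahalanobis depth: 1 / (1 + squared Mahalanobis distance to the
    sample mean); deeper points sit nearer the center of the cloud."""
    S = np.asarray(sample, float)
    mu = S.mean(axis=0)
    cov_inv = np.linalg.inv(np.cov(S, rowvar=False))
    diff = np.asarray(x, float) - mu
    return 1.0 / (1.0 + diff @ cov_inv @ diff)

rng = np.random.default_rng(2)
cloud = rng.normal(size=(400, 2))
center = mahalanobis_depth(cloud.mean(axis=0), cloud)   # depth 1 at the mean
outlier = mahalanobis_depth([5.0, 5.0], cloud)          # far point: small depth
```

Sorting a sample by such a depth gives exactly the center-outward ordering the abstract refers to, which underpins depth-based descriptive statistics and graphics.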

Uncovering the overlapping community structure of complex networks in nature and society

After defining a set of new characteristic quantities for the statistics of communities, this work applies an efficient technique for exploring overlapping communities on a large scale, finding that overlaps are significant and that the distributions introduced reveal universal features of networks.

The Elements of Statistical Learning

Chapter 11 includes more case studies in other areas, ranging from manufacturing to marketing research, and a detailed comparison with other diagnostic tools, such as logistic regression and tree-based methods.