Topology and data
@article{Carlsson2009TopologyAD,
  title={Topology and data},
  author={Gunnar E. Carlsson},
  journal={Bulletin of the American Mathematical Society},
  year={2009},
  volume={46},
  pages={255--308}
}
An important feature of modern science and engineering is that data of various kinds is being produced at an unprecedented rate. This is so in part because of new experimental methods, and in part because of the increase in the availability of high-powered computing technology. It is also clear that the nature of the data we are obtaining is significantly different. For example, it is now often the case that we are given data in the form of very long vectors, where all but a few of the…
1,898 Citations
Topological data analysis: A promising big data exploration tool in biology, analytical chemistry and physical chemistry.
- Chemistry · Analytica Chimica Acta
- 2016
Data Analysis using Computational Topology and Geometric Statistics
- Mathematics, Computer Science
- 2009
The purpose of this workshop is to bring together these two research directions and explore their overlap, particularly in the service of statistical data analysis.
The Shape of Things: Topological Data Analysis
- Computer Science
- 2021
Much daunting mathematics lies behind the methods of TDA, but it is possible to gain an idea and understanding of the approach and its potential usefulness even without a deep dive into the intricacies of topology, homology classes, and the like.
Applied Computational Topology for Point Clouds and Sparse Timeseries Data
- Computer Science
- 2017
This work provides a practical and scalable tool, based on eigenanalysis, for identifying coherent sets from a sparse set of particle trajectories. It also extends existing tools in topological data analysis, providing a theoretical framework for studying topological features of a point cloud over a range of resolutions and enabling the analysis of these features with statistical methods.
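As a rough illustration of what "topological features of a point cloud over a range of resolutions" means, the sketch below computes the simplest such summary: the 0-dimensional persistence barcode of a Vietoris-Rips filtration, which records when connected components merge as the scale parameter grows. This is not the method of the cited work; the function name `h0_barcode` is hypothetical, and loops or higher-dimensional features would need a dedicated persistent homology library.

```python
import itertools
import math


def h0_barcode(points):
    """0-dimensional persistence barcode of a Vietoris-Rips filtration.

    Every point is born at scale 0. When two components first become
    connected by an edge of length r, one of them dies at r. Exactly one
    component never dies. (Hypothetical helper for illustration.)
    """
    if not points:
        return []
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All pairwise edges sorted by length: this is the filtration order.
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in itertools.combinations(range(n), 2)
    )
    bars = []
    for r, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            bars.append((0.0, r))  # one component dies at scale r
    bars.append((0.0, math.inf))  # the component that survives all scales
    return bars
```

For two well-separated pairs of points, e.g. `[(0, 0), (0, 1), (10, 0), (10, 1)]`, the two bars ending at 1 are within-pair merges, while the bar ending at 10 records two clusters that persist across a wide range of scales. Sorting all pairwise edges is O(n^2 log n), so this only suits small point clouds.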
On Topological Data Mining
- Computer Science · Interactive Knowledge Discovery and Data Mining in Biomedical Informatics
- 2014
Knowing the intrinsic dimensionality of data can be seen as a first step towards understanding its structure, and applying topological techniques to data mining and knowledge discovery is a promising future research area.
Efficient Approximation of Multiparameter Persistence Modules
- Computer Science, Mathematics · ArXiv
- 2022
This article presents the first approximation scheme for computing and decomposing general multiparameter persistence modules; the scheme is based on fibered barcodes and exact matchings, two constructions that stem from the theory of single-parameter persistence.
Clustering by the local intrinsic dimension: the hidden structure of real-world data
- Computer Science · ArXiv
- 2019
The results show that a simple topological feature, the local intrinsic dimension (ID), is sufficient to uncover a rich structure in high-dimensional data landscapes, and that many real-world data sets contain regions with widely heterogeneous dimensions.
Data segmentation based on the local intrinsic dimension
- Computer Science · Scientific Reports
- 2020
This work develops a robust approach to discriminate regions with different local IDs and segment the points accordingly, finding that many real-world data sets contain regions with widely heterogeneous dimensions.
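One simple nearest-neighbor estimator of intrinsic dimension is TWO-NN (Facco et al., 2017); methods like the two above extend ideas of this kind to local, heterogeneous dimensions. The sketch below returns only a single global estimate, using the maximum-likelihood form based on the ratio of each point's second- to first-nearest-neighbor distance; it does not reproduce the local estimation or segmentation of the cited works. The function name is hypothetical, and the brute-force neighbor search assumes a small data set with no duplicate points.

```python
import math
import random


def two_nn_dimension(points):
    """Global TWO-NN intrinsic-dimension estimate (maximum-likelihood form).

    For each point, mu = r2 / r1 is the ratio of its second- to
    first-nearest-neighbor distance. Under the TWO-NN model log(mu) is
    exponential with rate d, so d_hat = N / sum(log(mu)).
    Brute force O(N^2); assumes no duplicate points, so r1 > 0.
    """
    log_sum = 0.0
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        r1, r2 = dists[0], dists[1]
        log_sum += math.log(r2 / r1)
    return len(points) / log_sum


if __name__ == "__main__":
    rng = random.Random(0)
    # Points on a straight line embedded in 3-D: intrinsic dimension 1.
    line = [(t, 2 * t, -t) for t in (rng.random() for _ in range(400))]
    print(round(two_nn_dimension(line), 2))  # should be close to 1
```

Because only distance ratios enter the estimate, it is unaffected by rescaling, which is why a line embedded in 3-D still yields an estimate near 1.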
Topology, Big Data and Optimization
- Computer Science
- 2016
The idea is to extract robust topological features from data and use these summaries for modeling the data, and the coordinate-free nature of topology generates algorithms and viewpoints well suited to highly complex datasets.
Fast Computation of Persistent Homology with Data Reduction and Data Partitioning
- Computer Science · 2019 IEEE International Conference on Big Data (Big Data)
- 2019
This work combines data reduction and data partitioning to compute persistent homology on big data; the combination enables the identification of both large and small topological features in the input data set and reduces the approximation errors that typically accompany data reduction.
References (showing 10 of 81)
A Sober Look at Clustering Stability
- Computer Science · COLT
- 2006
It is concluded that stability is not a well-suited tool for determining the number of clusters, since it is determined by the symmetries of the data, which may be unrelated to clustering parameters.
Topological estimation using witness complexes
- Mathematics · PBG
- 2004
This paper tackles the problem of computing topological invariants of geometric objects in a robust manner, using only point cloud data sampled from the object, and produces a nested family of simplicial complexes, which represent the data at different feature scales, suitable for calculating persistent homology.
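Witness complexes are typically built on a small set of landmark points, with the remaining data points acting as witnesses. Besides random sampling, one common landmark choice is greedy maxmin (farthest-point) selection, sketched below under the assumption of Euclidean points. The function name `maxmin_landmarks` is hypothetical, and the sketch only selects landmarks; it does not construct the witness complex itself.

```python
import math


def maxmin_landmarks(points, k, seed_index=0):
    """Greedy maxmin (farthest-point) selection of k landmark indices.

    Starts from points[seed_index]; each new landmark is the point whose
    distance to its nearest already-chosen landmark is largest.
    Assumes at least k distinct points.
    """
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and len(points)")
    landmarks = [seed_index]
    # dist_to_set[i]: distance from point i to its nearest chosen landmark.
    dist_to_set = [math.dist(p, points[seed_index]) for p in points]
    while len(landmarks) < k:
        nxt = max(range(len(points)), key=dist_to_set.__getitem__)
        landmarks.append(nxt)
        # Update nearest-landmark distances with the newly chosen point.
        for i, p in enumerate(points):
            dist_to_set[i] = min(dist_to_set[i], math.dist(p, points[nxt]))
    return landmarks
```

For example, on the points `(0, 0), (1, 0), (2, 0), (10, 0)` with `k=3`, the selection starts at index 0, jumps to the far point at index 3, then picks index 2, spreading landmarks across the data rather than clustering them where sampling is dense.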
Diffusion maps and coarse-graining: a unified framework for dimensionality reduction, graph partitioning, and data set parameterization
- Computer Science · IEEE Transactions on Pattern Analysis and Machine Intelligence
- 2006
It is shown that the quantization distortion in diffusion space bounds the error of compression of the operator, thus giving a rigorous justification for k-means clustering in diffusion space and a precise measure of the performance of general clustering algorithms.
The Nonlinear Statistics of High-Contrast Patches in Natural Images
- Computer Science
- 2003
This study explores the space of data points representing the values of 3 × 3 high-contrast patches from optical and 3D range images and finds that the distribution of data is extremely “sparse” with the majority of the data points concentrated in clusters and non-linear low-dimensional manifolds.
What is a statistical model?
- Philosophy
- 2002
This paper addresses two closely related questions, What is a statistical model? and What is a parameter? The notions that a model must make sense, and that a parameter must have a well-defined…
A global geometric framework for nonlinear dimensionality reduction.
- Computer Science · Science
- 2000
An approach to solving dimensionality reduction problems that uses easily measured local metric information to learn the underlying global geometry of a data set; it efficiently computes a globally optimal solution and is guaranteed to converge asymptotically to the true structure.
Finding the Homology of Submanifolds with High Confidence from Random Samples
- Mathematics, Computer Science · Discret. Comput. Geom.
- 2008
This work considers the case where data are drawn from a probability distribution that has support on or near a submanifold of Euclidean space and shows how to "learn" the homology of the submanifold with high confidence.
Multivariate analysis by data depth: descriptive statistics, graphics and inference, (with discussion and a rejoinder by Liu and Singh)
- Mathematics
- 1999
A data depth can be used to measure the ‘‘depth’’ or ‘‘outlyingness’’ of a given multivariate sample with respect to its underlying distribution. This leads to a natural center-outward ordering of…
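The center-outward ordering mentioned above can be illustrated with one specific depth function, Liu's simplicial depth: the fraction of triangles with vertices drawn from the sample that contain a given point. The brute-force planar sketch below is O(n^3) and meant for illustration only; the function names are hypothetical, and the sample points are assumed to be in general position (no three collinear).

```python
import itertools


def _cross(o, a, b):
    # z-component of (a - o) x (b - o); its sign gives the orientation.
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def simplicial_depth(x, sample):
    """Sample simplicial depth of point x in the plane.

    Fraction of triangles with vertices from `sample` that contain x
    (points on a triangle's boundary count as contained).
    """
    if len(sample) < 3:
        raise ValueError("need at least 3 sample points")
    total = inside = 0
    for a, b, c in itertools.combinations(sample, 3):
        total += 1
        d1, d2, d3 = _cross(a, b, x), _cross(b, c, x), _cross(c, a, x)
        has_neg = d1 < 0 or d2 < 0 or d3 < 0
        has_pos = d1 > 0 or d2 > 0 or d3 > 0
        # x is inside (or on) the triangle iff it is never strictly on
        # both sides of the triangle's edges.
        if not (has_neg and has_pos):
            inside += 1
    return inside / total
```

A point deep inside the cloud gets a larger value than one near its edge, and any point outside the convex hull of the sample has depth 0; ranking points by this value gives the center-outward ordering.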
Uncovering the overlapping community structure of complex networks in nature and society
- Computer Science · Nature
- 2005
After defining a set of new characteristic quantities for the statistics of communities, this work applies an efficient technique for exploring overlapping communities on a large scale and finds that overlaps are significant, and the distributions introduced reveal universal features of networks.
The Elements of Statistical Learning
- Business · Technometrics
- 2003
Chapter 11 includes more case studies in other areas, ranging from manufacturing to marketing research, and a detailed comparison with other diagnostic tools, such as logistic regression and tree-based methods.