Monitoring the shape of weather, soundscapes, and dynamical systems: a new statistic for dimension-driven data analysis on large datasets

  • Henry Kvinge, Elin Farnell, Michael J. Kirby, Chris Peterson
    2018 IEEE International Conference on Big Data (Big Data)
Dimensionality-reduction methods are a fundamental tool in the analysis of large datasets. These algorithms work on the assumption that the "intrinsic dimension" of the data is generally much smaller than the ambient dimension in which it is collected. Alongside their usual purpose of mapping data into a smaller-dimensional space with minimal information loss, dimensionality-reduction techniques implicitly or explicitly provide information about the dimension of the dataset. In this paper, we…
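The distinction between intrinsic and ambient dimension in the abstract can be illustrated with a toy case. The sketch below is not the statistic proposed in the paper; it only assumes synthetic points that lie on a flat 2-dimensional plane inside a 6-dimensional space, and estimates their dimension as the numerical rank of the centered data matrix. The function names (`numerical_rank`, `make_planar_data`, `linear_dimension`) and the tolerance are hypothetical choices made for this illustration.

```python
import random


def numerical_rank(rows, tol=1e-9):
    """Rank of a matrix (list of rows) via Gaussian elimination with partial pivoting."""
    m = [list(r) for r in rows]
    n_rows, n_cols = len(m), len(m[0])
    rank = 0
    for col in range(n_cols):
        if rank == n_rows:
            break
        # Choose the remaining row with the largest entry in this column.
        pivot = max(range(rank, n_rows), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) <= tol:
            continue  # column is numerically zero below the current rank
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, n_rows):
            factor = m[r][col] / m[rank][col]
            for c in range(col, n_cols):
                m[r][c] -= factor * m[rank][c]
        rank += 1
    return rank


def make_planar_data(n_points=50, ambient_dim=6, seed=0):
    """Synthetic points on a random 2-D affine plane in `ambient_dim` dimensions."""
    rng = random.Random(seed)
    u = [rng.gauss(0, 1) for _ in range(ambient_dim)]
    v = [rng.gauss(0, 1) for _ in range(ambient_dim)]
    offset = [rng.gauss(0, 1) for _ in range(ambient_dim)]
    points = []
    for _ in range(n_points):
        a, b = rng.gauss(0, 1), rng.gauss(0, 1)
        points.append([a * ui + b * vi + oi for ui, vi, oi in zip(u, v, offset)])
    return points


def linear_dimension(points, tol=1e-9):
    """Dimension of the smallest affine subspace containing the points."""
    n, d = len(points), len(points[0])
    mean = [sum(p[j] for p in points) / n for j in range(d)]
    centered = [[p[j] - mean[j] for j in range(d)] for p in points]
    return numerical_rank(centered, tol)


if __name__ == "__main__":
    data = make_planar_data()
    print("ambient dimension:", len(data[0]))
    print("estimated intrinsic (linear) dimension:", linear_dimension(data))
```

A rank computation only detects flat (linear) structure in noise-free data. Curved manifolds or noisy measurements need the kinds of estimators surveyed in the works listed below.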

Rare Geometries: Revealing Rare Categories via Dimension-Driven Statistics
  • Henry Kvinge, Elin Farnell
  • Computer Science
    2019 18th IEEE International Conference On Machine Learning And Applications (ICMLA)
  • 2019
A new supervised learning algorithm is presented that uses a dimension-driven statistic, called the kappa-profile, to determine whether unlabeled points belong to a rare class, and is invariant with respect to translation so that it performs equivalently on both separable and non-separable classes.
Big Data Analytics in Weather Forecasting: A Systematic Review
This paper tenders a systematic literature review method for big data analytic approaches in weather forecasting (published between 2014 and August 2020) and presents a comparison of the aforementioned categories regarding accuracy, scalability, execution time, and other Quality of Service factors.


Too many secants: a hierarchical approach to secant-based dimensionality reduction on large data sets
A hierarchical secant-based dimensionality-reduction method is proposed, which can be employed for data sets where explicitly calculating all secants is not feasible, and which relates Whitney's embedding theorem to the natural dimension of the data.
Linear dimensionality reduction: survey, insights, and generalizations
This survey and generic solver suggest that linear dimensionality reduction can move toward becoming a blackbox, objective-agnostic numerical technology.
Dimensionality Reduction Using Secant-Based Projection Methods: The Induced Dynamics in Projected Systems
In previous papers we have developed an approach to the data reduction problem which is based on a well-known, constructive proof of Whitney’s embedding theorem [Broomhead, D. S. and Kirby, M., SIAM
Data dimensionality estimation methods: a survey
Estimation of Topological Dimension
A geometric scaling property and a dimensionality criterion are presented that permit the automated application of the algorithm as well as a significant reduction in computational expense.
An Algorithm for Finding Intrinsic Dimensionality of Data
An algorithm for the analysis of multivariant data is presented along with some experimental results, and an analysis that demonstrates the feasibility of this approach.
Determining Intrinsic Dimension and Entropy of High-Dimensional Shape Spaces
  • J. Costa, A. Hero
  • Mathematics, Computer Science
    Statistics and Analysis of Shapes
  • 2006
This chapter provides proofs of strong consistency of estimators of dimension and entropy based on the lengths of the geodesic minimal spanning tree (GMST) and the k-nearest neighbor (k-NN) graph, under weak assumptions of compactness of the manifold and boundedness of the Lebesgue sampling density supported on the manifold.
A New Approach to Dimensionality Reduction: Theory and Algorithms
Whitney's embedding theorem is applied to the data reduction problem and a new approach motivated in part by the (constructive) proof of the theorem is introduced which involves picking projections of the high-dimensional system that are optimized such that they are easy to invert.
The Whitney Reduction Network: A Method for Computing Autoassociative Graphs
To implement this network, the idea of a good projection is proposed, which enhances the generalization capabilities of the network, along with an adaptive secant basis algorithm to achieve it.
Topological dimension and local coordinates from time series data
A method for the estimation of the topological dimension of a manifold from time series data is presented. It is based on the approximation of the manifold near a point chi by its tangent space at