• Corpus ID: 15021971

On statistical learning via the lens of compression

Ofir David, Shay Moran, Amir Yehudayoff
This work continues the study of the relationship between sample compression schemes and statistical learning, which has been mostly investigated within the framework of binary classification. The central theme of this work is establishing equivalences between learnability and compressibility, and utilizing these equivalences in the study of statistical learning theory. We begin with the setting of multiclass categorization (zero/one loss). We prove that in this case learnability is equivalent… 

Complexity of Classification Problems

This work studies distributed learning in the spirit of Yao’s model of communication complexity: consider a two-party setting, where each of the players gets a list of labelled examples and they

Learning from weakly dependent data under Dobrushin's condition

The standard complexity measures of Gaussian and Rademacher complexities and VC dimension are sufficient measures of complexity for the purposes of bounding the generalization error and learning rates of hypothesis classes in this setting.

A Characterization of List Learnability

This work completely characterizes k-list learnability in terms of a generalization of the DS dimension, the k-DS dimension: a hypothesis class is k-list learnable if and only if its k-DS dimension is finite.

Eigenvalue Decay Implies Polynomial-Time Learnability for Neural Networks

This work shows that a natural distributional assumption corresponding to eigenvalue decay of the Gram matrix yields polynomial-time algorithms in the non-realizable setting for expressive classes of networks (e.g. feed-forward networks of ReLUs).

Quadratic Upper Bound for Recursive Teaching Dimension of Finite VC Classes

A quadratic upper bound is shown: $\mathrm{RTD}(\mathcal C) = O(d^2)$, where $d$ is the VC dimension of $\mathcal C$. This comes much closer to answering the open problem of whether RTD is linearly upper bounded by VCD.

From Imitation to Prediction, Data Compression vs Recurrent Neural Networks for Natural Language Processing

In this journey, the author discovered what he thinks is the fundamental difference between a Data Compression Algorithm and a Recurrent Neural Network.

From Imitation to Prediction, Data Compression vs Recurrent Neural Networks for Sentiment Analysis and Automatic Text Generation

A fundamental difference between a Data Compression Algorithm and Recurrent Neural Networks has been discovered and it is found that a compression algorithm is even more intelligent than a neural network in natural language processing tasks of sentiment analysis and text generation.

On the Winograd Schema: Situating Language Understanding in the Data-Information-Knowledge Continuum

This paper formally 'situates' the Winograd Schema challenge in the data-information-knowledge continuum, and shows that a WS is just a special case of a more general phenomenon in language understanding, namely the missing text phenomenon (henceforth, MTP).

Autoregressive Predictive Coding: A Comprehensive Study

To study the speech representation learned by autoregressive predictive coding, common speech tasks are used to demonstrate the utility of the learned representation, and a suite of fine-grained tasks is designed to probe the phonetic and prosodic content of the representation.



Relating Data Compression and Learnability

It is demonstrated that the existence of a suitable data compression scheme is sufficient to ensure learnability and the introduced compression scheme provides a rigorous model for studying data compression in connection with machine learning.

Sample Compression, Learnability, and the Vapnik-Chervonenkis Dimension

It is demonstrated that the existence of a sample compression scheme of fixed size for a class C is sufficient to ensure that the class C is PAC-learnable, and the relationship between sample compression schemes and the VC dimension is explored.

Sample compression, learnability, and the Vapnik-Chervonenkis dimension

It is demonstrated that the existence of a sample compression scheme of fixed size for a class C is sufficient to ensure that the class C is PAC-learnable, and the relationship between sample compression schemes and the VC dimension is explored.

PAC-Bayesian Compression Bounds on the Prediction Error of Learning Algorithms for Classification

Using the property of compression, bounds on the average prediction error of kernel classifiers in the PAC-Bayesian framework are derived. These bounds assume a prior measure over the expansion coefficients in the data-dependent kernel expansion and bound the average prediction error uniformly over subsets of the space of expansion coefficients.

Sample compression schemes for VC classes

It is shown that every concept class C with VC dimension d has a sample compression scheme of size exponential in d, and an approximate minimax phenomenon for binary matrices of low VC dimension is used, which may be of interest in the context of game theory.

Honest Compressions and Their Application to Compression Schemes

This work proves the existence of such compression schemes under stronger assumptions than finite VC dimension in concept classes defined by hyperplanes, polynomials, exponentials, restricted analytic functions and compositions, additions and multiplications of all of the above.

Combinatorial Variability of Vapnik-Chervonenkis Classes with Applications to Sample Compression Schemes

A Geometric Approach to Sample Compression

It is shown that simple arrangements of hyperplanes in hyperbolic space represent maximum classes, generalizing the corresponding Euclidean result, and that d-maximum classes corresponding to PL-hyperplane arrangements in $\mathbb{R}^d$ have cubical complexes homeomorphic to a d-ball, or equivalently complexes that are manifolds with boundary.

Adaptive Learning with Robust Generalization Guarantees

It is shown that robust generalization is a strictly weaker concept, and that there is a learning task that can be carried out subject to robust generalization guarantees, yet cannot be carried out subject to differential privacy.

Multiclass Learnability and the ERM principle

A principle is proposed for designing good ERM learners, and this principle is used to prove tight bounds on the sample complexity of learning symmetric multiclass hypothesis classes, i.e., classes that are invariant under permutations of label names.