- Animashree Anandkumar, Rong Ge, Daniel J. Hsu, Sham M. Kakade, Matus Telgarsky
- Journal of Machine Learning Research
- 2014

This work considers a computationally and statistically efficient parameter estimation method for a wide class of latent variable models—including Gaussian mixture models, hidden Markov models, and latent Dirichlet allocation—which exploits a certain tensor structure in their low-order observable moments (typically, of second- and third-order). Specifically,…
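
The moment tensors in question have the form Σᵢ wᵢ μᵢ⊗μᵢ⊗μᵢ, and after a whitening step the components can be extracted by tensor power iteration. A minimal numpy sketch on a synthetic orthogonal tensor (not the paper's full estimator, which also builds these moments from data):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic symmetric 3-tensor T = sum_i w_i * mu_i (x) mu_i (x) mu_i
# with orthonormal mu_i: the situation after the whitening step.
d = 5
mus = np.linalg.qr(rng.standard_normal((d, d)))[0][:, :3].T  # 3 orthonormal rows
ws = np.array([0.5, 0.3, 0.2])
T = np.einsum("i,ia,ib,ic->abc", ws, mus, mus, mus)

def tensor_power_iteration(T, iters=100, seed=1):
    """Recover one (weight, component) pair via the fixed-point map
    v <- T(I, v, v) / ||T(I, v, v)||; at v = mu_j the eigenvalue is w_j."""
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(T.shape[0])
    v /= np.linalg.norm(v)
    for _ in range(iters):
        v = np.einsum("abc,b,c->a", T, v, v)
        v /= np.linalg.norm(v)
    return np.einsum("abc,a,b,c->", T, v, v, v), v
```

Deflating (subtracting the recovered λ·v⊗v⊗v and iterating again) recovers the remaining components.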

- Matus Telgarsky, Andrea Vattani
- AISTATS
- 2010

Hartigan’s method for k-means clustering is the following greedy heuristic: select a point, and optimally reassign it. This paper develops two other formulations of the heuristic, one leading to a number of consistency properties, the other showing that the data partition is always quite separated from the induced Voronoi partition. A characterization of…
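
The reassignment step admits a closed-form cost change, since moving a point shifts both cluster means. A minimal numpy sketch of one pass of the heuristic (means recomputed from scratch for clarity, and singleton clusters left untouched so every cluster stays nonempty):

```python
import numpy as np

def hartigan_step(X, assign, k):
    """One pass of Hartigan's heuristic: for each point, move it to the
    cluster that most decreases the k-means cost, if any move helps.
    Returns True if any point moved."""
    moved = False
    for idx in range(len(X)):
        x = X[idx]
        sizes = np.bincount(assign, minlength=k).astype(float)
        i = assign[idx]
        if sizes[i] == 1:
            continue  # keep clusters nonempty in this sketch
        means = np.array([X[assign == j].mean(axis=0) for j in range(k)])
        # Exact cost change of moving x from cluster i to cluster j:
        #   n_j/(n_j+1) * ||x - mu_j||^2  -  n_i/(n_i-1) * ||x - mu_i||^2
        removal = sizes[i] / (sizes[i] - 1) * np.sum((x - means[i]) ** 2)
        gains = sizes / (sizes + 1) * np.sum((x - means) ** 2, axis=1) - removal
        gains[i] = 0.0  # moving to its own cluster changes nothing
        j = int(np.argmin(gains))
        if gains[j] < -1e-12:
            assign[idx] = j
            moved = True
    return moved
```

Looping `hartigan_step` until it returns `False` yields a local optimum; unlike Lloyd's method, the size factors let a point leave a cluster whose mean is nearer, which is exactly why the final partition need not be Voronoi.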

- Matus Telgarsky
- COLT
- 2016

For any positive integer k, there exist neural networks with Θ(k^3) layers, Θ(1) nodes per layer, and Θ(1) distinct parameters which cannot be approximated by networks with O(k) layers unless they are exponentially large — they must possess Ω(2^k) nodes. This result is proved here for a class of nodes termed semi-algebraic gates which includes the common…
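
The mechanism behind such separations is concrete: a tiny deep network repeatedly applies a "tent" map, and each application doubles the number of linear pieces. A minimal ReLU sketch of that triangle-wave idea (an illustration of the oscillation argument, not the paper's exact semi-algebraic-gate construction):

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def tent(x):
    """The tent map on [0, 1] as a two-ReLU layer:
    t(x) = 2*relu(x) - 4*relu(x - 1/2), so t(0) = t(1) = 0, t(1/2) = 1."""
    return 2 * relu(x) - 4 * relu(x - 0.5)

def deep_tent(x, k):
    """k stacked tent layers: the composition t^k oscillates with 2^k
    linear pieces, which is what forces any shallow approximator to be
    exponentially wide."""
    for _ in range(k):
        x = tent(x)
    return x
```

Counting the crossings of the level 1/2 on a fine grid confirms the 2^k oscillations.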

- Matus Telgarsky
- ArXiv
- 2015

This note provides a family of classification problems, indexed by a positive integer k, where all shallow networks with fewer than exponentially (in k) many nodes exhibit error at least 1/6, whereas a deep network with 2 nodes in each of 2k layers achieves zero error, as does a recurrent network with 3 distinct nodes iterated k times. The proof is…

- Peter Bartlett, Dylan J. Foster, Matus Telgarsky
- ArXiv
- 2017

This paper presents a margin-based multiclass generalization bound for neural networks which scales with their margin-normalized spectral complexity: their Lipschitz constant, meaning the product of the spectral norms of the weight matrices, times a certain correction factor. This bound is empirically investigated for a standard AlexNet network on the…
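
The Lipschitz part of that complexity measure is just a product of largest singular values. A sketch of that piece alone (the paper's full bound multiplies it by a correction factor involving (2,1)-norms, omitted here):

```python
import numpy as np

def spectral_complexity_lipschitz(weights):
    """Product of spectral norms (largest singular values) of the layer
    weight matrices: an upper bound on the 2-norm Lipschitz constant of
    the network, assuming 1-Lipschitz activations and ignoring biases."""
    return float(np.prod([np.linalg.norm(W, 2) for W in weights]))
```

For example, two diagonal layers `diag(3, 1)` and `2*I` give a bound of 6; note this product is invariant to the layer-wise rescalings that leave the network function (with homogeneous activations) unchanged only in aggregate, which is one reason the paper normalizes by the margin.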

- Matus Telgarsky
- Journal of Machine Learning Research
- 2012

Boosting combines weak learners into a predictor with low empirical risk. Its dual constructs a high entropy distribution upon which weak learners and training labels are uncorrelated. This manuscript studies this primal-dual relationship under a broad family of losses, including the exponential loss of AdaBoost and the logistic loss, revealing: • Weak…

- Matus Telgarsky, Sanjoy Dasgupta
- ICML
- 2012

This manuscript develops the theory of agglomerative clustering with Bregman divergences. Geometric smoothing techniques are developed to deal with degenerate clusters. To allow for cluster models based on exponential families with overcomplete representations, Bregman divergences are developed for nondifferentiable convex functions.
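
A Bregman divergence is generated by a convex function φ via D_φ(x, y) = φ(x) − φ(y) − ⟨∇φ(y), x − y⟩. A small sketch showing the two standard instances that make the clustering framework concrete (squared Euclidean distance and KL divergence):

```python
import numpy as np

def bregman(phi, grad_phi, x, y):
    """Bregman divergence D_phi(x, y) = phi(x) - phi(y) - <grad phi(y), x - y>."""
    return phi(x) - phi(y) - float(np.dot(grad_phi(y), x - y))

# phi(x) = ||x||^2 recovers the squared Euclidean distance:
sq = lambda v: float(np.dot(v, v))
grad_sq = lambda v: 2.0 * v

# phi(p) = sum_i p_i ln p_i (negative entropy) recovers the KL
# divergence between probability vectors:
negent = lambda p: float(np.sum(p * np.log(p)))
grad_negent = lambda p: np.log(p) + 1.0
```

The nondifferentiable extension mentioned in the abstract replaces ∇φ with subgradients; this sketch assumes φ is smooth on its domain.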

Can we effectively learn a nonlinear representation in time comparable to linear learning? We describe a new algorithm that explicitly and adaptively expands higher-order interaction features over base linear representations. The algorithm is designed for extreme computational efficiency, and an extensive experimental study shows that its…
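
The expansion idea can be caricatured in a few lines: grow pairwise interaction features from the currently most important base features. The top-|weight| selection rule below is an illustrative stand-in, not the paper's actual adaptive criterion:

```python
import numpy as np

def expand_interactions(X, weights, top=2):
    """Append pairwise interaction features built from the `top` base
    features with largest current |weight|. Hypothetical selection
    heuristic for illustration only."""
    idx = np.argsort(-np.abs(weights))[:top]
    pairs = [(i, j) for a, i in enumerate(idx) for j in idx[a:]]
    inter = np.column_stack([X[:, i] * X[:, j] for i, j in pairs])
    return np.hstack([X, inter]), pairs
```

Restricting expansion to a few promising features keeps the cost near-linear in the data, which is the point of the "time comparable to linear learning" question above.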

- Maxim Raginsky, Alexander Rakhlin, Matus Telgarsky
- COLT
- 2017

Stochastic Gradient Langevin Dynamics (SGLD) is a popular variant of Stochastic Gradient Descent, where properly scaled isotropic Gaussian noise is added to an unbiased estimate of the gradient at each iteration. This modest change allows SGLD to escape local minima and suffices to guarantee asymptotic convergence to global minimizers for sufficiently…
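
The update itself is one line: gradient descent plus injected Gaussian noise scaled by the step size and an inverse temperature β. A minimal sketch (generic hyperparameters, not the paper's analyzed schedule):

```python
import numpy as np

def sgld(grad_estimate, theta0, step=1e-3, beta=10.0, iters=5000, seed=0):
    """Stochastic Gradient Langevin Dynamics:
       theta <- theta - step * g(theta) + sqrt(2 * step / beta) * N(0, I),
    where g is an unbiased gradient estimate and beta the inverse
    temperature; the Gaussian term lets iterates escape local minima."""
    rng = np.random.default_rng(seed)
    theta = np.array(theta0, dtype=float)
    scale = np.sqrt(2.0 * step / beta)
    for _ in range(iters):
        theta = theta - step * grad_estimate(theta, rng) \
                + scale * rng.standard_normal(theta.shape)
    return theta
```

As β grows the noise vanishes and plain SGD is recovered; the nonasymptotic analyses in this line of work quantify how long the chain needs to approach its stationary distribution, which concentrates on global minimizers.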

- Matus Telgarsky
- NIPS
- 2011

This manuscript considers the convergence rate of boosting under a large class of losses, including the exponential and logistic losses, where the best previous rate of convergence was O(exp(1/ε²)). First, it is established that the setting of weak learnability aids the entire class, granting a rate O(ln(1/ε)). Next, the (disjoint) conditions under which the…
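
Boosting under the exponential loss is coordinate descent over a pool of weak hypotheses, with AdaBoost's step size in closed form. A minimal sketch, assuming the pool is given up front as a margin matrix A with A[i, j] = yᵢ·hⱼ(xᵢ) in {−1, +1}:

```python
import numpy as np

def boost_exponential(A, rounds=50):
    """Coordinate descent on L(lam) = (1/n) sum_i exp(-(A @ lam)_i).
    Each round reweights the examples, picks the hypothesis with the
    largest weighted edge r, and steps by (1/2) ln((1 + r)/(1 - r))."""
    n, m = A.shape
    lam = np.zeros(m)
    for _ in range(rounds):
        w = np.exp(-A @ lam)
        w /= w.sum()                    # distribution over examples
        edges = w @ A                   # weighted correlations r_j
        j = int(np.argmax(np.abs(edges)))
        r = np.clip(edges[j], -1 + 1e-12, 1 - 1e-12)
        lam[j] += 0.5 * np.log((1 + r) / (1 - r))
    return lam
```

Under weak learnability every edge is bounded away from zero, the loss shrinks geometrically, and training error hits zero; this is the regime behind the O(ln(1/ε)) rate above.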