Metric Learning by Collapsing Classes
TLDR
An algorithm is presented for learning a quadratic Gaussian metric (Mahalanobis distance) for use in classification tasks, together with a discussion of how the learned metric may be used to obtain a compact low-dimensional feature representation of the original input space, allowing more efficient classification with very little reduction in performance.
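The core computation is a projected-gradient step on a convex objective that pulls same-class points together under the learned metric while keeping the metric matrix positive semidefinite. Below is a minimal NumPy sketch of one such step; the function name `mcml_step` and the hyperparameters are my own, and it assumes every class contributes at least two points.

```python
import numpy as np

def mcml_step(X, y, M, lr=0.01):
    """One projected-gradient step of a simplified collapsing-classes objective.

    X: (n, d) data, y: (n,) labels, M: (d, d) PSD metric matrix.
    """
    diff = X[:, None, :] - X[None, :, :]                # (n, n, d) pairwise differences
    dist = np.einsum('ijd,de,ije->ij', diff, M, diff)   # squared Mahalanobis distances
    np.fill_diagonal(dist, np.inf)                      # exclude self-pairs
    P = np.exp(-(dist - dist.min(axis=1, keepdims=True)))
    P /= P.sum(axis=1, keepdims=True)                   # p(j|i) under the current metric
    P0 = (y[:, None] == y[None, :]).astype(float)
    np.fill_diagonal(P0, 0.0)
    P0 /= P0.sum(axis=1, keepdims=True)                 # ideal "collapsed" distribution
    grad = np.einsum('ij,ijd,ije->de', P0 - P, diff, diff)
    M = M - lr * grad
    # Project back onto the PSD cone so M remains a valid metric.
    vals, vecs = np.linalg.eigh(M)
    return (vecs * np.clip(vals, 0.0, None)) @ vecs.T
```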
Fixing Max-Product: Convergent Message Passing Algorithms for MAP LP-Relaxations
TLDR
A novel message passing algorithm for approximating the MAP problem in graphical models that is derived via block coordinate descent in a dual of the LP relaxation of MAP, but does not require any tunable parameters such as step size or tree weights.
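For pairwise models the block update has a simple closed form: each edge's dual variables combine half of the local belief with half of a max-marginal over the neighbor. The sketch below renders that update schematically in NumPy; the data layout and names are mine, and it omits the dual-objective bookkeeping one would use to monitor convergence.

```python
import numpy as np

def mplp(theta_node, theta_edge, edges, n_iters=100):
    """Schematic block-coordinate updates for a pairwise model.

    theta_node: list of (k_i,) unary potentials.
    theta_edge: dict {(i, j): (k_i, k_j) array} pairwise potentials.
    edges: list of (i, j) pairs with i < j.
    """
    # Messages lam[(i, j), v] from edge (i, j) into endpoint v.
    lam = {((i, j), v): np.zeros(len(theta_node[v]))
           for (i, j) in edges for v in (i, j)}

    def belief_minus(v, edge):
        """Unary potential of v plus all incoming messages except from `edge`."""
        b = theta_node[v].copy()
        for (e, u), m in lam.items():
            if u == v and e != edge:
                b += m
        return b

    for _ in range(n_iters):
        for (i, j) in edges:                # one block-coordinate pass over edges
            bi = belief_minus(i, (i, j))
            bj = belief_minus(j, (i, j))
            # Closed-form block update in the dual of the LP relaxation.
            lam[(i, j), i] = -0.5 * bi + 0.5 * (theta_edge[(i, j)] + bj[None, :]).max(axis=1)
            lam[(i, j), j] = -0.5 * bj + 0.5 * (theta_edge[(i, j)] + bi[:, None]).max(axis=0)

    # Decode by locally maximizing each node's reparameterized belief.
    return [int(np.argmax(belief_minus(v, None))) for v in range(len(theta_node))]
```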
Tightening LP Relaxations for MAP using Message Passing
TLDR
This work proposes to solve the cluster selection problem monotonically in the dual LP, iteratively selecting clusters with guaranteed improvement, and quickly re-solving with the added clusters by reusing the existing solution.
Information Bottleneck for Gaussian Variables
TLDR
A formal definition of the general continuous IB problem is given, and an analytic solution for the optimal representation in the important case of multivariate Gaussian variables is obtained in terms of the eigenvalue spectrum.
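Concretely, the analytic solution is governed by the spectrum of Σ_{x|y} Σ_x⁻¹: each eigenvalue λ_i yields a critical tradeoff β_i = 1/(1 − λ_i) at which the corresponding eigenvector joins the optimal projection. A small sketch of that spectral computation, with naming my own:

```python
import numpy as np

def gib_spectrum(Sx, Sxy, Sy):
    """Eigen-spectrum underlying the analytic Gaussian IB solution.

    Sx, Sy: covariances of x and y; Sxy: their cross-covariance.
    Returns the eigenvalues of Sigma_{x|y} Sigma_x^{-1} (sorted ascending)
    and the critical tradeoffs beta_i = 1 / (1 - lambda_i) at which each
    eigenvector enters the optimal projection.
    """
    Sx_given_y = Sx - Sxy @ np.linalg.solve(Sy, Sxy.T)   # conditional covariance of x given y
    vals = np.linalg.eigvals(Sx_given_y @ np.linalg.inv(Sx)).real
    vals.sort()
    betas = 1.0 / (1.0 - vals)   # finite whenever lambda_i < 1, i.e. the direction is informative
    return vals, betas
```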
Nightmare at test time: robust learning by feature deletion
TLDR
A new algorithm for avoiding single-feature over-weighting is introduced by analyzing robustness under a game-theoretic formalization, and classifiers that are optimally resilient to deletion of features in a minimax sense are developed.
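The adversary in this game is easy to state for a linear classifier: at test time it zeroes out the k features that contribute most to the correct prediction. The sketch below evaluates accuracy under that worst-case deletion; it illustrates the threat model being minimized against, not the paper's training algorithm.

```python
import numpy as np

def worst_case_deletion_accuracy(w, b, X, y, k):
    """Accuracy of a linear classifier when an adversary deletes (zeroes)
    the k features most helpful to each correct prediction.

    w: (d,) weights, b: bias, X: (n, d) inputs, y: (n,) labels in {-1, +1}.
    """
    contrib = y[:, None] * (X * w[None, :])       # per-feature contribution to the margin
    idx = np.argsort(-contrib, axis=1)[:, :k]     # the k most helpful features per example
    margins = y * (X @ w + b)
    # Deleting a feature removes its contribution; the adversary only deletes
    # features whose contribution is positive (deletion must hurt, not help).
    loss = np.take_along_axis(contrib, idx, axis=1).clip(min=0).sum(axis=1)
    return float(np.mean(margins - loss > 0))
```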
Euclidean Embedding of Co-occurrence Data
TLDR
This paper describes a method for embedding objects of different types, such as images and text, into a single common Euclidean space, based on their co-occurrence statistics, and shows that it consistently and significantly outperforms standard methods of statistical correspondence modeling.
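One conditional variant of such a model places both object types in a shared space and sets p(y|x) ∝ exp(−‖φ_x − ψ_y‖²), fit by maximum likelihood on co-occurrence counts. A toy gradient-ascent sketch, with all names and hyperparameters my own (it assumes every row of the count matrix is nonzero):

```python
import numpy as np

def code_embed(counts, dim=2, lr=0.1, n_iters=500, seed=0):
    """Toy maximum-likelihood fit of a conditional co-occurrence embedding:
    p(y|x) proportional to exp(-||phi_x - psi_y||^2).

    counts: (nx, ny) co-occurrence counts between two object types.
    """
    rng = np.random.default_rng(seed)
    nx, ny = counts.shape
    phi = 0.1 * rng.standard_normal((nx, dim))
    psi = 0.1 * rng.standard_normal((ny, dim))
    P_emp = counts / counts.sum(axis=1, keepdims=True)   # empirical p(y|x)
    px = counts.sum(axis=1) / counts.sum()               # marginal over x
    for _ in range(n_iters):
        diff = phi[:, None, :] - psi[None, :, :]         # (nx, ny, dim)
        logits = -(diff ** 2).sum(-1)
        P = np.exp(logits - logits.max(axis=1, keepdims=True))
        P /= P.sum(axis=1, keepdims=True)                # model p(y|x)
        W = px[:, None] * (P_emp - P)                    # per-pair likelihood weights
        phi += lr * (-2.0) * (W[:, :, None] * diff).sum(axis=1)   # d loglik / d phi
        psi += lr * 2.0 * (W[:, :, None] * diff).sum(axis=0)      # d loglik / d psi
    return phi, psi
```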
Structured Prediction Models via the Matrix-Tree Theorem
TLDR
It is shown how partition functions and marginals for directed spanning trees can be computed by an adaptation of Kirchhoff's Matrix-Tree Theorem, and the resulting algorithm is used to train both log-linear and max-margin dependency parsers.
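The underlying computation: for edge weights w[h, m], the partition function over directed spanning trees rooted at node 0 is the determinant of the weighted Laplacian with the root's row and column deleted, and edge marginals follow from the inverse of that minor. A minimal sketch (indexing conventions mine):

```python
import numpy as np

def arborescence_partition(w):
    """Partition function and edge marginals for weighted directed spanning
    trees rooted at node 0, via the directed Matrix-Tree Theorem.

    w: (n, n) nonnegative weights, w[h, m] for edge head h -> modifier m.
    Returns (Z, marginals) with marginals[h, m] = P(edge h -> m in the tree).
    """
    n = w.shape[0]
    W = w.copy()
    np.fill_diagonal(W, 0.0)
    L = -W.copy()
    np.fill_diagonal(L, W.sum(axis=0))   # L[m, m] = sum_h w[h, m]
    L_hat = L[1:, 1:]                    # delete the root's row and column
    Z = np.linalg.det(L_hat)
    inv = np.linalg.inv(L_hat)
    marg = np.zeros_like(W)
    for m in range(1, n):
        marg[0, m] = W[0, m] * inv[m - 1, m - 1]
        for h in range(1, n):
            if h != m:
                marg[h, m] = W[h, m] * (inv[m - 1, m - 1] - inv[m - 1, h - 1])
    return Z, marg
```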
Globally Optimal Gradient Descent for a ConvNet with Gaussian Inputs
TLDR
This work provides the first global optimality guarantee for gradient descent on a convolutional neural network with ReLU activations, showing that learning is NP-complete in the general case but that, when the input distribution is Gaussian, gradient descent converges to the global optimum in polynomial time.
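The analyzed architecture is a single filter applied to non-overlapping patches with ReLU and average pooling. The toy teacher-student run below illustrates that setting empirically; the hyperparameters and finite-sample setup are illustrative choices of mine, and the paper's guarantee concerns the population loss rather than this sampled version.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def train_no_overlap_convnet(d=8, k=4, n=4096, lr=0.2, n_iters=300, seed=0):
    """Toy teacher-student run: f(x) = mean_j ReLU(w . x_j) over k disjoint
    Gaussian patches, trained by plain gradient descent on the squared loss
    against a planted teacher filter."""
    rng = np.random.default_rng(seed)
    w_true = rng.standard_normal(d)
    X = rng.standard_normal((n, k, d))          # i.i.d. Gaussian patches
    y = relu(X @ w_true).mean(axis=1)           # teacher outputs
    w = 0.01 * rng.standard_normal(d)           # small random init
    for _ in range(n_iters):
        pre = X @ w                             # (n, k) pre-activations
        pred = relu(pre).mean(axis=1)
        err = pred - y
        # Gradient of the mean squared error w.r.t. the shared filter w.
        grad = np.einsum('n,nk,nkd->d', err, (pre > 0).astype(float), X) / (n * k)
        w -= lr * grad
    return w, w_true, float(np.mean((pred - y) ** 2))
```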
SGD Learns Over-parameterized Networks that Provably Generalize on Linearly Separable Data
TLDR
This work proves convergence rates of SGD to a global minimum, provides generalization guarantees for that minimum that are independent of the network size, and shows that SGD can avoid overfitting despite the high capacity of the model.
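A toy run in the spirit of the analyzed setting: an over-parameterized two-layer Leaky-ReLU network with fixed ±1 output weights, trained by SGD on the hinge loss over linearly separable data. Everything concrete here (width, slope, learning rate, data generation) is an illustrative assumption of mine, not the paper's construction.

```python
import numpy as np

def train_overparam_net(d=10, width=512, n=500, lr=0.1, epochs=20, seed=0):
    """SGD on the hinge loss for y = sign(w* . x) data with a wide two-layer
    Leaky-ReLU network whose output weights are frozen at +/-1."""
    rng = np.random.default_rng(seed)
    w_star = rng.standard_normal(d)
    X = rng.standard_normal((n, d))
    y = np.sign(X @ w_star)                         # linearly separable labels
    W = 0.01 * rng.standard_normal((width, d))      # hidden weights (trained)
    v = np.where(np.arange(width) < width // 2, 1.0, -1.0)  # fixed output layer
    alpha = 0.1                                     # leaky-ReLU slope
    for _ in range(epochs):
        for i in rng.permutation(n):
            pre = W @ X[i]
            h = np.where(pre > 0, pre, alpha * pre)
            if y[i] * (v @ h) < 1:                  # hinge loss is active
                dh = np.where(pre > 0, 1.0, alpha)
                W += lr * y[i] * (v * dh)[:, None] * X[i][None, :]
    H = X @ W.T
    margins = y * (np.where(H > 0, H, alpha * H) @ v)
    return float(np.mean(margins > 0))              # training accuracy
```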
Learning Bayesian Network Structure using LP Relaxations
TLDR
This work proposes to solve the combinatorial problem of finding the highest-scoring Bayesian network structure from data by maintaining an outer-bound approximation to the polytope and iteratively tightening it by searching over a new class of valid constraints.
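The loop structure is the classic cutting-plane scheme: solve a relaxation with a small active set, search a constraint family for a violation at the current optimum, add it, and re-solve. The sketch below runs that loop on a toy LP with scipy.optimize.linprog; it is not the paper's Bayesian-network polytope or its specific constraint class, and it assumes finite variable bounds so the initial relaxation is bounded.

```python
import numpy as np
from scipy.optimize import linprog

def cutting_plane_lp(c, A_family, b_family, bounds, max_cuts=50, tol=1e-8):
    """Maximize c . x subject to A_family x <= b_family, revealing the
    constraints lazily via a cutting-plane loop.

    bounds: per-variable (lo, hi) pairs; must be finite so the first
    relaxation is bounded.
    """
    A_active, b_active = np.empty((0, len(c))), np.empty(0)
    for _ in range(max_cuts):
        res = linprog(-np.asarray(c),               # linprog minimizes, so negate
                      A_ub=A_active if len(b_active) else None,
                      b_ub=b_active if len(b_active) else None,
                      bounds=bounds)
        x = res.x
        violation = A_family @ x - b_family         # positive entries are violated
        j = int(np.argmax(violation))
        if violation[j] <= tol:                     # outer bound is tight at x
            return x, len(b_active)
        A_active = np.vstack([A_active, A_family[j]])   # add the cutting plane
        b_active = np.append(b_active, b_family[j])     # and re-solve warm
    return x, len(b_active)
```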