Corpus ID: 3869017

LSH Microbatches for Stochastic Gradients: Value in Rearrangement

Eliav Buchnik, E. Cohen, Avinatan Hassidim, Y. Matias
Metric embeddings are immensely useful representations of interacting entities such as videos, users, search queries, online resources, words, and more. Embeddings are computed by optimizing a loss function expressed as a sum over provided associations, so that the relations between embedding vectors reflect the strengths of the associations. Moreover, the resulting embeddings allow us to predict the strength of unobserved associations. Typically, the optimization performs stochastic gradient updates on…
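The optimization described in the abstract can be sketched as plain SGD over observed association pairs. This is an illustrative sketch only, not the paper's method: the entity count, dimensions, toy associations, and squared-error loss are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
n_entities, dim = 10, 4
E = rng.normal(scale=0.1, size=(n_entities, dim))  # embedding matrix

# toy data: (entity_i, entity_j, observed association strength)
associations = [(0, 1, 1.0), (1, 2, 0.5), (0, 3, 0.2)]

lr = 0.1
for epoch in range(500):
    for i, j, s in associations:   # one stochastic update per association
        pred = E[i] @ E[j]         # dot product predicts association strength
        err = pred - s             # gradient of 0.5 * (pred - s)**2
        gi, gj = err * E[j], err * E[i]
        E[i] -= lr * gi
        E[j] -= lr * gj

# after training, dot products of embeddings approximate observed strengths
```

After the loop, `E[0] @ E[1]` is close to the observed strength 1.0, and unobserved pairs get predictions induced by the learned geometry.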


DeepWalk: online learning of social representations
DeepWalk is an online learning algorithm that builds useful incremental results and is trivially parallelizable, which makes it suitable for a broad class of real-world applications such as network classification and anomaly detection.
GloVe: Global Vectors for Word Representation
A new global log-bilinear regression model that combines the advantages of the two major model families in the literature, global matrix factorization and local context window methods, and produces a vector space with meaningful substructure.
Large-scale matrix factorization with distributed stochastic gradient descent
A novel algorithm to approximately factor large matrices with millions of rows, millions of columns, and billions of nonzero elements, called DSGD, that can be fully distributed and run on web-scale datasets using, e.g., MapReduce.
Factorization meets the neighborhood: a multifaceted collaborative filtering model
The factor and neighborhood models can now be smoothly merged, thereby building a more accurate combined model, and a new evaluation metric is suggested that highlights the differences among methods based on their performance at a top-K recommendation task.
Coordinated Weighted Sampling for Estimating Aggregates Over Multiple Weight Assignments
A sampling framework based on coordinated weighted samples that is suited for multiple weight assignments is developed, and estimators that are orders of magnitude tighter than previously possible are obtained.
Adaptive Subgradient Methods for Online Learning and Stochastic Optimization
This work describes and analyzes an apparatus for adaptively modifying the proximal function, which significantly simplifies setting a learning rate and results in regret guarantees that are provably as good as the best proximal function that can be chosen in hindsight.
Variance Reduction in SGD by Distributed Importance Sampling
This work proposes a framework for distributing deep learning in which one set of workers searches for the most informative examples in parallel while a single worker updates the model on examples selected by importance sampling, which leads the model to update using an unbiased estimate of the gradient.
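The unbiasedness claim in this summary is easy to see in isolation: sampling example i with probability p_i and reweighting its gradient by 1/(n * p_i) leaves the expected gradient unchanged. A toy sketch under made-up assumptions (scalar "gradients", no distributed workers):

```python
import numpy as np

rng = np.random.default_rng(1)
grads = np.array([0.1, 2.0, 0.3, 5.0])       # toy per-example gradients
n = len(grads)
full = grads.mean()                          # exact full-batch gradient

# sample "informative" (large-gradient) examples more often
p = np.abs(grads) / np.abs(grads).sum()
draws = rng.choice(n, size=100_000, p=p)

# reweighting by 1/(n * p_i) keeps the estimate unbiased
est = (grads[draws] / (n * p[draws])).mean()
```

Here `est` matches `full`; in fact, sampling exactly in proportion to the gradient magnitudes makes every reweighted draw equal the mean, which is the zero-variance limit of this estimator.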
Distributed Representations of Words and Phrases and their Compositionality
This paper presents a simple method for finding phrases in text, shows that learning good vector representations for millions of phrases is possible, and describes a simple alternative to the hierarchical softmax called negative sampling.
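The negative-sampling idea mentioned in this summary replaces the full softmax with one observed pair plus a handful of random "negative" words. A minimal sketch, assuming toy sizes, a uniform negative distribution, and a single repeated pair (none of which come from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
V, dim, k = 50, 8, 5                            # vocab size, embedding dim, negatives
W_in = rng.normal(scale=0.1, size=(V, dim))     # input (word) vectors
W_out = rng.normal(scale=0.1, size=(V, dim))    # output (context) vectors

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sgns_update(center, context, lr=0.05):
    # one positive pair plus k random negatives instead of a V-way softmax
    negatives = [n for n in rng.integers(0, V, size=k) if n != context]
    grad_in = np.zeros(dim)
    for w, label in [(context, 1.0)] + [(n, 0.0) for n in negatives]:
        g = sigmoid(W_in[center] @ W_out[w]) - label   # logistic-loss gradient
        grad_in += g * W_out[w]
        W_out[w] -= lr * g * W_in[center]
    W_in[center] -= lr * grad_in

for _ in range(500):
    sgns_update(3, 7)   # a frequently co-occurring pair

score = sigmoid(W_in[3] @ W_out[7])   # approaches 1 for the observed pair
```

Each update touches only k + 1 output vectors, which is what makes training on millions of words and phrases tractable.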
Restricted Boltzmann machines for collaborative filtering
This paper shows how a class of two-layer undirected graphical models, called Restricted Boltzmann Machines (RBMs), can be used to model tabular data, such as users' ratings of movies, and demonstrates that RBMs can be successfully applied to the Netflix data set.
Stochastic Learning on Imbalanced Data: Determinantal Point Processes for Mini-batch Diversification
Balanced mini-batch SGD can be considered a generalization of stratified sampling to cases where no discrete features exist to bin the data into groups; it results in more interpretable and diverse features in unsupervised setups and in better classification accuracies in supervised setups.