Mining of transliterations from comparable or parallel text can enhance natural language processing applications such as machine translation and cross-language information retrieval. This paper presents an enhanced transliteration mining technique that uses a generative graph reinforcement model to infer mappings between source and target character…
We study nonconvex finite-sum problems and analyze stochastic variance reduced gradient (Svrg) methods for them. Svrg and related methods have recently surged into prominence for convex optimization given their edge over stochastic gradient descent (Sgd); but their theoretical analysis almost exclusively assumes convexity. In contrast, we prove…
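Since this snippet centers on Svrg for finite-sum objectives f(x) = (1/n) Σ_i f_i(x), here is a minimal sketch of the standard SVRG update loop on a toy nonconvex sigmoid-loss problem. This is generic background only, not the paper's algorithmic variants or analysis; the toy objective, step size, and epoch length are assumptions.

```python
# Minimal SVRG sketch for a finite-sum objective f(x) = (1/n) * sum_i f_i(x).
# The toy nonconvex sigmoid loss, step size, and epoch length are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 10
A = rng.normal(size=(n, d))
b = rng.choice([-1.0, 1.0], size=n)          # toy labels

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def grad_i(x, i):
    # Gradient of f_i(x) = sigmoid(-b_i * <a_i, x>), a standard nonconvex toy loss.
    s = sigmoid(-b[i] * (A[i] @ x))
    return -s * (1.0 - s) * b[i] * A[i]

def full_grad(x):
    return np.mean([grad_i(x, i) for i in range(n)], axis=0)

x = np.zeros(d)
eta = 0.1                                    # step size (assumption)
for epoch in range(20):
    x_snap = x.copy()                        # snapshot point
    mu = full_grad(x_snap)                   # full gradient at the snapshot
    for _ in range(n):                       # inner loop of stochastic updates
        i = rng.integers(n)
        # Variance-reduced estimate: unbiased for the full gradient at the current x.
        v = grad_i(x, i) - grad_i(x_snap, i) + mu
        x -= eta * v
```

The key point the sketch shows is that the estimate v stays unbiased while its variance shrinks as x approaches the snapshot, which is what distinguishes Svrg-type updates from plain Sgd.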
Recently there has been substantial interest in spectral methods for learning dynamical systems. These methods are popular since they often offer a good tradeoff between computational and statistical efficiency. Unfortunately, they can be difficult to use and extend in practice: e.g., they can make it difficult to incorporate prior information such as…
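For readers unfamiliar with what a "spectral method" for a dynamical system looks like, below is a generic subspace-identification-style sketch: stack past and future observation windows, take an SVD of their cross-covariance to get a low-dimensional state, then fit the dynamics by regression. This is background flavor only, not the method developed in the paper; the simulated system, window length, and state dimension are assumptions.

```python
# Generic spectral / subspace-identification sketch for a linear dynamical system.
# Illustrative only; the simulated system, window length k, and latent dimension are assumptions.
import numpy as np

rng = np.random.default_rng(0)

# Simulate a toy 2-d linear dynamical system with 3-d noisy observations.
A_true = np.array([[0.9, 0.2], [-0.2, 0.9]])
C_true = rng.normal(size=(3, 2))
T = 2000
s = np.zeros(2)
Y = np.zeros((T, 3))
for t in range(T):
    s = A_true @ s + 0.1 * rng.normal(size=2)
    Y[t] = C_true @ s + 0.1 * rng.normal(size=3)

k, dim = 5, 2                      # window length and latent dimension (assumptions)
rows = T - 2 * k

# Stack "past" and "future" observation windows.
past = np.stack([Y[t - k:t].ravel() for t in range(k, k + rows)])
future = np.stack([Y[t:t + k].ravel() for t in range(k, k + rows)])

# SVD of the past-future cross-covariance (simplified, no whitening).
Sigma = future.T @ past / rows
U, S, Vt = np.linalg.svd(Sigma)

# Low-dimensional state estimate: project past windows onto the top singular directions.
X_hat = past @ Vt[:dim].T

# Fit linear dynamics and an observation map by least squares (lstsq returns transposed maps).
A_hat, *_ = np.linalg.lstsq(X_hat[:-1], X_hat[1:], rcond=None)
C_hat, *_ = np.linalg.lstsq(X_hat, Y[k:k + rows], rcond=None)

# The recovered state lives in an arbitrary basis, so compare eigenvalue magnitudes only.
print("estimated |eigenvalues| of the dynamics:", np.abs(np.linalg.eigvals(A_hat)))
```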
We study optimization algorithms based on variance reduction for stochastic gradient descent (SGD). Remarkable recent progress has been made in this direction through the development of algorithms such as SAG, SVRG, and SAGA. These algorithms have been shown to outperform SGD, both theoretically and empirically. However, asynchronous versions of these algorithms—a…
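For reference, the following is a minimal sketch of the SAGA gradient estimator (one of the algorithms named above) on a toy least-squares finite sum; the asynchronous execution that this snippet is concerned with is not shown. The problem data, step size, and iteration count are assumptions.

```python
# Minimal (sequential) SAGA sketch on a toy least-squares finite sum.
# Problem data, step size, and iteration count are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
n, d = 500, 20
A = rng.normal(size=(n, d))
x_true = rng.normal(size=d)
b = A @ x_true + 0.01 * rng.normal(size=n)

def grad_i(x, i):
    # Gradient of f_i(x) = 0.5 * (<a_i, x> - b_i)^2.
    return (A[i] @ x - b[i]) * A[i]

x = np.zeros(d)
eta = 0.1 / np.max(np.sum(A ** 2, axis=1))        # conservative step size (assumption)
table = np.stack([grad_i(x, i) for i in range(n)])  # stored per-component gradients
table_mean = table.mean(axis=0)

for step in range(20000):
    i = rng.integers(n)
    g = grad_i(x, i)
    # SAGA estimator: unbiased, with variance shrinking as the gradient table converges.
    v = g - table[i] + table_mean
    x -= eta * v
    # Update the stored gradient for component i and its running mean.
    table_mean += (g - table[i]) / n
    table[i] = g

print("distance to x_true:", np.linalg.norm(x - x_true))
```

Unlike SVRG, SAGA avoids periodic full-gradient passes by keeping this per-component gradient table, which is part of why shared-memory asynchronous variants of these methods require care.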
A single, stationary topic model such as latent Dirichlet allocation is inappropriate for modeling corpora that span long time periods, as the popularity of topics is likely to change over time. A number of models that incorporate time have been proposed, but in general they either exhibit limited forms of temporal variation or require computationally…
We develop randomized block coordinate descent (CD) methods for linearly constrained convex optimization. Unlike other large-scale CD methods, we do not assume the constraints to be separable, but allow them to be coupled linearly. To our knowledge, ours is the first CD method that allows linear coupling constraints without making the global iteration…
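To illustrate what coordinate descent under a non-separable linear coupling constraint can look like (a generic illustration, not the method or analysis in this paper), here is a sketch that minimizes a simple quadratic subject to a single constraint a^T x = c by updating two randomly chosen coordinates at a time along a direction that preserves feasibility. The problem data and the pairwise-update scheme are assumptions.

```python
# Sketch: randomized 2-coordinate descent for
#     minimize 0.5 * ||x - y||^2   subject to   a^T x = c,
# a simple instance of a linearly *coupled* (non-separable) constraint.
# Each update moves along a direction with a^T u = 0, so every iterate stays feasible.
# Problem data and the pairwise scheme are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
d = 50
y = rng.normal(size=d)
a = rng.uniform(0.5, 2.0, size=d)   # keep coefficients away from zero for simplicity
c = 3.0

# Feasible starting point: scale a so that a^T x = c.
x = a * (c / (a @ a))

for step in range(20000):
    i, j = rng.choice(d, size=2, replace=False)
    # Direction u = a[j] * e_i - a[i] * e_j satisfies a^T u = 0.
    g = x - y                              # gradient of the objective
    num = a[j] * g[i] - a[i] * g[j]        # u^T grad
    den = a[j] ** 2 + a[i] ** 2            # ||u||^2
    t = -num / den                         # exact line search along u
    x[i] += t * a[j]
    x[j] -= t * a[i]

# Compare against the closed-form projection of y onto {x : a^T x = c}.
x_star = y + a * (c - a @ y) / (a @ a)
print("constraint residual:", abs(a @ x - c))
print("distance to optimum:", np.linalg.norm(x - x_star))
```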
What does a user do when he logs in to the Twitter website? Does he merely browse through the tweets of all his friends as a source of information for his own tweets, or does he simply tweet a message of his own personal interest? Does he skim through the tweets of all his friends or only of a selected few? A number of factors might influence a user…
Recently there has been substantial interest in predictive state methods for learning dynamical systems: these algorithms are popular since they often offer a good tradeoff between computational speed and statistical efficiency. Despite their desirable properties, though, predictive state methods can sometimes be difficult to use in practice. E.g., in…