Learn More
We develop an efficient parallel distributed algorithm for matrix completion, named NOMAD (Non-locking, stOchastic Multi-machine algorithm for Asynchronous and Decentralized matrix completion). NOMAD is a decentralized algorithm with non-blocking communication between processors. One of the key features of NOMAD is that the ownership of a variable is(More)
Learning meaningful topic models with massive document collections which contain millions of documents and billions of tokens is challenging because of two reasons. First, one needs to deal with a large number of topics (typically on the order of thousands). Second, one needs a scalable and efficient way of distributing the computation across multiple(More)
Crowdsourcing-based user studies have become increasingly popular in information visualization (InfoVis) and visual analytics (VA). However, it is still unclear how to deal with some undesired crowdsourcing workers, especially those who submit random responses simply to gain wages (random clickers, henceforth). In order to mitigate the impacts of random(More)
We propose RoBiRank, a ranking algorithm that is motivated by observing a close connection between evaluation metrics for learning to rank and loss functions for robust classification. The algorithm shows a very competitive performance on standard benchmark datasets against other representative algorithms in the literature. On the other hand, in large scale(More)
Embedding words in a vector space has gained a lot of attention in recent years. While state-of-the-art methods provide efficient computation of word similarities via a low-dimensional matrix embedding, their motivation is often left unclear. In this paper, we argue that word embedding can be naturally viewed as a ranking problem due to the ranking nature(More)
We describe the first sub-quadratic sampling algorithm for the Multiplicative Attribute Graph Model (MAGM) of Kim and Leskovec (2010). We exploit the close connection between MAGM and the Kronecker Product Graph Model (KPGM) of Leskovec et al. (2010), and show that to sample a graph from a MAGM it suffices to sample small number of KPGM graphs and quilt(More)
Many machine learning algorithms minimize a regularized risk, and stochastic optimization is widely used for this task. When working with massive data, it is desirable to perform stochastic optimization in parallel. Unfortunately, many existing stochastic algorithms cannot be parallelized efficiently. In this paper we show that one can rewrite the(More)
In this talk I will describe the first sub-quadratic sampling algorithm for the Multiplicative Attribute Graph Model (MAGM, Kim and Leskove 2010). To design our algorithm, we first define a stochastic ball-dropping process (BDP). Although a special case of this process was introduced as an efficient approximate sampling algorithm for the Kronecker Product(More)