
We develop an efficient parallel distributed algorithm for matrix completion, named NOMAD (Non-locking, stOchastic Multi-machine algorithm for Asynchronous and Decentralized matrix completion). NOMAD is a decentralized algorithm with non-blocking communication between processors. One of the key features of NOMAD is that the ownership of a variable is… (More)

Learning meaningful topic models from massive document collections that contain millions of documents and billions of tokens is challenging for two reasons. First, one needs to deal with a large number of topics (typically on the order of thousands). Second, one needs a scalable and efficient way of distributing the computation across multiple… (More)

Crowdsourcing-based user studies have become increasingly popular in information visualization (InfoVis) and visual analytics (VA). However, it is still unclear how to deal with some undesired crowdsourcing workers, especially those who submit random responses simply to gain wages (random clickers, henceforth). In order to mitigate the impacts of random… (More)

We propose RoBiRank, a ranking algorithm that is motivated by observing a close connection between evaluation metrics for learning to rank and loss functions for robust classification. It shows competitive performance on standard benchmark datasets against a number of other representative algorithms in the literature. We also discuss extensions of RoBiRank… (More)

We propose RoBiRank, a ranking algorithm that is motivated by observing a close connection between evaluation metrics for learning to rank and loss functions for robust classification. The algorithm shows very competitive performance on standard benchmark datasets against other representative algorithms in the literature. On the other hand, in large scale… (More)

We describe the first sub-quadratic sampling algorithm for the Multiplicative Attribute Graph Model (MAGM) of Kim and Leskovec (2010). We exploit the close connection between MAGM and the Kronecker Product Graph Model (KPGM) of Leskovec et al. (2010), and show that to sample a graph from a MAGM it suffices to sample a small number of KPGM graphs and quilt… (More)

Embedding words in a vector space has gained a lot of attention in recent years. While state-of-the-art methods provide efficient computation of word similarities via a low-dimensional matrix embedding, their motivation is often left unclear. In this paper, we argue that word embedding can be naturally viewed as a ranking problem due to the ranking nature… (More)

Many machine learning algorithms minimize a regularized risk, and stochastic optimization is widely used for this task. When working with massive data, it is desirable to perform stochastic optimization in parallel. Unfortunately, many existing stochastic algorithms cannot be parallelized efficiently. In this paper we show that one can rewrite the… (More)

Multinomial logistic regression is a popular tool in the arsenal of machine learning algorithms, yet scaling it to datasets with a very large number of data points and classes has not been trivial. This is primarily because one needs to compute the log-partition function on every data point, which makes distributing the computation hard. In this paper, we… (More)

- Hsiang-Fu Yu, Hyokun Yun
- 2016

Analyzing the massive datasets of today's applications will require scalable and sophisticated machine-learning methods. NOMAD, a novel nomadic framework, combines two common approaches: stochastic optimization and distributed computing. Today's applications often contain datasets that are too big to fit in a single computer's main memory. Analyzing these… (More)