Learn More
How can one build a distributed framework that allows efficient deployment of a wide spectrum of modern advanced machine learning (ML) programs for industrial-scale problems using Big Models (100s of billions of parameters) on Big Data (terabytes or petabytes)- Contemporary parallelization strategies employ fine-grained operations and scheduling beyond the(More)
A major bottleneck to applying advanced ML programs at industrial scales is the migration of an academic implementation, often specialized for a small, well-controlled computer platform such as desktop PCs and small lab-clusters, to a big, less predicable platform such as a corporate cluster or the cloud. This poses enormous challenges: how does one train(More)
Logistic-normal topic models can effectively discover correlation structures among latent topics. However, their inference remains a challenge because of the non-conjugacy between the logistic-normal prior and multinomial topic mixing proportions. Existing algorithms either make restricting mean-field assumptions or are not scalable to large-scale(More)
Distributed machine learning has typically been approached from a data parallel perspective, where big data are partitioned to multiple workers and an algorithm is executed concurrently over different data subsets under various synchronization schemes to ensure speed-up and/or correctness. A sibling problem that has received relatively less attention is how(More)
When building large-scale machine learning (ML) programs, such as massive topics models or deep networks with up to trillions of parameters and training examples, one usually assumes that such massive tasks can only be attempted with industrial-sized clusters with thousands of nodes, which are out of reach for most practitioners or academic researchers. We(More)
Machine learning (ML) algorithms are commonly applied to big data, using distributed systems that partition the data across machines and allow each machine to read and update all ML model parameters --- a strategy known as data parallelism. An alternative and complimentary strategy, model parallelism, partitions the model parameters for non-shared parallel(More)
Regularization has played a key role in deriving sensible estimators in high dimensional statistical inference. A substantial amount of recent works has argued for nonconvex regularizers in favor of their superior theoretical properties and excellent practical performances. In a different but analogous vein, nonconvex loss functions are promoted because of(More)
Supervised topic models with a logistic likelihood have two issues that potentially limit their practical use: 1) response variables are usually over-weighted by document word counts; and 2) existing variational inference methods make strict mean-field assumptions. We address these issues by: 1) introducing a regularization constant to better balance the(More)