Learn More
Latent Dirichlet analysis, or topic modeling, is a flexible latent variable framework for model-ing high-dimensional sparse count data. Various learning algorithms have been developed in recent years, including collapsed Gibbs sampling, variational inference, and maximum a posteriori estimation, and this variety motivates the need for careful empirical(More)
We describe distributed algorithms for two widely-used topic models, namely the Latent Dirich-let Allocation (LDA) model, and the Hierarchical Dirichet Process (HDP) model. In our distributed algorithms the data is partitioned across separate processors and inference is done in a parallel, distributed fashion. We propose two distributed algorithms for LDA.(More)
We investigate the problem of learning a widely-used latent-variable model – the Latent Dirichlet Allocation (LDA) or " topic " model – using distributed computation , where each of ¢ processors only sees £ ¥ ¤ ¦ ¢ of the total data set. We propose two distributed inference schemes that are motivated from different perspectives. The first scheme uses local(More)
In this paper we introduce a novel collapsed Gibbs sampling method for the widely used latent Dirichlet allocation (LDA) model. Our new method results in significant speedups on real world text corpora. Conventional Gibbs sampling schemes for LDA require O(K) operations per sample where K is the number of topics in the model. Our proposed method draws(More)
Distributed learning is a problem of fundamental interest in machine learning and cognitive science. In this paper, we present asynchronous distributed learning algorithms for two well-known unsupervised learning frameworks: Latent Dirichlet Allocation (LDA) and Hierarchical Dirichlet Processes (HDP). In the proposed approach, the data are distributed(More)
Real-world relational data sets, such as social networks, often involve measurements over time. We propose a Bayesian nonparamet-ric latent feature model for such data, where the latent features for each actor in the network evolve according to a Markov process, extending recent work on similar models for static networks. We show how the number of features(More)
Software traceability is a fundamentally important task in software engineering. The need for automated traceability increases as projects become more complex and as the number of artifacts increases. We propose an automated technique that combines traceability with a machine learning technique known as topic modeling. Our approach automatically records(More)
We present <i>TopicNets</i>, a Web-based system for visual and interactive analysis of large sets of documents using statistical topic models. A range of visualization types and control mechanisms to support knowledge discovery are presented. These include corpus- and document-specific views, iterative topic modeling, search, and visual filtering.(More)
The development of statistical models for continuous-time longitudinal network data is of increasing interest in machine learning and social science. Leveraging ideas from survival and event history analysis, we introduce a continuous-time regression modeling framework for network event data that can incorporate both time-dependent network statistics and(More)
Composite likelihood methods provide a wide spectrum of computationally efficient techniques for statistical tasks such as parameter estimation and model selection. In this paper, we present a formal connection between the optimization of composite likelihoods and the well-known con-trastive divergence algorithm. In particular, we show that composite(More)