Latent variable models have the potential to add value to large document collections by discovering interpretable, low-dimensional subspaces. In order for people to use such models, however, they… (More)

Topic models are a useful tool for analyzing large text collections, but have previously been applied in only monolingual, or at most bilingual, contexts. Meanwhile, massive collections of… (More)

Implementations of topic models typically use symmetric Dirichlet priors with fixed concentration parameters, with the implicit assumption that such “smoothing parameters” have little practical… (More)

A natural evaluation metric for statistical topic models is the probability of held-out documents given a trained model. While exact computation of this probability is intractable, several estimators… (More)

Some models of textual corpora employ text generation methods involving n-gram statistics, while others use latent topic variables inferred using the "bag-of-words" assumption, in which word order is… (More)

The problem of how three-dimensional form is perceived in spite of the fact that pertinent stimulation consists only in two-dimensional retinal images has been only partly solved. Much is known about… (More)

The task of assigning label sequences to a set of observation sequences arises in many fields, including bioinformatics, computational linguistics and speech recognition [6, 9, 12]. For example,… (More)

This thesis introduces new methods for statistically modelling text using topic models. Topic models have seen many successes in recent years, and are used in a variety of applications, including… (More)

Computational social science is an emerging research area at the intersection of computer science, statistics, and the social sciences, in which novel computational methods are used to answer… (More)