
Latent variable models have the potential to add value to large document collections by discovering interpretable, low-dimensional subspaces. In order for people to use such models, however, they must trust them. Unfortunately, typical dimensionality reduction methods for text, such as latent Dirichlet allocation, often produce low-dimensional subspaces…

Topic models are a useful tool for analyzing large text collections, but have previously been applied in only monolingual, or at most bilingual, contexts. Meanwhile, massive collections of interlinked documents in dozens of languages, such as Wikipedia, are now widely available, calling for tools that can characterize content in many languages. We introduce…

- Hanna M. Wallach, David M. Mimno, Andrew McCallum
- NIPS
- 2009

Implementations of topic models typically use symmetric Dirichlet priors with fixed concentration parameters, with the implicit assumption that such “smoothing parameters” have little practical effect. In this paper, we explore several classes of structured priors for topic models. We find that an asymmetric Dirichlet prior over the document–topic…

- Hanna M. Wallach
- ICML
- 2006

Some models of textual corpora employ text generation methods involving *n*-gram statistics, while others use latent topic variables inferred using the "bag-of-words" assumption, in which word order is ignored. Previously, these methods have not been combined. In this work, I explore a hierarchical generative probabilistic model that incorporates both…

A natural evaluation metric for statistical topic models is the probability of held-out documents given a trained model. While exact computation of this probability is intractable, several estimators for this probability have been used in the topic modeling literature, including the harmonic mean method and empirical likelihood method. In this paper, we…

- Hanna M. Wallach
- 2004

The task of assigning label sequences to a set of observation sequences arises in many fields, including bioinformatics, computational linguistics and speech recognition [6, 9, 12]. For example, consider the natural language processing task of labeling the words in a sentence with their corresponding part-of-speech (POS) tags. In this task, each word is…

- Hanna M. Wallach
- Annual review of psychology
- 1987

- Hanna M. Wallach
- 2008

This thesis introduces new methods for statistically modelling text using topic models. Topic models have seen many successes in recent years, and are used in a variety of applications, including analysis of news articles, topic-based search interfaces and navigation tools for digital libraries. Despite these recent successes, the field of topic modelling…

- Hanna M. Wallach
- 2002

This thesis explores a number of parameter estimation techniques for conditional random fields, a recently introduced [31] probabilistic model for labelling and segmenting sequential data. Theoretical and practical disadvantages of the training techniques reported in current literature on CRFs are discussed. We hypothesise that general numerical…

- Ryan P. Adams, Hanna M. Wallach, Zoubin Ghahramani
- AISTATS
- 2010

Deep belief networks are a powerful way to model complex probability distributions. However, it is difficult to learn the structure of a belief network, particularly one with hidden units. The Indian buffet process has been used as a nonparametric Bayesian prior on the structure of a directed belief network with a single infinitely wide hidden layer. Here,…