Another Hierarchical Topic Model

Abstract

We describe a hierarchical topic model. We assume that a document collection has several levels of specificity. For example, a collection of mailing list posts might be organized by sentence, paragraph, post, and thread. Our model captures the structure at each level of the hierarchy, using a trace norm penalty on a matrix composed of natural parameters for the multinomial model.

1 The Basic Model

We consider a probabilistic model of text. We assume that a set of documents is generated in two stages. First, a set of document models is generated according to a prior model. Then, words for each document are generated according to that document’s model. We assume that, conditioned on a document’s model, its term frequencies are generated independently of those of other documents. We use a trace norm to penalize document models’ divergence from the prior model, effectively placing a Gaussian prior on the singular vectors of the matrix composed of stacked document model parameter vectors. We use the rest of this section to describe the model in detail.

Let φ be the multinomial natural parameter vector for the “prior” model; φ serves as a kind of “center” from which the individual document models emanate. Each document has its own multinomial model, with natural parameter vector θi. We define the prior on the document models as a characterization of where the document models are located with respect to the prior
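
To make the two-stage description above concrete, the following is a minimal sketch of a penalized objective of the kind described. It assumes that the penalized matrix is formed by stacking the differences θi − φ, one row per document, and that word probabilities are obtained from natural parameters by the usual softmax map. The function names and the penalty weight `lam` are hypothetical and do not come from the text.

```python
import numpy as np


def log_softmax(theta):
    """Map multinomial natural parameters to log word probabilities."""
    shifted = theta - np.max(theta, axis=-1, keepdims=True)
    return shifted - np.log(np.sum(np.exp(shifted), axis=-1, keepdims=True))


def trace_norm(matrix):
    """Trace (nuclear) norm: the sum of the singular values."""
    return np.linalg.svd(matrix, compute_uv=False).sum()


def penalized_neg_log_likelihood(counts, theta, phi, lam):
    """Negative multinomial log-likelihood plus a trace norm penalty.

    counts: (n_docs, n_words) term-frequency matrix
    theta:  (n_docs, n_words) stacked document parameter vectors
    phi:    (n_words,) parameter vector of the "prior" model
    lam:    penalty weight (hypothetical; not specified in the text)
    """
    # Documents are independent given their models, so log-likelihoods add.
    # The multinomial coefficient is constant in theta and is omitted.
    log_lik = np.sum(counts * log_softmax(theta))
    # Assumption: the penalty acts on each document's offset from phi.
    penalty = lam * trace_norm(theta - phi)
    return -log_lik + penalty
```

When every θi equals φ, the stacked difference matrix is zero and the penalty vanishes, so under these assumptions the penalty grows only as document models move away from the “center”.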

Cite this paper

@inproceedings{Rennie2005AnotherHT, title={Another Hierarchical Topic Model}, author={Jason D. M. Rennie}, year={2005} }