The explosive growth of data-mining models involving latent structure for clustering and classification has made model choice a major methodological issue, especially because these models often have different parameterizations and very different specifications and constraints. Here, we work from a general formulation of hierarchical Bayesian mixed-membership models …
There has been an explosive growth of data-mining models involving latent structure for clustering and classification. While they have related objectives, these models use different parameterizations and often very different specifications and constraints. Model choice is thus a major methodological issue and a crucial practical one for applications. In this …
PNAS article classification is rooted in long-standing disciplinary divisions that do not necessarily reflect the structure of modern scientific research. We reevaluate that structure using latent pattern models from statistical machine learning, also known as mixed-membership models, which identify semantic structure in the co-occurrence of words in the …
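As a rough illustration of what "mixed membership" means for text, here is a minimal sketch of a generic LDA-style generative process, assumed for illustration only; the specific model used in the article may differ. Each document draws its own membership vector `theta` over K latent topics, and every word first picks a topic from `theta`, then a word from that topic. The function names, the toy vocabulary, and the Dirichlet parameter `alpha` are all hypothetical.

```python
import random

def dirichlet(alpha, rng):
    # Dirichlet draw via normalized Gamma(alpha_k, 1) variates
    g = [rng.gammavariate(a, 1.0) for a in alpha]
    s = sum(g)
    return [x / s for x in g]

def generate_document(n_words, alpha, topics, rng):
    """Hypothetical mixed-membership generator: a document is a mixture of
    topics (theta), and each word carries its own latent topic z."""
    theta = dirichlet(alpha, rng)  # document-specific membership vector
    words = []
    for _ in range(n_words):
        # pick a topic for this word according to the document's memberships
        z = rng.choices(range(len(topics)), weights=theta)[0]
        # then pick a word from that topic's word distribution
        vocab, probs = zip(*topics[z].items())
        words.append(rng.choices(vocab, weights=probs)[0])
    return theta, words

# Toy example with two hypothetical topics
topics = [{"gene": 0.5, "protein": 0.5}, {"galaxy": 0.5, "star": 0.5}]
theta, words = generate_document(20, [0.5, 0.5], topics, random.Random(0))
print(theta, words)
```

Unlike a hard clustering, where a whole document belongs to one class, words from both topics can appear in the same document in proportions governed by `theta`.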
Internet health forums are a rich textual resource, with content generated through free exchanges among patients and, in certain cases, health professionals. We tackle the problem of retrieving clinically relevant information from such forums, with the relevant topics defined by self-administered clinical questionnaires. Texts in forums are largely unstructured and …
We establish strong large deviation results for an arbitrary sequence of random variables under certain assumptions on the normalized cumulant generating function. In other words, we give asymptotic expansions for the tail probabilities of the same kind as those obtained by Bahadur and Rao (Ann. Math. Stat. 31:1015–1027, 1960) for the sample mean. We consider …
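For orientation, the classical Bahadur–Rao expansion for the mean of n i.i.d. non-lattice variables with cumulant generating function Lambda reads P(mean >= q) ~ exp(-n I(q)) / (eta_q sigma_q sqrt(2 pi n)), where Lambda'(eta_q) = q, sigma_q^2 = Lambda''(eta_q) and I(q) = eta_q q - Lambda(eta_q). The sketch below checks this numerically in the simplest case, assumed here for illustration: i.i.d. Exp(1) variables, where the exact tail is available because the sum is Gamma(n, 1). It illustrates only the classical sample-mean result, not the general theorem of the paper; the function names are hypothetical.

```python
import math

def exact_tail(n, q):
    """P(mean of n iid Exp(1) >= q). The sum S_n is Gamma(n, 1), so
    P(S_n >= t) = exp(-t) * sum_{k=0}^{n-1} t^k / k!, evaluated in log space."""
    t = n * q
    logs = [k * math.log(t) - math.lgamma(k + 1) - t for k in range(n)]
    m = max(logs)
    return math.exp(m) * sum(math.exp(v - m) for v in logs)

def bahadur_rao(n, q):
    """Bahadur-Rao approximation for Exp(1), valid for q > 1.
    Lambda(l) = -log(1 - l), so Lambda'(eta) = q gives eta = 1 - 1/q."""
    eta = 1.0 - 1.0 / q
    sigma = q                     # sqrt(Lambda''(eta)) = 1 / (1 - eta)
    rate = q - 1.0 - math.log(q)  # I(q) = eta * q - Lambda(eta)
    return math.exp(-n * rate) / (eta * sigma * math.sqrt(2 * math.pi * n))

for n in (50, 200, 1000):
    print(n, exact_tail(n, 1.5) / bahadur_rao(n, 1.5))
```

The printed ratio should approach 1 as n grows, whereas the cruder estimate exp(-n I(q)) alone misses the polynomial prefactor entirely; expansions "of the same kind" recover that prefactor.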
The Dirichlet process prior can be used as a prior distribution on the class assignment of a set of objects. This can be naturally implemented in hierarchical Bayesian mixed-membership models (HBMMM), which encompass a wide variety of models with latent structure for clustering and classification. As in most clustering methods, a principal aspect of …
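To make "a prior on class assignments" concrete, here is a minimal sketch of the Chinese restaurant process, the distribution over partitions induced by a Dirichlet process prior with concentration `alpha`. It is shown only as an illustration; the function name and the use of Python's `random.Random` are our own choices, not taken from the source.

```python
import random

def crp_assignments(n, alpha, rng):
    """Draw class labels for n objects from a Chinese restaurant process,
    the partition distribution implied by a DP(alpha) prior."""
    labels = []
    counts = []  # counts[k] = number of objects already assigned to class k
    for i in range(n):
        # Object i joins existing class k with prob counts[k] / (i + alpha)
        # or opens a new class with prob alpha / (i + alpha).
        weights = counts + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)  # new class
        else:
            counts[k] += 1
        labels.append(k)
    return labels

print(crp_assignments(20, 1.0, random.Random(0)))
```

Because the number of classes is not fixed in advance and grows roughly like `alpha * log(n)`, this prior lets the data inform how many clusters are used, which is one way of approaching the model-choice question raised above.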