Ying-Lang Chang

Automatic summarization aims to extract representative content or sentences from a large corpus of documents. This paper presents a new hierarchical representation of the words, sentences and documents in a corpus, and infers the Dirichlet distributions for latent topics and latent themes at the word level and sentence level, respectively. The…
Backoff smoothing and topic modeling are crucial issues in n-gram language modeling. This paper presents a Bayesian non-parametric learning approach to tackle these two issues. We develop a topic-based language model in which the numbers of topics and n-grams are automatically determined from data. To cope with this model selection problem, we introduce the…
Language modeling is a popular method of exploiting linguistic regularities for document retrieval. To improve retrieval performance, relevance feedback is adopted: the query language model is adjusted using information fed back from the retrieved documents. This study presents a new Bayesian learning approach to instantaneous and…