In this paper, we define the problem of topic-sentiment analysis on Weblogs and propose a novel probabilistic model to capture the mixture of topics and sentiments simultaneously. The proposed Topic-Sentiment Mixture (TSM) model can reveal the latent topical facets in a Weblog collection, the subtopics in the results of an ad hoc query, and their associated…
Biologists often need to find information about genes whose function is not described in the genome databases. Currently they must search disparate biomedical literature to locate relevant articles, and spend considerable effort reading the retrieved articles to find the most relevant knowledge about the gene. We describe our software,…
A common task in many text mining applications is to generate a multi-faceted overview of a topic in a text collection. Such an overview not only directly serves as an informative summary of the topic, but also provides a detailed view of navigation to different facets of the topic. Existing work has cast this problem as a categorization problem and…
The spatial clustering of genes across different genomes has been used to study important problems in comparative genomics, from identification of operons to detection of homologous regions. A set of formal models and algorithms of so-called max-gap clusters have been proposed recently. These algorithms guarantee the completeness of the results, and the…
Most knowledge accumulated through scientific discoveries in genomics and related biomedical disciplines is buried in the vast amount of biomedical literature. Since understanding gene regulations is fundamental to biomedical research, summarizing all the existing knowledge about a gene based on literature is highly desirable to help biologists digest the…
MOTIVATION: Spatial clusters of genes conserved across multiple genomes provide important clues to gene functions and evolution of genome organization. Existing methods of identifying these clusters often made restrictive assumptions, such as exact conservation of gene order, and relied on heuristic algorithms. RESULTS: We developed a very efficient…
Hepatocellular carcinoma (HCC) is one of the most common malignancies worldwide and the third leading cause of cancer mortality. Despite continuing development of new therapies, the prognosis for patients with HCC remains extremely poor. In recent years, control of organ size has become a prominent topic in HCC development. The Hippo signaling pathway has been…
The University of Illinois at Urbana-Champaign (UIUC) participated in the TREC 2007 Genomics Track. Our general goal in participating was to apply language model-based approaches to the genomics retrieval task and to study how the standard language models may be extended to accommodate two special needs of this year's genomics retrieval task: (1) gene synonym…
We report experimental results from the collaborative participation of UIUC and MUSC in the TREC 2005 Genomics Track. We participated in both the ad hoc task and the categorization task, and studied the use of some mixture language models in these tasks. Experimental results show that a structured theme-based language modeling approach is effective in…
