In this paper, we define the problem of topic-sentiment analysis on Weblogs and propose a novel probabilistic model to capture the mixture of topics and sentiments simultaneously. The proposed Topic-Sentiment Mixture (TSM) model can reveal the latent topical facets in a Weblog collection, the subtopics in the results of an ad hoc query, and their associated…
In 2003, China launched a heavily subsidized voluntary health insurance program for rural residents. We combine differences-in-differences with matching methods to obtain impact estimates, using data collected from program administrators, health facilities and households. The scheme has increased outpatient and inpatient utilization, and has reduced the…
DNA methylation systems are well characterized in vertebrates, but methylation in Drosophila melanogaster and other invertebrates remains controversial. Using the recently sequenced honey bee genome, we present a bioinformatic, molecular, and biochemical characterization of a functional DNA methylation system in an insect. We report on catalytically active…
Honey bees (Apis mellifera) undergo an age-related, socially regulated transition from working in the hive to foraging, which is associated with changes in the expression of thousands of genes in the brain. To begin to study the cis-regulatory code underlying this massive social regulation of gene expression, we used the newly sequenced honey bee genome to…
Biologists often need to find information about genes whose function is not described in the genome databases. Currently they must search disparate biomedical literature to locate relevant articles, and spend considerable effort reading the retrieved articles in order to locate the most relevant knowledge about the gene. We describe our software,…
Cross-species comparison has emerged as a powerful paradigm for predicting cis-regulatory modules (CRMs) and understanding their evolution. The comparison requires reliable sequence alignment, which remains a challenging task for less conserved noncoding sequences. Furthermore, the existing models of DNA sequence evolution generally do not explicitly treat…
The spatial clustering of genes across different genomes has been used to study important problems in comparative genomics, from identification of operons to detection of homologous regions. A set of formal models and algorithms of so-called max-gap clusters have been proposed recently. These algorithms guarantee the completeness of the results, and the…
MOTIVATION: Spatial clusters of genes conserved across multiple genomes provide important clues to gene functions and evolution of genome organization. Existing methods of identifying these clusters often made restrictive assumptions, such as exact conservation of gene order, and relied on heuristic algorithms. RESULTS: We developed a very efficient…
A common task in many text mining applications is to generate a multi-faceted overview of a topic in a text collection. Such an overview not only directly serves as an informative summary of the topic, but also provides a detailed view of navigation to different facets of the topic. Existing work has cast this problem as a categorization problem and…
Most knowledge accumulated through scientific discoveries in genomics and related biomedical disciplines is buried in the vast amount of biomedical literature. Since understanding gene regulations is fundamental to biomedical research, summarizing all the existing knowledge about a gene based on literature is highly desirable to help biologists digest the…