Real-world Web mining applications usually have different requirements, such as massive data processing, low system latency, and high scalability. To meet these requirements, we propose a distributed text mining system with a layered architecture that divides the system functions into three layers, namely, the crawling and storage...
Community detection is a long-standing yet very difficult task in social network analysis. It becomes more challenging as many online social networking sites grow to super-large scale. Numerous methods have been proposed for community detection from massive networks, but how to reconcile the partitioning efficiency and the community quality...
While agreement-based joint training has proven to deliver state-of-the-art alignment accuracy, the produced word alignments are usually restricted to one-to-one mappings because of the hard constraint on agreement. We propose a general framework to allow for arbitrary loss functions that measure the disagreement between asymmetric alignments. The loss...
Probabilistic frequent pattern mining over uncertain data has received a great deal of attention recently because of the wide range of applications of uncertain data. Like its counterpart in deterministic databases, however, probabilistic frequent pattern mining suffers from the problem of generating an exponential number of result patterns. The large number...
BACKGROUND: Patients with ulcerative colitis (UC) are predisposed to colitis-associated colorectal cancer (CAC). However, the transcriptional mechanism of the transformation from UC to CAC is not fully understood. METHODOLOGY: First, we showed that CAC and non-UC-associated colorectal cancer (CRC) were very similar in gene expression. Second, based on multiple datasets for...
We introduce an agreement-based approach to learning parallel lexicons and phrases from non-parallel corpora. The basic idea is to encourage two asymmetric latent-variable translation models (i.e., source-to-target and target-to-source) to agree on identifying latent phrase and word alignments. The agreement is defined at both word and phrase levels. We...