Yang Bao

Cross-domain text classification aims to automatically train a precise text classifier for a target domain by using labelled text data from a related source domain. To this end, one of the most promising ideas is to induce a new feature representation so that the distributional difference between domains can be reduced and a more accurate classifier can be …
Trust has been used to replace or complement rating-based similarity in recommender systems in order to improve the accuracy of rating prediction. However, people who trust each other may not always share similar preferences. In this paper, we try to fill this gap by decomposing the original single-aspect trust information into four general trust aspects, i.e. …
Although users' preferences are semantically reflected in free-form review texts, this wealth of information has not been fully exploited for learning recommender models. Specifically, almost all existing recommendation algorithms exploit only rating scores to find users' preferences and ignore the review texts that accompany the ratings. In …
It is indispensable for users to evaluate the trustworthiness of other users (referred to as advisors) in order to cope with possibly misleading opinions provided by them. Advisors' misleading opinions may be induced by their dishonesty, by subjectivity differences with users, or by both. Existing approaches do not distinguish well between these two causes. In this …
DNA methylation has been suggested as a promising biomarker for lung cancer diagnosis. However, it is a great challenge to find the optimal combination of methylation biomarkers that yields maximum diagnostic performance. In this study, we developed a panel of DNA methylation biomarkers and validated their diagnostic efficiency for non-small cell lung …
Keywords: Online review system; Electronic commerce; National culture difference; Cross-cultural study; Empirical analysis. Abstract: Online reviews, as one kind of quality indicator of products or services, are becoming increasingly important in influencing the purchase decisions of prospective consumers on electronic commerce websites. With the fast growth of …
In this paper, we propose a novel problem of summarizing textual corporate risk factor disclosures, which aims to simultaneously infer the risk types across the corpus and assign each risk factor to its most probable risk type. To solve the problem, we develop a variation of the LDA topic model called Sent-LDA. The variational EM learning algorithm, which guarantees …
In this paper, we describe our submission to the TREC 2011 Microblog track. We first use URLs as a clue to discover and remove spam tweets. Then we use both Lucene and Indri to generate a ranked list of results for each query, together with their relevance scores. After that, we use the scores to find useful hashtags relevant to the query, therefore …