In crowdsourced data aggregation tasks, the answers that large numbers of sources provide to the same set of questions conflict with one another. The most important challenge in this task is to estimate source reliability and select the answers provided by high-quality sources. Existing work solves this problem by simultaneously estimating sources'…
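The general idea of jointly estimating source reliability and answers can be illustrated with a small iterative loop. It alternates between two steps: picking each question's answer by weighted vote, and re-weighting each source by how often it disagrees with those answers. This is a generic sketch for categorical answers, not the method of the paper above. The function name `truth_discovery`, the Laplace-smoothed error rate, and the log-ratio weight update are all assumptions chosen for illustration.

```python
import math
from collections import defaultdict


def truth_discovery(claims, n_iter=10):
    """Illustrative iterative weighted voting over categorical claims (hypothetical helper).

    claims: dict mapping question -> {source: answer}.
    Returns (truths, weights): the estimated answer per question and a
    reliability weight per source (higher means more reliable).
    """
    sources = sorted({s for answers in claims.values() for s in answers})
    weights = {s: 1.0 for s in sources}  # start with uniform trust
    truths = {}
    for _ in range(n_iter):
        # Step 1: for each question, choose the answer with the largest total source weight.
        for q, answers in claims.items():
            votes = defaultdict(float)
            for s, a in answers.items():
                votes[a] += weights[s]
            # Iterating over sorted answers makes tie-breaking deterministic.
            truths[q] = max(sorted(votes), key=lambda a: votes[a])
        # Step 2: count each source's disagreements with the current truths.
        errors = defaultdict(int)
        counts = defaultdict(int)
        for q, answers in claims.items():
            for s, a in answers.items():
                counts[s] += 1
                errors[s] += int(a != truths[q])
        # Assumed Laplace smoothing keeps every error rate strictly between 0 and 1.
        rates = {s: (errors[s] + 1) / (counts[s] + 2) for s in sources}
        total = sum(rates.values())
        # Step 3: assumed log-ratio update; a lower relative error rate gives a larger weight.
        weights = {s: -math.log(rates[s] / total) for s in sources}
    return truths, weights


if __name__ == "__main__":
    claims = {
        "q1": {"s1": "A", "s2": "A", "s3": "B"},
        "q2": {"s1": "C", "s2": "C", "s3": "D"},
        "q3": {"s1": "E", "s2": "F", "s3": "F"},
    }
    truths, weights = truth_discovery(claims)
    print(truths)  # {'q1': 'A', 'q2': 'C', 'q3': 'F'}
    print(weights)
```

In this toy input, source `s2` agrees with every estimated answer and ends up with the largest weight, while `s3` disagrees most often and ends up with the smallest. Real methods differ in their loss functions, their handling of continuous data, and their convergence criteria.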
In the past decade, commercial crowdsourcing platforms have revolutionized the way data are classified and annotated, especially for large datasets. Obtaining labels for a single instance can be inexpensive, but for large datasets it is important to allocate budgets wisely. With limited budgets, requesters must trade off between the quantity of labeled…
The demand for automatically extracting true information (i.e., truths) from conflicting multi-source data has soared recently. A variety of truth discovery methods have achieved great success by jointly estimating source reliability and truths. All existing truth discovery methods focus on providing a point estimator for each object's truth, but…
Predicting the future health information of patients from the historical Electronic Health Records (EHR) is a core research task in the development of personalized healthcare. Patient EHR data consist of sequences of visits over time, where each visit contains multiple medical codes, including diagnosis, medication, and procedure codes. The most important…
Topical Influential User Analysis (TIUA) is an important technique for Twitter. Existing techniques neglect the relationship strength between users, which is a crucial aspect of TIUA. In modeling relationship strength, previous work has not considered the interaction frequency between users. In this paper, we first introduce a Poisson regression-based…
Monitoring the future health status of patients from the historical Electronic Health Record (EHR) is a core research topic in predictive healthcare. The most important challenges are to model the temporality of sequential EHR data and to interpret the prediction results. To reduce the future risk of diseases, we propose a multi-task framework that…
Drug side effects have become a worldwide public health concern and are the fourth leading cause of death in the United States. The pharmaceutical industry devotes tremendous effort to identifying drug side effects during drug development. However, it is impractical, if not impossible, to identify all of them. Fortunately, drug side effects can also be reported on…
Discovering topics in short texts, such as news titles and tweets, has become an important task for many content analysis applications. However, due to the lack of rich context information in short texts, the performance of conventional topic models on short texts is usually unsatisfactory. In this paper, we propose a novel topic model for short text corpus…
In the age of big data, information about the same entity can be obtained from different sources, and this information inevitably conflicts. Therefore, aggregation methods are needed to identify trustworthy information in such conflicting data. Truth discovery, which improves the aggregation results by estimating source trustworthiness and discovering truths…