Fuzhen Zhuang

Learn More
Cross-domain text categorization targets on adapting the knowledge learnt from a labeled source-domain to an unlabeled target-domain, where the documents from the source and target domains are drawn from different distributions. However, in spite of the different distributions in raw word features, the associations between word clusters (conceptual(More)
Recent years have witnessed an increased interest in transfer learning. Despite the vast amount of research performed in this field, there are remaining challenges in applying the knowledge learnt from multiple source domains to a target domain. First, data from multiple source domains can be semantically related, but have different distributions. It is not(More)
Transfer learning has attracted a lot of attention in the past decade. One crucial research issue in transfer learning is how to find a good representation for instances of different domains such that the divergence between domains can be reduced with the new representation. Recently, deep learning has been proposed to learn more robust or higherlevel(More)
The distribution difference among multiple data domains has been considered for the cross-domain text classification problem. In this study, we show two new observations along this line. First, the data distribution difference may come from the fact that different domains use different key words to express the same concept. Second, the association between(More)
Transfer learning focuses on the learning scenarios when the test data from target domains and the training data from source domains are drawn from similar but different data distribution with respect to the raw features. Some recent studies argued that the high-level concepts (e.g. word clusters) can help model the data distribution difference, and thus(More)
The distribution difference among multiple domains has been exploited for cross-domain text categorization in recent years. Along this line, we show two new observations in this study. First, the data distribution difference is often due to the fact that different domains use different index words to express the same concept. Second, the association between(More)
Time series shapelets are small and local time series subsequences which are in some sense maximally representative of a class. E.Keogh uses distance of the shapelet to classify objects. Even though shapelet classification can be interpretable and more accurate than many state-of-the-art classifiers, there is one main limitation of shapelets, i.e. shapelet(More)
Ensemble learning with output from multiple supervised and unsupervised models aims to improve the classification accuracy of supervised model ensemble by jointly considering the grouping results from unsupervised models. In this paper we cast this ensemble task as an unconstrained probabilistic embedding problem. Specifically, we assume both objects and(More)