Instance-Based Domain Adaptation in NLP via In-Target-Domain Logistic Approximation

Abstract

In the field of NLP, most existing domain adaptation studies focus on feature-based adaptation, while research on instance-based adaptation is scarce. In this work, we propose a new instance-based adaptation model, called in-target-domain logistic approximation (ILA). In ILA, we adapt the source-domain data to the target domain via a logistic approximation. The normalized in-target-domain probability is assigned as an instance weight to each source-domain training instance, and an instance-weighted classification model is then trained for the cross-domain classification problem. Compared with previous techniques, ILA conducts instance adaptation in a dimensionality-reduced linear feature space, which ensures efficiency in high-dimensional NLP tasks. The instance weights in ILA are learnt by leveraging the criteria of both maximum likelihood and minimum statistical distance. Empirical results on two NLP tasks, text categorization and sentiment classification, show that our ILA model has advantages over state-of-the-art instance adaptation methods in cross-domain classification accuracy, parameter stability, and computational efficiency.

Introduction

For many NLP tasks, e.g., text categorization and sentiment classification, it is nowadays easy to obtain a large collection of labeled data from different domains in the vast amount of Internet text. But not all of it is useful for training a desired target-domain classifier. It is therefore necessary to employ an instance adaptation technique that identifies the most important training instances and increases their weights in the training process. However, to the best of our knowledge, most existing work on domain adaptation in NLP employs feature-based adaptation, while research on instance-based adaptation is scarce (Jiang and Zhai, 2007; Pan and Yang, 2010; Xia et al., 2013a).
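The instance-weighting idea described above can be sketched concretely. The following is a minimal NumPy sketch with synthetic data and hypothetical helper names (`fit_logistic`, `fit_weighted_logistic`), not the authors' exact ILA training objective: fit a logistic model that separates target-domain from source-domain examples, assign each source instance its normalized in-target-domain probability as a weight, and train a weighted classifier on the re-weighted source data.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, lr=0.1, epochs=500):
    """Plain gradient-descent logistic regression (hypothetical helper)."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        p = sigmoid(X @ w)
        w -= lr * X.T @ (p - y) / len(y)
    return w

def fit_weighted_logistic(X, y, sample_weight, lr=0.1, epochs=500):
    """Logistic regression with per-instance weights (hypothetical helper)."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        p = sigmoid(X @ w)
        w -= lr * X.T @ (sample_weight * (p - y)) / len(y)
    return w

# Toy features: source and target domains differ by a small mean shift.
rng = np.random.default_rng(0)
X_src = rng.normal(0.0, 1.0, size=(100, 5))
X_tgt = rng.normal(0.5, 1.0, size=(100, 5))

# Step 1: domain classifier (label 1 = target domain).
X_dom = np.vstack([X_src, X_tgt])
y_dom = np.concatenate([np.zeros(100), np.ones(100)])
w_dom = fit_logistic(X_dom, y_dom)

# Step 2: normalized in-target-domain probabilities as instance weights.
p_tgt = sigmoid(X_src @ w_dom)
weights = p_tgt / p_tgt.sum() * len(p_tgt)  # normalize to mean 1

# Step 3: train a weighted classifier on (synthetic) source labels.
y_src = (X_src[:, 0] > 0).astype(float)
w_clf = fit_weighted_logistic(X_src, y_src, weights)
```

The end-to-end structure is the point here: source instances that look more target-like receive larger weights, so the final classifier is biased toward the region of feature space the target domain occupies.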
Instance adaptation methods were mainly proposed by the machine learning community in the past. In machine learning, "instance adaptation" is also termed "covariate shift" or "instance selection bias", where the key problem is density ratio estimation (DRE). A series of kernel-based methods has been proposed to solve the DRE problem (Shimodaira, 2000; Huang et al., 2007; Sugiyama et al., 2007; Tsuboi et al., 2008; Kanamori et al., 2009). Among them, the KLIEP algorithm (Sugiyama et al., 2007) is representative: it estimates the density ratio with a linear model in a Gaussian kernel space. However, the kernel-based methods are mostly designed for tasks with low-dimensional continuous distributions, and it is hard to apply them directly to tasks with high-dimensional discrete distributions. For example, if KLIEP is applied to such tasks, it is difficult to choose a suitable kernel function, and the kernel mapping in a high-dimensional feature space is computationally impractical.

In this work, we propose a new instance adaptation model, called in-target-domain logistic approximation (ILA), to adapt the source-domain training data to the target domain by a logistic approximation. In ILA, instance adaptation is conducted in a linear feature space rather than a complex kernel space. Furthermore, a domain-sensitive feature selection method is proposed to reduce the dimensionality of the linear feature space. Both make ILA efficient for high-dimensional NLP tasks.

More recently, Xia et al. (2013b) proposed an instance weighting approach via PU learning (PUIW) for domain adaptation in sentiment classification. Although PUIW is applicable to high-dimensional NLP tasks, its instance weights are learnt in two separate steps, which makes weight learning inefficient, and the adaptation performance depends heavily on the preset value of a calibration parameter.
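The connection between a logistic domain model and density ratio estimation can be checked numerically. The sketch below illustrates a standard identity (not KLIEP or the ILA estimator itself): with n_s source and n_t target samples, Bayes' rule gives p_t(x)/p_s(x) = (n_s/n_t) * P(target|x)/P(source|x), so with equal sample sizes the log density ratio is just the logit of the domain classifier. Here the source is N(0,1) and the target is N(1,1), so the true log-ratio is x - 0.5.

```python
import numpy as np

# Source ~ N(0, 1), target ~ N(1, 1); true log density ratio is x - 0.5.
rng = np.random.default_rng(1)
x_src = rng.normal(0.0, 1.0, 2000)
x_tgt = rng.normal(1.0, 1.0, 2000)

# Logistic regression on features [x, 1] separating the two samples.
X = np.column_stack([np.concatenate([x_src, x_tgt]), np.ones(4000)])
y = np.concatenate([np.zeros(2000), np.ones(2000)])

w = np.zeros(2)
for _ in range(3000):
    p = 1.0 / (1.0 + np.exp(-(X @ w)))
    w -= 0.5 * X.T @ (p - y) / len(y)

# With equal sample sizes, the estimated log-ratio is the logit w[0]*x + w[1];
# it should approximately recover the true slope 1 and intercept -0.5.
grid = np.array([-1.0, 0.0, 1.0])
est_log_ratio = w[0] * grid + w[1]
true_log_ratio = grid - 0.5
```

This is why a logistic approximation in a low-dimensional linear feature space is a plausible substitute for kernel-based DRE: the discriminative model's output carries the density-ratio information directly, without choosing a kernel.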
In ILA, the instance weights are learnt by leveraging the criteria of both maximum likelihood and minimum statistical distance.

Rui Xia, Jianfei Yu, Feng Xu, and Shumei Wang
Proceedings of the Twenty-Eighth AAAI Conference on Artificial Intelligence
Copyright © 2014, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.


Cite this paper

@inproceedings{Xia2014InstanceBasedDA,
  title={Instance-Based Domain Adaptation in NLP via In-Target-Domain Logistic Approximation},
  author={Rui Xia and Jianfei Yu and Feng Xu and Shumei Wang},
  booktitle={AAAI},
  year={2014}
}