- Published 2011 in AAAI

We address the problem of learning a Mahalanobis distance metric for improving nearest neighbor classification. Our work is built upon the large margin nearest neighbor (LMNN) classification framework. Because of the semidefiniteness constraint in its optimization problem, LMNN does not scale well with the dimensionality of the input data. The original LMNN solver partially alleviates this problem by adopting alternating projection methods instead of standard interior-point methods; still, the computational complexity of each iteration is at least $O(D^3)$, where $D$ is the dimension of the input data. In this work, we propose a column generation based algorithm that solves the LMNN optimization problem much more efficiently. Our algorithm is much more scalable in that it does not need a full eigen-decomposition at each iteration. Instead, it only needs the leading eigenvalue and its corresponding eigenvector, which can be computed with $O(D^2)$ complexity. Experiments show the efficiency and efficacy of our algorithms.

Introduction

Distance metric learning is an important topic in machine learning and has been successfully integrated with classification and clustering methods, including k-nearest neighbor (kNN) classification and k-means clustering. The basic idea is to learn a metric under which distances between examples of the same class are minimized, while distances between examples of different classes are maximized. For instance, given a pair of points $(x_i, x_j)$, we are interested in designing a quadratic Mahalanobis distance
$$\mathrm{dist}_{ij} = \|x_i - x_j\|_M = \sqrt{(x_i - x_j)^\top M (x_i - x_j)},$$
where $M \succeq 0$ is a positive semidefinite (p.s.d.) matrix. Learning such a metric therefore usually involves a constrained semidefinite program (SDP), which makes it a difficult problem to solve.

∗ C. Shen's research was supported in part by the Australian Research Council through its special research initiative in bionic vision science and technology grant to Bionic Vision Australia.
† Z. Hao's contribution was made when visiting NICTA Canberra Research Laboratory.
‡ NICTA is funded by the Australian Government as represented by the Department of Broadband, Communications and the Digital Economy and the Australian Research Council through the ICT Centre of Excellence program.
Copyright © 2011, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

Since the first SDP-based metric learning algorithm was proposed by Xing et al. (2002) for learning a Mahalanobis metric for unsupervised clustering, much research interest has focused on distance metric learning, including Neighborhood Component Analysis (NCA) (Goldberger et al. 2004), Maximally Collapsing Metric Learning (MCML) (Globerson and Roweis 2005), Large Margin Nearest Neighbor (LMNN) (Weinberger, Blitzer, and Saul 2005), and Positive Semidefinite Boosting (PSDBoost) (Shen, Welsh, and Wang 2008). The work presented here is mainly motivated by LMNN and PSDBoost. The idea of LMNN is to maximize the margin between different classes while keeping the distances between same-class instances as small as possible. LMNN has been reported to achieve state-of-the-art classification performance (Weinberger, Blitzer, and Saul 2005). However, a drawback of LMNN is that it does not scale with the dimensionality of the training data because of the semidefiniteness constraint. The original implementation of LMNN partially remedies this problem by employing an alternating projection method instead of conventional interior-point methods. However, the computational complexity is still high: at each iteration, a full eigen-decomposition, costing at least $O(D^3)$ with $D$ the dimension of the input data, is needed to project the intermediate solution onto the positive semidefinite cone.
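To make the quantities above concrete, here is a minimal NumPy sketch (all function names and the test matrix are ours, not the paper's code) of the quadratic Mahalanobis distance and of power iteration, which recovers only the leading eigenpair of a symmetric matrix at $O(D^2)$ cost per iteration, in contrast to the $O(D^3)$ full eigen-decomposition used in the original LMNN projection step.

```python
import numpy as np

# Illustrative sketch (names are ours, not the paper's code). It shows:
#  1. the quadratic Mahalanobis distance dist_ij = sqrt((xi-xj)^T M (xi-xj));
#  2. power iteration, which finds only the leading eigenpair of a symmetric
#     matrix at O(D^2) cost per iteration, instead of the O(D^3) full
#     eigen-decomposition used by the original LMNN projection step.

def mahalanobis_dist(xi, xj, M):
    """sqrt((xi - xj)^T M (xi - xj)) for a p.s.d. matrix M."""
    d = xi - xj
    return np.sqrt(d @ M @ d)

def leading_eigenpair(A, n_iter=500, tol=1e-12):
    """Largest eigenvalue/eigenvector of a symmetric p.s.d. A by power iteration."""
    v = np.random.default_rng(0).standard_normal(A.shape[0])
    v /= np.linalg.norm(v)
    lam_prev = np.inf
    for _ in range(n_iter):
        w = A @ v                      # O(D^2) matrix-vector product
        v = w / np.linalg.norm(w)
        lam = v @ A @ v                # Rayleigh quotient estimate
        if abs(lam - lam_prev) < tol:
            break
        lam_prev = lam
    return lam, v

# A p.s.d. test matrix with known spectrum {5, 2, 1, 0.5, 0.1}.
rng = np.random.default_rng(1)
Q, _ = np.linalg.qr(rng.standard_normal((5, 5)))
M = Q @ np.diag([5.0, 2.0, 1.0, 0.5, 0.1]) @ Q.T

lam, v = leading_eigenpair(M)          # lam should approach 5.0
xi, xj = np.eye(5)[0], np.zeros(5)
dist = mahalanobis_dist(xi, xj, M)     # equals sqrt(M[0, 0]) here
```

Each power-iteration step costs one matrix-vector product, which is why replacing the full eigen-decomposition with only the leading eigenpair drops the per-iteration cost by an order of $D$.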
Another work similar to ours is PSDBoost (Shen, Welsh, and Wang 2008), where the SDP involved in metric learning is converted into an additive linear programming (LP) problem, based on the observation that any positive semidefinite matrix can be decomposed into a nonnegative linear combination of trace-one rank-one matrices. Although a simplified version of PSDBoost, BoostMetric (Shen et al. 2009), was later proposed to perform a stage-wise learning procedure that minimizes an exponential loss function, PSDBoost remains advantageous in convergence speed because of its totally corrective property, and it is flexible enough to optimize any type of loss function. In this work we implement our algorithm using both the exponential loss and the logistic loss. Note that both PSDBoost and BoostMetric solve a simplified version of the original LMNN optimization problem in the sense that they ignore the within-class distance information. In contrast to the within-class distance as a regularization term in LMNN,
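The decomposition that PSDBoost builds on can be checked numerically. The sketch below (illustrative names, not the paper's code) obtains the trace-one rank-one factors of a p.s.d. matrix from its eigen-decomposition: each unit-norm eigenvector $u_i$ yields a factor $u_i u_i^\top$ with unit trace, weighted by its nonnegative eigenvalue.

```python
import numpy as np

# Illustrative check (our names, not the paper's code) of the fact PSDBoost
# exploits: any p.s.d. matrix M equals a nonnegative combination of trace-one
# rank-one matrices, M = sum_i w_i * u_i u_i^T, with w_i = lambda_i >= 0 and
# u_i the unit-norm eigenvectors, so trace(u_i u_i^T) = ||u_i||^2 = 1.

def rank_one_decomposition(M, eps=1e-12):
    """Return weights w_i >= 0 and unit vectors u_i with M = sum_i w_i u_i u_i^T."""
    eigvals, eigvecs = np.linalg.eigh(M)       # symmetric eigen-decomposition
    keep = eigvals > eps                       # drop numerically-zero modes
    return eigvals[keep], eigvecs[:, keep].T   # rows of the second array are u_i

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
M = A @ A.T                                    # p.s.d. by construction
w, U = rank_one_decomposition(M)
M_rebuilt = sum(wi * np.outer(ui, ui) for wi, ui in zip(w, U))
```

One eigen-decomposition gives a valid set of factors, but the decomposition is not unique; PSDBoost instead builds such factors up incrementally, one rank-one base matrix at a time.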

@inproceedings{Park2011EfficientlyLA,
title={Efficiently Learning a Distance Metric for Large Margin Nearest Neighbor Classification},
author={Kyoungup Park and Chunhua Shen and Zhihui Hao and Junae Kim},
booktitle={AAAI},
year={2011}
}