Learn More
Stacked denoising autoencoders (SDAs) have been successfully used to learn new representations for domain adaptation. Recently, they have attained record accuracy on standard benchmark tasks of sentiment analysis across different text domains. SDAs learn robust data representations by reconstruction, recovering original features from data that are(More)
Automatic image annotation is a difficult and highly relevant machine learning task. Recent advances have significantly improved the state-of-the-art in retrieval accuracy with algorithms based on nearest neighbor classification in carefully learned metric spaces. But this comes at a price of increased computational complexity during training and testing.(More)
One of the most successful semi-supervised learning approaches is co-training for multiview data. In co-training, one trains two classifiers, one for each view, and uses the most confident predictions of the unlabeled data for the two classifiers to “teach each other”. In this paper, we extend co-training to learning scenarios without an explicit multi-view(More)
The goal of machine learning is to develop predictors that generalize well to test data. Ideally, this is achieved by training on very large (infinite) training data sets that capture all variations in the data distribution. In the case of finite training data, an effective solution is to extend the training set with artificially created examples—which,(More)
Recently, machine learning algorithms have successfully entered large-scale real-world industrial applications (e.g. search engines and email spam filters). Here, the CPU cost during test-time must be budgeted and accounted for. In this paper, we address the challenge of balancing the test-time cost and the classifier accuracy in a principled fashion. The(More)
Denoising auto-encoders (DAEs) have been successfully used to learn new representations for a wide range of machine learning tasks. During training, DAEs make many passes over the training dataset and reconstruct it from partial corruption generated from a pre-specified corrupting distribution. This process learns robust representation, though at the(More)
Machine learning algorithms are increasingly used in large-scale industrial settings. Here, the operational cost during test-time has to be taken into account when an algorithm is designed. This operational cost is affected by the average running time and the computation time required for feature extraction. When a diverse set of features is used, the(More)
Machine learning algorithms have successfully entered industry through many real-world applications (e.g. , search engines and product recommendations). In these applications, the test-time CPU cost must be budgeted and accounted for. In this paper, we examine two main components of the test-time CPU cost, classifier evaluation cost and feature extraction(More)
Duality is an important notion for nonlinear programming (NLP). It provides a theoretical foundation for many optimization algorithms. Duality can be used to directly solve NLPs as well as to derive lower bounds of the solution quality which have wide use in other high-level search techniques such as branch and bound. However, the conventional duality(More)