Corpus ID: 73728807

Everything old is new again: A multi-view learning approach to learning using privileged information and distillation

@article{Wang2019EverythingOI,
  title={Everything old is new again: A multi-view learning approach to learning using privileged information and distillation},
  author={Weiran Wang},
  journal={ArXiv},
  year={2019},
  volume={abs/1903.03694}
}
We adopt a multi-view approach for analyzing two knowledge transfer settings, learning using privileged information (LUPI) and distillation, in a common framework. Under reasonable assumptions about the complexities of hypothesis spaces, and being optimistic about the expected loss achievable by the student (in distillation) and a transformed teacher predictor (in LUPI), we show that encouraging agreement between the teacher and the student leads to a reduced search space. As a result, improved convergence rates can be obtained with a reduced number of training samples.
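To make the abstract's central claim concrete, here is a minimal mathematical sketch of how an agreement constraint shrinks the student's search space; the notation (student class F, teacher g, agreement radius eps) is illustrative rather than the paper's exact formulation.

```latex
% Illustrative sketch: agreement with a fixed teacher restricts the
% student's hypothesis space (notation assumed, not quoted from the paper).
\documentclass{article}
\usepackage{amsmath,amssymb}
\begin{document}
Given a student class $\mathcal{F}$ and a fixed teacher predictor $g$,
the agreement-constrained class on a sample $x_1,\dots,x_n$ is
\[
  \mathcal{F}_\varepsilon =
  \Bigl\{ f \in \mathcal{F} :
    \tfrac{1}{n}\textstyle\sum_{i=1}^{n}\bigl(f(x_i)-g(x_i)\bigr)^2
    \le \varepsilon \Bigr\}
  \subseteq \mathcal{F},
\]
so any capacity measure, e.g.\ the empirical Rademacher complexity,
satisfies $\widehat{\mathfrak{R}}_n(\mathcal{F}_\varepsilon) \le
\widehat{\mathfrak{R}}_n(\mathcal{F})$: this monotonicity is the sense in
which teacher-student agreement reduces the search space.
\end{document}
```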
Citations

Leveraging Just a Few Keywords for Fine-Grained Aspect Detection Through Weakly Supervised Co-Training
TLDR
This work considers weakly supervised approaches for training aspect classifiers that require the user to provide only a small set of seed words for aspect detection, and proposes a student-teacher approach that leverages the seed words in a bag-of-words classifier (the teacher), which is then used to train a second, potentially more powerful model as the student (e.g., a neural network that uses pre-trained word embeddings).
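A minimal sketch of the student-teacher pattern this summary describes, assuming scikit-learn; the seed words, documents, and model choices are placeholders for illustration, not the paper's implementation.

```python
# Teacher: a bag-of-words voter over user-provided seed words.
# Student: a more expressive model trained on the teacher's pseudo-labels.
# SEED_WORDS and docs are hypothetical examples, not from the paper.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

SEED_WORDS = {"battery": 0, "screen": 1, "price": 2}  # word -> aspect id

def teacher_predict(tokens):
    """Vote by seed-word counts; abstain (None) if no seed word appears."""
    votes = np.zeros(3)
    for tok in tokens:
        if tok in SEED_WORDS:
            votes[SEED_WORDS[tok]] += 1
    return int(votes.argmax()) if votes.sum() > 0 else None

docs = ["battery drains fast", "bright screen nice screen", "price too high"]
pseudo = [teacher_predict(d.split()) for d in docs]

# Train the student only on documents where the teacher did not abstain.
keep = [i for i, y in enumerate(pseudo) if y is not None]
X = CountVectorizer().fit_transform([docs[i] for i in keep])
student = LogisticRegression(max_iter=1000).fit(X, [pseudo[i] for i in keep])
```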
Large Scale Long-tailed Product Recognition System at Alibaba
TLDR
Presents a novel side-information-based large-scale visual recognition co-training (SICoT) system that addresses the long-tail problem by leveraging image-related side information and a semantic embedding learned from that noisy side information.

References

Showing 1-10 of 22 references
Unifying distillation and privileged information
TLDR
Provides theoretical and causal insight into the inner workings of generalized distillation, extends it to unsupervised, semi-supervised, and multitask learning scenarios, and illustrates its efficacy in a variety of numerical simulations on both synthetic and real-world data.
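For orientation, a sketch of the generalized-distillation recipe commonly associated with this reference; the notation (teacher f_t on privileged features, temperature T, imitation weight lambda) is assumed rather than quoted.

```latex
% Generalized distillation (sketch, notation assumed): the student fits a
% convex mix of hard labels y_i and teacher soft labels s_i.
\documentclass{article}
\usepackage{amsmath}
\begin{document}
\[
  s_i = \sigma\!\bigl(f_t(x_i^\star)/T\bigr), \qquad
  f_s = \arg\min_{f \in \mathcal{F}_s} \frac{1}{n}\sum_{i=1}^{n}
  \Bigl[(1-\lambda)\,\ell\bigl(y_i,\sigma(f(x_i))\bigr)
      + \lambda\,\ell\bigl(s_i,\sigma(f(x_i))\bigr)\Bigr],
\]
where $f_t$ is a teacher trained on privileged features $x_i^\star$,
$T>0$ is a softmax temperature, and $\lambda\in[0,1]$ trades imitation of
the teacher against fitting the original labels.
\end{document}
```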
On the theory of learning with Privileged Information
In the Learning Using Privileged Information (LUPI) paradigm, along with the standard training data in the decision space, a teacher supplies a learner with privileged information in the correcting space. …
A Co-Regularization Approach to Semi-supervised Learning with Multiple Views
The Co-Training algorithm uses unlabeled examples in multiple views to bootstrap classifiers in each view, typically in a greedy manner, operating under assumptions of view independence and compatibility. …
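A sketch of the joint co-regularized objective that this approach uses in place of greedy bootstrapping, in the style of co-regularized least squares; the weights gamma_1, gamma_2, mu are assumed notation.

```latex
% Co-regularized least squares (sketch): two RKHS predictors fit the
% labeled data while a penalty couples them on unlabeled points.
\documentclass{article}
\usepackage{amsmath}
\begin{document}
\[
  \min_{f\in\mathcal{H}_1,\; g\in\mathcal{H}_2}
  \sum_{i=1}^{\ell}\Bigl[\bigl(f(x_i)-y_i\bigr)^2+\bigl(g(x_i)-y_i\bigr)^2\Bigr]
  + \gamma_1\|f\|_{\mathcal{H}_1}^2 + \gamma_2\|g\|_{\mathcal{H}_2}^2
  + \mu\sum_{j=\ell+1}^{\ell+u}\bigl(f(x_j)-g(x_j)\bigr)^2
\]
The final term penalizes disagreement between the two views on the $u$
unlabeled points, turning Co-Training's greedy loop into a single convex
objective.
\end{document}
```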
Learning using privileged information: similarity control and knowledge transfer
TLDR
Describes two mechanisms that can significantly accelerate a student's learning using privileged information: correction of the student's concepts of similarity between examples, and direct teacher-student knowledge transfer.
A new learning paradigm: Learning using privileged information
TLDR
Discusses details of the new paradigm and corresponding algorithms, introduces some new algorithms, considers several specific forms of privileged information, and demonstrates the superiority of the new learning paradigm over the classical learning paradigm on practical problems.
The Rademacher Complexity of Co-Regularized Kernel Classes
TLDR
Examines the co-regularization method used in the CoRLS algorithm, in which the views are reproducing kernel Hilbert spaces (RKHSs), and shows that co-regularization reduces the Rademacher complexity by an amount that depends on the distance between the two views, as measured by a data-dependent metric.
Distilling the Knowledge in a Neural Network
TLDR
Shows that the acoustic model of a heavily used commercial system can be significantly improved by distilling the knowledge in an ensemble of models into a single model, and introduces a new type of ensemble composed of one or more full models and many specialist models that learn to distinguish fine-grained classes the full models confuse.
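A minimal NumPy sketch of the temperature-scaled distillation loss this paper is known for; the temperature T, mixing weight alpha, and example logits are illustrative choices.

```python
# Knowledge distillation loss: cross-entropy against temperature-softened
# teacher probabilities, mixed with the usual hard-label cross-entropy.
import numpy as np

def softmax(z, T=1.0):
    z = np.asarray(z, dtype=float) / T
    z -= z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, hard_label, T=4.0, alpha=0.7):
    """Soft-target cross-entropy (scaled by T^2, as in the paper) plus hard loss."""
    soft = -(softmax(teacher_logits, T) * np.log(softmax(student_logits, T))).sum()
    hard = -np.log(softmax(student_logits))[hard_label]
    return alpha * (T ** 2) * soft + (1 - alpha) * hard

print(distillation_loss([2.0, 0.5, -1.0], [1.5, 1.0, -2.0], hard_label=0))
```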
Combining labeled and unlabeled data with co-training
TLDR
Provides a PAC-style analysis for a problem setting motivated by the task of learning to classify web pages, in which the description of each example can be partitioned into two distinct views, allowing inexpensive unlabeled data to augment a much smaller set of labeled examples.
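A compact sketch of a co-training loop in the spirit of this reference, assuming scikit-learn; the synthetic data, view split, and one-example-per-round confidence rule are placeholders, not the original algorithm's exact growth schedule.

```python
# Co-training sketch: two classifiers on disjoint feature views take turns
# labeling the unlabeled example they are most confident about.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 400
X = rng.normal(size=(n, 10))
y = (X[:, 0] + X[:, 5] > 0).astype(int)   # signal present in both views
view1, view2 = X[:, :5], X[:, 5:]         # disjoint feature views

labeled = list(range(20))                 # small labeled pool
unlabeled = list(range(20, n))
pseudo_idx, pseudo_y = list(labeled), [int(v) for v in y[labeled]]

h1 = LogisticRegression(max_iter=1000).fit(view1[labeled], y[labeled])
h2 = LogisticRegression(max_iter=1000).fit(view2[labeled], y[labeled])

for _ in range(5):                        # a few co-training rounds
    for h, view in ((h1, view1), (h2, view2)):
        proba = h.predict_proba(view[unlabeled])
        best = int(np.argmax(proba.max(axis=1)))   # most confident example
        pseudo_idx.append(unlabeled.pop(best))
        pseudo_y.append(int(proba[best].argmax()))
    # Retrain both views on the grown pseudo-labeled pool.
    h1 = LogisticRegression(max_iter=1000).fit(view1[pseudo_idx], pseudo_y)
    h2 = LogisticRegression(max_iter=1000).fit(view2[pseudo_idx], pseudo_y)
```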
Multi-view Regression Via Canonical Correlation Analysis
TLDR
Provides a semi-supervised algorithm that first uses unlabeled data to learn a norm (or, equivalently, a kernel) and then uses labeled data in a ridge regression algorithm (with this induced norm) to produce the predictor.
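An illustrative pipeline for the two-stage procedure this summary describes, assuming scikit-learn's CCA; the synthetic two-view data and the ridge penalty are placeholders, not the paper's exact estimator or norm.

```python
# Stage 1: fit CCA on (plentiful, unlabeled) paired views.
# Stage 2: ridge regression on the CCA projection of a few labeled examples.
import numpy as np
from sklearn.cross_decomposition import CCA
from sklearn.linear_model import Ridge

rng = np.random.default_rng(1)
z = rng.normal(size=(500, 3))                       # shared latent signal
X1 = z @ rng.normal(size=(3, 8)) + 0.1 * rng.normal(size=(500, 8))
X2 = z @ rng.normal(size=(3, 6)) + 0.1 * rng.normal(size=(500, 6))
y = z[:, 0]                                         # target depends on latent

cca = CCA(n_components=3).fit(X1, X2)               # uses no labels
U1 = cca.transform(X1)                              # CCA-induced representation

labeled = np.arange(50)                             # few labels suffice
model = Ridge(alpha=1.0).fit(U1[labeled], y[labeled])
print("held-out R^2:", model.score(U1[50:], y[50:]))
```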
Efficient Co-Training of Linear Separators under Weak Dependence
We develop the first polynomial-time algorithm for co-training of homogeneous linear separators under weak dependence, a relaxation of the condition of independence given the label. Our algorithm …