RDPD: Rich Data Helps Poor Data via Imitation

Shenda Hong, Cao Xiao, Tengfei Ma, Hongyan Li, Jimeng Sun
In many situations, we need to build and deploy separate models in related environments with different data qualities. For example, an environment with strong observation equipment (e.g., intensive care units) often provides high-quality multi-modal data, acquired from multiple sensory devices, with rich-feature representations. On the other hand, an environment with poor observation equipment (e.g., at home) only provides low-quality, uni-modal data with poor-feature… 
Dynamic Knowledge Distillation for Black-box Hypothesis Transfer Learning
This paper introduces a novel algorithm called dynamic knowledge distillation for hypothesis transfer learning (dkdHTL), which uses knowledge distillation with an instance-wise weighting mechanism to adaptively transfer the "dark" knowledge from the source hypothesis to the target domain.
Learning Task-Agnostic Embedding of Multiple Black-Box Experts for Multi-Task Model Fusion
A new fusion paradigm is developed that represents each expert as a distribution over a spectrum of predictive prototypes, which are isolated from task-specific information encoded within the prototype distribution and can then be reintegrated to generate a new model that solves a new task encoded with a different prototype distribution.
Embedded deep learning in ophthalmology: making ophthalmic imaging smarter
Improved edge-layer performance via ‘active acquisition’ serves as an automatic data curation operator, translating to better-quality data in electronic health records and on the cloud layer, and hence to improved deep learning–based clinical data mining.
Copying Machine Learning Classifiers
The theory behind the problem of copying is developed, highlighting its properties, and a framework to copy the decision behavior of any classifier using no prior knowledge of its parameters or training data distribution is proposed.
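A minimal sketch of the copying setting described above, assuming we can only query the original classifier as a black box. The uniform sampling of synthetic points and the 1-nearest-neighbour copy are illustrative assumptions, not the paper's framework; `black_box`, `copy_classifier`, and `agreement` are hypothetical names.

```python
import random

def black_box(x):
    # Hypothetical stand-in for the classifier to be copied: we only call it, never inspect it.
    return 1 if x[0] + x[1] > 1.0 else 0

def copy_classifier(oracle, n_queries=2000, dim=2, seed=0):
    """Label uniformly sampled synthetic points with `oracle` and return a 1-NN copy."""
    rng = random.Random(seed)
    queries = [[rng.random() for _ in range(dim)] for _ in range(n_queries)]
    labels = [oracle(q) for q in queries]  # synthetic dataset labelled by the original model

    def copy(x):
        # Predict with the label of the nearest synthetic point (squared Euclidean distance).
        best = min(range(n_queries),
                   key=lambda i: sum((a - b) ** 2 for a, b in zip(queries[i], x)))
        return labels[best]

    return copy

def agreement(f, g, n=200, dim=2, seed=1):
    """Fraction of fresh random points on which two classifiers give the same label."""
    rng = random.Random(seed)
    points = [[rng.random() for _ in range(dim)] for _ in range(n)]
    return sum(f(p) == g(p) for p in points) / n
```

Calling `agreement(black_box, copy_classifier(black_box))` measures how closely the copy reproduces the original decision behavior; no parameters or training data of `black_box` are used.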
Application of Artificial Intelligence Technology in Power Industry
This paper introduces how AI technology, built on the industry's advanced artificial intelligence framework, provides application capability for the power industry. It focuses on the application demands of artificial intelligence in power grid inspection, power grid security checks, power grid marketing, power grid customer service, power grid communication, and other business fields.


Data Distillation: Towards Omni-Supervised Learning
It is argued that visual recognition models have recently become accurate enough to make it possible to apply classic ideas about self-training to challenging real-world data. Data distillation is proposed: a method that ensembles predictions from multiple transformations of unlabeled data, using a single model, to automatically generate new training annotations.
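A minimal classification-flavoured sketch of the ensembling step summarized above, assuming `model(x)` returns a list of class probabilities and every transform is label-preserving. For localization tasks such as keypoints or boxes, predictions would first have to be mapped back through the inverse transform; that step is omitted here. The function name and the confidence `threshold` are hypothetical.

```python
def distill_labels(model, transforms, unlabeled, threshold=0.7):
    """Ensemble one model's predictions over several transformations of each unlabeled input.

    model(x) -> list of class probabilities; each transform maps x -> x'.
    Returns (x, label) pairs whose averaged confidence reaches `threshold`.
    """
    annotations = []
    for x in unlabeled:
        preds = [model(t(x)) for t in transforms]  # same model, different views of x
        n_classes = len(preds[0])
        avg = [sum(p[c] for p in preds) / len(preds) for c in range(n_classes)]
        label = max(range(n_classes), key=lambda c: avg[c])
        if avg[label] >= threshold:                # keep only confident pseudo-labels
            annotations.append((x, label))
    return annotations
```

The returned pairs would then be mixed with the manually labeled data to retrain the model.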
Deep Model Compression: Distilling Knowledge from Noisy Teachers
This work extends the teacher-student framework for deep model compression with a noise-based regularizer applied while training the student from the teacher, which provides a healthy improvement in the performance of the student network.
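A rough sketch of a noise-based regularizer on the teacher side, under stated assumptions: we assume multiplicative Gaussian noise on the teacher's logits, applied to a random fraction of samples, with the student regressing the (possibly perturbed) logits under an L2 loss. The paper's exact noise scheme and hyperparameters may differ; all names and defaults here are hypothetical.

```python
import random

def perturb_logits(logits, sigma, rng):
    """Assumed noise model: z_i * (1 + xi_i), with xi_i ~ N(0, sigma^2)."""
    return [z * (1.0 + rng.gauss(0.0, sigma)) for z in logits]

def noisy_teacher_targets(batch_logits, sigma=0.1, noise_fraction=0.5, seed=0):
    """Perturb the teacher logits of a random fraction of the samples in a batch."""
    rng = random.Random(seed)
    targets = []
    for logits in batch_logits:
        if rng.random() < noise_fraction:
            targets.append(perturb_logits(logits, sigma, rng))
        else:
            targets.append(list(logits))
    return targets

def l2_logit_loss(student_logits, target_logits):
    """Mean squared error between student logits and (noisy) teacher logits."""
    n = len(target_logits)
    return sum((s - t) ** 2 for s, t in zip(student_logits, target_logits)) / n
```

With `sigma=0` or `noise_fraction=0` this reduces to plain logit matching; the noise acts like a regularizer that keeps the student from fitting the teacher's outputs exactly.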
Domain-Adversarial Training of Neural Networks
A new representation learning approach for domain adaptation, in which data at training and test time come from similar but different distributions. The approach can be implemented in almost any feed-forward model by augmenting it with a few standard layers and a new gradient reversal layer.
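The gradient reversal layer mentioned above acts as the identity in the forward pass and multiplies the incoming gradient by -λ in the backward pass. The following scalar sketch does the chain rule by hand; the toy feature `f = w * x` and the squared-error "domain loss" are hypothetical simplifications of a real feature extractor and domain classifier.

```python
class GradientReversal:
    """Identity on the forward pass; scales gradients by -lam on the backward pass."""

    def __init__(self, lam=1.0):
        self.lam = lam

    def forward(self, x):
        return x

    def backward(self, grad_output):
        return -self.lam * grad_output

def feature_extractor_grad(w, x, domain_target, lam=1.0):
    """Gradient reaching the feature weight w through the GRL.

    Toy chain (hypothetical): f = w * x -> GRL -> domain loss (h - domain_target)^2.
    """
    grl = GradientReversal(lam)
    f = w * x
    h = grl.forward(f)
    dloss_dh = 2.0 * (h - domain_target)  # ordinary gradient of the domain loss
    dloss_df = grl.backward(dloss_dh)     # sign flipped and scaled by lam
    return dloss_df * x                   # chain rule through f = w * x
```

Because the sign is flipped, a gradient-descent step on `w` increases the domain loss, pushing the features toward being indistinguishable across domains, while layers after the GRL still receive ordinary gradients.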
Marginalized Denoising Autoencoders for Domain Adaptation
The mSDA approach marginalizes out the noise and therefore does not require stochastic gradient descent or any other optimization algorithm to learn its parameters: they are computed in closed form, which speeds up SDAs by two orders of magnitude.
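A NumPy sketch of the closed-form layer summarized above, assuming "blank-out" corruption in which each input feature is zeroed independently with probability `p` and a constant bias feature is never corrupted. The expected scatter matrices replace sampled corruptions, so the reconstruction weights come from a single linear solve. Stacking layers and concatenating their outputs reflects our understanding of the paper; the function names and `reg` value are our own choices.

```python
import numpy as np

def mda_layer(X, p, reg=1e-5):
    """One marginalized denoising layer in closed form.

    X: (d, n) matrix whose columns are examples; p: probability of zeroing each feature.
    """
    d, n = X.shape
    Xb = np.vstack([X, np.ones((1, n))])              # append a constant bias feature
    q = np.concatenate([np.full(d, 1.0 - p), [1.0]])  # survival probabilities; bias never corrupted
    S = Xb @ Xb.T                                     # scatter matrix, shape (d+1, d+1)
    Q = S * np.outer(q, q)                            # expected corrupted scatter, off-diagonal
    np.fill_diagonal(Q, q * np.diag(S))               # diagonal: feature i survives with prob. q_i
    P = S[:d, :] * q[np.newaxis, :]                   # expected clean-vs-corrupted cross term
    # W = P (Q + reg*I)^{-1}; Q is symmetric, so solve the transposed system.
    W = np.linalg.solve(Q + reg * np.eye(d + 1), P.T).T
    H = np.tanh(W @ Xb)                               # nonlinear hidden representation, (d, n)
    return W, H

def msda(X, p, n_layers):
    """Feed each layer's output into the next and concatenate input and all hidden layers."""
    reps = [X]
    for _ in range(n_layers):
        _, H = mda_layer(reps[-1], p)
        reps.append(H)
    return np.vstack(reps)
```

With `p = 0` the layer reduces to a least-squares reconstruction, so `W @ Xb` reproduces `X`; larger `p` forces each feature to be predicted from the others.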
Distilling the Knowledge in a Neural Network
This work shows that distilling the knowledge in an ensemble of models into a single model can significantly improve the acoustic model of a heavily used commercial system. It also introduces a new type of ensemble composed of one or more full models and many specialist models, which learn to distinguish fine-grained classes that the full models confuse.
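A minimal sketch of a temperature-softened distillation objective of the kind summarized above. The soft term is written as KL(teacher ‖ student) at temperature `T`; this differs from cross-entropy with soft targets only by the teacher's entropy, which is constant with respect to the student, so the gradients are the same. The `T * T` factor keeps the soft-target gradients on a scale comparable to the hard-label term. The values `T=4.0` and `alpha=0.9` are illustrative, not taken from the paper.

```python
import math

def softmax(logits, T=1.0):
    scaled = [z / T for z in logits]
    m = max(scaled)                        # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, hard_label, T=4.0, alpha=0.9):
    """alpha * T^2 * KL(p_teacher || p_student) at temperature T
    + (1 - alpha) * cross-entropy with the true label at T = 1."""
    p_t = softmax(teacher_logits, T)
    p_s = softmax(student_logits, T)
    soft = sum(pt * math.log(pt / ps) for pt, ps in zip(p_t, p_s) if pt > 0)
    hard = -math.log(softmax(student_logits, 1.0)[hard_label])
    return alpha * T * T * soft + (1.0 - alpha) * hard
```

When the student's logits match the teacher's, the soft term vanishes and only the weighted hard-label loss remains.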
Like What You Like: Knowledge Distill via Neuron Selectivity Transfer
This paper proposes a novel knowledge transfer method by treating it as a distribution matching problem, which matches the distributions of neuron selectivity patterns between teacher and student networks and can significantly improve the performance of student networks.
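A sketch of the distribution-matching idea summarized above: each channel's flattened spatial activation map is treated as one sample of a "neuron selectivity pattern", and teacher and student sets are compared with a squared maximum mean discrepancy (MMD). We assume L2-normalized patterns and a polynomial kernel k(x, y) = (xᵀy + c)^degree with degree 2 and c = 0; the names and defaults are hypothetical choices for illustration.

```python
import math

def l2_normalize(v):
    norm = math.sqrt(sum(a * a for a in v))
    return [a / norm for a in v] if norm > 0 else list(v)

def poly_kernel(x, y, degree=2, c=0.0):
    return (sum(a * b for a, b in zip(x, y)) + c) ** degree

def nst_mmd2(teacher_maps, student_maps, degree=2, c=0.0):
    """Squared MMD between teacher and student channel activation patterns.

    Each *_maps is a list of channels; every channel is a flattened spatial map
    of the same length (channel counts may differ between teacher and student).
    """
    T = [l2_normalize(f) for f in teacher_maps]
    S = [l2_normalize(f) for f in student_maps]

    def k(a, b):
        return poly_kernel(a, b, degree, c)

    tt = sum(k(a, b) for a in T for b in T) / (len(T) * len(T))
    ss = sum(k(a, b) for a in S for b in S) / (len(S) * len(S))
    ts = sum(k(a, b) for a in T for b in S) / (len(T) * len(S))
    return tt + ss - 2.0 * ts
```

In training, a term like this would be added to the student's task loss so that its channels develop activation patterns distributed like the teacher's.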
Domain Separation Networks
The novel architecture results in a model that outperforms the state-of-the-art on a range of unsupervised domain adaptation scenarios and additionally produces visualizations of the private and shared representations enabling interpretation of the domain adaptation process.
Training Neural Networks with Very Little Data - A Draft
The radial transform in polar coordinate space is proposed for image augmentation, to facilitate training neural networks from limited source data; it both augments the data and increases the diversity of poorly represented classes.
Deep Convolutional and LSTM Recurrent Neural Networks for Multimodal Wearable Activity Recognition
A generic deep framework for activity recognition based on convolutional and LSTM recurrent units, which is suitable for multimodal wearable sensors, does not require expert knowledge in designing features, and explicitly models the temporal dynamics of feature activations is proposed.
Adam: A Method for Stochastic Optimization
This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments, and provides a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.
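A scalar sketch of the Adam update summarized above: exponential moving averages of the gradient and squared gradient, bias correction for their zero initialization, and a step scaled by the ratio of the two. The paper's suggested defaults are α = 0.001, β1 = 0.9, β2 = 0.999, ε = 10⁻⁸; the larger `lr=0.1` here is only so the toy quadratic converges in a few hundred steps.

```python
import math

def adam_minimize(grad, x0, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=1000):
    """Minimize a 1-D function given its gradient, using the Adam update rule."""
    x, m, v = x0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad(x)
        m = beta1 * m + (1 - beta1) * g      # biased first-moment estimate
        v = beta2 * v + (1 - beta2) * g * g  # biased second raw-moment estimate
        m_hat = m / (1 - beta1 ** t)         # bias-corrected first moment
        v_hat = v / (1 - beta2 ** t)         # bias-corrected second moment
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x
```

For f(x) = (x − 3)², `adam_minimize(lambda x: 2 * (x - 3), 0.0)` approaches 3. Thanks to bias correction, the very first step has magnitude close to `lr` regardless of the gradient's scale.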