Comprehensive and Efficient Data Labeling via Adaptive Model Scheduling

  • Mu Yuan, Lan Zhang, Xiangyang Li, Hui Xiong
  • Published 8 February 2020
  • Computer Science, Mathematics
  • 2020 IEEE 36th International Conference on Data Engineering (ICDE)
Labeling data comprehensively and efficiently is a widely needed but challenging task. With limited computing resources, given a data stream and a collection of deep-learning models, we propose to adaptively select and schedule a subset of these models to execute, aiming to maximize the value of the model output. Achieving this goal is nontrivial since a model’s output on any data item is content-dependent and hard to predict. In this paper, we present an Adaptive Model Scheduling framework… 
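The truncated abstract does not describe the scheduling algorithm itself, so the following is only a minimal sketch of the general idea of picking a subset of models under a resource budget. It assumes each model has a known positive cost and a predicted value for the current data item; the `Model` class, the model names, and the value-per-cost greedy rule are all hypothetical illustrations, not the paper's method.

```python
from dataclasses import dataclass


@dataclass
class Model:
    name: str               # hypothetical model identifier
    cost: float             # assumed execution cost (must be > 0), e.g. seconds
    predicted_value: float  # assumed estimate of how valuable its output is


def schedule_models(models, budget):
    """Greedily pick models by predicted value per unit cost within a budget."""
    chosen = []
    remaining = budget
    # Rank by value density: highest predicted value per unit of cost first.
    ranked = sorted(models, key=lambda m: m.predicted_value / m.cost, reverse=True)
    for m in ranked:
        if m.cost <= remaining:
            chosen.append(m.name)
            remaining -= m.cost
    return chosen


# Illustrative numbers only.
models = [
    Model("face_detector", cost=2.0, predicted_value=0.9),
    Model("scene_classifier", cost=1.0, predicted_value=0.6),
    Model("ocr", cost=3.0, predicted_value=0.3),
]
selected = schedule_models(models, budget=3.0)
```

In practice the predicted values would have to be estimated per data item, which is exactly the part the abstract calls content-dependent and hard to predict.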
Cost-effective ensemble models selection using deep reinforcement learning
Extensive evaluation on two large malware datasets demonstrates that SPIREL is highly cost-effective, enabling us to reduce running time by ~80% while decreasing the accuracy and F1-score by only 0.5%.
Entropy Repulsion for Semi-supervised Learning Against Class Mismatch
This work proposes a new technique, entropy repulsion for mismatch (ERCM), to improve SSL under class mismatch, and demonstrates that ERCM can significantly improve the performance of state-of-the-art SSL algorithms, namely Mean Teacher, Virtual Adversarial Training (VAT), and MixMatch, in various class-mismatch cases.
In-Database Machine Learning with SQL on GPUs
This work demonstrates that SQL with recursive tables makes it possible to express a complete machine learning pipeline comprising data preprocessing, model training, and validation. It also fine-tunes GPU kernels at the hardware level to allow higher throughput and proposes non-blocking synchronisation of multiple units.


Modeling Task Relationships in Multi-task Learning with Multi-gate Mixture-of-Experts
This work proposes a novel multi-task learning approach, Multi-gate Mixture-of-Experts (MMoE), which explicitly learns to model task relationships from data, and demonstrates performance improvements from MMoE on real tasks, including a binary classification benchmark and a large-scale content recommendation system at Google.
Complex Object Classification: A Multi-Modal Multi-Instance Multi-Label Deep Network with Optimal Transport
A novel Multi-modal Multi-instance Multi-label Deep Network (M3DN) is proposed, which learns the label prediction and exploits label correlation simultaneously based on Optimal Transport, by considering the consistency principle between the bag-level predictions of different modalities and the learned latent ground label metric.
Deep semantic ranking based hashing for multi-label image retrieval
In this work, a deep convolutional neural network is incorporated into hash functions to jointly learn feature representations and the mappings from them to hash codes, which avoids the limited semantic representation power of hand-crafted features.
An End-to-End Automatic Cloud Database Tuning System Using Deep Reinforcement Learning
An end-to-end automatic CDB tuning system, CDBTune, is proposed. It uses deep reinforcement learning (RL), which enables end-to-end learning, accelerates the convergence speed of the model, and improves the efficiency of online tuning.
One Model To Learn Them All
It is shown that tasks with less data benefit largely from joint training with other tasks, while performance on large tasks degrades only slightly if at all, and that adding a block to the model never hurts performance and in most cases improves it on all tasks.
Deep Reinforcement Learning with Double Q-Learning
This paper proposes a specific adaptation to the DQN algorithm and shows that the resulting algorithm not only reduces the observed overestimations, as hypothesized, but that this also leads to much better performance on several games.
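As a rough illustration of the adaptation summarized above, the widely used Double DQN target selects the next action with the online network and evaluates it with the target network, whereas the standard DQN target takes the maximum over the target network directly. The sketch below assumes the two networks' Q-values for the next state are already available as arrays; the function names and example numbers are hypothetical.

```python
import numpy as np


def standard_q_target(reward, next_q_target, gamma=0.99, done=False):
    """Standard DQN target: the target network both selects and evaluates."""
    if done:
        return reward
    return reward + gamma * float(np.max(next_q_target))


def double_q_target(reward, next_q_online, next_q_target, gamma=0.99, done=False):
    """Double DQN target: online network selects, target network evaluates."""
    if done:
        return reward
    best_action = int(np.argmax(next_q_online))
    return reward + gamma * float(next_q_target[best_action])


# Illustrative Q-values for three actions in the next state.
next_q_online = np.array([1.0, 2.0, 0.5])
next_q_target = np.array([3.0, 1.5, 0.2])

y_standard = standard_q_target(1.0, next_q_target, gamma=0.9)
y_double = double_q_target(1.0, next_q_online, next_q_target, gamma=0.9)
```

With these numbers the double target is lower than the standard one, because the action with the highest target-network value is not the one the online network prefers. Decoupling selection from evaluation in this way is what dampens overestimation.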
Chameleon: scalable adaptation of video analytics
Chameleon is a controller that dynamically picks the best configurations for existing NN-based video analytics pipelines, demonstrating that compared to a baseline that picks a single optimal configuration offline, Chameleon can achieve 20-50% higher accuracy with the same amount of resources, or achieve the same accuracy with only 30-50% of the resources.
Deep Multi-Similarity Hashing for Multi-label Image Retrieval
Experiments on the large-scale NUS-WIDE dataset demonstrate the state-of-the-art performance of the proposed Deep Multi-Similarity Hashing model on the task of multi-label image retrieval.
Adaptive Submodularity: Theory and Applications in Active Learning and Stochastic Optimization
It is proved that if a problem satisfies adaptive submodularity, a simple adaptive greedy algorithm is guaranteed to be competitive with the optimal policy, providing performance guarantees for both stochastic maximization and coverage.
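The adaptive greedy policy mentioned above can be sketched as a loop that repeatedly picks the item with the largest expected marginal gain given the outcomes observed so far, then observes that item's outcome. The toy sensor-coverage setup, the names `coverage`, `works_prob`, and `truth`, and the callback signatures below are assumptions made for illustration only.

```python
def adaptive_greedy(items, expected_gain, observe, k):
    """Pick up to k items, each maximizing expected marginal gain
    conditioned on the outcomes observed so far."""
    observations = {}  # item -> observed outcome, in selection order
    for _ in range(k):
        candidates = [i for i in items if i not in observations]
        if not candidates:
            break
        best = max(candidates, key=lambda i: expected_gain(i, observations))
        observations[best] = observe(best)
    return observations


# Toy problem: each sensor covers some regions, but only if it works.
coverage = {"a": {1, 2, 3}, "b": {3, 4}, "c": {5}}
works_prob = {"a": 0.5, "b": 0.9, "c": 0.9}
truth = {"a": False, "b": True, "c": True}  # hidden until a sensor is selected


def expected_gain(item, observations):
    # Regions already covered by sensors that were observed to work.
    covered = set()
    for chosen, ok in observations.items():
        if ok:
            covered |= coverage[chosen]
    # Expected number of newly covered regions if this sensor is selected.
    return works_prob[item] * len(coverage[item] - covered)


result = adaptive_greedy(list(coverage), expected_gain, truth.__getitem__, k=2)
```

Here the policy first selects "b" (expected gain 1.8), observes that it works, and then prefers "a" (expected gain 1.0) over "c" (0.9) for the second pick.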
Survey on Multi-Output Learning
The four Vs of multi-output learning (volume, velocity, variety, and veracity) are characterized, and the ways in which the four Vs both benefit and bring challenges to multi-output learning are examined, taking inspiration from big data.