An Efficient Combinatorial Optimization Model Using Learning-to-Rank Distillation

@inproceedings{Woo2022AnEC,
  title={An Efficient Combinatorial Optimization Model Using Learning-to-Rank Distillation},
  author={Honguk Woo and Hyunsung Lee and Sangwook Cho},
  booktitle={AAAI},
  year={2022}
}
Recently, deep reinforcement learning (RL) has proven feasible for solving combinatorial optimization problems (COPs). Learning-to-rank techniques have been studied extensively in the field of information retrieval. Although several COPs can be formulated as the prioritization of input items, as is common in information retrieval, it has not been fully explored how learning-to-rank techniques can be incorporated into deep RL for COPs. In this paper, we present the learning-to-rank… 
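As a rough illustration of the idea the abstract gestures at (not the paper's exact method), the sketch below distills a teacher's item priorities into a lightweight student scorer with a listwise loss; the class name, network sizes, and temperature are all illustrative assumptions.

```python
# Minimal sketch: distill teacher item priorities into a small student scorer
# using a listwise (softmax cross-entropy) loss. All names/sizes are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class StudentScorer(nn.Module):
    """Scores each input item; sorting by score yields a priority order."""
    def __init__(self, item_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(item_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1)
        )

    def forward(self, items: torch.Tensor) -> torch.Tensor:  # (batch, n_items, item_dim)
        return self.net(items).squeeze(-1)                    # (batch, n_items)

def listwise_distillation_loss(student_scores, teacher_scores, tau: float = 1.0):
    """Cross-entropy between the top-1 (softmax) distributions of teacher and student."""
    p_teacher = F.softmax(teacher_scores / tau, dim=-1)
    log_p_student = F.log_softmax(student_scores / tau, dim=-1)
    return -(p_teacher * log_p_student).sum(dim=-1).mean()

# Toy usage: teacher scores would come from a trained RL policy's priorities.
items = torch.randn(8, 20, 16)        # 8 problem instances, 20 items each
teacher_scores = torch.randn(8, 20)   # stand-in for teacher priorities
student = StudentScorer(item_dim=16)
loss = listwise_distillation_loss(student(items), teacher_scores)
loss.backward()
```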

References

Showing 1-10 of 46 references

RankDistil: Knowledge Distillation for Ranking

TLDR
This paper presents a distillation framework for top-k ranking and develops a novel distillation approach, RankDistil, specifically catered to ranking problems with a large number of items to rank, and establishes a statistical basis for the method.

Ranking Distillation: Learning Compact Ranking Models With High Performance for Recommender System

TLDR
A novel way to train ranking models, such as recommender systems, that are both effective and efficient is proposed: a smaller student model is trained to rank documents/items using both the training data and the supervision of a larger teacher model.
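A hedged sketch of that hybrid objective: the student fits the ground-truth positives and additionally treats the teacher's top-K ranked items as weak positives. The loss forms, weighting, and index names are simplifying assumptions, not the paper's exact formulation.

```python
# Hybrid student objective: fit observed positives + imitate teacher's top-K items.
import torch
import torch.nn.functional as F

def hybrid_ranking_loss(student_scores, positive_idx, teacher_topk_idx, alpha=0.5):
    """student_scores: (n_items,) scores for one user/query.
    positive_idx: ground-truth items; teacher_topk_idx: teacher's top-K items."""
    log_p = F.log_softmax(student_scores, dim=-1)
    loss_gt = -log_p[positive_idx].mean()       # fit observed positives
    loss_kd = -log_p[teacher_topk_idx].mean()   # imitate teacher's top-K ranking
    return loss_gt + alpha * loss_kd

scores = torch.randn(100, requires_grad=True)
loss = hybrid_ranking_loss(scores, torch.tensor([3, 17]), torch.tensor([3, 8, 17, 42, 55]))
loss.backward()
```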

A State Aggregation Approach for Solving Knapsack Problem with Deep Reinforcement Learning

TLDR
The results demonstrate that the proposed model with the state aggregation strategy not only gives better solutions but also learns in fewer timesteps than the one without state aggregation.

Panda: Reinforcement Learning-Based Priority Assignment for Multi-Processor Real-Time Scheduling

TLDR
This approach is the first to employ RL for real-time task scheduling; it presents an RL-based priority assignment model, Panda, that employs a taskset embedding mechanism driven by attention-based encoder-decoder deep neural networks, enabling it to efficiently extract useful features from the dynamic relations of periodic tasks.

Learning to rank: from pairwise approach to listwise approach

TLDR
It is proposed that learning to rank should adopt the listwise approach, in which lists of objects are used as 'instances' in learning, and two probability models, referred to as permutation probability and top-k probability, are introduced to define a listwise loss function for learning.
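For concreteness, here is a compact sketch of the ListNet-style listwise loss under the top-1 (softmax) approximation of the permutation probability: the cross-entropy between the distributions induced by the true relevance scores and the predicted scores over a list of documents. Tensor shapes and label scales are illustrative.

```python
# ListNet top-1 listwise loss: cross-entropy between target and predicted
# softmax distributions over each list of documents.
import torch
import torch.nn.functional as F

def listnet_top1_loss(pred_scores: torch.Tensor, true_scores: torch.Tensor) -> torch.Tensor:
    p_true = F.softmax(true_scores, dim=-1)          # target top-1 distribution
    log_p_pred = F.log_softmax(pred_scores, dim=-1)  # model's top-1 distribution
    return -(p_true * log_p_pred).sum(dim=-1).mean()

pred = torch.randn(4, 10, requires_grad=True)   # 4 queries, 10 documents each
true = torch.randint(0, 3, (4, 10)).float()     # graded relevance labels
listnet_top1_loss(pred, true).backward()
```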

Sequence-Level Knowledge Distillation

TLDR
It is demonstrated that standard knowledge distillation applied to word-level prediction can be effective for NMT, and two novel sequence-level versions of knowledge distillation are introduced that further improve performance and, somewhat surprisingly, seem to eliminate the need for beam search.

Device Placement Optimization with Reinforcement Learning

TLDR
A method that learns to optimize device placement for TensorFlow computational graphs using a sequence-to-sequence model; it finds non-trivial device placements that outperform hand-crafted heuristics and traditional algorithmic methods.

TinyBERT: Distilling BERT for Natural Language Understanding

TLDR
A novel Transformer distillation method specially designed for knowledge distillation (KD) of Transformer-based models is proposed; by leveraging this new KD method, the knowledge encoded in a large “teacher” BERT can be effectively transferred to a small “student” TinyBERT.

Neural Architecture Search with Reinforcement Learning

TLDR
This paper uses a recurrent network to generate the model descriptions of neural networks and trains this RNN with reinforcement learning to maximize the expected accuracy of the generated architectures on a validation set.

BPR: Bayesian Personalized Ranking from Implicit Feedback

TLDR
This paper presents a generic optimization criterion, BPR-Opt, for personalized ranking, which is the maximum posterior estimator derived from a Bayesian analysis of the problem, and provides a generic learning algorithm for optimizing models with respect to BPR-Opt.
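A minimal sketch of that criterion for matrix-factorization scores: maximize ln σ(x̂_ui − x̂_uj) over sampled (user, observed item i, unobserved item j) triples, plus L2 regularization. The factor dimensions, sampling, and regularization weight below are illustrative assumptions.

```python
# BPR-Opt sketch: pairwise logistic loss over (user, positive item, negative item) triples.
import torch
import torch.nn.functional as F

n_users, n_items, dim = 1000, 5000, 32
U = torch.randn(n_users, dim, requires_grad=True)   # user factors
V = torch.randn(n_items, dim, requires_grad=True)   # item factors

def bpr_loss(u, i, j, reg=1e-4):
    """u, i, j: index tensors of users, observed items, and unobserved items."""
    x_ui = (U[u] * V[i]).sum(-1)
    x_uj = (U[u] * V[j]).sum(-1)
    loss = -F.logsigmoid(x_ui - x_uj).mean()
    loss = loss + reg * (U[u].pow(2).sum() + V[i].pow(2).sum() + V[j].pow(2).sum())
    return loss

# One gradient step on a random mini-batch of triples.
u = torch.randint(0, n_users, (256,))
i = torch.randint(0, n_items, (256,))
j = torch.randint(0, n_items, (256,))
bpr_loss(u, i, j).backward()
```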