• Corpus ID: 204800859

Task-Based Learning via Task-Oriented Prediction Network

  title={Task-Based Learning via Task-Oriented Prediction Network},
  author={Di Chen and Yada Zhu and Xiaodong Cui and Carla P. Gomes},
Real-world applications often involve domain-specific and task-based performance objectives that are not captured by the standard machine learning metrics, such as mean squared error, mean absolute error, and cross-entropy loss, but are critical for decision making. A key challenge for direct integration of more meaningful domain and task-based evaluation criteria into an end-to-end gradient-based training process is the fact that often such performance objectives are not necessarily… 

Figures and Tables from this paper

Balanced Order Batching with Task-Oriented Graph Clustering

An end-to-end learning and optimization framework named Balanced Task-orientated Graph Clustering Network (BTOGCN) is proposed to solve the BOBP by reducing it to balanced graph clustering optimization problem.



Melding the Data-Decisions Pipeline: Decision-Focused Learning for Combinatorial Optimization

This work focuses on combinatorial optimization problems and introduces a general framework for decision-focused learning, where the machine learning model is directly trained in conjunction with the optimization algorithm to produce highquality decisions, and shows that decisionfocused learning often leads to improved optimization performance compared to traditional methods.

End to end learning and optimization on graphs

This work proposes an alternative decision-focused learning approach that integrates a differentiable proxy for common graph optimization problems as a layer in learned systems to learn a representation that maps the original optimization problem onto a simpler proxy problem that can be efficiently differentiated through.

Task-based End-to-end Model Learning in Stochastic Optimization

This paper proposes an end-to-end approach for learning probabilistic machine learning models in a manner that directly captures the ultimate task-based objective for which they will be used, within the context of stochastic programming.

Adam: A Method for Stochastic Optimization

This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments, and provides a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.

Learning to Predict by the Methods of Temporal Differences

This article introduces a class of incremental learning procedures specialized for prediction – that is, for using past experience with an incompletely known system to predict its future behavior – and proves their convergence and optimality for special cases and relation to supervised-learning methods.

Using a Financial Training Criterion Rather than a Prediction Criterion

It is found with noisy time series that better results can be obtained when the model is directly trained in order to maximize the financial criterion of interest, here gains and losses incurred during trading.

Correcting Forecasts with Multifactor Neural Attention

A novel neural network attention mechanism that naturally incorporates data from multiple external sources without the feature engineering needed to get other techniques to work is introduced.

Long Short-Term Memory

A novel, efficient, gradient based method called long short-term memory (LSTM) is introduced, which can learn to bridge minimal time lags in excess of 1000 discrete-time steps by enforcing constant error flow through constant error carousels within special units.

Continuous control with deep reinforcement learning

This work presents an actor-critic, model-free algorithm based on the deterministic policy gradient that can operate over continuous action spaces, and demonstrates that for many of the tasks the algorithm can learn policies end-to-end: directly from raw pixel inputs.


This paper presents and proves in detail a convergence theorem forQ-learning based on that outlined in Watkins (1989), showing that Q-learning converges to the optimum action-values with probability 1 so long as all actions are repeatedly sampled in all states and the action- values are represented discretely.