Learn More
This paper describes the Q-routing algorithm for packet routing, in which a reinforcement learning module is embedded into each node of a switching network. Only local communication is used by each node to keep accurate statistics on which routing decisions lead to minimal delivery times. In simple experiments involving a 36-node, irregularly connected(More)
TD(λ) is a popular family of algorithms for approximate policy evaluation in large MDPs. TD(λ) works by incrementally updating the value function after each observed transition. It has two major drawbacks: it may make inefficient use of data, and it requires the user to manually tune a stepsize schedule for good performance. For the case of linear value(More)
A straightforward approach to the curse of dimensionality in reinforcement learning and dynamic programming is to replace the lookup table with a generalizing function approximator such as a neu-ral net. Although this has been successful in the domain of backgammon , there is no guarantee of convergence. In this paper, we show that the combination of(More)
Excerpted from: Boyan, Justin. Learning Evaluation Functions for Global Optimization. Ph.D. TD() is a popular family of algorithms for approximate policy evaluation in large MDPs. TD() works by incrementally updating the value function after each observed transition. It has two major drawbacks: it makes inefficient use of data, and it requires the user to(More)
This paper introduces a new algorithm, Q2, for optimizing the expected output of a multi-input noisy continuous function. Q2 is designed to need only a few experiments, it avoids strong assumptions on the form of the function, and it is autonomous in that it requires little problem-speciic tweaking. These capabilities are directly applicable to industrial(More)
This paper describes STAGE, a learning approach to automatically improving search performance on optimization problems. STAGE learns an evaluation function which predicts the outcome of a local search algorithm, such as hillclimbing or WALKSAT, as a function of state features along its search trajectories. The learned evaluation function is used to bias(More)
Production scheduling, the problem of sequentially connguring a factory to meet forecasted demands, is a critical problem throughout the manufacturing industry. The requirement of maintaining product inventories in the face of unpredictable demand and stochastic factory output makes standard scheduling models, such as job-shop, inadequate. Currently applied(More)