Justin A. Boyan

This paper describes the Q-routing algorithm for packet routing, in which a reinforcement learning module is embedded into each node of a switching network. Only local communication is used by each node to keep accurate statistics on which routing decisions lead to minimal delivery times. In simple experiments involving a 36-node, irregularly connected …
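The per-node learning step described above can be sketched as follows. This is a minimal illustration, not the paper's code: it assumes each node keeps a table of estimated delivery times per destination and neighbor, and that after forwarding a packet it moves its estimate toward the local delays plus the neighbor's own best estimate. The names `q_routing_update`, `best_neighbor`, the table layout, and the learning rate `lr` are hypothetical.

```python
def q_routing_update(Q, x, y, dest, queue_delay, transmit_delay, lr=0.5):
    """Update node x's estimate after it forwards a packet for `dest` to neighbor y.

    Q[node][dest][neighbor] is that node's estimated delivery time to `dest`
    when routing via `neighbor` (assumed table layout).
    """
    # Only local communication: y reports its own best remaining-time estimate.
    remaining = 0.0 if y == dest else min(Q[y][dest].values())
    target = queue_delay + transmit_delay + remaining
    # Move the old estimate a fraction lr of the way toward the new target.
    Q[x][dest][y] += lr * (target - Q[x][dest][y])
    return Q[x][dest][y]


def best_neighbor(Q, x, dest):
    """Pick the neighbor with the smallest estimated delivery time."""
    return min(Q[x][dest], key=Q[x][dest].get)


# Toy triangle network A-B-C with all estimates initialised to zero.
Q = {
    "A": {"C": {"B": 0.0, "C": 0.0}},
    "B": {"C": {"A": 0.0, "C": 0.0}},
}
updated = q_routing_update(Q, "A", "B", "C", queue_delay=1.0, transmit_delay=1.0)
```

In this toy run, routing via B picks up a positive delay estimate, so node A would prefer the direct link to C for its next packet.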
A straightforward approach to the curse of dimensionality in reinforcement learning and dynamic programming is to replace the lookup table with a generalizing function approximator such as a neural net. Although this has been successful in the domain of backgammon, there is no guarantee of convergence. In this paper, we show that the combination of dynamic …
TD(λ) is a popular family of algorithms for approximate policy evaluation in large MDPs. TD(λ) works by incrementally updating the value function after each observed transition. It has two major drawbacks: it may make inefficient use of data, and it requires the user to manually tune a stepsize schedule for good performance. For the case of linear value …
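To make the incremental update and the stepsize dependence concrete, here is a minimal sketch of TD(λ) with a linear value function and accumulating eligibility traces. It is a textbook-style illustration rather than code from the paper; the function name `td_lambda_episode`, the transition format, and the constant `step_size` are assumptions.

```python
def td_lambda_episode(weights, transitions, lam=0.5, gamma=1.0, step_size=0.1):
    """Run one episode of linear TD(lambda) and return the updated weights.

    transitions: list of (features, reward, next_features) tuples, with
    next_features set to None when the next state is terminal.
    """
    trace = [0.0] * len(weights)
    for phi, reward, phi_next in transitions:
        v = sum(w * f for w, f in zip(weights, phi))
        v_next = 0.0 if phi_next is None else sum(
            w * f for w, f in zip(weights, phi_next))
        # Temporal-difference error for this observed transition.
        delta = reward + gamma * v_next - v
        # Decay the eligibility trace, then add the current features.
        trace = [gamma * lam * e + f for e, f in zip(trace, phi)]
        # The value function changes after every transition, scaled by step_size.
        weights = [w + step_size * delta * e for w, e in zip(weights, trace)]
    return weights


# Two-state chain with one-hot features: s0 -> s1 (reward 0) -> terminal (reward 1).
episode = [([1.0, 0.0], 0.0, [0.0, 1.0]), ([0.0, 1.0], 1.0, None)]
weights = td_lambda_episode([0.0, 0.0], episode)
```

Each sample is used once and then discarded, and the size of every change depends directly on `step_size`, which is where the two drawbacks named above come from.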
Excerpted from: Boyan, Justin. Learning Evaluation Functions for Global Optimization. Ph.D. thesis, Carnegie Mellon University, August 1998. (Available as Technical Report CMU-CS-98-152.) TD(λ) is a popular family of algorithms for approximate policy evaluation in large MDPs. TD(λ) works by incrementally updating the value function after each observed …
We describe an extension of the Markov decision process model in which a continuous time dimension is included in the state space. This allows for the representation and exact solution of a wide range of problems in which transitions or rewards vary over time. We examine problems based on route planning with public transportation and telescope …
In complex sequential decision problems such as scheduling factory production, planning medical treatments, and playing backgammon, optimal decision policies are in general unknown, and it is often difficult, even for human domain experts, to manually encode good decision policies in software. The reinforcement-learning methodology of "value function …
This paper describes algorithms that learn to improve search performance on large-scale optimization tasks. The main algorithm, Stage, works by learning an evaluation function that predicts the outcome of a local search algorithm, such as hillclimbing or Walksat, from features of states visited during search. The learned evaluation function is then used to …
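The learning step described above (predicting a local search's outcome from features of visited states) can be illustrated with a small sketch. This is not the Stage implementation: it covers only the data collection and fitting, uses a single feature with a least-squares line as a stand-in for the function approximator, and the toy objective, neighborhood, and all function names are hypothetical.

```python
def hillclimb(start, objective, neighbors):
    """Greedy descent; returns the visited states and the final objective value."""
    state, visited = start, [start]
    while True:
        best = min(neighbors(state), key=objective)
        if objective(best) >= objective(state):
            return visited, objective(state)
        state = best
        visited.append(state)


def collect_training_data(starts, objective, neighbors, feature):
    """Pair the feature of every visited state with its trajectory's outcome."""
    xs, ys = [], []
    for start in starts:
        visited, outcome = hillclimb(start, objective, neighbors)
        for state in visited:
            xs.append(feature(state))
            ys.append(outcome)
    return xs, ys


def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept for one feature."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    var = sum((x - mx) ** 2 for x in xs)
    slope = 0.0 if var == 0 else sum(
        (x - mx) * (y - my) for x, y in zip(xs, ys)) / var
    return slope, my - slope * mx


def toy_objective(x):
    # Two basins on 0..20: a local minimum at 3 (value 0) and a better one at 15 (value -5).
    return (x - 3) ** 2 if x < 9 else (x - 15) ** 2 - 5


def toy_neighbors(x):
    return [max(0, x - 1), min(20, x + 1)]


xs, ys = collect_training_data(range(21), toy_objective, toy_neighbors, feature=float)
slope, intercept = fit_line(xs, ys)
```

On this toy problem the fitted line slopes downward, so it predicts that hillclimbing started further to the right tends to end at the better minimum.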
The employee or politician who wants to protect his or her privacy when viewing sensitive medical information, a competitor’s web site, sexual materials, or a web site catering to a marginalized group (e.g., gay rights, pro-choice or pro-life). The scientist who is asked to anonymously review a colleague’s article submission and wants to gather background …