This paper describes the Q-routing algorithm for packet routing, in which a reinforcement learning module is embedded into each node of a switching network. Only local communication is used by each node to keep accurate statistics on which routing decisions lead to minimal delivery times. In simple experiments involving a 36-node, irregularly connected…
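The per-node learning described in this abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes each node keeps a table of estimated delivery times per (destination, neighbor) pair and nudges an entry toward the observed local delay plus the chosen neighbor's own best estimate. The class name `QRouter`, its method names, and the `learning_rate` parameter are all hypothetical.

```python
from collections import defaultdict


class QRouter:
    """Hypothetical per-node routing module (names are illustrative only)."""

    def __init__(self, node, neighbors, learning_rate=0.5):
        self.node = node
        self.neighbors = list(neighbors)
        self.learning_rate = learning_rate  # assumed constant step size
        # q[(dest, neighbor)] = estimated time to deliver a packet to `dest`
        # when it is forwarded through `neighbor`; unseen entries start at 0.
        self.q = defaultdict(float)

    def best_estimate(self, dest):
        """This node's current best estimate of the remaining delivery time."""
        if dest == self.node:
            return 0.0
        return min(self.q[(dest, y)] for y in self.neighbors)

    def choose_next_hop(self, dest):
        """Greedy choice: the neighbor with the smallest estimated time."""
        return min(self.neighbors, key=lambda y: self.q[(dest, y)])

    def update(self, dest, neighbor, queue_delay, transit_delay, neighbor_estimate):
        """Move the estimate toward local delay plus the neighbor's estimate.

        `neighbor_estimate` is the value the neighbor reports back, which is
        the only communication this sketch needs.
        """
        target = queue_delay + transit_delay + neighbor_estimate
        old = self.q[(dest, neighbor)]
        self.q[(dest, neighbor)] = old + self.learning_rate * (target - old)


router = QRouter("a", ["b", "c"])
router.update("d", "b", queue_delay=1.0, transit_delay=1.0, neighbor_estimate=2.0)
router.update("d", "c", queue_delay=0.0, transit_delay=1.0, neighbor_estimate=0.0)
next_hop = router.choose_next_hop("d")
```

In this sketch the only information exchanged between nodes is the single number `neighbor_estimate`, which matches the abstract's emphasis on local communication.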
TD(λ) is a popular family of algorithms for approximate policy evaluation in large MDPs. TD(λ) works by incrementally updating the value function after each observed transition. It has two major drawbacks: it may make inefficient use of data, and it requires the user to manually tune a stepsize schedule for good performance. For the case of linear value…
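The incremental update mentioned in this abstract can be illustrated with a short sketch of TD(λ) for a linear value function with accumulating eligibility traces. It is a textbook-style formulation under stated assumptions, not code from the paper: the function name `td_lambda_episode`, the transition format, and the fixed `step_size` are hypothetical. The `step_size` argument is the quantity the abstract says must be tuned by hand.

```python
def dot(u, v):
    """Inner product of two equal-length lists of floats."""
    return sum(a * b for a, b in zip(u, v))


def td_lambda_episode(transitions, weights, lam=0.5, gamma=1.0, step_size=0.1):
    """Run one episode of TD(lambda) updates and return the new weights.

    `transitions` is a list of (features, reward, next_features) tuples;
    `next_features` is None when the transition ends the episode.
    """
    w = list(weights)
    trace = [0.0] * len(w)  # eligibility trace, reset at episode start
    for phi, reward, phi_next in transitions:
        value = dot(w, phi)
        next_value = 0.0 if phi_next is None else dot(w, phi_next)
        td_error = reward + gamma * next_value - value
        # Decay the old trace and add the current feature vector.
        trace = [gamma * lam * z + p for z, p in zip(trace, phi)]
        # Update the weights immediately after this single transition.
        w = [wi + step_size * td_error * z for wi, z in zip(w, trace)]
    return w


# Two-state chain with one-hot features: s0 -> s1 (reward 0) -> end (reward 1).
episode = [([1.0, 0.0], 0.0, [0.0, 1.0]), ([0.0, 1.0], 1.0, None)]
w_lam1 = td_lambda_episode(episode, [0.0, 0.0], lam=1.0, step_size=0.5)
w_lam0 = td_lambda_episode(episode, [0.0, 0.0], lam=0.0, step_size=0.5)
```

With λ = 1 the final reward is credited to both visited states through the trace, while with λ = 0 only the last state's weight changes in this episode.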
A straightforward approach to the curse of dimensionality in reinforcement learning and dynamic programming is to replace the lookup table with a generalizing function approximator such as a neural net. Although this has been successful in the domain of backgammon, there is no guarantee of convergence. In this paper, we show that the combination of…
Excerpted from: Boyan, Justin. Learning Evaluation Functions for Global Optimization. Ph.D. TD(λ) is a popular family of algorithms for approximate policy evaluation in large MDPs. TD(λ) works by incrementally updating the value function after each observed transition. It has two major drawbacks: it makes inefficient use of data, and it requires the user to…
In complex sequential decision problems such as scheduling factory production, planning medical treatments, and playing backgammon, optimal decision policies are in general unknown, and it is often difficult, even for human domain experts, to manually encode good decision policies in software. The reinforcement-learning methodology of value…
Indexing systems for the World Wide Web, such as Lycos and Alta Vista, play an essential role in making the Web useful and usable. These systems are based on Information Retrieval methods for indexing plain text documents, but also include heuristics for adjusting their document rankings based on the special HTML structure of Web documents. In this paper,…