Reinforcement learning systems are often concerned with balancing exploration of untested actions against exploitation of actions that are known to be good. The benefit of exploration can be estimated using the classical notion of Value of Information: the expected improvement in future decision quality arising from the information acquired by exploration.…
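As a rough illustration of the Value of Information idea, the sketch below computes a myopic "value of perfect information" for one action. It is one common formulation, not necessarily the estimator used in the paper, whose abstract is truncated here. The discrete belief representation and the names `expected` and `vpi` are assumptions made for this example.

```python
def expected(belief):
    """Mean of a discrete belief given as (value, probability) pairs."""
    return sum(v * p for v, p in belief)


def vpi(beliefs, action):
    """Myopic value of perfect information about `action`.

    `beliefs` maps each action to a discrete belief over its true value
    (an assumed representation). Requires at least two actions.
    Learning the true value only helps if it changes which action
    looks best, so the gain is zero otherwise.
    """
    means = {a: expected(b) for a, b in beliefs.items()}
    ranked = sorted(means, key=means.get, reverse=True)
    best, second = ranked[0], ranked[1]

    total = 0.0
    for value, prob in beliefs[action]:
        if action == best:
            # The current best turns out worse than the runner-up.
            gain = max(means[second] - value, 0.0)
        else:
            # Another action turns out better than the current best.
            gain = max(value - means[best], 0.0)
        total += prob * gain
    return total
```

An agent could, for instance, add `vpi(beliefs, a)` to the expected value of each action `a` and pick the maximum, so that uncertain actions are tried only when the information could change the decision.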
Safe state abstraction in reinforcement learning allows an agent to ignore aspects of its current state that are irrelevant to its current decision, and therefore speeds up dynamic programming and learning. This paper explores safe state abstraction in hierarchical reinforcement learning, where learned behaviors must conform to a given partial, hierarchical…
We present an expressive agent design language for reinforcement learning that allows the user to constrain the policies considered by the learning process. The language includes standard features such as parameterized subroutines, temporary interrupts, aborts, and memory variables, but also allows for unspecified choices in the agent program. For learning…
The design (synthesis) of analog electrical circuits starts with a high-level statement of the circuit's desired behavior and requires creating a circuit that satisfies the specified design goals. Analog circuit synthesis entails the creation of both the topology and the sizing (numerical values) of all of the circuit's components. The difficulty of the…
It would be desirable if computers could solve problems without the need for a human to write the detailed programmatic steps. That is, it would be desirable to have a domain-independent automatic programming technique in which "What You Want Is What You Get" ("WYWIWYG" – pronounced "wow-eee-wig"). Genetic programming is such a technique. This paper…
Prioritized sweeping is a model-based reinforcement learning method that attempts to focus an agent's limited computational resources to achieve a good estimate of the value of environment states. To choose effectively where to spend a costly planning step, classic prioritized sweeping uses a simple heuristic to focus computation on the states that are…
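For orientation, here is a minimal sketch of classic prioritized sweeping, not of the refinement this paper proposes (its abstract is truncated). It assumes a known, deterministic, tabular model; the function name, the `model` layout, and the parameters `gamma`, `theta`, and `max_updates` are choices made for this example.

```python
import heapq
import itertools
from collections import defaultdict


def prioritized_sweeping(model, gamma=0.9, theta=1e-6, max_updates=1000):
    """Plan on a deterministic model: (state, action) -> (reward, next_state).

    Backups are ordered by the magnitude of the Bellman error, and each
    backup re-queues the predecessors whose estimates it may have changed.
    """
    q = defaultdict(float)
    actions = defaultdict(list)
    predecessors = defaultdict(set)
    for (s, a), (_, s2) in model.items():
        actions[s].append(a)
        predecessors[s2].add((s, a))

    def value(s):
        # States with no known actions are treated as terminal (value 0).
        return max((q[(s, a)] for a in actions[s]), default=0.0)

    def error(s, a):
        r, s2 = model[(s, a)]
        return abs(r + gamma * value(s2) - q[(s, a)])

    tie = itertools.count()  # tie-breaker so the heap never compares states
    queue = []

    def push(s, a):
        p = error(s, a)
        if p > theta:
            heapq.heappush(queue, (-p, next(tie), (s, a)))

    for s, a in model:
        push(s, a)

    for _ in range(max_updates):
        if not queue:
            break
        _, _, (s, a) = heapq.heappop(queue)
        r, s2 = model[(s, a)]
        q[(s, a)] = r + gamma * value(s2)  # full backup for this pair
        for ps, pa in predecessors[s]:
            push(ps, pa)  # the value of s may have changed
    return dict(q)
```

Stale queue entries are harmless in this sketch because a backup recomputes the target from the current estimates. A stochastic model would instead need an expectation over next states in `error` and in the backup.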
Genetic programming was used to evolve both the topology and sizing (numerical values) for each component of a low-distortion, low-bias 60 decibel (1000-to-1) amplifier with good frequency generalization.