
- Timothy P. Lillicrap, Jonathan J. Hunt, +5 authors Daan Wierstra
- ArXiv
- 2015

We adapt the ideas underlying the success of Deep Q-Learning to the continuous action domain. We present an actor-critic, model-free algorithm based on the deterministic policy gradient that can operate over continuous action spaces. Using the same learning algorithm, network architecture and hyper-parameters, our algorithm robustly solves more than 20… (More)
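The deterministic policy gradient at the heart of this actor-critic method can be illustrated on a toy problem. A minimal sketch, assuming a scalar state, a linear policy mu(s) = theta * s, and a critic known in closed form (all illustrative assumptions, not the paper's setup):

```python
import numpy as np

# Toy deterministic policy gradient: the optimal action is a* = 2*s,
# and the critic Q(s, a) = -(a - 2*s)^2 is known analytically, so the
# policy update reduces to dQ/da * dmu/dtheta (here dmu/dtheta = s).

def q_grad_a(s, a):
    # dQ/da for Q(s, a) = -(a - 2*s)^2
    return -2.0 * (a - 2.0 * s)

def train(theta=0.0, lr=0.05, steps=500, seed=0):
    rng = np.random.default_rng(seed)
    for _ in range(steps):
        s = rng.uniform(-1.0, 1.0)   # sampled state
        a = theta * s                # deterministic policy mu(s) = theta * s
        theta += lr * q_grad_a(s, a) * s
    return theta

theta = train()  # theta approaches 2.0, i.e. mu(s) -> 2*s
```

In the full algorithm the closed-form critic is replaced by a learned Q-network, with target networks and a replay buffer stabilizing the updates.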

- Emanuel Todorov, Tom Erez, Yuval Tassa
- 2012 IEEE/RSJ International Conference on…
- 2012

We describe a new physics engine tailored to model-based control. Multi-joint dynamics are represented in generalized coordinates and computed via recursive algorithms. Contact responses are computed via efficient new algorithms we have developed, based on the modern velocity-stepping approach which avoids the difficulties with spring-dampers. Models are… (More)

- Yuval Tassa, Tom Erez, Emanuel Todorov
- 2012 IEEE/RSJ International Conference on…
- 2012

We present an online trajectory optimization method and software platform applicable to complex humanoid robots performing challenging tasks such as getting up from an arbitrary pose on the ground and recovering from large disturbances using dexterous acrobatic maneuvers. The resulting behaviors, illustrated in the attached video, are computed only 7… (More)

We present a unified framework for learning continuous control policies using backpropagation. It supports stochastic control by treating stochasticity in the Bellman equation as a deterministic function of exogenous noise. The product is a spectrum of general policy gradient algorithms that range from model-free methods with value functions to model-based… (More)
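The core trick described here, rewriting a stochastic action as a deterministic function of exogenous noise so that gradients can flow through the sampling step, can be sketched in a few lines. A toy version, assuming a Gaussian policy a = theta + sigma * eps and a hand-picked quadratic Q (both illustrative assumptions):

```python
import numpy as np

# Reparameterization sketch: a = theta + sigma * eps with exogenous
# eps ~ N(0, 1) makes the sampled action differentiable in theta.
# Toy objective (an assumption, not from the paper): Q(a) = -(a - 3)^2.

def train(theta=0.0, sigma=0.2, lr=0.05, steps=2000, seed=1):
    rng = np.random.default_rng(seed)
    for _ in range(steps):
        eps = rng.standard_normal()
        a = theta + sigma * eps            # reparameterized sample
        # pathwise gradient: dQ/da * da/dtheta, with da/dtheta = 1
        theta += lr * (-2.0 * (a - 3.0))
    return theta

theta = train()  # theta converges near 3.0
```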

- Yuval Tassa, Tom Erez, William D. Smart
- NIPS
- 2007

The control of high-dimensional, continuous, non-linear dynamical systems is a key problem in reinforcement learning and control. Local, trajectory-based methods, using techniques such as Differential Dynamic Programming (DDP), are not directly subject to the curse of dimensionality, but generate only local controllers. In this paper, we introduce Receding… (More)
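The receding-horizon pattern can be shown on a scalar linear system, where DDP reduces exactly to an LQR backward pass (a Riccati recursion). A minimal sketch, assuming dynamics x' = x + u and cost x^2 + r*u^2 (all constants illustrative):

```python
# Receding-horizon control sketch: re-plan over a finite horizon at
# every step, then execute only the first action of the plan.

def first_gain(horizon, q=1.0, r=0.1):
    """Backward Riccati pass; return the feedback gain for the first step."""
    p = 0.0                        # quadratic coefficient of the value-to-go
    k = 0.0
    for _ in range(horizon):
        k = p / (r + p)            # u = -k*x minimizes stage cost + value-to-go
        p = q + p - p * p / (r + p)
    return k

def receding_horizon(x0=5.0, horizon=10, steps=30):
    x = x0
    for _ in range(steps):
        k = first_gain(horizon)    # re-plan over the horizon
        x = x + (-k * x)           # execute only the first planned action
    return x

x_final = receding_horizon()       # state is driven essentially to zero
```

On a nonlinear system the gain would differ at every state (DDP re-linearizes around the current trajectory), which is why re-planning each step matters.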

- Nicolas Heess, Dhruva TB, +9 authors David Silver
- ArXiv
- 2017

The reinforcement learning paradigm allows, in principle, for complex behaviours to be learned directly from simple reward signals. In practice, however, it is common to carefully hand-design the reward function to encourage a particular solution, or to derive it from demonstration data. In this paper we explore how a rich environment can help to promote the… (More)

Abstract of the dissertation: Optimal Control for Autonomous Motor Behavior, by Tom Erez, Doctor of Philosophy in Computer Science, Washington University in St. Louis, 2011. Research Advisor: Professor William D. Smart. This dissertation presents algorithms that allow robots to generate optimal behavior from first principles. Instead of hard-coding every desired behavior,… (More)

Generating diverse behaviors with a humanoid robot requires a mix of human supervision and automatic control. Ideally, the user’s input is restricted to high-level instruction and guidance, and the controller is intelligent enough to accomplish the tasks autonomously. Here we describe an integrated system that achieves this goal. The automatic controller is… (More)

- Mingyuan Zhong, M. Johnson, Yuval Tassa, Tom Erez, Emanuel Todorov
- ADPRL
- 2013

Both global methods and on-line trajectory optimization methods are powerful techniques for solving optimal control problems; however, each has limitations. In order to mitigate the undesirable properties of each, we explore the possibility of combining the two. We explore two methods of deriving a descriptive final cost function to assist model predictive… (More)

- Yuval Tassa, Tom Erez
- IEEE Transactions on Neural Networks
- 2007

In this paper, we present an empirical study of iterative least squares minimization of the Hamilton-Jacobi-Bellman (HJB) residual with a neural network (NN) approximation of the value function. Although the nonlinearities in the optimal control problem and NN approximator preclude theoretical guarantees and raise concerns of numerical instabilities, we… (More)
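The least-squares HJB residual idea can be shown on a toy problem where the value function has a known quadratic form. A sketch standing in for the paper's neural-network setup, assuming the scalar system dx/dt = u with cost x^2 + u^2 and the ansatz V(x) = w * x^2 (all assumptions for illustration):

```python
import numpy as np

# For dx/dt = u and cost x^2 + u^2, the HJB residual
# min_u [x^2 + u^2 + V'(x) * u] with V(x) = w*x^2 simplifies to
# (1 - w^2) * x^2, so least squares over sampled states drives w -> 1.

def fit_w(w=0.5, lr=0.01, steps=2000):
    xs = np.linspace(-1.0, 1.0, 21)           # sampled states
    for _ in range(steps):
        resid = (1.0 - w * w) * xs ** 2       # HJB residual per sample
        # gradient of sum(resid^2) with respect to w
        grad = np.sum(2.0 * resid * (-2.0 * w) * xs ** 2)
        w -= lr * grad                        # least-squares descent step
    return w

w = fit_w()  # converges to the true coefficient w = 1
```

With a neural network in place of the quadratic ansatz, the same residual is minimized by backpropagation, which is where the paper's concerns about guarantees and numerical stability arise.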