
We describe an iterative procedure for optimizing policies, with guaranteed monotonic improvement. By making several approximations to the theoretically-justified procedure, we develop a practical algorithm, called Trust Region Policy Optimization (TRPO). This algorithm is similar to natural policy gradient methods and is effective for optimizing large…
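The constrained optimization that TRPO solves at each iteration can be written compactly (this is a standard statement of the surrogate objective, reproduced here for reference, with δ denoting the trust-region size):

```latex
\begin{aligned}
\max_{\theta} \quad & \mathbb{E}_{s,\,a \sim \pi_{\theta_{\mathrm{old}}}}
  \left[ \frac{\pi_{\theta}(a \mid s)}{\pi_{\theta_{\mathrm{old}}}(a \mid s)}\,
  A_{\theta_{\mathrm{old}}}(s, a) \right] \\
\text{s.t.} \quad & \mathbb{E}_{s}\left[
  D_{\mathrm{KL}}\!\big(\pi_{\theta_{\mathrm{old}}}(\cdot \mid s)\,\big\|\,
  \pi_{\theta}(\cdot \mid s)\big) \right] \le \delta
\end{aligned}
```

The KL constraint is what bounds each policy update and underlies the monotonic-improvement guarantee mentioned in the abstract.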

Policy gradient methods are an appealing approach in reinforcement learning because they directly optimize the cumulative reward and can straightforwardly be used with nonlinear function approximators such as neural networks. The two main challenges are the large number of samples typically required, and the difficulty of obtaining stable and steady…
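To make the "directly optimize the cumulative reward" point concrete, here is a minimal REINFORCE-style policy gradient sketch on a toy two-armed bandit. The bandit, reward values, and hyperparameters are illustrative assumptions, not from the paper; the estimator is the standard sampled form of E[∇log π(a) · r].

```python
import numpy as np

# Toy setup (assumed, for illustration): a 2-armed bandit with fixed rewards.
rng = np.random.default_rng(0)
true_rewards = np.array([0.2, 0.8])   # arm 1 pays more
theta = np.zeros(2)                   # softmax policy logits
alpha = 0.5                           # learning rate

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

for _ in range(500):
    probs = softmax(theta)
    a = rng.choice(2, p=probs)
    r = true_rewards[a]
    # Gradient of log softmax w.r.t. theta: one_hot(a) - probs
    grad_log_pi = -probs
    grad_log_pi[a] += 1.0
    # REINFORCE update: ascend the sampled policy gradient
    theta += alpha * r * grad_log_pi

print(softmax(theta))  # learned policy; should concentrate on the better arm
```

Even on this trivial problem the update is noisy, which hints at the sample-efficiency and stability challenges the abstract raises.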

- Sergey Levine, Chelsea Finn, Trevor Darrell, Pieter Abbeel
- Journal of Machine Learning Research
- 2016

Policy search methods can allow robots to learn control policies for a wide range of tasks, but practical applications of policy search often require hand-engineered components for perception, state estimation, and low-level control. In this paper, we aim to answer the following question: does training the perception and control systems jointly end-to-end…

- Sergey Levine, Peter Pastor, Alex Krizhevsky, Deirdre Quillen
- ArXiv
- 2016

We describe a learning-based approach to hand-eye coordination for robotic grasping from monocular images. To learn hand-eye coordination for grasping, we trained a large convolutional neural network to predict the probability that task-space motion of the gripper will result in successful grasps, using only monocular camera images and independently of…

- Chelsea Finn, Ian J. Goodfellow, Sergey Levine
- NIPS
- 2016

A core challenge for an agent learning to interact with the world is to predict how its actions affect objects in its environment. Many existing methods for learning the dynamics of physical interactions require labeled object information. However, to scale real-world interaction learning to a variety of scenes and objects, acquiring labeled data becomes…

- Katerina Fragkiadaki, Sergey Levine, Panna Felsen, Jitendra Malik
- 2015 IEEE International Conference on Computer…
- 2015

We propose the Encoder-Recurrent-Decoder (ERD) model for recognition and prediction of human body pose in videos and motion capture. The ERD model is a recurrent neural network that incorporates nonlinear encoder and decoder networks before and after recurrent layers. We test instantiations of ERD architectures in the tasks of motion capture (mocap)…

- Sergey Levine, Zoran Popovic, Vladlen Koltun
- NIPS
- 2011

We present a probabilistic algorithm for nonlinear inverse reinforcement learning. The goal of inverse reinforcement learning is to learn the reward function in a Markov decision process from expert demonstrations. While most prior inverse reinforcement learning algorithms represent the reward as a linear combination of a set of features, we use Gaussian…
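The contrast the abstract draws can be stated in one line: a standard linear reward model versus a nonlinear one (this is the conventional formulation; the GP placement is a summary of the abstract's description, not the paper's exact notation):

```latex
\underbrace{r(s) = w^{\top} \phi(s)}_{\text{linear in features } \phi}
\qquad \text{vs.} \qquad
\underbrace{r \sim \mathcal{GP}\big(m(\phi(s)),\, k(\phi(s), \phi(s'))\big)}_{\text{nonlinear reward via a Gaussian process}}
```

The GP prior lets the learned reward capture nonlinear structure in the features that a fixed linear combination cannot.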

Model-free reinforcement learning has been successfully applied to a range of challenging problems, and has recently been extended to handle large neural network policies and value functions. However, the sample complexity of model-free algorithms, particularly when using high-dimensional function approximators, tends to limit their applicability to physical…

- Sergey Levine, Vladlen Koltun
- ICML
- 2012

Inverse optimal control, also known as inverse reinforcement learning, is the problem of recovering an unknown reward function in a Markov decision process from expert demonstrations of the optimal policy. We introduce a probabilistic inverse optimal control algorithm that scales gracefully with task dimensionality, and is suitable for large, continuous…

- Chelsea Finn, Sergey Levine, Pieter Abbeel
- ICML
- 2016

Reinforcement learning can acquire complex behaviors from high-level specifications. However, defining a cost function that can be optimized effectively and encodes the correct task is challenging in practice. We explore how inverse optimal control (IOC) can be used to learn behaviors from demonstrations, with applications to torque control of…