- Pieter Abbeel, Andrew Y. Ng
- ICML
- 2004

We consider learning in a Markov decision process where we are not explicitly given a reward function, but where instead we can observe an expert demonstrating the task that we want to learn to perform. This setting is useful in applications (such as the task of driving) where it may be difficult to write down an explicit reward function specifying exactly…
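The full paper assumes the unknown reward is linear in known state features and matches the expert's discounted feature expectations. A minimal sketch of the first step, estimating those feature expectations from demonstration trajectories (the one-hot feature map and toy trajectories here are illustrative, not from the paper):

```python
import numpy as np

def feature_expectations(trajectories, phi, gamma=0.9):
    """Empirical discounted feature expectations mu_E = E[sum_t gamma^t phi(s_t)],
    averaged over the expert's demonstration trajectories."""
    mu = np.zeros_like(phi(trajectories[0][0]))
    for traj in trajectories:
        for t, s in enumerate(traj):
            mu += (gamma ** t) * phi(s)
    return mu / len(trajectories)

# Toy example: states are integers 0-2, features are one-hot over 3 states.
phi = lambda s: np.eye(3)[s]
demos = [[0, 1, 2], [0, 2, 2]]  # two short expert trajectories
mu_E = feature_expectations(demos, phi)
```

A learned reward weight vector w can then be scored by comparing the policy's feature expectations against `mu_E`.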

- Ben Taskar, Pieter Abbeel, Daphne Koller
- UAI
- 2002

In many supervised learning tasks, the entities to be labeled are related to each other in complex ways and their labels are not independent. For example, in hypertext classification, the labels of linked pages are highly correlated. A standard approach is to classify each entity independently, ignoring the correlations between them. Recently,…

Motivation: the success of neural networks in supervised learning relies on the fact that learning reduces to a nonlinear optimization problem. To fully wield the power of nonlinear function approximators in RL, we need a better understanding of how to ensure monotonic behavior.

- Xi Chen, Yan Duan, Rein Houthooft, John Schulman, Ilya Sutskever, Pieter Abbeel
- NIPS
- 2016

This paper describes InfoGAN, an information-theoretic extension to the Generative Adversarial Network that is able to learn disentangled representations in a completely unsupervised manner. InfoGAN is a generative adversarial network that also maximizes the mutual information between a small subset of the latent variables and the observation. We derive a…
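For a categorical latent code, the mutual information term is maximized through a variational lower bound: an auxiliary network Q tries to recover the code from the generated sample, and the generator is trained to make that easy. A minimal sketch of that auxiliary loss, with Q's outputs given as a hypothetical pre-computed batch rather than a real network:

```python
import numpy as np

def info_loss(q_probs, codes):
    """Variational lower bound on I(c; G(z, c)) for a categorical code:
    minimizing this cross-entropy maximizes E[log Q(c | G(z, c))],
    which lower-bounds the mutual information up to the constant H(c)."""
    n = len(codes)
    return -np.mean(np.log(q_probs[np.arange(n), codes]))

# Toy batch: Q's predicted code distributions for two generated samples,
# and the codes that were actually fed to the generator.
q_probs = np.array([[0.7, 0.2, 0.1],
                    [0.1, 0.8, 0.1]])
codes = np.array([0, 1])
loss = info_loss(q_probs, codes)
```

In the paper this term is added (with a weight) to the standard GAN minimax objective and trained jointly with the generator and discriminator.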

Policy gradient methods are an appealing approach in reinforcement learning because they directly optimize the cumulative reward and can straightforwardly be used with nonlinear function approximators such as neural networks. The two main challenges are the large number of samples typically required, and the difficulty of obtaining stable and steady…
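"Directly optimize the cumulative reward" here means following sample estimates of the gradient E[∇θ log πθ(a|s) · R]. A minimal REINFORCE-style sketch for a softmax policy, on an illustrative three-armed bandit rather than a full MDP:

```python
import numpy as np

def softmax(logits):
    z = logits - logits.max()  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def reinforce_step(theta, action, reward, lr=0.1):
    """One REINFORCE update for a softmax policy over discrete actions:
    theta += lr * reward * grad_theta log pi(action)."""
    pi = softmax(theta)
    grad_log_pi = -pi
    grad_log_pi[action] += 1.0  # gradient of log softmax(theta)[action]
    return theta + lr * reward * grad_log_pi

# Toy bandit: only action 1 pays reward, so its probability should rise.
rng = np.random.default_rng(0)
theta = np.zeros(3)
for _ in range(500):
    a = rng.choice(3, p=softmax(theta))
    r = 1.0 if a == 1 else 0.0
    theta = reinforce_step(theta, a, r)
```

The instability this snippet's naive step size can exhibit on harder problems is exactly the challenge the abstract raises; trust-region style constraints on the policy update are one response to it.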

- Yan Duan, Xi Chen, Rein Houthooft, John Schulman, Pieter Abbeel
- ICML
- 2016

Recently, researchers have made significant progress combining the advances in deep learning for learning feature representations with reinforcement learning. Some notable examples include training agents to play Atari games based on raw pixel data and to acquire advanced manipulation skills using raw sensory inputs. However, it has been difficult to…

- Sergey Levine, Chelsea Finn, Trevor Darrell, Pieter Abbeel
- Journal of Machine Learning Research
- 2016

Policy search methods can allow robots to learn control policies for a wide range of tasks, but practical applications of policy search often require hand-engineered components for perception, state estimation, and low-level control. In this paper, we aim to answer the following question: does training the perception and control systems jointly end-to-end…

- Su-In Lee, Honglak Lee, Pieter Abbeel, Andrew Y. Ng
- AAAI
- 2006

L1-regularized logistic regression is now a workhorse of machine learning: it is widely used for many classification problems, particularly ones with many features. L1-regularized logistic regression requires solving a convex optimization problem. However, standard algorithms for solving convex optimization problems do not scale well enough to handle the…
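The convex problem in question is min_w (1/n) Σᵢ log(1 + exp(−yᵢ w·xᵢ)) + λ‖w‖₁. A minimal proximal-gradient (ISTA) sketch of a generic solver for it, not the paper's own specialized algorithm, on illustrative synthetic data where only the first feature is informative:

```python
import numpy as np

def l1_logreg(X, y, lam=0.1, lr=0.1, iters=2000):
    """L1-regularized logistic regression via proximal gradient descent:
    a gradient step on the smooth logistic loss, then soft-thresholding,
    which is the proximal operator of lam * ||w||_1."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ w))   # predicted probabilities
        grad = X.T @ (p - y) / n            # gradient of mean log-loss
        w = w - lr * grad
        w = np.sign(w) * np.maximum(np.abs(w) - lr * lam, 0.0)
    return w

# Toy data: labels depend only on feature 0; L1 should shrink the rest.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
y = (X[:, 0] > 0).astype(float)
w = l1_logreg(X, y)
```

The scaling difficulty the abstract points to shows up here: each iteration touches the full data matrix, which is what specialized large-scale solvers aim to avoid.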

- Pieter Abbeel, Andrew Y. Ng
- ICML
- 2005

We consider reinforcement learning in systems with unknown dynamics. Algorithms such as E³ (Kearns and Singh, 2002) learn near-optimal policies by using "exploration policies" to drive the system towards poorly modeled states, so as to encourage exploration. But this makes these algorithms impractical for many systems; for example, on an…

- Ben Taskar, Ming Fai Wong, Pieter Abbeel, Daphne Koller
- NIPS
- 2003

Many real-world domains are relational in nature, consisting of a set of objects related to each other in complex ways. This paper focuses on predicting the existence and the type of links between entities in such domains. We apply the relational Markov network framework of Taskar et al. to define a joint probabilistic model over the entire link graph…