• Publications
Hindsight Experience Replay
TLDR
A novel technique is presented that allows sample-efficient learning from sparse, binary rewards, thereby avoiding the need for complicated reward engineering; it may be seen as a form of implicit curriculum.
Learning to learn by gradient descent by gradient descent
TLDR
This paper shows how the design of an optimization algorithm can be cast as a learning problem, allowing the algorithm to learn to exploit structure in the problems of interest in an automatic way.
Multi-Goal Reinforcement Learning: Challenging Robotics Environments and Request for Research
TLDR
A suite of challenging continuous control tasks (integrated with OpenAI Gym), based on currently existing robotics hardware and following a Multi-Goal Reinforcement Learning (RL) framework, is introduced.
Learning dexterous in-hand manipulation
TLDR
This work uses reinforcement learning (RL) to learn dexterous in-hand manipulation policies that can perform vision-based object reorientation on a physical Shadow Dexterous Hand, and these policies transfer to the physical robot despite being trained entirely in simulation.
Overcoming Exploration in Reinforcement Learning with Demonstrations
TLDR
This work uses demonstrations to overcome the exploration problem and successfully learn to perform long-horizon, multi-step robotics tasks with continuous control such as stacking blocks with a robot arm.
Parameter Space Noise for Exploration
TLDR
This work demonstrates, through experimental comparison of DQN, DDPG, and TRPO on high-dimensional discrete action environments as well as continuous control tasks, that RL with parameter noise learns more efficiently than either traditional RL with action space noise or evolutionary strategies alone.
Sim-to-Real Transfer of Robotic Control with Dynamics Randomization
TLDR
By randomizing the dynamics of the simulator during training, this paper is able to develop policies that are capable of adapting to very different dynamics, including ones that differ significantly from the dynamics on which the policies were trained.
Secure Multiparty Computations on Bitcoin
TLDR
The Bitcoin system can be used to go beyond the standard "emulation-based" definition of MPCs by constructing protocols that link their inputs and outputs with real Bitcoin transactions.
One-Shot Imitation Learning
TLDR
A meta-learning framework is proposed for one-shot imitation learning: ideally, robots should be able to learn from very few demonstrations of any given task and instantly generalize to new situations of the same task, without requiring task-specific engineering.
Solving Rubik's Cube with a Robot Hand
  • OpenAI, I. Akkaya, +16 authors Lei Zhang
  • Computer Science, Mathematics
    ArXiv
  • 16 October 2019
TLDR
It is demonstrated that models trained only in simulation can be used to solve a manipulation problem of unprecedented complexity on a real robot, made possible by a novel algorithm called automatic domain randomization (ADR) and a robot platform built for machine learning.