• Publications
Hindsight Experience Replay
TLDR
We present a novel technique called Hindsight Experience Replay, which allows sample-efficient learning from sparse, binary rewards and therefore avoids the need for complicated reward engineering.
  • 692
  • 159
Learning to learn by gradient descent by gradient descent
TLDR
We show how the design of optimization algorithms can be cast as a learning problem, allowing the algorithm to learn to exploit structure in the problems of interest in an automatic way.
  • 940
  • 105
Multi-Goal Reinforcement Learning: Challenging Robotics Environments and Request for Research
The purpose of this technical report is twofold. First, it introduces a suite of challenging continuous control tasks (integrated with OpenAI Gym) based on currently existing robotics...
  • 180
  • 35
Sim-to-Real Transfer of Robotic Control with Dynamics Randomization
TLDR
Training policies with randomized dynamics in simulation enables the resulting policies to be deployed directly on a physical robot despite poor calibrations.
  • 395
  • 27
Secure Multiparty Computations on Bitcoin
TLDR
We show that the Bitcoin system provides an attractive way to construct a version of "timed commitments", where the committer has to reveal his secret within a certain time frame or pay a fine.
  • 270
  • 23
Parameter Space Noise for Exploration
TLDR
This paper investigates how parameter space noise can be effectively combined with off-the-shelf deep RL algorithms such as DQN, DDPG, and TRPO to improve their exploratory behavior.
  • 297
  • 23
Overcoming Exploration in Reinforcement Learning with Demonstrations
TLDR
We use demonstrations to overcome the exploration problem and successfully learn to perform long-horizon, multi-step robotics tasks with continuous control, such as stacking blocks with a robot arm.
  • 255
  • 19
One-Shot Imitation Learning
TLDR
We propose a meta-learning framework for achieving such capability, which we call one-shot imitation learning.
  • 340
  • 18
Solving Rubik's Cube with a Robot Hand
  • OpenAI, I. Akkaya, +16 authors Lei Zhang
  • Mathematics, Computer Science
  • ArXiv
  • 16 October 2019
TLDR
We demonstrate that models trained only in simulation can be used to solve a manipulation problem of unprecedented complexity on a real robot.
  • 194
  • 11
PoW-Based Distributed Cryptography with No Trusted Setup
TLDR
We study the question of constructing distributed cryptographic protocols in a fully peer-to-peer scenario under the assumption that the adversary has limited computing power and there is no trusted setup (such as a PKI or an unpredictable beacon).
  • 46
  • 11