Reinforcement learning is often regarded as one of the hardest problems in machine learning. Algorithms for these problems typically require copious resources compared to those for other learning problems, and often fail for no obvious reason. This report surveys a set of algorithms, for various reinforcement learning problems, that are known to terminate with a good solution after a number of interactions with the domain that is polynomial in the domain's parameters. Such algorithms are said to solve the exploration problem. We analyze these algorithms in the probably approximately correct (PAC) framework, after a brief introduction to this powerful method. We observe that the runtime of efficient exploration algorithms appears to depend on the method by which the learner samples from the domain, and offer an explanation for why this is the case. Finally, a brief review of recent work and related areas is provided.