# Self-Play Learning Without a Reward Metric

```bibtex
@article{Schmidt2019SelfPlayLW,
  title   = {Self-Play Learning Without a Reward Metric},
  author  = {Dan Schmidt and N. Moran and Jonathan S. Rosenfeld and Jonathan Rosenthal and J. Yedidia},
  journal = {ArXiv},
  year    = {2019},
  volume  = {abs/1912.07557}
}
```

The AlphaZero algorithm for learning strategy games via self-play, which has produced superhuman ability in Go, chess, and shogi, uses a quantitative reward function for game outcomes. This requires users of the algorithm to explicitly balance different components of the reward against each other, such as the game winner and the margin of victory. We present a modification to the AlphaZero algorithm that requires only a total ordering over game outcomes, obviating the need to…
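The key change is replacing the scalar reward with a total order on outcomes. As a minimal illustration (the class and lexicographic key below are our own, not the paper's construction), a win by any margin can be made to outrank any draw or loss without ever assigning a numeric reward:

```python
from functools import total_ordering

# Hypothetical sketch: outcomes compared lexicographically by
# (winner, margin) rather than combined into one scalar reward.
@total_ordering
class Outcome:
    def __init__(self, win, margin):
        self.win = win        # 1 = win, 0 = draw, -1 = loss
        self.margin = margin  # e.g. score difference

    def _key(self):
        return (self.win, self.margin)

    def __eq__(self, other):
        return self._key() == other._key()

    def __lt__(self, other):
        return self._key() < other._key()

# A win by any margin ranks above any draw, regardless of score:
assert Outcome(1, 1) > Outcome(0, 50)
```

Because only comparisons are needed, no weighting between winner and margin ever has to be chosen; the order itself encodes their priority.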

#### References


Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm

- Computer Science
- ArXiv
- 2017

This paper generalises the approach into a single AlphaZero algorithm that can achieve, tabula rasa, superhuman performance in many challenging domains, and convincingly defeated a world-champion program in each case.

Ranked Reward: Enabling Self-Play Reinforcement Learning for Combinatorial Optimization

- Computer Science, Mathematics
- ArXiv
- 2018

Results from applying the R2 algorithm to instances of two-dimensional and three-dimensional bin packing problems show that it outperforms generic Monte Carlo tree search, heuristic algorithms, and integer programming solvers.
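The ranked-reward (R2) idea summarized above can be sketched as binarizing an episode's raw score against a moving percentile threshold of recent scores; the class name, window size, and percentile parameter below are illustrative choices, not the paper's exact specification:

```python
from collections import deque

class RankedReward:
    """Sketch: emit +1 if an episode's score beats a percentile of
    recent scores, else -1 (a simplified reading of the R2 scheme)."""
    def __init__(self, percentile=75, window=100):
        self.percentile = percentile
        self.scores = deque(maxlen=window)

    def reward(self, score):
        self.scores.append(score)
        ranked = sorted(self.scores)
        # Index of the chosen percentile within the sorted window.
        idx = min(int(len(ranked) * self.percentile / 100), len(ranked) - 1)
        threshold = ranked[idx]
        return 1.0 if score > threshold else -1.0
```

The binarization turns a single-agent optimization problem into a self-play-like setting: the agent is rewarded only for beating its own recent performance.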

Learning values across many orders of magnitude

- Computer Science, Mathematics
- NIPS
- 2016

This work proposes to adaptively normalize the targets used in learning, which is useful in value-based reinforcement learning, where the magnitude of appropriate value approximations can change over time as the behavior policy changes.
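The target-normalization idea can be sketched with running first and second moments; note this is a simplified illustration only, as the full method in the paper additionally rescales the network's output layer so that unnormalized predictions are preserved when the statistics change:

```python
import math

class AdaptiveNormalizer:
    """Running mean/std normalizer for learning targets
    (simplified sketch; the paper's method also compensates
    the function approximator for changes in the statistics)."""
    def __init__(self, beta=0.01):
        self.beta = beta       # step size for the moment updates
        self.mean = 0.0        # running first moment
        self.mean_sq = 1.0     # running second moment

    def update(self, target):
        self.mean += self.beta * (target - self.mean)
        self.mean_sq += self.beta * (target * target - self.mean_sq)

    @property
    def std(self):
        # Guard against negative variance from floating-point error.
        return math.sqrt(max(self.mean_sq - self.mean ** 2, 1e-8))

    def normalize(self, target):
        return (target - self.mean) / self.std
```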

A Survey of Preference-Based Reinforcement Learning Methods

- Computer Science
- J. Mach. Learn. Res.
- 2017

A unified framework for PbRL is provided that describes the task formally and points out the different design principles that affect the evaluation task for the human as well as the computational complexity.

SAI: a Sensible Artificial Intelligence that plays with handicap and targets high scores in 9x9 Go (extended version)

- Computer Science
- ArXiv
- 2019

We develop a new model that can be applied to any perfect-information two-player zero-sum game to target a high score, and thus perfect play. We integrate this model into the Monte Carlo tree…

Accelerating Self-Play Learning in Go

- Computer Science, Mathematics
- ArXiv
- 2019

By introducing several improvements to the AlphaZero process and architecture, we greatly accelerate self-play learning in Go, achieving a 50x reduction in computation over comparable methods. Like…

Policy Invariance Under Reward Transformations: Theory and Application to Reward Shaping

- Computer Science
- ICML
- 1999

Conditions under which modifications to the reward function of a Markov decision process preserve the optimal policy are investigated to shed light on the practice of reward shaping, a method used in reinforcement learning whereby additional training rewards are used to guide the learning agent.
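The policy-invariance result concerns potential-based shaping, where the extra reward takes the form F(s, a, s') = gamma * phi(s') - phi(s) for some potential function phi over states. A minimal numeric sketch, with an illustrative potential of our own choosing:

```python
# Potential-based shaping bonus F(s, a, s') = gamma * phi(s') - phi(s).
# Ng et al. prove this form preserves the optimal policy; the potential
# phi below is a toy choice for illustration.
gamma = 0.9

def phi(state):
    # Toy potential: negative distance to a goal state at 10.
    return -abs(10 - state)

def shaped_bonus(s, s_next):
    return gamma * phi(s_next) - phi(s)

# Moving toward the goal yields a positive bonus:
# phi(4) = -6, phi(3) = -7, so 0.9 * (-6) - (-7) = 1.6
bonus = shaped_bonus(3, 4)
```

Any shaping reward of this telescoping form cancels out along trajectories up to a state-dependent constant, which is why the optimal policy is unaffected.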

Natural Evolution Strategies

- Mathematics, Computer Science
- 2008 IEEE Congress on Evolutionary Computation (IEEE World Congress on Computational Intelligence)
- 2008

NES is presented, a novel algorithm for performing real-valued 'black box' function optimization: optimizing an unknown objective function where algorithm-selected function measurements constitute the only information accessible to the method.

Deep Ordinal Reinforcement Learning

- Computer Science, Mathematics
- ECML/PKDD
- 2019

This paper shows how to convert common reinforcement learning algorithms to an ordinal variation by the example of Q-learning and introduces Ordinal Deep Q-Networks, which adapt deep reinforcement learning to ordinal rewards.
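Working with ordinal rather than scalar rewards means a value estimate can only track a distribution over ordered reward ranks. A toy bandit-style sketch of one simple aggregation, scoring each action by its normalized expected rank (the paper itself uses a different ordinal scoring, so treat this as illustrative only):

```python
from collections import defaultdict

class OrdinalBandit:
    """Illustrative sketch: per-action counts over ordered reward
    ranks, scored by normalized expected rank in [0, 1]."""
    def __init__(self, n_ranks):
        self.n_ranks = n_ranks
        self.counts = defaultdict(lambda: [0] * n_ranks)

    def observe(self, action, rank):
        self.counts[action][rank] += 1

    def score(self, action):
        c = self.counts[action]
        total = sum(c) or 1  # avoid division by zero for unseen actions
        return sum(i * n for i, n in enumerate(c)) / (total * (self.n_ranks - 1))
```

No numeric distance between ranks is ever assumed; only their order enters the score, which is the point of the ordinal formulation.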

Evolution strategy: Optimization of technical systems by means of biological evolution

- Frommann-Holzboog, Stuttgart 104:15–16.
- 1973