Model-Free Preference-Based Reinforcement Learning


Specifying a numeric reward function for reinforcement learning typically requires considerable hand-tuning by a human expert. In contrast, preference-based reinforcement learning (PBRL) uses only pairwise comparisons between trajectories as a feedback signal, and such comparisons are often more intuitive to specify. Currently available approaches to PBRL for control…


4 Figures and Tables

