L. Chatriot

We combine machine learning at four different time scales for Monte-Carlo exploration: – online regret, through the use of bandit algorithms and Monte-Carlo estimates; – transient learning, through the use of rapid action value estimates (RAVE), which are learnt online, used to accelerate the exploration, and thereafter neglected; – offline…