Gabor Matuz

Reinforcement learning has solid foundations, but becomes inefficient in partially observed (non-Markovian) environments. Thus, a learning agent – born with a representation and a policy – might wish to investigate to what extent the Markov property holds. We propose a learning architecture that utilizes combinatorial policy optimization to overcome…