Random Forest for the Contextual Bandit Problem

K      number of actions
M      number of contextual variables
D_θ    maximum depth of the tree θ
L      number of trees
T      time horizon
A      set of actions
V      set of variables
S      set of remaining variables
x      context vector, x = (x_1, ..., x_M)
y      reward vector, y = (y_1, ..., y_K)
k_t    action chosen at time t
c_θ    context path of the tree θ, c_θ = (x_{i_1}, v_{i_1}), ..., (x_{i_{d_θ}}, v_{i_{d_θ}})
d…