We consider continuous state, continuous action batch reinforcement learning where the goal is to learn a good policy from a sufficiently rich trajectory generated by some policy. We study a variant…

In this paper we consider the problem of learning a near-optimal policy in continuous-space, expected total discounted-reward Markovian Decision Problems using approximate policy iteration. We…

We consider the problem of actively learning the mean values of distributions associated with a finite number of options (arms). The decision maker can select which option to generate the next sample…

We consider the rate of convergence of the expected distortion redundancy of empirically optimal vector quantizers. Earlier results show that the mean-squared distortion of an empirically optimal…

We consider batch reinforcement learning problems in continuous-space, expected total discounted-reward Markovian decision problems when the training data is composed of the trajectory of some fixed…

We give a short proof of the following result. Let (X, Y) be any distribution on N × {0, 1}, and let (X1, Y1), . . . , (Xn, Yn) be an i.i.d. sample drawn from this distribution. In discrimination,…

We consider the problem of actively learning the mean values of distributions associated with a finite number of options. The decision maker can select which option to generate the next observation…