Voot Tangkaratt

Learn More
The goal of reinforcement learning (RL) is to let an agent learn an optimal control policy in an unknown environment so that future expected rewards are maximized. The model-free RL approach directly learns the policy based on data samples. Although using many samples tends to improve the accuracy of policy learning, collecting a large number of samples is(More)
The policy gradient approach is a flexible and powerful reinforcement learning method particularly for problems with continuous actions such as robot control. A common challenge is how to reduce the variance of policy gradient estimates for reliable policy updates. In this letter, we combine the following three ideas and give a highly effective policy(More)
A typical goal of linear-supervised dimension reduction is to find a low-dimensional subspace of the input space such that the projected input variables preserve maximal information about the output variables. The dependence-maximization approach solves the supervised dimension-reduction problem through maximizing a statistical dependence between projected(More)
Regression is a fundamental problem in statistical data analysis, which aims at estimating the conditional mean of output given input. However, regression is not informative enough if the conditional probability density is multi-modal, asymmetric, and heteroscedastic. To overcome this limitation, various estimators of conditional densities themselves have(More)
Regression aims at estimating the conditional mean of output given input. However, regression is not informative enough if the conditional density is multimodal, heteroskedastic, and asymmetric. In such a case, estimating the conditional density itself is preferable, but conditional density estimation (CDE) is challenging in high-dimensional space. A naive(More)
Since biological systems have the ability to efficiently reuse previous experiences to change their behavioral strategies to avoid enemies or find food, the number of required samples from real environments to improve behavioral policy is greatly reduced. Even for real robotic systems, it is desirable to use only a limited number of samples from real(More)
Sufficient dimension reduction (SDR) is a framework of supervised linear dimension reduction, and is aimed at finding a low-dimensional orthogonal projection matrix for input data such that the projected input data retains maximal information on output data. A computationally efficient approach employs gradient estimates of the conditional density of the(More)
The goal of reinforcement learning is to learn an optimal policy which controls an agent to acquire the maximum cumulative reward. The model-based reinforcement learning approach learns a transition model of the environment from data, and then derives the optimal policy using the transition model. However, learning an accurate transition model in(More)
Direct contextual policy search methods learn to improve policy parameters and simultaneously generalize these parameters to different context or task variables. However, learning from high-dimensional context variables, such as camera images, is still a prominent problem in many real-world tasks. A naive application of unsupervised dimensionality reduction(More)