- Voot Tangkaratt, Syogo Mori, Tingting Zhao, Jun Morimoto, Masashi Sugiyama
- Neural Networks
- 2014

The goal of reinforcement learning (RL) is to let an agent learn an optimal control policy in an unknown environment so that future expected rewards are maximized. The model-free RL approach directly learns the policy based on data samples. Although using many samples tends to improve the accuracy of policy learning, collecting a large number of samples is…

- Tingting Zhao, Hirotaka Hachiya, Voot Tangkaratt, Jun Morimoto, Masashi Sugiyama
- Neural Computation
- 2013

The policy gradient approach is a flexible and powerful reinforcement learning method particularly for problems with continuous actions such as robot control. A common challenge is how to reduce the variance of policy gradient estimates for reliable policy updates. In this letter, we combine the following three ideas and give a highly effective policy…

- Norikazu Sugimoto, Voot Tangkaratt, Thijs Wensveen, Tingting Zhao, Masashi Sugiyama, Jun Morimoto
- Humanoids
- 2014

- Voot Tangkaratt, Hiroaki Sasaki, Masashi Sugiyama
- Neural Computation
- 2017

A typical goal of linear-supervised dimension reduction is to find a low-dimensional subspace of the input space such that the projected input variables preserve maximal information about the output variables. The dependence-maximization approach solves the supervised dimension-reduction problem through maximizing a statistical dependence between projected…

- Motoki Shiga, Voot Tangkaratt, Masashi Sugiyama
- Machine Learning
- 2014

Regression is a fundamental problem in statistical data analysis, which aims at estimating the conditional mean of output given input. However, regression is not informative enough if the conditional probability density is multi-modal, asymmetric, and heteroscedastic. To overcome this limitation, various estimators of conditional densities themselves have…

- Voot Tangkaratt, Ning Xie, Masashi Sugiyama
- Neural Computation
- 2015

Regression aims at estimating the conditional mean of output given input. However, regression is not informative enough if the conditional density is multimodal, heteroskedastic, and asymmetric. In such a case, estimating the conditional density itself is preferable, but conditional density estimation (CDE) is challenging in high-dimensional space. A naive…

- Hiroaki Sasaki, Voot Tangkaratt, Masashi Sugiyama
- ACML
- 2015

Sufficient dimension reduction (SDR) is a framework of supervised linear dimension reduction, and is aimed at finding a low-dimensional orthogonal projection matrix for input data such that the projected input data retains maximal information on output data. A computationally efficient approach employs gradient estimates of the conditional density of the…

- Norikazu Sugimoto, Voot Tangkaratt, Thijs Wensveen, Tingting Zhao, Masashi Sugiyama, Jun Morimoto
- IEEE Robotics & Automation Magazine
- 2016

Since biological systems have the ability to efficiently reuse previous experiences to change their behavioral strategies to avoid enemies or find food, the number of required samples from real environments to improve behavioral policy is greatly reduced. Even for real robotic systems, it is desirable to use only a limited number of samples from real…

In this study, we show that a movement policy can be improved efficiently using the previous experiences of a real robot. Reinforcement Learning (RL) is becoming a popular approach to acquire a nonlinear optimal policy through trial and error. However, it is considered very difficult to apply RL to real robot control since it usually requires many learning…

- Voot Tangkaratt, Jun Morimoto, Masashi Sugiyama
- Neural Networks
- 2016

The goal of reinforcement learning is to learn an optimal policy which controls an agent to acquire the maximum cumulative reward. The model-based reinforcement learning approach learns a transition model of the environment from data, and then derives the optimal policy using the transition model. However, learning an accurate transition model in…