• Publications
Stochastic Variance-Reduced Policy Gradient
TLDR
A novel reinforcement-learning algorithm, a stochastic variance-reduced version of policy gradient (SVRPG) for solving Markov Decision Processes (MDPs), is presented with convergence guarantees and a convergence rate that is linear under increasing batch sizes.
Safe Policy Iteration
TLDR
Two safe policy-iteration algorithms that differ in the way the next policy is chosen w.r.t. the current policy are proposed and compared with state-of-the-art approaches on some chain-walk domains and on the Blackjack card game.
Transfer of samples in batch reinforcement learning
TLDR
A novel algorithm is introduced that transfers samples from the source tasks that are mostly similar to the target task, and it is empirically shown that, following the proposed approach, the transfer of samples is effective in reducing the learning complexity.
Unimodal Thompson Sampling for Graph-Structured Arms
TLDR
A Thompson Sampling-based algorithm is presented whose asymptotic pseudo-regret matches the lower bound for the considered setting, and it is shown that Bayesian MAB algorithms dramatically outperform frequentist ones.
Reinforcement Learning in Continuous Action Spaces through Sequential Monte Carlo Methods
TLDR
A novel actor-critic approach is presented in which the policy of the actor is estimated through sequential Monte Carlo methods, and results obtained in a control problem consisting of steering a boat across a river are reported.
Risk-Averse Trust Region Optimization for Reward-Volatility Reduction
TLDR
A novel measure of risk, called reward volatility, is defined as the variance of the rewards under the state-occupancy measure, and it is shown that the reward volatility bounds the return variance, so that reducing the former also constrains the latter.
Tree‐based reinforcement learning for optimal water reservoir operation
TLDR
A reinforcement‐learning approach, called fitted Q‐iteration, is presented: it combines the principle of continuous approximation of the value functions with a process of learning off‐line from experience to design daily, cyclostationary operating policies to overcome the curse of modeling.
A kinematic-independent dead-reckoning sensor for indoor mobile robotics
TLDR
This sensor is based on a pair of optical mice rigidly connected to the robot body; its main advantages are that it is a low-cost solution with a precision comparable to that of classical shaft encoders.
Sparse multi-task reinforcement learning
TLDR
This paper develops two multi-task extensions of the fitted Q-iteration algorithm that assume the tasks are jointly sparse in the given representation, and learns a transformation of the features in an attempt to find a sparser representation.
Policy gradient in Lipschitz Markov Decision Processes
TLDR
This paper shows that both the expected return of a policy and its gradient are Lipschitz continuous w.r.t. policy parameters and defines policy-parameter updates that guarantee a performance improvement at each iteration.