Stochastic Variance-Reduced Policy Gradient
- M. Papini, Damiano Binaghi, Giuseppe Canonaco, Matteo Pirotta, Marcello Restelli
- Computer Science, ICML
- 14 June 2018
A novel reinforcement-learning algorithm, SVRPG, a stochastic variance-reduced version of policy gradient for solving Markov Decision Processes (MDPs), is proposed, with convergence guarantees and a convergence rate that is linear under increasing batch sizes.
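The variance-reduction idea can be sketched on a toy one-step problem (a minimal illustration, not the paper's implementation: the Gaussian policy, reward function, batch sizes, and learning rate below are all invented for the example). Each inner update combines the current stochastic gradient, an importance-weighted gradient at a snapshot policy, and the snapshot's large-batch gradient:

```python
import numpy as np

rng = np.random.default_rng(0)

def grad_log_pi(theta, a):
    # Gaussian policy N(theta, 1): gradient of log-density w.r.t. the mean
    return a - theta

def reward(a):
    # toy one-step objective: maximized at a = 2
    return -(a - 2.0) ** 2

def svrpg_step(theta, theta_snap, mu_snap, batch=16):
    # Variance-reduced estimate: current gradient minus an importance-weighted
    # snapshot gradient, plus the snapshot's full-batch gradient mu_snap.
    g = 0.0
    for _ in range(batch):
        a = rng.normal(theta, 1.0)
        r = reward(a)
        # importance weight pi_snap(a) / pi_theta(a) for unit-variance Gaussians
        w = np.exp(-0.5 * (a - theta_snap) ** 2 + 0.5 * (a - theta) ** 2)
        g += grad_log_pi(theta, a) * r - w * grad_log_pi(theta_snap, a) * r
    return g / batch + mu_snap

def train(theta0=0.0, epochs=20, inner=5, big_batch=200, lr=0.05):
    theta = theta0
    for _ in range(epochs):
        theta_snap = theta
        acts = rng.normal(theta_snap, 1.0, size=big_batch)
        mu_snap = np.mean(grad_log_pi(theta_snap, acts) * reward(acts))
        for _ in range(inner):
            theta += lr * svrpg_step(theta, theta_snap, mu_snap)
    return theta

theta = train()  # should approach the optimum at 2
```

Right after a snapshot the two per-sample terms cancel, so the update is driven by the low-variance large-batch gradient; the importance weights correct for the drift of the current policy away from the snapshot.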
Safe Policy Iteration
Two safe policy-iteration algorithms that differ in the way the next policy is chosen w.r.t. the current policy are proposed and compared with state-of-the-art approaches on some chain-walk domains and on the Blackjack card game.
Transfer of samples in batch reinforcement learning
A novel algorithm is introduced that transfers samples from the source tasks that are most similar to the target task, and it is empirically shown that, following the proposed approach, the transfer of samples is effective in reducing the learning complexity.
Unimodal Thompson Sampling for Graph-Structured Arms
A Thompson Sampling-based algorithm whose asymptotic pseudo-regret matches the lower bound for the considered setting is proposed, and it is shown that Bayesian MAB algorithms dramatically outperform frequentist ones.
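For context, the Bayesian baseline can be sketched as plain Beta-Bernoulli Thompson Sampling on independent arms (a minimal sketch with invented arm means and horizon, not the paper's algorithm; the unimodal graph variant additionally restricts the choice to the neighborhood of the current empirical leader):

```python
import numpy as np

def thompson_sampling(means, horizon=5000, seed=0):
    # Beta-Bernoulli Thompson Sampling: sample a mean from each arm's
    # posterior, pull the arm with the largest sample, update its posterior.
    rng = np.random.default_rng(seed)
    k = len(means)
    succ = np.ones(k)   # Beta(1, 1) uniform priors
    fail = np.ones(k)
    pulls = np.zeros(k, dtype=int)
    for _ in range(horizon):
        samples = rng.beta(succ, fail)
        arm = int(np.argmax(samples))
        r = rng.random() < means[arm]   # Bernoulli reward
        succ[arm] += r
        fail[arm] += 1 - r
        pulls[arm] += 1
    return pulls

pulls = thompson_sampling([0.2, 0.5, 0.8])  # best arm dominates the pulls
```

As the posteriors concentrate, suboptimal arms are sampled above the best arm's mean with vanishing probability, so almost all pulls go to the optimal arm.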
Reinforcement Learning in Continuous Action Spaces through Sequential Monte Carlo Methods
A novel actor-critic approach in which the policy of the actor is estimated through sequential Monte Carlo methods, and results obtained in a control problem consisting of steering a boat across a river are reported.
Risk-Averse Trust Region Optimization for Reward-Volatility Reduction
- L. Bisi, Luca Sabbioni, Edoardo Vittori, M. Papini, Marcello Restelli
- Computer Science, ArXiv
- 6 December 2019
A novel measure of risk, which is called reward volatility, consisting of the variance of the rewards under the state-occupancy measure, is defined and it is shown that the reward volatility bounds the return variance so that reducing the former also constrains the latter.
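The quantity itself is straightforward to estimate from sampled trajectories. A minimal empirical analogue of the definition (the reward distribution, horizon, and discount below are invented for the example; this is not the paper's estimator): weight each per-step reward by the normalized discounted occupancy weight γ^t and take the weighted variance.

```python
import numpy as np

def reward_volatility(reward_trajs, gamma=0.9):
    # Empirical reward volatility: variance of the per-step rewards under
    # the normalized discounted occupancy weights gamma^t.
    T = reward_trajs.shape[1]
    w = gamma ** np.arange(T)
    w = w / w.sum()                    # normalize occupancy weights
    mean = np.mean(reward_trajs @ w)   # discounted average reward
    return np.mean((reward_trajs - mean) ** 2 @ w)

rng = np.random.default_rng(0)
trajs = rng.normal(0.0, 1.0, size=(2000, 50))  # i.i.d. unit-variance rewards
vol = reward_volatility(trajs)  # close to 1 for unit-variance rewards
```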
Tree‐based reinforcement learning for optimal water reservoir operation
A reinforcement‐learning approach, called fitted Q‐iteration, is presented: it combines the principle of continuous approximation of the value functions with a process of learning off‐line from experience to design daily, cyclostationary operating policies to overcome the curse of modeling.
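The fitted Q-iteration loop can be sketched on a toy problem (a minimal illustration with an invented 3-state chain; the regressor here is a simple per-state-action average, whereas tree-based FQI uses ensembles of regression trees to generalize over continuous states):

```python
import numpy as np

def fitted_q_iteration(samples, n_states, n_actions, gamma=0.9, iters=50):
    # Batch FQI: repeatedly regress Q(s, a) onto r + gamma * max_a' Q(s', a')
    # using a fixed set of off-line transitions (s, a, r, s').
    Q = np.zeros((n_states, n_actions))
    for _ in range(iters):
        tot = np.zeros_like(Q)
        cnt = np.zeros_like(Q)
        for s, a, r, s2 in samples:
            tot[s, a] += r + gamma * Q[s2].max()
            cnt[s, a] += 1
        # average targets per (s, a); keep old Q where no data was seen
        Q = np.where(cnt > 0, tot / np.maximum(cnt, 1), Q)
    return Q

# Toy 3-state chain: action 1 moves right, action 0 moves left,
# and reaching the rightmost state pays reward 1.
samples = []
for s in range(3):
    for a in (0, 1):
        s2 = min(s + 1, 2) if a == 1 else max(s - 1, 0)
        samples.append((s, a, float(s2 == 2), s2))

Q = fitted_q_iteration(samples, n_states=3, n_actions=2)
```

The learned greedy policy moves right in every state, since the right action dominates the left one in the fitted Q-values.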
A kinematic-independent dead-reckoning sensor for indoor mobile robotics
- Andrea Bonarini, M. Matteucci, Marcello Restelli
- Computer Science, IEEE/RSJ International Conference on Intelligent…
- 28 September 2004
This sensor is based on a pair of optical mice rigidly connected to the robot body and its main advantages are that it is a low-cost solution with a precision comparable to classical shaft encoders.
Sparse multi-task reinforcement learning
- Daniele Calandriello, A. Lazaric, Marcello Restelli
- Computer Science, Intelligenza Artificiale
- 8 December 2014
This paper develops two multi-task extensions of the fitted Q-iteration algorithm: one assumes that the tasks are jointly sparse in the given representation, while the other learns a transformation of the features in an attempt to find a sparser representation.
Policy gradient in Lipschitz Markov Decision Processes
This paper shows that both the expected return of a policy and its gradient are Lipschitz continuous w.r.t. policy parameters and defines policy-parameter updates that guarantee a performance improvement at each iteration.