Prabuchandran K. J.

We consider the problem of finding the best features for value function approximation in reinforcement learning and develop an online algorithm to optimize the mean square Bellman error objective. For any given feature value, our algorithm performs gradient search in the parameter space via a residual gradient scheme and, on a slower timescale, also…
Optimal control of traffic lights at junctions, or traffic signal control (TSC), is essential for reducing the average delay experienced by road users amid the rapid increase in vehicle usage. In this paper, we formulate the TSC problem as a discounted cost Markov decision process (MDP) and apply multi-agent reinforcement learning (MARL)…
For maximizing influence spread in a social network, given a certain budget on the number of seed nodes, we investigate the effects of selecting and activating the seed nodes in multiple phases. In particular, we formulate an appropriate objective function for two-phase influence maximization under the independent cascade model, investigate its properties,…
Stochastic multi-armed bandit (MAB) mechanisms are widely used in sponsored search auctions, crowdsourcing, online procurement, etc. Existing stochastic MAB mechanisms with a deterministic payment rule, proposed in the literature, necessarily suffer a regret of Ω(T^{2/3}), where T is the number of time steps. This happens because the existing mechanisms…
A recent goal in the Reinforcement Learning (RL) framework is to choose a sequence of policies to minimize the regret incurred in a finite time horizon. For several RL problems in operations research and optimal control, the optimal policy of the underlying Markov Decision Process (MDP) is characterized by a known structure. The state-of-the-art algorithms do…
The problem of maximizing information diffusion, given a certain budget expressed in terms of the number of seed nodes, is an important topic in social networks research. Existing literature focuses on single-phase diffusion, where all seed nodes are selected at the beginning of diffusion and all the selected nodes are activated simultaneously. This paper…
We develop two new online actor-critic control algorithms with adaptive feature tuning for Markov Decision Processes (MDPs). One of our algorithms is proposed for the long-run average cost objective, while the other works for discounted cost MDPs. Our actor-critic architecture incorporates parameterization both in the policy and the value function. A…
In this paper, we study the problem of obtaining the optimal order of the phase sequence [14] in a road network for efficiently managing the traffic flow. We model this problem as a Markov decision process (MDP). This problem is hard to solve when all the junctions in the road network are considered simultaneously. So, we propose a decentralized multi-agent…