Prabuchandran K. J.

We consider the problem of finding the best features for value function approximation in reinforcement learning and develop an online algorithm to optimize the mean square Bellman error objective. For any given feature value, our algorithm performs gradient search in the parameter space via a residual gradient scheme and, on a slower timescale, also …
Optimal control of traffic lights at junctions or traffic signal control (TSC) is essential for reducing the average delay experienced by the road users amidst the rapid increase in the usage of vehicles. In this paper, we formulate the TSC problem as a discounted cost Markov decision process (MDP) and apply multi-agent reinforcement learning (MARL) …
For maximizing influence spread in a social network, given a certain budget on the number of seed nodes, we investigate the effects of selecting and activating the seed nodes in multiple phases. In particular, we formulate an appropriate objective function for two-phase influence maximization under the independent cascade model, investigate its properties, …
The problem of maximizing information diffusion, given a certain budget expressed in terms of the number of seed nodes, is an important topic in social networks research. Existing literature focuses on single phase diffusion where all seed nodes are selected at the beginning of diffusion and all the selected nodes are activated simultaneously. This paper …
A recent goal in the Reinforcement Learning (RL) framework is to choose a sequence of policies to minimize the regret incurred in a finite time horizon. For several RL problems in operations research and optimal control, the optimal policy of the underlying Markov Decision Process (MDP) is characterized by a known structure. The state-of-the-art algorithms do …
We develop two new online actor-critic control algorithms with adaptive feature tuning for Markov Decision Processes (MDPs). One of our algorithms is proposed for the long-run average cost objective, while the other works for discounted cost MDPs. Our actor-critic architecture incorporates parameterization both in the policy and the value function. A …
In this paper, we study the problem of obtaining the optimal order of the phase sequence [14] in a road network for efficiently managing the traffic flow. We model this problem as a Markov decision process (MDP). This problem is hard to solve when simultaneously considering all the junctions in the road network. So, we propose a decentralized multi-agent …
We consider the problem of finding optimal energy sharing policies that maximize the network performance of a system comprising multiple sensor nodes and a single energy harvesting (EH) source. Sensor nodes periodically sense the random field and generate data, which is stored in the corresponding data queues. The EH source harnesses energy from ambient …