Learn More
Optimal control of traffic lights at junctions or traffic signal control (TSC) is essential for reducing the average delay experienced by the road users amidst the rapid increase in the usage of vehicles. In this paper, we formulate the TSC problem as a discounted cost Markov decision process (MDP) and apply multi-agent reinforcement learning (MARL)(More)
In this paper, we study the problem of obtaining the optimal order of the phase sequence [14] in a road network for efficiently managing the traffic flow. We model this problem as a Markov decision process (MDP). This problem is hard to solve when simultaneously considering all the junctions in the road network. So, we propose a decentralized multi-agent(More)
We consider the problem of finding the best features for value function approximation in reinforcement learning and develop an online algorithm to optimize the mean square Bellman error objective. For any given feature value, our algorithm performs gradient search in the parameter space via a residual gradient scheme and, on a slower timescale, also(More)
For maximizing influence spread in a social network, given a certain budget on the number of seed nodes, we investigate the effects of selecting and activating the seed nodes in multiple phases. In particular, we formulate an appropriate objective function for two-phase influence maximization under the independent cascade model, investigate its properties,(More)
The problem of maximizing information diffusion, given a certain budget expressed in terms of the number of seed nodes, is an important topic in social networks research. Existing literature focuses on single phase diffusion where all seed nodes are selected at the beginning of diffusion and all the selected nodes are activated simultaneously. This paper(More)
A recent goal in the Reinforcement Learning (RL) framework is to choose a sequence of policy to minimize the regret incurred in a finite time horizon. For several RL problems in operation research and optimal control, the optimal policy of the underlying Markov Decision Process (MDP) is characterized by a known structure. The state of the art algorithms do(More)
Stochastic multi-armed bandit (MAB) mechanisms are widely used in sponsored search auctions, crowdsourcing, online procurement, etc. Existing stochastic MAB mechanisms with a deterministic payment rule, proposed in the literature, necessarily suffer a regret of Ω(T 2/3), where T is the number of time steps. This happens because the existing mechanisms(More)
We consider the problem of finding optimal energy sharing policies that maximize the network performance of a system comprising of multiple sensor nodes and a single energy harvesting (EH) source. Sensor nodes periodically sense the random field and generate data, which is stored in the corresponding data queues. The EH source harnesses energy from ambient(More)