Learn More
—In this paper, we consider a class of restless mul-tiarmed bandit processes (RMABs) that arises in dynamic multichannel access, user/server scheduling, and optimal activation in multiagent systems. For this class of RMABs, we establish the indexability and obtain Whittle index in closed form for both discounted and average reward criteria. These results(More)
—We formulate and study a decentralized multi-armed bandit (MAB) problem. There are distributed players competing for independent arms. Each arm, when played, offers i.i.d. reward according to a distribution with an unknown parameter. At each time, each player chooses one arm to play without exchanging observations or any information with other players.(More)
A restless multi-armed bandit problem that arises in multichannel opportunistic communications is considered, where channels are modeled as independent and identical Gilbert-Elliot channels and channel state detection is subject to errors. A simple structure of the myopic policy is established under a certain condition on the false alarm probability of the(More)
We consider a class of restless multi-armed bandit problems (RMBP) that arises in dynamic multichannel access, user/server scheduling, and optimal activation in multi-agent systems. For this class of RMBP, we establish the indexability and obtain Whittle's index in closed-form for both discounted and average reward criteria. These results lead to a direct(More)
— We consider a multi-channel opportunistic communication system where the states of these channels evolve as independent and statistically identical Markov chains (the Gilbert-Elliot channel model). A user chooses one channel to sense and access in each slot and collects a reward determined by the state of the chosen channel. The problem is to design a(More)
—We consider a cognitive radio network with distributed multiple secondary users, where each user independently searches for spectrum opportunities in multiple channels without exchanging information with others. The occupancy of each channel is modeled as an i.i.d. Bernoulli process with unknown mean. Users choosing the same channel collide, and none or(More)
—We focus on an opportunistic communication system consisting of multiple independent channels with time-varying states. With limited sensing, a user can only sense and access a subset of channels and accrue rewards determined by the state of the sensed channels. We formulate the problem of optimal sequential channel probing as a restless multi-armed bandit(More)
We focus on an opportunistic communication system consisting of multiple independent channels with time-varying states. With limited sensing, a user can only sense and access a subset of channels and accrue rewards determined by the states of the sensed channels. We formulate the problem of optimal sequential channel selection as a restless multi-armed(More)
—We consider the restless multiarmed bandit problem with unknown dynamics in which a player chooses one out of arms to play at each time. The reward state of each arm transits according to an unknown Markovian rule when it is played and evolves according to an arbitrary unknown random process when it is passive. The performance of an arm selection policy is(More)