• Corpus ID: 239998332

Deep Reinforcement Learning for Simultaneous Sensing and Channel Access in Cognitive Networks

Yoel Bokobza, Ron Dabora, Kobi Cohen
We consider the problem of dynamic spectrum access (DSA) in cognitive wireless networks, where only partial observations are available to the users due to narrowband sensing and transmissions. The cognitive network consists of primary users (PUs) and a secondary user (SU), which operate in a time duplexing regime. The traffic pattern for each PU is assumed to be unknown to the SU and is modeled as a finite-memory Markov chain. Since observations are partial, both channel sensing and access…
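The setting in the abstract can be illustrated with a small simulation. The sketch below is not the authors' model or code: it assumes a memory-one, two-state (idle/busy) Markov chain per PU channel as the simplest finite-memory case, and a hypothetical round-robin sensing schedule for the SU. The function names `simulate_pu` and `narrowband_observations` and all probabilities are illustrative assumptions.

```python
import random


def simulate_pu(p_stay_idle, p_stay_busy, num_slots, rng):
    """Return a list of 0 (idle) / 1 (busy) states for one PU channel.

    Assumption: memory-one Markov chain that starts idle.
    """
    state = 0
    trace = []
    for _ in range(num_slots):
        trace.append(state)
        stay = p_stay_idle if state == 0 else p_stay_busy
        # Flip the state when the draw falls outside the "stay" probability.
        if rng.random() >= stay:
            state = 1 - state
    return trace


def narrowband_observations(traces, sensed_channels):
    """Per slot, reveal only the sensed channel's state; others are None."""
    observations = []
    for t, k in enumerate(sensed_channels):
        row = [None] * len(traces)
        row[k] = traces[k][t]
        observations.append(row)
    return observations


rng = random.Random(0)
traces = [simulate_pu(0.9, 0.8, 10, rng) for _ in range(3)]
sensed = [t % 3 for t in range(10)]  # hypothetical round-robin sensing
obs = narrowband_observations(traces, sensed)
```

Each row of `obs` exposes one channel state and hides the rest, which is the partial-observation structure that makes the choice of which channel to sense matter.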

Deep Reinforcement Learning for Dynamic Multichannel Access in Wireless Networks
This work considers a dynamic multichannel access problem, where multiple correlated channels follow an unknown joint Markov model and users select the channel to transmit data, and proposes an adaptive DQN approach with the capability to adapt its learning in time-varying scenarios.
Deep Reinforcement Learning for Dynamic Multichannel Access
The concept of online learning is applied and a Deep Q-Network (DQN) is implemented that can handle a large state space without any prior knowledge of the system dynamics and can learn a good policy in complex real scenarios, which do not necessarily show Markovian dynamics.
Deep Q-Learning with Multiband Sensing for Dynamic Spectrum Access
This work proposes to use the deep Q-learning method to learn a state-action value function that determines an access policy from the observed states of all channels, and demonstrates through experiments that the learning-based policies consistently achieve performances that are close to the optimal ones.
Deep Multi-User Reinforcement Learning for Distributed Dynamic Spectrum Access
A novel distributed dynamic spectrum access algorithm based on deep multi-user reinforcement learning is developed for accessing the spectrum that maximizes a certain network utility in a distributed manner without online coordination or message exchanges between users.
Dealing with Partial Observations in Dynamic Spectrum Access: Deep Recurrent Q-Networks
  • Y. Xu, J. Yu, R. Buehrer
  • Computer Science
    MILCOM 2018 - 2018 IEEE Military Communications Conference (MILCOM)
  • 2018
This paper develops a technique to allow a secondary radio node to share the spectrum with multiple existing (primary) radio nodes in the presence of partial observations and no a priori knowledge of the primary nodes' behaviors.
The Application of Deep Reinforcement Learning to Distributed Spectrum Access in Dynamic Heterogeneous Environments With Partial Observations
This paper investigates deep reinforcement learning (DRL) based on a Recurrent Neural Network (RNN) for Dynamic Spectrum Access (DSA) under partial observations, referred to as a Deep Recurrent Q-Network (DRQN), and shows the following benefits of using recurrent neural networks in DSA.
Deep Reinforcement Learning for Joint Channel Selection and Power Control in D2D Networks
A distributed deep reinforcement learning (DRL)-based scheme is proposed, with which D2D pairs can autonomously optimize channel selection and transmit power by only exploiting local information and outdated nonlocal information, which can achieve better scalability and reduce signalling overheads significantly.
An Order Optimal Policy for Exploiting Idle Spectrum in Cognitive Radio Networks
An index policy, in which the index of a frequency band comprises a sample mean term and a recency-based exploration bonus term, is proposed, which often provides improved performance at low complexity over other state-of-the-art policies in the literature.
A Deep Actor-Critic Reinforcement Learning Framework for Dynamic Multichannel Access
This work employs the proposed framework as a single agent in the single-user case and extends it to a decentralized multi-agent framework in the multi-user scenario. It develops algorithms for actor-critic deep reinforcement learning and evaluates the proposed learning policies via experiments and numerical results.
Optimality of Myopic Sensing in Multichannel Opportunistic Access
It is shown that a myopic policy that maximizes the immediate one-step reward is optimal when the state transitions are positively correlated over time and when the number of channels is limited to two or three, while presenting a counterexample for the case of four channels.
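The myopic policy mentioned in the last entry can be sketched as follows. This is a minimal illustration, not code from any of the listed papers. It assumes independent two-state (good/bad) Markov channels with identical transition probabilities, where `p11` is the probability of staying good and `p01` the probability of moving from bad to good. The user keeps a belief that each channel is good and senses the channel with the highest belief. The function names and parameter values are hypothetical.

```python
def myopic_choice(beliefs):
    """Pick the channel with the highest belief of being good."""
    return max(range(len(beliefs)), key=lambda i: beliefs[i])


def update_beliefs(beliefs, sensed, observed_good, p11, p01):
    """Advance all beliefs by one slot after sensing a single channel."""
    updated = []
    for i, w in enumerate(beliefs):
        if i == sensed:
            # The sensed channel's state is known, so its next-slot belief
            # is the corresponding transition probability.
            updated.append(p11 if observed_good else p01)
        else:
            # Unobserved channels are propagated through the Markov chain.
            updated.append(w * p11 + (1 - w) * p01)
    return updated


beliefs = [0.2, 0.7, 0.5]
choice = myopic_choice(beliefs)
next_beliefs = update_beliefs(beliefs, choice, True, 0.8, 0.2)
```

With `p11 > p01` (positively correlated transitions), a channel observed to be good keeps the highest belief in the next slot, so the myopic rule keeps sensing it until it is observed to be bad.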