A Survey on Applications of Model-Free Strategy Learning in Cognitive Wireless Networks
We consider how two secondary users should interact to maximize their total throughput in a two-channel, sensing-based opportunistic spectrum access network where spectrum opportunities are time-varying and spatially inhomogeneous. By modeling the occupancy of the primary users as discrete-time Markov chains, we obtain the optimal dynamic coordination policy using a partially observable Markov decision process (POMDP) solver. We also develop several tractable approaches: a cooperative multiuser approach based on explicit communication between the secondary users, a learning-based approach that uses collision feedback information, and a single-user approach based on uncooperative, independent decisions. As a baseline, we consider the static partitioning policy, in which each user is allocated a single channel of its own. Simulations comparing the performance of these strategies yield several interesting findings: significant improvements over static partitioning are possible with the optimal scheme; the cooperative multiuser approach shows near-optimal performance in all cases; there are scenarios in which learning through collision feedback can be beneficial; and the single-user approach generally shows poor performance.
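The Markov occupancy model mentioned above can be illustrated with a minimal sketch. It assumes a two-state (idle/busy) chain per channel, perfect sensing, and one unit of throughput per idle slot used. The function names and the transition probabilities are hypothetical and chosen only for illustration; this is neither the POMDP solver nor the simulation setup of the work summarized here.

```python
def predict_idle(belief_idle, p_idle_to_idle, p_busy_to_idle):
    """One-step prediction of the probability that a channel is idle.

    belief_idle: current probability that the channel is idle.
    p_idle_to_idle, p_busy_to_idle: assumed transition probabilities of the
    two-state primary-user occupancy chain (illustrative values only).
    """
    return belief_idle * p_idle_to_idle + (1.0 - belief_idle) * p_busy_to_idle


def stationary_idle(p_idle_to_idle, p_busy_to_idle):
    """Stationary idle probability, the fixed point of predict_idle.

    Solving pi = pi * p_ii + (1 - pi) * p_bi gives pi = p_bi / (1 - p_ii + p_bi).
    """
    return p_busy_to_idle / (1.0 - p_idle_to_idle + p_busy_to_idle)


def static_partition_throughput(channels):
    """Expected total throughput per slot under static partitioning.

    Assumption: each user owns one channel, senses it perfectly, and earns
    one unit whenever its channel is idle, so the long-run total is the sum
    of the stationary idle probabilities.
    """
    return sum(stationary_idle(p_ii, p_bi) for p_ii, p_bi in channels)


if __name__ == "__main__":
    # Hypothetical (p_idle_to_idle, p_busy_to_idle) pairs for two channels.
    channels = [(0.8, 0.4), (0.5, 0.5)]
    print(predict_idle(1.0, 0.8, 0.4))
    print(static_partition_throughput(channels))
```

Under these assumptions, the static partitioning total gives a reference value against which any dynamic policy that shares or swaps channels based on the predicted idle probabilities could be compared.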