Symbolic Reinforcement Learning for Safe RAN Control

@inproceedings{Nikou2021SymbolicRL,
  title={Symbolic Reinforcement Learning for Safe RAN Control},
  author={Alexandros Nikou and Anusha Mujumdar and Marin Orlic and Aneta Vulgarakis Feljan},
  booktitle={AAMAS},
  year={2021}
}
In this paper, we demonstrate a Symbolic Reinforcement Learning (SRL) architecture for safe control in Radio Access Network (RAN) applications. In our automated tool, a user can select high-level safety specifications expressed in Linear Temporal Logic (LTL) to shield an RL agent running in a given cellular network, with the aim of optimizing network performance as measured through certain Key Performance Indicators (KPIs). In the proposed architecture, network safety shielding is ensured through…
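To make the shielding idea concrete, here is a minimal, purely illustrative Python sketch (not the paper's implementation): a hypothetical antenna-tilt agent proposes actions, and a shield overrides any proposal whose successor state would violate a safety constraint. The model-checking step over LTL specifications and learned automata is replaced by a simple hand-written safety predicate, and the tilt range, action set, and policy below are invented for the example.

# Illustrative sketch only: a shield that filters an RL agent's actions
# against a safety check, in the spirit of the architecture described in
# the abstract. The tilt range, action set, and policy are assumptions;
# the real system checks LTL specifications over learned automata.

import random

TILT_MIN, TILT_MAX = 0.0, 15.0      # assumed safe antenna-tilt range (degrees)
ACTIONS = [-1.0, 0.0, +1.0]         # assumed discrete tilt adjustments


def is_safe(tilt: float, action: float) -> bool:
    """Stand-in for the model-checking step: the next tilt must stay in range."""
    next_tilt = tilt + action
    return TILT_MIN <= next_tilt <= TILT_MAX


def shield(tilt: float, proposed: float) -> float:
    """Return the proposed action if safe, otherwise a safe fallback."""
    if is_safe(tilt, proposed):
        return proposed
    safe_actions = [a for a in ACTIONS if is_safe(tilt, a)]
    return random.choice(safe_actions) if safe_actions else 0.0


def agent_policy(tilt: float) -> float:
    """Placeholder RL policy: picks a random adjustment."""
    return random.choice(ACTIONS)


if __name__ == "__main__":
    tilt = 14.5
    for step in range(5):
        proposed = agent_policy(tilt)
        action = shield(tilt, proposed)
        tilt += action
        print(f"step {step}: proposed {proposed:+.1f}, applied {action:+.1f}, tilt {tilt:.1f}")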
1 Citation

Safe RAN control: A Symbolic Reinforcement Learning Approach
A Symbolic Reinforcement Learning (SRL) based architecture for safety control of Radio Access Network (RAN) applications is presented; safety is ensured through model-checking techniques over combined discrete system models (automata) that are abstracted through the learning process.

References

Showing 1-10 of 16 references
Safe Reinforcement Learning via Formal Methods: Toward Safe Control Through Proof and Learning
It is proved that the approach toward incorporating knowledge about safe control into learning systems preserves safety guarantees, and it is demonstrated that the empirical performance benefits provided by reinforcement learning are retained.
Safe Reinforcement Learning via Shielding
A new approach to learn optimal policies while enforcing properties expressed in temporal logic by synthesizing a reactive system called a shield, which monitors the actions from the learner and corrects them only if the chosen action causes a violation of the specification.
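As a toy illustration of that shielding scheme (a sketch under assumptions, not the referenced paper's code), the snippet below wraps tabular Q-learning in a shield that replaces specification-violating actions with a safe fallback; the learner then updates on the action that was actually applied. The corridor environment, unsafe set, and hyper-parameters are all made up.

# Toy sketch: tabular Q-learning wrapped by a shield that overrides actions
# entering an unsafe cell; the learner updates on the applied action.

import random
from collections import defaultdict

N = 5                       # 1-D corridor of cells 0..N-1
UNSAFE = {0}                # specification: never enter cell 0
GOAL = N - 1
ACTIONS = [-1, +1]

def step(s, a):
    s2 = min(max(s + a, 0), N - 1)
    r = 1.0 if s2 == GOAL else 0.0
    return s2, r, s2 == GOAL

def shield(s, a):
    """Override the action if it would enter an unsafe cell."""
    if min(max(s + a, 0), N - 1) in UNSAFE:
        return +1           # fallback action known to be safe here
    return a

Q = defaultdict(float)
alpha, gamma, eps = 0.5, 0.9, 0.2

for episode in range(200):
    s, done = 2, False
    while not done:
        a = random.choice(ACTIONS) if random.random() < eps else max(ACTIONS, key=lambda x: Q[(s, x)])
        a = shield(s, a)                      # shield corrects unsafe choices
        s2, r, done = step(s, a)
        target = r + (0 if done else gamma * max(Q[(s2, x)] for x in ACTIONS))
        Q[(s, a)] += alpha * (target - Q[(s, a)])
        s = s2

print({k: round(v, 2) for k, v in Q.items()})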
A Comprehensive Survey on Safe Reinforcement Learning
This work categorizes and analyzes two approaches to Safe Reinforcement Learning: modification of the optimality criterion (the classic discounted finite/infinite horizon) with a safety factor, and incorporation of external knowledge or the guidance of a risk metric.
Point-Based Methods for Model Checking in Partially Observable Markov Decision Processes
This work shows how to use point-based value iteration methods to efficiently approximate the maximum probability of satisfying a desired logical formula and compute the associated belief state policy in a partially observable Markov decision process (POMDP).
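For context, point-based methods operate on belief states. The snippet below shows only the standard POMDP belief update they rely on; the two-state model (transition matrix T and observation matrix O) is an invented example, not taken from the referenced paper.

# Illustrative only: Bayesian belief update for a made-up two-state POMDP.

import numpy as np

# T[a][s, s'] : transition probabilities, O[a][s', o] : observation probabilities
T = {0: np.array([[0.9, 0.1],
                  [0.2, 0.8]])}
O = {0: np.array([[0.7, 0.3],
                  [0.1, 0.9]])}

def belief_update(b, a, o):
    """b'(s') is proportional to O[a][s', o] * sum_s T[a][s, s'] * b(s)."""
    pred = b @ T[a]                 # predicted state distribution after action a
    post = pred * O[a][:, o]        # weight by likelihood of the observation
    return post / post.sum()

b = np.array([0.5, 0.5])
print(belief_update(b, a=0, o=1))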
On the Timed Temporal Logic Planning of Coupled Multi-Agent Systems
A decentralized abstraction that provides a space and time discretization of the multi-agent system is designed, and an algorithm that computes the individual runs which provably satisfy the high-level tasks is proposed.
Automatic synthesis of multi-agent motion tasks based on LTL specifications
  • S. Loizou, K. Kyriakopoulos
  • 2004 43rd IEEE Conference on Decision and Control (CDC) (IEEE Cat. No.04CH37601), 2004
A methodology for automatically synthesizing motion task controllers based on linear temporal logic (LTL) specifications that combines the continuous dynamics of the underlying system with the automatically synthesized switching logic that enforces the LTL specification.
Reinforcement Learning: An Introduction
This book provides a clear and simple account of the key ideas and algorithms of reinforcement learning, ranging from the history of the field's intellectual foundations to the most recent developments and applications.
Human-level control through deep reinforcement learning
This work bridges the divide between high-dimensional sensory inputs and actions, resulting in the first artificial agent that is capable of learning to excel at a diverse array of challenging tasks.
Off-policy Learning for Remote Electrical Tilt Optimization
This paper proposes contextual multi-armed bandit (CMAB) learning algorithms to extract optimal tilt update policies from data, and trains and evaluates these policies on real-world 4G Long Term Evolution (LTE) cellular network data, showing consistent improvements over the rule-based logging policy used to collect the data.
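As a rough illustration of the off-policy setting this reference describes (and not the authors' algorithm or data), the sketch below evaluates a hypothetical tilt-update policy from synthetic logged data using an inverse-propensity-scoring estimate; the logging policy, reward model, and contexts are invented.

# Sketch under assumptions: off-policy evaluation of a candidate tilt-update
# policy from synthetic logged (context, action, propensity, reward) tuples.

import random

ACTIONS = ["down", "keep", "up"]

def logging_policy(context):
    """Hypothetical rule-based logger: uniform over tilt updates."""
    return random.choice(ACTIONS), 1.0 / len(ACTIONS)

def target_policy(context):
    """Hypothetical learned policy: tilt down when the coverage KPI is high."""
    return "down" if context["coverage"] > 0.7 else "up"

# Generate a synthetic log.
log = []
for _ in range(10_000):
    ctx = {"coverage": random.random()}
    a, p = logging_policy(ctx)
    # Made-up reward model: "down" helps when coverage is high, "up" otherwise.
    r = 1.0 if (a == "down") == (ctx["coverage"] > 0.7) else 0.0
    log.append((ctx, a, p, r))

# Inverse-propensity-scoring estimate of the target policy's expected reward.
ips = sum((r / p) if target_policy(ctx) == a else 0.0 for ctx, a, p, r in log) / len(log)
print(f"estimated reward of target policy: {ips:.3f}")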
Self-optimization of coverage and capacity based on a fuzzy neural network with cooperative reinforcement learning
This paper proposes self-optimization of antenna tilt and power using a fuzzy neural network optimization based on reinforcement learning (RL-FNN). A central control mechanism enables cooperation-based learning by allowing distributed SON entities to share their optimization experience, represented as the parameters of the learning method.