Causal Bandits for Linear Structural Equation Models

Burak Varici, Karthikeyan Shanmugam, Prasanna Sattigeri, Ali Tajer
This paper studies the problem of designing an optimal sequence of interventions in a causal graphical model to minimize the cumulative regret with respect to the best intervention in hindsight. This is, naturally, posed as a causal bandit problem. The focus is on causal bandits for linear structural equation models (SEMs) and soft interventions. It is assumed that the graph’s structure is known, and it has N nodes. Two linear mechanisms, one soft intervention and one observational, are assumed… 
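The setting in the abstract can be made concrete with a small simulation. The following is a minimal sketch, not the paper's algorithm: it assumes a hypothetical 3-node chain graph, that each node has one observational and one soft-intervention weight vector, that the reward is the value of the last node, and that the noise has a nonzero mean (all weights and the noise mean are made up for illustration). It computes the expected reward of every intervention and the cumulative regret of a fixed sequence of actions.

```python
import itertools
import numpy as np

# Hypothetical chain 0 -> 1 -> 2; B[j, i] is the weight of edge j -> i.
# Nodes are assumed to be listed in topological order.
B_OBS = np.array([[0.0, 0.5, 0.0],
                  [0.0, 0.0, 0.5],
                  [0.0, 0.0, 0.0]])   # observational mechanisms (assumed values)
B_INT = np.array([[0.0, 1.0, 0.0],
                  [0.0, 0.0, 1.0],
                  [0.0, 0.0, 0.0]])   # soft-intervention mechanisms (assumed values)
NOISE_MEAN = np.ones(3)               # assumed nonzero noise mean


def mechanism_matrix(b_obs, b_int, action):
    """Column i comes from b_int if node i is intervened on, else from b_obs."""
    mask = np.asarray(action, dtype=bool)[None, :]
    return np.where(mask, b_int, b_obs)


def expected_reward(b_obs, b_int, action, noise_mean):
    """Mean of the last node under X = B^T X + noise, i.e. E[X] = (I - B^T)^{-1} E[noise]."""
    b = mechanism_matrix(b_obs, b_int, action)
    n = b.shape[0]
    mean_x = np.linalg.solve(np.eye(n) - b.T, noise_mean)
    return mean_x[-1]


def sample_reward(b_obs, b_int, action, noise_mean, rng):
    """Draw one sample of the reward node by ancestral sampling."""
    b = mechanism_matrix(b_obs, b_int, action)
    n = b.shape[0]
    noise = rng.normal(noise_mean, 1.0)
    x = np.zeros(n)
    for i in range(n):
        x[i] = b[:, i] @ x + noise[i]   # parents of i are already filled in
    return x[-1]


def cumulative_regret(actions_played, b_obs, b_int, noise_mean):
    """Sum over rounds of (best expected reward - expected reward of the played action)."""
    n = b_obs.shape[0]
    all_actions = itertools.product([0, 1], repeat=n)
    best = max(expected_reward(b_obs, b_int, a, noise_mean) for a in all_actions)
    return sum(best - expected_reward(b_obs, b_int, a, noise_mean)
               for a in actions_played)


obs_reward = expected_reward(B_OBS, B_INT, (0, 0, 0), NOISE_MEAN)
best_reward = expected_reward(B_OBS, B_INT, (0, 1, 1), NOISE_MEAN)
regret_obs_only = cumulative_regret([(0, 0, 0)] * 10, B_OBS, B_INT, NOISE_MEAN)
one_draw = sample_reward(B_OBS, B_INT, (0, 1, 1), NOISE_MEAN, np.random.default_rng(0))
```

In this toy instance a learner that never intervenes pays a fixed per-round gap to the best intervention, so its regret grows linearly in the number of rounds; a bandit algorithm aims to make that growth sublinear. Enumerating all 2^N actions, as done here, is only feasible for very small N.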

Combinatorial Causal Bandits without Graph Skeleton

An exponential lower bound on the cumulative regret for the CCB problem on general causal models is provided, and a regret minimization algorithm for BGLMs that works even without the graph skeleton is designed and shown to still achieve O(√T ln T) expected regret.

Model-based Causal Bayesian Optimization

This work proposes the model-based causal Bayesian optimization algorithm (MCBO), which learns a full system model instead of only modeling intervention-reward pairs, bounds its cumulative regret, and obtains the first non-asymptotic bounds for CBO.



Adaptively Exploiting d-Separators with Causal Bandits

This work formalizes and studies the notion of adaptivity, and provides a novel algorithm that simultaneously achieves (a) optimal regret when a d-separator is observed, improving on classical minimax algorithms, and (b) significantly smaller regret than recent causal bandit algorithms when the observed variables are not a d-separator.

Improved Algorithms for Linear Stochastic Bandits

A simple modification of Auer's UCB algorithm achieves constant regret with high probability, and the regret bound is improved by a logarithmic factor, though experiments show a vast improvement.

Budgeted and Non-budgeted Causal Bandits

This work studies the problem of learning best interventions without a budget constraint in general graphs and gives an algorithm that achieves constant expected cumulative regret in terms of the instance parameters when the parent distribution of the reward variable for each intervention is known.

Regret Analysis of Bandit Problems with Causal Background Knowledge

It is observed that even with a few hundred iterations, the regret of causal algorithms is less than that of standard algorithms by a factor of three, and under certain causal structures, these algorithms scale better than the standard bandit algorithms as the number of interventions increases.

Causal Bandits: Learning Good Interventions via Causal Inference

A new algorithm is proposed that exploits the causal feedback and proves a bound on its simple regret that is strictly better (in all quantities) than algorithms that do not use the additional causal information.

Concentration of the adjacency matrix and of the Laplacian in random graphs with independent edges

It is proved that the adjacency matrix and the Laplacian of a random graph with independent edges are concentrated around the corresponding matrices of the weighted graph whose edge weights are the probabilities in the random model.

Identifying Best Interventions through Online Importance Sampling

This work poses the identification of the best intervention as a best arm identification bandit problem with K arms, where each arm is a soft intervention at V, and leverages the information leakage among the arms to provide the first gap-dependent error and simple regret bounds for this problem.

Pure Exploration of Causal Bandits

This work provides the first gap-dependent, fully adaptive pure exploration algorithms on three types of causal models: parallel graphs, general graphs with a small number of backdoor parents, and binary generalized linear models.

Combinatorial Causal Bandits

A tradeoff is shown between the MLE-based BGLM-OFU algorithm and the linear-regression-based algorithm on BLMs: the latter removes the assumption needed by the former but has an extra factor in the regret bound, and a new algorithm and its regret bound based on the linear regression method are shown.

Scalable Intervention Target Estimation in Linear Models

A scalable and efficient algorithm that consistently identifies all intervention targets in a causal directed acyclic graph from observational and interventional data is proposed, and consistency, Markov equivalency, and sample complexity are established analytically.