• Corpus ID: 242757315

Multi-task Learning of Order-Consistent Causal Graphs

Xinshi Chen, Haoran Sun, Caleb Ellington, Eric P. Xing, Le Song
We consider the problem of discovering K related Gaussian directed acyclic graphs (DAGs), where the involved graph structures share a consistent causal order and sparse unions of supports. Under the multi-task learning setting, we propose an $\ell_1/\ell_2$-regularized maximum likelihood estimator (MLE) for learning K linear structural equation models. We theoretically show that the joint estimator, by leveraging data across related tasks, can achieve a better sample complexity for recovering the…
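As a rough illustration of what an $\ell_1/\ell_2$ (group) penalty over K tasks looks like, the sketch below sums, over every candidate edge (i, j), the $\ell_2$ norm of that edge's weights across tasks. The abstract does not spell out the exact estimator, so the function name `group_l1_l2_penalty` and the list-of-matrices input format are assumptions made for this example, not the authors' implementation.

```python
import math

def group_l1_l2_penalty(weight_matrices):
    """Sum over edges (i, j) of the l2 norm of that edge's weights across tasks.

    weight_matrices: list of K square matrices (lists of lists), one per task.
    This is the standard group-lasso form, assumed here for illustration.
    """
    d = len(weight_matrices[0])
    total = 0.0
    for i in range(d):
        for j in range(d):
            # l2 norm across the K tasks for a single edge (i, j)
            total += math.sqrt(sum(w[i][j] ** 2 for w in weight_matrices))
    return total

# Two tasks that place their only edge on the same entry (shared support)
shared = [[[0.0, 3.0], [0.0, 0.0]], [[0.0, 4.0], [0.0, 0.0]]]
# Two tasks with the same weights on different entries (disjoint supports)
disjoint = [[[0.0, 3.0], [0.0, 0.0]], [[0.0, 0.0], [4.0, 0.0]]]

penalty_shared = group_l1_l2_penalty(shared)      # sqrt(3^2 + 4^2) = 5
penalty_disjoint = group_l1_l2_penalty(disjoint)  # 3 + 4 = 7
```

The shared-support configuration is penalized less than the disjoint one, which is why a penalty of this shape favors a sparse union of supports across tasks.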

On the Convergence of Continuous Constrained Optimization for Structure Learning

This work reviews the standard convergence result of the augmented Lagrangian method (ALM) and shows that the required conditions are not satisfied in the recent continuous constrained formulation for learning DAGs. It then establishes a convergence guarantee for the quadratic penalty method (QPM) to a DAG solution, under mild conditions, based on a property of the DAG constraint term.



High-dimensional joint estimation of multiple directed Gaussian graphical models

It is proved that under certain regularity conditions, the proposed $\ell_0$-penalized maximum likelihood estimator converges in Frobenius norm to the adjacency matrices consistent with the data-generating distributions and has the correct sparsity.

Learning Directed Acyclic Graphs with Penalized Neighbourhood Regression

The main results establish support recovery guarantees and deviation bounds for a family of penalized least-squares estimators under concave regularization without assuming prior knowledge of a variable ordering.

Integer Programming for Learning Directed Acyclic Graphs from Continuous Data

Computational results indicate that the proposed LN formulation clearly outperforms existing mathematical formulations and scales better than available algorithms that can solve the same problem with only [Formula: see text] regularization.

Globally optimal score-based learning of directed acyclic graphs in high-dimensions

We prove that $\Omega(s\log p)$ samples suffice to learn a sparse Gaussian directed acyclic graph (DAG) from data, where $s$ is the maximum Markov blanket size. This improves upon recent results that

DAGs with NO TEARS: Continuous Optimization for Structure Learning

This paper formulates the structure learning problem as a purely continuous optimization problem over real matrices that avoids the combinatorial acyclicity constraint entirely, using a novel characterization of acyclicity that is not only smooth but also exact.
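The smooth characterization commonly associated with NO TEARS is $h(W) = \operatorname{tr}(e^{W \circ W}) - d$, which is zero exactly when the weighted adjacency matrix $W$ describes a DAG. The sketch below evaluates it in plain Python through the power series $\sum_{k \ge 1} \operatorname{tr}((W \circ W)^k)/k!$. The helper names and the truncation length are assumptions chosen for this example; a real implementation would more likely call a library matrix exponential.

```python
import math

def _matmul(a, b):
    """Multiply two square matrices given as lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def acyclicity(W, extra_terms=20):
    """Approximate h(W) = tr(exp(W * W)) - d with a truncated power series.

    W * W is the elementwise square, so every entry is non-negative.
    For a DAG the squared matrix is nilpotent and the series is exactly zero.
    """
    d = len(W)
    A = [[w * w for w in row] for row in W]  # elementwise square
    power = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]
    total, factorial = 0.0, 1.0
    for k in range(1, d + extra_terms + 1):
        power = _matmul(power, A)            # A^k
        factorial *= k                       # k!
        total += sum(power[i][i] for i in range(d)) / factorial
    return total

# A 3-node chain 0 -> 1 -> 2 is acyclic
h_dag = acyclicity([[0.0, 1.5, 0.0], [0.0, 0.0, -2.0], [0.0, 0.0, 0.0]])
# A 2-node cycle 0 <-> 1 is not
h_cycle = acyclicity([[0.0, 1.0], [1.0, 0.0]])
```

Because every term of the series is non-negative, any cycle contributes a strictly positive amount, so `h_cycle` is positive while `h_dag` is zero.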

Masked Gradient-Based Causal Structure Learning

A masked gradient-based structure learning method is proposed, built on the binary adjacency matrix that exists for any structural equation model. It can readily include any differentiable score function and model function for learning causal structures.

Inferring large graphs using l1-penalized likelihood

A novel procedure based on a specific formulation of the l1-norm regularized maximum likelihood is proposed, which decomposes the graph estimation into two optimization sub-problems: topological structure and node order learning.

Learning Sparse Nonparametric DAGs

A completely general framework for learning sparse nonparametric directed acyclic graphs (DAGs) from data is developed that can be applied to general nonlinear models, general differentiable loss functions, and generic black-box optimization routines.


It is shown that the $\ell_0$-penalized maximum likelihood estimator of a DAG has about the same number of edges as the minimal-edge I-MAP (a DAG with the minimal number of edges representing the distribution), and that it converges in Frobenius norm.

DAG-GNN: DAG Structure Learning with Graph Neural Networks

A deep generative model is proposed, together with a variant of the structural constraint for learning the DAG. The method learns more accurate graphs for nonlinearly generated samples, and on benchmark data sets with discrete variables the learned graphs are reasonably close to the global optima.