# DAGs with NO TEARS: Continuous Optimization for Structure Learning

@inproceedings{Zheng2018DAGsWN, title={DAGs with NO TEARS: Continuous Optimization for Structure Learning}, author={Xun Zheng and Bryon Aragam and Pradeep Ravikumar and Eric P. Xing}, booktitle={Neural Information Processing Systems}, year={2018} }

Estimating the structure of directed acyclic graphs (DAGs, also known as Bayesian networks) is a challenging problem since the search space of DAGs is combinatorial and scales superexponentially with the number of nodes. […] This is achieved by a novel characterization of acyclicity that is not only smooth but also exact. The resulting problem can be efficiently solved by standard numerical algorithms, which also makes implementation effortless. The proposed method outperforms existing ones…
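As a rough illustration of what a smooth and exact acyclicity characterization can look like, the sketch below evaluates the trace-of-matrix-exponential function commonly associated with NO TEARS, $h(W) = \operatorname{tr}(e^{W \circ W}) - d$, which is zero exactly when the weighted adjacency matrix $W$ has no directed cycles. The function name `acyclicity`, the parameters `max_terms` and `tol`, and the use of a truncated power series in place of a library matrix exponential are assumptions made for this example, not details taken from the abstract above.

```python
import numpy as np


def acyclicity(W, max_terms=200, tol=1e-15):
    """Approximate h(W) = tr(exp(W * W)) - d with a truncated power series.

    W is a (d, d) weighted adjacency matrix. Squaring elementwise makes
    every entry non-negative, so tr(A^k) > 0 if and only if the graph
    contains a closed walk of length k.
    """
    A = W * W                      # elementwise (Hadamard) square
    d = A.shape[0]
    term = np.eye(d)               # holds A^k / k!, starting at k = 0
    h = 0.0
    for k in range(1, max_terms + 1):
        term = term @ A / k        # update to A^k / k!
        h += np.trace(term)
        if term.max() <= tol:      # remaining terms are negligible
            break
    return h


# A 3-node chain 0 -> 1 -> 2 is acyclic, so every trace term vanishes.
dag = np.array([[0.0, 1.5, 0.0],
                [0.0, 0.0, -2.0],
                [0.0, 0.0, 0.0]])

# Two nodes pointing at each other form a cycle, so h is positive.
cyc = np.array([[0.0, 1.0],
                [1.0, 0.0]])

print(acyclicity(dag), acyclicity(cyc))
```

Because the function is differentiable in the entries of `W`, it can in principle be passed as an equality constraint to a standard numerical solver, which is the sense in which the abstract describes the problem as continuous.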

## 328 Citations

### Learning DAGs with Continuous Optimization

- Computer Science
- 2021

This thesis formulates the problem as a purely continuous optimization program over real matrices that avoids the combinatorial constraint entirely, and extends the algebraic characterization of acyclicity to nonparametric structural equation models (SEMs) by leveraging nonparametric sparsity based on partial derivatives, resulting in a continuous optimization problem that can be applied to a variety of nonparametric and semiparametric models including GLMs, additive noise models, and index models as special cases.

### DAGs with No Curl: An Efficient DAG Structure Learning Approach

- Computer Science
- ICML
- 2021

A novel learning framework that models and learns the weighted adjacency matrices directly in the DAG space, and that provides comparable accuracy but better efficiency than baseline DAG structure learning methods on both linear and generalized structural equation models.

### Integer Programming for Learning Directed Acyclic Graphs from Continuous Data

- Computer Science
- INFORMS Journal on Optimization
- 2021

Computational results indicate that the proposed LN formulation clearly outperforms existing mathematical formulations and scales better than available algorithms that can solve the same problem with only [Formula: see text] regularization.

### Learning DAGs without imposing acyclicity

- Computer Science
- ArXiv
- 2020

It is empirically shown that solving an $\ell_1$-penalized optimization yields good recovery of the true graph and, in general, almost-DAG graphs.

### On the Role of Sparsity and DAG Constraints for Learning Linear DAGs

- Computer Science
- NeurIPS
- 2020

This paper studies the asymptotic roles of the sparsity and DAG constraints for learning DAG models in the linear Gaussian and non-Gaussian cases, investigates their usefulness in the finite sample regime, and formulates a likelihood-based score function that leads to an unconstrained optimization problem that is much easier to solve.

### On the Sparse DAG Structure Learning Based on Adaptive Lasso

- Computer Science
- ArXiv
- 2022

This paper develops a data-driven DAG structure learning method without a predefined threshold, called adaptive NOTEARS, achieved by applying adaptive penalty levels to each parameter in the regularization term, and shows that adaptive NOTEARS enjoys the oracle properties under some specific conditions.

### On the Convergence of Continuous Constrained Optimization for Structure Learning

- Computer Science
- AISTATS
- 2022

This work reviews the standard convergence result of the augmented Lagrangian method (ALM), shows that the required conditions are not satisfied in the recent continuous constrained formulation for learning DAGs, and establishes the convergence guarantee of the quadratic penalty method (QPM) to a DAG solution, under mild conditions, based on a property of the DAG constraint term.

### Differentiable and Transportable Structure Learning

- Computer Science
- ArXiv
- 2022

D-Struct is introduced, which recovers transportability in the discovered structures through a novel architecture and loss function, while remaining completely differentiable.

### Learning Large DAGs by Combining Continuous Optimization and Feedback Arc Set Heuristics

- Computer Science
- AAAI
- 2022

This work proposes two scalable heuristics for learning DAGs in the linear structural equation case by alternating between an unconstrained gradient descent-based step that optimizes an objective function and solving a maximum acyclic subgraph problem to enforce acyclicity.

### Low Rank Directed Acyclic Graphs and Causal Structure Learning

- Computer Science
- ArXiv
- 2020

This paper proposes a novel approach to mitigate this problem, by exploiting a low rank assumption regarding the (weighted) adjacency matrix of a DAG causal model, and shows how to adapt existing methods for causal structure learning to take advantage of this assumption.

## References

### Penalized estimation of directed acyclic graphs from discrete data

- Computer Science
- Stat. Comput.
- 2019

A maximum penalized likelihood method to tackle Bayesian networks from discrete or categorical data, which models the conditional distribution of a node given its parents by multi-logit regression instead of the commonly used multinomial distribution.

### A Simple Approach for Finding the Globally Optimal Bayesian Network Structure

- Computer Science
- UAI
- 2006

It is shown that it is possible to learn the best Bayesian network structure with over 30 variables, which covers many practically interesting cases and offers a possibility for efficient exploration of the best networks consistent with different variable orderings.

### Ordering-Based Search: A Simple and Effective Algorithm for Learning Bayesian Networks

- Computer Science
- UAI
- 2005

It is shown that ordering-based search outperforms the standard baseline, and is competitive with recent algorithms that are much harder to implement.

### $\ell_0$-Penalized Maximum Likelihood for Sparse Directed Acyclic Graphs

- Computer Science, Mathematics
- 2013

It is shown that the $\ell_0$-penalized maximum likelihood estimator of a DAG has about the same number of edges as the minimal-edge I-MAP (a DAG with a minimal number of edges representing the distribution), and that it converges in Frobenius norm.

### Optimal Structure Identification With Greedy Search

- Computer Science, Mathematics
- J. Mach. Learn. Res.
- 2002

This paper proves the so-called "Meek Conjecture", which states that if a DAG H is an independence map of another DAG G, then there exists a finite sequence of edge additions and covered edge reversals in G such that, after each modification, H remains an independence map of G and, after all modifications, G = H.

### Learning Sparse Causal Gaussian Networks With Experimental Intervention: Regularization and Coordinate Descent

- Computer Science
- 2013

An L1-penalized likelihood approach to estimate the structure of causal Gaussian networks is developed and it is established that model selection consistency for causal Gaussian networks can be achieved with the adaptive lasso penalty and sufficient experimental interventions.

### Bayesian network learning with cutting planes

- Business
- UAI
- 2011

The problem of learning the structure of Bayesian networks from complete discrete data with a limit on parent set size is considered and it is shown that this is a particularly fast method for exact BN learning.

### Finding optimal Bayesian networks by dynamic programming

- Computer Science, Mathematics
- 2005

This paper describes a “merely” exponential space/time algorithm for finding a Bayesian network that corresponds to a global maximum of a decomposable scoring function, such as BDeu or BIC.

### Learning Graphical Model Structure Using L1-Regularization Paths

- Computer Science
- AAAI
- 2007

This paper shows how the decomposability of the MDL score, plus the ability to quickly compute entire regularization paths, allows us to efficiently pick the optimal regularization parameter on a per-node basis.

### Learning Bayesian networks with ancestral constraints

- Computer Science
- NIPS
- 2016

This work considers the problem of learning Bayesian networks optimally, when subject to background knowledge in the form of ancestral constraints, and demonstrates that the approach can be orders-of-magnitude more efficient than alternative frameworks, such as those based on integer linear programming.