Corpus ID: 238583315

Structure learning in polynomial time: Greedy algorithms, Bregman information, and exponential families

Goutham Rajendran, Bohdan Kivva, Ming Gao, Bryon Aragam
Greedy algorithms have long been a workhorse for learning graphical models, and more broadly for learning statistical models with sparse structure. In the context of learning directed acyclic graphs, greedy algorithms are popular despite their worst-case exponential runtime. In practice, however, they are very efficient. We provide new insight into this phenomenon by studying a general greedy score-based algorithm for learning DAGs. Unlike edge-greedy algorithms such as the popular GES and hill… 
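To make the abstract's setting concrete, here is a minimal sketch of edge-greedy, score-based DAG search (the baseline family the paper contrasts itself with, in the style of GES/hill climbing, not the paper's own algorithm). The decomposable local score and the toy example below are illustrative assumptions, not taken from the paper.

```python
from itertools import permutations

def creates_cycle(parents, u, v):
    # Would adding edge u -> v create a directed cycle, i.e. is u reachable from v?
    stack, seen = [v], set()
    while stack:
        node = stack.pop()
        if node == u:
            return True
        if node in seen:
            continue
        seen.add(node)
        stack.extend(w for w in parents if node in parents[w])  # children of node
    return False

def greedy_edge_search(nodes, local_score):
    """Edge-greedy hill climbing: repeatedly add the single edge that most
    improves a decomposable score, while keeping the graph acyclic."""
    parents = {v: set() for v in nodes}
    improved = True
    while improved:
        improved = False
        best_gain, best_edge = 0.0, None
        for u, v in permutations(nodes, 2):
            if u in parents[v] or creates_cycle(parents, u, v):
                continue
            gain = local_score(v, parents[v] | {u}) - local_score(v, parents[v])
            if gain > best_gain:
                best_gain, best_edge = gain, (u, v)
        if best_edge:
            u, v = best_edge
            parents[v].add(u)
            improved = True
    return parents
```

With a toy score that rewards edges of a ground-truth graph a -> b -> c and penalizes each parent by 0.6, the search recovers exactly the true parent sets. In practice the local score would be something like BIC computed from data; the point of the sketch is the per-edge greedy loop whose worst case drives the exponential runtime discussed above.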


On Learning Discrete Graphical Models using Greedy Methods
This paper studies the sparsistency (consistency in sparsity-pattern recovery) of a forward-backward greedy algorithm applied to general statistical models, and applies the algorithm to learn the structure of a discrete graphical model via neighborhood estimation.
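The forward-backward scheme in this entry (and in the FoBa entry below) can be sketched as follows. This is a simplified illustration under assumed conventions: `loss(S)` stands in for an empirical risk restricted to support set `S`, and the toy set-valued loss in the usage example is invented for demonstration; the analyzed algorithms use smooth losses and more careful stopping rules.

```python
def foba(features, loss, eps=1e-6, backward_ratio=0.5):
    """Forward-backward greedy sketch over a sparse support.
    Forward: add the feature that most decreases the loss.
    Backward: drop features whose removal costs less than a fraction
    of the most recent forward gain."""
    support = set()
    last_gain = float('inf')
    while True:
        gains = {f: loss(support) - loss(support | {f})
                 for f in features if f not in support}
        if not gains:
            break
        best = max(gains, key=gains.get)
        if gains[best] <= eps:
            break  # no forward step improves the loss enough
        support.add(best)
        last_gain = gains[best]
        while len(support) > 1:
            incs = {f: loss(support - {f}) - loss(support) for f in support}
            worst = min(incs, key=incs.get)
            if incs[worst] >= backward_ratio * last_gain:
                break  # every remaining feature pays its way
            support.remove(worst)
    return support
```

For example, with a loss that charges 1.0 per missing true feature and 0.3 per spurious one, the procedure returns exactly the true support. The backward step is what distinguishes this family from purely forward greedy selection: it can undo early mistakes.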
Learning directed acyclic graph models based on sparsest permutations
The sparsest permutation (SP) algorithm is proposed, showing that learning Bayesian networks is possible under strictly weaker assumptions than faithfulness, but at a computational price, thereby indicating a statistical-computational trade-off for causal inference algorithms.
Optimal Structure Identification With Greedy Search
This paper proves the so-called "Meek Conjecture", which shows that if a DAG H is an independence map of another DAG G, then there exists a finite sequence of edge additions and covered edge reversals in G such that H remains an independence map of G and, after all modifications, G = H.
The max-min hill-climbing Bayesian network structure learning algorithm
The first empirical results simultaneously comparing most of the major Bayesian network algorithms against each other are presented, namely the PC, Sparse Candidate, Three Phase Dependency Analysis, Optimal Reinsertion, Greedy Equivalence Search, and Greedy Search.
Ordering-Based Search: A Simple and Effective Algorithm for Learning Bayesian Networks
It is shown that ordering-based search outperforms the standard baseline, and is competitive with recent algorithms that are much harder to implement.
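The ordering-based idea can be sketched concretely: search over topological orderings rather than over DAGs, since for a fixed ordering the best-scoring DAG decomposes into an independent parent-set choice per node. The brute-force parent search and the adjacent-transposition moves below are a minimal sketch under assumed conventions (small graphs, a decomposable local score supplied by the caller), not the paper's exact procedure.

```python
from itertools import combinations

def best_parents(node, preds, local_score):
    # Brute-force the best parent set among the node's predecessors (small n only).
    best, best_s = set(), local_score(node, set())
    for k in range(1, len(preds) + 1):
        for cand in combinations(preds, k):
            s = local_score(node, set(cand))
            if s > best_s:
                best, best_s = set(cand), s
    return best, best_s

def score_ordering(order, local_score):
    # For a fixed ordering, the optimal DAG decomposes node by node.
    total, dag = 0.0, {}
    for i, v in enumerate(order):
        pa, s = best_parents(v, order[:i], local_score)
        dag[v], total = pa, total + s
    return total, dag

def ordering_search(nodes, local_score):
    """Hill-climb over orderings via adjacent transpositions;
    each candidate ordering is scored by its best consistent DAG."""
    order = list(nodes)
    best_s, best_dag = score_ordering(order, local_score)
    improved = True
    while improved:
        improved = False
        for i in range(len(order) - 1):
            cand = order[:i] + [order[i + 1], order[i]] + order[i + 2:]
            s, dag = score_ordering(cand, local_score)
            if s > best_s:
                order, best_s, best_dag, improved = cand, s, dag, True
    return best_dag
```

Like any local search, this can stop at a locally optimal ordering; the appeal noted in the entry above is that the search space of orderings is far smaller and smoother than the space of DAGs.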
Learning Identifiable Gaussian Bayesian Networks in Polynomial Time and Sample Complexity
A provably polynomial-time algorithm is proposed for learning sparse Gaussian Bayesian networks with equal noise variance, a class of Bayesian networks for which the DAG structure can be uniquely identified from observational data, under high-dimensional settings.
Forward-Backward Greedy Algorithms for General Convex Smooth Functions over A Cardinality Constraint
This paper systematically analyzes the theoretical properties of forward-backward greedy algorithms for solving sparse feature selection problems with general convex smooth functions, and shows that FoBa-gdt outperforms other methods based on forward greedy selection and L1 regularization.
Finding Optimal Bayesian Network Given a Super-Structure
Classical approaches to learning Bayesian network structure from data suffer from high complexity and reduced accuracy. However, a recent empirical study has shown that…
Information Theoretic Optimal Learning of Gaussian Graphical Models
This paper constructively answers the question of the optimal number of independent observations from which a sparse Gaussian graphical model can be correctly recovered, and proposes an algorithm, termed DICE, whose sample complexity matches the information-theoretic lower bound up to a universal constant factor.
Efficiently Learning Ising Models on Arbitrary Graphs
A simple greedy procedure learns the structure of an Ising model on an arbitrary bounded-degree graph in time on the order of p², based on showing that for any node there exists at least one neighbor with which it has high mutual information.
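The mutual-information criterion behind this entry can be illustrated with a plug-in estimator on ±1 samples. The thresholding rule below is a deliberately simplified sketch (the actual procedure iterates with conditional influences and careful pruning); the function names and the threshold `tau` are assumptions for illustration.

```python
from math import log
from collections import Counter

def mutual_information(samples, i, j):
    """Plug-in estimate of I(X_i; X_j) in nats from a list of ±1 sample tuples."""
    n = len(samples)
    pij = Counter((s[i], s[j]) for s in samples)
    pi = Counter(s[i] for s in samples)
    pj = Counter(s[j] for s in samples)
    mi = 0.0
    for (a, b), c in pij.items():
        p_ab = c / n
        # Sum p(a,b) * log(p(a,b) / (p(a) * p(b))) over observed cells.
        mi += p_ab * log(p_ab / ((pi[a] / n) * (pj[b] / n)))
    return mi

def greedy_neighbors(samples, node, num_vars, tau):
    """Declare j a candidate neighbor of `node` whenever the empirical
    pairwise mutual information exceeds the threshold tau."""
    return [j for j in range(num_vars)
            if j != node and mutual_information(samples, node, j) > tau]
```

On a toy dataset where X0 and X1 are perfectly correlated while X2 is independent of both, the estimator gives I(X0; X1) = log 2 and I(X0; X2) = 0, so only X1 survives the threshold.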