Learn More
We propose algorithms for learning Markov boundaries from data without having to learn a Bayesian network first. We study their correctness, scalability and data efficiency. The last two properties are important because we aim to apply the algorithms to identify the minimal set of features that is needed for probabilistic classification in databases with(More)
This paper proposes and evaluates the k-greedy equivalence search algorithm (KES) for learning Bayesian networks (BNs) from complete data. The main characteristic of KES is that it allows a trade-off between greediness and randomness, thus exploring different good local optima when run repeatedly. When greediness is set at maximum, KES corresponds to the(More)
In this paper, we aim to compare empirically four initialization methods for the K-Means algorithm: random, Forgy, MacQueen and Kaufman. Although this algorithm is known for its robustness, it is widely reported in literature that its performance depends upon two key points: initial clustering and instance order. We conduct a series of experiments to draw(More)
This paper shows how the Bayesian network paradigm can be used in order to solve com­ binatorial optimization problems. To do it some methods of structure learning from data and simulation of Bayesian networks are in­ serted inside Estimation of Distribution Al­ gorithms (EDA). EDA are a new tool for evo­ lutionary computation in which populations of(More)
Many optimization problems are what can be called globally multimodal, i.e., they present several global optima. Unfortunately, this is a major source of difficulties for most estimation of distribution algorithms, making their effectiveness and efficiency degrade, due to genetic drift. With the aim of overcoming these drawbacks for discrete globally(More)
MOTIVATION For the last few years, Bayesian networks (BNs) have received increasing attention from the computational biology community as models of gene networks, though learning them from gene-expression data is problematic. Most gene-expression databases contain measurements for thousands of genes, but the existing algorithms for learning BNs from data do(More)
We analyze two different feature selection problems: finding a minimal feature set optimal for classification (MINIMAL-OPTIMAL) vs. finding all features relevant to the target variable (ALL-RELEVANT). The latter problem is motivated by recent applications within bioinformatics, particularly gene expression analysis. For both problems, we identify classes of(More)
We propose an algorithm for learning the Markov boundary of a random variable from data without having to learn a complete Baye-sian network. The algorithm is correct under the faithfulness assumption, scalable and data efficient. The last two properties are important because we aim to apply the algorithm to identify the minimal set of random variables that(More)
In 2007, we applied MCMC to approximately calculate the ratio of essential graphs (EGs) to directed acyclic graphs (DAGs) for up to 20 nodes. In the present paper, we extend our previous work from 20 to 31 nodes. We also extend our previous work by computing the approximate ratio of connected EGs to connected DAGs, of connected EGs to EGs, and of connected(More)