We propose algorithms for learning Markov boundaries from data without having to learn a Bayesian network first. We study their correctness, scalability and data efficiency. The last two properties are important because we aim to apply the algorithms to identify the minimal set of features that is needed for probabilistic classification in databases with(More)
Using deep sequencing (deepCAGE), the FANTOM4 study measured the genome-wide dynamics of transcription-start-site usage in the human monocytic cell line THP-1 throughout a time course of growth arrest and differentiation. Modeling the expression dynamics in terms of predicted cis-regulatory sites, we identified the key transcription regulators, their(More)
We analyze two different feature selection problems: finding a minimal feature set optimal for classification (MINIMAL-OPTIMAL) vs. finding all features relevant to the target variable (ALL-RELEVANT). The latter problem is motivated by recent applications within bioinformatics, particularly gene expression analysis. For both problems, we identify classes of(More)
Complete repertoires of molecular activity in and between tissues provided by new high-dimensional "omics" technologies hold great promise for characterizing human physiology at all levels of biological hierarchies. The combined effects of genetic and environmental perturbations at any level of these hierarchies can lead to vicious cycles of pathology and(More)
Triglyceride-rich lipoproteins (TRLs) that are modified during alimentary lipemia and their remnants are indicated to play an important role in the development of atherosclerosis. Although recent studies in transgenic and gene knock-out animal models have shed new light on the function of different apolipoproteins (apos) in the metabolism of TRLs and on(More)
MOTIVATION: For the last few years, Bayesian networks (BNs) have received increasing attention from the computational biology community as models of gene networks, though learning them from gene-expression data is problematic. Most gene-expression databases contain measurements for thousands of genes, but the existing algorithms for learning BNs from data do(More)
Increased baseline values of the acute-phase reactant C-reactive protein (CRP) are significantly associated with future cardiovascular disease, and some in vitro studies have claimed that human CRP (hCRP) has proatherogenic effects. In vivo studies in apolipoprotein E-deficient mouse models, however, have given conflicting results. We bred(More)
BACKGROUND: Exaggerated postprandial triglyceridemia is common in normolipidemic patients with coronary artery disease (CAD). Alterations in the composition of triglyceride-rich lipoproteins (TRLs) are likely to underlie this metabolic disturbance. However, the composition of very-low-density lipoproteins (VLDLs), which are the most abundant postprandial(More)
OBJECTIVES: Remodeling of extracellular matrix (ECM) plays an important role in inflammatory disorders such as atherosclerosis. ADAMTS (a disintegrin and metalloproteinase with thrombospondin motifs) is a recently described family of proteinases that is able to degrade the ECM proteins aggrecan and versican expressed in blood vessels. The purpose of the(More)
The Kruskal-Wallis test is a popular non-parametric statistical test for identifying expression quantitative trait loci (eQTLs) from genome-wide data due to its robustness against variations in the underlying genetic model and expression trait distribution, but testing billions of marker-trait combinations one-by-one can become computationally prohibitive.(More)
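For readers unfamiliar with the test named in the abstract above, the following is a minimal sketch of the standard tie-corrected Kruskal-Wallis H statistic for a single marker-trait pair. It is the textbook one-at-a-time computation, not the accelerated method the abstract refers to. The function name `kruskal_wallis_h` and the genotype and expression values are hypothetical, chosen only for illustration.

```python
from collections import Counter


def kruskal_wallis_h(groups):
    """Return the tie-corrected Kruskal-Wallis H statistic.

    `groups` is a list of non-empty samples, e.g. expression values
    split by genotype at one marker (an illustrative assumption).
    """
    # Pool all observations, remembering which group each came from.
    pooled = sorted((value, g) for g, sample in enumerate(groups) for value in sample)
    n_total = len(pooled)

    # Walk through runs of equal values and give each run its average rank.
    rank_sums = [0.0] * len(groups)
    i = 0
    while i < n_total:
        j = i
        while j < n_total and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2  # mean of the 1-based ranks i+1 .. j
        for k in range(i, j):
            rank_sums[pooled[k][1]] += avg_rank
        i = j

    # Uncorrected H from the per-group rank sums.
    h = 12.0 / (n_total * (n_total + 1)) * sum(
        r * r / len(sample) for r, sample in zip(rank_sums, groups)
    ) - 3 * (n_total + 1)

    # Standard tie correction: 1 - sum(t^3 - t) / (N^3 - N).
    ties = Counter(value for value, _ in pooled)
    correction = 1 - sum(t**3 - t for t in ties.values()) / (n_total**3 - n_total)
    if correction == 0:
        raise ValueError("all observations are identical; H is undefined")
    return h / correction


# Hypothetical expression values grouped by genotype (0, 1 or 2 minor alleles).
expression_by_genotype = [
    [2.1, 2.4, 1.9, 2.2],
    [2.8, 3.1, 2.6],
    [3.5, 3.9, 3.3],
]
h_stat = kruskal_wallis_h(expression_by_genotype)
print(f"H = {h_stat:.3f}")
```

Under the null hypothesis, H is approximately chi-squared distributed with (number of groups - 1) degrees of freedom. Repeating this ranking step for every marker-trait combination is the cost that becomes prohibitive at genome-wide scale.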