BayesMD: Flexible Biological Modeling for Motif Discovery

Man-Hung Eric Tang, Anders Krogh, Ole Winther
Journal of Computational Biology: A Journal of Computational Molecular Cell Biology, vol. 15, no. 10
We present BayesMD, a Bayesian Motif Discovery model with several new features. Three different types of biological a priori knowledge are built into the framework in a modular fashion. A mixture of Dirichlets is used as a prior over nucleotide probabilities in binding sites. It is trained on transcription factor (TF) databases to extract the typical properties of TF binding sites. In a similar fashion, we train organism-specific priors for the background sequences. Lastly, we use a prior… 
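As a rough illustration of how a Dirichlet mixture prior acts on a binding-site column, the sketch below computes posterior-mean nucleotide probabilities for one column of observed counts, using the standard Dirichlet-mixture posterior: each component is weighted by its prior weight times the Dirichlet-multinomial likelihood of the counts, and the per-component posterior means are then mixed. The component weights and Dirichlet parameters are hypothetical placeholders, not the priors BayesMD trains from TF databases, and `posterior_mean_column` is an illustrative name.

```python
import math
from typing import List, Sequence


def log_dirichlet_marginal(counts: Sequence[float], alpha: Sequence[float]) -> float:
    """log P(counts | alpha) under a Dirichlet-multinomial, omitting the
    multinomial coefficient (identical for every component, so it cancels)."""
    n_total, a_total = sum(counts), sum(alpha)
    result = math.lgamma(a_total) - math.lgamma(n_total + a_total)
    for n_i, a_i in zip(counts, alpha):
        result += math.lgamma(n_i + a_i) - math.lgamma(a_i)
    return result


def posterior_mean_column(counts, weights, alphas) -> List[float]:
    """Posterior-mean nucleotide probabilities (A, C, G, T) for one motif column."""
    # Responsibility of component k is proportional to q_k * P(counts | alpha_k).
    log_terms = [math.log(q) + log_dirichlet_marginal(counts, a) for q, a in zip(weights, alphas)]
    top = max(log_terms)  # subtract the max before exponentiating, for numerical stability
    resp = [math.exp(t - top) for t in log_terms]
    norm = sum(resp)
    resp = [r / norm for r in resp]

    # Mix the per-component posterior means (n_i + alpha_ki) / (|n| + |alpha_k|).
    n_total = sum(counts)
    probs = [0.0] * len(counts)
    for r, alpha in zip(resp, alphas):
        a_total = sum(alpha)
        for i, (n_i, a_i) in enumerate(zip(counts, alpha)):
            probs[i] += r * (n_i + a_i) / (n_total + a_total)
    return probs


# Hypothetical two-component mixture: a sharply "A-conserved" component and a flat one.
weights = [0.5, 0.5]
alphas = [[5.0, 0.2, 0.2, 0.2], [1.0, 1.0, 1.0, 1.0]]
print(posterior_mean_column([3, 0, 0, 1], weights, alphas))
```

With a single flat component this reduces to ordinary add-one pseudocounts; a trained mixture instead pulls sparse columns toward the typical column shapes seen in binding-site databases.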


Metamotifs - a generative model for building families of nucleotide position weight matrices
A probabilistic model for families of position weight matrix (PWM) sequence motifs that describes recurring familial patterns in a set of motifs; it has great potential for further use in machine learning tasks, especially de novo computational sequence motif inference.
coMOTIF: a mixture framework for identifying transcription factor and a coregulator motif in ChIP-seq Data
A finite mixture framework with an expectation-maximization algorithm that considers two motifs jointly and simultaneously determines which sequences contain both motifs, either one or neither of them is presented.
DISPARE: DIScriminative PAttern REfinement for Position Weight Matrices
A novel algorithm is described for refining position weight matrices that represent transcription factor binding sites, based on experimental data including ChIP-chip analyses; it significantly improves the sensitivity and specificity of matrix models for identifying transcription factor binding sites.
Motif discovery and transcription factor binding sites before and after the next-generation sequencing era
ChIP, applied to transcription factors and coupled with genome tiling arrays or next-generation sequencing technologies (ChIP-Seq) has opened new avenues in research, as well as posed new challenges to bioinformaticians developing algorithms and methods for motif discovery.
Inferring the Binding Preferences of RNA-binding Proteins
A protocol is developed for designing a complex RNA pool that represents diverse sets of sequence and structure elements, for use in an in vitro assay that efficiently measures RBP binding preferences, together with computational models that learn the binding preferences of RBPs from large-scale data.
PairMotif+: A Fast and Effective Algorithm for De Novo Motif Discovery in DNA sequences
Experimental results show that PairMotif+ can solve various (l, d) problems within an hour on a PC with 2.67 GHz processor, and has a better identification accuracy than the compared algorithms MEME, AlignACE and VINE.
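The (l, d) problems referred to in the PairMotif+ summary are, in their usual formulation (the planted motif problem), the task of finding every length-l string that occurs in each input sequence with at most d mismatches. The brute-force sketch below only illustrates that definition; it is not PairMotif+, whose purpose is precisely to avoid enumerating all 4^l candidates. The function names are hypothetical.

```python
from itertools import product
from typing import List


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def occurs_within(motif: str, seq: str, d: int) -> bool:
    """True if some length-l window of seq is within d mismatches of motif."""
    l = len(motif)
    return any(hamming(motif, seq[i:i + l]) <= d for i in range(len(seq) - l + 1))


def planted_motifs(seqs: List[str], l: int, d: int) -> List[str]:
    """Exhaustively test all 4**l DNA strings (feasible only for small l)."""
    candidates = ("".join(p) for p in product("ACGT", repeat=l))
    return [m for m in candidates if all(occurs_within(m, s, d) for s in seqs)]


seqs = ["AACGTA", "CCACGT", "ACGTGG"]
print(planted_motifs(seqs, 4, 0))  # -> ['ACGT']
```

The cost grows as 4^l times the total sequence length, which is why practical (l, d) solvers prune the candidate space instead of enumerating it.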
Protocol S1: A brief review of DNA motif finding (2010)
Motif discovery programs (2009)
The main feature of the BayesMD method is the use of a positional prior that provides a priori information about the location and number of occurrences of the sought motifs.
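To make the idea of a positional prior concrete, the sketch below combines a prior over motif start positions with a position weight matrix to obtain a posterior over where a single site lies, including the possibility that the sequence has no site (zero or one occurrence per sequence). This is a generic illustration under assumed inputs, not BayesMD's actual prior, which also covers the number of occurrences; `position_posterior`, the uniform background, and the example matrix are all hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Column = Dict[str, float]  # nucleotide -> probability


def site_likelihood_ratio(seq: str, start: int, pwm: Sequence[Column], background: Column) -> float:
    """P(window | motif) / P(window | background) for the window starting at `start`."""
    ratio = 1.0
    for k, column in enumerate(pwm):
        base = seq[start + k]
        ratio *= column[base] / background[base]
    return ratio


def position_posterior(seq: str, pwm: Sequence[Column], background: Column,
                       position_prior: Sequence[float], p_no_site: float) -> Tuple[float, List[float]]:
    """Return (P(no site | seq), [P(site starts at j | seq) for each start j]).

    `position_prior[j]` is the prior probability that a site, if present, starts at j;
    it needs one entry per possible start and should sum to 1.
    """
    n_starts = len(seq) - len(pwm) + 1
    if len(position_prior) != n_starts:
        raise ValueError("position_prior needs one entry per possible start")
    # Joint weights divided by the all-background likelihood P_bg(seq), which cancels.
    site_weights = [(1.0 - p_no_site) * position_prior[j] * site_likelihood_ratio(seq, j, pwm, background)
                    for j in range(n_starts)]
    norm = p_no_site + sum(site_weights)
    return p_no_site / norm, [w / norm for w in site_weights]


uniform_bg = {b: 0.25 for b in "ACGT"}
gt_pwm = [{"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
          {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7}]
seq = "ACGTACGT"
flat = [1 / 7] * 7
none_p, site_p = position_posterior(seq, gt_pwm, uniform_bg, flat, p_no_site=0.5)
```

With a flat prior, the two "GT" windows (starts 2 and 6) receive equal posterior mass; a prior that favors later positions breaks the tie, which is the kind of location information a positional prior contributes.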


LOGOS: a modular Bayesian model for de novo motif detection
LOGOS, an integrated LOcal and GlObal motif Sequence model for biopolymer sequences, is presented; it provides a principled framework for developing, modularizing, extending and computing expressive motif models for complex biopolymer sequence analysis.
NestedMICA: sensitive inference of over-represented motifs in nucleic acid sequence
This work investigates the performance of NestedMICA in a range of scenarios, on synthetic data and on a well-characterized set of muscle regulatory regions, and compares it with the popular MEME program, showing that the new method is significantly more sensitive than MEME.
PhyloGibbs: A Gibbs Sampling Motif Finder That Incorporates Phylogeny
PhyloGibbs is a new motif sampling algorithm that runs on arbitrary collections of multiple local sequence alignments of orthologous sequences; it performs significantly better than four other motif-finding algorithms, including algorithms that also take phylogeny into account.
MotifPrototyper: A Bayesian profile model for motif families
E. Xing, R. Karp
Proc. Natl. Acad. Sci. USA (2004)
In this article, we address the problem of modeling generic features of structurally but not textually related DNA motifs, that is, motifs whose consensus sequences are entirely different but…
A Hierarchical Bayesian Markovian Model for Motifs in Biopolymer Sequences
A dynamic Bayesian model for motifs in biopolymer sequences is proposed that captures rich biological prior knowledge and positional dependencies in motif structure in a principled way; it has much higher sensitivity to motifs during detection and a notable ability to distinguish genuine motifs from false recurring patterns.
Bayesian inference on biopolymer models
This paper presents a tutorial style description of a Bayesian inference procedure for segmentation of a sequence based on the heterogeneity in its composition, and shows how this goal can be achieved for most bioinformatics methods that use dynamic programming.
A Gibbs Sampling Method to Detect Overrepresented Motifs in the Upstream Regions of Coexpressed Genes
Two modifications of the original Gibbs sampling algorithm for motif finding are presented: the use of a probability distribution to estimate the number of copies of the motif in a sequence and the technical aspects of the incorporation of a higher-order background model.
A survey of motif discovery methods in an integrated framework
A survey of methods for motif discovery in DNA, based on a structured and well-defined framework that integrates all relevant elements, shows that although no single method takes all relevant elements into consideration, a very large number of different models treating the various elements separately have been tried.
Computational motif discovery
This article focuses on the computational prediction of protein binding sites in nucleotide sequences, and three types of motif model are described: consensus, IUPAC, and weight matrix.
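The three motif models named above can all be derived from the same set of aligned binding sites. The sketch below is a minimal illustration under stated assumptions: the frequency cutoff used to pick IUPAC codes, the pseudocount, and the uniform 0.25 background for the log-odds weight matrix are hypothetical choices, not ones prescribed by the cited article. The IUPAC table itself is the standard nucleotide code.

```python
import math
from collections import Counter
from typing import Dict, List

# Standard IUPAC nucleotide codes, keyed by the set of bases each code stands for.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C", frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S", frozenset("AT"): "W",
    frozenset("GT"): "K", frozenset("AC"): "M", frozenset("CGT"): "B", frozenset("AGT"): "D",
    frozenset("ACT"): "H", frozenset("ACG"): "V", frozenset("ACGT"): "N",
}


def columns(sites: List[str]) -> List[Counter]:
    """Per-position base counts for equal-length aligned sites."""
    return [Counter(col) for col in zip(*sites)]


def consensus(sites: List[str]) -> str:
    # Most frequent base per column; max() keeps the first tied base in "ACGT" order.
    return "".join(max("ACGT", key=lambda b: c[b]) for c in columns(sites))


def iupac(sites: List[str], min_freq: float = 0.25) -> str:
    # Hypothetical rule: include every base whose column frequency is >= min_freq.
    # Any min_freq <= 0.25 guarantees at least one base qualifies in each column.
    n = len(sites)
    return "".join(IUPAC[frozenset(b for b in "ACGT" if c[b] / n >= min_freq)] for c in columns(sites))


def weight_matrix(sites: List[str], pseudocount: float = 0.5) -> List[Dict[str, float]]:
    # Log2-odds of smoothed column frequencies against an assumed uniform background.
    n = len(sites)
    return [{b: math.log2((c[b] + pseudocount) / (n + 4 * pseudocount) / 0.25) for b in "ACGT"}
            for c in columns(sites)]


def score(pwm: List[Dict[str, float]], window: str) -> float:
    """Sum of per-position log-odds for a window of the motif's length."""
    return sum(col[b] for col, b in zip(pwm, window))


sites = ["TACGAT", "TATAAT", "TATAAT", "GATACT"]
print(consensus(sites), iupac(sites))  # -> TATAAT KAYRMT
```

The progression shows what each model keeps: the consensus records only the top base, the IUPAC string records which bases are tolerated, and the weight matrix additionally records how strongly each base is preferred.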
Finding functional sequence elements by multiple local alignment.
Two theoretical contributions to this classic but unsolved problem are presented here: a method to determine the width of the aligned motif automatically; and a technique for calculating the statistical significance of alignments, i.e. whether the alignments are stronger than those that would be expected to occur by chance among random, unrelated sequences.