MOTIVATION: We propose a new class of variable-order Bayesian network (VOBN) models for the identification of transcription factor binding sites (TFBSs). The proposed models generalize the widely used position weight matrix (PWM) models, Markov models and Bayesian network models. In contrast to these models, where for each position a fixed subset of the...
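As a point of reference for the PWM models that the abstract above says VOBNs generalize, here is a minimal sketch of PWM scoring, in which each position contributes independently of all others. The function name `pwm_log_likelihood`, the list-of-dicts layout, and the toy probabilities are assumptions made for illustration; they are not taken from the paper.

```python
import math


def pwm_log_likelihood(site, pwm):
    """Log-likelihood of `site` under a PWM.

    `pwm` is a list with one dict per position, mapping each nucleotide
    to its probability at that position (hypothetical layout).
    Positions are treated as independent, so the log-probabilities add up.
    """
    if len(site) != len(pwm):
        raise ValueError("site length must match the PWM length")
    return sum(math.log(pwm[i][base]) for i, base in enumerate(site))


# Toy two-position PWM (made-up numbers, for illustration only).
toy_pwm = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
]

print(pwm_log_likelihood("AG", toy_pwm))
print(pwm_log_likelihood("TT", toy_pwm))
```

Models such as Markov models or Bayesian networks relax the independence assumption by conditioning each position on other positions; that conditioning is not shown in this sketch.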
Document collections evolve over time: new topics emerge and old ones decline. At the same time, the terminology evolves as well. Much literature is devoted to topic evolution in finite document sequences assuming a fixed vocabulary. In this study, we propose "Topic Monitor" for the monitoring and understanding of topic and vocabulary evolution over an...
We introduce inhomogeneous parsimonious Markov models for modeling statistical patterns in discrete sequences. These models are based on parsimonious context trees, which are a generalization of context trees, and thus generalize variable order Markov models. We follow a Bayesian approach, consisting of structure and parameter learning. Structure learning...
Jstacs is an object-oriented Java library for analysing and classifying sequence data, which emerged from the need for a standardized implementation of statistical models, learning principles, classifiers, and performance measures. In Jstacs, these components can be used, combined, and extended easily, which allows for a direct comparison of different...
Array-based comparative genomic hybridization (Array-CGH) is an important technology in molecular biology for the detection of DNA copy number polymorphisms between closely related genomes. Hidden Markov Models (HMMs) are popular tools for the analysis of Array-CGH data, but current methods are only based on first-order HMMs having constrained abilities to...
Probabilistic latent semantic indexing (PLSI) represents documents of a collection as mixture proportions of latent topics, which are learned from the collection by an expectation maximization (EM) algorithm. New documents or queries need to be folded into the latent topic space by a simplified version of the EM algorithm. During PLSI folding-in of a new...
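The folding-in step described above can be sketched as follows: the topic-word distributions P(w|z) learned from the collection stay fixed, and EM updates only the new document's topic mixture P(z|d). This is the textbook form of folding-in, not necessarily the variant studied in the paper; the function name `fold_in`, the data layout, and the toy numbers are assumptions for illustration.

```python
def fold_in(word_counts, p_w_given_z, n_iter=50):
    """Estimate topic proportions P(z|d) for a new document.

    word_counts:  dict mapping word index -> count in the new document.
    p_w_given_z:  list of K lists; p_w_given_z[z][w] = P(w|z), held fixed.
    Returns a list of K mixture proportions that sum to 1.
    """
    k = len(p_w_given_z)
    p_z = [1.0 / k] * k  # start from a uniform topic mixture
    for _ in range(n_iter):
        expected = [0.0] * k
        for w, count in word_counts.items():
            # E-step: posterior P(z|d,w) is proportional to P(z|d) * P(w|z)
            joint = [p_z[z] * p_w_given_z[z][w] for z in range(k)]
            norm = sum(joint)
            if norm == 0.0:
                continue  # word has zero probability under every topic
            for z in range(k):
                expected[z] += count * joint[z] / norm
        total = sum(expected)
        if total == 0.0:
            break  # nothing to update from; keep the current mixture
        # M-step: re-estimate P(z|d) only; P(w|z) is left untouched
        p_z = [value / total for value in expected]
    return p_z


# Toy model: 2 topics over a 3-word vocabulary (made-up numbers).
topics = [
    [0.9, 0.1, 0.0],
    [0.0, 0.1, 0.9],
]
new_doc = {0: 5, 1: 1}  # five occurrences of word 0, one of word 1
print(fold_in(new_doc, topics))
```

Because only the K mixture proportions of the new document are updated, each iteration costs time proportional to the number of distinct words in the document times the number of topics.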
Variable order Markov models and variable order Bayesian trees have been proposed for the recognition of cis-regulatory elements, and it has been demonstrated that they outperform traditional models such as position weight matrices, Markov models, and Bayesian trees for the recognition of binding sites in prokaryotes. Here, we study to which degree variable...
Many different computer programs for the prediction of transcription factor binding sites have been developed over the last decades. These programs differ from each other by pursuing different objectives and by taking into account different sources of information. For methods based on statistical approaches, these programs differ at an elementary level from...
2 Bayesian Baum-Welch algorithm: 2.1 Basics of the Bayesian Baum-Welch algorithm; 2.2 Baum’s auxiliary function; 2.3 Estimation of initial state probabilities; 2.4 Estimation of transition probabilities ...