Hidden Markov Chains and the Analysis of Genome Structure

G. Churchill, Comput. Chem.
Abstract: In this paper, statistical methods based on a hidden Markov chain model are used to study the structure of some small complete genomes and a human genome segment. A variety of discrete compositional domains are discovered and their correlations with genome function are explored.
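
The idea of recovering discrete compositional domains from a hidden Markov chain can be illustrated with a minimal sketch. The code below is not the paper's method or parameterization: it assumes a two-state chain with hypothetical state names (`AT_rich`, `GC_rich`) and hand-picked, illustrative probabilities, and it uses the standard Viterbi algorithm in log space to label each base with its most probable hidden state.

```python
import math

# Hypothetical two-state model; all names and numbers are illustrative
# assumptions, not values estimated in the paper.
STATES = ("AT_rich", "GC_rich")
START = {"AT_rich": 0.5, "GC_rich": 0.5}
TRANS = {
    "AT_rich": {"AT_rich": 0.9, "GC_rich": 0.1},
    "GC_rich": {"AT_rich": 0.1, "GC_rich": 0.9},
}
EMIT = {
    "AT_rich": {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4},
    "GC_rich": {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1},
}


def viterbi(seq):
    """Return the most probable hidden-state path for seq (log-space Viterbi)."""
    if not seq:
        return []
    # score[s] is the best log-probability of any path ending in state s.
    score = {s: math.log(START[s]) + math.log(EMIT[s][seq[0]]) for s in STATES}
    backpointers = []
    for base in seq[1:]:
        new_score, pointers = {}, {}
        for s in STATES:
            # Best predecessor state for arriving in s at this position.
            prev = max(STATES, key=lambda p: score[p] + math.log(TRANS[p][s]))
            new_score[s] = (
                score[prev] + math.log(TRANS[prev][s]) + math.log(EMIT[s][base])
            )
            pointers[s] = prev
        score = new_score
        backpointers.append(pointers)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


def segments(path):
    """Collapse a state path into (state, start, end) runs, end exclusive."""
    runs = []
    for i, s in enumerate(path):
        if runs and runs[-1][0] == s:
            runs[-1] = (s, runs[-1][1], i + 1)
        else:
            runs.append((s, i, i + 1))
    return runs
```

Calling `segments(viterbi("ATATATATATGCGCGCGCGC"))` splits the toy sequence into one `AT_rich` run followed by one `GC_rich` run. In practice the parameters would be estimated from the sequence (for example by an EM procedure) rather than fixed by hand.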

Modelling Bacterial Genomes Using Hidden Markov Models
  • F. Muri
  • Computer Science
  • 1998
This work compares different identification algorithms for hidden Markov chains and presents some applications to bacterial genomes to illustrate the method.
Hidden Markov models in biology.
In the course, the forward-backward, the Viterbi, and the Baum-Welch (EM) algorithms, and a Metropolis sampling scheme are presented.
A comparison of reversible jump MCMC algorithms for DNA sequence segmentation using hidden Markov models
This paper describes a Bayesian approach to determining the number of hidden states in a hidden Markov model (HMM) via reversible jump Markov chain Monte Carlo (MCMC) methods. Acceptance rates for …
A Bayesian approach to DNA sequence segmentation.
A Bayesian method is described that identifies segments by using a Markov chain governed by a hidden Markov model, with an application to segmentation of the bacteriophage lambda genome, a common benchmark sequence used for the comparison of statistical segmentation algorithms.
Bayesian Restoration of a Hidden Markov Chain with Applications to DNA Sequencing
This work presents a Bayesian solution to the problem of restoring the sequence of states visited by the hidden Markov chain from a given sequence of observed outputs.
Comparing the performance of a reversible jump Markov chain Monte Carlo algorithm for DNA sequences alignment
Assume that K independent copies are made from a common prototype DNA sequence whose length is a random variable. In this paper, the problem of aligning those copies and therefore the problem of …
Markov models of genome segmentation.
Higher-order Markov-model-based segmentation procedures are shown to have an advantage in detecting compositional inhomogeneity in chimeric DNA sequences constructed from genomes of diverse species; in application to the E. coli K12 genome, boundaries of genomic islands, cryptic prophages, and horizontally acquired regions are accurately identified.
Comparative statistical analysis of bacteria genomes in “word” context
The ranked word distributions are well approximated by a logarithmic law, and the investigation of absent words shows the considerably nonrandom character of DNA texts.
Finding Genes in Human DNA with a Hidden Markov Model
The initial results are highly encouraging and indicate that an HMM can form the basis of an effective gene-finding system.
Estimating dependent Binomial mixture models through reversible jump MCMC
We present a hidden Markov model of Binomial variables as a dependent mixture model and propose the reversible jump procedure to estimate the number of components and parameters of the model and …


Stochastic models for heterogeneous DNA sequences.
  • G. Churchill
  • Biology, Medicine
  • Bulletin of mathematical biology
  • 1989
The DNA sequence is viewed as a stochastic process with local compositional properties determined by the states of a hidden Markov chain, a discrete-state, discrete-outcome version of a general model for non-stationary time series proposed by Kitagawa (1987).
Theoretical models for heterogeneity of base composition in DNA.
  • R. Elton
  • Biology, Medicine
  • Journal of theoretical biology
  • 1974
It is concluded that the heterogeneity is probably caused by variations in the relative use of synonymous codons in different genes, and a model in which the DNA consists of a sequence of “segments” with different underlying base compositions is considered.
Base compositional structure of genomes.
A significant shift in the style of domain models is suggested, in which the variation of A+T content with position is modeled by a random walk with frequent small steps rather than with large quantum jumps, to reduce the amount of computation in the assembly of large sequences from sequences of randomly chosen fragments.
Sequence and organization of the human mitochondrial genome
The complete sequence of the 16,569-base pair human mitochondrial genome is presented and shows extreme economy in that the genes have none or only a few noncoding bases between them, and in many cases the termination codons are not coded in the DNA but are created post-transcriptionally by polyadenylation of the mRNAs.
The genome of simian virus 40.
The nucleotide sequence of SV40 DNA was determined, and the sequence was correlated with known genes of the virus and with the structure of viral messenger RNAs. There is a limited overlap of the …
Statistical characterization of nucleic acid sequence functional domains.
This report investigated the statistical measures most distinctive of the various domains of the genome, linked them to current understanding insofar as possible, and suggested others.
Giant G+C% mosaic structures of the human genome found by arrangement of GenBank human DNA sequences according to genetic positions.
To determine the overall variation in the G+C% distribution over long ranges of the human genome, DNA sequences of human genes that were closely linked genetically or physically were surveyed from the GenBank Data Bank. Sequences within each group almost always had similar G+C% levels, but those belonging to different groups often had different levels.
Complete sequence of bovine mitochondrial DNA. Conserved features of the mammalian mitochondrial genome.
The bovine 12S and 16S ribosomal RNA genes, when compared with those from human mitochondrial DNA, show conserved features that are consistent with proposed secondary structure models for the ribosomal RNAs.
Sequence and gene organization of mouse mitochondrial DNA
The mouse mitochondrial DNA genome is highly homologous in overall sequence and in gene organization to human mitochondrial DNA, with the descending order of conserved regions being tRNA genes; origin of light-strand replication; rRNA genes; known protein-coding genes; unidentified protein-coding genes; displacement-loop region.
CpG-rich islands and the function of DNA methylation
It is likely that most vertebrate genes are associated with ‘HTF islands’, DNA sequences in which CpG is abundant and non-methylated. Highly tissue-specific genes, though, usually lack islands. The …