What is the expectation maximization algorithm?

Chuong B. Do and Serafim Batzoglou
Nature Biotechnology
The expectation maximization algorithm arises in many computational biology applications that involve probabilistic models. What is it good for, and how does it work? 
Molecular interaction motifs in a system-wide network context: Computationally charting transient kinase-substrate phosphorylation events
Molecular interaction motifs in a system-wide network context are evaluated by computationally charting transient kinase-substrate phosphorylation events and showing relationships between these motifs and kinase activity.
Improving the Performance and Understanding of the Expectation Maximization Algorithm: Evolutionary and Visualization Methods
This work proposes a genetic algorithm for expectation maximization (GAEM) and finds that small population sizes are sufficient to produce high solution quality and considerable speed-up compared to the traditional EM algorithm. It also develops an age-layered EM algorithm, ALEM, which enables comparisons between similarly aged EM runs and discards less promising EM runs well before their convergence.
EM*: An EM Algorithm for Big Data
The strategy is to embed EM-T into a non-linear hierarchical data structure (heap) that allows us to separate data that needs to be revisited from data that does not and narrow the iteration toward the data that is more difficult to cluster.
Belief Revision and the EM Algorithm
This paper provides a natural interpretation of the EM algorithm as a succession of revision steps that try to find a probability distribution in a parametric family of models in agreement with
Using data to build a better EM: EM* for big data
The strategy is to embed EM-T into a nonlinear hierarchical data structure (heap) that allows us to separate data that needs to be revisited from data that does not and narrow the iteration toward the data that is more difficult to cluster.
Structure, Variation, and Reproducibility: Bayesian inference in problems arising from the study of RNA and an RNA-binding protein
A method for assessing the reproducibility of genome-scale studies that make a large number of predictions, using the prediction of ADAR binding sites as a motivating example, and structural prediction of RNA, inferring the common structural and sequence characteristics of a set of related transcripts.
Normal inverse Gaussian autoregressive model using EM algorithm
  • Monika S. Dhull, Arun Kumar • Computer Science • International Journal of Advances in Engineering Sciences and Applied Mathematics • 2021
It is shown that the NIG autoregressive model fits the considered financial data very well and hence could be useful for modeling various real-life time-series data.
Clustering Algorithms: Their Application to Gene Expression Data
This review examines the various clustering algorithms applicable to gene expression data in order to identify the appropriate clustering technique, one that will guarantee stability and a high degree of accuracy in the analysis procedure.
A Probabilistic Framework for Constructing Temporal Relations in Replica Exchange Molecular Trajectories.
A probabilistic algorithm, borrowing concepts from graph theory and machine learning, to extract reactive pathways from molecular trajectories in the absence of temporal data is proposed and can be used to analyze trajectories from Monte Carlo sampling techniques and replica exchange molecular dynamics (REMD).


A modified expectation maximization algorithm for penalized likelihood estimation in emission tomography
  • A. Pierro • Mathematics • IEEE Trans. Medical Imaging • 1995
The new method is a natural extension of the EM algorithm for maximizing likelihood with concave priors for emission tomography, and convergence proofs are given.
Testing for linkage disequilibrium in genotypic data using the Expectation-Maximization algorithm
It is concluded that with highly polymorphic loci, the EM algorithm does lead to a useful test for linkage disequilibrium, but that it is necessary to find the empirical distribution of likelihood ratios in order to perform a test of significance correctly.
An expectation maximization (EM) algorithm for the identification and characterization of common sites in unaligned biopolymer sequences
Statistical methodology for the identification and characterization of protein binding sites in a set of unaligned DNA fragments is presented and the final motif is utilized in a search for undiscovered CRP binding sites.
Hidden Markov models in computational biology. Applications to protein modeling.
The results suggest the presence of an EF-hand calcium binding motif in a highly conserved and evolutionarily preserved putative intracellular region of 155 residues in the alpha-1 subunit of L-type calcium channels, which play an important role in excitation-contraction coupling.
Genome-wide discovery of transcriptional modules from DNA sequence and gene expression
The EM algorithm is used to identify transcriptional modules--sets of genes that are co-regulated in a set of experiments, through a common motif profile, and refines both the module assignment and the motif profile so as to best explain the expression data as a function of transcriptional motifs.
Maximum-likelihood estimation of molecular haplotype frequencies in a diploid population.
An expectation-maximization (EM) algorithm leading to maximum-likelihood estimates of molecular haplotype frequencies under the assumption of Hardy-Weinberg proportions is implemented and appears to be useful for the analysis of nuclear DNA sequences or highly variable loci.
RNA sequence analysis using covariance models.
We describe a general approach to several RNA sequence analysis problems using probabilistic models that flexibly describe the secondary structure and primary sequence consensus of an RNA sequence
This method is applied to data on blood groups collected from villages near the mouth of the River Po, in northern Italy, in the course of an investigation on microcythaemia, and it is shown to be equivalent to maximum likelihood, and therefore fully efficient in the statistical sense.