# What is the expectation maximization algorithm?

@article{Do2008WhatIT, title={What is the expectation maximization algorithm?}, author={Chuong B. Do and Serafim Batzoglou}, journal={Nature Biotechnology}, year={2008}, volume={26}, pages={897-899} }

The expectation maximization algorithm arises in many computational biology applications that involve probabilistic models. What is it good for, and how does it work?
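
As a rough illustration of how it works, here is a minimal sketch of EM on a toy problem: estimating the biases of two coins when it is not recorded which coin produced each set of tosses. The function name `em_two_coins`, the head counts, the starting guesses, and the equal prior over coins are all illustrative assumptions, not details taken from the text above.

```python
def em_two_coins(head_counts, n_tosses, theta_a, theta_b, n_iter=50):
    """Estimate two coin biases when the coin used for each set is hidden.

    head_counts: number of heads observed in each set of n_tosses flips.
    theta_a, theta_b: initial guesses for P(heads) of coins A and B.
    Assumes (hypothetically) that each set is equally likely to come
    from either coin.
    """
    for _ in range(n_iter):
        # Expected heads/tails credited to each coin under current guesses.
        heads_a = tails_a = heads_b = tails_b = 0.0
        for h in head_counts:
            t = n_tosses - h
            # E-step: likelihood of this set under each coin. The binomial
            # coefficient is the same for both coins and cancels out.
            like_a = theta_a ** h * (1 - theta_a) ** t
            like_b = theta_b ** h * (1 - theta_b) ** t
            w_a = like_a / (like_a + like_b)  # posterior P(coin A | set)
            w_b = 1.0 - w_a
            heads_a += w_a * h
            tails_a += w_a * t
            heads_b += w_b * h
            tails_b += w_b * t
        # M-step: re-estimate each bias from the expected counts.
        theta_a = heads_a / (heads_a + tails_a)
        theta_b = heads_b / (heads_b + tails_b)
    return theta_a, theta_b


# Toy data (assumed): five sets of ten tosses each.
theta_a, theta_b = em_two_coins([5, 9, 8, 4, 7], 10, theta_a=0.6, theta_b=0.5)
print(round(theta_a, 2), round(theta_b, 2))
```

Each pass alternates between softly assigning every set of tosses to a coin (the E-step) and re-estimating the biases from those soft assignments (the M-step). EM only guarantees convergence to a local optimum, so in practice it is common to try several starting guesses.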

## 323 Citations

Molecular interaction motifs in a system-wide network context: Computationally charting transient kinase-substrate phosphorylation events

- Biology
- 2016

Molecular interaction motifs in a system-wide network context are evaluated by computationally charting transient kinase-substrate phosphorylation events and showing relationships between these motifs and kinase activity.

Improving the Performance and Understanding of the Expectation Maximization Algorithm: Evolutionary and Visualization Methods

- Computer Science
- 2016

This work proposes a genetic algorithm for expectation maximization (GAEM), finding that small population sizes are sufficient to produce high solution quality and considerable speed-up compared to the traditional EM algorithm. It also develops an age-layered EM algorithm (ALEM), which enables comparisons between similarly aged EM runs and discards less promising EM runs well before their convergence.

EM*: An EM Algorithm for Big Data

- Computer Science
- 2016 IEEE International Conference on Data Science and Advanced Analytics (DSAA)
- 2016

The strategy is to embed EM-T into a non-linear hierarchical data structure (heap) that allows us to separate data that needs to be revisited from data that does not and narrow the iteration toward the data that is more difficult to cluster.

Belief Revision and the EM Algorithm

- Computer Science
- IPMU
- 2016

This paper provides a natural interpretation of the EM algorithm as a succession of revision steps that try to find a probability distribution in a parametric family of models in agreement with…

Model selection in biological networks using a graphical EM algorithm

- Computer Science
- Neurocomputing
- 2019

Using data to build a better EM: EM* for big data

- Computer Science
- International Journal of Data Science and Analytics
- 2017

The strategy is to embed EM-T into a nonlinear hierarchical data structure (heap) that allows us to separate data that needs to be revisited from data that does not and narrow the iteration toward the data that is more difficult to cluster.

Structure, Variation, and Reproducibility: Bayesian inference in problems arising from the study of RNA and an RNA-binding protein

- Biology
- 2014

A method is presented for assessing the reproducibility of genome-scale studies that make a large number of predictions, using the prediction of ADAR binding sites as a motivating example, alongside structural prediction of RNA that infers the common structural and sequence characteristics of a set of related transcripts.

Normal inverse Gaussian autoregressive model using EM algorithm

- Computer Science
- International Journal of Advances in Engineering Sciences and Applied Mathematics
- 2021

It is shown that the NIG autoregressive model fits the considered financial data very well and hence could be useful in modeling various real-life time-series data.

Clustering Algorithms: Their Application to Gene Expression Data

- Computer Science
- Bioinformatics and biology insights
- 2016

This review examines the various clustering algorithms applicable to gene expression data in order to discover and provide useful knowledge of the appropriate clustering technique that will guarantee stability and a high degree of accuracy in its analysis procedure.

A Probabilistic Framework for Constructing Temporal Relations in Replica Exchange Molecular Trajectories.

- Computer Science
- Journal of chemical theory and computation
- 2018

A probabilistic algorithm, borrowing concepts from graph theory and machine learning, to extract reactive pathways from molecular trajectories in the absence of temporal data is proposed and can be used to analyze trajectories from Monte Carlo sampling techniques and replica exchange molecular dynamics (REMD).

## References

A modified expectation maximization algorithm for penalized likelihood estimation in emission tomography

- Mathematics
- IEEE Trans. Medical Imaging
- 1995

The new method is a natural extension of the EM algorithm for maximizing likelihood with concave priors for emission tomography, and convergence proofs are given.

Testing for linkage disequilibrium in genotypic data using the Expectation-Maximization algorithm

- Biology
- Heredity
- 1996

It is concluded that with highly polymorphic loci, the EM algorithm does lead to a useful test for linkage disequilibrium, but that it is necessary to find the empirical distribution of likelihood ratios in order to perform a test of significance correctly.

An expectation maximization (EM) algorithm for the identification and characterization of common sites in unaligned biopolymer sequences

- Biology
- Proteins
- 1990

Statistical methodology for the identification and characterization of protein binding sites in a set of unaligned DNA fragments is presented and the final motif is utilized in a search for undiscovered CRP binding sites.

Hidden Markov models in computational biology. Applications to protein modeling.

- Biology, Computer Science
- Journal of molecular biology
- 1994

The results suggest the presence of an EF-hand calcium binding motif in a highly conserved and evolutionarily preserved putative intracellular region of 155 residues in the alpha-1 subunit of L-type calcium channels, which play an important role in excitation-contraction coupling.

Genome-wide discovery of transcriptional modules from DNA sequence and gene expression

- Biology
- ISMB
- 2003

The EM algorithm is used to identify transcriptional modules (sets of genes that are co-regulated in a set of experiments through a common motif profile), refining both the module assignment and the motif profile so as to best explain the expression data as a function of transcriptional motifs.

Maximum-likelihood estimation of molecular haplotype frequencies in a diploid population.

- Biology
- Molecular biology and evolution
- 1995

An expectation-maximization (EM) algorithm leading to maximum-likelihood estimates of molecular haplotype frequencies under the assumption of Hardy-Weinberg proportions is implemented and appears to be useful for the analysis of nuclear DNA sequences or highly variable loci.

RNA sequence analysis using covariance models.

- Biology
- Nucleic acids research
- 1994

We describe a general approach to several RNA sequence analysis problems using probabilistic models that flexibly describe the secondary structure and primary sequence consensus of an RNA sequence…

The Estimation of Gene Frequencies in a Random-Mating Population

- Biology
- Annals of human genetics
- 1955

This method is applied to data on blood groups collected from villages near the mouth of the River Po, in northern Italy, in the course of an investigation on microcythaemia, and it is shown to be equivalent to maximum likelihood, and therefore fully efficient in the statistical sense.