# What is the expectation maximization algorithm?

@article{Do2008WhatIT, title={What is the expectation maximization algorithm?}, author={Chuong B. Do and Serafim Batzoglou}, journal={Nature Biotechnology}, year={2008}, volume={26}, pages={897-899} }

The expectation maximization algorithm arises in many computational biology applications that involve probabilistic models. What is it good for, and how does it work?

## 318 Citations

### MapReduce for Bayesian Network Parameter Learning using the EM Algorithm

- Computer Science
- 2012

Details of the MapReduce formulation of EM are presented, speed-ups versus the sequential case are reported, and various Hadoop cluster configurations in experiments with Bayesian networks of different sizes and structures are compared.

### Molecular interaction motifs in a system-wide network context: Computationally charting transient kinase-substrate phosphorylation events

- Biology
- 2016

Molecular interaction motifs are evaluated in a system-wide network context by computationally charting transient kinase-substrate phosphorylation events and showing relationships between these motifs and kinase activity.

### A Genetic Algorithm for Learning Parameters in Bayesian Networks using Expectation Maximization

- Computer Science
- Probabilistic Graphical Models
- 2016

It is shown that GAEM provides significant speed-ups since it tends to select more fit individuals, which converge faster, as parents for the next generation, while producing better log-likelihood scores than the traditional EM algorithm.

### Variants of compound models and their application to citation analysis

- Philosophy
- 2017

A thesis submitted in partial fulfilment of the requirements of the University of Wolverhampton for the degree of Doctor of Philosophy.

### Improving the Performance and Understanding of the Expectation Maximization Algorithm: Evolutionary and Visualization Methods

- Computer Science
- 2016

This work proposes a genetic algorithm for expectation maximization (GAEM) and finds that small population sizes are sufficient to produce high solution quality and considerable speed-up compared to the traditional EM algorithm. It also develops an age-layered EM algorithm (ALEM), which enables comparisons between similarly aged EM runs and discards less promising EM runs well before their convergence.

### EM*: An EM Algorithm for Big Data

- Computer Science
- 2016 IEEE International Conference on Data Science and Advanced Analytics (DSAA)
- 2016

The strategy is to embed EM-T into a non-linear hierarchical data structure (heap) that allows us to separate data that needs to be revisited from data that does not and narrow the iteration toward the data that is more difficult to cluster.

### Belief Revision and the EM Algorithm

- Computer Science
- IPMU
- 2016

This paper provides a natural interpretation of the EM algorithm as a succession of revision steps that try to find a probability distribution in a parametric family of models in agreement with…

### Model selection in biological networks using a graphical EM algorithm

- Computer Science
- Neurocomputing
- 2019

### Using data to build a better EM: EM* for big data

- Computer Science
- International Journal of Data Science and Analytics
- 2017

The strategy is to embed EM-T into a nonlinear hierarchical data structure (heap) that allows us to separate data that needs to be revisited from data that does not and narrow the iteration toward the data that is more difficult to cluster.

### An expectation-maximization algorithm enables accurate ecological modeling using longitudinal microbiome sequencing data

- Biology
- Microbiome
- 2019

BEEM addresses a key bottleneck in “systems analysis” of microbiomes by enabling accurate inference of ecological models from high throughput sequencing data without the need for experimental biomass measurements.

## References

### How does gene expression clustering work?

- Computer Science
- Nature Biotechnology
- 2005

Clustering is often one of the first steps in gene expression analysis. How do clustering algorithms work, which ones should we use and what can we expect from them?

### A modified expectation maximization algorithm for penalized likelihood estimation in emission tomography

- Mathematics
- IEEE Trans. Medical Imaging
- 1995

The new method is a natural extension of the EM for maximizing likelihood with concave priors for emission tomography and convergence proofs are given.

### Testing for linkage disequilibrium in genotypic data using the Expectation-Maximization algorithm

- Biology
- Heredity
- 1996

It is concluded that with highly polymorphic loci, the EM algorithm does lead to a useful test for linkage disequilibrium, but that it is necessary to find the empirical distribution of likelihood ratios in order to perform a test of significance correctly.

### An expectation maximization (EM) algorithm for the identification and characterization of common sites in unaligned biopolymer sequences

- Biology
- Proteins
- 1990

Statistical methodology for the identification and characterization of protein binding sites in a set of unaligned DNA fragments is presented and the final motif is utilized in a search for undiscovered CRP binding sites.

### Hidden Markov models in computational biology. Applications to protein modeling.

- Biology, Computer Science
- Journal of molecular biology
- 1994

The results suggest the presence of an EF-hand calcium binding motif in a highly conserved and evolutionarily preserved putative intracellular region of 155 residues in the alpha-1 subunit of L-type calcium channels, which play an important role in excitation-contraction coupling.

### A statistical model for identifying proteins by tandem mass spectrometry.

- Biology
- Analytical chemistry
- 2003

A statistical model is presented for computing probabilities that proteins are present in a sample on the basis of peptides assigned to tandem mass (MS/MS) spectra acquired from a proteolytic digest of the sample, and it is shown to produce probabilities that are accurate and have high power to discriminate correct from incorrect protein identifications.

### Genome-wide discovery of transcriptional modules from DNA sequence and gene expression

- Biology
- ISMB
- 2003

The EM algorithm is used to identify transcriptional modules (sets of genes that are co-regulated in a set of experiments through a common motif profile). It refines both the module assignment and the motif profile so as to best explain the expression data as a function of transcriptional motifs.

### Maximum-likelihood estimation of molecular haplotype frequencies in a diploid population.

- Biology
- Molecular biology and evolution
- 1995

An expectation-maximization (EM) algorithm leading to maximum-likelihood estimates of molecular haplotype frequencies under the assumption of Hardy-Weinberg proportions is implemented and appears to be useful for the analysis of nuclear DNA sequences or highly variable loci.

### RNA sequence analysis using covariance models.

- Biology
- Nucleic acids research
- 1994

We describe a general approach to several RNA sequence analysis problems using probabilistic models that flexibly describe the secondary structure and primary sequence consensus of an RNA sequence…

### The estimation of gene frequencies in a random-mating population

- Biology
- Annals of human genetics
- 1955

This method is applied to data on blood groups collected from villages near the mouth of the River Po, in northern Italy, in the course of an investigation on microcythaemia, and it is shown to be equivalent to maximum likelihood, and therefore fully efficient in the statistical sense.