Efficient algorithms for accurate hierarchical clustering of huge datasets: tackling the entire protein space

@inproceedings{Loewenstein2008EfficientAF,
  title={Efficient algorithms for accurate hierarchical clustering of huge datasets: tackling the entire protein space},
  author={Yaniv Loewenstein and Elon Portugaly and Menachem Fromer and Michal Linial},
  booktitle={ISMB},
  year={2008}
}
MOTIVATION: UPGMA (average linking) is probably the most popular algorithm for hierarchical data clustering, especially in computational biology. However, UPGMA requires the entire dissimilarity matrix in memory. Due to this prohibitive requirement, UPGMA is not scalable to very large datasets.

APPLICATION: We present a novel class of memory-constrained UPGMA (MC-UPGMA) algorithms. Given any practical memory size constraint, this framework guarantees the correct clustering solution without…