Mammoth Data in the Cloud: Clustering Social Images

Judy Qiu and Bingjing Zhang
High Performance Computing Workshop

Social image datasets have grown to dramatic size, with images classified in high-dimensional vector spaces (512 to 2048 dimensions) and potentially billions of images and corresponding classification vectors. We study the challenging problem of clustering such sets into millions of clusters using Iterative MapReduce. We introduce a new Kmeans algorithm in the Map phase that can handle both the large number of clusters and the high dimensionality. Further, we stress that the necessary parallelism of such data…
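For readers unfamiliar with how Kmeans maps onto MapReduce, the following is a minimal sketch of the generic pattern, not the authors' new algorithm. It assumes each map task holds one in-memory partition of points plus a full copy of the current centroids. The map step assigns each local point to its nearest centroid and emits per-centroid partial sums and counts. The reduce step merges these partials into new centroids. All function names (`map_partition`, `reduce_partials`, `kmeans`) are hypothetical, and a plain Python loop stands in for the iterative MapReduce runtime.

```python
from typing import Dict, List, Sequence, Tuple

Vector = List[float]
Partial = Dict[int, Tuple[Vector, int]]  # centroid index -> (vector sum, point count)


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def map_partition(points: List[Vector], centroids: List[Vector]) -> Partial:
    """Map: assign each local point to its nearest centroid and accumulate
    per-centroid partial sums and counts (a local, combiner-style reduction)."""
    partial: Partial = {}
    for p in points:
        nearest = min(range(len(centroids)),
                      key=lambda k: squared_distance(p, centroids[k]))
        if nearest in partial:
            s, c = partial[nearest]
            partial[nearest] = ([a + b for a, b in zip(s, p)], c + 1)
        else:
            partial[nearest] = (list(p), 1)
    return partial


def reduce_partials(partials: List[Partial], centroids: List[Vector]) -> List[Vector]:
    """Reduce: merge partial sums from all map tasks into new centroids.
    A centroid that received no points keeps its previous position."""
    dim = len(centroids[0])
    sums = [[0.0] * dim for _ in centroids]
    counts = [0] * len(centroids)
    for partial in partials:
        for k, (s, c) in partial.items():
            sums[k] = [a + b for a, b in zip(sums[k], s)]
            counts[k] += c
    return [[x / counts[k] for x in sums[k]] if counts[k] else list(centroids[k])
            for k in range(len(centroids))]


def kmeans(partitions: List[List[Vector]], centroids: List[Vector],
           max_iters: int = 20, tol: float = 1e-9) -> List[Vector]:
    """Iterate map and reduce until the largest centroid move is within tol."""
    for _ in range(max_iters):
        partials = [map_partition(part, centroids) for part in partitions]  # map phase
        new_centroids = reduce_partials(partials, centroids)               # reduce phase
        shift = max(squared_distance(a, b) for a, b in zip(centroids, new_centroids))
        centroids = new_centroids
        if shift <= tol:
            break
    return centroids


if __name__ == "__main__":
    # Two partitions, as if held by two separate map tasks.
    partitions = [[[0.0, 0.0], [0.0, 1.0]], [[10.0, 10.0], [10.0, 11.0]]]
    print(kmeans(partitions, [[0.0, 0.0], [10.0, 10.0]]))  # [[0.0, 0.5], [10.0, 10.5]]
```

In an iterative MapReduce runtime, the data partitions would typically stay cached in the map tasks across iterations, and only the centroids would be redistributed each round. Note that this naive sketch ships every centroid to every map task. At the scale the abstract describes, assuming 32-bit floats, one million centroids of 2048 dimensions already occupy about 10^6 × 2048 × 4 bytes ≈ 8.2 GB per iteration. This is why the pattern does not carry over directly to millions of clusters.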