Massively Parallel Unsupervised Feature Selection on Spark

Bruno Ordozgoiti Rubio, Sandra Gómez Canaval, Alberto Mozo
High-dimensional data sets pose important challenges, such as the curse of dimensionality and increased computational costs. Dimensionality reduction is therefore a crucial step in most data mining applications, and feature selection techniques allow us to achieve this reduction. However, it is now common to deal with huge data sets, and most existing feature selection algorithms are designed to run in a centralized fashion, which makes them non-scalable. Moreover, some of them require…

