A Preprocessing Scheme for High-Cardinality Categorical Attributes in Classification and Prediction Problems


Categorical data fields characterized by a large number of distinct values represent a serious challenge for many classification and regression algorithms that require numerical inputs. On the other hand, these types of data fields are quite common in real-world data mining applications and often contain potentially relevant information that is difficult to…
DOI: 10.1145/507533.507538


