A Preprocessing Scheme for High-Cardinality Categorical Attributes in Classification and Prediction Problems


Categorical data fields characterized by a large number of distinct values represent a serious challenge for many classification and regression algorithms that require numerical inputs. On the other hand, these types of data fields are quite common in real-world data mining applications and often contain potentially relevant information that is difficult to…
DOI: 10.1145/507533.507538
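The abstract above is cut off before it describes the proposed scheme. The sketch below illustrates smoothed target (mean) encoding, one widely used way to turn a high-cardinality categorical field into a single numeric input. Each category is replaced by a blend of its own target mean and the overall target mean, so rarely seen categories shrink toward the prior. The sigmoid weighting, the function names and the parameters `k` and `f` are assumptions made for illustration. They are not details taken from the truncated text.

```python
import math
from collections import defaultdict


def fit_smoothed_target_encoding(categories, targets, k=5.0, f=1.0):
    """Return (mapping, prior) for a smoothed target-mean encoding.

    Assumption for illustration: each category's target mean is blended with
    the global mean using a sigmoid weight lam(n) = 1 / (1 + exp(-(n - k) / f)),
    where n is the category's count. k and f are hypothetical tuning knobs.
    """
    if not targets:
        raise ValueError("need at least one training example")
    prior = sum(targets) / len(targets)  # global target mean (the prior)
    sums = defaultdict(float)
    counts = defaultdict(int)
    for c, y in zip(categories, targets):
        sums[c] += y
        counts[c] += 1
    mapping = {}
    for c, n in counts.items():
        # More observations of a category -> weight closer to 1 -> trust its own mean more.
        lam = 1.0 / (1.0 + math.exp(-(n - k) / f))
        mapping[c] = lam * (sums[c] / n) + (1.0 - lam) * prior
    return mapping, prior


def transform(categories, mapping, prior):
    """Encode values; categories not seen during fitting fall back to the prior."""
    return [mapping.get(c, prior) for c in categories]


if __name__ == "__main__":
    cats = ["a", "a", "a", "b", "c"]
    ys = [1, 1, 0, 1, 0]
    mapping, prior = fit_smoothed_target_encoding(cats, ys, k=2.0, f=1.0)
    print(prior)
    print(transform(["a", "b", "c", "unseen"], mapping, prior))
```

This sketch is fitted on the same rows it is later applied to. In practice, that lets each row's own target leak into its encoding. Computing encodings out-of-fold is a common mitigation. Very small values of `f` can also make `math.exp` overflow for rare categories, so the weighting would need clamping in a more robust version.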


2 Figures and Tables

