Automated Dataset Generation for Training Peer-to-Peer Machine Learning Classifiers
What is the best traffic classification method to date? Under what conditions? Why? Despite a plethora of research devoted to traffic classification and a variety of proposed methods, the research community still does not have definitive answers to these questions, and the task of traffic classification remains unapproachable and confusing for a practitioner. Rigorous comparison of classification methods is challenging for three reasons. First, there is no publicly available payload trace set, so every method is evaluated on a different set of locally collected payload traces. Second, existing approaches use different techniques that track different features, tune different parameters, and use different definitions and categorizations of applications. Third, more often than not, authors do not make their implementation code publicly available once they publish their results. To address these challenges, we have conducted a comprehensive and coherent evaluation of three traffic classification approaches: port-based, behavior-based, and statistical. For each approach we selected a representative tool to test: CoralReef, BLINC, and WEKA, respectively. In this paper we present the results of our comparison, debunk traffic classification myths, identify caveats, and suggest practical tips.