Is preprocessing of text really worth your time for online comment classification?

@article{Mohammad2018IsPO,
  title={Is preprocessing of text really worth your time for online comment classification?},
  author={Fahim Mohammad},
  journal={CoRR},
  year={2018},
  volume={abs/1806.02908}
}
A large proportion of online comments present on public domains are constructive; however, a significant proportion are toxic in nature. The comments contain a lot of typos, which greatly increase the number of features and make the ML model difficult to train. Considering that data scientists spend approximately 80% of their time collecting, cleaning, and organizing their data [1], we explored how much effort we should invest in the preprocessing (transformation) of raw…
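The claim that typos and surface noise inflate the feature space can be shown with a toy bag-of-words count. The sketch below is illustrative only and is not the paper's pipeline. The four comments are hypothetical, and the normalization steps are assumptions: lowercasing, stripping non-letters, and crudely collapsing runs of three or more repeated characters. Together they show how a few simple transformations shrink the vocabulary.

```python
import re
from collections import Counter

# Hypothetical toy comments (NOT from the paper's dataset) containing
# case variants, elongated words, and punctuation noise.
comments = [
    "You are SO stupid!!!",
    "you are sooo stupid",
    "Thanks, this is GREAT.",
    "thanks this is great",
]

def raw_tokens(text):
    # No preprocessing: split on whitespace only, so "stupid!!!" and
    # "stupid" become two distinct features.
    return text.split()

def normalized_tokens(text):
    # Assumed, illustrative preprocessing (not the author's method):
    text = text.lower()                       # merge case variants
    text = re.sub(r"[^a-z\s]", " ", text)     # replace non-letters with spaces
    text = re.sub(r"(.)\1{2,}", r"\1", text)  # crude: "sooo" -> "so"
    return text.split()

def vocabulary(docs, tokenize):
    # Count every token produced by the given tokenizer across all docs.
    return Counter(tok for doc in docs for tok in tokenize(doc))

raw_vocab = vocabulary(comments, raw_tokens)
clean_vocab = vocabulary(comments, normalized_tokens)
print(len(raw_vocab), len(clean_vocab))  # → 13 8
```

The repeated-character rule is deliberately simple. It would also damage legitimate words with three identical letters in a row, which is one reason the value of each preprocessing step has to be measured rather than assumed.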