Optimal stop word selection for text mining in critical infrastructure domain

@article{Amarasinghe2015OptimalSW,
  title={Optimal stop word selection for text mining in critical infrastructure domain},
  author={Kasun Amarasinghe and Milos Manic and Ryan C. Hruska},
  journal={2015 Resilience Week (RWS)},
  year={2015},
  pages={1--6}
}

Eliminating all stop words from the feature space is standard preprocessing practice in text mining, regardless of the domain to which it is applied. However, this may result in the loss of important information, which adversely affects the accuracy of the text mining algorithm. Therefore, this paper proposes a novel methodology for selecting the optimal set of domain-specific stop words for improved text mining accuracy. First, the presented methodology retains all the stop words in the text…