Machine Learning Methods for Predicting Failures in Hard Drives: A Multiple-Instance Application
The ability to accurately predict an impending hard disk failure is important for reliable storage system design. The facility provided by most hard drive manufacturers, called S.M.A.R.T. (Self-Monitoring, Analysis and Reporting Technology), has been shown by recent research to have poor predictive value. Finding alternatives to S.M.A.R.T. for predicting disk failure is an active area of research. In this paper, we present a rule discovery methodology and show that it is possible to construct decision support systems that detect such failures using information recorded in the AutoSupport database. We demonstrate the effectiveness of our system by evaluating it on disks that were returned to NetApp from the field. Our evaluation shows that the system can be tuned either for a high failure detection rate (i.e., classifying a bad disk as bad) or for a low false alarm rate (i.e., not classifying a good disk as bad). Further, unlike black-box techniques, our rule-based classifier generates rules that are intuitive and easy to understand.