Identifying symptoms of recurrent faults in log files of distributed information systems

@article{Reidemeister2010IdentifyingSO,
  title={Identifying symptoms of recurrent faults in log files of distributed information systems},
  author={Thomas Reidemeister and Mohammad Ahmad Munawar and Paul A. S. Ward},
  journal={2010 IEEE Network Operations and Management Symposium - NOMS 2010},
  year={2010},
  pages={187--194}
}
Manually identifying the causes of failure in distributed information systems is difficult and time-consuming. The underlying reason is the large size and complexity of these systems, and the vast amount of monitoring data they generate. Despite its high cost, this manual process is necessary to avoid the detrimental consequences of system downtime. Several studies and operator practice suggest that a large fraction of the failures in these systems are caused by recurrent…