Enabling comprehensive data-driven system management for large computational facilities

@article{Browne2013EnablingCD,
  title={Enabling comprehensive data-driven system management for large computational facilities},
  author={James C. Browne and Robert L. DeLeon and Charng-Da Lu and Matthew D. Jones and Steven M. Gallo and Amin Ghadersohi and Abani K. Patra and William L. Barth and John Hammond and Thomas R. Furlani and Robert T. McLay},
  journal={2013 SC - International Conference for High Performance Computing, Networking, Storage and Analysis (SC)},
  year={2013},
  pages={1--11}
}
This paper presents a tool chain, based on the open source tool TACC_Stats, for systematic and comprehensive job level resource use measurement for large cluster computers, and its incorporation into XDMoD, a reporting and analytics framework for resource management that targets meeting the information needs of users, application developers, systems administrators, systems management and funding managers. Accounting, scheduler and event logs are integrated with system performance data from…

