A New Open Resource Management Architecture in the Sun HPC ClusterTools™ Environment

Abstract

This article presents a new architecture for the integration of the Sun HPC ClusterTools™ parallel computing environment with distributed resource management systems such as the Sun™ Grid Engine system. This new architecture achieves a tight integration with multiple distributed resource management systems in a uniform and extensible framework, which means that any of the popular management systems may be used to launch and monitor Sun™ MPI parallel jobs.
Unlike previously available loose integrations, tight integrations allow a resource manager (RM) to:

- Accurately measure resources used by the parallel processes
- Terminate jobs that exceed resource limits
- Generate accurate accounting information for multiprocess jobs

We have implemented tight integrations with Sun Grid Engine software, PBS from Veridian Systems, and LSF from Platform Computing. We provide examples showing correct resource accounting, ease of use to launch and debug Sun MPI jobs under these systems, and the improvements in behavior that result from the tight integration.
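The three capabilities of a tight integration named above can be illustrated with a small simulation. This is only a sketch under stated assumptions: the class and method names (`Process`, `TightlyIntegratedJob`, `spawn`, `enforce_limit`, `accounting_record`) are hypothetical and are not taken from ClusterTools, Sun Grid Engine, PBS, or LSF, and no real processes are launched. The one idea it models is that the resource manager sees every process of a parallel job, so usage, limits, and accounting cover the whole job rather than a single launcher process.

```python
from dataclasses import dataclass, field


@dataclass
class Process:
    """One simulated process of a parallel job (hypothetical model)."""
    rank: int
    cpu_seconds: float = 0.0
    alive: bool = True


@dataclass
class TightlyIntegratedJob:
    """Hypothetical RM-side view of a job whose processes are all registered."""
    job_id: str
    cpu_limit: float  # CPU seconds allowed for the job as a whole (assumed policy)
    processes: list = field(default_factory=list)

    def spawn(self, rank):
        # Assumption: every process is started through the RM, so it is tracked.
        proc = Process(rank)
        self.processes.append(proc)
        return proc

    def total_cpu(self):
        # Measure resources across all parallel processes, not just one.
        return sum(p.cpu_seconds for p in self.processes)

    def enforce_limit(self):
        # Terminate every process if the job as a whole exceeds its limit.
        if self.total_cpu() > self.cpu_limit:
            for p in self.processes:
                p.alive = False
            return True
        return False

    def accounting_record(self):
        # Accounting entry that covers the multiprocess job as one unit.
        return {
            "job_id": self.job_id,
            "nprocs": len(self.processes),
            "cpu_seconds": self.total_cpu(),
        }


# Simulated four-process job; each process reports 3 CPU seconds of use.
job = TightlyIntegratedJob(job_id="job-1", cpu_limit=10.0)
for rank in range(4):
    job.spawn(rank).cpu_seconds = 3.0

terminated = job.enforce_limit()
record = job.accounting_record()
print(terminated, record)
```

In a loose integration, by contrast, the resource manager would typically track only the process it started itself, so the per-job total computed in `total_cpu` would not be available to it.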

Cite this paper

@inproceedings{Sistare1997ANO, title={A New Open Resource Management Architecture in the Sun HPC ClusterToolsTM Environment}, author={Steve Sistare}, year={1997} }