This paper presents a hierarchical clustering method named RACHET (Recursive Agglomeration of Clustering Hierarchies by Encircling Tactic) for analyzing multi-dimensional distributed data. A typical clustering algorithm requires bringing all the data into a centralized warehouse, which results in O(nd) transmission cost, where n is the number of data points …
System- and application-level failures can be characterized by analyzing relevant log files. The resulting data can then inform numerous studies of, and future developments for, mission-critical and large-scale computational architectures, in fields such as failure prediction, reliability modeling, performance modeling and power awareness …
We address the difficulty involved in obtaining meaningful measurements of I/O performance in HPC applications, as well as the further challenge of understanding the causes of I/O bottlenecks in these applications. The need for I/O optimization is critical given the difficulty in scaling I/O to ever increasing numbers of processing cores. To address this …
This paper presents a novel algorithm for identification and functional characterization of "key" genome features responsible for a particular biochemical process of interest. The central idea is that individual genome features are identified as "key" features if the discrimination accuracy between two classes of genomes with respect to a given biochemical …
We describe a new method for computing a global principal component analysis (PCA) for the purpose of dimension reduction in data distributed across several locations. We assume that a virtual n × p (items × features) data matrix is distributed by blocks of rows (items), where n > p and the distribution among s locations is determined by a given …
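As a rough illustration of how a global PCA can be assembled from row blocks held at separate sites, the sketch below aggregates each site's column sums and p × p Gram matrix, so only p × p quantities (never the n rows themselves) need to travel. This is a generic covariance-aggregation sketch under the row-block assumption stated in the abstract, not necessarily the exact protocol the paper develops:

```python
import numpy as np

def global_pca_from_blocks(blocks):
    """Combine per-site row blocks of a virtual n x p matrix into a global PCA.

    Each site contributes only its column sums and its p x p Gram matrix,
    so communication cost is O(p^2) per site instead of O(n p).
    """
    p = blocks[0].shape[1]
    n = 0
    col_sum = np.zeros(p)
    gram = np.zeros((p, p))
    for X in blocks:            # in practice each term is computed at its own site
        n += X.shape[0]
        col_sum += X.sum(axis=0)
        gram += X.T @ X         # only p x p summaries ever leave a site
    mean = col_sum / n
    # sum_i x_i x_i^T - n * mean mean^T == sum_i (x_i - mean)(x_i - mean)^T
    cov = (gram - n * np.outer(mean, mean)) / (n - 1)
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]   # descending by explained variance
    return eigvals[order], eigvecs[:, order]
```

The aggregated result is algebraically identical to the centralized covariance eigendecomposition, which is what makes the row-block distribution tractable.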
Overview: The tutorial will introduce attendees to high performance computing concepts for dealing with big data using R, particularly on large distributed platforms. We will describe the use of the "programming with big data in R" (pbdR) package ecosystem by presenting several examples of varying complexity. Our packages provide infrastructure to use and …
Systemic pathways-oriented approaches to analysis of metabolic networks are effective for small networks but are computationally infeasible for genome-scale networks. Current computational approaches to this analysis are based on the mathematical principles of convex analysis. The enumeration of a complete set of "systemically independent" …
In order to address anticipated high failure rates, resiliency characteristics have become an urgent priority for next-generation extreme-scale high-performance computing (HPC) systems. This poster describes our past and ongoing efforts in novel fault resilience technologies for HPC. Presented work includes proactive fault resilience techniques, system and …
FastMap is a dimension reduction technique that operates on distances between objects. Although only distances are used, the technique implicitly assumes that the objects are points in a p-dimensional Euclidean space. It selects a sequence of k ≤ p orthogonal axes defined by distant pairs of points (called pivots) and computes the projection of the …
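The pivot-and-project step described above can be sketched as follows, using only a caller-supplied distance function. The `fastmap` helper and its greedy pivot-selection heuristic are illustrative reconstructions of the classic FastMap algorithm, not code from this paper:

```python
import numpy as np

def fastmap(points, dist, k):
    """Embed objects into k dimensions using only pairwise distances.

    `points` is any sequence of objects and `dist(a, b)` a metric on them.
    Each axis is defined by a distant pivot pair; coordinates come from the
    law-of-cosines projection, and later axes use residual distances.
    """
    n = len(points)
    coords = np.zeros((n, k))

    def d2(i, j, dim):
        # squared distance remaining after removing the first `dim` coordinates
        base = dist(points[i], points[j]) ** 2
        return base - np.sum((coords[i, :dim] - coords[j, :dim]) ** 2)

    for dim in range(k):
        # heuristic pivot choice: start anywhere, jump to the farthest
        # object, then to the object farthest from that one
        a = 0
        b = max(range(n), key=lambda j: d2(a, j, dim))
        a = max(range(n), key=lambda j: d2(b, j, dim))
        dab = d2(a, b, dim)
        if dab <= 0:
            break  # remaining distances are (numerically) zero
        for i in range(n):
            coords[i, dim] = (d2(a, i, dim) + dab - d2(b, i, dim)) / (2 * np.sqrt(dab))
    return coords
```

For objects that truly are Euclidean points, the projection reproduces distances along each chosen axis exactly; for general metrics it gives an approximate embedding.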