Hui-Fang Wen

Modeling the heart ventricles is one of the most challenging tasks in soft tissue mechanics because cardiac tissue is a strongly anisotropic, incompressible material with an active component of stress. In most current approaches with active force, the number of degrees of freedom (DOF) is limited by the direct method used to solve linear systems of …
We have developed the capability to rapidly simulate cardiac electrophysiological phenomena in a human heart discretised at a resolution comparable with the length of a cardiac myocyte. Previous scientific investigation has generally invoked simplified geometries or coarse-resolution hearts, with simulation duration limited to tens of heartbeats. Using …
On today's massively parallel processing (MPP) supercomputers, it is increasingly important to understand the I/O performance of an application, both to guide scalable application development and to tune its performance. These two critical steps are often enabled by performance analysis tools that obtain performance data on thousands of processors in an MPP system. …
Our productivity-centered performance tuning framework for HPC applications comprises three main components: (1) a versatile graphical user interface for visualizing and analyzing source code, performance metrics, and performance data, (2) a unique source code and binary instrumentation engine, and (3) an array of data collection facilities to gather …
In this paper, we present the architecture design and implementation of a framework for automated performance bottleneck detection. The framework analyzes the time-spent distribution in the application and discovers the performance bottlenecks by using given bottleneck definitions. The user can query the application execution performance to identify …
We have developed a highly efficient and scalable cardiac electrophysiology simulation capability that supports groundbreaking resolution and detail to elucidate the mechanisms of sudden cardiac death from arrhythmia. We can simulate thousands of heartbeats at a resolution of 0.1 mm, comparable to the size of cardiac cells, thereby enabling scientific …
Memory access latency is often a crucial performance limitation in high-performance computing. Prefetching is one of the strategies used by system designers to bridge the processor-memory gap. This paper describes a new list prefetching feature introduced in the IBM Blue Gene/Q supercomputer. The list prefetcher records the L1 cache miss …
Applications on today's massively parallel supercomputers are often guided with performance analysis tools toward scalable performance on thousands of processors. However, conventional tools for parallel performance analysis have serious problems due to the large data volume that needs to be handled. In this paper, we discuss the scalability issue for MPI …
We present an application-level, asynchronous I/O caching and prefetching system that hides the access latency experienced by HPC applications. Our user-controllable caching and prefetching system maintains a file I/O cache in the user space of the application, analyzes the I/O access patterns, prefetches requests, and performs write-back of dirty data to …