Sayantan Chakravorty

Fault tolerance is an important issue for large machines with tens or hundreds of thousands of processors. Checkpoint-based methods, currently used on most machines, roll back all processors to previous checkpoints after a crash. This wastes a significant amount of computation, as all processors have to redo all the computation from that checkpoint onwards. …
Failures are likely to be more frequent in systems with thousands of processors. Therefore, schemes for dealing with faults become increasingly important. In this paper, we present a fault tolerance solution for parallel applications that proactively migrates execution from processors where failure is imminent. Our approach assumes that some failures are …
Summary form only given. As parallel machines grow larger, the mean time between failures shrinks. With the planned machines of the near future, therefore, fault tolerance will become an important issue. The traditional method of dealing with faults is to checkpoint the entire application periodically and to start from the last checkpoint. However, such a …
Unstructured meshes are used in many engineering applications with irregular domains, from elastic deformation problems to crack propagation to fluid flow. Because of their complexity and dynamic behavior, the development of scalable parallel software for these applications is challenging. The Charm++ Parallel Framework for Unstructured Meshes allows one …
High-performance systems with thousands of processors have been introduced in the recent past, and systems with hundreds of thousands of processors should become available in the near future. Since failures are likely to be frequent in such systems, schemes for dealing with faults are important. In this paper, we introduce a new fault tolerance solution for …
Finite element simulations of dynamic fracture problems usually require very fine discretizations in the vicinity of the propagating stress waves and advancing crack fronts, while coarser meshes can be used in the remainder of the domain. This need for a constantly evolving discretization poses several challenges, especially when the simulation is performed …
In this work we present a methodology for intelligent path planning in an uncertain environment. Examples include a mobile robot exploring an unknown terrain or a UAV navigating enemy territory while avoiding radar detection. We show that the problem of path planning in an uncertain environment, under certain assumptions, can be posed as the adaptive …
A novel multi-resolution algorithm is presented to solve the Fokker-Planck equation (FPE) for general N-dimensional nonlinear systems while addressing the "curse of dimensionality". Numerical aspects of the extension of the proposed approach to high-dimensional systems are discussed for the stationary FPE. The algorithm is validated against and compared with …
Traditional full-featured operating systems are known to have properties that limit the scalability of distributed-memory parallel programs, the most common programming paradigm used in high-end computing. Furthermore, as processor counts increase with the most capable systems, the necessary activity to manage the system becomes more of a burden. To …
The Finite Element Method framework allows the user to develop scalable parallel finite element applications easily. During initialization it reads in an input mesh and partitions it into a large number of chunks that are distributed among different processors. This partitioning process is sequential and memory intensive. Thus the partition algorithm is a …