# The Implementation and Testing of Time-Minimal and Resource-Optimal Parallel Reversal Schedules

@inproceedings{Lehmann2002TheIA, title={The Implementation and Testing of Time-Minimal and Resource-Optimal Parallel Reversal Schedules}, author={U. Lehmann and A. Walther}, booktitle={International Conference on Computational Science}, year={2002} }

For computational purposes such as computing adjoints, applying the reverse mode of automatic differentiation, or debugging, one may require the values computed during the evaluation of a function in reverse order. The naive approach is to store all information needed for the reversal and to read it backwards during the reversal. This technique leads to an enormous memory requirement, which is proportional to the computing time. The paper presents an approach to reducing…
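The trade-off described in the abstract can be illustrated with a minimal serial sketch. It contrasts the naive store-all reversal with a simple uniform checkpointing scheme that recomputes each segment from a stored state. This is not the parallel schedule the paper presents; the `step` function, the uniform checkpoint spacing `k`, and all function names are assumptions made for illustration only.

```python
def step(x):
    """Hypothetical forward time step (assumed for illustration)."""
    return 0.5 * x + 1.0


def reverse_store_all(x0, n):
    """Naive reversal: store the input of every step, then read backwards.

    Returns the states in reverse order and the number of stored states,
    which grows linearly with the number of steps n.
    """
    tape = []
    x = x0
    for _ in range(n):
        tape.append(x)  # input state of the current step
        x = step(x)
    return tape[::-1], len(tape)


def reverse_checkpointed(x0, n, k):
    """Uniform checkpointing (an assumed, simple scheme).

    Stores every k-th state during the forward sweep, then recomputes each
    segment from its checkpoint when walking backwards. Returns the states
    in reverse order and the peak number of states held at once.
    """
    checkpoints = {}
    x = x0
    for i in range(n):
        if i % k == 0:
            checkpoints[i] = x
        x = step(x)

    reversed_states = []
    peak = 0
    for start in sorted(checkpoints, reverse=True):
        end = min(start + k, n)
        segment = []
        x = checkpoints[start]
        for _ in range(start, end):  # recompute the segment forward
            segment.append(x)
            x = step(x)
        peak = max(peak, len(checkpoints) + len(segment))
        reversed_states.extend(reversed(segment))
    return reversed_states, peak


naive_states, naive_memory = reverse_store_all(1.0, 100)
ckpt_states, ckpt_memory = reverse_checkpointed(1.0, 100, 10)
```

In this sketch both functions deliver the same states in reverse order, but the checkpointed variant holds far fewer states at once and pays for that with one extra forward sweep over each segment.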

#### 10 Citations

Bounding the Number of Processors and Checkpoints Needed in Time-minimal Parallel Reversal Schedules

- Computer Science
- Computing
- 2004

The structure of such parallel reversal schedules, which use the checkpointing technique on a multi-processor machine, is described, and they are shown to require the least number of processors and memory locations to store checkpoints given a certain number of time steps.

Parallel reversal schedules using more checkpoints than processors

- Computer Science
- 2015

This diploma thesis is an attempt to continue the research by relaxing the central assumption, such that memory for a large number of plain checkpoints can be used with a comparatively small number of processors.

A-revolve: an adaptive memory-reduced procedure for calculating adjoints; with an application to computing adjoints of the instationary Navier–Stokes system

- Mathematics, Computer Science
- Optim. Methods Softw.
- 2005

A low-storage and low-run-time approach for calculating numerical approximations of adjoint equations for the instationary Navier–Stokes equations with adaptive evaluation of the discretization step uses adaptive checkpointing.

Schedules for dynamic bidirectional simulations on parallel computers

- Computer Science
- 2003

The author says he is indebted to Rachel Lichten, John Shaw and Ellen Smith, who helped him bring this thesis into shape, and to his parents for all of the lifelong support and help they have given him.

Adjoint Algorithmic Differentiation Tool Support for Typical Numerical Patterns in Computational Finance

- Computer Science
- 2018

The flexibility and ease of use of C++ algorithmic differentiation (AD) tools based on overloading to numerical patterns (kernels) arising in computational finance are demonstrated.

Algorithmic Differentiation of Numerical Methods : Tangent-Linear and Adjoint Solvers for Systems of Nonlinear Equations

- 2012

We discuss software tool support for the Algorithmic Differentiation (also known as Automatic Differentiation; AD) of numerical simulation programs that contain calls to solvers for parameterized…

Algorithmic Differentiation of Numerical Methods: Tangent and Adjoint Solvers for Parameterized Systems of Nonlinear Equations

- Mathematics, Computer Science
- TOMS
- 2015

The algorithmic formalism is developed building on prior work by other colleagues, and an implementation based on the AD software dco/c++ is presented, which supports the theoretically obtained computational complexity results with practical runtime measurements.

Separating language dependent and independent tasks for the semantic transformation of numerical programs

- Computer Science
- IASTED Conf. on Software Engineering and Applications
- 2004

Adjoint Calculation Using Time-Minimal Program Reversals for Multi-Processor Machines

- Computer Science, Mathematics
- System Modelling and Optimization
- 2001

A new approach to reversing program executions that runs the forward simulation and the reversal process at the same speed; it illustrates the principal structure of time-minimal parallel reversal schedules and quotes the required resources.
