alto: a link-time optimizer for the Compaq Alpha

Robert Muth, Saumya K. Debray, Scott A. Watterson, and Koen De Bosschere
Softw. Pract. Exp.
Traditional optimizing compilers are limited in the scope of their optimizations by the fact that only a single function, or possibly a single module, is available for analysis and optimization. In particular, this means that library routines cannot be optimized to specific calling contexts. Other optimization opportunities, exploiting information not available before link time such as addresses of variables and the final code layout, are often ignored because linkers are traditionally…
An infrastructure for adaptive dynamic optimization
This work provides an interface for building external modules, or clients, for the DynamoRIO dynamic code modification system by restricting optimization units to linear streams of code and using adaptive levels of detail for representing instructions.
Using de-optimization to re-optimize code
This paper presents an extension of the VISTA framework for investigating the effect and potential benefit of performing de-optimization before re-optimizing assembly code, and the design and implementation of algorithms for de-optimization of both loop-invariant code motion and register allocation.
Mojo: A Dynamic Optimization System
This paper describes work that has been accomplished over the past several months at Microsoft Research to design and develop a dynamic software optimization system called Mojo, and presents implementation details for the x86 architecture -- Mojo's initial target.
PLTO: A Link-Time Optimizer for the Intel IA-32 Architecture
This paper describes PLTO, a link-time instrumentation and optimization tool developed for the Intel IA-32 architecture, and how it addresses problems of this architecture and the resulting performance improvements it is able to achieve.
Link-time compaction and optimization of ARM executables
This paper discusses how the peculiarities of the ARM architecture related to its visible program counter can be dealt with and how the introduced overhead can largely be eliminated, and shows how the incorporation of link-time optimization in tool chains may influence library interface design.
Code and data outlining
This dissertation investigates compiler techniques to address the performance problems caused by heterogeneous execution frequency of code in the same function and heterogeneous access patterns of fields in the same data structure, and uses data outlining or reshaping, which splits large data structures into smaller ones, to improve the efficiency of the data cache.
Dynamic native optimization of interpreters
This paper presents an innovative approach that dynamically removes much of the interpreted overhead from language implementations, with minimal instrumentation of the original interpreter.
Speculative alias analysis for executable code
  • Manel Fernández, R. Espasa
  • Computer Science
    Proceedings. International Conference on Parallel Architectures and Compilation Techniques
  • 2002
Experimental results indicate that introducing speculation at analysis time is clearly beneficial: precision increases up to 83% on average, against a baseline precision of 16%, which shows that the technique can be used even for scenarios where speculation recovery is expensive.
Optimizing large applications
The goal of the thesis is to analyse all existing techniques of optimization, evaluate their efficiency and design new solutions based on the link-time optimization platform.
A first look at the interplay of code reordering and configurable caches
This work explores for the first time the interplay of two popular instruction cache optimization techniques: the long-known technique of code reordering and the relatively new technique of cache configuration.


Link-time optimization of address calculation on a 64-bit architecture
This paper uses the link-time code modification system OM to perform program transformations related to global address use on the Alpha AXP, describes the optimizations performed, and shows their effects on program size and performance.
Scalable cross-module optimization
A framework for scalable CMO that provides large gains in performance on applications that contain millions of lines of code and is deployed in Hewlett-Packard's UNIX compiler products and speeds up shipped independent software vendors' applications by as much as 71%.
Simple and effective link-time optimization of Modula-3 programs
Optimization techniques are implemented in mld, a retargetable linker for the MIPS, SPARC, and Intel 486; mld links a machine-independent intermediate code that is suitable for link-time optimization and code generation.
A general approach for run-time specialization and its application to C
This paper describes a general approach to run-time specialization that automatically produces source templates at compile time, and transforms them so that they can be processed by a standard compiler, and is efficient, as shown by the implementation for the C language.
A practical system for intermodule code optimization at link-time
A system that takes a collection of object modules constituting the entire program, and converts the object code into a symbolic Register Transfer Language form that is then transformed by intermodule optimization and finally converted back into object form to explore the problem of code optimization at link-time.
Effectiveness of a machine-level, global optimizer
We present an overview of the design of a machine-code-level, global (intraprocedural) optimizer that supports several front-ends producing code for the Hewlett-Packard Precision Architecture family.
Vortex: an optimizing compiler for object-oriented languages
The Vortex compiler infrastructure, a language-independent optimizing compiler for object-oriented languages with front-ends for Cecil, C++, Java, and Modula-3, is developed, and the results of experiments assessing the effectiveness of different combinations of optimizations on sizable applications across these four languages are reported.
Efficient context-sensitive pointer analysis for C programs
An efficient technique for context-sensitive pointer analysis that is applicable to real C programs and based on a low-level representation of memory locations that safely handles all the features of C.
Interprocedural compilation of Fortran D for MIMD distributed-memory machines
The authors present interprocedural analysis, optimization, and code generation algorithms for Fortran D that limit compilation to only one pass over each procedure, showing that interprocedural optimization is crucial in achieving acceptable performance for a common application.
Interprocedural dataflow analysis in an executable optimizer
The results show that the compact representation allows Spike to compute interprocedural dataflow information in less than 2 seconds for each of the SPEC95 integer benchmarks; even the largest PC application, containing over 1.7 million instructions in 340 thousand basic blocks, requires just 12 seconds.