Register allocation for free: The C machine stack cache

  • David R. Ditzel, Hubert R. McLellan
  • ASPLOS I
The Bell Labs C Machine project is investigating computer architectures to support the C programming language [1]. One of the goals is to match an efficient architecture to the language and the compiler technology available. Measurements of different C programs show that roughly one out of every twenty instructions executed is either a procedure call or return [2]. Procedure call overhead is therefore a very important consideration in the overall machine design. A second and related area of primary…

Flexible register management for sequential programs

  • D. Quammen, D. Miller
  • Computer Science
    [1991] Proceedings. The 18th Annual International Symposium on Computer Architecture
  • 1991
In this paper, a register hardware organization called threaded windows or t-windows, which is being developed by the authors to enhance the performance of concurrent systems, is evaluated for sequential programs.

Exploiting large register sets

Compiler and microarchitecture mechanisms for exploiting registers to improve memory performance

This dissertation introduces a new compiler optimization called speculative register promotion and a new hardware structure called the store load address table to address the growing gap between memory and processor speed and the large number of memory operations present in typical programs.

A simple interprocedural register allocation algorithm and its effectiveness for LISP

The evaluation considers the scheme's limitations and compares these “software register windows” against the hardware register windows used in the Berkeley RISC and SPUR processors.

Implementation of Stack-Based Languages on Register Machines

The basic optimizations explored in this thesis are: Caching the frequently accessed top-of-stack items in registers reduces stack access overhead, and combining stack-pointer updates eliminates most of them.

Design and Applications of a Virtual Context Architecture

A new register-file architecture is proposed that virtualizes logical register contexts, achieving a 10% increase in performance over the baseline architecture even with fewer physical than logical registers while also reducing data cache bandwidth.

Virtualizing register context

This dissertation introduces the virtual context architecture, a new architecture that virtualizes logical register contexts to support both register windows and simultaneous multithreading without increasing the size of the register file, increasing performance by 50% over a single thread and 30% over a conventional multithreaded architecture.

Spills, Fills, and Kills: An Architecture for Reducing Register-Memory Traffic

This work removes compiler memory references by augmenting a conventional architecture with a spill name space and separate spill, fill, and kill instructions to access this space, facilitating compiler name generation while allowing the hardware to take advantage of two key properties of compiler references.

Stack caching for interpreters

  • M. Ertl
  • Computer Science
    PLDI '95
  • 1995
This paper explores two methods to reduce this overhead for virtual stack machines by caching top-of-stack values in (real machine) registers by using a dynamic or a static method.

The store-load address table and speculative register promotion

  • M. Postiff, D. Greene, T. Mudge
  • Computer Science
    Proceedings 33rd Annual IEEE/ACM International Symposium on Microarchitecture. MICRO-33 2000
  • 2000
A new hardware structure, the store-load address table (SLAT), watches both load and store instructions to see whether they conflict with entries loaded into the SLAT by explicit software mapping instructions, allowing values to be promoted to registers when they cannot be proven promotable by conventional compiler analysis.



A 32-bit processor design

This paper describes a user-level instruction set for a 32-bit processor which seems exceptionally attractive to at least one software person (the author).

How to Use 1000 Registers

A spectrum of ways to exploit more registers in an architecture is discussed, ranging from programmer-managed cache (large numbers of explicitly addressed registers, as in the Cray-1) to better schemes for automatically managed cache.

The C language calling sequence

This document sets forth the issues involved in designing a calling sequence for the C language, and discusses experience with various environments, and presents some sample designs.

An architecture with many operand registers to efficiently execute block-structured languages

Simulation statistics for a machine with many registers and a conventional architecture indicate that the average operand access time and the required memory bandwidth of conventional machines can be significantly reduced.

Design of a user-microprogrammable building block

A user-microprogrammable computer has been developed for use as a building block in general-purpose and dedicated computer systems. The architecture is designed to be easily microprogrammed and

The 801 minicomputer

An overview of an experimental system developed at the IBM T. J. Watson Research Center that consists of a running hardware prototype, a control program and an optimizing compiler, which features a primitive instruction set which can be completely hard-wired.

A Reduced Instruction Set VLSI Computer

  • Proceedings of 8th Symposium on Computer Architecture, pp. 443-457
  • 1981

Measurements of C Program Stack Depth

  • Unpublished Memorandum
  • 1981