Improving the accuracy of dynamic branch prediction using branch correlation

@inproceedings{Pan1992ImprovingTA,
  title={Improving the accuracy of dynamic branch prediction using branch correlation},
  author={Shien-Tai Pan and Kimming So and Joseph T. Rahmeh},
  booktitle={ASPLOS V},
  year={1992}
}
Long branch delay is a well-known problem in today's high-performance superscalar and superpipeline processor designs. A common technique used to alleviate this problem is to predict the direction of branches during instruction fetch. Counter-based branch prediction, in particular, has been reported as an effective scheme for predicting the direction of branches. However, its accuracy is generally limited by branches whose future behavior also depends on the history of other branches… 
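The limitation described in the abstract can be illustrated with a small simulation. This is a minimal sketch, not the paper's design: the class names, the synthetic two-branch trace, the addresses `0x100` and `0x104`, and the 2-bit global history length are all assumptions made for illustration. Branch B always goes the same way as the random branch A that precedes it, so a per-branch counter sees B as unpredictable, while a predictor that also looks at recent outcomes of other branches can learn it.

```python
import random


class TwoBitCounterPredictor:
    """Per-branch 2-bit saturating counters, indexed by branch address only."""

    def __init__(self):
        self.counters = {}  # address -> counter value in 0..3

    def predict(self, addr):
        # Counter values 2 and 3 mean "predict taken"; 2 is the assumed start state.
        return self.counters.get(addr, 2) >= 2

    def update(self, addr, taken):
        c = self.counters.get(addr, 2)
        self.counters[addr] = min(3, c + 1) if taken else max(0, c - 1)


class CorrelatedPredictor:
    """2-bit counters indexed by (address, recent outcomes of all branches)."""

    def __init__(self, history_bits=2):
        self.mask = (1 << history_bits) - 1
        self.history = 0  # global shift register of recent branch outcomes
        self.counters = {}

    def predict(self, addr):
        return self.counters.get((addr, self.history), 2) >= 2

    def update(self, addr, taken):
        key = (addr, self.history)
        c = self.counters.get(key, 2)
        self.counters[key] = min(3, c + 1) if taken else max(0, c - 1)
        # Shift the newest outcome into the global history.
        self.history = ((self.history << 1) | int(taken)) & self.mask


def make_trace(n, seed=0):
    """Hypothetical trace: branch A is random, branch B repeats A's direction."""
    rng = random.Random(seed)
    trace = []
    for _ in range(n):
        a_taken = rng.random() < 0.5
        trace.append((0x100, a_taken))  # branch A
        trace.append((0x104, a_taken))  # branch B, fully correlated with A
    return trace


def accuracy(predictor, trace, addr):
    """Fraction of correct predictions for the branch at `addr`."""
    hits = total = 0
    for branch_addr, taken in trace:
        prediction = predictor.predict(branch_addr)
        if branch_addr == addr:
            total += 1
            hits += prediction == taken
        predictor.update(branch_addr, taken)
    return hits / total


if __name__ == "__main__":
    trace = make_trace(2000)
    print("counter   :", accuracy(TwoBitCounterPredictor(), trace, 0x104))
    print("correlated:", accuracy(CorrelatedPredictor(), trace, 0x104))
```

On this trace the per-branch counter stays near chance for branch B, while the history-indexed predictor mispredicts B only during warm-up. Real programs show far weaker correlation than this contrived case.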

Static correlated branch prediction

This article shows that it is possible to determine automatically an appropriate trade-off between code expansion and branch predictability so that the transformation improves the performance of multiple-issue, deeply pipelined microprocessors like those being built today.

Comparison of Branch History and Branch Correlated Prediction Techniques

Conditional branch instructions are recognized as major impediments for high performance processors. These branches are control dependencies that cause stalls in a processor's execution and keep the

Performance issues in correlated branch prediction schemes

This work finds that the application of profile-driven code layout and branch alignment techniques (without SCBP) can improve the performance of the dynamic correlated branch prediction techniques.

Elastic history buffer: a low-cost method to improve branch prediction accuracy

The Elastic History Buffer is presented, a low-cost yet effective scheme that can exploit the property that each branch instruction may have a different degree of correlation with other branches, while keeping the simple structure of a single global branch history.

Analyzing the working set characteristics of branch execution

  • Sangwook P. Kim, G. Tyson
  • Computer Science
    Proceedings. 31st Annual ACM/IEEE International Symposium on Microarchitecture
  • 1998
A new profile-based conditional branch analysis technique called branch working set analysis is presented to provide additional information about the control flow behavior of general-purpose applications. With it, prediction accuracy is improved by 16%, comparable with the performance of a BHT of infinite capacity.

Novel Branch Prediction Strategy based on Adaptive History Length for High-Performance Microprocessor

In this paper, a new method called Instruction Address alloyed History Length Modification branch predictor is proposed to handle the useless history bits.

Classification-directed branch predictor design

A hybrid branch predictor is constructed which achieves a higher prediction accuracy than any branch predictor previously reported in the literature and significantly reduces the branch execution penalty suffered by wide-issue processors.

Techniques for improving efficiency and accuracy of contemporary dynamic branch predictors

A scalable per-address (SPA) predictor is proposed that leverages value locality in the history of branch outcomes to reduce the size of pattern history table (PHT) by about 50%, while maintaining high prediction accuracy, and alternative designs of SPA predictor are proposed to reduce internal conflicts within the PHT and improve its prediction accuracy.

Compiler synthesized dynamic branch prediction

  • S. Mahlke, B. Natarajan
  • Computer Science, Business
    Proceedings of the 29th Annual IEEE/ACM International Symposium on Microarchitecture. MICRO 29
  • 1996
A novel technique is proposed in which the compiler uses profile feedback to define a prediction function for each branch and inserts a few explicit instructions per branch into the compiled code to compute the prediction function.

Skewed Branch Predictors

Through both analytical and simulation models, it is shown that the skewed branch predictor removes a substantial portion of conflict aliasing by introducing redundancy to the branch-predictor tables, which increases capacity aliasing compared to a standard one-bank structure of comparable size.

References

Correlation-based branch prediction

A correlation-based branch prediction scheme is proposed that also takes into consideration the information provided by the outcomes of other branches; the new scheme is shown to achieve up to 11% additional prediction accuracy.

Comparing Software And Hardware Schemes For Reducing The Cost Of Branches

  • W. Hwu, T. Conte, P. Chang
  • Computer Science
    The 16th Annual International Symposium on Computer Architecture
  • 1989
Three schemes to reduce the cost of branches are presented in the context of a general pipeline model to increase throughput of the instruction fetch, instruction decode, and instruction execution portions of modern computers.

Limits of instruction-level parallelism

The results of simulations of 18 different test programs under 375 different models of available parallelism analysis are presented, showing how simulations based on instruction traces can model techniques at the limits of feasibility and even beyond.

Reducing the cost of branches

A range of schemes for reducing branch cost focusing on both static (compile-time) and dynamic (hardware-assisted) prediction of branches are examined, from quantitative performance and implementation viewpoints.

A superpipeline approach to the MIPS architecture

There are no instruction issue restrictions with the R4000 superpipeline, as there would have been in a superscalar implementation, and various combinations of independent instructions, including ALU (arithmetic logic unit)/ALU and load/load can be executed without any pipeline stalls.

The Gmicro/100 32-bit microprocessor

The prejump mechanism, implemented as a hardware solution for the jump problem, executes benchmark programs 16.8% faster on average, and optimized microinstructions permit bitmap-manipulation instructions to run two to five times faster than the software loops.

Machine Organization of the IBM RISC System/6000 Processor

The IBM RISC System/6000 processor is a second-generation RISC processor which reduces the execution pipeline penalties caused by branch instructions and also provides high floating-point

Two-level adaptive training branch prediction

A new dynamic branch predictor is proposed, the Two-Level Adaptive Training scheme, which alters the branch prediction algorithm on the basis of information collected at run-time, which represents more than a 100 percent improvement in reducing the number of pipeline flushes required.
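
The idea of adapting predictions to run-time information can be sketched in two levels: a per-branch history of recent outcomes, and a table of 2-bit counters selected by that history. The code below is a minimal illustration under assumptions, not the scheme as specified in the cited paper: the class name, the per-address pattern table, the 3-bit history length, the address `0x200`, and the loop-style outcome pattern are all hypothetical.

```python
class TwoLevelPredictor:
    """Sketch of a two-level predictor with per-branch history (assumed layout)."""

    def __init__(self, history_bits=3):
        self.mask = (1 << history_bits) - 1
        self.histories = {}  # level 1: address -> recent outcomes of that branch
        self.patterns = {}   # level 2: (address, history) -> 2-bit counter

    def predict(self, addr):
        history = self.histories.get(addr, 0)
        # Counter values 2 and 3 mean "predict taken"; 2 is the assumed start state.
        return self.patterns.get((addr, history), 2) >= 2

    def update(self, addr, taken):
        history = self.histories.get(addr, 0)
        counter = self.patterns.get((addr, history), 2)
        self.patterns[(addr, history)] = (
            min(3, counter + 1) if taken else max(0, counter - 1)
        )
        # Record the outcome in this branch's own history register.
        self.histories[addr] = ((history << 1) | int(taken)) & self.mask


def run(predictor, outcomes, addr=0x200):
    """Return the fraction of outcomes predicted correctly for one branch."""
    hits = 0
    for taken in outcomes:
        hits += predictor.predict(addr) == taken
        predictor.update(addr, taken)
    return hits / len(outcomes)


# Hypothetical loop branch: taken three times, then not taken, repeated.
loop_branch = [True, True, True, False] * 250

if __name__ == "__main__":
    # history_bits=0 degenerates to a plain per-branch 2-bit counter.
    print("2-bit counter:", run(TwoLevelPredictor(history_bits=0), loop_branch))
    print("two-level    :", run(TwoLevelPredictor(history_bits=3), loop_branch))
```

With zero history bits the predictor mispredicts every loop exit, giving 0.75 on this pattern. With three history bits each position in the pattern maps to a distinct history, so only the first loop exit is mispredicted.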