Tolerating Branch Predictor Latency on SMT

Abstract

Simultaneous Multithreading (SMT) tolerates latency by executing instructions from multiple threads. If one thread is stalled, its resources can be used by other threads. However, fetch stalls caused by multi-cycle branch predictors prevent SMT from achieving its full potential performance, because the flow of fetched instructions is halted. This paper proposes and evaluates solutions for dealing with branch predictor delay on SMT. Our contribution is twofold: we describe a decoupled implementation of the SMT fetch unit, and we propose an interthread pipelined branch predictor implementation. These techniques prove to be effective for tolerating the branch predictor access latency.

Keywords: SMT, branch predictor delay, decoupled fetch, predictor pipelining.
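The intuition behind interleaving predictions from several threads in a pipelined predictor can be illustrated with a toy throughput model. This is only a sketch under stated assumptions, not the paper's simulator or its actual design: the function name `predictions_started`, the round-robin thread selection, and the rule that a thread must wait `latency` cycles for its previous prediction before requesting the next one are all hypothetical simplifications.

```python
def predictions_started(num_threads: int, latency: int, cycles: int) -> int:
    """Count predictions started in `cycles` cycles by a toy pipelined predictor.

    Assumptions (not taken from the paper):
    - the predictor accepts at most one new request per cycle;
    - a thread cannot request a new prediction until its previous one
      has finished, `latency` cycles after it started;
    - ready threads are chosen in round-robin order.
    """
    ready_at = [0] * num_threads  # first cycle each thread may start again
    next_thread = 0               # round-robin pointer
    started = 0
    for cycle in range(cycles):
        for offset in range(num_threads):
            t = (next_thread + offset) % num_threads
            if ready_at[t] <= cycle:
                ready_at[t] = cycle + latency
                next_thread = (t + 1) % num_threads
                started += 1
                break  # only one request enters the predictor per cycle
    return started


if __name__ == "__main__":
    for threads in (1, 2, 4):
        count = predictions_started(threads, latency=2, cycles=100)
        print(f"{threads} thread(s), 2-cycle predictor: {count} predictions")
```

In this model a single thread with a 2-cycle predictor starts a prediction only every other cycle, while two or more threads keep the predictor busy every cycle. Real fetch behavior depends on many factors the model ignores, such as mispredictions, fetch policy, and cache misses.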

DOI: 10.1007/978-3-540-39707-6_7

Cite this paper

@inproceedings{Falcn2003ToleratingBP,
  title={Tolerating Branch Predictor Latency on SMT},
  author={Ayose Falc{\'o}n and Oliverio J. Santana and Alex Ram{\'i}rez and Mateo Valero},
  booktitle={ISHPC},
  year={2003}
}