Learn More
Interpreters designed for high general-purpose performance typically perform a large number of indirect branches (3.2%–13% of all executed instructions in our benchmarks). These branches consume more than half of the run-time in a number of configurations we simulated. We evaluate how accurate various existing and proposed branch prediction schemes are on a(More)
Romer et al (ASPLOS 96) examined several interpreters and concluded that they behave much like general purpose integer programs such as gcc. We show that there is an important class of interpreters which behave very differently. Efficient virtual machine interpreters perform a large number of indirect branches (3.2%–13% of all executed instructions in our(More)
Sparse matrix by vector multiplication (SMV) is a key operation of many scientific and engineering applications. Field programmable gate arrays (FPGAs) have the potential to significantly improve the performance of computationally intensive applications which are dominated by SMV. A shortcoming of most existing FPGA SMV implementations is that they use(More)
JIT compilers produce fast code, whereas interpreters are easy to port between architectures. We propose to combine the advantages of these language implementation techniques as follows: we generate native code by concatenating and patching machine code fragments taken from interpreter-derived code (generated by a C compiler); we completely eliminate the(More)
In a virtual machine interpreter, the code for each virtual machine instruction has similarities to code for other instructions. We present an interpreter generator that takes simple virtual machine instruction descriptions as input and generates C code for processing the instructions in several ways: execution, virtual machine code generation, disassembly,(More)
The recent development of large FPGAs along with the availability of a variety of floating point cores have made it possible to implement high-performance matrix and vector kernel operations on FPGAs. In this paper we seek to evaluate the performance of FPGAs for real scientific computations by implementing Lattice QCD, one of the classic scientific(More)
Sorting is one of the most important and well-studied problems in computer science. Many good algorithms are known which offer various trade-offs in efficiency, simplicity, memory use, and other factors. However, these algorithms do not take into account features of modern computer architectures that significantly influence performance. Caches and branch(More)