Publications
Partitioning: An Essential Step in Mapping Algorithms Into Systolic Array Processors
TLDR
The simplicity, modularity, and expandability of SAPs make them suitable for VLSI/WSI implementation.
Reducing TLB power requirements
TLDR
It is concluded that for small TLBs (high miss rates), fully-associative TLBs consume less power, but for larger TLBs (low miss rates), set-associative TLBs are better, and the proposed modifications produce significant reductions in power consumption.
Dynamic history-length fitting: a third level of adaptivity for branch prediction
TLDR
A method is introduced that dynamically determines the optimum history length during execution, adapting to the specific requirements of any code, input data and system workload, adding an extra level of adaptivity to two-level adaptive branch predictors.
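As a rough illustration only, the sketch below layers an interval-based history-length re-fit on top of a gshare-style two-level predictor: mispredictions are counted per interval and the number of global-history bits used to index the pattern table is adjusted toward the length that performed best. The table size, interval length, and tuning rule are assumptions for illustration, not the exact mechanism of the paper.

```c
/* Rough sketch only: a gshare-style predictor whose history length is
 * re-fitted every INTERVAL branches.  Table size, interval length and the
 * tuning rule are illustrative assumptions, not the paper's mechanism. */
#include <stdint.h>
#include <stdio.h>

#define PHT_BITS 14
#define PHT_SIZE (1 << PHT_BITS)
#define INTERVAL 16384              /* branches per tuning interval (assumed) */

static uint8_t  pht[PHT_SIZE];      /* 2-bit saturating counters            */
static uint32_t ghist;              /* global branch history register       */
static int      hist_len = 8;       /* history bits currently used          */
static int      best_len = 8, branches;
static long     mispred, best_mispred = -1;

static int predict(uint32_t pc) {
    uint32_t h   = ghist & ((1u << hist_len) - 1);   /* keep hist_len bits   */
    uint32_t idx = (pc ^ h) & (PHT_SIZE - 1);
    return pht[idx] >= 2;                            /* predict taken?       */
}

static void update(uint32_t pc, int taken) {
    uint32_t idx = (pc ^ (ghist & ((1u << hist_len) - 1))) & (PHT_SIZE - 1);
    if (predict(pc) != taken) mispred++;
    if (taken  && pht[idx] < 3) pht[idx]++;
    if (!taken && pht[idx] > 0) pht[idx]--;
    ghist = (ghist << 1) | (uint32_t)(taken != 0);

    if (++branches == INTERVAL) {   /* end of interval: keep the better length */
        if (best_mispred < 0 || mispred < best_mispred) {
            best_mispred = mispred;
            best_len = hist_len;
        }
        /* alternate between the best length so far and a trial neighbour */
        hist_len = (hist_len == best_len) ? (best_len + 1) % (PHT_BITS + 1)
                                          : best_len;
        branches = 0;
        mispred  = 0;
    }
}

int main(void) {
    long miss = 0;                  /* synthetic trace: one branch, period-3 pattern */
    for (int i = 0; i < 200000; i++) {
        int taken = (i % 3) != 0;
        if (predict(0x400123u) != taken) miss++;
        update(0x400123u, taken);
    }
    printf("mispredictions: %ld\n", miss);
    return 0;
}
```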
CC-Radix: a cache conscious sorting based on Radix sort
TLDR
CC-Radix improves data locality by dynamically partitioning the data set into subsets that fit in the L2 cache.
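As a rough illustration of that partitioning step, the sketch below recursively splits 32-bit keys by their most-significant digit until a partition is small enough for an assumed 256 KB L2 cache, then finishes each partition locally. The digit width, cache size, and the use of qsort for the in-cache pass are illustrative assumptions, not the paper's implementation.

```c
/* Rough sketch only: recursively partition 32-bit keys by their most
 * significant digit until a partition fits an assumed 256 KB L2 cache,
 * then finish each partition locally (qsort stands in for the in-cache
 * radix pass).  Constants and helpers are illustrative, not the paper's. */
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define DIGIT_BITS 8
#define BUCKETS    (1 << DIGIT_BITS)
#define L2_BYTES   (256 * 1024)            /* assumed L2 capacity */

static int cmp_u32(const void *a, const void *b) {
    uint32_t x = *(const uint32_t *)a, y = *(const uint32_t *)b;
    return (x > y) - (x < y);
}

/* Sort n keys; 'shift' selects the current digit, most significant first. */
static void cc_radix(uint32_t *keys, size_t n, int shift) {
    if (n * sizeof(uint32_t) <= L2_BYTES || shift < 0) {
        qsort(keys, n, sizeof(uint32_t), cmp_u32);   /* fits in cache: finish here */
        return;
    }
    size_t count[BUCKETS] = {0}, start[BUCKETS], pos[BUCKETS];
    for (size_t i = 0; i < n; i++)
        count[(keys[i] >> shift) & (BUCKETS - 1)]++;
    start[0] = 0;
    for (int b = 1; b < BUCKETS; b++)
        start[b] = start[b - 1] + count[b - 1];
    memcpy(pos, start, sizeof pos);

    uint32_t *tmp = malloc(n * sizeof(uint32_t));    /* scatter into buckets */
    for (size_t i = 0; i < n; i++)
        tmp[pos[(keys[i] >> shift) & (BUCKETS - 1)]++] = keys[i];
    memcpy(keys, tmp, n * sizeof(uint32_t));
    free(tmp);

    for (int b = 0; b < BUCKETS; b++)                /* recurse per bucket   */
        cc_radix(keys + start[b], count[b], shift - DIGIT_BITS);
}

int main(void) {
    enum { N = 1 << 20 };
    uint32_t *a = malloc((size_t)N * sizeof(uint32_t));
    for (size_t i = 0; i < N; i++) a[i] = (uint32_t)rand();
    cc_radix(a, N, 32 - DIGIT_BITS);
    printf("sorted: %d\n", a[0] <= a[N - 1]);
    free(a);
    return 0;
}
```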
Global hyperbolicity is stable in the interval topology
We prove that global hyperbolicity is stable in the interval topology on the spacetime metrics. We also prove that every globally hyperbolic spacetime admits a Cauchy hypersurface which remains
Data caches for superscalar processors
TLDR
The purpose of this study is to examine the data cache bandwidth requirements of high-degree superscalar processors, and investigate alternative cache designs, ranging from classic solutions like multi-banked caches to more complex solutions recently proposed in the literature.
MOB forms: a class of multilevel block algorithms for dense linear algebra operations
TLDR
It is shown that the family the authors call Multilevel Orthogonal Block (MOB) algorithms is optimal and easy to design, and that the multilevel approach produces significant performance improvements.
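The core idea, blocking at more than one level of the memory hierarchy, can be sketched for matrix multiplication as below; the two block sizes and the problem size are illustrative placeholders rather than the MOB parameters derived in the paper.

```c
/* Rough sketch only: two levels of blocking for C += A * B, one block size
 * aimed at the cache and a smaller one aimed at registers, in the spirit of
 * multilevel block (MOB) algorithms.  Sizes are illustrative placeholders. */
#include <stddef.h>

#define N  512      /* problem size (assumed, multiple of B1)    */
#define B1 64       /* outer, cache-level block (multiple of B2) */
#define B2 8        /* inner, register-level block               */

static double A[N][N], B[N][N], C[N][N];

static void mob_gemm(void) {
    for (size_t ii = 0; ii < N; ii += B1)
    for (size_t jj = 0; jj < N; jj += B1)
    for (size_t kk = 0; kk < N; kk += B1)              /* cache-level tiles    */
        for (size_t i = ii; i < ii + B1; i += B2)
        for (size_t j = jj; j < jj + B1; j += B2)
        for (size_t k = kk; k < kk + B1; k += B2)      /* register-level tiles */
            for (size_t x = i; x < i + B2; x++)
            for (size_t y = j; y < j + B2; y++) {
                double acc = C[x][y];
                for (size_t z = k; z < k + B2; z++)
                    acc += A[x][z] * B[z][y];
                C[x][y] = acc;
            }
}

int main(void) {
    A[0][0] = 2.0; B[0][0] = 3.0;
    mob_gemm();
    return C[0][0] == 6.0 ? 0 : 1;   /* single nonzero entry: 2 * 3 = 6 */
}
```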
The Difference-Bit Cache
The difference-bit cache is a two-way set-associative cache with an access time that is smaller than that of a conventional one and close or equal to that of a direct-mapped cache. This is achieved
Improving Performance of Hypermatrix Cholesky Factorization
TLDR
This paper shows how a sparse hypermatrix Cholesky factorization can be improved by means of efficient codes that operate on very small dense matrices, and by reducing the block size when those routines are used.
...