Bogdan Spinean

In this paper we extend a custom FFT vector architecture by adding multiple-lane capabilities and study its hardware implementation. We use the six-step algorithm to segment a long transform of size N = Z × L into L smaller transforms of size Z. We split the data into pairs of vector registers (for the real and imaginary parts), each containing Z elements.
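The six-step decomposition mentioned in this abstract can be illustrated in software. The following is only a minimal NumPy sketch of the textbook six-step FFT, not the paper's vector architecture: the function name `six_step_fft` is hypothetical, the small transforms are delegated to `numpy.fft.fft`, and mapping Z to the first-stage transform length (so that step 2 performs L transforms of size Z) is an assumption based on the abstract's notation.

```python
import numpy as np

def six_step_fft(x, Z, L):
    """Textbook six-step FFT of length N = Z * L (illustrative sketch only)."""
    x = np.asarray(x, dtype=complex)
    N = Z * L
    if x.size != N:
        raise ValueError("input length must equal Z * L")

    # Step 1: view the input as a Z x L matrix and transpose it to L x Z.
    a = x.reshape(Z, L).T
    # Step 2: L independent FFTs of size Z (one per row).
    a = np.fft.fft(a, axis=1)
    # Step 3: multiply by the twiddle factors exp(-2*pi*i * n2 * k1 / N).
    n2 = np.arange(L).reshape(L, 1)
    k1 = np.arange(Z).reshape(1, Z)
    a = a * np.exp(-2j * np.pi * n2 * k1 / N)
    # Step 4: transpose to Z x L.
    a = a.T
    # Step 5: Z independent FFTs of size L (one per row).
    a = np.fft.fft(a, axis=1)
    # Step 6: transpose back and flatten to get the output in natural order.
    return a.T.reshape(N)
```

For any input whose length is Z * L, the result should agree with `np.fft.fft(x)` up to floating-point rounding, which is a convenient way to check the sketch.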
Processors and memory systems suffer from a growing performance gap between them. Each technology generation increases on-chip performance capabilities; however, memory bandwidth increases at a much slower pace. Therefore, overall performance improvements are constrained by the available memory bandwidth. In this paper, we address the memory bandwidth…
In many-core systems with shared DRAM memory, a clear performance imbalance exists between the requirements of the processors and the bandwidth that the memory system can provide. Very often the utilization of the memory interface is poor even for well-understood and regular workloads. In this paper we propose a method to reorder the in-flight requests by…
This paper presents a methodology for synthesizing customized vector ISAs for various application domains targeting high-performance execution. A number of applications from the telecommunication and linear algebra domains have been studied, and custom vector instruction sets have been synthesized. Three algorithms that compute the shortest paths in a…
In this article, we analyze the speedup potential of media and signal processing software on vector processors. We evaluate the performance impact of several design decisions, such as the vector register length, memory latency, memory bandwidth, and the number of parallel lanes in the datapath. To quantify the influence of the aforementioned design…