Per Larsson-Edefors

Learn More
We propose an accurate architecture-level power estimation method for SRAM memories. This hybrid method is composed of an analytical part for dynamic power estimation and a circuit-simulation backend used to obtain static leakage power values of all basic memory components. The method is flexible in that memory size is an arbitrary parameter. In a(More)
We introduce FlexCore, the first exemplar of an architecture based on the FlexSoC framework. Comprising the same datapath units found in a conventional five-stage pipeline, the FlexCore has an exposed datapath control and a flexible interconnect to allow the datapath to be dynamically reconfigured as a consequence of code generation. Additionally, the(More)
— A twin-precision multiplier that uses reconfigurable power gating is presented. Employing power cutoff techniques in independently controlled power-gating regions, yields significant static leakage reductions when half-precision multiplications are carried out. In comparison to a conventional 8-bit tree multi-plier, the power overhead of a 16-bit(More)
In this paper, we present a new technique which indirectly separates and extracts the total short-circuit power consumption of digital CMOS circuits. We avoid a direct encounter with the complex behavior of the short-circuit currents. Instead, we separate the dynamic power consumption from the total power and extract the total short-circuit power. The(More)
— We investigate the effects of introducing a flexible interconnect into an exposed datapath. We define an exposed datapath as a traditional GPP datapath that has its normal control removed, leading to the exposure of a wide control word. For an FFT benchmark, the introduction of a flexible interconnect reduces the total execution time by 16%. Compared to a(More)
A novel partial-product reduction circuit for use in integer in the reduction tree. However, first we need to select a multiplier to multiplication is presented. The High-Performance Multiplier (HPM) start from, our model multiplier. In principle, we can select any of the reduction tree has the ease of layout of a simple carry-save reduction three(More)
—Level-1 (L1) cache memories are complex circuits that tightly integrate memory, logic, and state machines near the processor datapath. During the design of a processor-based system, many different cache configurations that vary in, for example, size, associativity, and replacement policies, need to be evaluated in order to maximize performance or power(More)