Per Larsson-Edefors

We introduce FlexCore, the first exemplar of an architecture based on the FlexSoC framework. Comprising the same datapath units found in a conventional five-stage pipeline, the FlexCore has an exposed datapath control and a flexible interconnect to allow the datapath to be dynamically reconfigured as a consequence of code generation. Additionally, the ...
A twin-precision multiplier that uses reconfigurable power gating is presented. Employing power cutoff techniques in independently controlled power-gating regions yields significant static leakage reductions when half-precision multiplications are carried out. In comparison to a conventional 8-bit tree multiplier, the power overhead of a 16-bit ...
A novel partial-product reduction circuit for use in integer multiplication is presented. The High-Performance Multiplier (HPM) reduction tree has the ease of layout of a simple carry-save reduction ... in the reduction tree. However, first we need to select a multiplier to start from, our model multiplier. In principle, we can select any of the three ...
We propose an accurate architecture-level power estimation method for SRAM memories. This hybrid method is composed of an analytical part for dynamic power estimation and a circuit-simulation backend used to obtain static leakage power values of all basic memory components. The method is flexible in that memory size is an arbitrary parameter. In a ...
In this paper, we present a new technique which indirectly separates and extracts the total short-circuit power consumption of digital CMOS circuits. We avoid a direct encounter with the complex behavior of the short-circuit currents. Instead, we separate the dynamic power consumption from the total power and extract the total short-circuit power. The ...
We investigate the effects of introducing a flexible interconnect into an exposed datapath. We define an exposed datapath as a traditional GPP datapath that has its normal control removed, leading to the exposure of a wide control word. For an FFT benchmark, the introduction of a flexible interconnect reduces the total execution time by 16%. Compared to a ...
A high-speed low-power cross-correlator ASIC has been implemented in a 65-nm CMOS process for the purpose of synthetic aperture radiometry from geostationary orbiting earth observation satellites. The chip performs cross-correlation on all individual signal pairs from 64 digital 1-bit inputs, which amounts to 2016 individual cross-correlation products. The ...
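The figure of 2016 products in the cross-correlator abstract follows from counting unordered pairs of 64 inputs: 64 × 63 / 2 = 2016. The Python sketch below checks that count and shows one plausible way to correlate 1-bit streams, by counting the samples on which two streams agree. The function names and the agreement-count formulation are illustrative assumptions; the abstract does not describe the arithmetic the chip actually uses.

```python
from itertools import combinations
from math import comb


def pair_count(num_inputs):
    """Number of unordered signal pairs among num_inputs inputs."""
    return comb(num_inputs, 2)


def correlate_all_pairs(signals):
    """Correlate every unordered pair of 1-bit sample streams.

    signals: list of equal-length lists holding 0/1 samples.
    Returns a dict mapping (i, j) with i < j to the number of
    sample positions where stream i and stream j agree.
    Assumption: agreement counting (XNOR and accumulate) stands in
    for whatever correlation arithmetic the real ASIC implements.
    """
    products = {}
    for (i, a), (j, b) in combinations(enumerate(signals), 2):
        products[(i, j)] = sum(1 for x, y in zip(a, b) if x == y)
    return products


print(pair_count(64))  # 2016
```

Calling `correlate_all_pairs` on 64 streams returns a dictionary with one entry per pair, so its length matches `pair_count(64)`.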
The modified-Booth algorithm is extensively used for high-speed multiplier circuits. Once, when array multipliers were used, the reduced number of generated partial products significantly improved multiplier performance. In designs based on reduction trees with logarithmic logic depth, however, the reduced number of partial products has a limited impact ...
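To make the "reduced number of partial products" concrete, here is a small Python sketch of textbook radix-4 (modified-Booth) recoding. It is a software model only, not the circuit studied in the paper, and the function names are hypothetical. An n-bit two's-complement multiplier is recoded into n/2 digits in the range -2 to 2, so a 16-bit multiplication needs 8 partial products instead of 16.

```python
def booth_radix4_digits(multiplier, bits):
    """Recode a two's-complement multiplier into radix-4 Booth digits.

    Returns bits // 2 digits, least significant first, each in
    {-2, -1, 0, 1, 2}, using digit_k = y[2k] + y[2k-1] - 2*y[2k+1]
    with y[-1] taken as 0.
    """
    if bits % 2 != 0:
        raise ValueError("bits must be even for radix-4 recoding")
    y = multiplier & ((1 << bits) - 1)  # keep the low `bits` bits
    digits = []
    prev = 0  # y[-1]
    for i in range(0, bits, 2):
        low = (y >> i) & 1
        high = (y >> (i + 1)) & 1
        digits.append(low + prev - 2 * high)
        prev = high
    return digits


def booth_multiply(multiplicand, multiplier, bits):
    """Sum one partial product per Booth digit (weights 4**k)."""
    digits = booth_radix4_digits(multiplier, bits)
    return sum(d * multiplicand * (4 ** k) for k, d in enumerate(digits))


print(len(booth_radix4_digits(12345, 16)))  # 8
print(booth_multiply(13, -7, 8))            # -91
```

In an array multiplier each partial product adds a row to the critical path, so halving the rows matters a great deal. In a logarithmic-depth reduction tree, halving the rows removes only about one level, which is consistent with the limited impact the abstract describes.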