Tinoosh Mohsenin

A 167-processor computational platform consists of an array of simple programmable processors capable of per-processor dynamic supply voltage and clock frequency scaling, three algorithm-specific processors, and three 16 KB shared memories, and is implemented in 65 nm CMOS. All processors and shared memories are clocked by local fully independent, ...
A low-complexity message-passing algorithm, called Split-Row Threshold, is used to implement LDPC decoders with reduced layout routing congestion. Five LDPC decoders compatible with the 10GBASE-T standard are implemented using MinSum Normalized and MinSum Split-Row Threshold algorithms. All decoders are built using a standard cell design flow and include ...
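As a rough illustration of the MinSum Normalized algorithm named above, the following Python sketch shows the standard check-node update: each outgoing message takes the product of the signs and the minimum of the magnitudes of all other incoming messages, scaled by a normalization factor. The function name and the default `scale` value are assumptions chosen for illustration, not values from the paper, and the Split-Row Threshold partitioning itself is not modeled.

```python
def minsum_normalized_check_update(msgs, scale=0.75):
    """Normalized min-sum check-node update (illustrative sketch).

    msgs  : incoming variable-to-check messages (LLRs), at least two.
    scale : normalization factor; 0.75 is an assumed example value.
    """
    if len(msgs) < 2:
        raise ValueError("a check node needs at least two inputs")

    mags = [abs(m) for m in msgs]
    # Only the smallest and second-smallest magnitudes are ever needed.
    order = sorted(range(len(msgs)), key=lambda i: mags[i])
    min1_idx, min1, min2 = order[0], mags[order[0]], mags[order[1]]

    # Product of all input signs (+1 or -1).
    sign_product = 1
    for m in msgs:
        if m < 0:
            sign_product = -sign_product

    out = []
    for i, m in enumerate(msgs):
        # Exclude input i: use min2 if i holds the overall minimum.
        mag = min2 if i == min1_idx else min1
        # Multiplying by the input's own sign removes it from the product.
        own_sign = -1 if m < 0 else 1
        out.append(scale * sign_product * own_sign * mag)
    return out
```

Tracking only the two smallest magnitudes is what keeps the check-node logic small in hardware, since every output can be formed from those two values and the sign product.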
This paper presents a high-performance architecture for the reconstruction of compressively sampled signals using the Orthogonal Matching Pursuit (OMP) algorithm. A QR decomposition (QRD) process is used for the matrix inverse core, and a new algorithm for finding the fast inverse square root of a fixed-point number is also implemented to support the QRD process. The ...
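For readers unfamiliar with OMP, a minimal software sketch of the textbook algorithm follows. It uses NumPy's floating-point QR factorization for the least-squares step; this is an assumption made for illustration and does not reproduce the paper's fixed-point hardware architecture or its inverse square root method. The names `omp`, `A`, `y`, and `sparsity` are hypothetical.

```python
import numpy as np

def omp(A, y, sparsity):
    """Greedy Orthogonal Matching Pursuit (illustrative sketch).

    Finds x with at most `sparsity` nonzeros such that y is close to A @ x.
    """
    n = A.shape[1]
    y = np.asarray(y, dtype=float)
    residual = y.copy()
    support = []
    coef = np.zeros(0)

    for _ in range(sparsity):
        # Select the column most correlated with the current residual.
        correlations = np.abs(A.T @ residual)
        if support:
            correlations[support] = 0.0  # never reselect a column
        support.append(int(np.argmax(correlations)))

        # Least-squares fit on the selected columns via QR: A_s = Q R,
        # so the solution satisfies R @ coef = Q.T @ y.
        Q, R = np.linalg.qr(A[:, support])
        coef = np.linalg.solve(R, Q.T @ y)

        # Update the residual against the refitted estimate.
        residual = y - A[:, support] @ coef

    x = np.zeros(n)
    x[support] = coef
    return x
```

Refitting all selected coefficients at every iteration is what distinguishes OMP from plain matching pursuit, and it is the step for which the abstract's QRD core is used.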
A 167-processor 65 nm computational platform well suited for DSP, communication, and multimedia workloads contains 164 programmable processors with dynamic supply voltage and dynamic clock frequency circuits, three algorithm-specific processors, and three 16 KB shared memories, all clocked by independent oscillators and connected by configurable ...
The Asynchronous Array of Simple Processors (AsAP) uses processor cores with small instruction and data memories to dramatically reduce area and power while increasing performance. Fig. 23.6.1 shows the architecture of an individual AsAP processor and the 6×6 array contained on the chip. Data enters the array through the upper left processor and exits from ...
An efficient technique for early detection of undecodable blocks during LDPC decoding is introduced. The proposed method avoids unnecessary decoding iterations by predicting decoding failure, and therefore significantly improves power and latency at low SNR values. The proposed method, which has a low hardware overhead, compares the parity ...
An array of simple programmable processors is implemented in 0.18 μm CMOS and contains 36 asynchronously clocked independent processors. Each processor occupies 0.66 mm² and is fully functional at a clock rate of 520–540 MHz at 1.8 V and over 600 MHz at 2.0 V. Processors dissipate an average of 32 mW under typical conditions at 1.8 V and 475 MHz, and 2.4 mW ...
Ubiquitous bio-sensing for personalized health monitoring is slowly becoming a reality with the increasing availability of small, diverse, robust, high fidelity sensors. This oncoming flood of data begs the question of how we will extract useful information from it. In this paper we explore the use of a variety of representations and machine learning ...