MF-Net: Compute-In-Memory SRAM for Multibit Precision Inference Using Memory-Immersed Data Conversion and Multiplication-Free Operators

@article{Nasrin2021MFNetCS,
  title={MF-Net: Compute-In-Memory SRAM for Multibit Precision Inference Using Memory-Immersed Data Conversion and Multiplication-Free Operators},
  author={Shamma Nasrin and Diaa Badawi and Ahmet Enis Cetin and Wilfred Gomes and Amit Ranjan Trivedi},
  journal={IEEE Transactions on Circuits and Systems I: Regular Papers},
  year={2021},
  volume={68},
  pages={1966-1978}
}
We propose a co-design approach for compute-in-memory inference for deep neural networks (DNN). We use multiplication-free function approximators based on the $\ell_1$ norm along with a co-adapted processing array and compute flow. Using the approach, we overcame many deficiencies in the current art of in-SRAM DNN processing, such as the need for digital-to-analog converters (DACs) at each…
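
The abstract references multiplication-free function approximators that induce the $\ell_1$ norm. Below is a minimal sketch, not the authors' implementation, assuming one common form of the multiplication-free operator from this line of work, x ⊕ y = sign(x·y)(|x| + |y|); the exact operator and compute flow used in MF-Net may differ.

```python
# Minimal sketch (not the authors' code): a multiplication-free "dot product"
# built from the mf-operator x (+) y = sign(x*y) * (|x| + |y|), an l1-norm-
# inducing form from the multiplication-free operator literature.
import numpy as np

def mf_op(x, y):
    # Elementwise multiplication-free operator. The sign(x*y) term is written
    # with a product here for brevity; in hardware it is an XOR of sign bits.
    return np.sign(x * y) * (np.abs(x) + np.abs(y))

def mf_dot(x, w):
    # Accumulate the elementwise mf-operator instead of products.
    # Note mf_dot(x, x) = 2 * ||x||_1, so the operator induces the l1 norm.
    return np.sum(mf_op(x, w))

x = np.array([0.5, -1.0, 2.0])
print(mf_dot(x, x), 2 * np.sum(np.abs(x)))  # both print 7.0
```

Because the operator needs only sign logic and additions, the whole vector "product" reduces to sign manipulation and accumulation, with no analog or digital multipliers.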

Citations

Block Walsh-Hadamard Transform Based Binary Layers in Deep Neural Networks
TLDR
This paper proposes using the binary block Walsh-Hadamard transform (WHT) instead of the Fourier transform, replacing some of the regular convolution layers in deep neural networks with WHT-based binary layers, and implements the WHT layers in MobileNet-V2, MobileNet-V3-Large, and ResNet to reduce the number of parameters significantly with negligible accuracy loss.
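As a hedged illustration of why the WHT is attractive for binary layers (this is not code from the cited paper), the sketch below implements a fast Walsh-Hadamard transform: every transform coefficient is ±1, so it needs only additions and subtractions.

```python
# Minimal sketch: fast Walsh-Hadamard transform in Hadamard order.
import numpy as np

def fwht(x):
    # In-place butterfly FWHT; len(x) must be a power of two.
    x = np.asarray(x, dtype=float).copy()
    h = 1
    while h < len(x):
        for i in range(0, len(x), 2 * h):
            for j in range(i, i + h):
                a, b = x[j], x[j + h]
                x[j], x[j + h] = a + b, a - b  # only add/subtract
        h *= 2
    return x

print(fwht([1.0, 0.0, 1.0, 0.0]))  # [2. 2. 0. 0.]
```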
MC-CIM: Compute-in-Memory with Monte-Carlo Dropouts for Bayesian Edge Intelligence
TLDR
The framework reliably gives prediction confidence, to a good extent, amidst the non-idealities imposed by MC-CIM, and the paper discusses how the random instances can be optimally ordered, using combinatorial optimization methods, to minimize the overall MC-Dropout workload.
Multiplication-Avoiding Variant of Power Iteration with Applications
TLDR
MAPI replaces the standard $\ell_2$ inner products that appear in regular power iteration (RPI) with multiplication-free vector products, which are Mercer-type kernels that induce the $\ell_1$ norm; this provides a significant reduction in the number of multiplication operations.
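A minimal sketch of the idea described above, not the MAPI reference implementation: regular power iteration with each row-by-vector inner product swapped for the $\ell_1$-inducing multiplication-free product. The matrix and iteration count are illustrative.

```python
# Minimal sketch: power iteration where each inner product is replaced by the
# multiplication-free vector product sign(x*w) * (|x| + |w|) summed over entries.
import numpy as np

def mf_dot(x, w):
    return np.sum(np.sign(x * w) * (np.abs(x) + np.abs(w)))

def ma_power_iteration(A, n_iter=50):
    # "Multiplication-avoiding" variant: A @ v is formed with mf_dot per row.
    v = np.ones(A.shape[1]) / np.sqrt(A.shape[1])
    for _ in range(n_iter):
        av = np.array([mf_dot(row, v) for row in A])
        v = av / np.linalg.norm(av)
    return v

A = np.array([[2.0, 0.5], [0.5, 1.0]])
print(ma_power_iteration(A))  # dominant direction under the mf product
```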
Detecting Anomaly in Chemical Sensors via L1-Kernels based Principal Component Analysis
TLDR
A new multiplication-free kernel related to the $\ell_1$ norm is proposed for the anomaly detection task; it is not only computationally efficient but also energy-efficient because it does not require any actual multiplications during the kernel covariance matrix computation.
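A hedged sketch of how such an $\ell_1$-related, multiplication-free kernel can be used to build a covariance-like matrix for PCA; the exact kernel and anomaly score in the cited paper may differ.

```python
# Minimal sketch: a multiplication-free "kernel covariance" followed by an
# eigendecomposition as in PCA. Anomalies could then be scored by how poorly
# a sample projects onto the leading eigenvectors (not shown here).
import numpy as np

def mf_kernel_covariance(X):
    # X: (n_samples, n_features), assumed zero-mean per feature.
    n, d = X.shape
    C = np.zeros((d, d))
    for i in range(d):
        for j in range(d):
            # Accumulate sign(x_i * x_j) * (|x_i| + |x_j|) instead of x_i * x_j.
            C[i, j] = np.mean(np.sign(X[:, i] * X[:, j]) *
                              (np.abs(X[:, i]) + np.abs(X[:, j])))
    return C

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
X -= X.mean(axis=0)
eigvals, _ = np.linalg.eigh(mf_kernel_covariance(X))
print(eigvals)  # spectrum of the l1-kernel covariance
```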
ENOS: Energy-Aware Network Operator Search for Hybrid Digital and Compute-in-Memory DNN Accelerators
TLDR
The proposed ENOS framework allows an optimal layerwise integration of inference operators and computing modes to achieve the desired balance of energy and accuracy, and it reveals interesting insights, such as the amenability of different filters to low-complexity operators, minimizing the energy of inference while maintaining high prediction accuracy.

References

SHOWING 1-10 OF 39 REFERENCES
CONV-SRAM: An Energy-Efficient SRAM With In-Memory Dot-Product Computation for Low-Power Convolutional Neural Networks
TLDR
An energy-efficient static random access memory (SRAM) with embedded dot-product computation capability for binary-weight convolutional neural networks, using a 10T bit-cell-based SRAM array to store the 1-b filter weights.
An In-Memory VLSI Architecture for Convolutional Neural Networks
TLDR
An energy-efficient and high-throughput architecture for convolutional neural networks (CNN) employing a deep in-memory architecture to embed energy-efficient, low-swing mixed-signal computations in the periphery of the SRAM bitcell array.
A 12.08-TOPS/W All-Digital Time-Domain CNN Engine Using Bi-Directional Memory Delay Lines for Energy Efficient Edge Computing
TLDR
An energy-efficient convolutional neural network (CNN) engine that performs multiply-and-accumulate (MAC) operations in the time domain, employing a novel bi-directional memory delay line (MDL) unit for signed accumulation of input-weight products.
15.3 A 351TOPS/W and 372.4GOPS Compute-in-Memory SRAM Macro in 7nm FinFET CMOS for Machine-Learning Applications
TLDR
Compute-in-memory parallelizes multiply-and-average (MAV) computations and reduces off-chip weight access to lower energy consumption and latency, but meeting the requirements of high-precision operations and scalability to large neural networks remains challenging.
Supported-BinaryNet: Bitcell Array-Based Weight Supports for Dynamic Accuracy-Energy Trade-Offs in SRAM-Based Binarized Neural Network
TLDR
This work introduces bitcell array-based support parameters to improve the prediction accuracy of SRAM-based binarized neural networks (SRAM-BNN) and proposes a dynamic dropout of support parameters, which also reduces the processing energy of the in-SRAM weight-input product.
In-Memory Computation of a Machine-Learning Classifier in a Standard 6T SRAM Array
TLDR
A machine-learning classifier where computations are performed in a standard 6T SRAM array, which stores the machine-learning model; a training algorithm enables a strong classifier through boosting and also overcomes circuit nonidealities by combining multiple columns.
15.2 A 28nm 64Kb Inference-Training Two-Way Transpose Multibit 6T SRAM Compute-in-Memory Macro for AI Edge Chips
TLDR
This work develops a two-way transpose (TWT) SRAM-CIM macro supporting multibit MAC operations for FWD and BWD propagation, with fast MAC computation and high energy efficiency within a compact area.
C3SRAM: An In-Memory-Computing SRAM Macro Based on Robust Capacitive Coupling Computing Mechanism
TLDR
The macro is an SRAM module with circuits embedded in the bitcells and peripherals to perform hardware acceleration for neural networks with binarized weights and activations; it utilizes analog mixed-signal capacitive-coupling computing to evaluate the main computation of binary neural networks, the binary multiply-and-accumulate operation.
14.3 A 65nm Computing-in-Memory-Based CNN Processor with 2.9-to-35.8TOPS/W System Energy Efficiency Using Dynamic-Sparsity Performance-Scaling Architecture and Energy-Efficient Inter/Intra-Macro Data Reuse
TLDR
For a high compression rate and high efficiency, the granularity of sparsity needs to be explored based on CIM characteristics; moreover, system-level weight mapping to a CIM macro and data-reuse strategies are not yet well explored, and these directions are important for CIM macro utilization and energy efficiency.
XNOR-SRAM: In-Bitcell Computing SRAM Macro based on Resistive Computing Mechanism
TLDR
The memory macro computes XNOR-and-accumulate for binary/ternary deep convolutional neural networks on the bitline without row-by-row data access and achieves high accuracy in machine learning tasks.