Ternary Neural Networks with Fine-Grained Quantization


We propose a novel fine-grained quantization method for ternarizing pre-trained full-precision models, while also constraining activations to 8 bits. Using this method, we demonstrate minimal loss in classification accuracy on state-of-the-art topologies without additional training. This enables a full 8-bit inference pipeline, with best reported accuracy…
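To make the idea concrete, the following is a minimal sketch of block-wise ternarization. It is not the paper's exact scheme: the block size, the threshold heuristic (a fraction of the mean absolute weight), and the per-block scaling factor are all illustrative assumptions, chosen to show how a "fine-grained" variant assigns each block its own threshold and scale rather than using one per tensor.

```python
import numpy as np

def ternarize_blockwise(w, block_size=64, thresh_ratio=0.7):
    """Map each weight to {-alpha, 0, +alpha} block by block.

    Hypothetical stand-in for a fine-grained ternarization scheme:
      delta = thresh_ratio * mean(|w_block|)   # heuristic threshold
      alpha = mean(|w_i|) over entries exceeding delta  # per-block scale
    Entries with |w_i| <= delta are set to zero.
    """
    w = np.asarray(w, dtype=np.float32)
    out = np.zeros_like(w)
    for start in range(0, w.size, block_size):
        block = w[start:start + block_size]
        delta = thresh_ratio * np.abs(block).mean()
        mask = np.abs(block) > delta
        if mask.any():
            alpha = np.abs(block[mask]).mean()
            # slice of out is a view, so this writes through to out
            out[start:start + block_size][mask] = alpha * np.sign(block[mask])
    return out
```

For example, `ternarize_blockwise(np.array([0.9, -0.8, 0.05, -0.02, 0.7, 0.1]), block_size=6)` zeroes the three small entries and maps the rest to ±0.8, the mean magnitude of the surviving weights in that block.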

