Low-bit quantization and quantization-aware training for small-footprint keyword spotting

Yuriy Mishchenko; Yusuf Goren; Ming Sun; Chris Beauchene; Spyros Matsoukas; Oleg Rybakov; Shiv Naga Prasad Vitaladevuni

2019

In this paper, we investigate novel quantization approaches to reduce the memory and computational footprint of deep neural network (DNN) based keyword spotters (KWS). We propose a new method for offline and online KWS quantization, which we call dynamic quantization. In this method, DNN weight matrices are quantized column-wise, using each column’s exact individual min-max range, and each DNN layer’s inputs and outputs are quantized individually for every input audio frame, using the exact min-max range of each input and output vector. We further apply a new quantization-aware training approach that allows us to incorporate quantization errors into the KWS model during training. Together, these approaches allow us to significantly improve the performance of KWS in 4-bit and 8-bit quantized precision, achieving end-to-end accuracy close to that of full-precision models while reducing the models’ on-device memory footprint by up to 80%.
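The dynamic quantization idea in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: it assumes uniform affine quantization with round-to-nearest over the exact min-max range, assumes weights are stored as an (inputs x outputs) matrix so that each column corresponds to one output unit, and simulates quantization in floating point instead of running integer kernels. The function names `quantize_minmax`, `dequantize`, and `quantized_layer` are hypothetical.

```python
import numpy as np

def quantize_minmax(x, num_bits, axis=None):
    """Uniformly quantize x over its exact min-max range.

    axis=None uses a single range for the whole array (one input frame);
    axis=0 uses a separate range for every column (weight matrices).
    Assumes num_bits <= 8 so that codes fit in uint8.
    """
    levels = 2 ** num_bits - 1
    lo = x.min(axis=axis, keepdims=True)
    hi = x.max(axis=axis, keepdims=True)
    # Guard against a zero range (constant column or vector).
    scale = np.where(hi > lo, (hi - lo) / levels, 1.0)
    codes = np.round((x - lo) / scale).astype(np.uint8)
    return codes, scale, lo

def dequantize(codes, scale, lo):
    """Map integer codes back to approximate real values."""
    return codes.astype(np.float32) * scale + lo

def quantized_layer(x_frame, W, num_bits=8):
    """Simulate one linear layer with quantized weights and input."""
    # Offline step: column-wise quantization of the weight matrix.
    w_codes, w_scale, w_lo = quantize_minmax(W, num_bits, axis=0)
    # Online step: the current frame is quantized with its own range.
    x_codes, x_scale, x_lo = quantize_minmax(x_frame, num_bits)
    x_hat = dequantize(x_codes, x_scale, x_lo)
    W_hat = dequantize(w_codes, w_scale, w_lo)
    return x_hat @ W_hat

# Example with random data (illustrative only, not KWS model weights).
rng = np.random.default_rng(0)
W = rng.normal(size=(16, 8)).astype(np.float32)
x = rng.normal(size=16).astype(np.float32)
y_full = x @ W
y_8bit = quantized_layer(x, W, num_bits=8)
y_4bit = quantized_layer(x, W, num_bits=4)
```

Because every column and every frame gets its own range, the rounding error of each value is bounded by half of that column's or frame's quantization step, so a single outlier column does not coarsen the grid used for the rest of the matrix.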
