LoWino: Towards efficient low-precision Winograd convolutions on modern CPUs

Guangli Li; Zhen Jia; Xiaobing Feng; Yida Wang

2021

Low-precision computation, which is widely supported by contemporary hardware, is considered one of the most effective methods for accelerating convolutional neural networks. However, low-precision computation is rarely used to speed up Winograd, an algorithm for fast convolution, because combining the Winograd transformations with quantization introduces numerical error. In this paper, we propose LoWino, a low-precision Winograd convolution approach based on post-training quantization, which applies linear quantization in the Winograd domain to reduce the precision loss caused by the transformations. We also present an efficient implementation that integrates carefully designed optimization techniques, making effective use of low-precision computation on modern CPUs. We evaluate our approach on Intel Xeon Scalable Processors using representative convolutional layers from widely used deep neural networks. Experimental results show that LoWino achieves up to 2.04× speedup over state-of-the-art implementations in the vendor library while keeping accuracy at a reasonable level.
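To illustrate what "linear quantization in the Winograd domain" means, here is a minimal sketch in plain Python. It assumes the 1-D Winograd F(2,3) variant (two outputs of a 3-tap filter from four inputs) and symmetric per-tensor linear quantization with a single scale per tensor; the function names `quantize`, `winograd_f23_lowp`, and `direct_f23` are hypothetical. It is not LoWino's implementation, only the general idea: the input and filter are transformed first, and the transformed values are quantized, so each scale is fitted to the range the integer multiply actually sees.

```python
# Standard Winograd F(2,3) matrices: y = AT * ((G * g) ⊙ (BT * d)).
BT = [[1, 0, -1, 0],
      [0, 1, 1, 0],
      [0, -1, 1, 0],
      [0, 1, 0, -1]]
G = [[1.0, 0.0, 0.0],
     [0.5, 0.5, 0.5],
     [0.5, -0.5, 0.5],
     [0.0, 0.0, 1.0]]
AT = [[1, 1, 1, 0],
      [0, 1, -1, -1]]


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def quantize(values, bits=8):
    """Symmetric linear quantization: floats -> signed ints sharing one scale."""
    qmax = 2 ** (bits - 1) - 1
    max_abs = max(abs(x) for x in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    q = [max(-qmax, min(qmax, round(x / scale))) for x in values]
    return q, scale


def winograd_f23_lowp(d, g, bits=8):
    """Two outputs of a 3-tap correlation over 4 inputs, quantized in the Winograd domain."""
    # 1) Transform input tile and filter into the Winograd domain (floating point).
    V = matvec(BT, d)
    U = matvec(G, g)
    # 2) Quantize the *transformed* values, so each scale matches their range.
    qV, sV = quantize(V, bits)
    qU, sU = quantize(U, bits)
    # 3) Element-wise product on integers only.
    qM = [a * b for a, b in zip(qV, qU)]
    # 4) Dequantize with the combined scale, then apply the output transform.
    M = [m * sV * sU for m in qM]
    return matvec(AT, M)


def direct_f23(d, g):
    """Reference: direct 3-tap correlation producing 2 outputs."""
    return [sum(d[i + k] * g[k] for k in range(3)) for i in range(2)]


d = [1.0, 2.0, 3.0, 4.0]
g = [0.5, -1.0, 0.25]
print(direct_f23(d, g))
print(winograd_f23_lowp(d, g))
```

The low-precision result approximates the direct result, and the gap shrinks as `bits` grows. Quantizing the raw input and filter before transforming would instead push the transformed values outside the original integer range or force a requantization step; fitting the scale after the transform sidesteps that, which is the motivation the abstract gives for working in the Winograd domain.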