LoWino: Towards efficient low-precision Winograd convolutions on modern CPUs
Low-precision computation, which is widely supported in contemporary hardware, is considered one of the most effective methods for accelerating convolutional neural networks. However, low-precision computation is rarely used to speed up Winograd, an algorithm for fast convolution, due to the numerical error introduced by combining the Winograd transformation with quantization. In this paper, we propose a low-precision Winograd convolution approach, LoWino, based on post-training quantization, which employs a linear quantization method in the Winograd domain to reduce the precision loss caused by the transformations. Moreover, we present an efficient implementation that integrates well-designed optimization techniques, thereby fully exploiting the low-precision computation capability of modern CPUs. We evaluate our approach on Intel Xeon Scalable Processors using representative convolutional layers from prevailing deep neural networks. Experimental results show that LoWino achieves up to 2.04× speedup over state-of-the-art implementations in the vendor library while maintaining accuracy at a reasonable level.
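The core idea of quantizing in the Winograd domain can be illustrated with a minimal 1-D F(2,3) sketch. The transform matrices B^T, G, and A^T are the standard Winograd F(2,3) matrices; the `quantize` helper, scale selection, and function names below are illustrative assumptions, not the paper's actual implementation (which targets 2-D convolutions and vectorized int8 kernels on CPUs).

```python
import numpy as np

# Standard Winograd F(2,3) transform matrices.
BT = np.array([[1, 0, -1,  0],
               [0, 1,  1,  0],
               [0, -1, 1,  0],
               [0, 1,  0, -1]], dtype=np.float32)
G  = np.array([[1.0, 0.0, 0.0],
               [0.5, 0.5, 0.5],
               [0.5, -0.5, 0.5],
               [0.0, 0.0, 1.0]], dtype=np.float32)
AT = np.array([[1, 1,  1,  0],
               [0, 1, -1, -1]], dtype=np.float32)

def quantize(x):
    """Symmetric linear quantization to int8 range (illustrative)."""
    scale = max(np.abs(x).max() / 127.0, 1e-8)
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int32)
    return q, scale

def lowino_f23(d, g):
    """Winograd F(2,3) with quantization applied in the Winograd domain.

    d: input tile of length 4, g: filter of length 3.
    Returns 2 outputs of the valid correlation y[i] = sum_k g[k] * d[i+k].
    """
    V = BT @ d              # input transform (float)
    U = G @ g               # filter transform (float)
    qU, sU = quantize(U)    # quantize AFTER the transform, so the
    qV, sV = quantize(V)    # transform itself adds no quantization error
    M = (qU * qV) * (sU * sV)  # low-precision elementwise product, dequantized
    return AT @ M           # output transform (float)
```

For example, `lowino_f23(np.array([1., 2., 3., 4.], np.float32), np.array([1., 0., -1.], np.float32))` agrees with the exact correlation output `[-2, -2]` to within the int8 rounding error, showing that the only precision loss comes from the elementwise product, not the transforms.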