Block Walsh–Hadamard Transform-based Binary Layers in Deep Neural Networks

Abstract
Convolution is the core operation of modern deep neural networks, and it is well known that convolutions can be implemented in the Fourier transform domain. In this article, we propose to use the binary block Walsh–Hadamard transform (WHT) instead of the Fourier transform, replacing some of the regular convolution layers in deep neural networks with WHT-based binary layers. We utilize both one-dimensional (1D) and two-dimensional (2D) binary WHTs. In both the 1D and 2D layers, we compute the binary WHT of the input feature map and denoise the WHT-domain coefficients using a nonlinearity obtained by combining soft-thresholding with the tanh function. After denoising, we compute the inverse WHT. We use 1D-WHT layers to replace the 1 × 1 convolutional layers, while 2D-WHT layers can replace the 3 × 3 convolution layers and Squeeze-and-Excite layers. 2D-WHT layers with trainable weights can also be inserted before the Global Average Pooling layers to assist the dense layers. In this way, we can reduce the number of trainable parameters significantly with only a slight decrease in accuracy. We implement the WHT layers in MobileNet-V2, MobileNet-V3-Large, and ResNet, reducing the number of parameters significantly with negligible accuracy loss. Moreover, according to our speed test, the 2D-FWHT layer runs about 24 times as fast as the regular 3 × 3 convolution with 19.51% less RAM usage on an NVIDIA Jetson Nano.
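The pipeline the abstract describes (forward WHT, coefficient denoising, inverse WHT) can be sketched in NumPy. This is an illustrative sketch, not the authors' implementation: `fwht` is the standard O(n log n) butterfly fast Walsh–Hadamard transform, and `smooth_threshold` is one plausible way to combine soft-thresholding with tanh, namely replacing the sign factor in soft-thresholding, sign(x)·max(|x| − T, 0), with tanh(x). The threshold `T` would be a trainable parameter in the actual layers.

```python
import numpy as np

def fwht(x):
    """Fast Walsh-Hadamard transform along the last axis.
    Length must be a power of two; uses only additions and subtractions."""
    x = np.asarray(x, dtype=float).copy()
    n = x.shape[-1]
    h = 1
    while h < n:
        for i in range(0, n, 2 * h):
            a = x[..., i:i + h].copy()
            b = x[..., i + h:i + 2 * h].copy()
            x[..., i:i + h] = a + b          # butterfly: sum branch
            x[..., i + h:i + 2 * h] = a - b  # butterfly: difference branch
        h *= 2
    return x

def smooth_threshold(x, T):
    """Assumed smooth-thresholding: soft-thresholding with sign(x) -> tanh(x)."""
    return np.tanh(x) * np.maximum(np.abs(x) - T, 0.0)

def wht_layer(x, T=0.1):
    """1D WHT layer sketch: transform, denoise coefficients, inverse transform.
    The (unnormalized) WHT is self-inverse up to a factor of 1/n."""
    n = x.shape[-1]
    y = fwht(x)                  # forward binary WHT
    y = smooth_threshold(y, T)   # denoise in the transform domain
    return fwht(y) / n           # inverse WHT
```

Because the transform matrix contains only ±1 entries, the forward and inverse passes need no multiplications, which is the source of the speed and memory savings reported for the 2D-FWHT layer.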