Abstract
In the Internet of Things era, with many interconnected and heterogeneous mobile and fixed smart devices, distributing intelligence from the cloud to the edge has become a necessity. Peripheral devices, such as the end nodes of a sensor network, have limited computational and communication capabilities, little memory, and a tight energy budget. Bringing artificial intelligence algorithms to them is therefore challenging and calls for innovative solutions. In this work, we present PhiNets, a new scalable backbone optimized for deep-learning-based image processing on resource-constrained platforms. PhiNets are based on inverted residual blocks specifically designed to decouple computational cost, working memory, and parameter memory, so that all the resources available on a given platform can be exploited. With a YOLOv2 detection head and Simple Online and Realtime Tracking (SORT), the proposed architecture achieves state-of-the-art results in (i) detection on the COCO and VOC2012 benchmarks and (ii) tracking on the MOT15 benchmark. PhiNets reduce the parameter count by around 90% with respect to previous state-of-the-art models (EfficientNetv1, MobileNetv2) while achieving better performance at a lower computational cost. Moreover, we demonstrate our approach on a prototype node based on an STM32H743 microcontroller (MCU) with 2 MB of internal Flash and 1 MB of RAM, achieving power consumption on the order of 10 mW. The code for PhiNets is publicly available on GitHub.¹
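To make the idea of decoupling computational cost, working memory, and parameter memory more concrete, here is a rough cost model for a generic MobileNetV2-style inverted residual block, the pattern PhiNets build on. It is a sketch under stated assumptions, not the PhiNets block or its scaling rule: the structure (1x1 expansion, 3x3 depthwise, 1x1 projection), int8 activations (1 byte per value), batch norm folded into the weights, and the class name `InvertedResidual` are all hypothetical choices made for illustration. The RAM limit is the STM32H743 figure quoted in the abstract.

```python
from dataclasses import dataclass

RAM_BYTES = 1 * 1024 * 1024  # STM32H743 RAM, as quoted in the abstract


@dataclass
class InvertedResidual:
    """Hypothetical MobileNetV2-style block: 1x1 expand -> 3x3 depthwise -> 1x1 project."""
    c_in: int
    c_out: int
    expansion: int  # expansion factor t
    stride: int = 1

    @property
    def c_mid(self) -> int:
        return self.c_in * self.expansion

    def _out_hw(self, h: int, w: int) -> tuple[int, int]:
        # "Same" padding: output size is ceil(input / stride).
        s = self.stride
        return (h + s - 1) // s, (w + s - 1) // s

    def params(self) -> int:
        # Weights only (biases/batch norm assumed folded); independent of resolution.
        return self.c_in * self.c_mid + 9 * self.c_mid + self.c_mid * self.c_out

    def macs(self, h: int, w: int) -> int:
        ho, wo = self._out_hw(h, w)
        expand = h * w * self.c_in * self.c_mid
        depthwise = ho * wo * 9 * self.c_mid
        project = ho * wo * self.c_mid * self.c_out
        return expand + depthwise + project

    def peak_activation_bytes(self, h: int, w: int, bytes_per_value: int = 1) -> int:
        # Peak working memory: each layer's input and output tensors are live at once,
        # plus the block input if it must be kept for the residual addition.
        ho, wo = self._out_hw(h, w)
        x_in = h * w * self.c_in
        x_exp = h * w * self.c_mid
        x_dw = ho * wo * self.c_mid
        x_out = ho * wo * self.c_out
        has_residual = self.stride == 1 and self.c_in == self.c_out
        keep = x_in if has_residual else 0
        peak = max(x_in + x_exp, keep + x_exp + x_dw, keep + x_dw + x_out)
        return peak * bytes_per_value


if __name__ == "__main__":
    block = InvertedResidual(c_in=16, c_out=16, expansion=6)
    for res in (32, 64, 128):
        act = block.peak_activation_bytes(res, res)
        print(f"{res}x{res}: params={block.params()} macs={block.macs(res, res)} "
              f"peak_act={act} B fits_ram={act <= RAM_BYTES}")
```

In this model the parameter count does not change with input resolution, while MACs and peak activation memory grow with the square of the resolution. Widening the channels grows parameters and MACs roughly quadratically but activations only linearly. Having knobs that move these three quantities by different amounts is what allows a backbone to be fitted to a platform's Flash, RAM, and compute budgets separately.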
PhiNets: A Scalable Backbone for Low-power AI at the Edge