Machine Learning (ML) is now embedded in a wide range of computing devices, consumer electronics, and cyber-physical systems. Smart sensors are deployed everywhere, in applications such as autonomous vehicles, robotics, wearables, and perceptual computing devices, and intelligent algorithms power our connected world. These devices collect and aggregate large volumes of data, and in doing so they augment our society in multiple ways: from healthcare, to social networks, to consumer electronics, and many more. To process these immense volumes of data, machine learning has emerged as the de facto analysis tool powering many aspects of our Big Data society. Applications spanning infrastructure (smart agriculture, smart cities, intelligent transportation systems, and smart grids, to name a few), social networks and content delivery, e-commerce and smart factories, and emerging concepts such as self-driving cars and autonomous/assistive robots are all driven by advanced machine learning technologies. These emerging systems require real-time inference and decision support; they may therefore rely on customized hardware accelerators, are typically bound by limited compute, memory, and energy resources, and are restricted by limited connectivity and bandwidth. Thus, near-sensor computation and near-sensor intelligence are emerging as necessities in order to continue supporting the paradigm shift of our connected world.
The need for real-time intelligent data analytics, especially in the era of Big Data, for decision support near the data acquisition points emphasizes the need to revolutionize the way we design, build, test, and verify processors, accelerators, and systems that facilitate machine learning (and deep learning in particular) in resource-constrained environments at the edge and in the fog. As such, traditional von Neumann architectures may no longer be sufficient or suitable, primarily because of limitations in both performance and energy efficiency caused especially by large amounts of data movement. Furthermore, due to the connected and critical nature of such systems, security and reliability are also critically important. To facilitate AI at the edge, we need to re-focus on problems such as design, verification, architecture, scheduling and allocation policies, and optimization, among many others, to determine the most efficient way to implement these novel applications within a resource-constrained system, which may or may not be connected. Acceleration of AI at the edge is therefore a fast-growing field of machine learning technologies and applications, including algorithms, hardware, and software capable of performing on-device sensor (vision, audio, IMU, biomedical, etc.) data analytics at extremely low power, typically in the mW range and below, enabling a variety of always-on use cases and targeting battery-operated devices. This special issue therefore targets research at the intersection of AI/machine learning applications, algorithms, software, and hardware in deeply embedded machine learning systems.
This special issue received a total of 76 submissions and involved numerous reviewers, selected based on their expertise on the topics of the submissions. Most of the submissions went through two rounds of reviews, including both major and minor revisions, to further enhance their technical quality; in the first round, each submission received four reviews on average. Thorough revisions were made by the authors, and careful revision reviewing and cross-checking were done by the reviewers and the guest editors to ensure that the revisions comprehensively addressed all the comments. This represents a tremendous effort by the authors, reviewers, guest editors, the technical and administrative staff of TECS, and the editor-in-chief. After this extensive review process, 38 high-quality submissions were accepted, and the special issue is accordingly structured in two parts. Overall, the accepted papers cover a wide range of solutions for EdgeAI, targeting different layers of the system stack and covering both conventional and non-conventional computing paradigms.
In the following, we briefly present the accepted papers in Part I of this special issue.
(1) The article “Edge Intelligence: Concepts, Architectures, Applications and Future Directions” reviews the state of the art, compares the surveyed techniques, and analyzes different application scenarios. It discusses the existing frameworks and platforms for Edge Computing, and analyzes different devices, techniques, and frameworks that enable the deployment of deep learning at the edge with respect to metrics such as latency, energy consumption, and neural network accuracy, highlighting limitations of the state of the art. The article also discusses challenges and future directions in EdgeAI.
(2) The article “More is Less: Model Augmentation for Intermittent Deep Inference” discusses the limitations of model compression and the overhead of intermittent deep neural network (DNN) inference on resource-constrained devices. It develops a model augmentation technique that extends a given DNN with extra components enabling the accelerator to integrate progress indicators during inference, which allows correct, low-overhead recovery upon power resumption. The evaluation is done on a TI device for different DNN models, hardware settings, and progress preservation granularities.
(3) The article “TAB: Unified and Optimized Ternary, Binary and Mixed-Precision Neural Network Inference on the Edge” addresses the lack of a unified encoding for binary and ternary values, bit-extraction overheads, complex computation pipelines, and mixed-precision multiplication through a unified value representation, efficient data storage, and optimized bitwise dot-product pipelines. Comparisons with different state-of-the-art techniques illustrate the performance optimizations and memory savings.
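As background on why such bitwise dot-product pipelines pay off, binarized networks typically replace floating-point dot products with an XOR/popcount over bit-packed {-1, +1} vectors. The sketch below illustrates this standard trick in plain Python; the encoding and function name are illustrative, not the paper's actual scheme:

```python
def binary_dot(a_bits: int, b_bits: int, n: int) -> int:
    """Dot product of two length-n {-1, +1} vectors packed into integers,
    where bit 1 encodes +1 and bit 0 encodes -1.

    dot = (#matching bits) - (#differing bits) = n - 2 * popcount(a XOR b)
    """
    mask = (1 << n) - 1
    return n - 2 * bin((a_bits ^ b_bits) & mask).count("1")

# a = [+1, -1, +1, +1] -> 0b1011; b = [+1, +1, -1, +1] -> 0b1101
# Elementwise products: +1, -1, -1, +1, so the dot product is 0.
print(binary_dot(0b1011, 0b1101, 4))  # -> 0
```

On hardware, the XOR and popcount map to single instructions over 32- or 64-bit words, which is where the large speedups and memory savings of binary and ternary designs come from.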
(4) The article “TensorRT-based Framework and Optimization Methodology for Deep Learning Inference on Jetson Boards” presents a TensorRT-based framework that supports different optimization parameters for accelerating DNNs on Jetson embedded GPUs. A heuristic is developed to balance the pipeline stages among the different processors and to fine-tune the optimization parameters.
(5) The article “Towards Adversary Aware Non-Iterative Model Pruning Through Dynamic Network Rewiring of DNNs” presents a dynamic network rewiring method for generating robust DNN models. The loss function of the optimization problem integrates sparse learning with robust adversarial training. An extended approach further improves the robustness of pruned models by adding a sparse parametric Gaussian noise tensor to the weight tensors for robust regularization. Unlike existing frameworks that require multiple training iterations, this method requires only a single iteration while supporting both structured and irregular channel pruning.
(6) The article “PhiNets: A Scalable Backbone for Low-power AI at the Edge” introduces a new scalable backbone model based on inverted residual blocks for DNN-based image processing on resource-constrained embedded platforms. The idea is to decouple the computational cost, working memory, and parameter memory. PhiNets achieve a 90% parameter reduction compared to the state of the art. The article also presents a prototype solution based on an STM32H743 microcontroller.
(7) The article “Data-flow Driven Partitioning of ML Applications for Optimal Energy Use in Batteryless Systems” presents Julienning, an automated method for optimizing the energy cost of batteryless systems. The proposed method partitions data- and energy-intensive applications into different execution cycles, and minimizes the number of system activations and nonvolatile data transfers by leveraging inter-kernel data dependencies. The method is validated on two batteryless cameras executing ML workloads.
(8) The article “Contention Grading and Adaptive Model Selection for Machine Vision in Embedded Systems” addresses the contention challenges that arise when multiple ML workloads execute on a resource-constrained device and increase the inference delay. This is done by adaptively selecting an appropriate model while trading off network accuracy. The approach profiles the system at design time to generate a model set optimized for different contention levels, and then selects an appropriate model at run time based on the observed system contention. The evaluation is done on the NVIDIA Jetson TX2 platform.
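A run-time policy of this general kind can be approximated by a simple lookup over offline-profiled models: pick the most accurate model whose profiled latency under the current contention level still meets the application deadline. A minimal sketch, with entirely hypothetical model names and profiling numbers (not taken from the article):

```python
# Hypothetical design-time profile: contention level -> (model, top-1 accuracy, latency in ms)
PROFILE = {
    "low":  [("large_net", 0.76, 45.0), ("small_net", 0.71, 12.0)],
    "high": [("large_net", 0.76, 180.0), ("small_net", 0.71, 38.0)],
}

def select_model(contention: str, deadline_ms: float):
    """Return the most accurate profiled model meeting the deadline, or None."""
    feasible = [m for m in PROFILE[contention] if m[2] <= deadline_ms]
    return max(feasible, key=lambda m: m[1])[0] if feasible else None

print(select_model("low", 50.0))   # the large model still fits under low contention
print(select_model("high", 50.0))  # only the small model meets the deadline
```

The design-time profiling step is what makes the run-time decision cheap: at inference time only a table lookup and a comparison are needed.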
(9) The article “A Construction Kit for Efficient Low Power Neural Network Accelerator Designs” provides a quantitative review of different optimization approaches for DNN accelerators used in the state of the art and identifies their efficacy for edge processing. A construction kit is provided in the form of a set of optimizations with quantitative evaluations, giving designers an overview of the design choices available for developing low-power DNN accelerators.
(10) The article “Energy Efficient and Reliable Inference in Nonvolatile Memory under Extreme Operating Conditions” presents a non-volatile processing-in-memory architecture with low-overhead checkpointing mechanisms that provides high energy efficiency, resilience to radiation, and operation over a wide temperature range. This enables the use of ML-based embedded devices in harsh operating scenarios.
(11) The article “Resource-Demand Estimation for Edge Tensor Processing Units” presents an approach for the resource estimation of an embedded DNN accelerator. The method is based on generating random DNN models, analyzing them statically, and measuring their execution time and power profiles on an embedded accelerator. It then trains an estimator that maps the properties of the statically analyzed DNNs to the measured resource requirements.
(12) The article “CAP'NN: A Class-aware Framework for Personalized Neural Network Inference” presents a user-aware pruning approach for DNNs that can additionally remove non-effective neurons without requiring retraining. It relies on pruning requests from different users and identifies similarities between them, deploying a cache architecture to reuse information from already-pruned networks.
(13) The article “Quantized Sparse Training: A Unified Trainable Framework for Joint Pruning and Quantization of DNNs” integrates joint pruning and quantization into a gradient-based optimization process to simultaneously train, prune, and quantize a given DNN from scratch, achieving high compression efficiency with minimal accuracy degradation.
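As a rough, one-shot illustration of the two operations such trainable schemes fold into gradient descent, magnitude pruning followed by uniform symmetric quantization can be sketched as follows; all parameter choices here are illustrative and this is not the article's method:

```python
import numpy as np

def prune_and_quantize(w: np.ndarray, sparsity: float, bits: int) -> np.ndarray:
    """Zero out the `sparsity` fraction of smallest-magnitude weights,
    then quantize the survivors onto a symmetric `bits`-bit grid."""
    # 1) Magnitude pruning: keep only the largest-magnitude weights.
    k = int(sparsity * w.size)
    threshold = np.sort(np.abs(w), axis=None)[k] if k > 0 else 0.0
    mask = np.abs(w) >= threshold
    # 2) Uniform symmetric quantization with a per-tensor scale.
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(w).max() / qmax
    q = np.clip(np.round(w / scale), -qmax, qmax) * scale
    return q * mask

w = np.array([0.02, -0.9, 0.5, -0.03, 0.7])
print(prune_and_quantize(w, sparsity=0.4, bits=4))  # the two smallest weights become zero
```

Trainable joint methods instead keep the mask and the quantization step inside the optimization loop (e.g., via straight-through gradient estimators) so that pruning and quantization co-adapt with the remaining weights rather than being applied post hoc.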
(14) The article “ATCN: Resource-Efficient Processing of Time Series on Edge” presents the Agile Temporal Convolutional Network for accurate yet fast time-series prediction on embedded devices. With formalized hyperparameters, it enables application-specific optimizations of the model architecture under tight performance and memory constraints. It deploys residual connections and separable depth-wise convolutions to improve network efficiency while reducing computational complexity. The proposed models are evaluated on Cortex-M7 and Cortex-A57 processors and compared against the InceptionTime and MiniRocket models.
(15) The article “Hardware-friendly User-specific Machine Learning for Edge Devices” builds small, user-specific ML models by employing hardware-aware pruning for mobile platforms and leveraging compute sharing between inference and pruning. It also introduces new architectural support for pruning user-specific models on a systolic-array-based accelerator.
(16) The article “A Unified Programmable Edge Matrix Processor for Deep Neural Networks and Matrix Algebra” presents the MxCore architecture, which integrates vector and programmable cores with optimized interconnects and a configurable hardware scheduler to improve resource utilization and performance efficiency. MxCore can accelerate multiple matrix-algebra and DNN applications, considering different levels of sparsity and data reuse. The processor is implemented in a 7nm process and operates at a very low power budget.
(17) The article “Performance Modeling of Computer Vision-based CNN on Edge GPUs” models the performance of CNNs on embedded GPUs to facilitate fast design space exploration of performance vs. hardware efficiency tradeoffs. To this end, five different ML algorithms are implemented for performance prediction, considering three different GPUs executing image classification.
(18) The article “Mobile or FPGA? A Comprehensive Evaluation on Energy Efficiency and a Unified Optimization Framework” explores the impact of network compression techniques on different edge devices (such as an FPGA and a mobile GPU) in both a qualitative and a quantitative way. Based on these analyses, the paper proposes a unified framework for DNN optimization using block-based pruning and inference acceleration.
(19) The article “EdgeWise: Energy-Efficient CNN Computation on Edge Devices under Stochastic Communication Delays” integrates a resource-constrained embedded device, used for data pre-processing and workload distribution, with a powerful device for acceleration through a wireless network. It solves an optimization problem to find the optimal layer assignment considering the network delay. The evaluation is done on a system with a Raspberry Pi 3 working with an NVIDIA Jetson TX2.
As a final note, the guest editors would like to thank all the authors, reviewers, Editor-in-Chief (EiC), and the administrative staff of ACM Transactions on Embedded Computing Systems (TECS), and all other officials and technical staff who were directly or indirectly engaged in making this special issue a big success. A big thanks to the reviewers for their valuable time, expert reviews, and excellent feedback, while managing such a huge submission load, and especially for providing timely reviews. A special thanks to the authors for their submissions, for comprehensively addressing the reviewers’ feedback in a constructive way, and for meeting all the intermediate deadlines and requirements. Finally, we would like to thank the EiC of ACM TECS, Professor Tulika Mitra, and her whole administrative and technical team, for their continuous interactions, timely responses and information exchange, and excellent professional support. Many thanks to all!