Editorial

Introduction to the Special Issue on Accelerating AI on the Edge – Part 2

Published: 12 December 2022

Machine Learning (ML) is nowadays embedded in numerous computing devices, consumer electronics, and cyber-physical systems. Smart sensors are deployed everywhere, in applications such as autonomous vehicles, robotics, wearables, and perceptual computing devices, and intelligent algorithms power our connected world. These devices collect and aggregate large volumes of data, and in doing so they augment our society in multiple ways, from healthcare to social networks to consumer electronics and beyond. To process these immense volumes of data, machine learning is emerging as the de facto analysis tool that powers many aspects of our Big Data society. Applications spanning infrastructure (smart agriculture, smart cities, intelligent transportation systems, and smart grids, to name a few), social networks and content delivery, e-commerce and smart factories, and emerging concepts such as self-driving cars and autonomous/assistive robots are powered by advanced machine learning technologies. These emerging systems require real-time inference and decision support; such scenarios may therefore use customized hardware accelerators, are typically bound by limited compute, memory, and energy resources, and are restricted by limited connectivity and bandwidth. Thus, near-sensor computation and near-sensor intelligence are emerging as necessities in order to continue supporting the paradigm shift of our connected world.

The need for real-time intelligent data analytics (especially in the era of Big Data) for decision support near the data acquisition points emphasizes the need to revolutionize the way we design, build, test, and verify processors, accelerators, and systems that facilitate machine learning (and deep learning in particular) in resource-constrained environments at the edge and in the fog. Traditional Von Neumann architectures may no longer be sufficient or suitable, primarily because of limitations in both performance and energy efficiency caused by large amounts of data movement. Furthermore, due to the connected and critical nature of such systems, security and reliability are also critically important. To facilitate AI at the edge, we need to re-focus on problems such as design, verification, architecture, scheduling and allocation policies, and optimization, in order to determine the most efficient way to implement these novel applications within a resource-constrained system, which may or may not be connected. Acceleration of AI at the edge is therefore a fast-growing field of machine learning technologies and applications, including algorithms, hardware, and software capable of performing on-device analytics of sensor data (vision, audio, IMU, biomedical, etc.) at extremely low power, typically in the mW range and below, thereby enabling a variety of always-on use cases on battery-operated devices. This special issue therefore targets research at the intersection of AI/machine learning applications, algorithms, software, and hardware in deeply embedded machine learning systems.

This special issue received a total of 76 submissions and involved numerous reviewers, selected based on their expertise in the topics of the submissions. Most of the submissions went through two rounds of reviews, including both major and minor revisions, to further enhance their technical quality; in the first round, each submission received four reviews on average. Thorough revisions were made by the authors, and careful revision reviewing and cross-checking were done by the reviewers and the guest editors to ensure that the revisions comprehensively addressed all the comments. This represents a tremendous effort by the authors, reviewers, guest editors, the technical and administrative staff of TECS, and the Editor-in-Chief. After this extensive review process, 38 high-quality submissions were accepted, and the special issue is accordingly structured in two parts. Overall, the accepted papers cover a wide range of solutions for EdgeAI, targeting different layers of the system stack and covering both conventional and non-conventional computing paradigms.

In the following, we briefly present the accepted papers in Part II of this special issue.

(1)

The article Efficient Realization of Decision Trees for Real-Time Inference analyzes the memory-locality issues of decision trees to determine an appropriate memory layout for each tree at the application layer. An efficient heuristic is proposed to incorporate architecture-dependent information into the optimization process, and the code-generation framework is released as open source. The evaluation covers different datasets, and significant performance improvements are reported for both server-class and embedded systems.
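
The flattened-layout idea can be sketched in a few lines. The snippet below is an illustrative toy, not the paper's optimized layout or heuristic: the node tuple format, the breadth-first ordering, and the tree contents are all assumptions. It shows a small decision tree stored in one contiguous array, so that inference is a tight loop over array slots rather than pointer chasing.

```python
# Illustrative toy: flatten a small decision tree into a contiguous array
# so a root-to-leaf traversal reads nearby memory slots.
# Node format (an assumption): (feature_index, threshold, left, right);
# feature_index < 0 marks a leaf whose prediction sits in the threshold field.

FLAT_TREE = [
    (0, 5.0, 1, 2),    # root: x[0] <= 5.0 ?
    (1, 3.0, 3, 4),    # left child: x[1] <= 3.0 ?
    (-1, 1.0, 0, 0),   # leaf: predict 1.0
    (-1, 0.0, 0, 0),   # leaf: predict 0.0
    (-1, 2.0, 0, 0),   # leaf: predict 2.0
]

def predict(flat_tree, x):
    i = 0
    while True:
        feat, thr, left, right = flat_tree[i]
        if feat < 0:                       # reached a leaf
            return thr
        i = left if x[feat] <= thr else right

print(predict(FLAT_TREE, [4.0, 2.0]))      # -> 0.0
```

The article's contribution is precisely in choosing the array ordering per tree and per target architecture so that the hot paths of this loop stay cache-resident.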

(2)

The article Block Walsh-Hadamard Transform Based Binary Layers in Deep Neural Networks proposes to use 1-D and 2-D binary block Walsh-Hadamard transforms (WHT) as a replacement for some of the convolutional layers in a given DNN, and to denoise the WHT-domain coefficients. The goal is to significantly reduce the number of trainable parameters. The evaluation covers the MobileNet-V2, MobileNet-V3-Large, and ResNet DNNs, and a speedup of 24× with 19% lower memory utilization is reported in experiments on a Jetson Nano.
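
For readers unfamiliar with the transform, the sketch below shows a fast 1-D Walsh-Hadamard transform and a simple soft-thresholding step on its coefficients. The fixed threshold is an assumption for illustration; the article instead uses trainable denoising inside the network.

```python
import numpy as np

def wht(x):
    # Fast Walsh-Hadamard transform; length must be a power of two.
    x = np.asarray(x, dtype=float).copy()
    n, h = len(x), 1
    while h < n:
        for i in range(0, n, h * 2):
            for j in range(i, i + h):
                a, b = x[j], x[j + h]
                x[j], x[j + h] = a + b, a - b
        h *= 2
    return x

def soft_threshold(c, t):
    # Shrink small coefficients toward zero ("denoising"); a fixed
    # t is assumed here, the article learns the threshold.
    return np.sign(c) * np.maximum(np.abs(c) - t, 0.0)

x = np.array([1.0, 0.0, 1.0, 0.0])
coeffs = wht(x)                          # -> [2. 2. 0. 0.]
denoised = soft_threshold(coeffs, 0.5)
reconstructed = wht(denoised) / len(x)   # WHT is self-inverse up to 1/n
```

Because the transform uses only additions and subtractions, replacing multiply-heavy convolutions with WHT layers is what yields the parameter and compute savings the article reports.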

(3)

The article Accelerated Fire Detection and Localization at Edge employs a cascaded model that detects fire using a classifier trained on large fire datasets with a multi-task learning (MTL) approach, and then triggers the localization workflow. For improved inference at the edge, quantization and compression are performed, and an automated hardware-software framework is developed. The results report fire-localization accuracy and inference rate.

(4)

The article DynO: Dynamic Onloading of Deep Neural Networks from Cloud to Device presents a distributed inference framework that employs a CNN-specific data-packing method and a scheduler that dynamically tunes the partition point and the transferred data precision based on the execution environment. The framework also accounts for device heterogeneity, varying bandwidth, and multiple design objectives. Results illustrate the efficacy of the proposed DynO approach, e.g., up to 7.9× higher throughput with 60× less data transferred.

(5)

The article Design and Scaffolded Training of an Efficient DNN Operator for Computer Vision on the Edge introduces Fully-Separable Convolutions (FuSeConv) as a replacement for depth-wise separable convolutions, and presents an efficient dataflow-mapping technique for systolic arrays, called Spatial-Tiled Output Stationary (ST-OS). A fast training technique is also proposed that can be combined with a NAS flow. Performance improvements of 4× to 9× are reported on the ImageNet dataset.

(6)

The article Online Learning for Orchestration of Inference in Multi-User End-Edge-Cloud Networks presents a reinforcement-learning-based technique for computation offloading, which learns an optimal offloading policy that minimizes the response time. Evaluation is performed on a real-world setup with multiple AWS and ARM core configurations. The proposed technique provides a 35% speedup with less than 1% accuracy loss.
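
The offloading idea can be illustrated with a minimal tabular learner. Everything below is an assumption for illustration (the three-target action set, the synthetic latency model, and the plain running-average update), not the article's multi-user algorithm: an agent samples execution targets and converges toward the one with the lowest observed response time.

```python
import random

# Toy offloading learner: pick where to run inference, learn from
# observed response times. Action set and latency model are hypothetical.
ACTIONS = ["device", "edge", "cloud"]
Q = {a: 0.0 for a in ACTIONS}            # estimated response time per target
ALPHA, EPS = 0.1, 0.2

def observed_latency(action):
    # Hypothetical latencies in ms plus measurement noise; a real system
    # would measure these from the deployment.
    base = {"device": 120.0, "edge": 60.0, "cloud": 90.0}[action]
    return base + random.gauss(0.0, 10.0)

random.seed(42)
for _ in range(2000):
    # Epsilon-greedy: mostly exploit the currently fastest target.
    a = random.choice(ACTIONS) if random.random() < EPS else min(Q, key=Q.get)
    Q[a] += ALPHA * (observed_latency(a) - Q[a])

print(min(Q, key=Q.get))                 # -> edge
```

The article's setting is much richer (multiple users, end-edge-cloud tiers, changing conditions), but the same learn-from-observed-latency loop is the core mechanism.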

(7)

The article Federated Self-Training for Semi-Supervised Audio Recognition studies semi-supervised learning of audio recognition models in a federated learning setting, and proposes a framework that improves the generalization of these models by exploiting on-device unlabeled data. Evaluation is performed on public datasets. An improvement of approximately 13% is reported compared to a fully-supervised model when only 35% of the data is labeled.

(8)

The article Synaptic Activity and Hardware Footprint of Spiking Neural Networks in Digital Neuromorphic Systems presents a high-level metric to characterize the energy efficiency of SNNs, along with estimators for logic resources, power consumption, execution time, and energy consumption, for FPGA-based implementations of four state-of-the-art accelerator architectures that support both sequential and parallel implementations of SNNs. The study finds that SNNs provide better energy efficiency in sequential implementations, and that synaptic activity is an important factor for low-energy implementations.

(9)

The article DyCo: Dynamic, Contextualized AI Models provides a system that preserves privacy while dynamically improving the accuracy of small models on devices. The proposed technique employs a semi-supervised approach in conjunction with existing training frameworks and network models; the key idea is to periodically train contextualized, smaller models for resource-constrained devices. DyCo also provides an edge-cloud solution to auto-label data while preserving privacy, enabling bespoke models for different devices to be developed in parallel and trained on the auto-labeled data. The evaluation covers two object-detection models and two datasets, and demonstrates accuracy improvements of 16% to 20% while reducing training costs.

(10)

The article Design-Technology Co-Optimization for NVM-based Neuromorphic Processing Elements presents an analysis of design-technology tradeoffs when implementing machine learning inference on NVM-based neuromorphic systems deployed in resource-constrained edge devices. Detailed circuit-level simulations illustrate the impact of technology scaling on latency. Based on this analysis, multiple design optimizations are proposed, e.g., finding an efficient NVM resistance state to reduce latency, introducing isolation transistors in each PE to enable fine-grained power gating, and system software for efficient inference that exploits these technological enhancements.

(11)

The article HyDREA: Utilizing Hyperdimensional Computing for a More Robust and Efficient Machine Learning System presents an in-memory architecture for federated learning environments. By relaxing different hardware parameters and tolerating the resulting computational errors, to which the considered hyperdimensional (HD) computing model is resilient, the architecture provides significant speedup and energy efficiency over the baseline architecture on classification and clustering tasks.

(12)

The article Brain-inspired Cognition in Next Generation Racetrack Memories presents a racetrack memory (RTM)-based architecture to accelerate hyperdimensional computing in memory. An RTM nanowire-based counting mechanism is proposed to reduce the circuit overhead. The proposed system reduces energy consumption by more than 8× for a language recognition task, and compared to an FPGA-based solution it provides more than 7× speedup and more than 5× energy reduction.
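
The hyperdimensional computing model being accelerated can be sketched in plain NumPy. The character hypervectors, n-gram encoding, and similarity measure below are standard HD building blocks shown purely for illustration; the article's contribution is executing such operations inside racetrack memory, which this sketch does not model.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 10000                                 # hypervector dimensionality

# One random bipolar hypervector per character (illustrative item memory).
alphabet = "abcdefghijklmnopqrstuvwxyz "
item = {c: rng.choice([-1, 1], size=D) for c in alphabet}

def encode(text, n=3):
    # Bind each character n-gram via position-permuted multiplication,
    # then bundle all n-grams by summation and binarize with sign().
    acc = np.zeros(D)
    for i in range(len(text) - n + 1):
        v = np.ones(D)
        for j, c in enumerate(text[i:i + n]):
            v = v * np.roll(item[c], j)   # roll encodes position in the n-gram
        acc += v
    return np.sign(acc)

def similarity(a, b):
    return float(a @ b) / D               # normalized dot product in [-1, 1]

profile = encode("the quick brown fox jumps over the lazy dog")
query = encode("the dog jumps")
print(similarity(query, profile))
```

Language recognition in HD computing builds one such profile vector per language from training text and classifies a query by nearest profile; the bundling sums are exactly the counting operations the RTM nanowire mechanism accelerates.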

(13)

The article Winograd Convolution for Deep Neural Networks: Efficient Point Selection aims at replacing compute-intensive convolution layers with Winograd convolution, which suffers from poor numeric properties. To address this issue, the article proposes a point-selection technique that exploits cancellation to reduce errors. Evaluations are performed using different sizes of 1-D and 2-D convolution, and error reductions of 22% to 63% are reported.
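
To make the setting concrete, below is the classic Winograd F(2,3) algorithm for 1-D convolution with the conventional point set {0, 1, -1, ∞}. The article's contribution, not shown here, is choosing these points differently so that floating-point errors cancel.

```python
import numpy as np

# Winograd F(2,3): two outputs of a 3-tap 1-D convolution with 4 multiplies
# instead of 6, built from the conventional points {0, 1, -1, inf}.
B_T = np.array([[1,  0, -1,  0],
                [0,  1,  1,  0],
                [0, -1,  1,  0],
                [0,  1,  0, -1]], dtype=float)
G = np.array([[1.0,  0.0, 0.0],
              [0.5,  0.5, 0.5],
              [0.5, -0.5, 0.5],
              [0.0,  0.0, 1.0]])
A_T = np.array([[1, 1,  1,  0],
                [0, 1, -1, -1]], dtype=float)

def winograd_f23(d, g):
    # d: 4 input samples, g: 3 filter taps -> 2 outputs of valid correlation.
    return A_T @ ((G @ g) * (B_T @ d))

d = np.array([1.0, 2.0, 3.0, 4.0])
g = np.array([1.0, 1.0, 1.0])
print(winograd_f23(d, g))                 # matches direct computation: [6. 9.]
```

The transform matrices are derived from the chosen interpolation points, which is why point selection directly controls the magnitude of the constants and hence the accumulated rounding error.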

(14)

The article MHDeep: Mental Health Disorder Detection System based on Wearable Sensors and Neural Networks proposes a framework that collects eight different categories of data from off-the-shelf sensors in smartwatches and phones, performs data augmentation using synthetic data drawn from the same probability distribution, and analyzes the data with neural networks (NNs) to diagnose different mental health disorders, namely schizoaffective disorder, major depressive disorder, and bipolar disorder.

(15)

The article Leveraging Computational Storage for Power-Efficient Distributed Data Analytics demonstrates the performance and power benefits of computational storage drives (CSDs) with in-storage processing. To this end, a software technique is proposed to balance training and inference workloads. A speedup of 3× with an energy reduction of 67% is reported compared to regular SSDs.

(16)

The article A Fall Detection Network by 2D/3D Spatio-temporal Joint Models with Tensor Compression on Edge aims at providing an efficient solution for fall detection, a serious threat in elderly healthcare. A fast video fall-detection network is proposed that uses spatio-temporal joint-point models, LSTMs, and a 3D pose-estimation network to achieve high detection accuracy. To reduce the storage and computation load of edge-based implementations, tensor-train decomposition is applied. Results show significant improvements in accuracy and speed.
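
The compression step can be illustrated with a minimal TT-SVD. The code below is a generic sketch of tensor-train decomposition (the rank cap and the example tensor are arbitrary choices, not the paper's network-specific configuration): sequential truncated SVDs factor a multi-way tensor into small cores whose total size is far below the original.

```python
import numpy as np

def tt_decompose(t, max_rank):
    # TT-SVD: sequential truncated SVDs produce cores of shape
    # (r_prev, n_k, r_next); storing the cores replaces the full tensor.
    shape = t.shape
    cores, r_prev, c = [], 1, np.asarray(t, dtype=float)
    for n in shape[:-1]:
        mat = c.reshape(r_prev * n, -1)
        u, s, vt = np.linalg.svd(mat, full_matrices=False)
        r = min(max_rank, len(s))
        cores.append(u[:, :r].reshape(r_prev, n, r))
        c = np.diag(s[:r]) @ vt[:r]
        r_prev = r
    cores.append(c.reshape(r_prev, shape[-1], 1))
    return cores

def tt_reconstruct(cores):
    # Contract the core chain back into the full tensor.
    out = cores[0]
    for core in cores[1:]:
        out = np.tensordot(out, core, axes=([-1], [0]))
    return out.reshape([c.shape[1] for c in cores])

# Example: a rank-1 tensor (3*4*5 = 60 entries) compresses exactly into
# cores holding 6 + 16 + 10 = 32 entries at max_rank = 2.
t = np.einsum('i,j,k->ijk', np.array([1.0, 2.0, 3.0]),
              np.array([1.0, 0.0, 1.0, 2.0]),
              np.array([2.0, 1.0, 0.0, 1.0, 3.0]))
cores = tt_decompose(t, max_rank=2)
```

Applied to the weight tensors of the detection network, this trade of a large dense tensor for a chain of small cores is what shrinks the model enough for edge deployment.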

(17)

The article FELIX: A Ferroelectric FET Based Low Power Mixed-Signal In-memory Architecture for DNN Acceleration uses a single FeFET as an NVM cell. The proposed architecture provides a high degree of parallelism using 3-bit ADCs, along with flexibility and high utilization, while eliminating the need for a DAC. Simulations in a 22nm FDSOI technology show significant gains in performance-per-power efficiency.

(18)

The article Resource-Efficient Continual Learning for Sensor-Based Human Activity Recognition provides an efficient continual-learning solution using an expandable NN with a replay-based technique that employs a compressed replay memory. Evaluation is performed on four datasets and two microcontrollers.

(19)

The article OnSRAM: Efficient Inter-Node On-Chip Scratchpad Management in Deep Learning Accelerators presents an analysis showing a more than 5× performance gap in DL inference that can be bridged by efficient scratchpad management. The proposed framework, built on a compiler runtime, employs two methods: one for static graphs and one for an eager execution model using history-based speculation. The methods are integrated into TensorFlow, and evaluation across different networks illustrates significant reductions in inference latency and energy consumption over a baseline with no scratchpad management.

As a final note, the guest editors would like to thank all the authors, reviewers, the Editor-in-Chief (EiC), the administrative staff of ACM Transactions on Embedded Computing Systems (TECS), and all other officials and technical staff who were directly or indirectly engaged in making this special issue a big success. A big thanks to the reviewers for their valuable time, expert reviews, and excellent feedback while managing such a large submission load, and especially for providing timely reviews. A special thanks to the authors for their submissions, for comprehensively addressing the reviewers' feedback in a constructive way, and for meeting all the intermediate deadlines and requirements. Finally, we would like to thank the EiC of ACM TECS, Professor Tulika Mitra, and her whole administrative and technical team, for their continuous interactions, timely responses and information exchange, and excellent professional support. Many thanks to all!

Muhammad Shafique, Theocharis Theocharides, Hai (Helen) Li, and Chun Jason Xue
Guest Editors

Published in ACM Transactions on Embedded Computing Systems (TECS), Volume 21, Issue 6, November 2022, 498 pages. Editor: Tulika Mitra. ISSN: 1539-9087, EISSN: 1558-3465, DOI: 10.1145/3561948. Copyright © 2022 held by the owner/author(s). Publisher: Association for Computing Machinery, New York, NY, United States.
