Poster: Home-based, On-Device Non-invasive Obstructive Sleep Apnea Monitoring with Infrared Video

Obstructive sleep apnea (OSA) is a prevalent sleep disorder, affecting approximately one billion individuals globally. In this study, we aim to address the limitations of polysomnography (PSG), the gold standard for OSA diagnosis, by developing SlAction, a non-intrusive system that uses infrared video for OSA detection in daily sleep settings. Considering the privacy-sensitive nature of sleep videos, SlAction analyzes data directly on the capturing device, eliminating the need to transmit video to a server. In collaboration with clinical experts, we extensively analyzed the largest such dataset worldwide, which we collected ourselves, establishing correlations between OSA events and human motions during sleep. Our approach achieves an F1 score of 0.88 for OSA prediction. Notably, even when running on a low-spec CPU, SlAction operates approximately 75 times faster than previous work evaluated on high-performance GPU servers.


INTRODUCTION
Obstructive sleep apnea (OSA) is one of the most prevalent sleep disorders, marked by recurrent obstructions of the upper airway that lead to temporary pauses in breathing (apnea) or shallow breathing (hypopnea) [9]. OSA affects approximately one billion people, about 14% of the global population aged 30 to 69 years [2]. Despite its wide-reaching impact [8], timely identification of OSA remains a significant challenge, primarily due to a lack of awareness of its symptoms, such as snoring and interruptions in breathing during sleep, which often leads to delayed medical intervention [6].
Polysomnography (PSG) serves as the gold standard for evaluating sleep. PSG requires patients to spend a night at a dedicated sleep laboratory with a dozen sensors attached to record various signals. For diagnosing OSA, PSG measures the apnea-hypopnea index (AHI), the number of apnea and hypopnea events occurring per hour of sleep [7]. However, the accuracy of OSA diagnoses using PSG may be compromised by two key issues: (1) the first-night effect, where the unfamiliarity of sleeping with sensors in a new environment skews the initial night's data relative to normal sleep conditions, and (2) nightly variation in respiratory events, which can cause significant shifts in OSA severity, while PSG is usually limited to a single night's observation due to its elaborate and costly setup. To address these limitations, several studies have investigated daily remote sleep monitoring systems using IoT sensors and wearable devices [3, 4], aiming to enable early detection of OSA and comprehensive diagnostic evaluations in healthcare settings.
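To make the AHI definition above concrete, here is a minimal sketch of the computation (our illustration, not part of any PSG tooling):

```python
def apnea_hypopnea_index(n_apneas: int, n_hypopneas: int, sleep_hours: float) -> float:
    """AHI = total apnea + hypopnea events per hour of sleep."""
    if sleep_hours <= 0:
        raise ValueError("sleep duration must be positive")
    return (n_apneas + n_hypopneas) / sleep_hours

# Example: 40 apneas and 20 hypopneas over 6 hours of sleep.
print(apnea_hypopnea_index(40, 20, 6.0))  # 10.0 events per hour
```

A single-night AHI computed this way is exactly what the nightly-variation issue undermines: the same patient can produce noticeably different AHI values on different nights.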

This work studies the potential of an alternative data form for non-intrusive OSA monitoring during daily sleep: infrared sleep video. Our proposal, SlAction, targets the application scenario in Figure 1. A video camera is installed on the wall or ceiling, positioned away from the bed to ensure uninterrupted sleep for the subject. The video streams are processed locally in real time to calculate the AHI automatically, preserving privacy. The device can aggregate AHI results over multiple nights, enabling self-diagnosis and issuing local alerts for medical attention. Additionally, AHI results can be sent to a doctor for remote diagnosis and personalized treatment.

Dataset
We collected infrared sleep videos from three hospitals with varying setups, as shown in Figure 2. Our dataset, the largest globally, includes 729 patients of diverse ages and genders. The videos, synchronized with expert-annotated PSG data, had faces anonymized using mosaic processing for privacy.

Preliminary Study with Clinical Expertise
We conducted a preliminary study in collaboration with clinical experts to develop a method better suited to video analysis than existing direct approaches. Building on this, we opted for indirect detection of OSA, focusing on identifying Respiratory Arousal (RA) events, which can occur within 3 seconds after an apnea/hypopnea incident. Our intuition is that RA events exhibit strong linear correlations with apnea/hypopnea events and involve more noticeable motions. By analyzing RA movement patterns, we found that RA exhibits distinctive features setting it apart from other sleep events, and we identified the potential to differentiate spontaneous arousals, which show similar patterns, using deep learning models.

SlAction
SlAction comprises three modules, illustrated in Figure 3: (1) the Input Data Preprocessor, (2) the RA Detector, and (3) the AHI Estimator. During sleep sessions, a camera continuously records video frames at a low frame rate of 2.5 FPS. Concurrently, the Input Data Preprocessor calculates pixel-wise differences between successive frames and compiles these differences into clips every minute. Both the clip size and the sliding-window step are tailored to the subtle dynamics of Respiratory Arousal (RA). Each newly created clip is analyzed by the RA Detector to determine whether it contains an RA event. We adopt MoViNet [5], a state-of-the-art efficient video recognition model, as our RA Detector. Given the minimal variation observed in sleep videos over short durations, we curated the training dataset to train the RA Detector efficiently. While the RA Detector runs inference on a given clip, the Input Data Preprocessor simultaneously prepares the subsequent clip. Upon completion of the sleep session, the AHI Estimator calculates the apnea-hypopnea index (AHI) from the aggregate count of RA occurrences throughout the sleep period. For this estimation, we employ the Huber Regressor, a robust linear regression model that capitalizes on the linear correlation between the RA count and the AHI.
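The preprocessing and estimation steps above can be sketched as follows. This is a simplified illustration, not the authors' implementation: the sliding-window step and the regression coefficients are placeholder assumptions (in the real system the coefficients are fitted by a Huber Regressor on patient data), and frames are represented as flat pixel lists.

```python
FPS = 2.5                    # low capture frame rate stated in the paper
CLIP_SECONDS = 60            # clips are compiled every minute
FRAMES_PER_CLIP = int(FPS * CLIP_SECONDS)  # 150 difference frames per clip

def frame_differences(frames):
    """Pixel-wise absolute differences between successive frames."""
    return [
        [abs(c - p) for c, p in zip(cur, prev)]
        for prev, cur in zip(frames, frames[1:])
    ]

def sliding_clips(diffs, step):
    """Yield overlapping clips of difference frames (step is an assumption)."""
    for start in range(0, len(diffs) - FRAMES_PER_CLIP + 1, step):
        yield diffs[start:start + FRAMES_PER_CLIP]

def estimate_ahi(ra_count, sleep_hours, slope=1.1, intercept=0.5):
    """Map the nightly RA rate to AHI with a linear model; slope and
    intercept here are placeholders for the fitted Huber coefficients."""
    return slope * (ra_count / sleep_hours) + intercept
```

Differencing frames before inference means the model sees only motion, which suits the subtle movements of RA, and the per-minute clip granularity is what allows the preprocessor to prepare the next clip while the detector runs.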

PRELIMINARY RESULT
OSA prediction. We evaluate our system on 245 patients, 6× more than previous studies [1]. The RA Detector's effectiveness is measured by the area under the ROC curve; a mean AUC of 0.79 indicates effective discrimination between RA and non-RA clips. Using an AHI threshold of 15, a standard in OSA prevalence studies, we classify AHI values: those above 15 indicate OSA, and those below indicate non-OSA. This methodology yields an F1 score of 0.88, marginally surpassing the results of earlier studies.
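The thresholding and F1 scoring described above can be sketched as follows (a minimal illustration, not the evaluation code used in the study; whether the boundary value 15 itself counts as OSA is our assumption):

```python
def classify_osa(ahi, threshold=15.0):
    """AHI at or above the threshold is labeled OSA (threshold from the paper)."""
    return ahi >= threshold

def f1_score(y_true, y_pred):
    """F1 = 2*TP / (2*TP + FP + FN) over binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    if tp == 0:
        return 0.0
    return 2 * tp / (2 * tp + fp + fn)

# Hypothetical example: estimated AHIs vs. PSG-derived ground-truth labels.
preds = [classify_osa(a) for a in [22.0, 8.5, 16.1, 4.0]]
truth = [True, False, True, False]
print(f1_score(truth, preds))  # 1.0 on this toy example
```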
On-device performance. We evaluate SlAction on the resource-constrained Jetson Nano (quad-core ARM A57 CPU and 4 GB 64-bit LPDDR4 memory). For comparison, we analyze a 5-hour recorded video, showing that SlAction on a low-spec CPU runs roughly 75 times faster than previous work evaluated on high-performance GPU servers. This efficiency stems from using a lightweight model with three times fewer parameters and a system design that reduces the number of inferences by 60 times. We summarize the overall results in Table 1.

CONCLUSION
This work is the first exploration of non-contact, on-device OSA diagnosis using infrared video. Leveraging the largest dataset of sleep videos and clinical insights, we developed a novel approach that detects OSA indirectly via RA events. Our method combines a bespoke data design with a lightweight DNN, enabling on-device detection of subtle sleep movements. This pioneering use of infrared video paves the way for advances in sleep disorder diagnostics and broader access to sleep medicine.

Figure 1 :
Figure 1: Application scenario of SlAction. We envision daily sleep-based early diagnosis of obstructive sleep apnea by addressing various problems in the current gold standard, polysomnography.

Figure 2 :
Figure 2: Example images in the dataset, showing various camera angles and room environments across hospitals, as well as diverse body sizes and sleep habits of patients.

Figure 3 :
Figure 3: SlAction overview and on-device operation.

Prior approaches also suffer from low reliability due to training and testing on small samples of around 40 patients. We designed SlAction to be sufficiently lightweight to run on resource-constrained edge devices and robust enough to ensure reliable performance in a variety of environments.

Table 1 :
Performance Comparison of Previous Methods (on GPU Servers) vs. Proposed Approach (on Edge Devices)