skip to main content
research-article
Open Access

MHDeep: Mental Health Disorder Detection System Based on Wearable Sensors and Artificial Neural Networks

Published:12 December 2022Publication History

Skip Abstract Section

Abstract

Mental health problems impact the quality of life of millions of people around the world. However, diagnosis of mental health disorders is a challenging problem that often relies on self-reporting by patients about their behavioral patterns and social interactions. Therefore, there is a need for new strategies for diagnosis and daily monitoring of mental health conditions. The recent introduction of body-area networks consisting of a plethora of accurate sensors embedded in smartwatches and smartphones and edge-compatible deep neural networks (DNNs) points toward a possible solution. Such wearable medical sensors (WMSs) enable continuous monitoring of physiological signals in a passive and non-invasive manner. However, disease diagnosis based on WMSs and DNNs, and their deployment on edge devices, such as smartphones, remains a challenging problem. These challenges stem from the difficulty of feature engineering and knowledge distillation from the raw sensor data, as well as the computational and memory constraints of battery-operated edge devices. To this end, we propose a framework called MHDeep that utilizes commercially available WMSs and efficient DNN models to diagnose three important mental health disorders: schizoaffective, major depressive, and bipolar. MHDeep uses eight different categories of data obtained from sensors integrated in a smartwatch and smartphone. These categories include various physiological signals and additional information on motion patterns and environmental variables related to the wearer. MHDeep eliminates the need for manual feature engineering by directly operating on the data streams obtained from participants. Because the amount of data is limited, MHDeep uses a synthetic data generation module to augment real data with synthetic data drawn from the same probability distribution. We use the synthetic dataset to pre-train the weights of the DNN models, thus imposing a prior on the weights. We use a grow-and-prune DNN synthesis approach to learn both architecture and weights during the training process. We use three different data partitions to evaluate the MHDeep models trained with data collected from 74 individuals. We conduct two types of evaluations: at the data instance level and at the patient level. MHDeep achieves an average test accuracy, across the three data partitions, of 90.4%, 87.3%, and 82.4%, respectively, for classifications between healthy and schizoaffective disorder instances, healthy and major depressive disorder instances, and healthy and bipolar disorder instances. At the patient level, MHDeep DNN models achieve an accuracy of 100%, 100%, and 90.0% for the three mental health disorders, respectively, based on inference that uses 40, 16, and 22 minutes of sensor data collection from each patient.

Skip 1INTRODUCTION Section

1 INTRODUCTION

Mental health problems impact around 20% of the world population [50]. They may negatively affect a person’s mind, emotions, behavior, and even physical health. Mental health issues may include various disorders like bipolar, depression, schizophrenia, and attention-deficit hyperactivity, to name but a few. These disorders affect not only adults but also children and adolescents [43]. Moreover, patients with serious mental health issues have a higher risk of morbidity due to physical health problems.

In order to understand the mental health condition of the patient and provide suitable patient care, early detection is essential. However, this remains a public health challenge. While many other diseases can be diagnosed based on specific medical tests and laboratory measurements, detection of mental health problems mainly relies on self-reports and responses to specific questionnaires designed for identifying certain patterns of behavior and social interactions. Hence, to address this challenge, novel detection strategies are needed.

There has been recent interest in employing machine learning to detect mental health conditions [16]. Neural networks (NNs) are popular machine learning models that use nonlinear computations to make inferences from large datasets. Thus, they have started being deployed in the smart healthcare domain [4, 23, 25, 36, 60, 61].

In previous studies, two main data sources for deep-learning-based analysis of mental health have been clinical data and social media usage data. The former includes studies that use neuro-image data for detecting various mental health disorders [48], electroencephalogram (EEG) data to study brain disorders [3], and analysis of electronic health records (EHRs) to study mental health problems [19]. Moreover, social media usage patterns have been used to predict the personal traits of the user. As a result, several recent works focus on exploiting such patterns to detect psychiatric illness [7].

Although the above works have demonstrated the promise of using machine learning in identifying mental health disorders, daily mental health monitoring is still a challenge. Because mental health condition treatment delays may lead to negative outcomes, potentially even loss of life, it is desirable to have immediate and pervasive mental health detection. This is the motivation behind our mental health detection system, MHDeep. As shown in Figure 1, MHDeep relies on physiological data collected using various wearable medical sensors (WMSs). WMSs can be used to continuously monitor the physiological signals of the wearer throughout the day. This enables constant tracking of the health conditions of the user. MHDeep uses various sensors embedded in smartwatches and smartphones. For training purposes, the collected physiological data are processed to obtain a comprehensive dataset. MHDeep combines data from WMSs with the inference capabilities of deep neural networks (DNNs) to directly extract the mental health condition from the physiological signals. These inferences can be communicated to a health server that is accessible to the physician. This has the potential to enhance the ability of the physician to intervene quickly when mental health conditions deteriorate.

Fig. 1.

Fig. 1. MHDeep mental health disorder detection system.

The difficulty of data collection and labeling limits the amount of available data. Hence, reducing the cost of this process is of great importance [17]. To this end, MHDeep uses synthetic data drawn from the same probability distribution as real data to augment the dataset. It also leverages a grow-and-prune DNN synthesis approach [14, 27] to train accurate and computationally efficient DNN models to detect the mental health condition of the user.

The major contributions of this article are summarized next.

  • We demonstrate an easy-to-use, accurate, and pervasive mental health disorder detection system, called MHDeep. MHDeep combines physiological signals collected from WMSs with the prediction power of DNNs to detect three main mental health disorders: bipolar, major depressive, and schizoaffective. Unlike many other approaches for detecting mental health problems, MHDeep does not rely on any self-reports from the user.

  • We do an extensive search to extract the most appropriate set of data categories for detecting each of the three mental health disorders.

  • MHdeep relies on a synthetic data generation module to alleviate the concerns arising from the unavailability of large datasets. It uses a grow-and-prune DNN synthesis approach to improve the accuracy of the DNN models while reducing their computational costs.

  • We demonstrate the performance, accuracy, and feasibility of MHDeep through extensive evaluations.

The rest of the article is organized as follows. Section 2 presents background information on various works related to MHDeep. Section 3 explains the MHDeep framework thoroughly. Section 4 provides implementation details. Section 5 presents experimental evaluations. Section 6 provides a short discussion related to this work. Finally, Section 7 concludes the article.

Skip 2BACKGROUND Section

2 BACKGROUND

In this section, we first provide background information on various mental health disorders and how they affect patient lives. Next, we discuss various methods for identifying mental health conditions based on machine learning. We also discuss some of the related work on synthesizing efficient DNN models. Finally, we discuss WMSs and their applications to various disease diagnosis frameworks.

2.1 Mental Health Disorders and Their Impact

Mental health conditions can affect a patient’s thinking, feeling, and behavior. They may have a deep impact on the daily life of the person and affect their ability to adequately perform in society. There are hundreds of different mental illnesses [31]. We discuss the three mental health disorders that we target in this work: bipolar, major depressive, and schizoaffective. These are among the major mental health illnesses and have also been the subject of other related studies [37].

Bipolar disorder can cause a dramatic shift in a person’s mood, energy, and behavior. It is characterized by experiences of alternating episodes of manic and depressed states. Major depressive disorder may present different symptoms like loss of interest, sleep disturbance, change in appetite, and feeling of fatigue. Schizoaffective disorder is characterized by various symptoms of schizophrenia such as episodes of hallucinations and delusions. It may also present other symptoms such as disorganized thinking, depressed mood, and manic behavior.

Apart from various conditions that these mental health disorders may cause, stereotypes related to mental health seem to still be widely prevalent in society, not just among uninformed people but even among well-trained professionals [12]. These stereotypes often lead to social and employment discrimination [8] and poor treatment of physical health problems [13].

2.2 Deep Learning for Mental Health

Deep learning has been recently used to better understand and detect mental health problems. Deep learning approaches have been applied to various types of data: mainly clinical data and social media usage data [51]. The three types of clinical data used in these works are neuroimage data, EEG data, and EHR data.

Several studies demonstrate the effectiveness of neuroimages in detecting neuropsychiatric disorders [48]. Two types of neuroimage data used in such works are functional magnetic resonance imaging (fMRI) and structural magnetic resonance imaging (sMRI). fMRI measures brain activity by monitoring blood oxygenation and flow in response to neural activity. sMRI examines the anatomy and pathology of the brain. Deep belief networks have been used to detect the presence of attention-deficit hyperactivity disorder (ADHD) using fMRI and sMRI data [29, 30]. These data types have also been used to detect schizophrenia [42, 62]. Depression has been detected using time-series fMRI data using convolutional neural networks (CNNs) and autoencoders [18]. EEG is another source of data for studying brain disorders. For example, CNN-based feature extraction from EEG data has been used to detect depression [3]. EHR is a collection of patient-centered records and includes both structural data such as laboratory reports and unstructured data such as clinical and discharge notes. Because EHR is a collection of longitudinal records, recurrent neural networks (RNNs) have been used to distill information from them. Pham et al. [41] use RNN architectures to predict future outcomes of depressive episodes. Unstructured clinical notes have also been analyzed with deep-learning-based models to detect depression [19]. Social media usage data have also proved their usefulness in identifying psychiatric illnesses. Birnbaum et al. [7] investigate Facebook messages and patterns of sharing images on social media to distinguish among healthy individuals, individuals with a schizophrenia spectrum disorder, and individuals with mood disorders. Other works have used DNN models with textual data and image data shared on social media platforms to detect stress [32], depression [45], and risk of suicide [11].

MHDeep relies on generating the synthetic data from the same distribution as the real data. However, there are several works that develop compact and accurate dynamical models to capture the characteristics of the system under study. The synthetic data can be generated based on these mathematical models. Xue et al. [58] proposed a multivariate fractal model to capture the long-range memory and spatial dependencies that exist in biological processes and showed the benefits of this approach within the context of the brain-machine-body interface. The authors of [56] proposed a mathematical strategy for constructing models for complex non-linear dynamics. In addition, Xue et al. [57] proposed a polynomial algorithm to obtain a sub-optimal solution with optimality guarantees for the NP-hard problem of determining the minimum number of sensors to enable the recovery of global data dynamics. Although MHDeep uses deep learning for mental health disorder diagnosis, it is worth mentioning that deep learning has also been used in other use cases such as in vaccine discovery for SARS-CoV-2 [59].

2.3 Efficient Neural Network Synthesis

Next, we summarize the main synthesis approaches for obtaining compact DNN models. Conventional synthesis methods are based on the use of efficient building blocks. For example, MobileNetV2 [46] leverages inverted residual blocks to reduce model size and computations significantly. Wu et al. [55] use shift-based operations rather than convolution layers to significantly reduce computational costs of the model. The main drawback of such approaches is the need for considerable design insight and trial-and-error process to design such efficient building blocks. Network compression is another approach for the design of efficient models. It removes the need for design insights. Network pruning is a widely used method that eliminates weights or filters that do not enhance model performance. Han et al. [22] have shown the effectiveness of pruning in removing redundancy in CNNs and multilayer-perceptron architectures. Grow-and-prune DNN synthesis uses network growth followed by network pruning in an iterative process to improve model performance while ensuring its compactness [14, 27].

Another recent approach relies on the use of reinforcement learning (RL) to search for DNN architectures in an automated flow. It is known as neural architecture search (NAS) [63]. NAS generally uses a controller, e.g., an RNN, to iteratively generate candidate architectures in the search process. The RL controller is improved based on candidate performance. As an example, MnasNet [52] uses an RL-based approach to develop efficient DNNs for mobile platforms. However, the downside of the RL-based NAS approach is that it is computationally intensive. FBNet [54] uses the Gumbel softmax function to optimize weights and connections using a single objective function. NEAT [49] uses evolutionary algorithms to generate optimized and increasingly complex architectures over multiple generations. Combining efficient evolutionary search algorithms with various performance predictors, e.g., for accuracy, energy, and latency, is another approach for synthesizing accurate yet compact CNNs and DNNs [15, 24].

2.4 Wearable Medical Sensors

Due to recent developments in low-power sensor design and efficient wireless communication, battery-powered WMSs are becoming ubiquitous. More than 123 million WMSs were sold worldwide in 2018. This number is projected to grow to 1 billion by the end of 2022 [2]. WMSs can track different aspects of human health including heart rate, body/skin temperature, respiration rate, blood pressure, EEG, electrocardiogram (ECG), and Galvanic skin response (GSR) [5]. Furthermore, the number of physiological signals that can be measured using WMSs keeps growing every year.

WMSs have begun to be used in many smart healthcare applications. CodeBlue [33] is a sensor network that collects vital health signs and transmits them to the healthcare provider. MobiHealth [53] is a WMS-based body-area network (BAN) that realizes an end-to-end mobile health monitoring platform. Yin et al. [61] use WMSs for pervasive diagnosis of Type I and Type II diabetes. CovidDeep [25] is a WMS-based framework for quick detection of SARS-CoV-2/COVID-19.

For data collection in MHDeep, we use an Empatica E4 smartwatch [1] to record a subset of patients’ physiological signals. It is a wearable wireless device designed for comfortable, continuous, and real-time data acquisition. We also use a smartphone to simultaneously record signals related to motion information and environmental variables. Because the DNNs developed for diagnosing various mental health conditions can reside on the smartphone, use of a smartwatch/smartphone-based BAN can enable accurate, yet convenient, disease diagnosis and continuous healthcare monitoring.

Skip 3METHODOLOGY Section

3 METHODOLOGY

In this section we describe various parts of the MHDeep framework. First, we give an overview of our approach. Then, we discuss the data collection and preparation process, synthetic data generation, and grow-and-prune DNN synthesis.

3.1 The MHDeep Framework

We illustrate the MHDeep framework in Figure 2. The input data are derived from the physiological signals collected using various WMSs in the smartwatch and smartphone in a non-invasive, passive, and efficient manner. The list of collected data streams include GSR, skin temperature (ST), inter-beat interval (IBI), and three-way acceleration (tri-axial accelerometer) from the smartwatch. In addition, some information related to the motion patterns of the user and ambient information are collected using smartphone sensors. This includes ambient temperature, gravity, acceleration, and angular velocity. After sensor data collection, the collected signals are synchronized, aggregated, and merged into a comprehensive data input for subsequent analysis. To enhance the accuracy of subsequent analysis and improve noise tolerance, we normalize the data. The process of data collection and preparation is discussed in more detail in Section 3.2. When the size of the training dataset is small, it can be useful to generate a synthetic dataset from the same probability distribution as the real training dataset. MHDeep leverages Gaussian mixture model (GMM)-based density estimation to generate the synthetic data. Then, it uses grow-and-prune DNN synthesis to generate inference models that are both accurate and computationally efficient. Section 3.3 discusses the MHDeep DNN synthesis process in detail. MHDeep generates DNN architectures that are efficient enough to be deployed on the edge devices such as smartphones or smartwatches. Section 3.4 discusses the inference process of the MHDeep DNN models for diagnosis and daily monitoring of mental health issues.

Fig. 2.

Fig. 2. Schematic diagram of the MHDeep framework.

3.2 Data Collection and Preparation

We collected WMS data from a total of 74 adult participants at the Hackensack Meridian Health Carrier Clinic, Belle Mead, New Jersey. The participants were diagnosed by medical professionals at the clinic. The 74 participants comprised the following four categories: 25 healthy participants (no mental health disorder), 23 participants with bipolar disorder, 10 participants with major depressive disorder, and 16 participants with schizoaffective disorder. The experimental procedure for data collection and analysis was approved by the Institutional Review Board of Princeton University. The physiological signals of the participants were captured by a commercially available Empatica E4 smartwatch [1] and a Samsung Galaxy S4 smartphone, as shown in Figure 3. First, we collected data from an extensive range of sensors embedded in both the smartwatch and smartphone. We analyzed the mean value and standard deviation of collected data from each sensor for different cohorts. We identified a final set of eight sensors as being the most informative in terms of distinguishing between these four cohorts. Table 1 summarizes the final set of data types that we used in this study. The physiological signals are derived from WMSs embedded in the smartwatch. They include GSR that measures sympathetic nervous system arousal, IBI that indicates the heart rate, ST that provides skin temperature readings, and three-axis accelerometer (Acc-W) that measures acceleration in the \(x\), \(y\), and \(z\) directions. The information collected from these sensors is useful for detecting various mental disorders. For example, the electrodermal response can be used as a feature to detect patients affected by depression disorder or to detect different mood disorders [47]. Bipolar disorder is associated with cardiac autonomic dysregulation [9] that has an impact on IBI. Skin temperature can be used as a feature to detect stress [28] and bipolar [39] disorders. In addition to the physiological signals, ambient and motion information is also captured using sensors in the smartphone. These include ambient temperature (Temp), gravity (Grav), acceleration (Acc-P), and angular velocity (Vel). The motion and ambient information may also be informative in detecting the mental state of the user. For example, Berle et al. [6] show that motor activities of schizophrenic and depressed patients are significantly reduced. In addition to the motion information, ambient temperature can also impact the mental state of the individual and affect the severity of the mental disorder. Mullins and White [38] argue that cold temperatures can reduce the adverse effect of mental disorders, whereas hot temperatures can exacerbate them. In addition, it is worth mentioning that the acceleration sensors in the smartphone and smartwatch have different sampling rates and capture different motion information.

Fig. 3.

Fig. 3. An Empatica E4 smartwatch (left) and Samsung Galaxy S4 smartphone (right) used in the data collection process.

Table 1.
Data TypeSampling Rate (Hz)Data Source
Galvanic skin response (\(\mu\)S)4Smartwatch
Skin temperature (\(^\circ C\))4Smartwatch
Inter-beat interval (\(ms\))1Smartwatch
Acceleration (\(x, y, z\))32Smartwatch
Ambient temperature (\(^\circ C\))5Smartphone
Gravity (\(x, y, z\))5Smartphone
Acceleration (\(x, y, z\))5Smartphone
Angular velocity (\(x, y, z\))5Smartphone

Table 1. Data Types Collected in the MHDeep Framework

Before data collection, all participants are informed about the experiment and are asked to sign a consent form. The data collection setup consists of placing the Empatica E4 smartwatch on the wrist of the participant’s non-dominant hand and placing the Samsung Galaxy S4 smartphone in the opposite front pocket. For all participants, we maintain the same orientation for the phone. Data collection lasts around 1.5 hours, during which time the participant is allowed to freely move around in the room with their on-body devices. During this time, the smartwatch and smartphone continuously record and store physiological signals and ambient/motion information. At the end of the data collection period, we remove the smartwatch from the patient’s wrist and the smartphone from their pocket. We use the Empatica E4 Connect portal for smartwatch data retrieval. We use a private Android application to download the smartphone data streams. All of the recorded data are timestamped at the time of sampling.

Next, we preprocess the dataset for use in DNN training. We first synchronize the smartwatch and smartphone data streams for each participant. This is necessary because the WMS data streams may vary in their start times and frequencies. Then, we divide the data for each participant into 15-second windows. This window size was chosen based on experiments with the validation set, as discussed later. Each 15-second window of the combined smartwatch/smartphone data constitutes one data instance. There is no time overlap between data instances. To obtain each data instance, we flatten and concatenate the data within the same time window from both the smartwatch and smartphone. This results in a feature space of dimension 2,325. The smartwatch (smartphone) contributes 1,575 (750) features. All the smartphone sensors have a sampling rate of 5Hz. In addition, the smartwatch sensors include one data stream at 32Hz, two data streams at 4Hz, and one data stream at 1Hz.

Because the participants are in a room during the data collection process, they do not enjoy a wide range of motions, and hence the higher sampling rates for the smartphone sensors are not needed. In addition, the Empatica E4 used for data collection is a medical-grade smartwatch that is designed to capture various physiological signals with their optimal sampling rates. Although collecting data from more sensors at higher sampling rates may provide more information, unnecessarily high sampling rates can lead to a decrease in the battery life of the device. In addition, by targeting a window of 15 seconds for each data instance, we can remedy the low sampling frequency of some of the sensors by considering multiple sensor readings in each data instance.

For each classification task, because the number of individuals in each of the four categories (healthy and three disorders) is small, we created three different data partitions for evaluation. We used circular shifts on a list of numbers denoting patients in each category to create these three data partitions. The value of the stride used for the circular shift is equal to the number of test individuals in each group (as explained later). The data instances extracted from the individuals in each of the four groups (healthy, schizoaffective, depressive, and bipolar) were divided into three sets: training, validation, and test. To evaluate the models on different unseen patients, data instances included in the training, validation, and test sets came from different individuals; i.e., no individual contributed data to more than one of these sets. Among the healthy participants, for each of the three data partitions, data instances from 15 individuals (60% of the healthy participants) are selected for the training set, from 5 individuals (20% of the healthy participants) for the validation set, and from the remaining 5 individuals (20% of the healthy participants) for the test set. For individuals with bipolar disorder, the training, validation, and test sets contain data instances from 13, 5, and 5 participants, respectively. Among the participants who had major depressive disorder, data instances from 6 participants are selected for the training set and from 2 participants each for the validation and test sets. For individuals with schizoaffective disorder, the training, validation, and test sets include data instances from 10, 3, and 3 participants, respectively.

We create the final dataset for each binary classification task (healthy vs. the mental health disorder) by combining the training, validation, and test sets of the two classes involved in that task. We use SMOTE [10] to up-sample data instances from the minority class. The up-sampling is applied only to the training set. Table 2 shows the number of instances for each classification task for all three data partitions.

Table 2.
Classification TaskData PartitionTraining InstancesValidation InstancesTest Set
(# Individuals)(# Individuals)(# Individuals)
Healthy vs. Bipolar114,828 (28)3,582 (10)3,754 (10)
Healthy vs. MDD114,828 (21)2,414 (7)2,515 (7)
Healthy vs. Schizo.114,828 (25)2,789 (8)3,047 (8)
Healthy vs. Bipolar213,330 (28)3,922 (10)3,582 (10)
Healthy vs. MDD213,330 (21)3,266 (7)2,414 (7)
Healthy vs. Schizo.213,330 (25)3,773 (8)2,789 (8)
Healthy vs. Bipolar312,054 (28)4,102 (10)3,922 (10)
Healthy vs. MDD312,054 (21)3,088 (7)3,266 (7)
Healthy vs. Schizo.312,054 (25)3,522 (8)3,773 (8)
  • MDD: major depressive disorder.

Table 2. Details of Various Datasets

  • MDD: major depressive disorder.

3.3 MHDeep DNN Synthesis

Figure 4 shows the DNN architectures used in the MHDeep framework. The architectures receive the input data at the bottom and make their diagnostic decisions at the top. For the healthy vs. major depressive disorder and healthy vs. schizoaffective disorder binary classification tasks, the DNN architecture has four layers with a width of 256, 128, 128, and 2, respectively. For the healthy vs. bipolar disorder binary classification task, we use an DNN architecture with five layers with a width of 256, 128, 64, 32, and 2, respectively. We selected these architectures by verifying the performance of various DNNs (with different numbers of layers and numbers of neurons per layer) on the validation set and picking the best-performing one. These architectures are initially fully connected.

Fig. 4.

Fig. 4. Architecture of MHDeep DNNs for (a) healthy vs. major depressive disorder and healthy vs. schizoaffective disorder and (b) healthy vs. bipolar disorder.

We then apply three sequential steps: (1) synthetic data generation to mimic the distribution of the real training data, (2) pre-training of the DNN architectures with the synthetic data, and (3) grow-and-prune DNN synthesis to reduce the redundancy of the model while improving its performance. Next, we discuss each step in more detail.

(1)

Synthetic data generation: In this step, we generate a synthetic dataset that mimics the probability distribution of the real training dataset. Figure 5 illustrates the synthetic data generation process. We utilize the approach proposed in [26] in order to alleviate the need for large datasets to train DNN architectures. We first use GMM to estimate the density of the training dataset. We optimize the number of mixtures in the GMM by monitoring the likelihood of validation data. We choose the number of components that maximizes the following criterion: \(\begin{equation*} N^{*} = \underset{N}{\arg \max } \left(\texttt {GMM}_{N}(x).\texttt {score}(X_{validation}) \right)\!. \end{equation*}\) Finally, we train the optimal GMM model with \(N^{*}\) mixtures on the combination of the training and validation datasets. By sampling this model, we are able to generate the synthetic data: \(\begin{equation*} X^{*} = GMM_{N^{*}}(X_{total}).sample(). \end{equation*}\) For our experiments, we generate 100,000 synthetic data instances. The final step is labeling of the synthetic dataset. We use a traditional machine learning model for this purpose. We evaluate various models, e.g., the support vector machine and random forest models based on different splitting criteria (such as Gini index and entropy), and different depth limits on the decision trees, on the validation set. The model with the highest accuracy is used to label the synthetic data. Note that because synthetic data are only used to pre-train a DNN (with subsequent training with real data), the accuracy of the support vector machine or random forest model is not a critical factor.

Schematic diagram of the MHDeep synthetic data generation process.

(2)

DNN pre-training: In this step, we use the labeled synthetic data to obtain a prior on the weights of the DNN architecture by pre-training them. The intuition behind this step is that pre-training the DNN provides a suitable inductive bias to the parameters of the DNN. As a result, we can commence with the final training stage with a better weight initialization. Therefore, it alleviates the need for large training datasets. By using this methodology, we obtain models that are more accurate than both the traditional machine learning model used for labeling and the DNN model trained only on the real dataset.

(3)

Grow-and-prune DNN synthesis: MHDeep uses a grow-and-prune DNN synthesis paradigm to train the models. Algorithm 1 summarizes this process. It uses a mask-based approach. For each weight matrix, there is an associated binary mask of the same size that is used to disregard dormant connections in the architecture. It applies magnitude-based pruning and full growth to fully connected DNNs iteratively. For magnitude-based pruning, a hyperparameter \(\alpha\) is used to depict the pruning ratio. We prune a connection if and only if its weight is in the lowest \(\alpha *100\%\) of the weights in its associated layer. Finally, for the pruned connections, we set the weight and its binary mask both to 0. Because connection pruning is an iterative process, we retrain the network to recover its performance after each pruning iteration. We then grow the network to restore all its connections. In our experiments, after each architecture-changing operation, we train the DNN for 20 epochs. In addition, we set the number of iterations to 5. We evaluate the model on the validation set after each epoch, and we record the learned weights and masks of the pruned model with the highest validation accuracy.

3.4 MHDeep Inference Process

The trained DNN models can be used for diagnosis or daily monitoring of the mental state of the user based on a collection of physiological signals and ambient information during the day. The collected data streams are processed based on the steps explained in Section 3.2. We feed the processed data to the MHDeep DNN models that predict the mental health condition of the user. This information can then be sent to the physician for early treatment.

Skip 4IMPLEMENTATION DETAILS Section

4 IMPLEMENTATION DETAILS

In this section, we give an overview of the hardware and software packages we used in the implementation of the MHDeep framework. We have implemented the data processing and preparation parts of the MHDeep framework in Python and the MHDeep DNN synthesis framework in PyTorch. We use the Nvidia Tesla P100 data center accelerator for DNN training and evaluation. We use the cuDNN library to accelerate GPU processing. For training, we use a stochastic gradient descent (SGD) optimizer with a learning rate of 5e-4 and a batch size of 256. We use 100,000 synthetic data instances to pre-train the network architecture. In the grow-and-prune synthesis phase, we train the network for 20 epochs each time the architecture changes. We use an SGD optimizer with an initialized learning rate of 1e-4 that we halve in each succeeding iteration. We apply network-changing operations over five iterations.

Skip 5EVALUATION Section

5 EVALUATION

In this section, we analyze the performance of MHDeep DNN models for diagnosing three mental health disorders. This entails three binary classifications: (1) schizoaffective disorder vs. healthy individuals, (2) major depressive disorder vs. healthy individuals, and (3) bipolar disorder vs. healthy individuals. For each classification task, we use three different data partitions, each partition with data instances obtained from different individuals in the training, validation, and test sets.

The MHDeep DNN models are evaluated with four different metrics: test accuracy, false-positive rate (FPR), false-negative rate (FNR), and F1 score. Accuracy measures overall classification performance. It is simply the ratio of all the correct predictions on the test data instances and the total number of such instances. FPR and FNR measure how often healthy individuals are declared to have the corresponding mental health condition and vice versa, respectively. In addition to these four metrics, we also report the TPR (sensitivity) and the TNR (specificity) values that are equal to 1 minus the FNR and FPR values, respectively.

We conduct two different performance evaluations: at the data instance level and the patient level. First, we report the performance of the MHDeep DNN models in detecting each of the three mental health disorders at the data instance level. Next, we evaluate the accuracy of the models in detecting mental health disorders at the patient level.

5.1 MHDeep Performance Evaluation at the Data Instance Level

We first analyze the performance of the three binary classifiers. We begin by training DNN models on features obtained from subsets of the eight data categories presented in Table 1. We analyze all the subsets of the eight data categories and report results for the top models. Because there are eight data categories, there are 256 subsets, with one being the null subset. We evaluated the remaining 255 subsets. This helps distinguish the impact of each data category and find the most effective combination of categories for each classification task. Next, we highlight the best-performing data categories for each of the three classification tasks. We then compare the performance of MHDeep DNNs with that of traditional machine learning models. Furthermore, we conduct an ablation study that shows the impact of each step of MHDeep DNN training.

Table 3 shows the results of classification between healthy and schizoaffective data instances. The best data category subset, in this case, achieves an average test accuracy of 90.4%. We also report test accuracy, FPR, FNR, TPR, TNR, and F1 score for each of the three data partitions. The model reaches the highest test accuracy of 93.3% on the second data partition. Furthermore, for the healthy instances, the top model achieves a low average FPR of 6.5%, demonstrating its effectiveness in avoiding false alarms. For the schizoaffective instances, the model achieves an average FNR of 16.9%, indicating reasonable effectiveness in raising alarms when schizoaffective disorder does occur. We report the number of parameters (#params) and floating-point operations (FLOPs) required for each model. We also compare #params and FLOPs of the models with those of the fully connected baselines. As can be seen, using the grow-and-prune DNN synthesis approach enables us to reduce both #params and FLOPs, leading to a reduction in memory and computational requirements.

Table 3.
Data CategoryData Partition# Params (Compression)FLOPs (Compression)Acc.FPR (TNR)FNR (TPR)F1 Score
(Acc-P, Temp, Vel,1275.0k (2.1\(\times\))549.5k (2.0\(\times\))91.211.4 (88.6)5.8 (94.2)89.5
Acc-W, GSR, IBI)2300.0k (1.9\(\times\))599.5k (1.9\(\times\))81.46.6 (93.4)37.8 (62.2)72.0
3275.0k (2.1\(\times\))549.5k (2.0\(\times\))87.32.4 (97.6)34.1 (65.9)84.2
Average86.66.8 (93.2)25.9 (74.1)81.9
(Acc-P, Temp, Vel,1200.0k (2.8\(\times\))399.5k (2.8\(\times\))90.513.0 (87.0)4.4 (95.6)89.3
Acc-W, GSR)2300.0k (1.9\(\times\))599.5k (1.8\(\times\))93.34.0 (96.0)13.3 (86.7)89.8
3250.0k (2.3\(\times\))499.5k (2.2\(\times\))87.52.5 (97.5)32.9 (67.1)77.9
Average90.46.5 (93.5)16.9 (83.1)85.7
(Acc-P, Temp, Grav, Vel,1300.0k (2.1\(\times\))599.5k (2.0\(\times\))88.314.4 (85.6)7.7 (92.3)86.8
Acc-W, GSR, IBI)2350.0k (1.8\(\times\))699.5k (1.8\(\times\))82.65.9 (94.1)45.7 (54.3)66.3
3300.0k (2.1\(\times\))599.5k (2.0\(\times\))88.73.9 (96.1)26.4 (73.6)81.1
Average86.58.1 (91.9)26.6 (73.4)78.1

Table 3. Test Accuracy, FPR, FNR, and F1 Score (All in %) for Top Data Categories for Classification between Healthy and Schizoaffective Disorder Data Instances

We present the results for classification between healthy and major depressive disorder instances in Table 4. The data category subset with the best performance achieves an average test accuracy of 87.3%. This model achieves the highest accuracy of 91.2% on the second data partition. It achieves an average FPR (FNR) of 6.8% (29.3%).

Table 4.
Data CategoryData Partition# Params (Compression)FLOPs (Compression)Acc.FPR (TNR)FNR (TPR)F1 Score
(Temp, Grav, Vel, GSR)175.0k (2.7\(\times\))149.5k (2.4\(\times\))89.04.6 (95.4)26.7 (73.3)79.4
2120.0k (1.7\(\times\))239.5k (1.5\(\times\))90.54.5 (95.5)21.7 (78.3)82.7
3150.0k (1.3\(\times\))299.5k (1.2\(\times\))81.712.7 (87.3)37.9 (62.1)60.2
Average87.17.3 (92.7)28.8 (71.2)74.1
(Acc-P, Grav, Vel, GSR)1120.0k (2.0\(\times\))239.5k (1.8\(\times\))88.26.5 (93.5)24.8 (75.2)78.7
2145.0k (1.6\(\times\))289.5k (1.5\(\times\))91.21.9 (98.1)25.7 (74.3)83.0
3185.0k (1.3\(\times\))369.5k (1.2\(\times\))82.411.9 (88.1)37.5 (62.5)61.3
Average87.36.8 (93.2)29.3 (70.7)74.3

Table 4. Test Accuracy, FPR, FNR, and F1 Score (All in %) for Top Data Categories for Classification between Healthy and Major Depressive Disorder Data Instances

Table 5 presents the results for classification between healthy and bipolar disorder instances. In this case, the model trained on the best data category subset achieves an average test accuracy of 82.4%, with an FPR (FNR) of 16.7% (20.7%).

Table 5.
Data CategoryData Partition# Params (Compression)FLOPs (Compression)Acc.FPR (TNR)FNR (TPR)F1 Score
(Acc-P, Temp, Grav, Vel,1500.0k (1.2\(\times\))999.5k (1.2\(\times\))76.147.1 (52.9)2.8 (97.2)81.0
Acc-W, IBI, ST)2480.0k (1.3\(\times\))959.5k (1.3\(\times\))81.41.0 (99.0)34.3 (65.7)78.9
3400.0k (1.6\(\times\))799.5k (1.5\(\times\))89.82.1 (97.9)25.1 (74.9)83.8
Average82.416.7 (83.3)20.7 (79.3)81.2
(Temp, Grav, Vel,1490.0k (1.2\(\times\))979.5k (1.1\(\times\))75.647.1 (52.9)3.8 (96.2)80.5
Acc-W, GSR, IBI)2450.0k (1.3\(\times\))899.5k (1.2\(\times\))81.60.9 (99.1)34.2 (65.8)79.0
3380.0k (1.5\(\times\))759.5k (1.5\(\times\))87.02.1 (97.9)33.0 (67.0)78.4
Average81.416.7 (83.3)23.7 (76.3)79.3

Table 5. Test Accuracy, FPR, FNR, and F1 Score (All in %) for Top Data Categories for Classification between Healthy and Bipolar Disorder Data Instances

As the results for these three classification tasks show, there is some variability in test performance on the three data partitions. Because the number of represented individuals in each classification task is limited and each partition consists of different individuals in the training, validation, and test sets, performance variability on the test set can be expected. Hence, we report average values across the three data partitions for each metric. In addition, by comparing the results of the three classification tasks, we can infer that it is more difficult to detect bipolar disorder than the other two disorders. This may be due to the nature of bipolar disorder, which may induce various mood changes. As a result, the physiological signals of the patient can change during these changes of moods, leading to a more difficult diagnosis. It is also worth mentioning that the subset of data categories that leads to the best-performing models for each classification task differs from each other. This points to the need for accurate sensor subset selection.

Next, we compare the performance of our methodology with that of traditional machine learning methods. We present the results for the first data partition in Table 6. We report the results for two data categories for each classification task. As we can see, our methodology outperforms traditional machine learning models in all cases. This points to the difficulty of distilling information from raw data using traditional machine learning models and highlights the superiority of our methodology in diagnosing various diseases. It is worth mentioning that we achieved similar results on the second and third data partitions.

Table 6.
Classification Task (Data Category)MethodAccuracy (%)FPR (%)FNR (%)F1 Score (%)
Healthy vs. Schizo.SVM79.031.85.779.0
(Acc-P, Temp, Vel,Decision tree68.944.012.968.8
Acc-W, GSR, IBI)Random forest71.942.67.371.9
\(k\)-nearest neighbor (\(k=6\))72.437.912.872.5
Naive Bayes70.430.029.170.0
AdaBoost74.930.417.774.7
Our DNN91.211.45.889.5
Healthy vs. Schizo.SVM77.233.97.077.2
(Acc-P, Temp, Vel,Decision tree68.343.115.468.3
Acc-W, GSR)Random forest67.148.211.067.0
\(k\)-nearest neighbor (\(k=15\))75.234.411.175.2
Naive Bayes70.030.129.969.9
AdaBoost75.529.617.275.4
Our DNN90.513.04.489.3
Healthy vs. MDDSVM74.815.748.668.4
(Temp, Grav, Vel, GSR)Decision tree52.449.143.950.4
Random forest68.725.545.663.6
\(k\)-nearest neighbor (\(k=12\))67.430.238.463.7
Naive Bayes67.035.826.364.9
AdaBoost56.146.338.054.2
Our DNN89.04.626.779.4
Healthy vs. MDDSVM75.516.843.470.0
(Acc-P, Grav, Vel, GSR)Decision tree52.742.558.948.4
Random forest70.024.543.565.1
\(k\)-nearest neighbor (\(k=12\))62.636.240.459.4
Naive Bayes61.245.123.360.1
AdaBoost54.448.538.752.7
Our DNN88.26.524.878.7
Healthy vs. BipolarSVM68.147.617.667.1
(Acc-P, Temp, Grav, Vel,Decision tree63.848.025.563.1
Acc-W, IBI, ST)Random forest61.354.424.360.1
\(k\)-nearest neighbor (\(k=10\))70.952.57.968.8
Naive Bayes69.140.721.968.6
AdaBoost60.159.721.958.1
Our DNN76.147.12.881.0
Healthy vs. BipolarSVM71.443.015.570.5
(Temp, Grav, Vel,Decision tree64.352.320.663.0
Acc-W, GSR, IBI)Random forest65.854.715.663.9
\(k\)-nearest neighbor (\(k=7\))64.650.122.163.5
Naive Bayes61.041.636.660.9
AdaBoost67.142.424.366.6
Our DNN75.647.13.880.5

Table 6. Comparison with Machine Learning Models on First Data Partition

We also performed an ablation study to analyze the impact of three training methods on the final performance, including training only on real data, pre-training with synthetic data, and grow-and-prune synthesis. We present performance results for the first data partition in Table 7. For each classification task, we report the performance of two data categories. We can see that using the synthetic dataset to pre-train the weights of the model helps improve performance in most cases, thanks to a better initialization point for the final training process. In addition, the application of grow-and-prune synthesis yields models specialized for the task at hand and not only results in more compact models (as shown in Tables 3, 4, and 5) but also improves the performance of the models. It is worth mentioning that similar results were obtained on the other two data partitions.

Table 7.
Classification Task (Data Category)Training MethodAccuracy (%)FPR (TNR) (%)FNR (TPR) (%)
Healthy vs. Schizo.Real training dataset87.018.4 (81.6)5.2 (94.8)
(Acc-P, Temp, Vel,Real+synthetic training dataset89.66.1 (93.9)20.8 (79.2)
Acc-W, GSR, IBI)Real+synthetic training dataset91.211.4 (88.6)5.8 (94.2)
+ grow-prune
Healthy vs. Schizo.Real training dataset87.818.2 (81.8)3.7 (96.3)
(Acc-P, Temp, Vel,Real+synthetic training dataset86.219.6 (80.4)5.5 (94.5)
Acc-W, GSR)Real+synthetic training dataset90.513.0 (87.0)4.4 (95.6)
+ grow-prune
Healthy vs. MDDReal training dataset85.07.7 (92.3)33.1 (66.9)
(Temp, Grav, Vel, GSR)Real+synthetic training dataset85.810.4 (89.6)23.6 (76.4)
Real+synthetic training dataset89.04.6 (95.4)26.7 (73.3)
+ grow-prune
Healthy vs. MDDReal training dataset84.411.3 (88.7)26.0 (74.0)
(Acc-P, Grav, Vel, GSR)Real+synthetic training dataset85.48.2 (91.8)30.4 (69.6)
Real+synthetic training dataset88.26.5 (93.5)24.8 (75.2)
+ grow-prune
Healthy vs. BipolarReal training dataset74.648.9 (51.1)3.9 (96.1)
(Acc-P, Temp, Grav, Vel,Real+synthetic training dataset75.947.4 (52.6)2.8 (97.2)
Acc-W, IBI, ST)Real+synthetic training dataset76.147.1 (52.9)2.8 (97.2)
+ grow-prune
Healthy vs. BipolarReal training dataset74.449.9 (50.1)3.5 (96.5)
(Temp, Grav, Vel,Real+synthetic training dataset75.346.7 (53.3)4.6 (95.4)
Acc-W, GSR, IBI)Real+synthetic training dataset75.647.1 (52.9)3.8 (96.2)
+ grow-prune

Table 7. Impact of Different Training Methods on the Performance of the Model

5.2 MHDeep Performance Evaluation at the Patient Level

Next, we show patient-level diagnostic test accuracy. We use the most accurate model from among the models discussed above for each classification task. Figure 6 shows the results. In these graphs, we plot patient-level test accuracy vs. the duration of sensor data collection needed for inference. Prediction is performed for each patient by simply taking the majority of the predicted labels over all data instances in the given sensor data collection duration. As discussed earlier, each data instance is composed of a 15-second window of sensor data. We step up the data duration size by 2 minutes each time. Thus, we add eight data instances in each 2-minute window. By predicting the label for each participant, we define the final test accuracy as the ratio of the participants that are correctly diagnosed over the number of participants in the test set. As we can see, the models reach 100% test accuracy after a certain point for distinguishing healthy individuals from those with schizoaffective and major depressive disorders. In addition, the best model for classification between healthy and bipolar disorder individuals reaches 90.0% patient-level accuracy. Table 8 shows the minimum sensor data collection duration needed to reach saturation accuracy. The durations are 40, 16, and 22 minutes for healthy vs. schizoaffective disorder, healthy vs. major depressive disorder, and healthy vs. bipolar disorder classifications, respectively. Not surprisingly, these results indicate that it is easier to obtain higher test accuracies at the patient level than at the data instance level because the former is simply based on taking the majority of the predictions over all instances in the data duration.

Fig. 6.

Fig. 6. Patient-level test accuracy vs. sensor data duration needed for classification between (a) healthy and schizoaffective disorder individuals, (b) healthy and major depressive disorder individuals, and (c) healthy and bipolar disorder individuals.

Table 8.
ClassificationTime (min)Saturation Accuracy (%)
Healthy vs. Schizoaffective disorder40100
Healthy vs. Major depressive disorder16100
Healthy vs. Bipolar disorder2290.0

Table 8. Minimum Sensor Data Collection Duration (in Minutes) Needed to Reach Saturation Patient-level Accuracy (in %) for Each Classification Task

Although the MHDeep DNNs have been evaluated on the same platform we used for training, they can be encapsulated in a smartphone app for diagnosis of mental disorders. Through the MHDeep app, the user can be instructed to wear the Empatica E4 smartwatch and correctly position the smartphone before data collection begins. The MHDeep preprocessing pipeline can be used to normalize the data using the minimum and maximum values used during training and divide the data into 15-second window data instances. Finally, using DNN models for various mental disorders, we can obtain average prediction probabilities. We can diagnose the mental disorder by comparing this average with a user-defined threshold. The MHDeep app would need a limited amount of battery energy. From the study done in [35], we roughly estimate between 0.5 and 1 watt of battery power for the application. This is based on a range of power estimates of smartphone apps under various categories, such as educational, social media, and entertainment, studied in [35]. As explained earlier, we need 40 minutes of sensor data collection to reach patient-level saturation accuracy for the healthy vs. bipolar disorder diagnosis. As a result, assuming that MHDeep app processing takes 40 minutes in the worst case and smartphone battery works at 3.8V, this translates to 88 to 175 mAh energy consumption. For a smartphone, such as the Samsung Galaxy S8+ with a battery with 3,500 mAh capacity, this results in 2.5% to 5.0% battery consumption for the app. In addition, each DNN inference pass takes 0.6 milliseconds in our desktop implementation. As a result, DNN inference for 160 data instances takes 0.1 seconds. In comparison, the sensor data collection takes 40 minutes or 2,400 seconds. Even taking into account that DNN inference may take longer on a smartphone, it would be considerably shorter than the duration of data collection. As a result, the energy consumption of the MHDeep app will be dominated by other components of the smartphone such as Bluetooth connectivity and the display.

We also report results from some of the related works on use of various machine learning methods for detection of mental health disorders. The results are presented in Table 9. Because these works use different data sources and solve different problems as compared to MHDeep, the goal of this comparison is only to highlight a few related works that target use of machine learning in the mental health domain; the main method, data source, and duration they employ; and their main results. As opposed to these methods, MHDeep does not rely on manual feature extraction from various data sources. It directly works on raw sensor data.

Table 9.
Mental DisorderMethodData DurationData SourcesResult
DepressionRandom forest [44]Several monthsInstagram photos70% accuracy
DepressionHypothesis testing [40]1 weekAccelerometerShowed reduction in physical activity in depressed patients
MDDMHDeep1.5 hoursPhysiological, motion, ambient100% patient-level accuracy
BipolarNaive Bayes [21]12 weeksPhone call logs, microphone76% accuracy
Naive Bayes [20]12 weeksGPS, accelerometer80% accuracy
Bagging [34]12 weeksAccelerometer, microphone, questionnaire85% accuracy
MHDeep1.5 hoursPhysiological, motion, ambient90% patient-level accuracy
Schizophrenia spectrumRandom forest [7]Several monthsSocial media posts and messages0.76 area-under-curve (AUC)
SchizoaffectiveMHDeep1.5 hoursPhysiological, motion, ambient100% patient-level accuracy

Table 9. Comparison with Related Works

Skip 6DISCUSSION Section

6 DISCUSSION

MHDeep combines efficient DNNs with commercially available WMSs to diagnose various mental health disorders. Although several works address mental health problem detection using machine learning, MHDeep is a solution that focuses on an easy-to-use system that can monitor the daily mental health state of the user through their physiological signals, as well as motion and ambient information. The diagnostic decisions can be sent to a health server from where medical professionals can access the information. This can enable them to quickly intervene during severe episodes of the disorder.

Many mental disorders, such as depression, have different stages with different severities. The progress of such mental health problems can impact the patient’s life and health in different ways. As a result, it may be useful if the model can predict disease progression over time. Our framework can be extended to predict the progress of mental health disorders by utilizing longitudinal WMS data collected in the training stage. Furthermore, by accumulating more data from each individual, we can synthesize patient-specific models that are specifically designed based on their data. We can obtain such models by fine-tuning the trained general models based on accumulated data from the specific patient.

Diagnosis of mental health disorders is often based on patient self-report and answers to a questionnaire designed to detect each disorder. In the future, we can improve the performance of MHDeep by adding a specifically designed questionnaire to the data categories. In addition, adding features based on other WMSs such as blood pressure may also help enhance model performance.

Skip 7CONCLUSION Section

7 CONCLUSION

In this article, we proposed a framework called MHDeep that combines data obtained from commercially available WMSs with the knowledge distillation power of DNNs for continuous and pervasive diagnosis of three main mental health disorders: schizoaffective, major depressive, and bipolar. MHDeep uses a synthetic data generation module to address the lack of large datasets. We trained the DNN models by using iterative network growth and pruning to learn both the weights and architecture during the training process. We evaluated MHDeep based on data collected from 74 individuals. It achieves patient-level accuracy of 100%, 100%, and 90.0%, using 40, 16, and 22 minutes of sensor data collected in the inference stage, for classification between healthy and schizoaffective disorder individuals, healthy and major depressive disorder individuals, and healthy and bipolar disorder individuals, respectively. The MHDeep models were also shown to be computationally efficient. Thus, MHDeep can be employed for pervasive diagnosis and daily monitoring while offering high computational efficiency and accuracy.

Skip ACKNOWLEDGMENTS Section

ACKNOWLEDGMENTS

We would like to thank nurses, physicians, and mental health technicians in Acute Care Unit East and West at the Hackensack Meridian Health Carrier Clinic for smartwatch/smartphone based data collection from various patient cohorts and providing patient labels. Special thanks to Donald J. Parker, CEO of Carrier Clinic, for recognizing the promise of such a study and facilitating data collection, and to Dr. Jacqueline Bienenstock, DNP, RN-BC, Jolene DePaolo, and Lynn Hightower for their help in this project.

REFERENCES

  1. [1] 2020. Empatica E4 connect portal. https://www.empatica.com/connect.Google ScholarGoogle Scholar
  2. [2] 2020. Number of connected wearable devices worldwide from 2016 to 2022. https://www.statista.com/statistics/487291/global-connected-wearable-devices/.Google ScholarGoogle Scholar
  3. [3] Acharya U. Rajendra, Oh Shu Lih, Hagiwara Yuki, Tan Jen Hong, Adeli Hojjat, and Subha D. Puthankattil. 2018. Automated EEG-based screening of depression using deep convolutional neural network. Computer Methods and Programs in Biomedicine 161 (2018), 103113.Google ScholarGoogle ScholarCross RefCross Ref
  4. [4] Akmandor Ayten Ozge and Jha Niraj K.. 2017. Smart health care: An edge-side computing perspective. IEEE Consumer Electronics Magazine 7, 1 (2017), 2937.Google ScholarGoogle ScholarCross RefCross Ref
  5. [5] Baig Mirza Mansoor and Gholamhosseini Hamid. 2013. Smart health monitoring systems: An overview of design and modeling. J. Medical Systems 37, 2 (2013), 9898.Google ScholarGoogle ScholarCross RefCross Ref
  6. [6] Berle Jan O., Hauge Erik R., Oedegaard Ketil J., Holsten Fred, and Fasmer Ole B.. 2010. Actigraphic registration of motor activity reveals a more structured behavioural pattern in schizophrenia than in major depression. BMC Research Notes 3, 1 (2010), 17.Google ScholarGoogle ScholarCross RefCross Ref
  7. [7] Birnbaum Michael L., Norel Raquel, Meter Anna Van, Ali Asra F., Arenare Elizabeth, Eyigoz Elif, Agurto Carla, Germano Nicole, Kane John M., and Cecchi Guillermo A.. 2020. Identifying signals associated with psychiatric illness utilizing language and images posted to Facebook. Nature Schizophrenia 6, 1 (2020), 110.Google ScholarGoogle Scholar
  8. [8] Bordieri James E. and Drehmer David E.. 1986. Hiring decisions for disabled workers: Looking at the cause. J. Applied Social Psychology 16, 3 (1986), 197208.Google ScholarGoogle ScholarCross RefCross Ref
  9. [9] Chang Hsin-An, Chang Chuan-Chia, Tzeng Nian-Sheng, Kuo Terry B. J., Lu Ru-Band, and Huang San-Yuan. 2014. Heart rate variability in unmedicated patients with bipolar disorder in the manic phase. Psychiatry and Clinical Neurosciences 68, 9 (2014), 674682.Google ScholarGoogle ScholarCross RefCross Ref
  10. [10] Chawla Nitesh V., Bowyer Kevin W., Hall Lawrence O., and Kegelmeyer W. Philip. 2002. SMOTE: Synthetic minority over-sampling technique. J. Artificial Intelligence Research 16 (2002), 321357.Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. [11] Coppersmith Glen, Leary Ryan, Crutchley Patrick, and Fine Alex. 2018. Natural language processing of social media as screening for suicide risk. Biomedical Informatics Insights 10 (2018), 111.Google ScholarGoogle ScholarCross RefCross Ref
  12. [12] Corrigan Patrick W.. 2000. Mental health stigma as social attribution: Implications for research methods and attitude change. Clinical Psychology: Science and Practice 7, 1 (2000), 4867.Google ScholarGoogle ScholarCross RefCross Ref
  13. [13] Corrigan Patrick W., Mittal Dinesh, Reaves Christina M., Haynes Tiffany F., Han Xiaotong, Morris Scott, and Sullivan Greer. 2014. Mental health stigma and primary health care decisions. Psychiatry Research 218, 1–2 (2014), 3538.Google ScholarGoogle ScholarCross RefCross Ref
  14. [14] Dai Xiaoliang, Yin Hongxu, and Jha Niraj K.. 2019. NeST: A neural network synthesis tool based on a grow-and-prune paradigm. IEEE Trans. Computers 68, 10 (2019), 14871497.Google ScholarGoogle ScholarCross RefCross Ref
  15. [15] Dai Xiaoliang, Zhang Peizhao, Wu Bichen, Yin Hongxu, Sun Fei, Wang Yanghan, Dukhan Marat, Hu Yunqing, Wu Yiming, Jia Yangqing, Vajda Peter, Uyttendaele Matt, and Jha Niraj K.. 2019. ChamNet: Towards efficient network design through platform-aware model adaptation. In Proc. IEEE Conf. Computer Vision and Pattern Recognition.Google ScholarGoogle Scholar
  16. [16] Dwyer Dominic B., Falkai Peter, and Koutsouleris Nikolaos. 2018. Machine learning approaches for clinical psychology and psychiatry. Annual Review Clinical Psychology 14 (2018), 91118.Google ScholarGoogle ScholarCross RefCross Ref
  17. [17] Enayati Saman, Yang Ziyu, Lu Benjamin, and Vucetic Slobodan. 2021. A visualization approach for rapid labeling of clinical notes for smoking status extraction. In Proc. Data Science with Human in the Loop: Language Advances. 2430.Google ScholarGoogle Scholar
  18. [18] Geng Xiang-Fei and Xu Jun-Hai. 2017. Application of autoencoder in depression diagnosis. In Proc. Int. Conf. Computer Science and Mechanical Automation.Google ScholarGoogle Scholar
  19. [19] Geraci Joseph, Wilansky Pamela, Luca Vincenzo de, Roy Anvesh, Kennedy James L., and Strauss John. 2017. Applying deep neural networks to unstructured text notes in electronic medical records for phenotyping youth depression. Evidence-based Mental Health 20, 3 (2017), 8387.Google ScholarGoogle ScholarCross RefCross Ref
  20. [20] Gruenerbl Agnes, Osmani Venet, Bahle Gernot, Carrasco Jose C., Oehler Stefan, Mayora Oscar, Haring Christian, and Lukowicz Paul. 2014. Using smart phone mobility traces for the diagnosis of depressive and manic episodes in bipolar patients. In Proc. Augmented Human Int. Conf.18.Google ScholarGoogle Scholar
  21. [21] Grünerbl Agnes, Muaremi Amir, Osmani Venet, Bahle Gernot, Oehler Stefan, Tröster Gerhard, Mayora Oscar, Haring Christian, and Lukowicz Paul. 2014. Smartphone-based recognition of states and state changes in bipolar disorder patients. IEEE J. Biomedical and Health Informatics 19, 1 (2014), 140148.Google ScholarGoogle ScholarCross RefCross Ref
  22. [22] Han Song, Mao Huizi, and Dally William J.. 2016. Deep compression: Compressing deep neural networks with pruning, trained quantization and Huffman Coding. In Proc. Int. Conf. Learning Representations.Google ScholarGoogle Scholar
  23. [23] Hassantabar Shayan, Ahmadi Mohsen, and Sharifi Abbas. 2020. Diagnosis and detection of infected tissue of COVID-19 patients based on lung X-Ray image using convolutional neural network approaches. Chaos, Solitons & Fractals (2020), 110170.Google ScholarGoogle ScholarCross RefCross Ref
  24. [24] Hassantabar Shayan, Dai Xiaoliang, and Jha Niraj K.. 2019. STEERAGE: Synthesis of neural networks using architecture search and grow-and-prune methods. arXiv preprint arXiv:1912.05831 (2019).Google ScholarGoogle Scholar
  25. [25] Hassantabar Shayan, Stefano Novati, Ghanakota Vishweshwar, Ferrari Alessandra, Nicola Gregory N., Bruno Raffaele, Marino Ignazio R., Hamidouche Kenza, and Jha Niraj K.. 2021. CovidDeep: SARS-CoV-2/COVID-19 test based on wearable medical sensors and efficient neural networks. IEEE Trans. Consumer Electronics 67, 4 (2021), 244256.Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. [26] Hassantabar Shayan, Terway Prerit, and Jha Niraj K.. 2020. TUTOR: Training neural networks using decision rules as model priors. arXiv preprint arXiv:2010.05429 (2020).Google ScholarGoogle Scholar
  27. [27] Hassantabar Shayan, Wang Zeyu, and Jha Niraj K.. 2021. SCANN: Synthesis of compact and accurate neural networks. IEEE Trans. Computer-Aided Design of Integrated Circuits and Systems (2021).Google ScholarGoogle Scholar
  28. [28] Herborn Katherine A., Graves James L., Jerem Paul, Evans Neil P., Nager Ruedi, McCafferty Dominic J., and McKeegan Dorothy E. F.. 2015. Skin temperature reveals the intensity of acute stress. Physiology & Behavior 152 (2015), 225230.Google ScholarGoogle ScholarCross RefCross Ref
  29. [29] Kuang Deping, Guo Xiaojiao, An Xiu, Zhao Yilu, and He Lianghua. 2014. Discrimination of ADHD based on fMRI data with deep belief network. In Proc. Int. Conf. Intelligent Computing. 225232.Google ScholarGoogle Scholar
  30. [30] Kuang Deping and He Lianghua. 2014. Classification on ADHD with deep learning. In Proc. Int. Conf. Cloud Computing and Big Data. 2732.Google ScholarGoogle Scholar
  31. [31] Lehman Jill Fain. 2000. The Diagnostic and Statistical Manual of Mental Disorders. Am. Psychiatric Assoc.Google ScholarGoogle Scholar
  32. [32] Lin Huijie, Jia Jia, Guo Quan, Xue Yuanyuan, Li Qi, Huang Jie, Cai Lianhong, and Feng Ling. 2014. User-level psychological stress detection from social media using deep neural network. In Proc. Int. Conf. Multimedia. 507516.Google ScholarGoogle Scholar
  33. [33] Malan David J., Fulford-Jones Thaddeus, Welsh Matt, and Moulton Steve. 2004. CodeBlue: An ad hoc sensor network infrastructure for emergency medical care. In Proc. Int. Workshop Wearable and Implantable Body Sensor Networks.Google ScholarGoogle Scholar
  34. [34] Maxhuni Alban, Muñoz-Meléndez Angélica, Osmani Venet, Perez Humberto, Mayora Oscar, and Morales Eduardo F.. 2016. Classification of bipolar disorder episodes based on analysis of voice and motor activity of patients. Pervasive and Mobile Computing 31 (2016), 5066.Google ScholarGoogle ScholarDigital LibraryDigital Library
  35. [35] Mehrotra D., Nagpal D., Srivastava R., and Nagpal R.. 2018. Analyse power consumption by mobile applications using fuzzy clustering approach. Int. J. Engineering 31, 12 (2018), 20372043.Google ScholarGoogle Scholar
  36. [36] Miotto Riccardo, Wang Fei, Wang Shuang, Jiang Xiaoqian, and Dudley Joel T.. 2018. Deep learning for healthcare: Review, opportunities and challenges. Briefings in Bioinformatics 19, 6 (2018), 12361246.Google ScholarGoogle ScholarCross RefCross Ref
  37. [37] Mohr David C., Zhang Mi, and Schueller Stephen M.. 2017. Personal sensing: Understanding mental health using ubiquitous sensors and machine learning. Review of Clinical Psychology 13 (2017), 2347.Google ScholarGoogle ScholarCross RefCross Ref
  38. [38] Mullins Jamie T. and White Corey. 2019. Temperature and mental health: Evidence from the spectrum of mental health outcomes. J. Health Economics 68 (2019), 102240.Google ScholarGoogle ScholarCross RefCross Ref
  39. [39] Murphy Patricia J., Frei Mark G., and Papolos Demitri. 2014. Alterations in skin temperature and sleep in the fear of harm phenotype of pediatric bipolar disorder. J. Clinical Medicine 3, 3 (2014), 959971.Google ScholarGoogle ScholarCross RefCross Ref
  40. [40] O’Brien J. T., Gallagher P., Stow D., Hammerla N., Ploetz T., Firbank M., Ladha C., Ladha K., Jackson D., McNaney Roisin, et al. 2017. A study of wrist-worn activity measurement as a potential real-world biomarker for late-life depression. Psychological Medicine 47, 1 (2017), 93102.Google ScholarGoogle ScholarCross RefCross Ref
  41. [41] Pham Trang, Tran Truyen, Phung Dinh, and Venkatesh Svetha. 2017. Predicting healthcare trajectories from medical records: A deep learning approach. J. Biomedical Informatics 69 (2017), 218229.Google ScholarGoogle ScholarCross RefCross Ref
  42. [42] Pinaya Walter H. L., Gadelha Ary, Doyle Orla M., Noto Cristiano, Zugman André, Cordeiro Quirino, Jackowski Andrea P., Bressan Rodrigo A., and Sato João R.. 2016. Using deep belief network modelling to characterize differences in brain morphometry in schizophrenia. Scientific Reports 6 (2016), 38897.Google ScholarGoogle ScholarCross RefCross Ref
  43. [43] Polanczyk Guilherme V., Salum Giovanni A., Sugaya Luisa S., Caye Arthur, and Rohde Luis A.. 2015. Annual research review: A meta-analysis of the worldwide prevalence of mental disorders in children and adolescents. J. Child Psychology and Psychiatry 56, 3 (2015), 345365.Google ScholarGoogle ScholarCross RefCross Ref
  44. [44] Reece Andrew G. and Danforth Christopher M.. 2017. Instagram photos reveal predictive markers of depression. EPJ Data Science 6 (2017), 112.Google ScholarGoogle Scholar
  45. [45] Sadeque Farig, Xu Dongfang, and Bethard Steven. 2017. UArizona at the CLEF eRisk 2017 pilot task: Linear and recurrent models for early depression detection. In Proc. CEUR Workshop, Vol. 1866. NIH Public Access.Google ScholarGoogle Scholar
  46. [46] Sandler Mark, Howard Andrew, Zhu Menglong, Zhmoginov Andrey, and Chen Liang-Chieh. 2018. MobileNet v2: Inverted residuals and linear bottlenecks. In Proc. IEEE/CVF Conf. Computer Vision and Pattern Recognition. 45104520.Google ScholarGoogle Scholar
  47. [47] Sarchiapone Marco, Gramaglia Carla, Iosue Miriam, Carli Vladimir, Mandelli Laura, Serretti Alessandro, Marangon Debora, and Zeppegno Patrizia. 2018. The association between electrodermal activity (EDA), depression and suicidal behaviour: A systematic review and narrative synthesis. BMC Psychiatry 18, 1 (2018), 127.Google ScholarGoogle ScholarCross RefCross Ref
  48. [48] Schnack Hugo G., Nieuwenhuis Mireille, Haren Neeltje E. M. van, Abramovic Lucija, Scheewe Thomas W., Brouwer Rachel M., Pol Hilleke E. Hulshoff, and Kahn René S.. 2014. Can structural MRI aid in clinical classification? A machine learning study in two independent samples of patients with schizophrenia, bipolar disorder and healthy subjects. Neuroimage 84 (2014), 299306.Google ScholarGoogle ScholarCross RefCross Ref
  49. [49] Stanley Kenneth O. and Miikkulainen Risto. 2002. Evolving neural networks through augmenting topologies. Evolutionary Computation 10, 2 (2002), 99127.Google ScholarGoogle ScholarDigital LibraryDigital Library
  50. [50] Steel Zachary, Marnane Claire, Iranpour Changiz, Chey Tien, Jackson John W., Patel Vikram, and Silove Derrick. 2014. The global prevalence of common mental disorders: A systematic review and meta-analysis 1980–2013. Int. J. Epidemiology 43, 2 (2014), 476493.Google ScholarGoogle ScholarCross RefCross Ref
  51. [51] Su Chang, Xu Zhenxing, Pathak Jyotishman, and Wang Fei. 2020. Deep learning in mental health outcome research: A scoping review. Translational Psychiatry 10, 1 (2020), 126.Google ScholarGoogle ScholarCross RefCross Ref
  52. [52] Tan Mingxing, Chen Bo, Pang Ruoming, Vasudevan Vijay, Sandler Mark, Howard Andrew, and Le Quoc V.. 2019. MnasNet: Platform-aware neural architecture search for mobile. In Proc. IEEE Conf. Computer Vision and Pattern Recognition. 28202828.Google ScholarGoogle Scholar
  53. [53] Wac Katarzyna, Halteren Aart Van, and Konstantas Dimitri. 2006. QoS-predictions service: Infrastructural support for proactive QoS-and context-aware mobile services (position paper). In Proc. Int. Conf. Move to Meaningful Internet Systems. Springer, 19241933.Google ScholarGoogle Scholar
  54. [54] Wu Bichen, Dai Xiaoliang, Zhang Peizhao, Wang Yanghan, Sun Fei, Wu Yiming, Tian Yuandong, Vajda Peter, Jia Yangqing, and Keutzer Kurt. 2019. FBNet: Hardware-aware efficient convnet design via differentiable neural architecture search. In Proc. IEEE Conf. Computer Vision and Pattern Recognition. 1073410742.Google ScholarGoogle Scholar
  55. [55] Wu Bichen, Wan Alvin, Yue Xiangyu, Jin Peter, Zhao Sicheng, Golmant Noah, Gholaminejad Amir, Gonzalez Joseph, and Keutzer Kurt. 2018. Shift: A zero flop, zero parameter alternative to spatial convolutions. In Proc. IEEE Conf. Computer Vision and Pattern Recognition. 91279135.Google ScholarGoogle Scholar
  56. [56] Xue Yuankun and Bogdan Paul. 2017. Constructing compact causal mathematical models for complex dynamics. In Proc. Int. Conf. Cyber-Physical Systems. 97107.Google ScholarGoogle Scholar
  57. [57] Xue Yuankun, Pequito Sergio, Coelho Joana R., Bogdan Paul, and Pappas George J.. 2016. Minimum number of sensors to ensure observability of physiological systems: A case study. In Proc. Allerton Conf. Communication, Control, and Computing. 11811188.Google ScholarGoogle Scholar
  58. [58] Xue Yuankun, Rodriguez Saul, and Bogdan Paul. 2016. A spatio-temporal fractal model for a CPS approach to brain-machine-body interfaces. In Proc. Design, Automation & Test in Europe Conf. & Exhibition. 642647.Google ScholarGoogle Scholar
  59. [59] Yang Zikun, Bogdan Paul, and Nazarian Shahin. 2021. An in silico deep learning approach to multi-epitope vaccine design: A SARS-CoV-2 case study. Scientific Reports 11, 1 (2021), 121.Google ScholarGoogle Scholar
  60. [60] Yin Hongxu, Akmandor Ayten Ozge, Mosenia Arsalan, and Jha Niraj K.. 2018. Smart healthcare. Foundations and Trends in Electronic Design Automation 12, 4 (2018), 401466.Google ScholarGoogle ScholarDigital LibraryDigital Library
  61. [61] Yin Hongxu, Mukadam Bilal, Dai Xiaoliang, and Jha Niraj K.. 2019. DiabDeep: Pervasive diabetes diagnosis based on wearable medical sensors and efficient neural networks. IEEE Trans. Emerging Topics in Computing (2019).Google ScholarGoogle Scholar
  62. [62] Zeng Ling-Li, Wang Huaning, Hu Panpan, Yang Bo, Pu Weidan, Shen Hui, Chen Xingui, Liu Zhening, Yin Hong, Tan Qingrong, et al. 2018. Multi-site diagnostic classification of schizophrenia using discriminant deep learning with functional connectivity MRI. EBioMedicine 30 (2018), 7485.Google ScholarGoogle ScholarCross RefCross Ref
  63. [63] Zoph Barret and Le Quoc V.. 2017. Neural architecture search with reinforcement learning. In Proc. Int. Conf. Learning Representations.Google ScholarGoogle Scholar

Index Terms

  1. MHDeep: Mental Health Disorder Detection System Based on Wearable Sensors and Artificial Neural Networks

    Recommendations

    Comments

    Login options

    Check if you have access through your login credentials or your institution to get full access on this article.

    Sign in

    Full Access

    • Published in

      cover image ACM Transactions on Embedded Computing Systems
      ACM Transactions on Embedded Computing Systems  Volume 21, Issue 6
      November 2022
      498 pages
      ISSN:1539-9087
      EISSN:1558-3465
      DOI:10.1145/3561948
      • Editor:
      • Tulika Mitra
      Issue’s Table of Contents

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 12 December 2022
      • Online AM: 26 March 2022
      • Accepted: 14 March 2022
      • Revised: 9 February 2022
      • Received: 27 July 2021
      Published in tecs Volume 21, Issue 6

      Permissions

      Request permissions about this article.

      Request Permissions

      Check for updates

      Qualifiers

      • research-article
      • Refereed

    PDF Format

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader

    HTML Format

    View this article in HTML Format .

    View HTML Format
    About Cookies On This Site

    We use cookies to ensure that we give you the best experience on our website.

    Learn more

    Got it!