IMChew: Chewing Analysis using Earphone Inertial Measurement Units

Eating at a slower pace can aid digestion and nutrient absorption. It further contributes to a lower risk of obesity and gastric cancer. Hence, our work aims to explore unobtrusive tools for detecting and counting chewing activity to assist users in developing healthier eating habits. This paper investigates the feasibility of leveraging earphones embedded with Inertial Measurement Units (IMUs) to detect and count chewing activity. We constructed a chewing analysis system, IMChew, consisting of two major parts: a chewing detector and a chewing counter. To devise the chewing detector, we explored various time and frequency domain features, which we applied to 3 classic machine learning classifiers. Additionally, we developed a chewing counting pipeline that detects the chewing frequency in the chewing episodes recognised by the chewing detector. We collected data from 8 participants, encompassing both chewing activities with various foods and a broad range of non-chewing activities. Overall, our chewing detector achieved both an accuracy and an F1-score of 0.91 under a leave-one-subject-out (LOSO) evaluation, while our chewing counter attained a Mean Absolute Percentage Error (MAPE) of 9.51%.


INTRODUCTION
According to the World Health Organisation, 1 in 8 people worldwide were living with obesity in 2022. Additionally, adult obesity has more than doubled since 1990. Studies on eating behaviour suggest that increasing the number of chews per bite is a potential strategy to reduce food intake and may aid in body-weight management [21]. Eating at a slower pace has also been found to improve digestion and nutrient absorption, and to contribute to a lower risk of gastric cancer, tooth loss, and facial distortion [6,15]. Hence, chewing analysis is essential to helping users develop healthier eating habits.
A variety of devices, in particular wearables, have been investigated for chewing analysis. Microphones are among the most popular and effective sensors for chewing detection, and have been embedded in devices ranging from smart glasses [17] to a novel head-mounted device [5]. However, these wearables are obtrusive and not socially accepted for daily use, limiting their usefulness for chewing analysis. Consequently, more recent studies [9,13] have leveraged earphones for chewing analysis, enabling unobtrusive, convenient, and widely adopted solutions for daily use. These works have achieved initial success, with [13] using microphones and IMUs on earphones for chewing detection and [9] using only IMUs for snacking detection.
Our work aims to take a step further in earable-based chewing analysis. We extend the applications to include chewing counting, a critical step towards detecting chewing rates and ultimately analysing users' eating habits. Current works on chewing counting have explored a range of devices, including microphones on glasses [19] and on the user's neck [8], which are obtrusive. In contrast, our study introduces a non-invasive and user-friendly solution for both chewing detection and counting using earphones.
Specifically, we utilize IMUs in earphones, which are standard and low-cost sensors in commercial earbuds (e.g., Apple AirPods [1], Google Pixel Buds [2] and Samsung Galaxy Buds [3]) and show promise in capturing the jaw movements [10] induced by chewing [13]. We propose a system, IMChew, comprising two main components that receive IMU signals from earphones: a chewing detector and a chewing counter. The chewing detector employs three machine learning models (Logistic Regression, Decision Tree, and Random Forest) over a range of time and frequency domain features to distinguish chewing from various non-chewing activities. For the chewing counter, we develop a signal processing pipeline that detects the chewing frequency in the chewing episodes recognized by the chewing detector.
We implemented a prototype of IMChew using the eSense platform [12,14], which is equipped with a 6-axis IMU (i.e., a 3-axis accelerometer and a 3-axis gyroscope). We collected data from 8 participants aged 20-60 years, encompassing both chewing activities with various foods and a broad range of non-chewing activities. We evaluated the system's performance across different participants and activities. Overall, our chewing detector achieved both an accuracy and an F1-score of 0.91 using a leave-one-subject-out (LOSO) approach, while our chewing counter attained a Mean Absolute Percentage Error (MAPE) of 9.51%.
In summary, this paper makes the following contributions: 1) To the best of our knowledge, IMChew is the first earable-based system for both chewing detection and counting using IMUs. 2) We propose an efficient pipeline for chewing detection and counting using earphone IMU signals. 3) We developed a prototype system and conducted comprehensive evaluations involving various foods for chewing and a wide range of non-chewing activities across 8 participants. The evaluation results demonstrate the effectiveness of the design.

FEASIBILITY STUDY
We initially explored the feasibility of using earphone IMUs for chewing analysis. To this end, we collected eSense IMU data during a variety of daily activities, including chewing.
Figure 1 shows time-domain plots of IMU data during chewing, head turning and watching a movie, respectively. Each activity presents distinct features in the signal. Chewing shows a distinguishable pattern of regular small spikes, which corresponds to the regular motion of biting down. During head turning, an oscillation can be observed in the x-axis of the gyroscope and the y and z axes of the accelerometer, corresponding to the head turning from left to right repeatedly. Watching a movie is characterised by erratic changes in the gyroscope signal with small amplitudes of less than 5 deg/s. Given these prominent features of each activity, we conclude that it is feasible to use IMU signals for chewing analysis. In particular, the regular spikes in the signals during chewing suggest that both time-domain and frequency-domain features are relevant and beneficial for chewing detection and counting.

SYSTEM DESIGN
Our proposed chewing analysis system, IMChew, as shown in Figure 2, has two main components: a chewing detector and a chewing counter. More specifically, the chewing detector classifies and detects the chewing activity from earphone IMU signals. Then, the chewing counter counts the total number of chews within the chewing activity.

Chewing Detector
The chewing detector aims to classify and detect the chewing activity from the IMU signals with different classifiers.
Segmentation. The collected earphone IMU signals are preprocessed before being fed into the subsequent modeling steps. Following previous work [13], we segment the input signal into non-overlapping windows of 3 seconds.
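The segmentation step can be illustrated with a minimal sketch. It assumes the 60 Hz sampling rate reported in the Implementation section, so each 3-second window holds 180 samples; the function name `segment` and the choice to drop a trailing partial window are assumptions made for illustration, not details given in the text.

```python
from typing import List, Sequence

SAMPLE_RATE_HZ = 60   # IMU sampling rate reported in the Implementation section
WINDOW_SECONDS = 3    # detector window length


def segment(samples: Sequence[Sequence[float]],
            rate_hz: int = SAMPLE_RATE_HZ,
            window_s: float = WINDOW_SECONDS) -> List[List[Sequence[float]]]:
    """Split a recording into consecutive non-overlapping windows.

    `samples` holds one row per time step, e.g. [ax, ay, az, gx, gy, gz].
    A trailing partial window is dropped (assumption: the text does not
    say how leftover samples are handled).
    """
    size = int(rate_hz * window_s)          # 180 samples per window at 60 Hz
    n_windows = len(samples) // size
    return [list(samples[i * size:(i + 1) * size]) for i in range(n_windows)]
```

With these settings, a two-minute task yields 40 windows.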
Feature Extraction. Next, a feature extractor is applied to extract both time and frequency domain features of the IMU signal. In the time domain, we calculate the mean, variance, and power [13] for each of the 6 axes of the IMU signal. This results in a total of 6 × 3 = 18 time domain features for each window. The frequency domain features include the Spectral Centroid (SC) and Mel Frequency Cepstral Coefficients (MFCCs) [13]. SC characterises a spectrum by indicating the location of its centre of mass. We calculate the SC for each of the 6 axes of the IMU signal. Moreover, we extract 12 MFCCs on each of the 6 axes, based on their effectiveness for IMU feature extraction [13]. This results in 6 + 6 × 12 = 78 frequency domain features.
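A hedged sketch of part of this feature extractor follows. It computes the three time domain features and the spectral centroid per axis; the MFCC step is omitted. Two details are assumptions: "power" is taken to be the mean squared amplitude, and the spectral centroid is computed as the magnitude-weighted mean frequency of a plain discrete Fourier transform. All function names are hypothetical.

```python
import cmath
import math
from typing import List, Sequence

N_AXES, N_TIME, N_MFCC = 6, 3, 12   # counts taken from the text


def time_features(axis: Sequence[float]) -> List[float]:
    """Mean, variance and power of one IMU axis within a window."""
    n = len(axis)
    mean = sum(axis) / n
    variance = sum((x - mean) ** 2 for x in axis) / n   # population variance
    power = sum(x * x for x in axis) / n                # assumed: mean squared amplitude
    return [mean, variance, power]


def spectral_centroid(axis: Sequence[float], rate_hz: float) -> float:
    """Magnitude-weighted mean frequency (centre of mass of the spectrum)."""
    n = len(axis)
    weighted, total = 0.0, 0.0
    for k in range(n // 2 + 1):                          # non-negative frequencies
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(axis))
        mag = abs(coeff)
        weighted += (k * rate_hz / n) * mag
        total += mag
    return weighted / total if total > 0 else 0.0


def window_features(window: Sequence[Sequence[float]],
                    rate_hz: float = 60.0) -> List[float]:
    """18 time domain features followed by 6 spectral centroids."""
    axes = list(zip(*window))                            # rows -> per-axis series
    feats: List[float] = []
    for axis in axes:
        feats.extend(time_features(axis))
    for axis in axes:
        feats.append(spectral_centroid(axis, rate_hz))
    return feats


# Feature count once 12 MFCCs per axis are added: 18 + 6 + 72 = 96.
TOTAL_FEATURES = N_AXES * N_TIME + N_AXES + N_AXES * N_MFCC
```

Appending the 72 MFCC values to the 24 features returned here would give the 96-dimensional vector used by the classifiers.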
Chewing Classifiers. The outcome of the processing step is 96 features from each window of the IMU signal. These features are then used to train three classifiers (Logistic Regression, Decision Tree, and Random Forest) for Chewing or Non-Chewing activity classification. We define a chewing episode as a sequence of chewing activities.
Chewing Episode Aggregation. When IMChew first detects a chewing activity, it begins the process of chewing episode recognition. Each window is marked '1' for chewing and '0' for non-chewing. This marking continues until IMChew marks three consecutive '0's, indicating the end of aggregation. To decide whether the sequence spanning from the first window marked '1' to the last window marked '1' is an episode, we use majority voting. Specifically, if more than half of the 3-second windows in this span are classified as Chewing, we aggregate the Chewing windows, recognize them as a chewing episode, and feed them into the chewing counter module.
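The aggregation rule can be sketched as a small state machine over the per-window labels. This is one possible reading of the description above, not the authors' implementation: the function names are hypothetical, and closing an open candidate when the label stream ends is an assumption.

```python
from typing import List


def _accept(labels: List[int], start: int, last_one: int) -> List[int]:
    """Majority vote over the span from the first '1' to the last '1'."""
    span = labels[start:last_one + 1]
    if sum(span) > len(span) / 2:            # more than half are chewing
        return [i for i in range(start, last_one + 1) if labels[i] == 1]
    return []


def aggregate_episodes(labels: List[int], end_zeros: int = 3) -> List[List[int]]:
    """Return, per accepted episode, the indices of its chewing windows."""
    episodes: List[List[int]] = []
    start = last_one = None                  # bounds of the current candidate
    zeros = 0                                # consecutive '0's since the last '1'
    for i, label in enumerate(labels):
        if label == 1:
            if start is None:
                start = i
            last_one, zeros = i, 0
        elif start is not None:
            zeros += 1
            if zeros == end_zeros:           # three '0's in a row end aggregation
                kept = _accept(labels, start, last_one)
                if kept:
                    episodes.append(kept)
                start = last_one = None
                zeros = 0
    if start is not None:                    # assumed: close at end of stream
        kept = _accept(labels, start, last_one)
        if kept:
            episodes.append(kept)
    return episodes
```

For the label sequence `[0,1,1,0,1,0,0,0,1,0,0,1,0,0,0]`, the first candidate (windows 1 to 4) contains three chewing windows out of four and is accepted, while the second (windows 8 to 11) contains only two out of four and is rejected.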

Chewing Counter
After detecting the chewing episode using the chewing detector, we employ the chewing counter to count the chewing occurrences by analyzing the chewing frequency in the episode. This approach is based on the observation from the preliminary study that chewing signals have regular intervals between peaks. This finding suggests that the chewing frequency is relatively constant and should be detectable as the frequency with the highest intensity in the signal.
Preprocessing. First, the chewing episodes are segmented into longer fixed-size windows, i.e., 10 s in the IMChew implementation. The IMU signals of each window are then filtered using a Butterworth bandpass filter with low and high cutoff frequencies of 0.1 Hz and 3 Hz, followed by a moving average filter, in preparation for chewing frequency detection.
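As an illustration of this preprocessing, the sketch below uses SciPy's `butter` and `sosfiltfilt` together with a NumPy moving average. The cutoff frequencies and the 60 Hz sampling rate come from the text; the filter order, the use of zero-phase (forward-backward) filtering, and the moving-average length are assumptions, since they are not reported.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

FS_HZ = 60.0                 # sampling rate from the Implementation section
LOW_HZ, HIGH_HZ = 0.1, 3.0   # cutoff frequencies from the text
FILTER_ORDER = 2             # assumed: order is not reported
MA_SAMPLES = 5               # assumed: moving-average length is not reported


def preprocess(signal, fs: float = FS_HZ) -> np.ndarray:
    """Bandpass-filter one axis of a window, then smooth it."""
    sos = butter(FILTER_ORDER, [LOW_HZ, HIGH_HZ],
                 btype="bandpass", fs=fs, output="sos")
    band = sosfiltfilt(sos, np.asarray(signal, dtype=float))  # zero-phase (assumed)
    kernel = np.ones(MA_SAMPLES) / MA_SAMPLES
    return np.convolve(band, kernel, mode="same")             # moving average
```

The passband keeps the 0.5 Hz to 2.5 Hz range in which the chewing frequency is later searched, while removing slow drift and faster head or body motion.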
Chewing Frequency Detection. To calculate the chewing frequency within each window, we first apply the Fast Fourier Transform (FFT) to convert the preprocessed IMU signals from the time domain to the frequency domain. Based on previous work [18], we select the frequency with the highest intensity in the range of 0.5 Hz to 2.5 Hz as the representative chewing frequency. Next, we multiply this frequency (in chews per second) by the window length in seconds to find the number of chews that occurred within that time frame. Finally, the chewing counts of all the windows are summed and output as the count of chews in the chewing episode.
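The counting rule amounts to: count per window = dominant frequency × window length, summed over windows. The dependency-free sketch below evaluates a direct discrete Fourier transform only at the bins inside 0.5 Hz to 2.5 Hz, which gives the same bin magnitudes an FFT would; this substitution, the function names, and dropping a trailing partial window are assumptions for illustration.

```python
import cmath
import math
from typing import Sequence


def dominant_frequency(window: Sequence[float], fs: float,
                       f_lo: float = 0.5, f_hi: float = 2.5) -> float:
    """Frequency (Hz) of the strongest DFT bin inside [f_lo, f_hi]."""
    n = len(window)
    best_f, best_mag = 0.0, -1.0
    for k in range(1, n // 2 + 1):
        f = k * fs / n                        # centre frequency of bin k
        if f < f_lo or f > f_hi:
            continue
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(window))
        if abs(coeff) > best_mag:
            best_f, best_mag = f, abs(coeff)
    return best_f


def count_chews(episode: Sequence[float], fs: float = 60.0,
                window_s: float = 10.0) -> int:
    """Sum of (dominant frequency x window length) over 10 s windows."""
    size = int(fs * window_s)
    total = 0.0
    for start in range(0, len(episode) - size + 1, size):   # partial tail dropped
        total += dominant_frequency(episode[start:start + size], fs) * window_s
    return round(total)
```

For example, a window whose dominant frequency is 1.2 Hz contributes 1.2 × 10 = 12 chews. With a 10 s window the frequency resolution is 0.1 Hz, so each window's count is resolved in steps of one chew.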

IMPLEMENTATION AND USER STUDY

Implementation
To collect our dataset, we used the Nokia Bell Labs eSense platform [12,14]. The eSense is a sensor-equipped wireless earable augmented with a 6-axis IMU and a microphone. We sampled the IMU data at 60 Hz. For the duration of the data collection, users wore only the left earbud, as this is the one containing the IMU. For ground truth data collection, we simultaneously recorded a video of the experiment, which we used to label the IMU data. Additionally, to obtain more precise ground truth for chewing counting, we relied on users manually annotating their chewing events. To do this, we wrote a script that recorded the timestamp each time the spacebar was pressed, and asked participants to press the spacebar each time they chewed. We were thus able to combine the video and the user annotations to obtain accurate ground truth.

Data Collection
To assess our chewing detection and counting algorithms, we collected data from participants while they were seated and performing various head-related activities, both eating and non-eating. Our experiment was conducted with approval from the Ethics Committee of our institution. We collected data from 8 participants (4 male, 4 female) aged 20-60 years.
Participants underwent five different eating tasks, each with a duration of two minutes. We studied five different foods (chips, pretzels, apples, mangoes, and bread), and in each task participants consumed one food type. We chose foods with different textures, as they produce different levels of vibration when chewed, which can then be detected by IMU sensors [11].
Participants then performed 8 non-eating tasks of different durations, involving comparable periods of head, facial, and body movements, as shown in Table 1. The non-eating data was collected to ensure that our chewing detection algorithm can distinguish chewing activity from other head movements that occur frequently during eating episodes.
Participants were asked to execute the activity, return to a baseline state, and then execute the activity again. For example, for the smiling activity, participants smiled, returned to a neutral face, and then smiled again, repeating this for the duration of the task. Overall, we obtained 10 minutes of data for chewing activities and 10 minutes of data for non-chewing activities per participant.

PERFORMANCE EVALUATION
We assess the performance of our two subsystems: the chewing detector and the chewing counter. We examine the performance of the chewing detector models using an 80/20 train-test split and LOSO cross validation (CV). The 80/20 train-test split uses 80% of each user's data for training and the remaining 20% for testing. This means that data from all the participants was seen by the model prior to testing. In contrast, LOSO CV better reflects real-world use, since each test user is unseen by the system during training. For these evaluation methods, we report Recall, Precision, F1-score and Accuracy. The performance of the chewing counter was evaluated using MAPE.
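The evaluation protocol can be summarised in a short sketch: LOSO fold generation, the four detection metrics, and MAPE for counting. The function names are hypothetical, and treating Chewing (label 1) as the positive class is an assumption, since the text does not state how the metrics are averaged across classes.

```python
from typing import Dict, Iterator, List, Sequence, Tuple


def loso_folds(subject_ids: Sequence[str]
               ) -> Iterator[Tuple[str, List[int], List[int]]]:
    """Yield (held-out subject, train indices, test indices) per subject."""
    for held_out in sorted(set(subject_ids)):
        train = [i for i, s in enumerate(subject_ids) if s != held_out]
        test = [i for i, s in enumerate(subject_ids) if s == held_out]
        yield held_out, train, test


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Precision, recall, F1 and accuracy with Chewing (1) as positive (assumed)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"precision": precision, "recall": recall, "f1": f1,
            "accuracy": (tp + tn) / len(pairs)}


def mape(true_counts: Sequence[float], predicted_counts: Sequence[float]) -> float:
    """Mean Absolute Percentage Error, in percent, over chewing episodes."""
    errors = [abs(t - p) / t for t, p in zip(true_counts, predicted_counts)]
    return 100.0 * sum(errors) / len(errors)
```

With 8 participants, `loso_folds` produces 8 folds, each training on 7 participants and testing on the remaining one. As a worked example of MAPE, true counts of 100 and 50 with predictions of 90 and 55 give (10% + 10%) / 2 = 10%.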

Chewing Detection
Table 2 presents the overall performance of different classifiers for chewing detection when evaluated with an 80/20 train-test split. Random Forest (RF) outperforms Logistic Regression and Decision Tree. Likewise, for LOSO CV, RF is the best performing classifier. From Table 3, we see that LOSO CV has worse performance (0.86) than the 80/20 split (0.97), because the model is making predictions on fully unseen users. However, this result indicates that our system performs well even on unseen data, and that there is potential for substantial improvement through personalisation. We further break down the results to examine the per-class performance of detecting chew vs non-chew. This is provided in Figure 3a and Figure 3b for the 80/20 split and LOSO evaluation, respectively. Overall, the 80/20 split resulted in higher recall, F1-score and accuracy (0.95 to 1) than LOSO CV (0.84 to 0.93). For the 80/20 split, the recall of chewing is higher than that of non-chewing; for LOSO CV, however, the recall of non-chewing is higher. This suggests that chewing is a highly personalised activity, with chewing patterns differing markedly between individuals. Nonetheless, our solution performs well under both evaluation methods. Finally, we evaluate the performance of our chewing episode recognition using majority voting on a 10 s window of data.
From Table 4, we see that RF had the best performance, with recall, precision, F1-score and accuracy all at 0.91. For comparison, existing work [13] achieves an accuracy of 0.94, precision of 0.87, recall of 0.92 and F1-score of 0.89.

Chewing Counting
The performance of chewing counting is assessed on the chewing episodes detected by the chewing detector. We first examine the impact of different filters on the input signal in Table 5. Applying Butterworth bandpass and moving average filters results in the best performing model, with a MAPE of 9.51%. Existing work on chewing counting reports MAPE values in a similar range: 12.2% using a head-mounted accelerometer [20], 8.38% using video recordings [4] and 10.32% using a throat microphone [8]. Thus, our work is competitive with chewing counting systems in the literature while offering a more convenient and ubiquitous form factor. Figure 4 depicts the performance of the chewing counter for each user (left) and each activity (right). The worst performance of the chewing counter was for User2, with a MAPE of 12.92%. User0 and User3 also have high MAPE scores of over 12%. In general, we find that the higher the chewing rate (i.e., the more chews per second), the higher the estimation error. This is likely due to the parameters chosen for the moving average filter. The chewing counter performed best for User4, who had a lower chewing rate, as per Table 6. For the individual activities, the error was highest for mangoes (12.38%) and lowest for bread (7.41%). This is likely due to the softness of mango, which does not require as much jaw movement to chew as harder or chewier foods such as bread and pretzels. Finally, we assess the impact of the length of the chewing episode window on chew counting in Figure 5. The best performing window size was 10 seconds, producing the minimum MAPE (9.51%). Hence, our model uses a 10-second window for counting the number of chews.

RELATED WORK

Wearable Chewing Detection Systems
A range of devices and sensors have been investigated for automatic chewing detection. Nakamura et al. [15] investigated the feasibility of using 2-channel microphones placed under the ear for the detection of chewing and swallowing. AutoDietary [7] utilised a neck-mounted microphone to detect chewing and perform food-type recognition. Microphones in smart glasses have also been used for detecting chewing activity [17]. However, these devices are obtrusive and are unlikely to be adopted in everyday life due to the inconvenience and discomfort of wearing them.

Earable Chewing Detection Systems
Due to the ubiquitous nature of earables, research efforts have been directed at using earables for chewing detection. Papapanagiotou et al. [16] used PPG sensors on the ear, and found the method to be feasible for chewing detection while snacking, with both recall and precision of over 91%. The eSense platform, with an embedded IMU sensor and microphone, has been found to be capable of detecting chewing activity with an accuracy of 97% when utilising both audio and IMU signals [13]. Bin et al. [9] further found that the eSense earbuds could be used for snacking detection when applied to a personalised model, obtaining an F1-score of around 90%. IMChew encompasses a broader range of non-chewing activities and a greater variety of foods for the development of its chewing detector. Additionally, IMChew advances earable-based chewing analysis by extending its applications to include chewing counting. This enhancement is a crucial step towards detecting chewing rates and ultimately contributes to the analysis of users' eating habits.

Chewing Counting
Even though chewing counting is less explored in the literature, a number of works have addressed it using a variety of sensors and data. Video recording was investigated for chewing counting [4], with the proposed system counting chewing occurrences with a MAPE of 8.38%. However, using video recordings raises concerns about reduced anonymity and user privacy, making the system less applicable in real life. Billah et al. [8] used a throat microphone for chew counting with an average MAPE of 10.32%. A head-mounted accelerometer was also found to be able to estimate chewing count with a MAPE of 12.2% [20]. To our knowledge, less intrusive devices and methods have yet to be explored for chewing counting. Hence, our work makes an important contribution by investigating the feasibility of extending the applications of earables, specifically earbuds, from chewing detection to chewing counting.

DISCUSSION AND FUTURE WORK
Our study found that earables with IMU sensors can be used to detect and count chewing occurrences. We studied this approach on a range of food types with varying textures, from softer to harder food. The model was also trained and tested on data from a sample with chewing rates ranging from 0.79 to 1.16 chews per second (see Table 6). Our chewing detection model was able to recognise chewing episodes with an accuracy of 0.91, while our chewing counting algorithm achieved a MAPE of 9.51%.
Combined with the unobtrusive nature of earbuds, these results indicate that earables with IMU sensors are feasible tools for detecting and counting chewing activities. However, our approach has a few limitations, including noisy labelling and potential data collection bias.
Noisy labelling. The IMU signals collected from eSense earbuds in each session were labelled uniformly with the same label. For example, every data point collected during an eating activity is labelled as Chewing. This is inaccurate since, during eating sessions, participants occasionally pause chewing to put more food in their mouths or grab a new piece of food. The chewing detection algorithm is most affected by this, as it uses a smaller window size (3 seconds) and the training or testing label of each window may therefore be inaccurate. In future work, we will use video recording or create novel software to annotate the data points more precisely.
Data collection bias. Furthermore, our method for ground truth collection for chewing counting may introduce bias into our data. As shown in Table 6, the range of chewing rates in our sample is only 0.79 to 1.16 chews per second, while the literature reports that the range of human chewing rates is 0.94 to 2.5 chews per second [18]. A potential cause of the lower chewing rate observed in our sample is the ground truth collection. As participants had to press the laptop's spacebar every time they chewed, it is likely that they chewed at a slower rate in order to press the spacebar more accurately and conveniently. This limitation should be addressed in future work, by avoiding any disruption of or intervention in participants' chewing habits, to improve the generalisability of our study.
Simultaneous activities. Our study did not evaluate the chewing counting model's performance when chewing is combined with other simultaneous activities, such as head turning, facial expressions, or speaking, which are common in real-life scenarios. In future work, we will further apply IMChew in more complex free-living situations.

CONCLUSION
This paper investigates the use of earphone IMUs for detecting and counting chewing events. We propose a system, IMChew, with two key components: a chewing detector and a chewing counter. For the chewing detector, we experimented with various time and frequency domain features across 3 classic machine learning classifiers. We also developed a novel chewing counting pipeline to detect the chewing frequency within identified chewing episodes. Our evaluation of IMChew on 8 participants, involving various chewing and non-chewing activities, demonstrated that the chewing detector recognizes activities with a recall, precision, F1-score, and accuracy of 0.91 using a LOSO approach, while the chewing counter achieved a MAPE of 9.51%. This indicates that earables are a viable platform for monitoring chewing activities and could potentially aid users in maintaining healthy eating habits. Future work could explore alternative data collection methods to improve the generalizability of our approach and refine our model for real-world applications.

Figure 1: Samples of accelerometer and gyroscope data under various activities.

Figure 3: Comparison of Random Forest performance for chewing and non-chewing activities.

Figure 4: Evaluation of chewing counting for individual users (left) and individual activities (right).

Figure 5: Evaluation of chewing counting across window sizes.

Table 1: Non-eating activities performed by participants.

Table 3: Evaluation of chewing detection using LOSO CV.

Table 4: LOSO evaluation of chewing episode recognition via majority voting.

Table 5: Evaluation of chewing counting with various filters.