TouchTone: Smartwatch Privacy Protection via Unobtrusive Finger Touch Gestures

Privacy concerns over the security of personal information have grown in tandem with the spread of smartwatches. However, effective methods for protecting private data on smartwatches are very limited. Personal identification number (PIN) input is the only privacy protection method on off-the-shelf smartwatches, and it requires tedious user effort. It is also ineffective at securing information such as notifications and attention-grabbing alerts, which may leak personal data to passersby and adversaries, causing embarrassment or revealing sensitive communications. In this work, we propose a novel privacy protection system, TouchTone, that verifies users and secures personal data in a convenient and low-effort manner. Our system employs a challenge-response process to passively capture finger biometrics from an unobtrusive touch gesture using only the microphones, speakers, and accelerometers already built into smartwatches. To address smartwatch incompatibility with traditional high-frequency sensing techniques, we develop non-intrusive low-frequency challenge signals and cross-domain sensing techniques (i.e., measuring acoustic signals in the vibration domain) to capture robust and effective features specific to user fingers. A low-cost profile matching-based classifier is designed to enable stand-alone privacy protection on smartwatches. We conduct extensive experiments with 54 participants using varied hardware, environments, noise levels, user motions, and other impact factors, achieving around 97% true positive rate and 2% false positive rate in recognizing participants' identities for privacy protection.


INTRODUCTION
Smartwatches have gained widespread popularity in the mobile internet-of-things (IoT) landscape. As of 2022, over 216 million people own a smartwatch and by 2026, 231 million units will be sold worldwide [36]. As a result, smartwatches are increasingly processing sensitive user data, including contacts, emails, bank accounts, locations, and health records [9,23,38]. Moreover, this information can be displayed onscreen anytime and anywhere without warning. For example, incoming phone calls reveal caller IDs, email notifications preview message contents, and exercise apps send alerts for calorie milestones. This creates a significant privacy leakage risk as anyone near the smartwatch may see private content, causing stress and anxiety for the smartwatch user. Privacy leakage is a recognized threat in mobile device security [39,46]. However, without sophisticated sensors or large interfaces like those on smartphones, smartwatches are restricted to crude OS-level defenses (e.g., PIN, toggling off messages) that fail to deliver on speed, usability, and security. Moreover, these defenses can be reverted by anyone, meaning they cannot stop motivated attackers. Thus, protecting private information on smartwatches is increasingly important as smartwatch usage continues to climb in coming years.
Privacy leakage is a threat anytime and anywhere, especially in settings such as conferences, business meetings, airports, and train stations. For example, messages from colleagues containing commercial secrets can be exposed during meetings with business partners via smartwatch notifications. Similarly, personal messages from loved ones can arrive at untimely moments in crowded public spaces, leaking our private lives to nearby strangers or adversaries. In such scenarios, current access control technologies like PIN input may not be sufficient as they inconvenience the user (e.g., typing or swiping pass codes) [3] and distract people interacting with the user (e.g., interrupting conversations with colleagues). To support the explosive usage of smartwatches in a privacy-preserving manner, we aim to display information (1) at the right time and right place, (2) to the right person, and (3) in a non-intrusive and controllable manner. Toward this end, we propose TouchTone, which uses a single finger touch to dynamically display or hide sensitive information and provide privacy protection on demand. To achieve this, TouchTone verifies that finger touches are input by authorized users only, via finger biometric information captured from low-fidelity sensors onboard commercial smartwatches. Extracting such biometric information on smartwatches without specialized sensors or usability compromises has not been achieved before. As illustrated in Figure 1, TouchTone uses a challenge-response system facilitated by universally accessible microphones and accelerometers to ensure notifications are hidden until an authorized user response is detected. We find that pressure applied by finger touches on the smartwatch creates observable impacts in the acoustic domain via the microphone and in the vibration domain via accelerometers. The fused analysis of data from the acoustic and vibration domains yields a cross-domain response which can distinguish users and protect privacy with greater effectiveness and robustness than either domain alone. Because fingers are naturally structurally diverse (e.g., shape, fat ratio, tissue structure) [32], we observe empirically that different users create unique cross-domain responses specific to their fingers. Thus, by relying on finger biometric information, TouchTone allows the legitimate user to use a single touch to view sensitive information at the right time and right place. Existing studies on smartwatches and wearable devices may use inertial measurement units (IMUs) to track user body motions for gesture recognition [29] or gait recognition [8]. However, these approaches cannot be applied directly to subtle, fine-grained tasks like privacy protection.
Using TouchTone, a single touch is sufficient to recognize users. Similar to fingerprint scanning, we extract finger biometric information specific to the user through finger touch gestures. This information is quick to extract and difficult to forge or replay, making it highly usable and secure. Privacy leaks are thwarted as the user has full control over when content is displayed onscreen. Only cross-domain responses from a live and legitimate user are accepted. No effort beyond a finger touch is required. This allows TouchTone to keep sensitive information obscured from adversaries, business competitors, and strangers until a legitimate finger touch is detected. During business meetings with colleagues, hiding untimely notifications with a mere finger touch is more efficient and appropriate than entering PIN codes or stowing the smartwatch in sleeves or pockets. When waiting in crowded airports and train stations, users can conceal data, such as work emails, personal messages, or social media alerts, from fellow passengers until better opportunities to view them emerge. Similarly, sensitive information can be re-displayed easily through another finger touch.
In contrast, OS-level privacy protection methods like PIN are less user-friendly. PIN is also not secure as it can be stolen or guessed. Short PIN codes are quick to enter but provide low security, whereas long PIN codes are more secure but difficult to remember and use. Other options include disabling the display of all notifications, which requires navigating through multiple menus to toggle settings. This method is tedious and may prevent the user from responding to time-sensitive notices (e.g., phone calls, food deliveries, taxi services). TouchTone overcomes many of these burdens in a fast, secure, and low-effort manner. Our cross-domain approach provides natural resistance to acoustic interference via vibration domain data and vice versa, making TouchTone highly versatile in everyday scenarios. We also note that TouchTone is flexible and can operate as a standalone privacy protection measure or as a complementary method to OS-level systems through multifactor authentication.
Capturing finger biometric information on smartwatches is significantly more challenging than on other popular mobile devices (e.g., smartphones and tablets), especially in two areas. First, acoustic sensing works usually utilize inaudible ultrasonic signals to avoid disturbing users [2,20,45]. However, we found that smartwatches can only sample at up to 16 kHz, which under the Nyquist criterion caps the usable signal frequency at 8 kHz, far below the ultrasonic range, meaning ultrasonic frequencies are not viable on smartwatches. This leaves only low-frequency ranges accessible, which overlap with many noise sources in daily life. Second, acoustic works extract structure-borne sound (i.e., sound waves through solid objects) separately from conventional airborne sound (i.e., sound waves through air) to mitigate noise or estimate sound propagation characteristics. However, the small size of smartwatches makes it much more difficult to separate airborne and structure-borne multipath reflections.
To address these challenges, instead of using ultrasonic frequencies on the upper bound of human hearing, we find frequencies from 50 Hz to 500 Hz on the lower bound of human hearing to be suitable for acoustic sensing on smartwatches. We use these lower range frequencies to construct a challenge signal that determines finger biometric information based on the acoustic and vibration responses observed. This bandwidth is shared with many noise sources, necessitating a noise suppression algorithm that filters acoustic interference without destroying our desired biometric response. We compensate for low-fidelity sensor data by leveraging only two sensors in a cross-domain architecture. The vibration domain provides natural advantages against acoustic noise and vice versa. To mitigate the multipath effect, we identify finger touch gestures that produce the most consistent user responses agnostic of environmental factors such as noise or body motions. Our main contributions are as follows:
• We develop TouchTone for smartwatches to protect private information, such as incoming calls, text messages, and health information, from being leaked in business meetings or public venues. TouchTone is the first non-intrusive privacy protection system thanks to its utilization of only a single finger touch to display or hide private information, making it deployable on any smartwatch model.
• We develop a novel challenge-response process to derive biometric traits from a single touch. We capture response signals in both the acoustic and vibration domains to mitigate multipath effects and enhance robustness of the biometric features. A cross-domain sensing method is developed to generate additional dimensions of finger biometrics, protecting against interference or attacks that target single domains.
• Most existing works achieve inaudibility by leveraging ultrasonic frequencies. However, we found these frequencies to be largely inaccessible on weaker smartwatch speakers and microphones. This forced us to develop a new sensing approach using frequencies on the lower bounds of human hearing instead of the upper bounds. These low frequencies are largely underutilized in mobile sensing literature and required new extensive studies to apply them toward privacy protection.
• We conduct extensive experiments with 54 users, studying different finger touches, attacks, smartwatch hardware, noise levels, and user motions to verify the effectiveness and robustness of our system. Results show TouchTone recognizes users consistently with around 97% accuracy.

BACKGROUND OVERVIEW

2.1 Feasibility Study
Capturing Finger Biometrics Using Finger Touches. TouchTone is designed to protect privacy on smartwatches by hiding or revealing sensitive information at the discretion of authorized users only. To distinguish authorized users from non-authorized users, we first explore the possibility of capturing unique biometric information from finger touches on smartwatches. This requires the study of finger touches and the impact of their variables (e.g., numbers of fingers used, parts of the hand involved, areas of the smartwatch touched). Moreover, finger interactions with TouchTone should be designed such that they are easy to perform without distracting the user or others near the user. Accordingly, we conduct preliminary experiments to study how different hand and finger configurations, referred to as finger touch gestures, can convey biometric information in a challenge-response system. We design four representative finger touch gestures which are easy to perform without interrupting daily routines, depicted in Figure 2. Each gesture involves different parts of the hand and fingers. We recruit 8 volunteers to collect acoustic and vibration responses to a challenge signal (i.e., a chirp sweeping from 50 Hz to 500 Hz) on a Samsung Galaxy Watch 4. Each volunteer is asked to perform our finger touch gestures 10 times each. A comparison of the microphone recordings for two users is shown in the middle row of Figure 2.
We observe that the gestures Palm Press, Finger Press Sides, and Finger Press Top can produce significant amplitude differences between different pairs of volunteers. Because environmental and hardware factors are controlled, the only sources of variation are the user's hands and fingers (e.g., shape, curvature, tissue structure). The results indicate acoustic challenge-responses are responsive to subtle differences in finger biometric information when different users touch the smartwatch. Moreover, we demonstrate that finger biometric information can be obtained non-intrusively with no interruption to the user.
We note that responsiveness to user differences alone is not sufficient; TouchTone must also recognize similarities between finger touches made by the same user. Thus, we also explore the consistency and stability of finger touch gestures when performed repeatedly. The bottom row of Figure 2 illustrates the stability of responses collected by the same volunteer using four different touch gestures. We find our Finger Press Sides and Finger Press Top gestures generate nearly identical responses each time whereas our Palm Press and Hand Cover gestures have noticeable differences between iterations. Palm Press and Hand Cover gestures, which involve more parts of the hand, may be less stable compared to Finger Press Sides and Finger Press Top gestures, which involve fewer parts of the hand and may therefore be easier to input consistently. Thus, we use Finger Press Sides and Finger Press Top in TouchTone's full evaluation.
Exploration of Cross-Domain Sensing. Acoustic sensing techniques often rely on powerful speakers to transmit inaudible ultrasonic frequencies and avoid disturbing people [2]. They may also leverage high-resolution microphones to mitigate noise by separating airborne and structure-borne sound propagation and discarding any echoes that follow [45]. However, commercial smartwatch hardware is not powerful enough to transmit high frequencies (i.e., > 16 kHz) or differentiate the arrival of airborne and structure-borne sound or their echoes. The only inaudible frequencies accessible to smartwatch sensors are those of the low-frequency bandwidth, which can be polluted by many common noise sources found in daily life (e.g., wind, insects, car engines). To address these acoustic-based problems, we explore the feasibility of incorporating accelerometer data from the vibration domain to detect biometric information in structural changes and mitigate noise sources.
Accelerometers are known to be robust against acoustic noises that microphones cannot tolerate [37]. Conversely, microphones are largely unaffected by motions (e.g., user walking) that may dominate accelerometer data. To illustrate this complementary effect, we collect acoustic and vibration responses to five consecutive challenge signals using the embedded microphone and accelerometers in the Samsung Galaxy Watch 4 with loud noise (i.e., human speech ∼ 65 dB). Figure 3(a) shows vibration responses along the x-axis (Acc. Amplitude) are highly robust when there are loud human voices, whereas the acoustic responses (Mic. Amplitude) are more visibly affected. Note in Figure 3(b) that the inverse is true when introducing vibration noise in the form of walking (i.e., about 1 step every 1.5 s); the vibration response is corrupted while the acoustic response is not. We explored the possibility of filtering data to mitigate the impacts of body motions like walking. However, we found establishing cutoff frequencies for the filter to be challenging because the signal could vary significantly between different users and different motions (e.g., running, driving, exercising). Our findings indicate the acoustic and vibration domains are both effective and necessary complements to each other, strengthening our idea of deploying cross-domain sensing for privacy protection.

Motivations & Usability Study
We further study the nature of privacy leakage risks and existing solutions on smartwatches. The critical problem with privacy on smartwatches is that information can be displayed at untimely moments or locations. A 2023 report estimates mobile devices may receive upwards of 46 notifications per day, with 38.39% of users believing "They disturb me at the wrong time" [10]. These notifications may expose sensitive data such as conversations from messaging apps, meeting reminders from calendars, alerts from dating apps, and health reports from fitness aids or sleep monitoring apps. Leakage of this information not only threatens the user, but close associates of the user as well (e.g., friends, relatives, coworkers). For example, parents may use smartwatches to receive periodic reports on their children's well-being or location [27]. In the wrong hands, possession of such private information can lead to severe consequences, enabling adversaries to stalk or threaten victims.
Usability Study. We develop TouchTone to not only defend against privacy leakages, but do so in a non-intrusive manner using only microphones and speakers. We understand that PIN has higher security guarantees, but is not easy to enter in real-time and can be distracting or interrupting. We thus survey how smartwatch users feel about their experiences with PIN input. In particular, we aimed to understand usability concerns by asking respondents to provide their preference in one of five options (i.e., strongly disagree, disagree, neutral, agree, strongly agree) for each of the statements in Figure 4. The general reception of TouchTone was largely positive, with average Likert scores over 4, indicating people found our system highly usable. PIN, while highly secure, was found to be distracting during daily usage, leading to more neutral or negative replies (i.e., scoring 3 or under). PIN scored highest in ease-of-use, possibly due to the lack of available alternatives on smartwatches. However, our respondents preferred TouchTone, especially with regard to speed and user satisfaction, where it scored 1.6 and 2.2 points more than PIN, respectively, illustrating the usability of TouchTone. Most of our participants did not report hearing any noise from TouchTone either. This is because our low frequencies are non-intrusive and beyond what most commercial smartwatch speakers are designed to transmit, meaning they tend to be played with low audibility, even at maximum volume. This also indicates TouchTone will not disturb nearby people.

Threat Model
Leakage of intellectual property, social media, call history, or health app records to adversaries can lead to damage of business or personal reputations, blackmail, and other undesirable situations. Therefore, attackers may conspire to circumvent or sabotage TouchTone in order to gain access to such private data or deny access of it to others. To accomplish this, attackers may attempt to spoof finger touches made by the legitimate user or block TouchTone from detecting finger touches entirely. We assume the attacker is allowed to be in close proximity to the legitimate user such that they can monitor user finger-smartwatch interactions. We also assume the attacker is knowledgeable of TouchTone and how it measures the acoustic and vibration domains. It is worth noting that most attacks can only go unnoticed when the victim is less attentive of the smartwatch, such as when they are asleep or not wearing the device. Accordingly, we define the most feasible attacks below:
Impersonation Attack. To gain unauthorized access to private data, the attacker may attempt to impersonate the victim's finger touch gestures on the smartwatch and reveal hidden content. This form of impersonation requires both physical mimicry (i.e., finger structure) and behavioral mimicry (i.e., finger touch gesture). Attackers may try to maximize their chances at successful impersonation by specifically targeting victims with physically similar fingers. Observing authentic finger touch gestures (i.e., watching or photographing the victim) may also empower the attacker and increase their impersonation success.
Eavesdropping and Replay Attack. The attacker may try to gain unauthorized access to private data via stolen authentic finger touch responses. Because TouchTone utilizes data from both the acoustic and vibration domains, the attacker is required to eavesdrop on both domains in order to copy an authentic victim response. This can be done by placing sensors (i.e., microphones and accelerometers) on or near the target smartwatch without the victim's notice to obtain near-identical measurements. Alternatively, the victim could be tricked into installing malware on their smartwatch which extracts microphone and accelerometer data directly and sends it to the attacker. Regardless of eavesdropping method, the attacker later replays the eavesdropped credentials in the guise of the victim. For the acoustic domain, the attacker may utilize an external speaker to play their eavesdropped audio. For the vibration domain, the attacker may use a programmable motor or their own hands to generate motions.
Jamming Attack. The attacker may try to deny or delay the victim from accessing their own private information by sabotaging TouchTone's ability to recognize finger touch attempts. This may be carried out by playing loud noise near the user to disrupt the acoustic domain or by touching the device or victim to disrupt the vibration domain sensing.

System Overview
Unlike current privacy management methods on smartwatches, TouchTone is designed to hide private contents until it verifies the user via a simple touch. The basic idea is to examine the user's biometric information captured using a challenge-response process during events where private information may leak from the smartwatch. These events include incoming messages, notifications, and phone calls, which can inadvertently expose sensitive communications from loved ones, business partners, or health practitioners. Viewing these communications requires the user to move their wrists. Therefore, instead of continuously monitoring for touches, our approach only activates and transmits challenge signals when consistent wrist movements associated with touching gestures are detected, which is easily achievable by built-in OS functions. Biometric information is embedded in sound wave propagation through the physical smartwatch when the user touches the smartwatch. The challenge signal instigates this propagation and the built-in microphone and accelerometer will receive the acoustic reverberations and vibration movements of the wave, providing robustness via two perspectives of the same event. We capture these perspectives through our cross-domain features. The combination of the smartwatch and the user's finger creates a unique entity that produces distinct features for differentiating users.
The architecture of TouchTone, depicted in Figure 5, consists of four major components: (1) When the system detects a finger touch on the watch, it triggers Challenge Signal Transmission. In particular, our system leverages native functions on the smartwatch to detect wrist movements before touches. The challenge signal propagates through the watch, carrying the unique impacts of the user's touch, and is eventually captured by the microphone and accelerometer of the smartwatch in the acoustic and vibration domains, respectively. (2) Next, the system performs Response Signal Pre-Processing on the acoustic and vibration responses to enhance the signal components corresponding to the finger touch impact. The system employs acoustic noise removal and body motion mitigation methods to isolate the responses from interference. (3) After pre-processing, the Biometric Feature Extraction component extracts unique finger biometric features from the acoustic and vibration responses, including behavioral traits (i.e., finger touch gestures) and physiological traits (i.e., finger geometry and tissue structures). (4) Finally, our system performs Cross-domain Finger Response Verification to confirm user identity by measuring the distance between the feature vector from the run-time finger response and those from the pre-constructed finger response profile. The finger response profile includes the acoustic and vibration responses collected during enrollment, where the user operates TouchTone for the first time. Based on the verification result, the smartwatch can display private information on the screen to the verified user.

CHALLENGE-RESPONSE DESIGN

3.1 Challenge Signal Design
Frequency Band. Our literature review showed that low frequencies are a necessary design choice as many mainstream smartwatch models do not support high sampling rates [4,16,22,28,51]. Those that do are uncommon and come with extreme energy costs, discouraging development [17]. Most commercial smartwatch microphones have sampling rates $f_s$ as low as 16 kHz, restricting the maximum transmittable frequency to 8 kHz according to the Nyquist-Shannon Theorem. Thus, it is not desirable to use higher frequencies (e.g., greater than 20 kHz) for the sake of inaudibility as in other state-of-the-art works. However, we notice that frequencies below 500 Hz are also difficult to notice and insensitive to most environmental noises. Given these factors, our system adopts a low-frequency challenge signal using chirp sweeps from 50 Hz to 500 Hz.
Length. The length $T$ of the chirp could impact the accuracy and reliability of our system. Commercial smartwatch sensors cannot generate or detect chirp signals that are very short. Thus, the chirp needs to keep a certain length to ensure detection. However, the chirp should not be too long since reflections from other objects will also be collected during sensing, causing severe multipath distortion. Therefore, we conducted preliminary experiments, similar to Section 2.1, to determine ideal chirp parameters and empirically found $T = 125$ to be ideal in speed and robustness. This length reduces multipath distortions but keeps enough energy at each frequency for chirp detection.
Time Interval. The time interval between chirps is related to the sensing speed of our system: a larger time interval results in longer user latency. On the other hand, a short interval may cause detection errors since the received responses may accidentally overlap with each other. We empirically found time intervals of 100 ms to be sufficient.
Number of Chirps. Utilizing more chirps may provide more information about the user's finger at the expense of increasing wait time for the user. As a compromise, we choose a small, odd number of chirps (i.e., 3) and apply a majority vote decision making process. This process is described further in Section 4.2. Note that in scenarios where security is prioritized over convenience (e.g., retrieving one-time passwords, viewing encrypted emails or messages), more chirps may be preferable.
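The design choices above (frequency band, length, interval, and number of chirps) can be put together in a short sketch. This is only an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the sweep is assumed to be linear, and the chirp length of 125 is assumed to be in milliseconds, which the text does not state explicitly.

```python
import math


def linear_chirp(f0, f1, duration_s, fs):
    """Samples of a linear chirp sweeping from f0 to f1 Hz (assumed sweep shape)."""
    n = int(round(duration_s * fs))
    k = (f1 - f0) / duration_s  # sweep rate in Hz per second
    return [math.sin(2 * math.pi * (f0 * t + 0.5 * k * t * t))
            for t in (i / fs for i in range(n))]


def challenge_signal(fs=16000, f0=50.0, f1=500.0,
                     chirp_s=0.125,   # ASSUMPTION: the "125" in the text is ms
                     gap_s=0.100,     # 100 ms interval between chirps
                     num_chirps=3):   # small odd number for majority voting
    """Concatenate num_chirps chirps separated by silent intervals."""
    if f1 >= fs / 2:
        # Nyquist: a sampling rate fs can only represent frequencies below fs/2
        raise ValueError("f1 must be below the Nyquist frequency fs/2")
    chirp = linear_chirp(f0, f1, chirp_s, fs)
    gap = [0.0] * int(round(gap_s * fs))
    signal = []
    for i in range(num_chirps):
        signal.extend(chirp)
        if i < num_chirps - 1:
            signal.extend(gap)
    return signal


sig = challenge_signal()
print(len(sig), len(sig) / 16000)  # 9200 samples, 0.575 s
```

Under these assumptions the whole challenge lasts a little over half a second, which is one way to see the trade-off between adding chirps and keeping user latency low.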

Response Signal Pre-processing
Acoustic Noise Removal. We distinguish noise from non-noise by segmenting the response signal into frames with $N$ samples each, where $N = f_s \cdot T$, $f_s$ is the sampling rate, and $T$ is the chirp length. We use $n$ to denote the index of the sample. A bandpass filter is applied to remove frequencies unrelated to the challenge signal. The key difference between the ambient noise and the challenge signal is variance. This is because chirp amplitudes change significantly over short periods while ambient noise does not. Thus, we detect frames that only contain ambient noise by calculating the variance $v_i = \mathrm{var}(x_i)$ of each frame $x_i$. To deal with the changes of variances under different noise profiles, our system further performs normalization to achieve robust noise detection:

$$\tilde{v}_i = \frac{v_i - \mu_v}{\sigma_v} \tag{1}$$

where $\mu_v$ and $\sigma_v$ denote the mean and standard deviation within the observation window, respectively. After normalization, we use a threshold to detect noisy frames. If $\tilde{v}_i < 1$, frame $x_i$ will be detected as containing only ambient noise. We found 1 to be a sufficient threshold based on empirical experiments. We then perform noise subtraction by estimating the spectral magnitude from the detected frames. Specifically, we let $r(n)$ be a sequence of received samples after filtering and $w(n)$ be the noise added to the clean signal $s(n)$. By taking the Fourier Transform, we receive:

$$R(\omega) = S(\omega) + W(\omega) \tag{2}$$

We then subtract the noise magnitude spectrum to yield:

$$|\hat{S}(\omega)| = |R(\omega)| - |\hat{W}(\omega)| \tag{3}$$

Finally, we obtain the cleaned signal $\hat{s}(n)$ using the Inverse Fourier transform on $\hat{S}(\omega)$.
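The frame-level noise detection described above can be sketched as follows. The code is a minimal illustration, assuming non-overlapping frames and a z-score style normalization of the per-frame variances; the function name and the handling of the degenerate equal-variance case are assumptions, and the subsequent spectral subtraction step is not shown.

```python
import statistics


def noise_only_frames(samples, frame_len, threshold=1.0):
    """Flag frames whose normalized variance falls below the threshold.

    Returns one boolean per non-overlapping frame; True means the frame is
    treated as containing only ambient noise.
    """
    frames = [samples[i:i + frame_len]
              for i in range(0, len(samples) - frame_len + 1, frame_len)]
    variances = [statistics.pvariance(f) for f in frames]
    mu = statistics.fmean(variances)      # mean over the observation window
    sigma = statistics.pstdev(variances)  # standard deviation over the window
    if sigma == 0:
        # ASSUMPTION: if every frame has the same variance, nothing stands
        # out as a chirp, so all frames are treated as noise-only.
        return [True] * len(frames)
    return [(v - mu) / sigma < threshold for v in variances]
```

Frames flagged True would then supply the noise magnitude estimate that is subtracted from the spectrum of the received signal.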
Response Signal Segmentation. Our system uses the chirps of the response signal as the proof for authentication. To detect the arrival of each chirp, we propose a correlation-based technique since the challenge signal should have high similarity with the chirps embedded in the response signal. Let $r(n)$ represent the response signal at the microphone after filtering and $c(n)$ be the original challenge signal. To identify the beginning points of the chirps, the challenge signal $c(n)$ is slid across the response signal $r(n)$ with a moving window and the correlation sequence is calculated using the matched filter as follows:

$$y(n) = \sum_{k} r(n + k)\, c(k) \tag{4}$$

The highest peaks within sequence $y(n)$ can be used as candidates to identify the beginning points of the chirp signal. The correlation sequence is also used to synchronize $c(n)$ with the response signal in the vibration domain. Assuming the acoustic and vibration domains begin recording at the same time, the start point of the vibration [...] To mitigate multipath effects, we locate the time interval buffers between chirps and discard their contents to ensure our response signal is embedded with finger biometric information.
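The correlation-based chirp detection can be illustrated with a direct, unoptimized sketch. The function names and the minimum-separation rule used to keep peaks apart are assumptions made for illustration; the text only states that the highest correlation peaks are used as candidates.

```python
def matched_filter(response, challenge):
    """Correlate the challenge template with every full-overlap lag of the response."""
    m = len(challenge)
    return [sum(response[n + k] * challenge[k] for k in range(m))
            for n in range(len(response) - m + 1)]


def chirp_onsets(response, challenge, num_chirps, min_separation):
    """Pick the num_chirps highest correlation peaks as chirp start indices.

    ASSUMPTION: peaks closer than min_separation samples to an already
    selected peak are skipped so that each chirp is counted once.
    """
    y = matched_filter(response, challenge)
    order = sorted(range(len(y)), key=lambda n: y[n], reverse=True)
    onsets = []
    for n in order:
        if all(abs(n - o) >= min_separation for o in onsets):
            onsets.append(n)
            if len(onsets) == num_chirps:
                break
    return sorted(onsets)
```

Once the onsets are known, each chirp can be cut out of the response and the samples in the intervals between chirps can be discarded.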
Body Motion Mitigation. We use a two-step Kalman filter-based approach to eliminate artifacts caused by unrelated body motions. We first predict the baseline state of the accelerometer by modeling motion-free sensor data collected over a sliding window as a 3D Gaussian distribution (corresponding to the 3 axes). The window step is based on the accelerometer sampling rate (e.g., ∼100 Hz). Then, we remove detected motions if they fall beyond one standard deviation of the mean within our model by subtracting the baseline sensor value from the outlier. This approach is effective at mitigating low-intensity body motions that occur in practical scenarios when users check messages on smartwatches, such as when walking or riding vehicles. The baseline modeling is a one-time process but can be recalibrated at any time. We use linear acceleration decoupled from the impacts of gravity provided by Android built-in functions.
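The baseline-and-threshold step can be sketched in a few lines. This is a simplified reading of the description rather than a Kalman filter: it assumes the baseline is modeled independently per axis and interprets "subtracting the baseline sensor value from the outlier" literally. The function names are hypothetical.

```python
import statistics


def fit_baseline(still_samples):
    """Per-axis (mean, std) estimated from motion-free (x, y, z) samples."""
    axes = list(zip(*still_samples))
    return [(statistics.fmean(a), statistics.pstdev(a)) for a in axes]


def mitigate_motion(samples, baseline):
    """Subtract the baseline mean from readings beyond one standard deviation.

    ASSUMPTION: each axis is treated independently, and in-range readings
    are passed through unchanged.
    """
    cleaned = []
    for sample in samples:
        cleaned.append(tuple(
            v - mean if abs(v - mean) > std else v
            for v, (mean, std) in zip(sample, baseline)))
    return cleaned
```

Recalibration then amounts to calling `fit_baseline` again on a fresh window of motion-free data.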

TOUCHTONE IMPLEMENTATION

4.1 Biometric Feature Extraction based on Finger Touch Responses
The response signals captured by TouchTone interact with the user's finger and capture unique physiological biometric information (i.e., shape and tissue structure of the user's finger) and behavioral biometric information (i.e., touch strengths and locations).
We propose exploiting spectrograms and Mel-Frequency Cepstral Coefficients (MFCC) to obtain sufficient biometric information for privacy protection. Both tools have flexible and fine-grained analysis capabilities for deriving temporal and frequency biometric properties from acoustic and vibration reflections [45]. We specifically compute 256 spectral points using the Fast Fourier Transform (FFT) and the short-term power spectrum of a response represented by 20 cepstral coefficients derived by a linear cosine transform. We further develop statistical features from the acoustic and vibration responses processed by spectrograms and MFCCs, including mean absolute value, standard deviation, maximum, minimum, variance, root mean square, range, skewness, and kurtosis. In total, we adopt 52 biometric features in TouchTone: 13 statistics for each of the four response signals in the acoustic and vibration domains (i.e., mono-channel audio and 3-axis vibration).
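As an illustration of the statistical features named above, the following sketch computes the nine listed statistics for one sequence of values (for example, one row of a spectrogram or one MFCC track). It assumes population (biased) moment estimates and non-excess kurtosis; the text does not say which estimators are used, and the function name is hypothetical.

```python
import math


def statistical_features(x):
    """Nine summary statistics of a non-empty numeric sequence."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n  # population variance (assumed)
    std = math.sqrt(var)
    if std > 0:
        skew = sum((v - mean) ** 3 for v in x) / (n * std ** 3)
        kurt = sum((v - mean) ** 4 for v in x) / (n * std ** 4)  # non-excess
    else:
        skew = kurt = 0.0  # constant input: moments are undefined, use 0
    return {
        "mean_abs": sum(abs(v) for v in x) / n,
        "std": std,
        "max": max(x),
        "min": min(x),
        "variance": var,
        "rms": math.sqrt(sum(v * v for v in x) / n),
        "range": max(x) - min(x),
        "skewness": skew,
        "kurtosis": kurt,
    }
```

Applying such a function to each of the four response signals and concatenating the results yields one fixed-length feature vector per chirp.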
To demonstrate the effectiveness of our biometric features, we randomly sample subsets features and graph them for a subset users.Figure 6 illustrates a small-scale example using a 3D scatter plot of standard deviation, kurtosis, and range.We find that our biometric features can separate different users and cluster the same user because no users have identical finger structures when touching the smartwatch.We further verify effectiveness in more detail in Section 5.

Privacy Protection Via A Finger Touch
Touch Gesture Recognition. Our system allows a user to protect their privacy using a straightforward touch. As most finger touches are momentary, the system needs to accurately identify the performed gestures to ensure it can capture the user's finger biometric within a short time. Since different touch gestures may have stark differences in finger configuration (e.g., Finger Press Sides or Finger Press Top of Figure 2), we develop a gesture classification model to recognize the changes of sensor data in the acoustic and vibration domains. The insight is that when the user touches the smartwatch, the contact on the smartwatch causes a significant impact, reflected in the sensor data of both domains. In particular, we adopt a model based on the support vector machine (SVM) for its simplicity; however, other binary or multi-class classification models (e.g., K-Nearest Neighbors [5] or Random Forest [6]) can be alternative choices. Note that the touch gesture recognition can be triggered by recognizing common wrist movements (before touch gestures), which are readily available, native functions on most smartwatches [12]. These functions only need to be active for less than 1 s before revealing the private information on the screen.
Cross-Domain Finger Response Verification. After recognizing a touch gesture, the system extracts finger biometric features based on the run-time finger response captured in the acoustic and vibration domains. It then verifies the user's identity by comparing the finger biometric features to those derived from the profile of the legitimate user's finger response. We employ a verification vector (52-bit string) denoted as C′ to record the differences between the run-time finger response and the profile. Each bit of C′ is a Boolean indicating whether the specific feature from the run-time finger response matches the corresponding feature from the user's profile. Specifically, we calculate the Euclidean distance between each statistical feature of the run-time response and the corresponding feature of the target profile C, and compare this distance with the maximum deviation allowed for that feature. If the Euclidean distance of a feature is less than the maximum deviation, we set the element of the verification vector with the corresponding index to 1, treating it as a positive match. Otherwise, it is set to 0. If the ratio of matching features is greater than a matching threshold, we determine that the finger response is from the legitimate user. Furthermore, we perform finger response verification for each chirp in the response signal. To accept the user, we require a successful match in a majority of the per-chirp matching attempts (the ceiling of half the total number of chirps). Figure 7 shows an example where 2 out of 3 chirp feature vectors match a profile, resulting in a majority vote to accept the user.
Recall from Section 2.1 that noise may impact our domains differently. If one domain has significantly fewer positive bits than the other, it is likely experiencing noise and, therefore, has less reliable features; thus, we weight it lower. The weights of the distances in the acoustic domain and the vibration domain are set dynamically according to the proportion of positively flagged bits (1's) in each domain. In particular, we employ a noise threshold to examine whether the features in either domain are not significantly impacted by noise or interference and can still provide sufficient biometric information for privacy protection. If the proportion of positively flagged bits $p$ in the verification vector corresponding to a specific domain is greater than the noise threshold, we set the weight of the distance in this domain to 1. Otherwise, we set the weight of the distance in this domain to $\frac{1}{1+e^{-p}}$, which is the sigmoid function widely used to bound the output below 1. The determined weight is then applied to the counts of the positively flagged bits in the corresponding domain. The weighted counts are added up and compared to the matching threshold to determine whether the finger response is from the legitimate user or not.
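The bit matching, domain weighting, and majority voting described above can be sketched as follows. This is an illustrative reading of the text, not the authors' code. Several details are assumptions: the feature layout (13 acoustic features followed by 39 vibration features), the sigmoid taking the match proportion as its argument, the default thresholds (taken from the values reported in Section 5), and all function names.

```python
import math
import numpy as np

# Assumed layout of the 52 features: 13 acoustic, then 39 vibration (3 axes x 13).
ACOUSTIC = slice(0, 13)
VIBRATION = slice(13, 52)

def match_bits(features, profile, max_dev):
    """Return 1 where a feature lies within its maximum deviation of the profile."""
    features, profile, max_dev = (
        np.asarray(a, dtype=float) for a in (features, profile, max_dev)
    )
    # For scalar features the Euclidean distance is the absolute difference.
    return (np.abs(features - profile) < max_dev).astype(int)

def domain_weight(bits, noise_threshold):
    """Weight 1 for a reliable domain, sigmoid of the match proportion otherwise."""
    p = float(bits.mean())
    return 1.0 if p > noise_threshold else 1.0 / (1.0 + math.exp(-p))

def verify_chirp(features, profile, max_dev, match_threshold=0.85, noise_threshold=0.30):
    """Accept one chirp if the weighted ratio of matching bits exceeds the threshold."""
    bits = match_bits(features, profile, max_dev)
    weighted = sum(
        domain_weight(bits[d], noise_threshold) * bits[d].sum()
        for d in (ACOUSTIC, VIBRATION)
    )
    return bool(weighted / bits.size > match_threshold)

def verify_user(chirps, profile, max_dev, **kwargs):
    """Majority vote: accept if at least ceil(n / 2) chirps match the profile."""
    votes = sum(verify_chirp(c, profile, max_dev, **kwargs) for c in chirps)
    return votes >= math.ceil(len(chirps) / 2)
```

With three chirps of which two match the profile, `verify_user` accepts the user, mirroring the example in Figure 7.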
Finger Response Profile Construction. Our system needs to construct a user's finger response profile containing the biometric information before use. This can be done in an enrollment process when using the system for the first time. In enrollment, the user manually initiates the challenge signal transmission, and the system extracts the finger biometric features from the audio and vibration responses to create a profile. We elaborate on the definition of the response signal from Section 3. The profile features C and their maximum deviations are employed to establish the finger response of the user, forming the profile that will be utilized for matching future response inputs.

PERFORMANCE EVALUATION

5.1 Evaluation Methodology
Hardware and Scenarios. A prototype of TouchTone was developed for Android-based Wear OS platforms. We use the Samsung Galaxy Watch 4, Fossil Watch 6, and Skagen Falster Watch 6 due to their varied sensor positions and dimensions. All models include one speaker and one microphone, marked in Figure 8. The challenge signal is generated by the speaker on the smartwatch at 100% volume. The response is sampled by the microphone and accelerometer on the smartwatches at 16 kHz and 100 Hz, respectively (the Falster accelerometer is an exception, only sampling up to 50 Hz). Unless otherwise noted, we evaluate our system in a typical office environment with an average background noise level of 35 dB. We assess the security of our system against the threat model defined in Section 2.3. We also investigate the robustness of our system with various impact factors including attack distance, noise level, body motion, long-term variations, touch locations, and other system parameters. In total, we devised 22 scenarios in our study.
Data Collection. We recruit 54 volunteers, 38 males and 16 females aged 18-50, to participate in our study with the approval of our IRB. These volunteers include the 15 participants featured previously in Section 2.2. We do not seek specific finger sizes or other demographics to ensure a representative test population through random sampling. Participants are allotted several minutes to acquaint themselves with TouchTone, including the software and finger touch gestures to be performed. As mentioned in Section 2.1, we identified the Finger Press Sides and Finger Press Top gestures (see Figure 2) as capable of producing the most distinct and consistent user responses. In each scenario, a participant touches the smartwatch 60 times for each of the 2 gestures on each of the 3 devices, compiling a total of 19,440 user samples (54 × 60 × 2 × 3). Data collection per participant is completed within 1 week, with the exception of our long-term performance scenario, where a subset of 10 participants volunteered to continue the study for 6 weeks. To ensure our samples are diverse, participants remove and re-wear the smartwatches throughout data collection and perform finger touch gestures with some variability. Participants are encouraged to wear the smartwatch naturally based on their own habits (e.g., tightness of the band or the preferred hand). We treat each participant as the legitimate user of the smartwatch and all others as non-legitimate users. We randomly choose half of the user's measured cross-domain chirps to build the legitimate user profile and use the rest as TP test signals; the split is random rather than chronological.
Performance Metrics. We measure performance through the following metrics: True Positive (TP) Rate, False Positive (FP) Rate, and False Negative (FN) Rate. We compare TP and FP through receiver operating characteristic (ROC) curves to measure general performance and compare FP and FN through equal error rates (EER) to quantify the balance between security and convenience. Intuitively, a high TP rate and low FN rate mean a lower probability that the legitimate user is misidentified, while a lower FP rate means a lower probability for non-legitimate users to be mistaken for the legitimate user. The ideal system has a simultaneous 100% TP rate, 0% FN rate, and 0% FP rate.
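The relationship between these metrics can be made concrete with a small sketch. It assumes each verification attempt is summarized by a single score (for example, the ratio of matching features) and that an attempt is accepted when the score exceeds a threshold; the function names and the threshold grid are illustrative choices, not part of the evaluation code.

```python
import numpy as np

def rates(genuine, impostor, threshold):
    """Return (TP rate, FP rate, FN rate) at a given acceptance threshold."""
    genuine = np.asarray(genuine, dtype=float)    # scores of the legitimate user
    impostor = np.asarray(impostor, dtype=float)  # scores of non-legitimate users
    tp = float(np.mean(genuine > threshold))
    fp = float(np.mean(impostor > threshold))
    return tp, fp, 1.0 - tp

def equal_error_rate(genuine, impostor, thresholds=np.linspace(0.0, 1.0, 101)):
    """Approximate the EER as the point where FP and FN rates are closest."""
    best = None
    for t in thresholds:
        _, fp, fn = rates(genuine, impostor, t)
        gap = abs(fp - fn)
        if best is None or gap < best[0]:
            best = (gap, (fp + fn) / 2.0, float(t))
    return best[1], best[2]  # (EER estimate, threshold at which it occurs)
```

Sweeping the threshold and plotting the TP rate against the FP rate traces the ROC curve, while the EER summarizes the single operating point where the two error types balance.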

Overall Performance
1) Setup: We evaluate the overall performance of TouchTone by distinguishing users based on their finger touches. This evaluation ensures privacy is protected on smartwatches, as legitimate users can freely display or hide information on the screen whereas non-legitimate users cannot. Specifically, we consider the performance for each combination of our 2 touch gestures and 3 smartwatch models. The setup and data collection follow the routine described in Section 5.1.
2) Result: Figure 9 shows the average performance when distinguishing all participants with two different gestures. We observe that our system can achieve a 97.1% TP rate / 1.8% FP rate for our press sides gesture and a 96.9% TP rate / 1.1% FP rate for our press top gesture when we set the feature-matching threshold to 85% and the domain noise threshold to 30%. We also compute the EER to be 2.48% and 2.63% for the two gestures, respectively. This performance was consistent even for smartwatch models with different sensor locations (i.e., Samsung vs. Fossil). Overall, our results show that TouchTone can effectively protect user privacy.

Performance Under Impersonation Attacks
1) Setup: Attackers may try to impersonate the legitimate user's finger touches in order to access the legitimate user's private data.
We study the effectiveness of TouchTone under this attack with a subset of 10 participants recruited to act as victims and attackers across multiple rounds of experiments. In a single round, one participant is the victim and the remainder serve as attackers. The roles of victim and attacker are rotated in round-robin fashion such that every participant is victim at least once and every participant attacks every other participant at least once, resulting in 45 attacker-victim pairs. Each victim chooses their preferred finger touch gesture for the experiment. Thus, the finger touch gesture varied between users. The victim demonstrates how they perform the gesture directly to the attacker 10 times. The attacker learns to mimic the victim's gesture through visual observation.
The attacker is then given 10 attempts to impersonate the victim. Note that we are intentionally providing more preparation for the attacker than what they are likely to receive in a real-world attack scenario in order to thoroughly test TouchTone and maximize the chances of the attack succeeding.
2) Result: Figure 10 shows the FP and FN rates when varying the matching threshold from 75% to 95%. We find that FP decreases whereas FN increases as the threshold increases, since a higher threshold means fewer attempts are accepted. EER can be maintained under 3% for two of our three tested watches. The Skagen Falster Watch 6 had the highest EER due to its lower accelerometer sampling rate (i.e., 50 Hz compared to 100 Hz for the Samsung Galaxy Watch 4). This was verified by downsampling the other watch accelerometers to 50 Hz, which reduced their performance toward that of the Skagen Falster Watch 6. The threshold can be controlled by the user to prioritize lower FP or lower FN depending on the desired balance of security vs. convenience.

Performance Under Eavesdropping and Replay Attacks
1) Setup: Next, we study how attackers may eavesdrop and replay copies of valid user responses to gain access to private data. We recruit 10 of our participants to act as victims seated at a desk. A smartphone with active microphones and accelerometers placed 0.2 m from the victim acts as a malicious sensor and listens for authentication signals. Such signals are recorded and replayed by an attacker in an attempt to falsify their identity to TouchTone. Acoustic signals are replayed using an attacking speaker while vibration signals are reproduced by the attacker's hand motions. Note that we intentionally assume the victim does not notice the attack or interfere with the attacker in order to thoroughly test TouchTone and maximize the chances of the attack succeeding.
2) Result: TouchTone is able to correctly recognize victims while blocking the attacker using eavesdropped responses, resulting in 0% EER and 100% TP for our two finger touch gestures. The recorded signal must travel through the air from the victim's smartwatch to the eavesdropper, causing significant attenuation. This attenuation is too severe for the attacker to recover the victim's true signal.

Performance Under Jamming Attacks
1) Setup: We consider multiple jamming attack setups where an attacker aims to block the legitimate user from being successfully recognized by TouchTone. We set up a loudspeaker and calibrate the frequency band and sound pressure of the jamming signal to 50-500 Hz and 55 dB, respectively. We assume jamming attackers have perfect knowledge of TouchTone parameters and design a repeating signal sweep to pose the biggest threat possible. We then invited 10 participants to act as victims and played the jamming attack signal from 1-3 m away as shown in Figure 11(a).
2) Result: Figure 11(b) shows our system remains robust. As distance increases, performance improves because the jamming signal attenuates more significantly. Although performance is lower at the closest distance of 1 m, it still reaches 95.9% accuracy and 3.97% EER. Note that jamming at close range is likely to be noticed by the victim, greatly increasing the chance of exposing the attacker.

Impact of Noise Pressure Level and Distance
1) Setup: To ensure TouchTone is usable in adverse acoustic conditions, we study the robustness of our system under different noisy environments with 10 participants. We first simulate ambient noise by playing pre-recorded sound (e.g., light music, passing cars) on a laptop while participants use TouchTone in an office environment. The laptop was placed 1-3 m away from the user and set to volumes averaging around 50 dB and 75 dB to create various environmental noise conditions. Then, we ask participants to use TouchTone in actual locations (i.e., library, meeting room, and student center), where the system is exposed to real noise of people moving or talking with average sound levels of around 60 dB. Note that existing acoustic sensing techniques have only been tested against noises up to 40 dB [32], which is much quieter than our settings.
2) Result: Figure 12 indicates participants can be recognized with a high accuracy of over 95% and an EER of 3.67%. Although the acoustic domain is very noisy, the vibration domain is largely unaffected. Moreover, our noise removal and domain weighting still allow the acoustic domain to contribute some information. Only in extreme cases with loud noise and close distances (e.g., 75 dB at 1 m) does performance begin to deteriorate, showing a 94.5% TP rate and 6.9% FP rate. We find that TouchTone has similar performance against both simulated noise and real noise, indicating TouchTone can effectively protect users' privacy under various acoustic noise conditions.

Impact of User Body Motions
1) Setup: To ensure TouchTone is usable in adverse vibration conditions, we ask 10 participants to walk and run while using our system. Data was collected in an unoccupied hallway with ample space for movement. We asked participants to maintain a consistent walking or running speed throughout the data collection. We therefore advised participants to choose a walking or running speed they would be comfortable maintaining for several minutes uninterrupted. No further instructions on body motions were given in order to create the most natural testing conditions. We observed walking speeds comparable to casual everyday walking and running speeds comparable to light jogging exercise.
2) Result: We observe in Figure 13 that most participants maintain a TP rate similar to that of the static case, averaging around 98% TP with an EER of around 6.77%. While motion corrupted the vibration domain considerably, the acoustic domain operated normally, leading to only minor deterioration in FP rates. We consider this acceptable, as continuous user movement, especially jogging, is a very challenging scenario in which to launch attacks. Note that existing vibration-based sensing techniques cannot tolerate dynamic motions [19], illustrating the necessity of cross-domain sensing for high robustness.

Long-term Performance Study
1) Setup: We study whether finger touch gestures can still be recognized by TouchTone after an extended period of time. Specifically, we invited 10 of our participants to return for another round of data collection after 6 weeks had elapsed. We consider 6 weeks a sufficiently long time for users to accrue slight differences in finger shape (e.g., from weight gain or loss) or touch behavior. However, to ensure participants continued their natural lifestyles, no specific instructions were given to change or maintain finger traits over the 6-week period.
2) Result: As shown in Figure 14, we find TouchTone can still successfully recognize individuals. While a minor decrease in accuracy was observed after 6 weeks, all three smartwatches still achieved 95% TP or higher at a simultaneous 5% FP or lower, indicating our system remains stable over this period. Similarly, we find EER to be 4.13%. This suggests TouchTone can tolerate natural variations in finger structure and touch behavior. Note that performance can be further improved by periodically updating the user profile.

Impact of Different Touch Locations
1) Setup: In practice, users may not always touch the exact same location when performing their finger touch gestures, which may potentially impact the performance of TouchTone. To quantify the effects of different touch locations, we divide the smartwatch screen into five areas labeled 'Top', 'Down', 'Left', 'Right', and 'Center', pictured in Figure 16. We ask 10 participants to touch each of the five locations 60 times. 10% of the collected data from a single location is used to build a profile for said location. The rest is used to examine the differences between locations by matching the samples to the correct profiles. Note that ordinarily, TouchTone only requires the user to enroll one profile using one touch location.
2) Result: The confusion matrix presented in Figure 15(a) shows that the finger responses collected from different locations are not significantly different from each other, suggesting that our system is not overfitting to touch locations on the smartwatch. Figure 15(b) presents our accuracy at recognizing legitimate users when touching different locations on different smartwatch models. The average accuracy and EER are around 97.93% and 2.89%, respectively, showing no major difference between different smartwatches. This demonstrates that our system is agnostic to touch locations. The stability around 95% TP and 4.9% EER can be maintained with around 15 chirps as an optimal case.

Impact of Training and Overhead
System Overhead. We study the memory impact, power consumption, and run time of TouchTone. Our challenge and response signals (both acoustic and vibration) require only 365 KB. The size of the authentication model of TouchTone is 6 KB. The total size of the TouchTone app is about 8.1 MB, which is less than many popular smartwatch apps (e.g., Spotify requires 27.1 MB [31]). In addition, we continuously collect 600 responses on each of the three smartwatch models and measure the battery reduction through Watch Battery Monitor [24]. We find that our method consumes around 0.025% battery capacity per authentication on all three smartwatches, which is negligible for daily usage. Our power consumption is an upper estimate, including power consumed in between experiments. In ordinary use cases, TouchTone is only powered during a finger touch. TouchTone requires an average of 0.6 s to extract the features from the response signals (including response capture time) and another 1.31 × 10^−4 s to complete profile matching, indicating we can recognize the user within 1 s after the user touches the smartwatch.
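As a quick sanity check on the figures above, the short calculation below derives the implied totals from the reported per-authentication numbers. The variable names are illustrative, and the results are back-of-the-envelope estimates rather than additional measurements.

```python
# Reported values from the overhead measurements.
battery_per_auth_pct = 0.025   # battery capacity consumed per authentication (%)
responses_collected = 600      # responses collected per smartwatch
feature_time_s = 0.6           # feature extraction, including response capture
matching_time_s = 1.31e-4      # profile matching

# Battery drained by the full 600-response measurement run (about 15%).
battery_used_pct = battery_per_auth_pct * responses_collected

# Upper bound on authentications per full charge if nothing else drew power.
auths_per_charge = 100.0 / battery_per_auth_pct

# End-to-end recognition latency, which stays under the 1 s target.
total_latency_s = feature_time_s + matching_time_s

print(f"{battery_used_pct:.1f}% battery for {responses_collected} responses")
print(f"about {auths_per_charge:.0f} authentications per charge")
print(f"{total_latency_s:.4f} s per recognition")
```

Profile matching contributes a negligible share of the latency; nearly all of the time is spent capturing the response and extracting features.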

DISCUSSION
Generalizing TouchTone. While our evaluation considers a few types of smartwatches, our system generally applies to a broader spectrum of wearable devices susceptible to on-screen privacy leakage. The smartwatches used in our evaluation are carefully chosen to represent a diverse range of software and hardware constraints commonly found on the market. Furthermore, our system is designed to work with low-fidelity, readily available sensors on all commodity smartwatches. Thus, our work lays the foundation for a solution to safeguard user privacy across a broad spectrum of devices.
Physical Changes to Fingers. The participants of our study all performed our finger touch gestures with their bare fingertips, with no remarkable characteristics related to their fingers. However, physical changes such as the presence of sweat or wool from gloves may alter or obscure biometric information. Our acoustic and vibration responses capture unique finger biometric information related to the shape and force the finger exerts. In the case of mild sweat, the moisture does not drastically alter finger geometry, meaning any differences observed may be behavioral (i.e., the finger touch may vary due to slippery surfaces). Furthermore, the impact of gloves is highly dependent on the glove quality. Thinner gloves are more likely to conform to the shape of the user's finger and thus preserve the finger shape. Thicker gloves may obscure the finger shape and make capturing finger biometrics more difficult. In the future, we intend to explore the impacts of additional external factors. For example, long sleeves may cover the smartwatch and impact the system. We expect these impacts to be minimal as sleeves do not exert significant force relative to finger touches. Moreover, users would likely roll their sleeves back to more easily view and touch the smartwatch. Future studies may verify these ideas.
Impact of Voting Number. The number of votes necessary to pass a majority vote on finger identity influences the confidence of any verification decision made by TouchTone. We evaluate the effect of the voting number of the majority voting mechanism on TP and EER by varying the number of chirps needed to make a decision from 1 to 5. More chirps led to modest improvements at the expense of latency. However, even a single chirp allowed us to achieve over 93% TP rate and under 7.8% EER for all smartwatches. This indicates TouchTone can adapt to different usability and security requirements, if needed.

RELATED WORK
In the broader acoustic domain, human voices have been utilized to verify users on smart devices by examining traits such as intonation and duration [15,43], utterances [42], spectrums [11,35], and cepstral features [7]. However, these traits are personally revealing, making use of them a privacy risk itself. Other traits studied include keystroke dynamics [34], mouse movements [50], gait patterns [33], speech-related facial movements [49], and vocal tract vibrations [25]. While these approaches pose less risk to privacy, they require significant sensing time and user effort.
Meanwhile, in the vibration domain, research has primarily focused on authenticating specific user behaviors. For example, Mondol et al. [29] propose leveraging wrist-worn motion sensors to capture signatures in the air. Daily activities (e.g., walking, jumping, climbing stairs, arm gestures) [8,21,44,47,48] have also been used for behavioral authentication. More recently, researchers have combined pressure from touchscreen presses with PIN inputs for second-factor authentication [40,41]. However, all of these works require considerable active participation, which is undesirable for non-intrusive privacy protection. In addition, some operating systems allow users to mute notification displays [13], but such settings are "all-or-nothing" approaches, reducing the overall user experience.
The only user authentication method available on commodity smartwatches is the PIN [30]. While it provides a higher security guarantee, a PIN is not easy to enter on smartwatches' small screens and can be obtrusive in many practical scenarios (e.g., meeting with bosses or business partners). There have been biometric-based user authentication methods on smartphones or tablets (e.g., faces [14], fingerprints [1], iris patterns [18], and retina patterns [26]), but they are not directly applicable to smartwatches with limited sensing capabilities and power.
The most closely related works on smartwatches include SonicPrint [32] and Lee et al. [19]. SonicPrint utilizes the sound induced by a swiping gesture to authenticate users but is susceptible to environmental noise and surface conditions. It also requires training samples in adverse conditions (e.g., 40 dB acoustic noise) to function reliably as an acoustic-only sensing approach. TouchTone has no such requirement and has been demonstrated to function in much noisier environments (e.g., 75 dB acoustic noise). Moreover, SonicPrint is designed to detect audible sounds from swipe gestures. TouchTone, however, detects unobtrusive signals using unconventional low frequencies. Low-frequency sensing is an underexplored area in mobile sensing that has not been applied to privacy protection, making TouchTone a pioneering work in this regard. Lee et al. [19] leverage accelerometers and gyroscopes to produce vibration challenge signals and measure changes in the vibration response to authenticate users. Their approach requires up to 5 seconds of authentication time, which is incompatible with the requirements of on-demand privacy protection that TouchTone addresses. Moreover, as a vibration-only sensing approach, the system is susceptible to interference from motions such as walking. No results were presented to indicate that their system can tolerate such conditions. TouchTone's technical novelty stems from its cross-domain sensing approach, which leverages the strengths of both the acoustic and vibration domains to achieve performance that matches or outperforms existing work while supporting more challenging scenarios than previously studied.

CONCLUSION
We presented a cross-domain privacy protection system leveraging acoustic and vibration sensors to effectively and conveniently secure smartwatch data (e.g., contacts, location, account passwords, health information) from multiple attacks (e.g., impersonation, eavesdropping, replay attacks). Our system measures the response of the user's finger as it touches the smartwatch during an acoustic challenge signal transmission. The response is captured not only by microphones in the acoustic domain, but also by accelerometers in the vibration domain. We demonstrate that the use of cross-domain finger biometric information enables novel, non-intrusive, low-effort privacy protection on smartwatches. Experiments with commodity smartwatches in various use case scenarios showed that we can identify users with around a 97% true positive rate and a 2% false positive rate.

Figure 1: Notifications on smartwatches may leak private information. TouchTone secures this data via cross-domain sensing of user-specific finger touches.

Figure 2: (Top) Possible finger touch gestures. Microphone readings are shown for (middle) different users and (bottom) different samples of the same user. The gestures of (b) and (c) are ideal for being unique and consistent.

(a) Microphone and accelerometer data with loud human voices. (b) Microphone and accelerometer data with slow walking activities.

Figure 3: Finger touch responses under (a) loud noise and (b) user motion. Acoustic noise may disrupt the microphone but not the accelerometer and vice versa.

Figure 4: Survey of user experiences with TouchTone.
(a). Responses were converted to a 5-point Likert scale (i.e., 1 indicates strong disagreement, 5 indicates strong agreement). The average survey scores for 15 respondents are listed in Figure 4(a) and visualized in Figure 4(b).

Figure 6: Statistical feature responses distinguishing finger touches on the smartwatch for different users.

Figure 7: Illustration of the finger response profile construction and matching process.

Figure 11: Jamming attack setup and performance at varying jamming distances.
(a) ROC curve for simulated noise. (b) ROC curve for real noise.

Figure 12: Performance under (a) simulated noise using prerecorded sounds and (b) real-world noise sources.

Figure 13: Performance for varied user motions.

Figure 14: Performance in long-term study.
Impact of Training Size. Training size is a key factor in balancing usability and security, as fewer training samples simplify user enrollment at the expense of performance and vice versa. Thus, we evaluate different numbers of chirps in the response signals for training. Note that all training data is collected in a single experiment, similar to how users provide fingerprint data in a single session when registering a new finger on a smart device. From Figure 17, we observe that increasing the number of training chirps yields rapidly diminishing returns in TP rate beyond 15 chirps.
(a) Confusion matrix showing locations are interchangeable. (b) Performance differences for different locations are negligible.

Figure 15: Impact of touch locations. Users can touch anywhere as locations have similar responses.

Figure 17: Performance for varied training sizes.