ABSTRACT
A large body of research has identified the privacy risks of keystroke inference attacks, which use statistical models to extract content typed on a keyboard. Yet existing attacks cannot operate in realistic settings: they rely on strong assumptions such as labeled training data, knowledge of the keyboard layout, carefully placed sensors, or data from other side channels. This paper describes our experience developing and evaluating a general, video-based keystroke inference attack that operates in common public settings using a single commodity camera phone, with no pretraining, no keyboard knowledge, no local sensors, and no side channels. We show that, using a self-supervised approach, noisy finger-tracking data from a video can be processed, labeled, and filtered to train DNN keystroke inference models that operate accurately on the same video. Through IRB-approved user studies, we validate the attack's efficacy across a variety of environments, keyboards, and content, and across users with different typing behaviors and abilities. Our project website is located at https://sandlab.cs.uchicago.edu/keystroke/.