DOI: 10.5555/3620237.3620246

Towards a general video-based keystroke inference attack

Published: 09 August 2023

ABSTRACT

A large body of research literature has identified the privacy risks of keystroke inference attacks that use statistical models to extract content typed onto a keyboard. Yet existing attacks cannot operate in realistic settings, and rely on strong assumptions of labeled training data, knowledge of keyboard layout, carefully placed sensors, or data from other side-channels. This paper describes experiences developing and evaluating a general, video-based keystroke inference attack that operates in common public settings using a single commodity camera phone, with no pretraining, no keyboard knowledge, no local sensors, and no side-channels. We show that, using a self-supervised approach, noisy finger-tracking data from a video can be processed, labeled, and filtered to train DNN keystroke inference models that operate accurately on the same video. Using IRB-approved user studies, we validate attack efficacy across a variety of environments, keyboards, and content, and across users with different typing behaviors and abilities. Our project website is located at: https://sandlab.cs.uchicago.edu/keystroke/.


  • Published in

    SEC '23: Proceedings of the 32nd USENIX Conference on Security Symposium
    August 2023
    7552 pages
    ISBN: 978-1-939133-37-3

    Copyright © 2023 The USENIX Association

    Publisher

    USENIX Association

    United States


    Qualifiers

    • research-article
    • Research
    • Refereed limited

    Acceptance Rates

    Overall Acceptance Rate: 40 of 100 submissions, 40%