Abstract
Voice-over-IP (VoIP) applications are among the most widespread and pervasive software, with millions of monthly users. However, we argue that people are unaware of the drawbacks of transmitting other sounds along with their voice, such as keystroke sounds, which can reveal what someone is typing on a keyboard.
In this article, we present and assess a new keyboard acoustic eavesdropping attack that involves VoIP, called Skype & Type (S&T). Unlike previous attacks, S&T assumes a weak adversary model that is very practical in many real-world settings. Indeed, S&T is very feasible, as it requires neither (i) that the attacker be physically close to the victim (either in person or with a recording device) nor (ii) precise profiling of the victim’s typing style and keyboard; moreover, it can work with a very small number of leaked keystrokes. We observe that leakage of keystrokes during a VoIP call is likely, as people often “multi-task” during such calls. As expected, VoIP software acquires and faithfully transmits all sounds, including the acoustic emanations of pressed keys, which may correspond to passwords and other sensitive information. We show that one very popular VoIP application (Skype) conveys enough audio information to reconstruct the victim’s input, i.e., the keystrokes typed on the remote keyboard. Our results demonstrate that, given some knowledge of the victim’s typing style and keyboard model, the attacker attains a top-5 accuracy of 91.7% in guessing a random key pressed by the victim. This work extends previous results on S&T, demonstrating that our attack is effective with many different recording devices (such as laptop microphones, headset microphones, and smartphones located in proximity to the target keyboard) and with diverse typing styles and speeds, and that it is particularly threatening when the victim is typing in a known language.
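Top-5 accuracy counts a guess as correct when the true key appears among the classifier’s five highest-ranked candidate keys. The following is a minimal Python sketch of that metric, assuming each keystroke yields one score per candidate key. The function names `top_k_accuracy` and `chance_top_k` and the toy scores are hypothetical illustrations, not the authors’ classifier or data. As a reference point, a random guesser that picks k of n equally likely keys is correct with probability k/n.

```python
from typing import Sequence


def top_k_accuracy(scores: Sequence[Sequence[float]],
                   labels: Sequence[int], k: int = 5) -> float:
    """Fraction of keystrokes whose true key is among the k highest-scoring keys.

    scores[i][j] is a (hypothetical) classifier score for key j on keystroke i;
    labels[i] is the index of the key that was actually pressed.
    """
    if not labels or len(scores) != len(labels):
        raise ValueError("scores and labels must be non-empty and of equal length")
    hits = 0
    for row, true_key in zip(scores, labels):
        # Indices of the k keys with the highest scores for this keystroke
        # (ties keep their original order because sorted() is stable).
        ranked = sorted(range(len(row)), key=lambda j: row[j], reverse=True)
        if true_key in ranked[:k]:
            hits += 1
    return hits / len(labels)


def chance_top_k(n_keys: int, k: int = 5) -> float:
    """Top-k accuracy of a uniform random guesser over n_keys equally likely keys."""
    return min(k, n_keys) / n_keys


# Toy example with 6 candidate keys and 3 keystrokes (made-up numbers).
scores = [
    [0.10, 0.50, 0.20, 0.05, 0.10, 0.05],  # true key 2: ranked second
    [0.30, 0.10, 0.10, 0.40, 0.05, 0.05],  # true key 4: ranked low
    [0.90, 0.02, 0.02, 0.02, 0.02, 0.02],  # true key 0: ranked first
]
labels = [2, 4, 0]
print(top_k_accuracy(scores, labels, k=2))
print(chance_top_k(6, k=2))
```

With k=2, two of the three toy keystrokes have their true key in the top two candidates, so the sketch reports 2/3, compared with a chance level of 2/6 for six equally likely keys.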
Skype & Type: Keyboard Eavesdropping in Voice-over-IP