
SoK: A Modularized Approach to Study the Security of Automatic Speech Recognition Systems

Published: 19 May 2022

Abstract

With the wide use of Automatic Speech Recognition (ASR) in applications such as human-machine interaction, simultaneous interpretation, and audio transcription, its security protection becomes increasingly important. Although recent studies have brought to light the weaknesses of popular ASR systems that enable out-of-band signal attacks, adversarial attacks, and so on, and have further proposed various remedies (signal smoothing, adversarial training, etc.), a systematic understanding of ASR security, covering both attacks and defenses, is still missing, especially of how realistic such threats are and how general existing protections could be. In this article, we present our systematization of knowledge for ASR security and provide a comprehensive taxonomy of existing work based on a modularized workflow. More importantly, we align the research in this domain with that on security in Image Recognition Systems (IRS), which has been extensively studied, using the domain knowledge in the latter to help understand where we stand in the former. Generally, both IRS and ASR are perceptual systems. Their similarities allow us to systematically study the existing literature on ASR security against the spectrum of attacks and defense solutions proposed for IRS, and to pinpoint the directions of more advanced attacks as well as the directions potentially leading to more effective protection in ASR. In contrast, their differences, especially the greater complexity of ASR compared with IRS, reveal unique challenges and opportunities in ASR security. In particular, our experimental study shows that transfer attacks across ASR models are feasible, even in the absence of knowledge about the target models (even their types) and their training data.
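To make the "signal smoothing" family of remedies mentioned above concrete, the following is a minimal, illustrative sketch (not the method of any specific paper surveyed here): a moving-average low-pass filter applied to a waveform before recognition, which attenuates the high-frequency, low-amplitude perturbations that many audio adversarial examples rely on. The function name, window size, and the synthetic signal below are all assumptions made for illustration.

```python
import math

def moving_average(samples, window=9):
    """Smooth a 1-D waveform with a simple moving-average (low-pass) filter.

    Near the edges the window shrinks so the output has the same length
    as the input.
    """
    half = window // 2
    out = []
    for i in range(len(samples)):
        lo = max(0, i - half)
        hi = min(len(samples), i + half + 1)
        out.append(sum(samples[lo:hi]) / (hi - lo))
    return out

def residual_energy(x, ref):
    """Energy of the difference between a signal and a clean reference."""
    return sum((a - b) ** 2 for a, b in zip(x, ref))

# Toy stand-in for speech: a slow 5-cycle sine carrier; the "adversarial
# perturbation" is a faint 200-cycle ripple riding on top of it.
n = 1000
clean = [math.sin(2 * math.pi * 5 * t / n) for t in range(n)]
perturb = [0.05 * math.sin(2 * math.pi * 200 * t / n) for t in range(n)]
adv = [c + p for c, p in zip(clean, perturb)]

smoothed = moving_average(adv, window=9)

# The filter removes most of the perturbation energy while leaving the
# slow carrier nearly intact.
print(residual_energy(adv, clean), residual_energy(smoothed, clean))
```

The same trade-off the article discusses for defenses shows up even in this toy: a wider window suppresses more of the perturbation but also distorts more of the legitimate signal, which is why smoothing alone rarely suffices against adaptive attackers.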

REFERENCES

  1. [1] Abdoli Sajjad, Hafemann Luiz G., Rony Jerome, Ayed Ismail Ben, Cardinal Patrick, and Koerich Alessandro L.. 2019. Universal adversarial audio perturbations. IEEE Trans. Pattern Anal. Mach. Intell. (2019).Google ScholarGoogle Scholar
  2. [2] Abdullah Hadi, Garcia Washington, Peeters Christian, Traynor Patrick, Butler Kevin R. B., and Wilson Joseph. 2019. Practical hidden voice attacks against speech and speaker recognition systems. In Proceedings of the 26th Annual Network and Distributed System Security Symposium (NDSS).Google ScholarGoogle Scholar
  3. [3] Abdullah Hadi, Rahman Muhammad Sajidur, Garcia Washington, Blue Logan, Warren Kevin, Yadav Anurag Swarnim, Shrimpton Tom, and Traynor Patrick. 2021. Hear “No Evil,” see “Kenansville”: Efficient and transferable black-box attacks on speech recognition and voice identification systems. In 42nd IEEE Symposium on Security and Privacy.Google ScholarGoogle Scholar
  4. [4] Abdullah Hadi, Rahman Muhammad Sajidur, Peeters Christian, Gibson Cassidy, Garcia Washington, Bindschaedler Vincent, Shrimpton Thomas, and Traynor Patrick. 2021. Beyond \( L\_p \) clipping: Equalization-based psychoacoustic attacks against ASRs. arXiv preprint arXiv:2110.13250 (2021).Google ScholarGoogle Scholar
  5. [5] Abdullah Hadi, Warren Kevin, Bindschaedler Vincent, Papernot Nicolas, and Traynor Patrick. 2021. SoK: The faults in our ASRs: An overview of attacks against automatic speech recognition and speaker identification systems. In 42nd IEEE Symposium on Security and Privacy.Google ScholarGoogle Scholar
  6. [6] Akinwande Victor, Cintas Celia, Speakman Skyler, and Sridharan Srihari. 2020. Identifying audio adversarial examples via anomalous pattern detection. arXiv preprint arXiv:2002.05463 (2020).Google ScholarGoogle Scholar
  7. [7] Alanwar Amr, Balaji Bharathan, Tian Yuan, Yang Shuo, and Srivastava Mani. 2017. EchoSafe: Sonar-based verifiable interaction with intelligent digital agents. In 1st ACM Workshop on the Internet of Safe Things. 3843.Google ScholarGoogle Scholar
  8. [8] Alepis Efthimios and Patsakis Constantinos. 2017. Monkey says, monkey does: Security and privacy on voice assistants. IEEE Access 5 (2017), 1784117851.Google ScholarGoogle ScholarCross RefCross Ref
  9. [9] Alzantot Moustafa, Balaji Bharathan, and Srivastava Mani. 2017. Did you hear that? Adversarial examples against automatic speech recognition. In NIPS 2017 Machine Deception Workshop.Google ScholarGoogle Scholar
  10. [10] Amodei Dario, Ananthanarayanan Sundaram, Anubhai Rishita, Bai Jingliang, Battenberg Eric, Case Carl, Casper Jared, Catanzaro Bryan, Cheng Qiang, Chen Guoliang, et al. 2016. Deep Speech 2: End-to-end speech recognition in English and Mandarin. In International Conference on Machine Learning. 173182.Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. [11] Athalye Anish, Engstrom Logan, Ilyas Andrew, and Kwok Kevin. 2018. Synthesizing robust adversarial examples. In International Conference on Machine Learning. PMLR, 284293.Google ScholarGoogle Scholar
  12. [12] Benesty Jacob, Makino Shoji, and Chen Jingdong. 2006. Speech Enhancement. Springer Science & Business Media.Google ScholarGoogle Scholar
  13. [13] Bispham Mary K., Agrafiotis Ioannis, and Goldsmith Michael. 2019. Nonsense attacks on Google Assistant and missense attacks on Amazon Alexa. (2019).Google ScholarGoogle Scholar
  14. [14] Blue Logan, Abdullah Hadi, Vargas Luis, and Traynor Patrick. 2018. 2MA: Verifying voice commands via two microphone authentication. In Asia Conference on Computer and Communications Security. 89100.Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. [15] Bolton Connor, Rampazzi Sara, Li Chaohao, Kwong Andrew, Xu Wenyuan, and Fu Kevin. 2018. Blue Note: How intentional acoustic interference damages availability and integrity in hard disk drives and operating systems. In IEEE Symposium on Security and Privacy (SP). IEEE, 10481062.Google ScholarGoogle Scholar
  16. [16] Byun Junyoung, Go Hyojun, and Kim Changick. 2021. Small input noise is enough to defend against query-based black-box attacks. arXiv preprint arXiv:2101.04829 (2021).Google ScholarGoogle Scholar
  17. [17] Cao Yulong, Wang Ningfei, Xiao Chaowei, Yang Dawei, Fang Jin, Yang Ruigang, Chen Qi Alfred, Liu Mingyan, and Li Bo. 2021. Invisible for both camera and lidar: Security of multi-sensor fusion based perception in autonomous driving under physical-world attacks. In IEEE Symposium on Security and Privacy (SP). IEEE, 176194.Google ScholarGoogle Scholar
  18. [18] Caputo D., Verderame L., Merlo A., Ranieri A., and Caviglione L.. 2020. Are you (Google) home? Detecting users’ presence through traffic analysis of smart speakers. ITASEC 2020.Google ScholarGoogle Scholar
  19. [19] Cardoso J.-F. and Laheld Beate H.. 1996. Equivariant adaptive source separation. IEEE Trans. Sig. Process. 44, 12 (1996), 30173030.Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. [20] Carlini Nicholas, Mishra Pratyush, Vaidya Tavish, Zhang Yuankai, Sherr Micah, Shields Clay, Wagner David, and Zhou Wenchao. 2016. Hidden voice commands. In 25th USENIX Security Symposium (USENIX Security’16). 513530.Google ScholarGoogle Scholar
  21. [21] Carlini Nicholas and Wagner David. 2017. Towards evaluating the robustness of neural networks. In IEEE Symposium on Security and Privacy (SP). IEEE, 3957.Google ScholarGoogle Scholar
  22. [22] Carlini Nicholas and Wagner David. 2018. Audio adversarial examples: Targeted attacks on speech-to-text. In IEEE Security and Privacy Workshops (SPW). IEEE, 17.Google ScholarGoogle Scholar
  23. [23] Chai Lucy, Illandara Thavishi, and Yan Zhongxia. [n.d.]. Private speech adversaries. ([n. d.]).Google ScholarGoogle Scholar
  24. [24] Chang Kuei-Huan, Huang Po-Hao, Yu Honggang, Jin Yier, and Wang Ting-Chi. 2020. Audio adversarial examples generation with recurrent neural networks. In 25th Asia and South Pacific Design Automation Conference (ASP-DAC). IEEE, 488493.Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. [25] Chen Guangke, Chen Sen, Fan Lingling, Du Xiaoning, Zhao Zhe, Song Fu, and Liu Yang. 2019. Who is real Bob? Adversarial attacks on speaker recognition systems. arXiv preprint arXiv:1911.01840 (2019).Google ScholarGoogle Scholar
  26. [26] Chen Jingdong, Benesty Jacob, Huang Yiteng, and Doclo Simon. 2006. New insights into the noise reduction Wiener filter. IEEE Trans. Audio, Speech Lang. Process. 14, 4 (2006), 12181234.Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. [27] Chen Pin-Yu, Zhang Huan, Sharma Yash, Yi Jinfeng, and Hsieh Cho-Jui. 2017. Zoo: Zeroth order optimization based black-box attacks to deep neural networks without training substitute models. In 10th ACM Workshop on Artificial Intelligence and Security. 1526.Google ScholarGoogle Scholar
  28. [28] Chen Steven, Carlini Nicholas, and Wagner David. 2020. Stateful detection of black-box adversarial attacks. In 1st ACM Workshop on Security and Privacy on Artificial Intelligence. 3039.Google ScholarGoogle Scholar
  29. [29] Chen Si, Ren Kui, Piao Sixu, Wang Cong, Wang Qian, Weng Jian, Su Lu, and Mohaisen Aziz. 2017. You can hear but you cannot steal: Defending against voice impersonation attacks on smartphones. In IEEE 37th International Conference on Distributed Computing Systems (ICDCS). IEEE, 183195.Google ScholarGoogle ScholarCross RefCross Ref
  30. [30] Chen Tao, Ma Kai-Kuang, and Chen Li-Hui. 1999. Tri-state median filter for image denoising. IEEE Trans. Image Process. 8, 12 (1999), 18341838.Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. [31] Chen Tao, Shangguan Longfei, Li Zhenjiang, and Jamieson Kyle. 2020. Metamorph: Injecting inaudible commands into over-the-air voice controlled systems. NDSS (2020).Google ScholarGoogle Scholar
  32. [32] Chen Yuxin, Li Huiying, Nagels Steven, Li Zhijing, Lopes Pedro, Zhao Ben Y., and Zheng Haitao. 2019. Understanding the effectiveness of ultrasonic microphone jammer. arXiv preprint arXiv:1904.08490 (2019).Google ScholarGoogle Scholar
  33. [33] Chen Yuxuan, Yuan Xuejing, Zhang Jiangshan, Zhao Yue, Zhang Shengzhi, Chen Kai, and Wang XiaoFeng. 2020. Devil’s Whisper: A general approach for physical adversarial attacks against commercial black-box speech recognition devices. In 29th USENIX Security Symposium (USENIX Security’20).Google ScholarGoogle Scholar
  34. [34] Cisse Moustapha, Adi Yossi, Neverova Natalia, and Keshet Joseph. 2017. Houdini: Fooling deep structured prediction models. NIPS (2017).Google ScholarGoogle Scholar
  35. [35] Collobert Ronan, Puhrsch Christian, and Synnaeve Gabriel. 2016. Wav2Letter: An end-to-end convNet-based speech recognition system. arXiv preprint arXiv:1609.03193 (2016).Google ScholarGoogle Scholar
  36. [36] Das Nilaksh, Shanbhogue Madhuri, Chen Shang-Tse, Chen Li, Kounavis Michael E., and Chau Duen Horng. 2018. Adagio: Interactive experimentation with adversarial attack and defense for audio. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases. Springer, 677681.Google ScholarGoogle Scholar
  37. [37] Demontis Ambra, Melis Marco, Pintor Maura, Jagielski Matthew, Biggio Battista, Oprea Alina, Nita-Rotaru Cristina, and Roli Fabio. 2019. Why do adversarial attacks transfer? Explaining transferability of evasion and poisoning attacks. In 28th USENIX Security Symposium (USENIX Security’19). 321338.Google ScholarGoogle Scholar
  38. [38] Deng Guang and Cahill L. W.. 1993. An adaptive Gaussian filter for noise reduction and edge detection. In IEEE Conference Record Nuclear Science Symposium and Medical Imaging Conference. IEEE, 16151619.Google ScholarGoogle ScholarCross RefCross Ref
  39. [39] Dong Yinpeng, Liao Fangzhou, Pang Tianyu, Su Hang, Zhu Jun, Hu Xiaolin, and Li Jianguo. 2018. Boosting adversarial attacks with momentum. In IEEE Conference on Computer Vision and Pattern Recognition. 91859193.Google ScholarGoogle ScholarCross RefCross Ref
  40. [40] Du Tianyu, Ji Shouling, Li Jinfeng, Gu Qinchen, Wang Ting, and Beyah Raheem. 2020. SirenAttack: Generating adversarial audio for end-to-end acoustic systems. ASIACCS (2020).Google ScholarGoogle Scholar
  41. [41] Dumpala Sri Harsha, Sheikh Imran, Chakraborty Rupayan, and Kopparapu Sunil Kumar. 2019. Improving ASR robustness to perturbed speech using cycle-consistent generative adversarial networks. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 57265730.Google ScholarGoogle ScholarCross RefCross Ref
  42. [42] Eykholt Kevin, Evtimov Ivan, Fernandes Earlence, Li Bo, Rahmati Amir, Xiao Chaowei, Prakash Atul, Kohno Tadayoshi, and Song Dawn. 2018. Robust physical-world attacks on deep learning visual classification. In IEEE Conference on Computer Vision and Pattern Recognition. 16251634.Google ScholarGoogle ScholarCross RefCross Ref
  43. [43] Feng Huan, Fawaz Kassem, and Shin Kang G.. 2017. Continuous authentication for voice assistants. In 23rd Annual International Conference on Mobile Computing and Networking. 343355.Google ScholarGoogle ScholarDigital LibraryDigital Library
  44. [44] Fu Kevin and Xu Wenyuan. 2018. Risks of trusting the physics of sensors. Commun. ACM 61, 2 (2018), 2023.Google ScholarGoogle ScholarDigital LibraryDigital Library
  45. [45] Gong Taesik, Ramos Alberto Gil C. P., Bhattacharya Sourav, Mathur Akhil, and Kawsar Fahim. 2019. AudiDoS: Real-Time denial-of-service adversarial attacks on deep audio models. In 18th IEEE International Conference on Machine Learning And Applications (ICMLA). IEEE, 978985.Google ScholarGoogle ScholarCross RefCross Ref
  46. [46] Gong Yuan, Li Boyang, Poellabauer Christian, and Shi Yiyu. 2019. Real-time adversarial attacks. In 28th International Joint Conference on Artificial Intelligence. AAAI Press, 46724680.Google ScholarGoogle ScholarDigital LibraryDigital Library
  47. [47] Gong Yuan and Poellabauer Christian. 2018. Crafting adversarial examples for speech paralinguistics applications. DYnamic and Novel Advances in Machine Learning and Intelligent Cyber Security (DYNAMICS) Workshop.Google ScholarGoogle Scholar
  48. [48] Gong Yuan and Poellabauer Christian. 2018. Protecting voice controlled systems using sound source identification based on acoustic cues. In 27th International Conference on Computer Communication and Networks (ICCCN). IEEE, 19.Google ScholarGoogle ScholarCross RefCross Ref
  49. [49] Goodfellow Ian J., Shlens Jonathon, and Szegedy Christian. 2014. Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572 (2014).Google ScholarGoogle Scholar
  50. [50] Guo Qingli, Ye Jing, Chen Yiran, Hu Yu, Lan Yazhu, Zhang Guohe, and Li Xiaowei. 2020. INOR—An intelligent noise reduction method to defend against adversarial audio examples. Neurocomputing (2020).Google ScholarGoogle Scholar
  51. [51] Guo Qingli, Ye Jing, Hu Yu, Zhang Guohe, Li Xiaowei, and Li Huawei. 2020. MultiPAD: A multivariant partition-based method for audio adversarial examples detection. IEEE Access 8 (2020), 6336863380.Google ScholarGoogle ScholarCross RefCross Ref
  52. [52] Han Joon Kuy, Kim Hyoungshick, and Woo Simon S.. 2019. Nickel to Lego: Using Foolgle to create adversarial examples to fool Google cloud speech-to-text API. In ACM SIGSAC Conference on Computer and Communications Security. 25932595.Google ScholarGoogle ScholarDigital LibraryDigital Library
  53. [53] Haykin Simon and Widrow Bernard. 2003. Least-mean-square Adaptive Filters. Vol. 31. John Wiley & Sons.Google ScholarGoogle ScholarCross RefCross Ref
  54. [54] He Yitao, Bian Junyu, Tong Xinyu, Qian Zihui, Zhu Wei, Tian Xiaohua, and Wang Xinbing. 2019. Canceling inaudible voice commands against voice control systems. In 25th Annual International Conference on Mobile Computing and Networking. 115.Google ScholarGoogle ScholarDigital LibraryDigital Library
  55. [55] Huang Qian, Katsman Isay, He Horace, Gu Zeqi, Belongie Serge, and Lim Ser-Nam. 2019. Enhancing adversarial example transferability with an intermediate level attack. In IEEE International Conference on Computer Vision. 47334742.Google ScholarGoogle ScholarCross RefCross Ref
  56. [56] Huang Wenchao, Xiong Yan, Li Xiang-Yang, Lin Hao, Mao Xufei, Yang Panlong, and Liu Yunhao. 2014. Shake and walk: Acoustic direction finding and fine-grained indoor localization using smartphones. In IEEE Conference on Computer Communications. IEEE, 370378.Google ScholarGoogle ScholarCross RefCross Ref
  57. [57] Hussain Shehzeen, Neekhara Paarth, Dubnov Shlomo, McAuley Julian, and Koushanfar Farinaz. 2021. WaveGuard: Understanding and mitigating audio adversarial examples. In 30th USENIX Security Symposium (USENIX Security’21).Google ScholarGoogle Scholar
  58. [58] Jang Yeongjin, Song Chengyu, Chung Simon P., Wang Tielei, and Lee Wenke. 2014. A11y attacks: Exploiting accessibility in operating systems. In ACM SIGSAC Conference on Computer and Communications Security. 103115.Google ScholarGoogle ScholarDigital LibraryDigital Library
  59. [59] Kasmi Chaouki and Esteves Jose Lopes. 2015. IEMI threats for information security: Remote command injection on modern smartphones. IEEE Trans. Electromag. Compatib. 57, 6 (2015), 17521755.Google ScholarGoogle ScholarCross RefCross Ref
  60. [60] Khare Shreya, Aralikatte Rahul, and Mani Senthil. 2019. Adversarial black-box attacks on automatic speech recognition systems using multi-objective evolutionary optimization. In Interspeech Conference. 32083212.Google ScholarGoogle Scholar
  61. [61] Kong Yehao and Zhang Jiliang. 2019. Adversarial audio: A new information hiding method and backdoor for DNN-based speech recognition models. arXiv preprint arXiv:1904.03829 (2019).Google ScholarGoogle Scholar
  62. [62] Kreuk Felix, Adi Yossi, Cisse Moustapha, and Keshet Joseph. 2018. Fooling end-to-end speaker verification with adversarial examples. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 19621966.Google ScholarGoogle ScholarDigital LibraryDigital Library
  63. [63] Kumar Deepak, Paccagnella Riccardo, Murley Paul, Hennenfent Eric, Mason Joshua, Bates Adam, and Bailey Michael. 2018. Skill squatting attacks on amazon alexa. In 27th USENIX Security Symposium (USENIX Security’18). 3347.Google ScholarGoogle Scholar
  64. [64] Kurakin Alexey, Goodfellow Ian, and Bengio Samy. 2016. Adversarial examples in the physical world. arXiv preprint arXiv:1607.02533 (2016).Google ScholarGoogle Scholar
  65. [65] Kurakin Alexey, Goodfellow Ian, and Bengio Samy. 2016. Adversarial machine learning at scale. arXiv preprint arXiv:1611.01236 (2016).Google ScholarGoogle Scholar
  66. [66] Kwon Hyun, Kim Yongchul, Yoon Hyunsoo, and Choi Daeseon. 2019. Selective audio adversarial example in evasion attack on speech recognition system. IEEE Trans. Inf. Forens. Secur. 15 (2019), 526538.Google ScholarGoogle ScholarDigital LibraryDigital Library
  67. [67] Kwon Hyun, Yoon Hyunsoo, and Park Ki-Woong. 2019. POSTER: Detecting audio adversarial example through audio modification. In ACM SIGSAC Conference on Computer and Communications Security. 25212523.Google ScholarGoogle ScholarDigital LibraryDigital Library
  68. [68] LeCun Yann et al. 2015. LeNet-5, convolutional neural networks. 20, 5 (2015), 14. Retrieved from http://yann.lecun.com/exdb/lenet.Google ScholarGoogle Scholar
  69. [69] Lee Yeonjoon, Zhao Yue, Zeng Jiutian, Lee Kwangwuk, Zhang Nan, Shezan Faysal Hossain, Tian Yuan, Chen Kai, and Wang XiaoFeng. 2020. Using sonar for liveness detection to protect smart speakers against remote attackers. Proc. ACM Interact., Mob., Wear. Ubiq. Technol. 4, 1 (2020), 128.Google ScholarGoogle ScholarDigital LibraryDigital Library
  70. [70] Lei Xinyu, Tu Guan-Hua, Liu Alex X., Ali Kamran, Li Chi-Yu, and Xie Tian. 2017. The insecurity of home digital voice assistants–Amazon Alexa as a case study. arXiv preprint arXiv:1712.03327 (2017).Google ScholarGoogle Scholar
  71. [71] Li Juncheng, Qu Shuhui, Li Xinjian, Szurley Joseph, Kolter J. Zico, and Metze Florian. 2019. Adversarial music: Real world audio adversary against wake-word detection system. In Conference on Advances in Neural Information Processing Systems. 1190811918.Google ScholarGoogle Scholar
  72. [72] Li Xu, Zhong Jinghua, Wu Xixin, Yu Jianwei, Liu Xunying, and Meng Helen. 2020. Adversarial attacks on GMM I-Vector based speaker verification systems. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 65796583.Google ScholarGoogle ScholarCross RefCross Ref
  73. [73] Li Zhuohang, Shi Cong, Xie Yi, Liu Jian, Yuan Bo, and Chen Yingying. 2020. Practical adversarial attacks against speaker recognition systems. In 21st International Workshop on Mobile Computing Systems and Applications. 914.Google ScholarGoogle Scholar
  74. [74] Li Zhuohang, Wu Yi, Liu Jian, Chen Yingying, and Yuan Bo. 2020. AdvPulse: Universal, synchronization-free, and targeted audio adversarial attacks via subsecond perturbations. In ACM SIGSAC Conference on Computer and Communications Security. 11211134.Google ScholarGoogle ScholarDigital LibraryDigital Library
  75. [75] Liu Jinpeng and Tang Ke. 2013. Scaling up covariance matrix adaptation evolution strategy using cooperative coevolution. In International Conference on Intelligent Data Engineering and Automated Learning. Springer, 350357.Google ScholarGoogle ScholarDigital LibraryDigital Library
  76. [76] Liu Songxiang, Wu Haibin, Lee Hung-Yi, and Meng Helen. 2019. Adversarial attacks on spoofing countermeasures of automatic speaker verification. ASRU (2019).Google ScholarGoogle Scholar
  77. [77] Liu Xiaolei, Zhang Xiaosong, Wan Kun, Zhu Qingxin, and Ding Yufei. 2020. Towards weighted-sampling audio adversarial example attack. AAAI (2020).Google ScholarGoogle ScholarCross RefCross Ref
  78. [78] Liu Yanpei, Chen Xinyun, Liu Chang, and Song Dawn. 2016. Delving into transferable adversarial examples and black-box attacks. arXiv preprint arXiv:1611.02770 (2016).Google ScholarGoogle Scholar
  79. [79] Madry Aleksander, Makelov Aleksandar, Schmidt Ludwig, Tsipras Dimitris, and Vladu Adrian. 2017. Towards deep learning models resistant to adversarial attacks. arXiv preprint arXiv:1706.06083 (2017).Google ScholarGoogle Scholar
  80. [80] Meng Dongyu and Chen Hao. 2017. MagNet: A two-pronged defense against adversarial examples. In ACM SIGSAC Conference on Computer and Communications Security. 135147.Google ScholarGoogle ScholarDigital LibraryDigital Library
  81. [81] Meng Yan, Wang Zichang, Zhang Wei, Wu Peilin, Zhu Haojin, Liang Xiaohui, and Liu Yao. 2018. WiVo: Enhancing the security of voice control system via wireless signal in IoT environment. In 18th ACM International Symposium on Mobile Ad Hoc Networking and Computing. 8190.Google ScholarGoogle Scholar
  82. [82] Mitev Richard, Miettinen Markus, and Sadeghi Ahmad-Reza. 2019. Alexa lied to me: Skill-based man-in-the-middle attacks on virtual assistants. In ACM Asia Conference on Computer and Communications Security. 465478.Google ScholarGoogle ScholarDigital LibraryDigital Library
  83. [83] Muda Lindasalwa, Begam Mumtaj, and Elamvazuthi Irraivan. 2010. Voice recognition algorithms using Mel Frequency Cepstral Coefficient (MFCC) and Dynamic Time Warping (DTW) techniques. arXiv preprint arXiv:1003.4083 (2010).Google ScholarGoogle Scholar
  84. [84] Nakamura Taiki, Saito Yuki, Takamichi Shinnosuke, Ijima Yusuke, and Saruwatari Hiroshi. [n.d.]. V2S attack: Building DNN-based voice conversion from automatic speaker verification. In 10th ISCA Speech Synthesis Workshop. 161165.Google ScholarGoogle Scholar
  85. [85] Nashimoto Shoei, Suzuki Daisuke, Sugawara Takeshi, and Sakiyama Kazuo. 2018. Sensor CON-Fusion: Defeating Kalman filter in signal injection attack. In Asia Conference on Computer and Communications Security. 511524.Google ScholarGoogle ScholarDigital LibraryDigital Library
  86. [86] Neekhara Paarth, Hussain Shehzeen, Pandey Prakhar, Dubnov Shlomo, McAuley Julian, and Koushanfar Farinaz. 2019. Universal adversarial perturbations for speech recognition systems. In Interspeech Conference. 481485.Google ScholarGoogle Scholar
  87. [87] Oh Tae-Hyun, Dekel Tali, Kim Changil, Mosseri Inbar, Freeman William T., Rubinstein Michael, and Matusik Wojciech. 2019. Speech2Face: Learning the face behind a voice. In IEEE Conference on Computer Vision and Pattern Recognition. 75397548.Google ScholarGoogle ScholarCross RefCross Ref
  88. [88] Pang Ren, Shen Hua, Zhang Xinyang, Ji Shouling, Vorobeychik Yevgeniy, Luo Xiapu, Liu Alex, and Wang Ting. 2020. A tale of evil twins: Adversarial inputs versus poisoned models. In ACM SIGSAC Conference on Computer and Communications Security. 8599.Google ScholarGoogle ScholarDigital LibraryDigital Library
  89. [89] Pang Ren, Zhang Xinyang, Ji Shouling, Luo Xiapu, and Wang Ting. 2020. AdvMind: Inferring adversary intent of black-box attacks. In 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 18991907.Google ScholarGoogle ScholarDigital LibraryDigital Library
  90. [90] Pang Tianyu, Du Chao, Dong Yinpeng, and Zhu Jun. 2018. Towards robust detection of adversarial examples. In Conference on Advances in Neural Information Processing Systems. 45794589.Google ScholarGoogle Scholar
  91. [91] Papernot Nicolas, McDaniel Patrick, and Goodfellow Ian. 2016. Transferability in machine learning: From phenomena to black-box attacks using adversarial samples. arXiv preprint arXiv:1605.07277 (2016).Google ScholarGoogle Scholar
  92. [92] Papernot Nicolas, McDaniel Patrick, Goodfellow Ian, Jha Somesh, Celik Z. Berkay, and Swami Ananthram. 2017. Practical black-box attacks against machine learning. In ACM on Asia conference on Computer and Communications Security. 506519.Google ScholarGoogle Scholar
  93. [93] Papernot Nicolas, McDaniel Patrick, Jha Somesh, Fredrikson Matt, Celik Z. Berkay, and Swami Ananthram. 2016. The limitations of deep learning in adversarial settings. In IEEE European Symposium on Security and Privacy (EuroS&P). IEEE, 372387.Google ScholarGoogle Scholar
  94. [94] Papernot Nicolas, McDaniel Patrick, Wu Xi, Jha Somesh, and Swami Ananthram. 2016. Distillation as a defense to adversarial perturbations against deep neural networks. In IEEE Symposium on Security and Privacy (SP). IEEE, 582597.Google ScholarGoogle Scholar
  95. [95] Qin Yao, Carlini Nicholas, Goodfellow Ian, Cottrell Garrison, and Raffel Colin. 2019. Imperceptible, robust, and targeted adversarial examples for automatic speech recognition. ICML (2019).Google ScholarGoogle Scholar
  96. [96] Quiring Erwin, Klein David, Arp Daniel, Johns Martin, and Rieck Konrad. 2020. Adversarial preprocessing: Understanding and preventing image-scaling attacks in machine learning. In 29th USENIX Security Symposium (USENIX Security’20).Google ScholarGoogle Scholar
  97. [97] Rajaratnam Krishan and Kalita Jugal. 2018. Noise flooding for detecting audio adversarial examples against automatic speech recognition. In IEEE International Symposium on Signal Processing and Information Technology (ISSPIT). IEEE, 197201.Google ScholarGoogle Scholar
  98. [98] Rajaratnam Krishan, Shah Kunal, and Kalita Jugal. 2018. Isolated and ensemble audio preprocessing methods for detecting adversarial examples against automatic speech recognition. In 30th Conference on Computational Linguistics and Speech Processing (ROCLING’18). 1630.Google ScholarGoogle Scholar
  99. [99] Ross Andrew Slavin and Doshi-Velez Finale. 2017. Improving the adversarial robustness and interpretability of deep neural networks by regularizing their input gradients. arXiv preprint arXiv:1711.09404 (2017).Google ScholarGoogle Scholar
  100. [100] Roy Nirupam, Hassanieh Haitham, and Choudhury Romit Roy. 2017. BackDoor: Making microphones hear inaudible sounds. In 15th Annual International Conference on Mobile Systems, Applications, and Services. 214.Google ScholarGoogle ScholarDigital LibraryDigital Library
  101. [101] Roy Nirupam, Shen Sheng, Hassanieh Haitham, and Choudhury Romit Roy. 2018. Inaudible voice commands: The long-range attack and defense. In 15th USENIX Symposium on Networked Systems Design and Implementation (NSDI’18). 547560.Google ScholarGoogle Scholar
  102. [102] Sabour Sara, Cao Yanshuai, Faghri Fartash, and Fleet David J.. 2015. Adversarial manipulation of deep representations. arXiv preprint arXiv:1511.05122 (2015).Google ScholarGoogle Scholar
  103. [103] Samizade Saeid, Tan Zheng-Hua, Shen Chao, and Guan Xiaohong. 2020. Adversarial example detection by classification for deep speech recognition. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 31023106.Google ScholarGoogle ScholarCross RefCross Ref
  104. [104] Schönherr Lea, Kohls Katharina, Zeiler Steffen, Holz Thorsten, and Kolossa Dorothea. 2019. Adversarial attacks against automatic speech recognition systems via psychoacoustic hiding. NDSS (2019).Google ScholarGoogle Scholar
  105. [105] Schönherr Lea, Zeiler Steffen, Holz Thorsten, and Kolossa Dorothea. 2019. Robust over-the-air adversarial examples against automatic speech recognition systems. arXiv preprint arXiv:1908.01551 (2019).Google ScholarGoogle Scholar
  106. [106] Sharif Mahmood, Bhagavatula Sruti, Bauer Lujo, and Reiter Michael K.. 2016. Accessorize to a crime: Real and stealthy attacks on state-of-the-art face recognition. In ACM SIGSAC Conference on Computer and Communications Security. 15281540.Google ScholarGoogle ScholarDigital LibraryDigital Library
  107. [107] Shen Hao, Zhang Weiming, Fang Han, Ma Zehua, and Yu Nenghai. 2019. JamSys: Coverage optimization of a microphone jamming system based on ultrasounds. IEEE Access 7 (2019), 6748367496.Google ScholarGoogle ScholarCross RefCross Ref
  108. [108] Shen Jonathan, Nguyen Patrick, Wu Yonghui, Chen Zhifeng, Chen Mia X., Jia Ye, Kannan Anjuli, Sainath Tara, Cao Yuan, Chiu Chung-Cheng, et al. 2019. Lingvo: A modular and scalable framework for sequence-to-sequence modeling. arXiv preprint arXiv:1902.08295 (2019).Google ScholarGoogle Scholar
  109. [109] Shi Yucheng, Wang Siyu, and Han Yahong. 2019. Curls & Whey: Boosting black-box adversarial attacks. In IEEE Conference on Computer Vision and Pattern Recognition. 65196527.Google ScholarGoogle ScholarCross RefCross Ref
  110. [110] Shin Hocheol, Kim Dohyun, Kwon Yujin, and Kim Yongdae. 2017. Illusion and dazzle: Adversarial optical channel exploits against lidars for automotive applications. In International Conference on Cryptographic Hardware and Embedded Systems. Springer, 445467.Google ScholarGoogle ScholarCross RefCross Ref
  111. [111] Son Yunmok, Shin Hocheol, Kim Dongkwan, Park Youngseok, Noh Juhwan, Choi Kibum, Choi Jungwoo, and Kim Yongdae. 2015. Rocking drones with intentional sound noise on gyroscopic sensors. In 24th USENIX Security Symposium (USENIX Security’15). 881896.Google ScholarGoogle Scholar
  112. [112] Song Liwei and Mittal Prateek. 2017. Poster: Inaudible voice commands. In ACM SIGSAC Conference on Computer and Communications Security. 25832585.Google ScholarGoogle ScholarDigital LibraryDigital Library
  113. [113] Su Jiawei, Vargas Danilo Vasconcellos, and Sakurai Kouichi. 2019. One pixel attack for fooling deep neural networks. IEEE Trans. Evolut. Comput. 23, 5 (2019), 828841.Google ScholarGoogle ScholarCross RefCross Ref
  114. [114] Subramanian Vinod, Benetos Emmanouil, and Sandler Mark B.. 2019. Robustness of adversarial attacks in sound event classification. (2019).Google ScholarGoogle Scholar
  115. [115] Sugawara Takeshi, Cyr Benjamin, Rampazzi Sara, Genkin Daniel, and Fu Kevin. [n.d.]. Light commands: Laser-Based audio injection attacks on voice-controllable systems. ([n.d.]).Google ScholarGoogle Scholar
  [116] Sun Jiachen, Cao Yulong, Chen Qi Alfred, and Mao Z. Morley. 2020. Towards robust lidar-based perception in autonomous driving: General black-box adversarial sensor attack and countermeasures. In 29th USENIX Security Symposium (USENIX Security’20). 877–894.
  [117] Sun Sining, Guo Pengcheng, Xie Lei, and Hwang Mei-Yuh. 2019. Adversarial regularization for attention based end-to-end robust speech recognition. IEEE/ACM Trans. Audio, Speech Lang. Process. 27, 11 (2019), 1826–1838.
  [118] Sun Sining, Yeh Ching-Feng, Ostendorf Mari, Hwang Mei-Yuh, and Xie Lei. 2018. Training augmentation with adversarial examples for robust speech recognition. In Interspeech Conference. 2404–2408.
  [119] Sun Zheng, Purohit Aveek, Bose Raja, and Zhang Pei. 2013. Spartacus: Spatially-aware interaction for mobile devices through energy-efficient audio sensing. In 11th Annual International Conference on Mobile Systems, Applications, and Services. 263–276.
  [120] Szegedy Christian, Zaremba Wojciech, Sutskever Ilya, Bruna Joan, Erhan Dumitru, Goodfellow Ian, and Fergus Rob. 2013. Intriguing properties of neural networks. arXiv preprint arXiv:1312.6199 (2013).
  [121] Szurley Joseph and Kolter Zico J.. 2019. Perceptual based adversarial audio attacks. CoRR (2019).
  [122] Tamura Keiichi, Omagari Akitada, and Hashida Shuichi. 2019. Novel defense method against audio adversarial example for speech-to-text transcription neural networks. In IEEE 11th International Workshop on Computational Intelligence and Applications (IWCIA). IEEE, 115–120.
  [123] Taori Rohan, Kamsetty Amog, Chu Brenton, and Vemuri Nikita. 2019. Targeted adversarial examples for black box audio systems. In IEEE Security and Privacy Workshops (SPW). IEEE, 15–20.
  [124] Thanh Dang Ngoc Hoang, Engínoğlu Serdar, et al. 2019. An iterative mean filter for image denoising. IEEE Access 7 (2019), 167847–167859.
  [125] Tian Xiaohai, Das Rohan Kumar, and Li Haizhou. 2019. Black-box attacks on automatic speaker verification using feedback-controlled voice conversion. arXiv preprint arXiv:1909.07655 (2019).
  [126] Tramèr Florian, Kurakin Alexey, Papernot Nicolas, Goodfellow Ian, Boneh Dan, and McDaniel Patrick. 2017. Ensemble adversarial training: Attacks and defenses. arXiv preprint arXiv:1705.07204 (2017).
  [127] Trippel Timothy, Weisse Ofir, Xu Wenyuan, Honeyman Peter, and Fu Kevin. 2017. WALNUT: Waging doubt on the integrity of MEMS accelerometers with acoustic injection attacks. In IEEE European Symposium on Security and Privacy (EuroS&P). IEEE, 3–18.
  [128] Tu Yazhou, Lin Zhiqiang, Lee Insup, and Hei Xiali. 2018. Injected and delivered: Fabricating implicit control over actuation systems by spoofing inertial sensors. In 27th USENIX Security Symposium (USENIX Security’18). 1545–1562.
  [129] Vadillo Jon and Santana Roberto. 2019. Universal adversarial examples in speech command classification. arXiv preprint arXiv:1911.10182 (2019).
  [130] Vadillo Jon and Santana Roberto. 2022. On the human evaluation of universal audio adversarial perturbations. Comput. Secur. 112 (2022), 102495.
  [131] Vaidya Tavish, Zhang Yuankai, Sherr Micah, and Shields Clay. 2015. Cocaine noodles: Exploiting the gap between human and machine speech recognition. In 9th USENIX Workshop on Offensive Technologies (WOOT’15).
  [132] Vestman Ville, Kinnunen Tomi, Hautamäki Rosa González, and Sahidullah Md. 2020. Voice mimicry attacks assisted by automatic speaker verification. Comput. Speech Lang. 59 (2020), 36–54.
  [133] Wang Chen, Anand S. Abhishek, Liu Jian, Walker Payton, Chen Yingying, and Saxena Nitesh. 2019. Defeating hidden audio channel attacks on voice assistants via audio-induced surface vibrations. In 35th Annual Computer Security Applications Conference. 42–56.
  [134] Wang Qian, Ren Kui, Zhou Man, Lei Tao, Koutsonikolas Dimitrios, and Su Lu. 2016. Messages behind the sound: Real-time hidden acoustic signal capture with smartphones. In 22nd Annual International Conference on Mobile Computing and Networking. 29–41.
  [135] Wang Xiong, Sun Sining, Shan Changhao, Hou Jingyong, Xie Lei, Li Shen, and Lei Xin. 2019. Adversarial examples for improving end-to-end attention-based small-footprint keyword spotting. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 6366–6370.
  [136] Wang Xiong, Sun Sining, Shan Changhao, Hou Jingyong, Xie Lei, Li Shen, and Lei Xin. 2019. Adversarial examples for improving end-to-end attention-based small-footprint keyword spotting. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 6366–6370.
  [137] Wang Yao, Cai Wandong, Gu Tao, Shao Wei, Li Yannan, and Yu Yong. 2019. Secure your voice: An oral airflow-based continuous liveness detection for voice assistants. Proc. ACM Interact., Mob., Wear. Ubiq. Technol. 3, 4 (2019), 1–28.
  [138] Wu Lei, Zhu Zhanxing, Tai Cheng, and Weinan E.. 2018. Enhancing the transferability of adversarial examples with noise reduced gradient. (2018).
  [139] Wu Yi, Liu Jian, Chen Yingying, and Cheng Jerry. 2019. Semi-black-box attacks against speech recognition systems using adversarial samples. In IEEE International Symposium on Dynamic Spectrum Access Networks (DySPAN). IEEE, 1–5.
  [140] Xiao Qixue, Chen Yufei, Shen Chao, Chen Yu, and Li Kang. 2019. Seeing is not believing: Camouflage attacks on image scaling algorithms. In 28th USENIX Security Symposium (USENIX Security’19). 443–460.
  [141] Xie Cihang, Zhang Zhishuai, Zhou Yuyin, Bai Song, Wang Jianyu, Ren Zhou, and Yuille Alan L.. 2019. Improving transferability of adversarial examples with input diversity. In IEEE Conference on Computer Vision and Pattern Recognition. 2730–2739.
  [142] Xie Yaxiong, Li Zhenjiang, and Li Mo. 2018. Precise power delay profiling with commodity Wi-Fi. IEEE Trans. Mob. Comput. 18, 6 (2018), 1342–1355.
  [143] Xie Yi, Li Zhuohang, Shi Cong, Liu Jian, Chen Yingying, and Yuan Bo. 2020. Enabling fast and universal audio adversarial attack using generative model. arXiv preprint arXiv:2004.12261 (2020).
  [144] Xie Yi, Shi Cong, Li Zhuohang, Liu Jian, Chen Yingying, and Yuan Bo. 2020. Real-time, universal, and robust adversarial attacks against speaker recognition systems. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 1738–1742.
  [145] Xu Wenyuan, Yan Chen, Jia Weibin, Ji Xiaoyu, and Liu Jianhao. 2018. Analyzing and enhancing the security of ultrasonic sensors for autonomous vehicles. IEEE Internet Things J. 5, 6 (2018), 5015–5029.
  [146] Xu Zirui, Yu Fuxun, and Chen Xiang. 2020. LanCe: A comprehensive and lightweight CNN defense methodology against physical adversarial attacks on embedded multimedia applications. In 25th Asia and South Pacific Design Automation Conference (ASP-DAC). IEEE, 470–475.
  [147] Yakura Hiromu and Sakuma Jun. 2019. Robust audio adversarial example for a physical attack. In 28th International Joint Conference on Artificial Intelligence. AAAI Press, 5334–5341.
  [148] Yan Chen, Shin Hocheol, Bolton Connor, Xu Wenyuan, Kim Yongdae, and Fu Kevin. 2020. SoK: A minimalist approach to formalizing analog sensor security. In IEEE Symposium on Security and Privacy (SP). 480–495.
  [149] Yan Chen, Xu Wenyuan, and Liu Jianhao. 2016. Can you trust autonomous vehicles: Contactless attacks against sensors of self-driving vehicle. DEF CON 24, 8 (2016), 109.
  [150] Yan Chen, Zhang Guoming, Ji Xiaoyu, Zhang Tianchen, Zhang Taimin, and Xu Wenyuan. 2019. The feasibility of injecting inaudible voice commands to voice assistants. IEEE Trans. Depend. Secure Comput. (2019).
  [151] Yan Qiben, Liu Kehai, Zhou Qin, Guo Hanqing, and Zhang Ning. 2020. SurfingAttack: Interactive hidden attack on voice assistants using ultrasonic guided waves. In Network and Distributed System Security Symposium (NDSS’20).
  [152] Yang Chao-Han, Qi Jun, Chen Pin-Yu, Ma Xiaoli, and Lee Chin-Hui. 2020. Characterizing speech adversarial examples using self-attention U-Net enhancement. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 3107–3111.
  [153] Yang Zhuolin, Chen Pin Yu, Li Bo, and Song Dawn. 2019. Characterizing audio adversarial examples using temporal dependency. In 7th International Conference on Learning Representations.
  [154] Park Joon Young, Jo Hyo Jin, Woo Samuel, and Lee Dong Hoon. 2016. BadVoice: Soundless voice-control replay attack on modern smartphones. In 8th International Conference on Ubiquitous and Future Networks (ICUFN). IEEE, 882–887.
  [155] Yuan Xuejing, Chen Yuxuan, Wang Aohui, Chen Kai, Zhang Shengzhi, Huang Heqing, and Molloy Ian M.. 2018. All your Alexa are belong to us: A remote voice control attack against Echo. In IEEE Global Communications Conference (GLOBECOM). IEEE, 1–6.
  [156] Yuan Xuejing, Chen Yuxuan, Zhao Yue, Long Yunhui, Liu Xiaokang, Chen Kai, Zhang Shengzhi, Huang Heqing, Wang Xiaofeng, and Gunter Carl A.. 2018. CommanderSong: A systematic approach for practical adversarial voice recognition. In 27th USENIX Security Symposium (USENIX Security’18). 49–64.
  [157] Zeng Qiang, Su Jianhai, Fu Chenglong, Kayas Golam, Luo Lannan, Du Xiaojiang, Tan Chiu C., and Wu Jie. 2019. A multiversion programming inspired approach to detecting audio adversarial examples. In 49th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN). IEEE, 39–51.
  [158] Zhang Guoming, Yan Chen, Ji Xiaoyu, Zhang Tianchen, Zhang Taimin, and Xu Wenyuan. 2017. DolphinAttack: Inaudible voice commands. In ACM SIGSAC Conference on Computer and Communications Security. 103–117.
  [159] Zhang Hongting, Zhou Pan, Yan Qiben, and Liu Xiao-Yang. 2020. Generating robust audio adversarial examples with temporal dependency. In International Joint Conferences on Artificial Intelligence. 3167–3173.
  [160] Zhang Jiajie, Zhang Bingsheng, and Zhang Bincheng. 2019. Defending adversarial attacks on cloud-aided automatic speech recognition systems. In 7th International Workshop on Security in Cloud Computing. 23–31.
  [161] Zhang Nan, Mi Xianghang, Feng Xuan, Wang XiaoFeng, Tian Yuan, and Qian Feng. 2019. Dangerous skills: Understanding and mitigating security risks of voice-controlled third-party functions on virtual personal assistant systems. In IEEE Symposium on Security and Privacy (SP). IEEE, 1381–1396.
  [162] Zhang Rongjunchen, Chen Xiao, Wen Sheng, and Zheng James. 2019. Who activated my voice assistant? A stealthy attack on Android phones without users’ awareness. In International Conference on Machine Learning for Cyber Security. Springer, 378–396.
  [163] Zhang Yangyong, Xu Lei, Mendoza Abner, Yang Guangliang, Chinprutthiwong Phakpoom, and Gu Guofei. 2019. Life after speech recognition: Fuzzing semantic misinterpretation for voice assistant applications. In Network and Distributed System Security Symposium (NDSS).
  [164] Zheng Baolin, Jiang Peipei, Wang Qian, Li Qi, Shen Chao, Wang Cong, Ge Yunjie, Teng Qingyang, and Zhang Shenyi. 2021. Black-box adversarial attacks on commercial speech platforms with minimal information. arXiv preprint arXiv:2110.09714 (2021).
  [165] Zhou Bing, Elbadry Mohammed, Gao Ruipeng, and Ye Fan. 2017. BatTracker: High precision infrastructure-free mobile device tracking in indoor environments. In 15th ACM Conference on Embedded Network Sensor Systems. 1–14.

Published in

ACM Transactions on Privacy and Security (TOPS), Volume 25, Issue 3 (August 2022). 288 pages.
ISSN: 2471-2566
EISSN: 2471-2574
DOI: 10.1145/3530305

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery, New York, NY, United States

Publication History

• Received: 1 March 2021
• Revised: 1 November 2021
• Accepted: 1 January 2022
• Online AM: 29 March 2022
• Published: 19 May 2022

Qualifiers

• research-article
• Refereed
