Abstract
With the wide use of Automatic Speech Recognition (ASR) in applications such as human-machine interaction, simultaneous interpretation, and audio transcription, the security of ASR systems has become increasingly important. Although recent studies have brought to light weaknesses of popular ASR systems that enable out-of-band signal attacks, adversarial attacks, and other exploits, and have proposed various remedies (signal smoothing, adversarial training, etc.), a systematic understanding of ASR security (both attacks and defenses) is still missing, especially of how realistic such threats are and how general existing protections could be. In this article, we present our systematization of knowledge for ASR security and provide a comprehensive taxonomy of existing work based on a modularized workflow. More importantly, we align research in this domain with that on the security of Image Recognition Systems (IRS), which has been studied extensively, using domain knowledge from the latter to understand where we stand in the former. Both IRS and ASR are perceptual systems. Their similarities allow us to systematically study the existing literature on ASR security against the spectrum of attacks and defenses proposed for IRS, and to pinpoint directions toward more advanced attacks and more effective protection in ASR. Their differences, especially the greater complexity of ASR compared with IRS, reveal unique challenges and opportunities in ASR security. In particular, our experimental study shows that transfer attacks across ASR models are feasible, even in the absence of knowledge about the target models (even their types) and training data.
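To make the "signal smoothing" family of defenses mentioned above concrete, the sketch below applies a median filter to a waveform before it would reach an ASR front end: small, spiky adversarial perturbations are flattened while the slowly varying speech signal survives. This is an illustrative sketch only, not the method of any particular paper; the function name and window size are our own choices.

```python
# Median-filter smoothing: a simple preprocessing defense that
# suppresses small adversarial spikes in an audio waveform.
# Illustrative sketch; `median_smooth` is a hypothetical helper.
def median_smooth(samples, k=3):
    """Replace each sample with the median of a k-wide window (k odd)."""
    assert k % 2 == 1, "window size must be odd"
    half = k // 2
    out = []
    for i in range(len(samples)):
        lo, hi = max(0, i - half), min(len(samples), i + half + 1)
        window = sorted(samples[lo:hi])
        out.append(window[len(window) // 2])  # median of the local window
    return out

# A single adversarial spike is removed while the slow trend remains.
perturbed = [0.0, 0.1, 0.9, 0.3, 0.4, 0.5]  # spike injected at index 2
print(median_smooth(perturbed))
```

The trade-off, as the survey's defense taxonomy suggests, is that aggressive smoothing also degrades benign transcription accuracy, so window size must balance robustness against utility.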
SoK: A Modularized Approach to Study the Security of Automatic Speech Recognition Systems