OdeBERT: One-stage Deep-supervised Early-exiting BERT for Fast Inference in User Intent Classification

Published: 09 May 2023

Abstract

User intent classification is a vital task for identifying users' essential requirements from their input queries in information retrieval systems, question answering systems, and dialogue systems. The pre-trained language model Bidirectional Encoder Representations from Transformers (BERT) has been widely applied to user intent classification. However, BERT is compute-intensive and time-consuming during inference, which often causes latency in real-time applications. To improve the inference efficiency of BERT for user intent classification, this article proposes a new network, one-stage deep-supervised early-exiting BERT (OdeBERT). In addition, a deep supervision strategy is developed to incorporate internal classifiers into the network through one-stage joint training, improving the classifiers' learning process by extracting discriminative category features. Experiments are conducted on publicly available datasets, including ECDT, SNIPS, and FDQuestion. The results show that OdeBERT speeds up the original BERT by up to 12 times with the same performance, outperforming state-of-the-art baseline methods.
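The early-exiting idea the abstract describes — attaching internal classifiers to intermediate layers and stopping inference once a layer's prediction is confident enough — can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the entropy-based exit criterion, the threshold value, and the toy classifiers are all assumptions for demonstration.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over a 1-D logit vector."""
    e = np.exp(logits - logits.max())
    return e / e.sum()

def entropy(probs):
    """Shannon entropy of a probability vector; low entropy = confident."""
    return float(-(probs * np.log(probs + 1e-12)).sum())

def early_exit_inference(hidden_states_per_layer, classifiers, threshold=0.3):
    """Run internal classifiers layer by layer and exit as soon as the
    prediction entropy falls below the threshold; otherwise fall through
    to the deepest classifier. Returns (predicted_label, exit_layer)."""
    for layer_idx, (h, clf) in enumerate(zip(hidden_states_per_layer, classifiers)):
        probs = softmax(clf(h))
        if entropy(probs) < threshold:
            return int(probs.argmax()), layer_idx
    return int(probs.argmax()), layer_idx

# Toy demo: dummy hidden states and fixed-logit "classifiers".
# Layer 0 is uncertain (near-uniform logits); layer 1 is confident,
# so inference exits early and layer 2 is never evaluated.
layers = [np.zeros(4), np.zeros(4), np.zeros(4)]
clfs = [
    lambda h: np.array([0.1, 0.0, 0.0]),  # high entropy -> keep going
    lambda h: np.array([8.0, 0.0, 0.0]),  # low entropy -> exit here
    lambda h: np.array([8.0, 0.0, 0.0]),
]
label, exit_layer = early_exit_inference(layers, clfs, threshold=0.3)
print(label, exit_layer)  # exits at layer 1 with label 0
```

In a real OdeBERT-style model the hidden states would come from BERT's transformer layers and each internal classifier would be jointly trained with the backbone; skipping the remaining layers for easy inputs is what yields the reported speedup.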



Published in

ACM Transactions on Asian and Low-Resource Language Information Processing, Volume 22, Issue 5
May 2023, 653 pages
ISSN: 2375-4699
EISSN: 2375-4702
DOI: 10.1145/3596451


      Publisher

      Association for Computing Machinery

      New York, NY, United States

Publication History

• Published: 9 May 2023
• Online AM: 13 March 2023
• Accepted: 7 March 2023
• Revised: 18 January 2023
• Received: 10 May 2022

      Qualifiers

      • research-article