Research article

Combining Self-supervised Learning and Active Learning for Disfluency Detection

Published: 13 December 2021

Abstract

Spoken language is fundamentally different from written language in that it contains frequent disfluencies, that is, parts of an utterance that are corrected by the speaker. Disfluency detection (removing these disfluencies) is desirable to clean the input for downstream NLP tasks. Most existing approaches to disfluency detection rely heavily on human-annotated data, which is scarce and expensive to obtain in practice. To tackle this training-data bottleneck, we investigate methods for combining self-supervised learning and active learning for disfluency detection. First, we construct large-scale pseudo training data by randomly adding or deleting words from unlabeled data and propose two self-supervised pre-training tasks: (i) a tagging task to detect the added noisy words and (ii) a sentence-classification task to distinguish original sentences from grammatically incorrect ones. We then combine these two tasks to jointly pre-train a neural network, which is subsequently fine-tuned on human-annotated disfluency detection data. The self-supervised method captures task-specific knowledge for disfluency detection and achieves better performance than other supervised methods when fine-tuned on a small annotated dataset. However, because the pseudo training data are generated with simple heuristics and cannot fully cover all disfluency patterns, a performance gap remains compared to supervised models trained on the full training dataset. We further explore how to bridge this gap by integrating active learning into the fine-tuning process. Active learning strives to reduce annotation costs by choosing the most critical examples to label, and can thus address the weakness of self-supervised learning on a small annotated dataset.
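The word-insertion heuristic described above can be sketched as follows. This is a minimal illustration, not the authors' exact procedure: the function name, tag set, and insertion probabilities are hypothetical, and the real pipeline also includes a deletion operation and a sentence-level classification target.

```python
import random

def add_noise(tokens, vocab, p_insert=0.15, rng=None):
    """Create a pseudo-disfluent sentence by randomly inserting words.

    Returns (noisy_tokens, tags), where inserted words are tagged 'ADD'
    and original words 'KEEP' -- the target of the tagging pre-training
    task (detect the added noisy words).
    """
    rng = rng or random.Random(0)
    noisy, tags = [], []
    for tok in tokens:
        if rng.random() < p_insert:
            # Insert either a copy of the upcoming word or a random
            # vocabulary word, loosely mimicking repetition and filler
            # disfluencies in spontaneous speech.
            noise = tok if rng.random() < 0.5 else rng.choice(vocab)
            noisy.append(noise)
            tags.append("ADD")
        noisy.append(tok)
        tags.append("KEEP")
    return noisy, tags

sent = "i want a flight to boston".split()
noisy, tags = add_noise(sent, vocab=["uh", "well", "you", "know"])
# Removing all 'ADD' tokens recovers the original fluent sentence,
# which is what makes the tags free supervision.
assert [w for w, t in zip(noisy, tags) if t == "KEEP"] == sent
```

Because the clean sentence is known, the tag sequence comes for free, which is what makes pre-training data of this kind cheap to generate at scale.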
We show that by combining self-supervised learning with active learning, our model matches state-of-the-art performance with only about 10% of the original training data on both the commonly used English Switchboard test set and a set of in-house annotated Chinese data.
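A common way to choose "the most critical examples to label" is uncertainty sampling: send the sentences the model is least confident about to the annotators. The sketch below is a generic length-normalized variant under assumed names; the paper's actual acquisition strategy may differ.

```python
import math

def select_for_labeling(pool, token_probs, k=2):
    """Uncertainty sampling for a sequence tagger.

    pool        -- ids of unlabeled sentences
    token_probs -- id -> probability the model assigns to its own
                   predicted tag at each token position
    Returns the k ids with the lowest mean token log-probability,
    i.e., the sentences the model is least confident about.
    """
    def confidence(probs):
        # Length-normalize so long sentences are not penalized
        # merely for having more tokens.
        return sum(math.log(p) for p in probs) / len(probs)

    ranked = sorted(pool, key=lambda i: confidence(token_probs[i]))
    return ranked[:k]

pool = [0, 1, 2]
token_probs = {0: [0.99, 0.98], 1: [0.51, 0.55, 0.60], 2: [0.90, 0.85]}
# Sentence 1 has the least confident tag predictions, so it is
# selected for annotation first.
assert select_for_labeling(pool, token_probs, k=1) == [1]
```

Each active-learning round then fine-tunes the pre-trained model on the growing labeled set, so annotation effort is spent where the self-supervised model is weakest.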



Published in

ACM Transactions on Asian and Low-Resource Language Information Processing, Volume 21, Issue 3, May 2022, 413 pages
ISSN: 2375-4699
EISSN: 2375-4702
DOI: 10.1145/3505182


Publisher

Association for Computing Machinery, New York, NY, United States

Publication History

• Received: 1 December 2020
• Revised: 1 July 2021
• Accepted: 1 September 2021
• Published: 13 December 2021
