research-article

Low-Resource Language Discrimination toward Chinese Dialects with Transfer Learning and Data Augmentation

Published: 31 October 2021

Abstract

Chinese dialect discrimination is a challenging natural language processing task because annotated resources are scarce. In this article, we develop a novel Chinese dialect discrimination framework with transfer learning and data augmentation (CDDTLDA) to overcome this resource shortage. Specifically, we first use a relatively large Chinese dialect corpus to train a source-side automatic speech recognition (ASR) model. Then, we adopt a simple but effective data augmentation method (i.e., speed, pitch, and noise perturbation) to augment the low-resource target-side Chinese dialects, and fine-tune a target-side ASR model starting from the source-side ASR model. Meanwhile, a self-attention mechanism captures the potential common semantic features shared by the source-side and target-side ASR models. Finally, we extract the hidden semantic representations of the target ASR model to perform Chinese dialect discrimination. Extensive experimental results demonstrate that our model significantly outperforms state-of-the-art methods on two benchmark Chinese dialect corpora.
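The abstract names speed, pitch, and noise perturbation but gives no parameters. The following minimal Python sketch illustrates two of them on a raw waveform stored as a list of floats. The function names, the speed factors (0.9, 1.1), and the 20 dB signal-to-noise ratio are hypothetical choices for illustration, not values reported in the article. Because naive resampling shifts pitch along with tempo, a duration-preserving pitch shift would need an additional time-stretching step (for example, a phase vocoder), which is omitted here.

```python
import math
import random


def add_noise(signal, snr_db, rng):
    """Add white Gaussian noise so the result has roughly the target SNR (in dB)."""
    signal_power = sum(x * x for x in signal) / len(signal)
    noise_power = signal_power / (10 ** (snr_db / 10))
    std = math.sqrt(noise_power)
    return [x + rng.gauss(0.0, std) for x in signal]


def change_speed(signal, factor):
    """Resample by linear interpolation; factor > 1 speeds up (fewer samples).

    This changes pitch together with tempo.
    """
    n_out = int(len(signal) / factor)
    last = len(signal) - 1
    out = []
    for i in range(n_out):
        pos = i * factor                 # fractional read position in the input
        j = min(int(pos), last)
        frac = pos - j
        nxt = signal[min(j + 1, last)]   # clamp at the final sample
        out.append((1.0 - frac) * signal[j] + frac * nxt)
    return out


def augment(signal, seed=0):
    """Return the original plus three perturbed copies (hypothetical settings)."""
    rng = random.Random(seed)
    copies = [signal]
    for factor in (0.9, 1.1):  # assumed speed factors, not from the article
        copies.append(change_speed(signal, factor))
    copies.append(add_noise(signal, snr_db=20.0, rng=rng))  # assumed SNR
    return copies


if __name__ == "__main__":
    sr = 16000
    tone = [math.sin(2 * math.pi * 440 * t / sr) for t in range(sr)]  # 1 s, 440 Hz
    for copy in augment(tone):
        print(len(copy))
```

Each perturbed copy would be added to the low-resource target-side training set alongside the original utterance, which multiplies the amount of training audio without new annotation.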
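The final steps, capturing shared features with self-attention and classifying dialects from hidden representations, can be illustrated with a toy forward pass. This sketch is not the CDDTLDA architecture: it assumes a single attention head, softmax(QKᵀ/√d)V, with the learned query, key, and value projections omitted (Q = K = V = H), mean pooling over frames, and a hypothetical linear classifier whose weights `W` and biases `b` would in practice be trained. `H` stands for frame-level hidden states taken from the target ASR model.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def self_attention(H):
    """Single-head scaled dot-product self-attention with Q = K = V = H.

    Learned projections are omitted for brevity (an assumption of this sketch).
    """
    d = len(H[0])
    out = []
    for q in H:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in H]
        weights = softmax(scores)  # attention weights for this frame sum to 1
        out.append([sum(w * v[j] for w, v in zip(weights, H)) for j in range(d)])
    return out


def mean_pool(H):
    """Average frame vectors into one utterance-level vector."""
    T = len(H)
    return [sum(h[j] for h in H) / T for j in range(len(H[0]))]


def classify(H, W, b):
    """Return dialect probabilities; W has one weight row per dialect class."""
    z = mean_pool(self_attention(H))
    logits = [sum(w_j * z_j for w_j, z_j in zip(row, z)) + b_i
              for row, b_i in zip(W, b)]
    return softmax(logits)
```

With identical weight rows every dialect receives the same probability, so `classify([[1.0, 0.0], [0.0, 1.0]], [[1.0, 1.0], [1.0, 1.0]], [0.0, 0.0])` returns `[0.5, 0.5]`; in a trained system the rows would differ and the highest probability would indicate the predicted dialect.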



    • Published in

      ACM Transactions on Asian and Low-Resource Language Information Processing, Volume 21, Issue 2
      March 2022
      413 pages
      ISSN: 2375-4699
      EISSN: 2375-4702
      DOI: 10.1145/3494070

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 31 October 2021
      • Accepted: 1 June 2021
      • Revised: 1 May 2021
      • Received: 1 February 2021

      Qualifiers

      • research-article
      • Refereed