Abstract
Chinese dialect discrimination is a challenging natural language processing task because annotated resources are scarce. In this article, we develop a novel Chinese dialect discrimination framework with transfer learning and data augmentation (CDDTLDA) to overcome this shortage of resources. Specifically, we first use a relatively large Chinese dialect corpus to train a source-side automatic speech recognition (ASR) model. We then apply a simple but effective data augmentation method (i.e., speed, pitch, and noise perturbation) to the low-resource target-side dialect data, and fine-tune a separate target-side ASR model based on the source-side ASR model. Meanwhile, a self-attention mechanism captures the potential common semantic features shared by the source-side and target-side ASR models. Finally, we extract the hidden semantic representation from the target-side ASR model to perform Chinese dialect discrimination. Extensive experimental results demonstrate that our model significantly outperforms state-of-the-art methods on two benchmark Chinese dialect corpora.
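As a rough illustration of the augmentation step, the following minimal sketch applies speed and noise perturbation to a raw waveform using only NumPy. It is not the authors' implementation: the function names (`speed_perturb`, `add_noise`, `augment`), the speed factors, and the 20 dB signal-to-noise ratio are all assumptions chosen for illustration. Pitch perturbation is omitted because shifting pitch without changing duration typically needs a phase vocoder or a dedicated audio library.

```python
import numpy as np


def speed_perturb(wave, factor):
    """Resample by linear interpolation so the clip plays `factor` times faster.

    This naive resampling changes duration and pitch together.
    """
    n_out = int(round(len(wave) / factor))
    src_pos = np.linspace(0.0, len(wave) - 1, num=n_out)
    return np.interp(src_pos, np.arange(len(wave)), wave)


def add_noise(wave, snr_db, rng):
    """Add white Gaussian noise at the requested signal-to-noise ratio in dB."""
    signal_power = np.mean(wave ** 2)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=wave.shape)
    return wave + noise


def augment(wave, rng, speed_factors=(0.9, 1.0, 1.1), snr_db=20.0):
    """Return one noisy, speed-perturbed copy of `wave` per speed factor.

    The default factors and SNR are hypothetical values, not taken from the paper.
    """
    return [add_noise(speed_perturb(wave, f), snr_db, rng) for f in speed_factors]


# Stand-in for a dialect utterance: one second of a 220 Hz tone at 16 kHz.
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
wave = np.sin(2.0 * np.pi * 220.0 * t)

rng = np.random.default_rng(0)
variants = augment(wave, rng)
```

Each original utterance yields several training examples here, which is the general idea behind enlarging a low-resource target-side corpus before fine-tuning.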
Index Terms
Low-Resource Language Discrimination toward Chinese Dialects with Transfer Learning and Data Augmentation