Abstract
Language discrimination among similar languages, varieties, and dialects is a challenging natural language processing task. Traditional text-driven approaches lead to poor results. In this article, we explore the effectiveness of speech-driven features for language discrimination among Chinese dialects. First, we systematically explore the appropriateness of speech-driven MFCC features for CNN-based language discrimination. Then, we design an end-to-end speech recognition model based on HMM-DNN to predict Chinese dialect words. We adopt an attention mechanism to extract the discriminative words related to different Chinese dialects. Finally, through a CNN, we combine the word-level embeddings and the MFCC-based features. Evaluation on two benchmark Chinese dialect corpora shows the appropriateness and effectiveness of the proposed speech-driven approach to fine-grained Chinese dialect discrimination compared to state-of-the-art methods.
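As a rough illustration of the last two steps, the sketch below shows one way attention weights over recognized words could be computed and the pooled word representation joined with an utterance-level MFCC feature vector. It is not the model described in the abstract: the dot-product scoring, the `query` vector, the plain concatenation in place of the CNN, and all numeric values are assumptions chosen only to keep the example small.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that sum to 1 (numerically stable)."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(word_embeddings, query):
    """Weight each word embedding by its dot-product similarity to `query`.

    `query` is a hypothetical vector standing in for whatever the trained
    model would learn to mark dialect-discriminative words.
    """
    scores = [sum(w * q for w, q in zip(emb, query)) for emb in word_embeddings]
    weights = softmax(scores)
    dim = len(word_embeddings[0])
    pooled = [
        sum(wt * emb[i] for wt, emb in zip(weights, word_embeddings))
        for i in range(dim)
    ]
    return pooled, weights


def fuse(pooled_words, mfcc_summary):
    """Concatenate word-level and MFCC-based features (assumed fusion step)."""
    return pooled_words + mfcc_summary


# Toy inputs: three recognized words with 2-dimensional embeddings and a
# 3-dimensional utterance-level MFCC summary. All values are made up.
word_embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query = [1.0, 0.0]
mfcc_summary = [0.2, -0.1, 0.4]

pooled, weights = attention_pool(word_embeddings, query)
fused = fuse(pooled, mfcc_summary)
```

In this toy setup the first and third words align with `query` and receive larger weights than the second, and `fused` is the vector a downstream classifier would consume.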
Speech-Driven End-to-End Language Discrimination toward Chinese Dialects