research-article

Speech-Driven End-to-End Language Discrimination toward Chinese Dialects

Published: 01 June 2020

Abstract

Language discrimination among similar languages, varieties, and dialects is a challenging natural language processing task. Traditional text-driven approaches yield poor results. In this article, we explore the effectiveness of speech-driven features for language discrimination among Chinese dialects. First, we systematically explore the appropriateness of speech-driven MFCC features for CNN-based language discrimination. Then, we design an end-to-end speech recognition model based on HMM-DNN to predict Chinese dialect words. We adopt an attention mechanism to extract the discriminative words related to different Chinese dialects. Finally, through a CNN, we combine the word-level embeddings and the MFCC-based features. Evaluation on two benchmark Chinese dialect corpora shows the appropriateness and effectiveness of the proposed speech-driven approach to fine-grained Chinese dialect discrimination compared to state-of-the-art methods.
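
The abstract names MFCC features as the CNN input but does not give extraction settings. As a rough illustration, the following NumPy sketch computes MFCCs with a common textbook recipe (pre-emphasis, Hamming-windowed frames, power spectrum, triangular mel filterbank, log, DCT-II). The frame length, hop size, filter count, coefficient count, and all function names are assumptions made for this example and are not taken from the article.

```python
import numpy as np

def hz_to_mel(hz):
    return 2595.0 * np.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters whose centers are evenly spaced on the mel scale."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):      # rising slope
            fbank[i - 1, k] = (k - left) / (center - left)
        for k in range(center, right):     # falling slope
            fbank[i - 1, k] = (right - k) / (right - center)
    return fbank

def dct_ii(x, n_out):
    """Unnormalized DCT-II along the last axis, keeping the first n_out coefficients."""
    n = x.shape[-1]
    k = np.arange(n_out)[:, None]
    m = np.arange(n)[None, :]
    basis = np.cos(np.pi * k * (2 * m + 1) / (2 * n))
    return x @ basis.T

def mfcc(signal, sample_rate=16000, frame_len=400, hop=160,
         n_fft=512, n_filters=26, n_coeffs=13):
    """Return an (n_frames, n_coeffs) MFCC matrix; all defaults are illustrative."""
    # Pre-emphasis boosts high frequencies (0.97 is a conventional value).
    emphasized = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])
    n_frames = 1 + (len(emphasized) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = emphasized[idx] * np.hamming(frame_len)
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    energies = power @ mel_filterbank(n_filters, n_fft, sample_rate).T
    log_energies = np.log(energies + 1e-10)  # small constant avoids log(0)
    return dct_ii(log_energies, n_coeffs)

# Usage: one second of a synthetic 440 Hz tone stands in for a speech clip.
t = np.arange(16000) / 16000.0
tone = np.sin(2.0 * np.pi * 440.0 * t)
features = mfcc(tone)
print(features.shape)
```

The resulting frames-by-coefficients matrix is the kind of two-dimensional input a CNN classifier could consume; how the authors actually shape, normalize, or combine it with word-level embeddings is not specified in the abstract.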
