Abstract
Using off-the-shelf resources from resource-rich languages to transfer knowledge to low-resource languages has received considerable attention. However, two questions remain poorly answered: how much annotated data a model needs to reach reliable performance, and what framework transfers knowledge effectively. To address the first question, we empirically investigate the cost-effectiveness of several methods for training intent-classification and slot-filling models for Indonesian (ID) from scratch using English data. To address the second, we propose a Bi-Confidence-Frequency Cross-lingual transfer framework (BiCF), consisting of “BiCF Mixing”, “Latent Space Refinement”, and a “Joint Decoder”, to overcome the scarcity of dialogue data in low-resource languages. BiCF Mixing generates code-mixed data with a word-level alignment strategy, selecting words to mix according to their importance frequency and translation confidence. Latent Space Refinement then trains a new dialogue-understanding model on the code-mixed data together with word-embedding models. Finally, the Joint Decoder, built on a bidirectional LSTM (BiLSTM) and a conditional random field (CRF), produces the intent-classification and slot-filling predictions. We also release ID-WOZ, a large-scale, finely labeled Indonesian dialogue dataset, and ID-BERT for experiments. BiCF achieves F1 scores of 93.56% on intent classification and 85.17% on slot filling. Extensive experiments demonstrate that our framework performs reliably and cost-efficiently across different scales of manually annotated Indonesian data.
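To make the code-mixing step concrete, the sketch below shows one plausible reading of BiCF Mixing under simplifying assumptions: word-level alignment is abstracted into a bilingual `lexicon` dictionary, importance frequency is computed as TF-IDF over the English corpus, and `confidence` stands in for per-word translation confidence. The names `tf_idf`, `code_mix`, and the `ratio` parameter are illustrative, not the paper's actual API.

```python
import math

def tf_idf(word, utterance, corpus):
    """Term frequency in the utterance times inverse document frequency
    over the corpus: a simple importance-frequency score."""
    tf = utterance.count(word) / len(utterance)
    df = sum(1 for u in corpus if word in u)
    return tf * math.log(len(corpus) / (1 + df))

def code_mix(utterance, corpus, lexicon, confidence, ratio=0.3):
    """Replace the highest-scoring source words with target-language
    translations, weighting TF-IDF importance by translation confidence."""
    scored = []
    for w in set(utterance):
        if w in lexicon:  # only words the lexicon can translate are candidates
            scored.append((tf_idf(w, utterance, corpus) * confidence.get(w, 0.0), w))
    scored.sort(reverse=True)
    k = max(1, int(ratio * len(utterance)))  # mix at most a fixed fraction of words
    to_swap = {w for _, w in scored[:k]}
    return [lexicon[w] if w in to_swap else w for w in utterance]

# Toy English corpus with a small hypothetical English→Indonesian lexicon.
corpus = [["book", "a", "flight", "to", "jakarta"],
          ["find", "a", "cheap", "flight"],
          ["what", "is", "the", "weather"],
          ["play", "some", "music"]]
lexicon = {"flight": "penerbangan", "cheap": "murah", "book": "pesan"}
confidence = {"flight": 0.9, "cheap": 0.8, "book": 0.7}

print(code_mix(corpus[0], corpus, lexicon, confidence))
# "book" wins: it is rarer in the corpus (higher IDF) than "flight",
# so the code-mixed utterance becomes ["pesan", "a", "flight", "to", "jakarta"]
```

The resulting code-mixed utterances keep their original intent and slot labels, so they can be fed directly into the downstream BiLSTM-CRF training described above.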
Building Dialogue Understanding Models for Low-resource Language Indonesian from Scratch