research-article

Domain-Invariant Feature Progressive Distillation with Adversarial Adaptive Augmentation for Low-Resource Cross-Domain NER

Published: 14 April 2023

Abstract

Given the expensive annotation required for Named Entity Recognition (NER), cross-domain NER enables NER in low-resource target domains with few or no labeled examples by transferring knowledge from high-resource domains. However, the discrepancy between domains causes the domain shift problem and hampers the performance of cross-domain NER in low-resource scenarios. In this article, we first propose an adversarial adaptive augmentation, in which we integrate an adversarial strategy into a multi-task learner to augment and qualify domain-adaptive data. We extract domain-invariant features of the adaptive data to bridge the cross-domain gap and alleviate the label-sparsity problem simultaneously. Accordingly, the other key component of this article is a progressive domain-invariant feature distillation framework. A multi-grained MMD (Maximum Mean Discrepancy) approach in the framework extracts multi-level domain-invariant features and enables knowledge transfer across domains through the adversarial adaptive data. An advanced Knowledge Distillation (KD) schema then performs domain adaptation progressively, leveraging powerful pre-trained language models and the multi-level domain-invariant features. Extensive comparative experiments over four English and two Chinese benchmarks show the importance of adversarial augmentation and effective adaptation from high-resource to low-resource target domains. Comparison with two vanilla and four recent baselines indicates state-of-the-art performance and superiority in both zero-resource and minimal-resource scenarios.
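To make the MMD component concrete: MMD measures the distance between the feature distributions of two domains via kernel mean embeddings, and minimizing it encourages domain-invariant representations. The sketch below is a generic RBF-kernel estimate of squared MMD over two batches of features, not the paper's multi-grained variant; the function names and the bandwidth choice are illustrative assumptions.

```python
import numpy as np

def rbf_kernel(x, y, sigma=1.0):
    # Pairwise RBF kernel values between rows of x and rows of y.
    d2 = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def mmd2(source, target, sigma=1.0):
    """Biased (V-statistic) estimate of squared MMD between two feature batches."""
    return (rbf_kernel(source, source, sigma).mean()
            + rbf_kernel(target, target, sigma).mean()
            - 2 * rbf_kernel(source, target, sigma).mean())

# Features drawn from the same distribution give a small MMD;
# a shifted "target domain" gives a clearly larger one.
rng = np.random.default_rng(0)
same = mmd2(rng.normal(0, 1, (64, 8)), rng.normal(0, 1, (64, 8)))
shifted = mmd2(rng.normal(0, 1, (64, 8)), rng.normal(3, 1, (64, 8)))
```

In a multi-level setting such as the one described above, a term like this would be computed on representations from several encoder layers and added to the task loss, so that source and target features align at more than one granularity.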
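The KD side can likewise be illustrated with the standard temperature-scaled distillation loss (Hinton-style soft targets), where a student is trained to match a teacher's softened output distribution. This is a generic sketch, not the paper's progressive schema; the temperature value and function names are assumptions.

```python
import numpy as np

def softmax(z, axis=-1):
    # Numerically stable softmax.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def kd_loss(student_logits, teacher_logits, T=2.0):
    # Cross-entropy of the student's softened predictions against the
    # teacher's softened targets, scaled by T^2 so gradient magnitudes
    # stay comparable across temperatures.
    p_teacher = softmax(teacher_logits / T)
    log_p_student = np.log(softmax(student_logits / T) + 1e-12)
    return -(p_teacher * log_p_student).sum(-1).mean() * T * T

# A student that matches the teacher incurs a lower loss than one that
# disagrees with it.
teacher = np.array([[2.0, 0.0, -1.0]])
matched = kd_loss(teacher, teacher)
mismatched = kd_loss(-teacher, teacher)
```

In a progressive setup, such a loss would be applied stage by stage, so that the student model gradually absorbs the teacher's domain-invariant knowledge rather than fitting it in one shot.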



• Published in

  ACM Transactions on Asian and Low-Resource Language Information Processing, Volume 22, Issue 3 (March 2023), 570 pages.
  ISSN: 2375-4699
  EISSN: 2375-4702
  Issue DOI: 10.1145/3579816


Publisher

Association for Computing Machinery, New York, NY, United States

      Publication History

      • Published: 14 April 2023
      • Online AM: 14 December 2022
      • Accepted: 24 October 2022
      • Revised: 24 September 2022
      • Received: 24 March 2022
