Abstract
Given the expensive annotation required for Named Entity Recognition (NER), cross-domain NER enables NER in low-resource target domains with few or no labeled examples by transferring knowledge from high-resource domains. However, the discrepancy between domains causes the domain shift problem and hampers the performance of cross-domain NER in low-resource scenarios. In this article, we first propose an adversarial adaptive augmentation, where we integrate an adversarial strategy into a multi-task learner to augment and qualify domain-adaptive data. We extract domain-invariant features of the adaptive data to bridge the cross-domain gap and alleviate the label-sparsity problem simultaneously. Accordingly, the second key component of this article is a progressive domain-invariant feature distillation framework. A multi-grained MMD (Maximum Mean Discrepancy) approach in the framework extracts multi-level domain-invariant features and enables knowledge transfer across domains through the adversarial adaptive data. An advanced Knowledge Distillation (KD) schema then performs domain adaptation progressively, leveraging powerful pre-trained language models and the multi-level domain-invariant features. Extensive comparative experiments over four English and two Chinese benchmarks demonstrate the importance of adversarial augmentation and the effectiveness of adaptation from high-resource to low-resource target domains. Comparisons with two vanilla and four recent baselines show state-of-the-art performance in both zero-resource and minimal-resource scenarios.
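For orientation, the multi-grained MMD objective presumably builds on the standard empirical (squared) MMD between source-domain features \(S=\{s_i\}_{i=1}^{n}\) and target-domain features \(T=\{t_j\}_{j=1}^{m}\) under a kernel \(k\); the exact multi-grained formulation is not given in the abstract:

\[
\widehat{\mathrm{MMD}}^2(S,T) \;=\; \frac{1}{n^2}\sum_{i,i'} k(s_i, s_{i'}) \;-\; \frac{2}{nm}\sum_{i,j} k(s_i, t_j) \;+\; \frac{1}{m^2}\sum_{j,j'} k(t_j, t_{j'}).
\]

Below is a minimal PyTorch sketch of this estimate, extended to several feature levels. The Gaussian kernel choice, the `multi_level_mmd` helper, and the reading of "multi-grained" as a weighted sum over per-level representations are illustrative assumptions, not the paper's actual implementation.

```python
import torch

def gaussian_kernel(x, y, sigma=1.0):
    # Pairwise Gaussian (RBF) kernel between the rows of x and y.
    sq_dist = torch.cdist(x, y).pow(2)        # squared Euclidean distances
    return torch.exp(-sq_dist / (2 * sigma ** 2))

def mmd2(source, target, sigma=1.0):
    # Biased empirical estimate of squared MMD between two feature batches.
    k_ss = gaussian_kernel(source, source, sigma).mean()
    k_tt = gaussian_kernel(target, target, sigma).mean()
    k_st = gaussian_kernel(source, target, sigma).mean()
    return k_ss + k_tt - 2 * k_st

def multi_level_mmd(source_feats, target_feats, weights=None):
    # Hypothetical "multi-grained" alignment: sum MMD penalties over several
    # feature levels (e.g., token-, entity-, and sentence-level encoder
    # representations), optionally weighted per level.
    weights = weights or [1.0] * len(source_feats)
    return sum(w * mmd2(s, t)
               for w, s, t in zip(weights, source_feats, target_feats))
```

Minimizing such a term alongside the NER training loss encourages the encoder to produce features whose source- and target-domain distributions are indistinguishable, which is the usual rationale for MMD-based domain-invariant feature learning.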