Abstract
Relation classification (sometimes called relation extraction) requires trustworthy datasets both for fine-tuning large language models and for evaluation. Data collection is challenging for Indian languages because they are syntactically and morphologically diverse, as well as different from resource-rich languages like English. Despite recent interest in deep generative models for Indian languages, relation classification is still not well served by public datasets. In response, we present IndoRE, a dataset with 21K entity- and relation-tagged gold sentences in three Indian languages (Bengali, Hindi, and Telugu), plus English. We start with a multilingual BERT (mBERT)-based system that captures entity span positions and type information and provides competitive performance on monolingual relation classification (a minimal sketch of such a classifier follows the abstract). Using this baseline system, we explore transfer mechanisms between languages and the scope for reducing expensive data annotation while achieving reasonable relation extraction performance. Specifically, we
(a) study the accuracy-efficiency trade-off between expensive, manually labeled gold instances and automatically translated and aligned silver instances for training a relation extractor,
(b) devise a simple mechanism for budgeted gold data annotation that uses active learning to intelligently convert distantly supervised silver training instances into human-annotated gold instances (an illustrative selection loop is sketched after the abstract), and finally
(c) propose an ensemble model that provides a performance boost over what is achieved with limited gold training instances.
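
To make the baseline concrete, the sketch below shows one common way to build an mBERT relation classifier that injects entity span positions and type information via typed entity markers, in the spirit of the system described above. It is a minimal, hypothetical illustration rather than the paper's released code; the marker vocabulary, the relation count, and the tensor layout are assumptions.

```python
# Illustrative sketch (not the authors' implementation) of an mBERT relation
# classifier: typed entity markers encode span positions and entity types,
# and the two start-marker vectors feed a linear relation classifier.
import torch
import torch.nn as nn
from transformers import AutoTokenizer, AutoModel

NUM_RELATIONS = 51                      # hypothetical relation inventory size
MODEL = "bert-base-multilingual-cased"  # multilingual BERT checkpoint

tokenizer = AutoTokenizer.from_pretrained(MODEL)
# Typed markers such as [E1:PER] carry both span position and entity type.
markers = ["[E1:PER]", "[/E1]", "[E2:LOC]", "[/E2]"]
tokenizer.add_special_tokens({"additional_special_tokens": markers})

class MBertRelationClassifier(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(MODEL)
        self.encoder.resize_token_embeddings(len(tokenizer))
        hidden = self.encoder.config.hidden_size
        # Concatenate the two entity start-marker vectors, then classify.
        self.classifier = nn.Linear(2 * hidden, NUM_RELATIONS)

    def forward(self, input_ids, attention_mask, e1_pos, e2_pos):
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        h = out.last_hidden_state                       # (batch, seq, hidden)
        idx = torch.arange(h.size(0), device=h.device)
        pair = torch.cat([h[idx, e1_pos], h[idx, e2_pos]], dim=-1)
        return self.classifier(pair)                    # relation logits
```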
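Contribution (b) converts distantly supervised silver instances into gold by sending only the most informative sentences to human annotators. The sketch below shows a standard uncertainty-sampling loop of this kind; the data-loader format, the entropy criterion, and the annotation budget are illustrative assumptions, not the authors' exact procedure.

```python
# Illustrative silver-to-gold selection loop: rank distantly supervised
# (silver) sentences by predictive entropy and return the top-K candidates
# for human annotation. Assumes the classifier sketched above.
import torch
import torch.nn.functional as F

@torch.no_grad()
def select_for_annotation(model, silver_loader, budget_k):
    """silver_loader yields (sentence_ids, batch_dict) pairs (assumed format)."""
    model.eval()
    scores, ids = [], []
    for sent_ids, batch in silver_loader:
        logits = model(**batch)
        probs = F.softmax(logits, dim=-1)
        # Higher entropy = the model is less certain = more worth annotating.
        entropy = -(probs * probs.clamp_min(1e-12).log()).sum(dim=-1)
        scores.append(entropy)
        ids.append(sent_ids)
    scores, ids = torch.cat(scores), torch.cat(ids)
    top = scores.topk(budget_k).indices
    return ids[top].tolist()   # silver sentence ids to re-annotate as gold
```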