research-article

Construction of a Chinese Corpus for Multi-Type Economic Event Relation

Published: 12 November 2022

Abstract

We construct a Chinese Economic Event Treebank (CEETB) that focuses on revealing economic and financial events and their relations. Investigating economic event relations benefits academic research and practice not only in economics but also in many other scientific areas. The characteristics of economic texts (e.g., abundant long enterprise names and terms) and the peculiarities of the Chinese language (e.g., component ellipsis in long sentences) make event relation extraction challenging. Existing Chinese corpora containing economic event relations focus mainly on finance (e.g., the equity market) and cover only a few event types. To support research involving economic text analysis in Chinese, we construct CEETB following a carefully designed process. First, based on practical and research requirements, we summarize nine types of event relations and four types of component ellipsis in economic texts. Then, we present an annotation scheme that makes the annotation model, strategy, and process explicit, followed by statistical analysis and quality evaluation of the CEETB corpus. Finally, to demonstrate the strengths of the constructed corpus in practical applications, we conduct experiments with five state-of-the-art (SOTA) models for event relation extraction.


      • Published in

        ACM Transactions on Asian and Low-Resource Language Information Processing  Volume 21, Issue 6
        November 2022
        372 pages
        ISSN:2375-4699
        EISSN:2375-4702
        DOI:10.1145/3568970
        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 12 November 2022
        • Online AM: 26 March 2022
        • Accepted: 15 March 2022
        • Revised: 20 February 2022
        • Received: 18 October 2021

        Qualifiers

        • research-article
        • Refereed