Leveraging Knowledge-aware Methodologies for Multi-document Summarization

With the development of information technology, vast amounts of text and corpora have been generated on the Web, stimulating an increasingly high demand for summarization. Document summarization is a Natural Language Processing task that aims to generate abridged versions of a given single document or set of documents that are as concise and coherent as possible while preserving the salient information of the source texts. Recent research in the area has started to use knowledge graphs, since they capture factual and applicable information from more facets of the source than a purely linguistic view, benefiting the fact consistency and informativeness of generated summaries. However, there has been no explicit investigation of the effects of different kinds of knowledge graphs on document summarization. The proposed method incorporates structured, informative, and knowledgeable auxiliary information, especially knowledge graphs, into pre-trained summarization models to advance summary quality. The expected outcomes are to explore the incorporation of knowledge and knowledge graphs into multi-document summarization and to achieve more informative, coherent, and factually consistent summaries.


INTRODUCTION
As the development of information technology has incrementally produced large quantities of text data and corpora, there is an increasingly high demand for summarizing documents to help users gather the most important and relevant information quickly and easily. Document Summarization (DS) is a Natural Language Processing (NLP) task of generating an abridged version of a given single document or set of documents that is as concise and coherent as possible while preserving salient and consistent information from the source text. According to the number of input source texts, the task can be categorized into single document summarization (SDS) and multi-document summarization (MDS). In contrast to single document summarization, multi-document summarization can generate more comprehensive and objective digests and achieve higher-quality summaries, because it targets generating a compressed and informative summary across a set of topic-related documents from diverse times, covering various perspectives [16]. Accordingly, multi-document summarization has contributed to a wide range of real-world applications, such as generating summative texts from news, scientific publications, emails, product reviews, lecture feedback, Wikipedia articles, medical documents, and software project activities [16]. Consequently, with the growing demand for synthesizing significant information from academia and industry, improving multi-document summarization performance has attracted enormous research interest and attention.
Knowledge-aware approaches have boosted a range of natural language processing applications over the last decades, such as question answering [24] and recommendation systems [9]. That is because knowledge can represent more applicable information from more facets of the source information, benefiting the informativeness and fact consistency of the generated textual results. With this momentum, knowledge has recently attracted considerable attention in document summarization research. Generally, advanced knowledge-aware summarization methods utilize graph structures to capture the knowledge and incorporate it into models. Identifying and empirically demonstrating the effectiveness of leveraging knowledge and knowledge graphs in document summarization tasks therefore remains a promising research direction in the field.
At the same time, synthesizing a large number of topic-related documents leads to problems of content complementarity, overlap, and conflict, which cause model degradation and reduce abridgment quality [11]. To improve summarization performance over multiple input documents, a multi-document summarization model requires capabilities such as (1) analytically processing voluminous documents, (2) faithfully recognizing salient information, and (3) efficiently evaluating and integrating factually consistent snippets [16]. Recent key challenges of multi-document summarization concentrate on fact-awareness and summary logicality, including the lack of proper inter-document content-aware information, improper logical flow of information, and the need for external deep context representations [18]. Therefore, exploring more precise methods with adequate information that can generate high-quality and comprehensive summaries across massive input documents, while preserving the coherence, succinctness, and factual consistency of the generated summaries, remains a worthy research direction for multi-document summarization.

PROBLEM
This three-year research aims to explore methods of utilizing knowledge or structured knowledge graphs that can assist multi-document summarization in efficiently generating more informative, coherent, and factually consistent summaries from a number of long documents. A further goal is to provide empirical evidence of the extent to which knowledge graphs benefit document summarization. Existing works have reported that the knowledge graphs leveraged in document summarization can be constructed from source corpora [10,18,26,27] or extracted from open-source knowledge graph datasets [6,8], and that knowledge-enhanced document summarizers achieve better performance than standard document summarizers to varying degrees [19]. However, no clear or strong comparative evidence clarifies the benefit of each kind of knowledge graph for document summarization. Also, the potential advantages of knowledge graphs for document summarization, such as grammar correction, redundancy elimination, coherence enhancement, genericity improvement, and fact correction, lack empirical exploration and investigation. Therefore, more specifically, the following general research questions need to be addressed in this research:
• How can we provide clear and supportive evidence on the benefits of different kinds of knowledge bases and knowledge graphs leveraged in document summarization, to guide further research on effectively choosing and using specific knowledge or knowledge graphs for related document summarization tasks?
• How can we provide clear and supportive evidence on the benefits of fusing different kinds of knowledge bases and of merging various knowledge graphs for document summarization, to guide further investigations on effectively choosing and using specific combinations of knowledge or knowledge graphs for related document summarization tasks?
• What is a novel and efficient method for the multi-document summarization task that uses knowledge or knowledge graphs to advance the informativeness, coherence, and factual consistency of generated summaries without placing an additional burden on the summarizer model?

STATE OF THE ART
Along with the prosperity of knowledge-aware research in natural language processing, a growing number of document summarization models have attempted to incorporate knowledge graphs to enhance the quality of generated summaries.
For single-document summarization with knowledge graphs, Gunel et al. [6] proposed a novel architecture that leverages entity-level knowledge from the Wikidata knowledge graph in an extended encoder-decoder architecture based on the Transformer-XL. In their work, 5 million information entities with 25 million relational semantic triples were extracted and sampled from the Wikidata knowledge graph, and TransE was applied to learn knowledge embeddings from this multi-relational data, which were then incorporated into the summarization model. Zhu et al. [27] proposed the Fact-Aware abstractive Summarizer (FASUM) for single-document summarization, utilizing a Sequence-to-Sequence (Seq2Seq) architecture built upon a standard Transformer. To make the summarizer fact-aware, Zhu et al. [27] embedded semantic and factual knowledge into the decoder of the model to assist summary production; in detail, the knowledge embeddings are acquired from a knowledge graph consisting of multi-relational triples extracted from the source article with the information extraction tool OpenIE.
For multi-document summarization with knowledge graphs, Zhou et al. [26] presented EMSum, an entity-aware model for abstractive multi-document summarization that augments a classical Transformer-based encoder with a knowledge graph whose nodes are text units and entities, processed with Graph Attention Networks (GAT). This design allows EMSum to capture cross-document information and identify related information among documents, significantly benefiting the multi-document summarization task. Specifically, the knowledge graph is constructed from semantic entities extracted with the coreference resolution tool from AllenNLP. Pasunuru et al. [18] presented an efficient graph-enhanced approach, denoted BART-Long-Graph, which achieved remarkable results on the benchmark multi-document summarization datasets Multi-News [5] and DUC-2004. This summarizer is based on the pre-trained BART Seq2Seq Transformer model [13] integrated with a Longformer, which combines local and global attention mechanisms for encoding long texts. Additionally, it leverages a knowledge graph by linearizing the graphical information and encoding it with a separate graph encoder. To construct the semantic knowledge graph, Pasunuru et al. [18] used AllenNLP at the document level and OpenIE at the sentence level to capture multi-level semantic information within the documents, yielding more informative and factually consistent features.
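To make the triple-based graph construction described above concrete, the following minimal Python sketch assembles a knowledge graph from pre-extracted (subject, relation, object) triples; the sample triples, the function name, and the use of the networkx library are illustrative assumptions rather than the cited authors' implementations.

# A minimal sketch of assembling a knowledge graph from pre-extracted
# (subject, relation, object) triples; the sample triples are invented
# placeholders, not the output of OpenIE or any other specific tool.
import networkx as nx

def build_knowledge_graph(triples):
    """Directed multigraph whose nodes are entities and whose edges carry relation labels."""
    graph = nx.MultiDiGraph()
    for subj, rel, obj in triples:
        graph.add_edge(subj, obj, relation=rel)
    return graph

if __name__ == "__main__":
    example_triples = [
        ("FASUM", "is built upon", "Transformer"),
        ("FASUM", "targets", "single-document summarization"),
    ]
    kg = build_knowledge_graph(example_triples)
    print(kg.number_of_nodes(), "entities,", kg.number_of_edges(), "relations")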

PROPOSED APPROACH
To fill the gap of validating the effect of each knowledge graph, or of merged knowledge graphs, in document summarization, we plan to incorporate each kind of knowledge graph into a document summarizer and analyze its effect on performance. The knowledge graphs under investigation are mainly constructed with information extraction tools from source corpora or adopted from open-source resources. The edge relationships in those knowledge graphs can be syntactic, semantic, or discourse relationships. Given the effectiveness of pre-training shown in summarization studies, pre-trained Transformer-based language models are chosen as the base architectures for experimenting with the incorporation of single, varied, or mixed knowledge graphs for document summarization in this research.
The pre-trained Transformer-based models include, but are not limited to, Transformer [23], BERT [4], BART [13], and T5 [20]. Given the powerful capability of large language models in knowledge-aware document summarization, we also consider exploring the usage of knowledge graphs with them via prompting.
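As a rough illustration of this prompting direction, the hypothetical sketch below builds a prompt from a document and linearized triples; the prompt template and the triple format are our own assumptions, not a fixed design from the literature.

# A hypothetical sketch of injecting linearized knowledge-graph triples into a
# prompt for a large language model; the template and triple format are
# illustrative assumptions only.
def build_kg_prompt(document, triples):
    """Compose a summarization prompt from a document and its (subject, relation, object) triples."""
    linearized = "; ".join(f"{s} | {r} | {o}" for s, r, o in triples)
    return (
        "You are given a document and facts extracted from it as triples.\n"
        f"Facts: {linearized}\n"
        f"Document: {document}\n"
        "Write a concise, factually consistent summary grounded in the facts above."
    )

if __name__ == "__main__":
    print(build_kg_prompt(
        "Hydrogen evolution reaction catalysts were compared across supports.",
        [("catalyst A", "outperforms", "catalyst B")],
    ))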
In addition, instead of investigating those kinds of knowledge graphs in isolation, we also focus on studying their combination with other graphs, such as reference graphs or bipartite graphs. Moreover, we emphasize combinations of different kinds of knowledge graphs, i.e., the fusion of knowledge graphs (e.g., fusing semantic knowledge graphs constructed from source texts with external open-source semantic knowledge graphs). We plan to evaluate this in two directions: (1) concatenating the knowledge graphs into one whole graph, from the graph perspective; and (2) appending their linearized sequences into one sequence, from the sequence perspective, and feeding it into the Transformer-based summarization model. Following this, we plan to test knowledge graph sequences or embeddings in the encoder or the decoder of the Transformer-based summarization model to determine the most desirable positions at which to inject knowledge graphs into the summarizer.
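The sketch below illustrates the two fusion directions under the assumption that each knowledge graph is stored as a networkx multigraph with a relation label on each edge; the linearization format and separator tokens are illustrative choices rather than settled design decisions.

# A sketch of the two fusion directions; knowledge graphs are assumed to be
# networkx multigraphs with a 'relation' attribute on each edge, and the
# separator tokens are illustrative choices.
import networkx as nx

def fuse_graphs(graphs):
    """Direction 1: concatenate several knowledge graphs into one whole graph."""
    fused = nx.MultiDiGraph()
    for g in graphs:
        fused = nx.compose(fused, g)  # union of nodes and edges
    return fused

def linearize(graph):
    """Flatten a graph into a triple sequence for a sequence encoder."""
    return " [SEP] ".join(
        f"{u} | {data.get('relation', 'related_to')} | {v}"
        for u, v, data in graph.edges(data=True)
    )

def fuse_sequences(source_text, graphs):
    """Direction 2: append linearized graph sequences to the source text;
    the result can then be tokenized for a Transformer-based summarizer."""
    return source_text + " [KG] " + " [KG] ".join(linearize(g) for g in graphs)

Either the fused graph or the fused sequence could then be passed to a separate graph encoder or to the text encoder of the summarizer, respectively.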

METHODOLOGY
As a starting point to verify the effectiveness of knowledge graphs in document summarization, we first focus on graph utilization in scientific paper summarization. Scientific paper summarization has received growing attention along with the development of the summarization field and the increasing demand for scientific research. Recent works tend to adopt research community information, such as paper citations [3,22], to assist in forming summaries. To address the summarization of newly published scientific papers, An et al. [1] propose to use the reference papers, rather than the citing papers, of an input paper to construct its citation graph and thereby enhance paper summarization. Following that, Luo et al. [15] employ a pre-trained BERT encoder and a graph contrastive learning method with citation graphs to improve summarization performance. However, citation information may not be available for all papers: papers without citations or references may not benefit from citation-aware summarization models [3].
Inspired by the work using citation graphs for scientific paper summarization [15], we design and construct knowledge graphs to assist graph contrastive learning for multi-document summarization of chemical research papers. As shown in Figure 1, the knowledge graph we use is built upon the input paper and its knowledge-related papers. The knowledge relation is determined by the overlap similarity between the knowledge labels of the input paper and those of other papers. To create the knowledge graphs, we collect 2,254 scientific papers in the field of chemical hydrogen evolution reactions from the open-source database Web of Science, and we gather expert-labeled tags of the core information mentioned in the paper abstracts, such as reaction-related attributes.
For the following work in this research, datasets regarded as baselines in the document summarization field are considered, such as CNN/Daily Mail [7], XSum [17], and Multi-News [5]. Self-built datasets based on scientific papers from open-source Web corpora, such as the Web of Science, will also be expanded and utilized during this research. As for evaluation metrics, ROUGE [14], BERTScore [25], and human evaluation are considered, as they are popular evaluation metrics in the document summarization area. In addition, some state-of-the-art evaluation metrics, such as FactCC [12] and QuestEval [21], will also be considered, because several works in the summarization field have noted that ROUGE and BERTScore correlate only weakly with human judgments [2].
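As an illustration of the planned automatic evaluation, the minimal sketch below computes ROUGE and BERTScore for a single candidate summary, assuming the rouge-score and bert-score Python packages; the reference and candidate strings are placeholders.

# A minimal evaluation sketch assuming the rouge-score and bert-score packages;
# the reference and candidate strings are placeholders.
from rouge_score import rouge_scorer
from bert_score import score as bert_score

reference = "Knowledge graphs improve the factual consistency of generated summaries."
candidate = "Using knowledge graphs makes generated summaries more factually consistent."

# ROUGE-1/2/L F1 between a single reference and candidate summary.
scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"], use_stemmer=True)
rouge = scorer.score(reference, candidate)
print({name: round(s.fmeasure, 3) for name, s in rouge.items()})

# BERTScore over (possibly batched) candidate/reference lists.
P, R, F1 = bert_score([candidate], [reference], lang="en")
print("BERTScore F1:", float(F1.mean()))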

RESULTS
We present the ROUGE F1 scores of our proposed methods on our chemical dataset in Table 1. We have experimented with three methods so far:
• We define the overlap similarity of knowledge labels between two documents as the graph edge weight, and use these weighted edges to identify highly correlated text as positive content for graph contrastive learning (a minimal sketch of this similarity and the resulting graph follows this list).
• We update weights on a bipartite graph connecting sentences and tokens based on the knowledge labels. Once a sentence is detected to contain matched knowledge labels, we add additional weight to that sentence to emphasize its significance during training.
• We construct 1-hop knowledge graphs from the knowledge labels corresponding to each document. The graph nodes are documents and the graph edges carry the overlap similarities between connected documents. We use this graph, together with the citation graph, to collect highly relevant documents as paper clusters for relevant sentence selection.
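The sketch below illustrates the first and third methods; it assumes that "overlap similarity" means the overlap coefficient of the two label sets, and the label data, threshold, and cluster size are illustrative placeholders rather than our actual experimental settings.

# A sketch of the knowledge-label graph, assuming "overlap similarity" means
# the overlap coefficient of two label sets; labels, threshold, and cluster
# size are illustrative placeholders.
import networkx as nx

def overlap_similarity(labels_a, labels_b):
    """|A ∩ B| / min(|A|, |B|); returns 0.0 if either label set is empty."""
    a, b = set(labels_a), set(labels_b)
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))

def build_label_graph(paper_labels, threshold=0.3):
    """1-hop knowledge graph: nodes are papers, weighted edges connect papers
    whose knowledge-label overlap reaches the threshold."""
    graph = nx.Graph()
    graph.add_nodes_from(paper_labels)
    papers = list(paper_labels)
    for i, p in enumerate(papers):
        for q in papers[i + 1:]:
            weight = overlap_similarity(paper_labels[p], paper_labels[q])
            if weight >= threshold:
                graph.add_edge(p, q, weight=weight)
    return graph

def knowledge_cluster(graph, paper, top_k=5):
    """Select the most related papers of `paper` as a cluster for sentence selection."""
    neighbours = sorted(graph[paper].items(), key=lambda kv: kv[1]["weight"], reverse=True)
    return [p for p, _ in neighbours[:top_k]]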

CONCLUSION AND FUTURE WORKS
We propose to investigate the effects of knowledge and knowledge graphs on document summarization. At this stage, we construct and leverage knowledge graphs together with citation graphs in graph contrastive learning, within an encoder-decoder architecture with pre-trained encoders, for scientific paper summarization. Although current experimental results show that knowledge graphs can help improve the quality of generated summaries, those results still compare unfavorably with state-of-the-art multi-document summarization models, which indicates room for improvement. Also, in our proposed method so far, knowledge graphs are mainly used for selecting input texts for contrastive learning; the usage and design of knowledge graph embeddings can be a further research direction. Moreover, different knowledge graphs, such as syntactic, semantic, and discourse knowledge graphs, are also worth experimenting with. Furthermore, applying the idea of leveraging knowledge graphs to multi-modal tasks to present varied facets of information simultaneously, integrating them into novel pre-trained Transformer-based models, or prompting large language models for further summarization performance enhancement can be future work of this research.

Figure 1: The model overview. Nodes labelled R denote reference papers; nodes labelled K denote knowledge-related papers.

Table 1: Results of the current work using our dataset.