Exploring Pair-Aware Triangular Attention for Biomedical Relation Extraction

Biomedical relation extraction (BioRE) has recently become a research hotspot due to its crucial role in facilitating clinical diagnosis, treatment, and medical discovery. The advent of domain-specific language models customized for the biomedical domain, such as BioBERT and PubMedBERT, has revolutionized this task by fully learning contextualized entity representations and achieving remarkable performance. However, we argue that relying solely on entity-level modeling while neglecting pair-aware representations can lead to sub-optimal results, particularly in the complicated context of the biomedical literature. To address this issue, we propose a novel Triangular Attention framework for Biomedical Relation Extraction (TriA-BioRE) to comprehensively capture pair-aware representations in the biomedical domain. Specifically, we present a triangular attention module comprising two triangular multiplications that use outgoing and incoming edges and two triangular self-attention operations centered on the starting and ending nodes, respectively, which together enhance pair-level modeling omnidirectionally for better BioRE performance. Extensive experiments on three biomedical datasets demonstrate that TriA-BioRE achieves substantially better results than its strong competitors on the BioRE task. For reproducibility, our code and data are available at https://github.com/JasonCLEI/TriA-BioRE.


INTRODUCTION
Biomedical relational facts are explicitly or implicitly hidden in a vast amount of biomedical literature, and they are of great significance in assisting clinical diagnosis, treatment, medical discovery, etc. However, extracting this valuable knowledge manually through the efforts of experts or researchers is becoming increasingly impractical, especially given the exponential growth of the biomedical literature. To tackle this challenge, biomedical relation extraction (BioRE) has received growing attention in recent years from both academia and industry as a means of automatically extracting these relational facts from unstructured biomedical text. BioRE aims to identify the true relations between different biomedical entities [13], and many representative benchmark datasets have been built to facilitate the task, such as CDR [7] and GDA [12], which are annotated with binary associations between Chemical and Disease concepts and between Gene and Disease concepts, respectively, and BioRED [8], a much more challenging dataset annotated with multiple types of associations between Gene, Chemical, Disease and Variant concepts.
With the rapid advancement of deep learning techniques and the development of large-scale pretrained language models like BERT [1], research on various natural language processing (NLP) tasks in the general domain has achieved remarkable success and moved to a new level. Simultaneously, the advent of domain-specific pretrained language models customized for the biomedical domain, such as BioBERT [6] and PubMedBERT [2], has revolutionized the BioRE task and brought impressive performance gains by fully learning contextualized entity representations and effectively mining relational knowledge. Furthermore, other efforts, such as redesigning model structures or loss functions [10,14], have further improved BioRE performance. For example, ATLOP [14] attained better contextualized entity representations and training objectives by proposing a localized context pooling strategy and an adaptive thresholding loss. In [10], an enhanced adaptive focal loss was proposed to replace the adaptive thresholding loss and address the class imbalance between positive and negative samples.
Despite the remarkable progress achieved by previous studies in BioRE, learning high-quality entity pair representations remains a critical challenge for this task. Existing methods like BioBERT and PubMedBERT focus on entity-level modeling and have achieved improved contextualized entity representations. However, we posit that neglecting high-level pair-aware representation learning is insufficient and leads to sub-optimal performance, particularly in the complicated context of the biomedical literature. Moreover, the importance of in-depth interaction between pair-level representations has been verified in other tasks, such as protein representation learning and structure prediction [5,9].
To tackle the aforementioned issue, we propose a novel Triangular Attention framework for Biomedical Relation Extraction (TriA-BioRE) to comprehensively explore pair-aware representations in the biomedical domain. Specifically, we present a triangular attention module that enhances pair-level modeling omnidirectionally for better BioRE performance. Concretely, after the contextualized entity representations are obtained from a powerful encoder such as PubMedBERT, the initial pair representations are further modeled by the triangular attention module, which comprises two triangular multiplications using outgoing and incoming edges and two triangular self-attention operations centered on the starting and ending nodes, respectively. These triangular operations work in tandem to capture the interdependency between different pairs and update the pair representations used to accomplish the BioRE task.

OUR METHODOLOGY
Problem Definition. The goal of BioRE is to identify the correct relations between different biomedical entities. Formally, given a biomedical document $D$ that contains a set of biomedical entities $\{e_i\}_{i=1}^{n}$, BioRE aims to predict the true relations from $\mathcal{R} \cup \{\mathrm{NA}\}$ for head and tail entity pairs $(e_h, e_t)_{h,t \in \{1,\dots,n\},\, h \neq t}$, where $\mathcal{R}$ is a pre-defined set of relation types, $\mathrm{NA}$ denotes No Relation, $e_h$ and $e_t$ refer to the head and tail entity respectively, and $n$ is the number of entities. Note that an entity $e_i$ may occur multiple times in $D$ through entity mentions $\{m_j^i\}_{j=1}^{N_{e_i}}$, where $N_{e_i}$ is the number of mentions of $e_i$, and a relation exists between a head and tail entity pair $(e_h, e_t)$ if it is expressed by any pair of their mentions.
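The pair enumeration and the mention-level rule in this definition can be made concrete with a small Python sketch. It assumes entities are indexed $0, \dots, n-1$ and that some predicate decides whether a single mention pair expresses a relation; the function names and the predicate are hypothetical illustrations, not part of TriA-BioRE.

```python
from itertools import permutations, product
from typing import Callable, Iterable


def candidate_pairs(n_entities: int) -> list[tuple[int, int]]:
    """All ordered (head, tail) entity-index pairs with head != tail: n * (n - 1) pairs."""
    return list(permutations(range(n_entities), 2))


def pair_has_relation(
    head_mentions: Iterable[str],
    tail_mentions: Iterable[str],
    expressed: Callable[[str, str], bool],
) -> bool:
    """Entity-level label: true if ANY (head mention, tail mention) pair expresses it.

    `expressed` is a hypothetical mention-level predicate used only for illustration.
    """
    return any(expressed(mh, mt) for mh, mt in product(head_mentions, tail_mentions))


if __name__ == "__main__":
    print(candidate_pairs(3))  # [(0, 1), (0, 2), (1, 0), (1, 2), (2, 0), (2, 1)]
    # Toy example: only the second head mention is linked to the tail mention.
    heads = ["apomorphine#1", "apomorphine#2"]
    tails = ["Parkinson's disease#1"]
    evidence = {("apomorphine#2", "Parkinson's disease#1")}
    print(pair_has_relation(heads, tails, lambda h, t: (h, t) in evidence))  # True
```

Because the pairs are ordered, $(e_h, e_t)$ and $(e_t, e_h)$ are distinct candidates, which matches directional relation types.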

TriA-BioRE
PubMedBERT Encoder. We adopt PubMedBERT, one of the most powerful domain-specific pretrained language models customized for the biomedical domain, as our encoder given its superior performance in recent studies [2,8]. Given a biomedical document $D$ of length $l$, we have $D = [x_t]_{t=1}^{l}$, and, following previous studies [10,14], we insert a special token "*" at the start and end of each entity mention to mark the entities. Then, we use the PubMedBERT encoder to obtain the contextualized representations $\mathbf{H}$ of document $D$:
$$\mathbf{H} = [\mathbf{h}_1, \dots, \mathbf{h}_l] = \mathrm{PubMedBERT}([x_1, \dots, x_l]).$$
We take the representation of the special token "*" at the start of each entity mention as its mention embedding, denoted as $\mathbf{h}_{m_j^i}$. For each entity $e_i$ with entity mentions $\{m_j^i\}_{j=1}^{N_{e_i}}$, its entity representation $\mathbf{h}_{e_i}$ is computed by logsumexp pooling [4], a smooth version of max pooling:
$$\mathbf{h}_{e_i} = \log \sum_{j=1}^{N_{e_i}} \exp\left(\mathbf{h}_{m_j^i}\right).$$
Additionally, we adopt the localized context pooling method from ATLOP [14], which has proven useful in BioRE, to obtain context-enhanced entity representations $\tilde{\mathbf{h}}_{e_i}$. The entity pair representation $\mathbf{z}_{ht}$ for each head and tail entity ($e_h$ and $e_t$) is then obtained by feature combination through group bilinear pooling, following [14], which splits the entity representations into $K$ equal-sized groups (e.g., $\tilde{\mathbf{h}}_{e_h} = [\tilde{\mathbf{h}}_{e_h}^{1}; \dots; \tilde{\mathbf{h}}_{e_h}^{K}]$) and applies a bilinear function within each group:
$$\mathbf{z}_{ht} = \sum_{s=1}^{K} \left(\tilde{\mathbf{h}}_{e_h}^{s}\right)^{\top} \mathbf{W}^{s}\, \tilde{\mathbf{h}}_{e_t}^{s} + \mathbf{b},$$
where $\mathbf{W}^{s}$ for $s = 1, \dots, K$ are learnable parameters and $\mathbf{b}$ is a bias term. In existing methods, this entity pair representation is used directly for relation classification, ignoring high-level pair-aware representation learning, which is insufficient and leads to sub-optimal performance, particularly in the complicated context of the biomedical literature. To address this issue, we propose a novel triangular attention module that enhances pair-level modeling omnidirectionally for better logical reasoning in BioRE.
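Logsumexp pooling and group bilinear pooling can be sketched in NumPy as follows. This is a minimal illustration, assuming the mention embeddings have already been read off the "*" markers; the parameter layout (one bilinear tensor of shape `(d // K, d // K, d_out)` per group) and the function names are our own assumptions, and the localized context pooling step is omitted.

```python
import numpy as np


def logsumexp_pool(mention_embs: np.ndarray) -> np.ndarray:
    """Entity embedding h_e = log sum_j exp(h_{m_j}), computed per dimension.

    mention_embs: (num_mentions, d). Subtracting the max keeps exp() stable.
    """
    m = mention_embs.max(axis=0)
    return m + np.log(np.exp(mention_embs - m).sum(axis=0))


def group_bilinear(h_head: np.ndarray, h_tail: np.ndarray,
                   W: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Pair representation z_ht = sum_s (h_head^s)^T W^s h_tail^s + b.

    h_head, h_tail: (d,) entity vectors, split into K equal-sized groups.
    W: (K, d // K, d // K, d_out) hypothetical per-group bilinear tensors.
    b: (d_out,) bias term.
    """
    K = W.shape[0]
    hh = h_head.reshape(K, -1)  # (K, d // K): group s is hh[s]
    ht = h_tail.reshape(K, -1)  # (K, d // K)
    # Bilinear form inside each group, summed over groups -> (d_out,)
    return np.einsum("kg,kgfo,kf->o", hh, W, ht) + b


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    mentions = rng.normal(size=(3, 8))           # 3 mentions, d = 8
    h_e = logsumexp_pool(mentions)
    print(np.all(h_e >= mentions.max(axis=0)))   # True: logsumexp upper-bounds max
    W = rng.normal(size=(2, 4, 4, 5))            # K = 2 groups, d_out = 5
    print(group_bilinear(h_e, h_e, W, np.zeros(5)).shape)  # (5,)
```

Splitting into groups keeps the parameter count at $K \cdot (d/K)^2 \cdot d_{out}$ instead of $d^2 \cdot d_{out}$ for a full bilinear map, which is the usual motivation for the grouped form.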
Triangular Attention Module. To explore high-level pair-aware representations effectively, we design a triangular attention module that enhances pair-level modeling omnidirectionally. As illustrated in Figure 2, the module consists of two triangular multiplications using outgoing and incoming edges and two triangular self-attention operations centered on the starting and ending nodes, respectively, which together capture the interdependency between different pairs and update the pair representations.
First, we use a matrix $\mathbf{Z} \in \mathbb{R}^{n \times n \times d}$ to represent all head and tail entity pairs, where $n$ is the number of entities and $d$ is the embedding dimension; the diagonal of the $n \times n$ index is ignored. The pair representation matrix $\mathbf{Z}$ is then regarded as a directed graph, with each entity $e_i$ as a node and each pair representation $\mathbf{z}_{ij}$ as a directed edge. In particular, we construct triangles with edges $\mathbf{z}_{ij}$, $\mathbf{z}_{ik}$ and $\mathbf{z}_{jk}$, involving three different nodes $e_i$, $e_j$ and $e_k$, to update the pair representations. The first stage of the update consists of two triangular multiplications using outgoing and incoming edges, which are symmetric operations. Specifically, a gating mechanism is first designed to dynamically select information:
$$\mathbf{g}_{ij} = \sigma\left(\mathbf{W}_g\, \mathbf{z}_{ij}\right),$$
where $\sigma$ is the sigmoid function and $\mathbf{W}_g$ is a learnable weight matrix. Then, the triangular multiplications using outgoing edges (i.e., $\mathbf{z}_{ik} \odot \mathbf{z}_{jk}$) and incoming edges (i.e., $\mathbf{z}_{ki} \odot \mathbf{z}_{kj}$) are performed sequentially, together with gated filtering, to accomplish the updates:
$$\mathbf{z}_{ij} = \mathbf{g}_{ij} \odot \mathbf{W}_{out} \sum_{k} \mathbf{z}_{ik} \odot \mathbf{z}_{jk}, \qquad \mathbf{z}_{ij} = \mathbf{g}_{ij} \odot \mathbf{W}_{in} \sum_{k} \mathbf{z}_{ki} \odot \mathbf{z}_{kj},$$
where $\mathbf{W}_{out}$ and $\mathbf{W}_{in}$ are learnable parameters.

Table 1: Dataset statistics (after preprocessing). "# ET" and "# RT" denote the number of entity types and relation types, respectively; "# D", "Avg. #E" and "Avg. #R" denote the total number of documents and the average number of entities and relations per document, respectively.
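A minimal NumPy sketch of the two gated triangular multiplications, following the description of the outgoing and incoming updates above. It assumes a single shared gate matrix, the row-vector convention (so $\mathbf{W}\mathbf{z}$ becomes `z @ W`), and zeroes out the self-pairs $\mathbf{z}_{ii}$ to ignore the diagonal; the function and parameter names are hypothetical, and the released implementation may add normalization or separate projections.

```python
import numpy as np


def sigmoid(x: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-x))


def triangular_multiplication(Z: np.ndarray, W_g: np.ndarray, W_p: np.ndarray,
                              outgoing: bool = True) -> np.ndarray:
    """Gated triangular multiplication over pair representations.

    Z:   (n, n, d) pair representations, Z[i, j] = z_ij.
    W_g: (d, d) gate weights; W_p: (d, d) output projection (W_out or W_in).
    outgoing=True : update_ij = g_ij * W_p(sum_k z_ik * z_jk)
    outgoing=False: update_ij = g_ij * W_p(sum_k z_ki * z_kj)
    """
    n = Z.shape[0]
    mask = (1.0 - np.eye(n))[:, :, None]   # 0 on self-pairs z_ii, 1 elsewhere
    Zm = Z * mask                           # terms touching z_ii vanish in the sums
    G = sigmoid(Zm @ W_g)                   # gates g_ij
    if outgoing:
        T = np.einsum("ikd,jkd->ijd", Zm, Zm)   # edges leaving e_i and e_j toward e_k
    else:
        T = np.einsum("kid,kjd->ijd", Zm, Zm)   # edges entering e_i and e_j from e_k
    return G * (T @ W_p) * mask             # keep the ignored diagonal at zero


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d = 5, 4
    Z = rng.normal(size=(n, n, d))
    W_g, W_out, W_in = (rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(3))
    # Residual connection after each update, as described in the text.
    Z = Z + triangular_multiplication(Z, W_g, W_out, outgoing=True)
    Z = Z + triangular_multiplication(Z, W_g, W_in, outgoing=False)
    print(Z.shape)  # (5, 5, 4)
```

Writing both sums as a single `einsum` makes the symmetry explicit: the two variants differ only in whether the shared third node $e_k$ sits at the end or the start of the two edges being combined.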
After the two triangular multiplications, two further triangular self-attention operations centered on the starting and ending nodes are adopted to capture the interdependency between the pair representations. Specifically, the query $\mathbf{q}_{ij}$, key $\mathbf{k}_{ij}$ and value $\mathbf{v}_{ij}$ are all derived by linear projections of the corresponding pair representation $\mathbf{z}_{ij}$. The self-attention weight $a_{ijk}$ is computed over all edges $\mathbf{z}_{ik}$ sharing the same starting node $e_i$, and it is modulated by a third-edge bias $b_{jk}$ derived by a linear projection of the third edge $\mathbf{z}_{jk}$:
$$a_{ijk} = \mathrm{softmax}_{k}\left(\frac{1}{\sqrt{c}}\, \mathbf{q}_{ij}^{\top} \mathbf{k}_{ik} + b_{jk}\right),$$
where $c$ is the channel dimension and $\sqrt{c}$ is the scaling factor that prevents large inner-product values [11].
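Then, the updated pair representation $\mathbf{z}_{ij}$ is obtained by weighting the values $\mathbf{v}_{ik}$ with the self-attention weights $a_{ijk}$, again followed by gated filtering with a sigmoid gate $\mathbf{g}_{ij}$ computed from $\mathbf{z}_{ij}$:
$$\mathbf{z}_{ij} = \mathbf{g}_{ij} \odot \sum_{k} a_{ijk}\, \mathbf{v}_{ik}.$$
Similarly, the triangular self-attention operation centered on the ending node is the symmetric counterpart of the above, and its self-attention weight $a'_{ijk}$ and updated pair representation $\mathbf{z}_{ij}$ are formulated as:
$$a'_{ijk} = \mathrm{softmax}_{k}\left(\frac{1}{\sqrt{c}}\, \mathbf{q}_{ij}^{\top} \mathbf{k}_{kj} + b_{ki}\right), \qquad \mathbf{z}_{ij} = \mathbf{g}_{ij} \odot \sum_{k} a'_{ijk}\, \mathbf{v}_{kj}.$$
Note that we add a residual connection after every update for robust performance [3]. Through such omnidirectional updates, we obtain high-quality entity pair representations for better BioRE performance.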
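The two triangular self-attention operations can be sketched in NumPy as single-head attention over the third node $k$. This sketch assumes that the channel dimension equals $d$, that the third-edge bias is a scalar obtained from a projection vector `w_b`, that the gate is a sigmoid of a linear projection, and that keys pointing at a self-pair are masked out; these choices and all names are our assumptions rather than the released implementation.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def sigmoid(x: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-x))


def triangular_self_attention(Z, W_q, W_k, W_v, w_b, W_g, starting=True):
    """Single-head triangular self-attention over pair representations.

    Z: (n, n, d); W_q, W_k, W_v, W_g: (d, d); w_b: (d,) maps an edge to a scalar bias.
    starting=True : a_ijk = softmax_k(q_ij . k_ik / sqrt(d) + b_jk); out_ij = g_ij * sum_k a_ijk v_ik
    starting=False: a_ijk = softmax_k(q_ij . k_kj / sqrt(d) + b_ki); out_ij = g_ij * sum_k a_ijk v_kj
    """
    n, _, d = Z.shape
    eye = np.eye(n, dtype=bool)
    Zm = np.where(eye[:, :, None], 0.0, Z)     # ignore self-pairs z_ii
    Q, K, V = Zm @ W_q, Zm @ W_k, Zm @ W_v     # (n, n, d) each
    B = Zm @ w_b                               # (n, n) third-edge biases, B[j, k] = b_jk
    G = sigmoid(Zm @ W_g)                      # gates g_ij
    if starting:
        logits = np.einsum("ijc,ikc->ijk", Q, K) / np.sqrt(d) + B[None, :, :]
        invalid = eye[:, None, :]              # key z_ik with k == i is a self-pair
    else:
        logits = np.einsum("ijc,kjc->ijk", Q, K) / np.sqrt(d) + B.T[:, None, :]
        invalid = eye[None, :, :]              # key z_kj with k == j is a self-pair
    A = softmax(np.where(invalid, -1e9, logits), axis=-1)  # normalize over k
    if starting:
        out = np.einsum("ijk,ikc->ijc", A, V)
    else:
        out = np.einsum("ijk,kjc->ijc", A, V)
    return np.where(eye[:, :, None], 0.0, G * out)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d = 5, 4
    Z = rng.normal(size=(n, n, d))
    # Weights are shared between the two directions here only for brevity.
    W_q, W_k, W_v, W_g = (rng.normal(size=(d, d)) for _ in range(4))
    w_b = rng.normal(size=d)
    Z = Z + triangular_self_attention(Z, W_q, W_k, W_v, w_b, W_g, starting=True)
    Z = Z + triangular_self_attention(Z, W_q, W_k, W_v, w_b, W_g, starting=False)
    print(Z.shape)  # (5, 5, 4)
```

The third edge enters only through the additive bias on the logits, so every pair $\mathbf{z}_{ij}$ attends over its $n-1$ neighbors while still "seeing" the edge that closes each triangle.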
Relation Classifier. Finally, a relation classifier implemented as a feedforward neural network (FFN) predicts the relation labels $\mathbf{y}_{ht}$ of all entity pairs $(e_h, e_t)$ from the updated pair representations:
$$\mathbf{y}_{ht} = \mathbf{W}_c\, \mathbf{z}_{ht} + \mathbf{b}_c,$$
where $\mathbf{W}_c$ is a learnable weight matrix, $\mathbf{b}_c$ is a bias term, and $\mathbf{z}_{ht}$ is the final updated entity pair representation for each head and tail entity ($e_h$ and $e_t$). We optimize the whole TriA-BioRE framework with the adaptive focal loss [10], given its superior performance in tackling the class imbalance problem.
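A sketch of the linear relation classifier together with one common decoding rule. We assume, as in ATLOP [14], that index 0 of the logits is a learnable threshold class and that a relation is predicted whenever its logit exceeds the threshold logit, with an empty prediction meaning No Relation; the adaptive focal loss itself [10] is not reproduced here, and the function names are hypothetical.

```python
import numpy as np


def relation_logits(z_ht: np.ndarray, W_c: np.ndarray, b_c: np.ndarray) -> np.ndarray:
    """y_ht = W_c z_ht + b_c. z_ht: (d,), W_c: (num_classes, d), b_c: (num_classes,)."""
    return W_c @ z_ht + b_c


def decode_with_threshold(logits: np.ndarray, th_index: int = 0) -> list[int]:
    """Return relation indices whose logit beats the (assumed) threshold-class logit.

    An empty list corresponds to No Relation (NA).
    """
    th = logits[th_index]
    return [r for r, s in enumerate(logits) if r != th_index and s > th]


if __name__ == "__main__":
    print(decode_with_threshold(np.array([0.5, 1.0, 0.2, 0.7])))  # [1, 3]
    print(decode_with_threshold(np.array([2.0, 1.0, -0.3])))       # []
```

A per-pair threshold lets the model output several relations, or none, for the same entity pair without tuning a global cut-off.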

EXPERIMENTS
Experimental Datasets
We conduct extensive experiments on three benchmark biomedical datasets: CDR [7], GDA [12] and BioRED [8]. Specifically, CDR and GDA are annotated with binary associations between Chemical and Disease concepts and between Gene and Disease concepts, respectively, while BioRED is a much more challenging dataset annotated with multiple types of associations between Gene, Chemical, Disease and Variant concepts. Detailed dataset statistics are reported in Table 1.

Quantitative Results
Table 2 reports the overall performance of TriA-BioRE and the baselines on the three biomedical datasets. TriA-BioRE consistently outperforms the compared models on all three datasets in terms of F1 score. For example, on CDR and GDA, TriA-BioRE improves F1 by 0.44 and 0.77, respectively, over the best baseline, ATLOP (AFL). On the much more challenging BioRED dataset, the F1 improvement reaches 1.23, which indicates the superiority of pair-aware triangular attention learning over modeling entity-level representations alone.

Ablation Study
To assess the contribution of the proposed triangular attention module to the performance of TriA-BioRE, we conduct an ablation study that discards either the triangular multiplications using outgoing and incoming edges (referred to as w/o Outgoing & Incoming) or the triangular self-attention operations centered on the starting and ending nodes (referred to as w/o Starting & Ending); the results are shown in the last two rows of Table 2. Note that when the whole triangular attention module is removed, TriA-BioRE degenerates to an architecture similar to ATLOP (AFL), resulting in a noticeable decrease in overall performance. Furthermore, removing either the triangular multiplications or the triangular self-attention operations decreases overall performance significantly on all three datasets, demonstrating the importance of enhancing pair-level modeling omnidirectionally.

CONCLUSION
In this paper, we propose TriA-BioRE, a novel pair-aware triangular attention framework for biomedical relation extraction. Specifically, we present a triangular attention module with two triangular multiplications using outgoing and incoming edges and two triangular self-attention operations centered on the starting and ending nodes, respectively, which together enhance pair-level modeling omnidirectionally. Extensive experiments on three biomedical datasets demonstrate that TriA-BioRE achieves substantially better results than strong competitors for BioRE.
This work is licensed under a Creative Commons Attribution International 4.0 License.

Figure 1: The architecture of TriA-BioRE framework.TriA-BioRE is mainly composed of three fundamental components: a PubMedBERT encoder, a triangular attention module, and a relation classifier.

Figure 2: The workflow of the triangular attention module. In this core module, two triangular multiplications using outgoing and incoming edges and two triangular self-attention operations centered on the starting and ending nodes are adopted to omnidirectionally capture the interdependency between different entity pairs.