Abstract
Searching, reading, and finding information in massive medical text collections is challenging. With a typical biomedical search engine, it is not feasible to navigate every article to find critical information or keyphrases. Moreover, few tools visualize the phrases relevant to a query. There is therefore a need to extract keyphrases from each document for indexing and efficient search. BERT, a transformer-based neural network, has been used for various natural language processing tasks. Its built-in self-attention mechanism can capture the associations between words and phrases in a sentence. This research investigates whether self-attention can be used to extract keyphrases from a document in an unsupervised manner and to identify the relevancy between phrases, in order to construct a query relevancy phrase graph that visualizes the phrases in the search corpus by their relevancy and importance. A comparison with six baseline methods shows that the self-attention-based unsupervised keyphrase extraction works well on a medical literature dataset. This unsupervised keyphrase extraction model can also be applied to other text data. The query relevancy graph model is applied to the COVID-19 literature dataset to demonstrate that the attention-based phrase graph can successfully identify the medical phrases relevant to the query terms.
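To make the idea concrete, the following is a minimal sketch of how a self-attention matrix might be turned into phrase scores and phrase-to-phrase edge weights. It is an illustration under stated assumptions, not the method of the paper: the attention matrix is a made-up toy example (in practice it would be taken from a BERT layer, and how layers and heads are aggregated is left open here), the candidate phrases are given by hand, and the function names `token_importance`, `rank_phrases`, and `phrase_relevancy` are hypothetical.

```python
from itertools import combinations

# Toy tokens and a made-up attention matrix (assumption: row i holds the
# attention that token i pays to every token; each row sums to 1).
tokens = ["covid", "patients", "with", "severe", "fever"]
attention = [
    [0.40, 0.20, 0.05, 0.15, 0.20],
    [0.30, 0.30, 0.05, 0.15, 0.20],
    [0.25, 0.25, 0.10, 0.20, 0.20],
    [0.20, 0.10, 0.05, 0.25, 0.40],
    [0.30, 0.10, 0.05, 0.25, 0.30],
]

# Hand-picked candidate phrases, mapped to the indices of their tokens.
phrases = {"covid patients": [0, 1], "severe fever": [3, 4]}


def token_importance(attn):
    """Score each token by the total attention it receives (column sums)."""
    n = len(attn)
    return [sum(attn[i][j] for i in range(n)) for j in range(n)]


def rank_phrases(attn, candidates):
    """Rank phrases by the mean received attention of their tokens."""
    scores = token_importance(attn)
    ranked = {
        name: sum(scores[j] for j in idx) / len(idx)
        for name, idx in candidates.items()
    }
    return sorted(ranked.items(), key=lambda kv: kv[1], reverse=True)


def phrase_relevancy(attn, candidates):
    """Weight each phrase pair by the mean attention between their tokens,
    averaged over both directions."""
    edges = {}
    for (a, ia), (b, ib) in combinations(candidates.items(), 2):
        pairs = [(i, j) for i in ia for j in ib]
        total = sum(attn[i][j] + attn[j][i] for i, j in pairs)
        edges[(a, b)] = total / (2 * len(pairs))
    return edges


ranked = rank_phrases(attention, phrases)
edges = phrase_relevancy(attention, phrases)
print(ranked)
print(edges)
```

In this sketch, a phrase that receives more attention from the rest of the sentence ranks higher, and the edge weights could serve as a starting point for a phrase graph. Averaging token scores is one simple design choice among several; summing or taking the maximum would favor longer or more peaked phrases, respectively.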
Attention-based Unsupervised Keyphrase Extraction and Phrase Graph for COVID-19 Medical Literature Retrieval