Survey on Recommender Systems for Biomedical Items in Life and Health Sciences

The generation of biomedical data is of such magnitude that its retrieval and analysis have posed several challenges. This article provides a survey of recommender system (RS) approaches in biomedical fields, along with a discussion of existing challenges related to large-scale biomedical information retrieval systems. We collect original studies, identify entities and models, and discuss how knowledge graphs (KGs) can improve results. We find that most of the papers used model-based collaborative filtering algorithms, that most of the available datasets did not follow the standard format < user, item, rating >, and that qualitative evaluations of RSs mainly use classification metrics. Finally, we have assembled and coded a unique dataset of 60 papers, Sur-RS4BioT, available for download at DOI:10.34740/kaggle/ds/2346894


INTRODUCTION
The growth of the Internet has transformed medicine, enabling healthcare to be delivered more efficiently, improving patient outcomes, and increasing access to medical information and services. As technology advances, we can expect to see even more innovative uses of the Internet in medicine, with significant implications for healthcare delivery and patient outcomes. The data collected by smart medical devices and clinical databases can be used to improve their recommendations. The large amounts of data generated by these devices can be analyzed using Artificial Intelligence (AI) and Machine Learning (ML) algorithms to identify patterns and make more accurate predictions about a patient's health status and treatment outcomes. Multi-patient data analysis will enable healthcare organizations to identify trends and patterns that might otherwise go undetected. It may help to identify new therapies, improve the precision of diagnoses, and optimize individual patient care plans. In addition to improving patient outcomes, data from smart medical devices and clinical databases can improve the design of future medical devices and treatments. By identifying patterns and trends in patient data, manufacturers can design more effective devices and treatments tailored to individual patient needs.
In addition, owing to technological advances, the availability of patients' data (treatment history and genetic data), and novel drugs, researchers have focused their attention on the application of recommender systems (RSs) in life and health sciences. As in other applications, RSs could help doctors and researchers make better-informed decisions by understanding users' tastes and past experiences.
For clarity, we begin by defining biomedical items as all entities that belong to the biomedical fields, are characterized by attributes that can be modeled, and are entities of research; a biomedical field is defined as an area that explores the effects of drugs and medical techniques on biological systems. The purpose of this survey is to collect information on the state of RSs for biomedical items over the last decade in order to answer the following research questions (RQs): RQ1: What are the real experiences with RSs for biomedical items? Which kinds of users or items are used? RQ2: Which recommender techniques are being used across different biomedical items? RQ3: What role does the knowledge graph play as side information in RSs? RQ4: What is the best way to evaluate an RS?
In order to address these questions comprehensively, we take a multidisciplinary approach to the existing RS solutions for biomedical items and compare them according to selected criteria. While other survey papers focus on the existing solutions for healthcare providers, our survey examines which RS approach is most frequent, whether a knowledge graph (KG) is explored, the type of evaluation, the source of the datasets, and whether the datasets are available and public. Given the specificity of these (biomedical) items, researchers lack available datasets to assess RSs and to guide their choices [6]. With this survey, we hope to highlight the importance of RSs in this topic. We present existing solutions and give researchers a glimpse into RS research.
The remainder of this article is structured as follows. In Section 2, we give an overview of RSs in general: methods, algorithms, KG-based RSs, evaluation methods, and data. Section 3 describes the methodology used to select the most relevant papers on RSs for biomedical items. In Section 4, we present the results and discuss some of the limitations associated with them. We draw conclusions in Section 5. The conclusions section also presents our perspective on the near future and addresses challenges regarding KG-based recommendation.

BACKGROUND
Recommender Systems: Concepts
RSs are information filtering systems that suggest items to users based on their prior knowledge and mathematical-statistical methods. Based on some information about each user's preferences, the system lists recommendation rankings and proposes items related to each user. Items could

Collaborative filtering (CF)
Limitations: CF suffers from the "cold start" problem (new user/item). Data sparsity can also limit CF. In addition, CF may suffer from "popularity bias," whereby popular items are recommended more often and others are ignored.

Content-based filtering (CB)
Description: Recommendations are based on the characteristics of the items and the user's preferences. The algorithm determines the items' major characteristics and suggests comparable items to users based on their prior choices.
Advantages: Ability to make personalized recommendations based on users' explicit preferences and interests. CB can be effective when there are many items and clear patterns in their characteristics.
Limitations: Overspecialization, limited content analysis, lack of serendipity, and lack of diversity are some of the limitations of CB.

Hybrid approaches
Description: Combine CF and CB methods in an attempt to minimize their challenges and improve the recommendations. These models can be implemented in various ways, including merging the results of two different models or adding characteristics to another model. Burke [15] introduced seven types of hybrid recommendation.
Advantages: Hybrid models can be effective when both user and item data are available and when complex patterns of user behavior and preferences cannot be easily identified.
Limitations: Hybrid models can be complex to implement and require significant data to train. In addition, hybrid models can suffer from CF and CB limitations, such as data sparsity, the "cold start" problem, and the overspecialization problem.
In short, CB recommendation is primarily based on utilizing the side/content information of users and items to predict ratings and make recommendations. In contrast, CF recommendation does not use the content information about users and items; it considers only ratings/preferences information across users and items. According to a commonly accepted taxonomy, both CF and CB recommendations can be grouped into two classes: (a) memory-based and (b) model-based [1,11].
A memory-based method makes recommendations directly over the entire rating matrix (R) or content matrix (C). Model-based approaches, on the other hand, use these matrices to estimate a model of user preferences and then make recommendations from that model. Memory-based CF methods use historical ratings to compute the similarity between users or items. These methods can be classified as (i) user-based, which predicts items a user might like by looking at ratings given to that item by users with similar tastes to the target user (e.g., "users who are similar to you also liked . . ."); and (ii) item-based, which looks for similar items depending on the items users have already liked or positively interacted with (e.g., "users who liked this item also liked . . ."). Memory-based methods typically use similarity metrics to calculate the distance between two users or two items based on their ratings. Model-based methods use ML algorithms to build a model that predicts how users might rate items they have not yet rated. Finally, hybrid filtering uses a combination of CF recommendation with CB recommendation to make use of the benefits of both techniques [3,78]. When creating the hybrid RS, we may use (a) monolithic, (b) ensemble, or (c) mixed designs. The monolithic design does not clearly distinguish between CB and CF modules. For example, a monolithic design can use feature augmentation, in which the features from various sources are aggregated, or a meta-level approach, where one RS uses as input the model created by another RS. The ensemble design consists of combining the results of two different recommendation algorithms. For example, weighted methods combine the scores of different recommender algorithms into a final score by weighting the scores.
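As a minimal illustration of memory-based, user-based CF as described above, the following Python sketch predicts a missing rating as a similarity-weighted average of the ratings given by other users. The toy rating matrix, the choice of cosine similarity over co-rated items, and the function names are our own assumptions for illustration; they are not taken from any of the surveyed papers.

```python
from math import sqrt

# Toy rating matrix R as nested dicts: ratings[user][item] = rating.
# All users, items, and values are made up for illustration.
ratings = {
    "u1": {"i1": 5.0, "i2": 3.0, "i3": 4.0},
    "u2": {"i1": 4.0, "i2": 2.0, "i3": 5.0, "i4": 4.0},
    "u3": {"i1": 1.0, "i2": 5.0, "i4": 2.0},
}


def cosine_similarity(a, b):
    """Cosine similarity between two users, computed over co-rated items."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[i] * b[i] for i in common)
    norm_a = sqrt(sum(a[i] ** 2 for i in common))
    norm_b = sqrt(sum(b[i] ** 2 for i in common))
    return dot / (norm_a * norm_b)


def predict_user_based(ratings, user, item):
    """Similarity-weighted average of the other users' ratings for `item`.

    Returns None when no other user has rated the item (a cold-start case).
    """
    num = den = 0.0
    for other, other_ratings in ratings.items():
        if other == user or item not in other_ratings:
            continue
        sim = cosine_similarity(ratings[user], other_ratings)
        num += sim * other_ratings[item]
        den += abs(sim)
    return num / den if den else None


# Predicted rating of user u1 for the unseen item i4.
prediction = predict_user_based(ratings, "u1", "i4")
```

An item-based variant follows the same pattern with the roles of users and items swapped. The `None` result for an item that nobody has rated is one concrete form of the "cold start" problem mentioned earlier.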
Several survey papers on RSs have been published in the last few years. For example, some medical recommendation engines are described by Stark et al. [86] in their survey, and future research directions are presented. Pincay et al. [75] examined 249 papers published between 2006 and 2018, providing insights about trends and methods regarding the design and development of a health recommender system (HRS). Recently, De Croon et al. [23] discussed the various subdomains of HRSs that are used as well as the different RS algorithms, the different ways in which they are evaluated, and how they present recommendations to the user. As far as we know, our survey is the first to specifically address issues related to HRS, including the recommendation of biomedical items in Life and Health Sciences.

Knowledge Graphs-Based Recommender Systems
Over the past few years, there has been considerable research conducted on KGs, especially in the Semantic Web community, as can be read in the preface of the 13th International Semantic Web Conference Proceedings (2014): "Linked Data is pervasive: from enabling government transparency to helping integrate data in life sciences and enterprises, to publishing data about museums and integrating bibliographic data. Significantly, major companies, such as Google, Yahoo, Microsoft, and Facebook, have created their own 'knowledge graphs' that power semantic searches and enable smarter processing and delivery of data: The use of these knowledge graphs is now the norm rather than the exception." [63].
A KG is a structure that identifies and disambiguates entities in text, enriches search results with semantic summaries, and provides links to related entities in exploratory search, all to improve the search engine's functionality and the user experience [34]. The information is gathered and displayed from multiple sources. Using KGs in various areas has led researchers to develop KG-based recommendation methods.
The KG describes the objective world's concepts, entities, and their relationships in the form of graphs. Item attributes can be mapped into the KG to determine the relationships between them [4]. Further, the KG can be used to store information about users and items, and even user preferences, which makes it possible to relate the members of the KG to one another. Recently, KGs have been proposed for recommendation in addressing two of the classic problems of RSs: (a) the limited content analysis problem, which is caused by the lack of content-based features that describe the items; and (b) the overspecialization problem, which is caused by the triviality of the recommendations, which are frequently too similar to the items the user already likes [34]. In addition, KGs can also help address some limitations of traditional recommender approaches, such as the "cold start" problem, in which there is insufficient data to make accurate recommendations for new users or items. KGs help recommender engines leverage knowledge about items and users and make more informed recommendations without historical data.
In the life and health sciences, the biomedical knowledge graph (BMKG) connects biomedical entities (e.g., genes, proteins, drugs, diseases, and biological pathways) through defined relationships. BMKGs are important tools to solve computational problems associated with biomedical knowledge. There have been numerous applications of BMKGs in multiple tasks, including identifying disease mechanisms [43], extracting disease biomarkers [89], predicting the efficacy of a drug over a placebo [42], and drug discovery [81], all of which could lead to further refinement in precision medicine and clinical decision support. For instance, Cong et al. [19] propose a method for generating a BMKG based on the Semantic MEDLINE Database and Linked Open Data. In addition, Gong et al. [31] propose a novel framework, called safe medicine recommendation (SMR), that aims to provide safe medicines for patients with multiple diseases. It combines the capabilities of electronic medical records and medical KGs to build a high-quality graph and then embed the related relationships between patients and medicines.
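To make the graph structure concrete, the sketch below stores a tiny KG as (head, relation, tail) triples and retrieves items that share a neighboring entity with a given item, which is one simple way a KG can relate items without any interaction history. The triples are a toy example, and the relation names and helper functions are hypothetical; real BMKGs are far larger and use curated vocabularies.

```python
from collections import defaultdict

# Toy triples (head, relation, tail). The relation names are hypothetical
# and the graph is for illustration only, not an extract of a real BMKG.
triples = [
    ("aspirin", "targets", "PTGS2"),
    ("PTGS2", "associated_with", "inflammation"),
    ("ibuprofen", "targets", "PTGS2"),
    ("ibuprofen", "treats", "inflammation"),
]


def build_index(triples):
    """Index outgoing edges by head entity."""
    index = defaultdict(list)
    for head, relation, tail in triples:
        index[head].append((relation, tail))
    return index


def items_sharing_neighbour(triples, item):
    """Entities that point to at least one tail that `item` also points to."""
    index = build_index(triples)
    tails = {tail for _, tail in index.get(item, [])}
    return sorted(
        head
        for head, edges in index.items()
        if head != item and any(tail in tails for _, tail in edges)
    )


related = items_sharing_neighbour(triples, "aspirin")
```

Here `related` contains the other entities linked to the same target as the query item, so a new item with no ratings can still be connected to existing ones through the graph.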
There are several publicly available BMKGs, such as the Unified Medical Language System (UMLS), the Medical Subject Headings (MeSH) ontology, the Human Phenotype Ontology (HPO), and DrugBank. The Kyoto Encyclopedia of Genes and Genomes (KEGG) [48] can also be considered a BMKG database since it represents the relationships between these entities as nodes and edges in a graph. These BMKGs typically combine manual curation with automated techniques such as natural language processing (NLP) and ML to extract information from biomedical literature and databases.
The relationships between different types of entities in a KG can be used to enhance recommendation accuracy and diversify the recommended items. Using KGs also improves the explainability of RSs. The current methods for developing KG-based RSs may be divided into three categories: (1) embedding-based methods, (2) connection-based methods, and (3) unified methods.
Embedding-based approaches employ KG embedding methods to pre-process the KG, which may be either an item graph or a user-item graph, to produce the embeddings of entities and relations, which are then used in the RS. However, this technique ignores the graph's informative connection patterns, and only a few studies can give reasons for the recommended outcomes. According to Wang et al. [93], the KG embedding algorithms can be divided into two classes: (a) the translation distance models such as TransE [13], TransH [95], TransR [57], and TransD [46]; and (b) the semantic matching models such as RESCAL [68], DistMult [98], and HolE [67]. On the one hand, the translation distance models use distance-based scoring functions, measuring the plausibility of a fact as the distance between the two entities. On the other hand, the semantic matching models use similarity-based scoring functions, measuring the plausibility of a fact by matching the latent semantics of entities and relations in their vector space representations.
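The difference between the two classes can be sketched with one scoring function from each. The Python sketch below implements the standard TransE score (negative distance between h + r and t) and the standard DistMult score (a trilinear product). The two-dimensional vectors are hand-picked toy values rather than learned embeddings, and the L2 norm for TransE is one of the usual choices.

```python
from math import sqrt


def transe_score(h, r, t):
    """Translation distance score: negative L2 norm of h + r - t.

    A higher (less negative) score means the fact is more plausible.
    """
    return -sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))


def distmult_score(h, r, t):
    """Semantic matching score: trilinear product sum_i h_i * r_i * t_i."""
    return sum(hi * ri * ti for hi, ri, ti in zip(h, r, t))


# Hand-picked toy embeddings (not learned), for illustration only.
h = [0.2, 0.5]
r = [0.3, -0.1]
t_close = [0.5, 0.4]   # approximately h + r, so TransE rates it as plausible
t_far = [-0.9, 0.9]    # far from h + r, so TransE rates it as implausible

close_score = transe_score(h, r, t_close)
far_score = transe_score(h, r, t_far)
match_score = distmult_score(h, r, t_close)
```

In a KG-based RS, such scores are computed with embeddings trained on the graph, and the resulting entity vectors are then passed to the recommender.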
Connection-based approaches use the graph's connection patterns to guide the recommendation. The user-item KG is used in most studies to explore the relationships between entities in the graph. Connection-based recommendation can be approached in two ways [34]: (a) the meta-structure-based method, which relies on user-user, item-item, or user-item similarities; and (b) the path-embedding-based method. The meta-structure-based method can restrict user and item representations or forecast user preferences based on similar users or items in the interaction history. In the path-embedding approach, the connection pattern between a user and an item is combined into latent vectors, allowing the mutual influence of the target user and the candidate item to be considered. Most models can also identify and mine connection patterns without specifying meta-structures because they count and select the most meaningful pathways. Therefore, expressive link patterns are likely to be captured.
Unified techniques combine the semantic representation of entities and relations with connectivity information to fully use the KG information for improved recommendations. These techniques build on the concept of embedding propagation: with the help of the KG's connective structure, they enhance the entity representation. To fully leverage information from both sides, a new research trend is to combine the embedding-based technique with the path-based method [34].

Qualitative Evaluation Metrics of Recommendation Systems
For an RS to be effective, it must be evaluated according to certain criteria. The evaluation of RS algorithms has been based on information retrieval [84]. Depending on the available resources and the goal of the RS, there are various ways of evaluating its performance. Two methods co-exist: (1) offline and (2) online evaluation. Offline evaluation uses a pre-collected dataset and is used to measure the accuracy of RSs [33]. The datasets allow us to simulate users' behavior, predicting preferences based on historical data, either implicitly or explicitly, and evaluating RS algorithm performance. During the offline evaluation, the dataset is divided into training and test sets. Modeling and testing algorithms offline has several advantages, including speed and simplicity. However, some bias is inevitable since the results are not directly correlated to newer users.
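The division into training and test sets can be sketched as a simple random holdout over < user, item, rating > triples. The 80/20 ratio, the fixed seed, and the synthetic interactions below are assumptions made for illustration; the surveyed papers use a variety of splitting and cross-validation schemes.

```python
import random


def holdout_split(interactions, test_fraction=0.2, seed=42):
    """Shuffle the interactions and hold out a fraction of them for testing."""
    rng = random.Random(seed)  # fixed seed so the split is repeatable
    shuffled = list(interactions)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Synthetic <user, item, rating> triples, for illustration only.
interactions = [
    (f"u{u}", f"i{i}", float((u + i) % 5 + 1))
    for u in range(5)
    for i in range(4)
]

train, test = holdout_split(interactions)
```

A random holdout ignores time, so it can leak future interactions into the training set; a time-based split is a common alternative when timestamps are available.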
Offline evaluation metrics are divided into three groups: (a) the accuracy of predicted ratings, i.e., the difference between the prediction and the real rating, measured, for instance, by the mean absolute error (MAE), mean squared error (MSE), and root mean square error (RMSE); (b) the accuracy of recommended items based on classification metrics, such as hit-ratio (HR), precision (PPV), recall/sensitivity (RE, summarized as recall in the following), receiver operating characteristic (ROC), area under the receiver operating characteristic curve (AUROC), and F-measure (F1); and (c) the accuracy based on ranking metrics, i.e., the correlation between the prediction and the real classification, measured, for instance, by the mean reciprocal rank (MRR), halflife utility (HLU), Pearson correlation coefficient (PCC), Spearman correlation, Matthews correlation coefficient (MCC), and normalized discounted cumulative gain (nDCG).
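One representative metric from each group can be written in a few lines. The following Python sketch gives standard definitions of MAE and RMSE (rating accuracy), precision and recall at k (classification), and reciprocal rank and nDCG at k with binary relevance (ranking). The function names and the toy inputs are our own; the sketch assumes non-empty inputs and a non-empty set of relevant items.

```python
from math import log2, sqrt


def mae(y_true, y_pred):
    """Mean absolute error between real and predicted ratings."""
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean square error between real and predicted ratings."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true))


def precision_recall_at_k(ranked, relevant, k):
    """Precision and recall of the top-k recommended items."""
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / k, hits / len(relevant)


def reciprocal_rank(ranked, relevant):
    """1 / position of the first relevant item; MRR averages this over users."""
    for pos, item in enumerate(ranked, start=1):
        if item in relevant:
            return 1.0 / pos
    return 0.0


def ndcg_at_k(ranked, relevant, k):
    """Normalized discounted cumulative gain with binary relevance."""
    dcg = sum(
        1.0 / log2(pos + 1)
        for pos, item in enumerate(ranked[:k], start=1)
        if item in relevant
    )
    ideal = sum(1.0 / log2(pos + 1) for pos in range(1, min(k, len(relevant)) + 1))
    return dcg / ideal if ideal else 0.0


# Toy inputs: one user's ranked recommendations and the truly relevant items.
ranked = ["d", "a", "c", "b"]
relevant = {"a", "b"}
precision, recall = precision_recall_at_k(ranked, relevant, k=3)
```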
An online evaluation, such as A/B testing (or multivariate testing), differs from an offline evaluation in that it measures the observed satisfaction of the user [33]. The assumptions about what a user will interact with may differ slightly from the actual interactions within a different context (when experimenting with discovering new interests or with a limited number of items). The primary issue is defining the user's satisfaction since the results depend on clicking on an item (the click-through rate [CTR]). Even though offline evaluation is easy to conduct, repeatable, fast, and allows for arbitrary models to be incorporated, it has been suggested that it cannot faithfully mirror the true utility of RSs as seen in online experiments. Conversely, A/B testing on live systems is quite time-consuming, since the time required scales linearly with the number of approaches evaluated, and it carries risk, since users may see harmful recommendations.
Another way to evaluate a method is to collect the users' feedback. Feedback is information that a recommender can collect from its users. One of the most common ways a recommender can collect this type of information is through explicit feedback, i.e., users' input regarding their interest in an item. An example would be for users to enter their ratings on a numeric scale based on how much they liked or disliked the content. A feature like this can be challenging to implement due to the cognitive load of generating accurate ratings. Implicit feedback, on the other hand, can avoid the restrictions associated with rating systems since the information can be gathered by observing the users' behavior. Hybrid feedback combines both types of feedback to generate a recommendation. The ratings could allow RSs to provide better and more accurate recommendations.

Biomedical Database
One of the main advantages of using an RS is its ability to provide accurate and interpretable recommendations. This feature can be easily incorporated into various applications. Next, we provide an overview of the most common datasets used in life and biomedical sciences. The DrugBank [96] database consists of information about drugs, their molecular targets, and their pharmacological effects. The database contains information on thousands of drugs, including prescription, over-the-counter, and experimental drugs. The KEGG [48] is a large database that provides a wide variety of biological data, including genes, proteins, biological processes, and human diseases. The KEGG is widely used in bioinformatics, drug discovery [64,91], and systems biology [47,49]. ChEMBL [30] was initially created not as a "drug-target" database but as a collection of bioactive chemicals. PubChem [52] is a public database of chemical compounds and their biological activities. The database contains over 100 million chemical compounds, including structures, properties, and biological activities. The Genomics of Drug Sensitivity in Cancer (GDSC) database is the largest public repository for information on molecular markers of drug response and drug sensitivity in cancer cells [101]. There are more than 75,000 experiments on drug sensitivity in the GDSC database, which describe the response to 518 anticancer drugs across almost 1,000 cancer cell lines. In addition to identifying new marker-driven cancer dependencies, the Cancer Cell Line Encyclopedia (CCLE) provides an unbiased framework for studying genetic variants, target candidates, small molecules, and biologic therapeutics [5].
In the context of an RS, for instance, PubChem can be a valuable resource for predicting chemical compounds' biological activity and identifying potential drug candidates. By using information from PubChem in conjunction with other data sources, such as clinical databases or electronic health records, RSs can make more accurate predictions about drug effectiveness for specific diseases or conditions. They can also suggest personalized treatment plans for individual patients. Additionally, PubChem can help identify novel chemical entities not yet tested in clinical trials, leading to new drugs and treatments for various diseases. DrugBank is useful for RSs because it provides detailed information on molecular drug targets and pharmacological properties. This information can be used to develop more accurate drug recommendations based on a user's medical history, health status, and other factors. Based on a drug's molecular targets and known pharmacology, a prescription drug recommendation system can identify drugs with a high likelihood. The GDSC database has been used in some RSs to predict drug sensitivity for new cell lines based on genomic features. These models can predict how sensitive cell lines will be to different drugs. The GDSC database can be useful in RSs because it contains a significant amount of drug sensitivity data for various cancer cell lines, together with accompanying genomic data. This allows the development of more accurate models for predicting drug sensitivity. These are brief examples of applications in RSs that we will develop later.

METHODOLOGY
This systematic review was built upon the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines [73] and is limited to papers identified as using RSs for biomedical items. First, we define a set of search terms (ST) that we consider relevant and generic in these studies. Once the ST have been defined, the search query is constructed using the logical operators AND and OR to combine them:
- The [title/abstract] field MUST contain an RS approach:
  - (recommender OR recommendation) AND (system OR engine)
  - collaborative AND (filtering OR approach)
  - content-based AND (filtering OR approach)
- The [title/abstract] field MUST contain biomedical items, for instance:
  - drugs OR medication
  - chemical compounds
  - disease
  - genes/proteins (summarized as gene in the following)
  - health OR patients
  - health information
Although we filter by "Title" and "Abstract," some systems (i.e., journal databases) also include the "Keywords" of the manuscripts. As in other recommendation domains, we first identify the recommended item. Based on the information about users and items, the papers are divided into four fields: (1) Biology, (2) Chemistry, (3) Genetic, and (4) Health, as shown in Figure 1. We then split each field by the type of < item, user > pair. For instance, biology includes cell lines and drugs. For chemistry, the items or users are proteins, chemical compounds, reagents, drug response, and drug targets. Genetic includes diseases, genes, and microRNA. Finally, health is related to health information, professionals, patients, treatments, or advice activities.
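The combination of search terms described above can be sketched as the cross product of the RS-approach terms and the biomedical-item terms, joined with AND. The Python sketch below only illustrates that combination logic: the helper name is hypothetical, the term strings are copied from the list above, and the exact query syntax and field restrictions of each journal database are not reproduced here.

```python
# Search terms (ST) copied from the list above.
RS_TERMS = [
    "(recommender OR recommendation) AND (system OR engine)",
    "collaborative AND (filtering OR approach)",
    "content-based AND (filtering OR approach)",
]
ITEM_TERMS = [
    "drugs OR medication",
    "chemical compounds",
    "disease",
    "genes/proteins",
    "health OR patients",
    "health information",
]


def build_queries(rs_terms, item_terms):
    """Pair every RS-approach term with every biomedical-item term using AND.

    Hypothetical helper: real databases may need quoting of multi-word terms
    and an explicit title/abstract field restriction.
    """
    return [f"({rs}) AND ({item})" for rs in rs_terms for item in item_terms]


queries = build_queries(RS_TERMS, ITEM_TERMS)
```

With three RS-approach terms and six item terms, this yields 18 queries per database.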
Afterward, an electronic journal database search is conducted to provide a comprehensive list of scientific papers on RS:
- ACM Digital Library,
- IEEEXplore,
- PubMed,
- ScienceDirect, and
- Springer Link.
The search was conducted initially in June 2021 and updated in November 2021. Only peer-reviewed journal papers and conference proceedings with complete text, published from early 2015 to 2021, are included in the search. We include only well-documented and well-written studies in English that clearly state their findings and arguments and have a minimum of 10 sources. Technical reports, surveys, and master's and Ph.D. dissertations are excluded. Upon completing the database search, duplicate papers are removed. During the analysis, we focus on obtaining information about the purpose and methodology of each study by noting the most critical aspects in the research papers' "Method" and "Discussion" sections. Legend: ST means search term.
The search structure in the above databases is listed in Table 2, which presents the numbers of papers from the initial phase (n = 1,883 in total). The ST "drugs OR medication," "disease," and "health information" yielded an average of 27, 34, and 34 papers, respectively, out of the 4 databases for potential review when combined with RS, following the CF. The ST "genes/proteins" (summarized as gene in the following) yielded an average of 8 papers out of the 5 databases for potential review when combined with CF following the RS. The highest value is found with the ST "health OR patients," which yielded 85 papers when combined with RS following the CF. According to our analysis of algorithmic approaches, and as shown in Table 2, CB approaches are used in a small number of reviewed papers. At this stage, we did not search for hybrid approaches, leaving this analysis for later.

RESULTS AND DISCUSSION
A total of 1,878 papers are initially included, as shown in Table 2, leaving only 60 research papers from 58 journals after applying the PRISMA guidelines. These guidelines include (i) identification of records, with removal of duplicates; (ii) screening by title and abstract, not complete proceedings; and (iii) eligibility assessment, covering, for instance, reports with no full text available, technical reports, surveys that are well documented, and many others. A summary of the studies included in this review is provided in the appendix, Table 6, as a ready reference summary of the existing research. The first column of Table 6 shows the paper's author(s) and publication year. The second column lists the field/area we defined above, and the following two list each paper's users and items. Column five shows the data sources in case they have been disclosed. The sixth column lists the paradigm(s) (i.e., recommendation system strategy) employed by the algorithm(s) in the paper. Column seven lists the presence/use of knowledge graphs for each paper. The eighth column shows the evaluation metrics used in the paper. The last two columns show whether the datasets are available for replication and are public.
First, we identify a broader range of research studies that provide insights into the current state of the art. Through a comprehensive analysis, we also aim to discover recent experiences with RSs for biomedical items (RQ1), identify which RSs can be used successfully in the domain (RQ2), examine how a KG-based RS is an efficient way to leverage and connect a user's and an item's knowledge (RQ3), and assess how RSs are being evaluated (RQ4). Finally, we have assembled and coded a unique dataset of 60 papers, Sur-RS4BioT, available for download at DOI:10.34740/kaggle/ds/2346894. Details are described below.

(RQ1) Real Experiences with RSs
After collecting all the research papers, we split them by fields as shown in Figure 1 based on users and items. Figure 2 shows the annual distribution of such papers between 2015 and 2021 and by each field. The data shows two peaks, the first in 2018 (16) and the second in 2021 (13), with similar curves across all areas. Health is the field with the most papers, followed by chemistry.
Biology. Personalized therapies, or "precision medicine," as described by Suphavilai et al. [87], provide the most appropriate regimen for each patient, as their responses may differ. For example, several authors [14,24,54,59,87,92,105] use a drug response prediction algorithm for the anti-cancer effects of different drugs based on the cell line similarity and the drug's chemical structure. Zhang et al. [105] estimate the baseline similarity score for the various cell lines and drug pairs based on the available responses. They use the known correlation coefficient to find the most similar neighbors. Suphavilai et al. [87] have proposed a matrix factorization-based RS (CaDRReS), which considers essential genes for drug-response prediction. Liu et al. [59] adopt a neighbor-based CF with global effect removal (NCFGER), removing the global effect and shrinking the similarity score for each cell line and each drug pair. They use the K-similarity score to predict the unknown responses after removing the global effect. In contrast, Wang et al. [92] introduce a new model using dual-layer strengthened collaborative topic regression (DS-CTR) that combines the knowledge of pharmacogenomics data and cell line similarity networks. Emdadi and Eslahchi [24] present a novel method for cancer drug sensitivity named drug sensitivity prediction using logistic matrix factorization (DSPLMF)-based RS. The motivation of DSPLMF is to find the features of cell lines that are sensitive to drugs, since similar cell lines can also improve the prediction of drug response and gene expression profiles. As in Suphavilai et al. [87], the PCC between predicted drug responses and pathway activity scores is used to infer drug-pathway associations. Koras et al. [54] propose a deep neural network RS-based approach (DEERS) to the problem of kinase inhibitor sensitivity. Using autoencoders and neural network-based prediction, DEERS combines dimensionality reduction and hidden representations of the cell line and drug features. An entirely different solution is proposed by Brandão et al. [14], in which experiments show that wavelet-transformed DNA microarray images produce better results, not only in terms of evaluation metrics but also in terms of execution time, by improving the search for cancer cell lines with similar profiles to the new cell line. Last, to obtain the cancer cell line profiles, a drug-response matrix is acquired from the GDSC database [14,54], complemented by drug structure information from the PubChem database in the case of Wang et al. [92], or from the CCLE [5] datasets.
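Several of the methods above treat drug-response prediction as a matrix completion problem over a cell line by drug matrix. The Python sketch below shows a plain matrix factorization trained by stochastic gradient descent on a toy response matrix. It is a generic illustration under our own assumptions (rank, learning rate, regularization, and made-up response values in [0, 1]); it is not an implementation of CaDRReS, DSPLMF, or any other surveyed method.

```python
import random


def factorize(observed, n_rows, n_cols, rank=2, lr=0.05, reg=0.01,
              epochs=500, seed=0):
    """Fit R ~ P Q^T on observed (row, col, value) triples with SGD."""
    rng = random.Random(seed)
    P = [[rng.uniform(-0.1, 0.1) for _ in range(rank)] for _ in range(n_rows)]
    Q = [[rng.uniform(-0.1, 0.1) for _ in range(rank)] for _ in range(n_cols)]
    for _ in range(epochs):
        for i, j, value in observed:
            err = value - sum(p * q for p, q in zip(P[i], Q[j]))
            for f in range(rank):
                p, q = P[i][f], Q[j][f]
                # Gradient step on squared error with L2 regularization.
                P[i][f] += lr * (err * q - reg * p)
                Q[j][f] += lr * (err * p - reg * q)
    return P, Q


def predict(P, Q, i, j):
    """Predicted response of cell line i to drug j."""
    return sum(p * q for p, q in zip(P[i], Q[j]))


def mse(observed, P, Q):
    """Mean squared error over the observed entries."""
    return sum((v - predict(P, Q, i, j)) ** 2 for i, j, v in observed) / len(observed)


# Made-up responses: (cell line index, drug index, normalized sensitivity).
observed = [
    (0, 0, 0.9), (0, 1, 0.8), (0, 2, 0.1),
    (1, 0, 0.85), (1, 2, 0.2),
    (2, 0, 0.1), (2, 1, 0.2), (2, 2, 0.9),
]

P, Q = factorize(observed, n_rows=3, n_cols=3)
estimate = predict(P, Q, 1, 1)  # the unobserved (cell line 1, drug 1) entry
```

The surveyed methods extend this basic idea with side information, for example cell line similarity, gene expression, or drug structure, which a plain factorization of the response matrix does not use.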
Chemistry. Scientists in the pharmaceutical industry have been focusing on developing novel drugs (or therapeutics) by utilizing expertise on existing drugs [39,107]. Drug discovery begins with the identification of drug-target interactions (DTIs), where the targets are, e.g., genes, which can be reliably done by in vitro experiments. One of the biggest risks is the possibility of unexpected or unintended interactions between drugs and off-target proteins [25,102,106]. In silico techniques are becoming more popular as a means of reducing temporal and monetary costs [74]. For in silico prediction of DTIs (also called compound-protein interactions), ML approaches are a solution. Knowledge about drugs, targets (i.e., proteins), and already confirmed DTIs makes up the features of ML methods (for instance, feature-based, matrix factorization, deep learning, and network-based approaches) [26,55,83,106,107]. These features form the basis for training a predictive model, which can determine interactions between new drugs and/or targets. Recent methods that use matrix factorization algorithms outperform other ML methods in terms of efficiency [29,39,74,80]. DTI prediction is best done using a combination of chemical and genomics information using RS approaches [1].
To predict DTIs, the most popular methods rely on drug-drug and target-target similarity measurements through similarity or distance functions. Nearest neighbor algorithms define "nearness" in various ways based on distance functions [25,39,80]. A wide variety of feature-based methods also perform DTI prediction, including support vector machines (SVMs), tree-based methods, and other kernel-based methods [83]. Deep learning methods also show good performance [102]. Barros et al. [6] designed a method called LIBRETTI to create implicit feedback datasets of scientific entities such as clusters of stars and chemical compounds. Later, the authors proposed a framework to recommend chemical compounds based on ontologies, with a new method for calculating similarities between large numbers of entities (over 16,000 chemicals) [8]. The items/entities come from four distinct ontologies: chemical compounds from the Chemical Entities of Biological Interest Ontology (ChEBI) [41]; functions of genes from the Gene Ontology (GO) [20]; phenotype abnormalities from the Human Phenotype Ontology (HPO) [53]; and diseases from the Disease Ontology (DO) [82]. Many drug-related databases have been set up to support the aforementioned methods. These databases provide various forms of drug-related data and are important resources for in silico DTI predictions, for example, DrugBank (6) [96], KEGG (1) [48], ChEMBL (2) [30], and PubChem (2) [52] (see Table 3).
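The similarity-based nearest neighbor idea described above can be illustrated with a minimal sketch. It is not the method of any surveyed paper: the function name, the choice of k, and the toy drug and target identifiers are all assumptions made for illustration. A new drug receives a score for each target as the similarity-weighted vote of its k most similar known drugs.

```python
def predict_interaction_scores(similarity_to_known, known_interactions, k=2):
    """Score candidate targets for a new drug from its k nearest known drugs.

    similarity_to_known: dict mapping a known drug to its similarity with the
        new drug (hypothetical values in [0, 1]).
    known_interactions: dict mapping a known drug to the set of targets it is
        already confirmed to interact with.
    """
    # Keep the k most similar known drugs (the "nearest neighbors").
    neighbors = sorted(similarity_to_known.items(),
                       key=lambda pair: pair[1], reverse=True)[:k]
    total_similarity = sum(sim for _, sim in neighbors)
    scores = {}
    if total_similarity == 0:
        return scores
    # Each neighbor votes for its targets, weighted by normalized similarity.
    for drug, sim in neighbors:
        for target in known_interactions.get(drug, set()):
            scores[target] = scores.get(target, 0.0) + sim / total_similarity
    return scores


# Toy data (made up): three known drugs and their confirmed targets.
similarities = {"drug_a": 0.9, "drug_b": 0.6, "drug_c": 0.1}
interactions = {"drug_a": {"t1", "t2"}, "drug_b": {"t2"}, "drug_c": {"t3"}}
scores = predict_interaction_scores(similarities, interactions, k=2)
```

In this toy case only the two most similar drugs vote, so the target bound by both of them receives the highest score and the target known only for the dissimilar drug is not scored at all.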

Genetic. Protein subcellular localization (SCL) has a role in identifying potential drug targets (i.e., proteins) and in genome annotation because proteins have distinct functions in individual cells. In this field, we have identified several novel methods, which we describe below. Mehrabad et al. [62] use a personal RS, protein multiple location prediction based on RS (PMLPR), to predict a list of locations for each protein, which successfully solves the significant location prediction issue. To overcome the cold start problem, PMLPR uses protein interaction scores. This approach creates a bipartite network of users and items; in this example, the network is built using data from SWISS-PROT and the cellular component in GO. Kim et al. [51] propose the gene selection using expression heterogeneity (GSEH) method, which combines the concept of gene expression heterogeneity with the analysis of biological processes related to diseases. Gene expression heterogeneity refers to the fact that samples from the same class might have different levels of gene expression. This concept can be used to identify disease-associated genes. GSEH is divided into two steps: (1) it creates a new matrix with a CF pattern and then selects the target genes based on their expected scores; and (2) it compares the data obtained in the first step with the original data to compute each gene's prioritization score. It then picks genes based on their scores. Zeng et al. [104] introduce a deep collaborative filtering model that combines Bayesian stacked denoising autoencoders (SDAE) and matrix completion. This technique provides a scalable platform for incorporating numerous gene and disease characteristics. By utilizing deep architectures, the presented quantitative findings outperform existing state-of-the-art baselines. Weighted imputed neighborhood-regularized tri-factorization (WINTF) is a tool for predicting transcription factor (TF)-gene associations that applies one-class CF techniques [56]. The tool allows users to specify different low ranks for items and users separately. With a collection of known associations, it can also be applied to more tissue-specific tasks to predict new TF-gene associations.
To study protein-domain interaction networks (PDIs), a further collaborative filtering model-based method (CFMM) has been proposed recently by Zhu et al. [109]. The authors propose a computational method for inferring potential essential proteins to achieve this goal. This method is based on an improved PageRank algorithm, which integrates the original PDI network's topological features with the proteins' biological characteristics. The RNAcommender tool [21] assists researchers in identifying potential interacting candidates for most RNA-binding proteins (RBPs) with uncharacterized binding preferences. In recent follow-up work to RNAcommender, the ProbeRating method [100] is designed to predict binding profiles for unknown or poorly characterized RBPs based on the binding profiles of their homologous RBPs that are currently known.
Health. Increasingly, health information systems are playing an important role in healthcare services [4,28,44,45,69,71,79]. Physical activities are frequently customized based on individual preferences [28]. In addition to physical activities, Rohani et al. [79] develop a smartphone-based system for "behavioral activation" (MUBS); a personalized patient model is created by storing activity features along with the patients' ratings after an activity has been completed. In another example, Chen et al. [16] motivate users to stop smoking by providing them with tailored messages, called computer-tailored health communication (CTHC), such as "In 5 to 15 years of living smoke-free, your risk of stroke goes down to a nonsmoker's risk. Congratulations on a job well done!". Additionally, other studies focus on personalized trustworthy healthcare information per se [12,71] or personalized access to general health information [18,60]. Many others focus on specific health conditions [2,4,44,45,65,66,76,88,94,103]. For example, Torrent-Fontbona and López [88] build a knowledge-based (KB) RS to assist diabetes patients in numerous cases, Mustaqeem et al. [66] propose an improvised algorithm for recommending medical advice to cardiac patients, and Ormel et al. [71] and Iatraki et al. [44] apply a personal health record system constructed for cancer patients. Personalized cancer care involves relating genomics markers to treatment outcomes based on genomics information; Zhang et al. 
[105] [107], DSPLMF [24], and DEERS [54]. Based on a patient's genomic data, a CB system may recommend customized medicine. In contrast, a CF system might recommend a therapy based on the treatment history of similar patients. A hybrid RS could use a patient's genomic data and medical history to recommend personalized treatment. In drug discovery, recommendation engines can recommend new molecules or targets for drug development. DrugBank and PubChem are two examples of databases. In particular, a CB system may suggest new compounds with biological properties similar to those of known drugs by using compositional descriptors as prior knowledge [83]. Based on preclinical and clinical trial data, RSs can predict drug efficacy and safety. For instance, a CF system could use clinical trial data to predict which patient groups may be more likely to benefit from a new drug. Clinical data can benefit RSs by providing necessary information about a patient's medical history, current health status, and previous treatments (e.g., [28,50]). As illustrated in Table 3, this is the data type with the highest number of papers analyzed (11). For example, a recommendation system incorporating clinical data might use this information to identify the most effective treatments for a particular disease or condition based on a patient's medical history and current symptoms. The system could also consider a patient's age, gender, and other relevant demographic information when making treatment recommendations. However, some challenges are associated with using clinical data in RSs, which we describe later.
Every available tool referenced in this section is also included in Table 4. The first column of Table 4 shows the paper's author(s) and publication year. The second column lists the field/area we defined above, and the next two list the name of the tool and its URL. Out of the 60 articles analyzed, only 14 provide direct access to the source code on GitHub. The lack of comprehensive and reliable documentation undermines the reproducibility of recommender studies and hinders validation and extension. A vital aspect of this challenge is the inadequate documentation of RS tools, which include the algorithms, frameworks, and software used in system development. Researchers struggle to understand these tools' functionality, parameters, and implementation details, making replication and comparison difficult. The lack of clear documentation also makes it challenging to reproduce experiments and assess the impact of tools on system performance, for example, by using them in a shared and fair evaluation with the same objective and dataset. Poor documentation has consequences beyond replication challenges, such as hindering real-world adoption.
After defining the biomedical items, another relevant question is: who are the users in this domain? We split the answer into two categories: health and others. Regarding health, an RS should be designed to be used by an end user, who can be either a patient or a healthy person, as shown in Figure 3. Aside from doctors, other health professionals, such as nurses and pharmacists, could benefit from the system. For the other fields, there is greater dispersion among drugs, genes, cell lines, or diseases.
The availability of datasets is another topic of the domain that is usually neglected. Despite the advantages of having public data, such as DrugBank or PubChem, this resource is rare, for developing health recommendations in particular (more than 30%), as shown in Figure 4. Some of the issues originate from health data being inherently privacy sensitive. One challenge is the need to protect patient privacy and data security [27]. Clinical data contains sensitive information about patients, so it must be handled carefully to avoid data breaches or unauthorized access. Another challenge is the quality and completeness of the data. Health data can be complex and challenging to interpret, and it is stored in multiple formats and systems, making integration and analysis difficult. Additionally, this data may be incomplete or inaccurate, leading to incorrect or ineffective treatment recommendations. In the case of chemistry, the most significant data sources are DrugBank, PubChem, and SIDER, as shown in Table 3. All of them are public, but these are not real recommendation datasets. In contrast, the dataset proposed by Barros et al. [7,8,9] follows the standard format < user, item, rating >, where the items are scientific entities, the users are authors of research papers in which these items are mentioned, and the ratings are the number of articles an author wrote about an entity. All datasets are available.
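To make the standard format concrete, the following sketch derives < user, item, rating > triples from a list of papers, with the rating defined as the number of articles in which an author mentions an entity, as described above. It is an illustration only and not the LIBRETTI implementation; the function name and the author and entity identifiers are hypothetical.

```python
from collections import Counter


def build_implicit_ratings(papers):
    """Build (user, item, rating) triples from paper metadata.

    papers: iterable of (authors, entities) pairs, one pair per paper.
    The rating of an (author, entity) pair is the number of papers by that
    author that mention the entity.
    """
    counts = Counter()
    for authors, entities in papers:
        # Use sets so that a paper counts at most once per (author, entity).
        for author in set(authors):
            for entity in set(entities):
                counts[(author, entity)] += 1
    return sorted((user, item, rating) for (user, item), rating in counts.items())


# Toy corpus (made up): two papers, two authors, two chemical entities.
papers = [
    (["author_1", "author_2"], ["compound_x"]),
    (["author_1"], ["compound_x", "compound_y"]),
]
triples = build_implicit_ratings(papers)
```

The resulting triples can be fed to any recommender that accepts implicit feedback, since the counts express interest rather than an explicit score.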
Recommender systems help reduce information overload by extracting user preferences or interests from relevant datasets. The most commonly used datasets for RS are Netflix [10], Pinterest [58], MovieLens [40], Amazon Product Data, MIND by Microsoft, Yelp Dataset, and many others [11], all available on the Kaggle platform and following the standard format. Scientific databases have emerged as one of the milestones in the modern scientific enterprise. Three databases can be mentioned in the biomedical area, all of them open source and providing bioinformatics and cheminformatics resources: (1) the DrugBank online [97] database, containing information on drugs and drug targets (proteins); (2) the GDSC database [101], with data on the sensitivity of genomically characterized cancer cell lines to selected compounds; and (3) CCLE, with about 1,000 cancer cell lines. However, much research in the health field is based on proprietary and non-public datasets.
Fig. 4. Availability of datasets: a global view above and a distribution by fields below. Available and public data in health are rare (less than 15%); chemistry is the field that provides more available datasets even though they do not follow the standard format.

(RQ2) Recommender System Techniques
The recommender techniques are usually classified into three main categories, briefly described in the previous section. As shown in Figure 5, CF is the most popular approach in the studies of this survey, with 41 in total, followed by CB with 10, hybrid filtering with 7, and others with 2.
Collaborative filtering. CF models produce recommendations through a collaborative process that utilizes multiple users' ratings. Standard matrix factorization is used in the vast majority of the studies [14,24,32,51,87,103]. Han et al. [37] and Zhang et al. [105] use weighted matrix factorization; Ha et al. [35] consider probabilistic matrix factorization and, later, the same authors add miRNA functional similarity scores to avoid the cold start problem of MF (IMIPMF) [36]. Finally, Embadi and Eslahchi [24] apply logistic matrix factorization. Yue et al. [103] propose a modified CF based on users and items. On the other hand, Hao et al. [39] and Liu et al. [59] design neighborhood-based approaches to infer potential drug candidates for targets of interest. Ezzat et al. [25] present an ensemble model-based approach with weighted KNN and graph regularized matrix factorization (GRMF). Galeano and Paccanaro [29] suggest that latent factor models can be useful for detecting unknown adverse drug events early and accurately. Based on structured electronic health record data from a tertiary academic hospital, Chen et al. [17] train an order recommendation system (item-based) analogous to those of Netflix or Amazon. Barros et al. [9], with the LIBRETTI methodology, found the relations between entities and recommended entities of interest for a particular researcher. They selected alternating least squares (ALS) as the RS method.
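As a rough illustration of the matrix factorization family discussed above, the sketch below learns user and item latent vectors by stochastic gradient descent so that their dot product approximates the observed ratings. It is a generic textbook variant, not the weighted, probabilistic, or logistic formulations of the cited papers; the hyperparameters, function names, and toy ratings are assumptions.

```python
import random


def train_mf(ratings, n_users, n_items, n_factors=2, lr=0.02, reg=0.01,
             epochs=500, seed=0):
    """Learn latent factors P (users) and Q (items) with plain SGD.

    ratings: list of (user_index, item_index, rating) triples.
    """
    rng = random.Random(seed)
    P = [[rng.uniform(-0.1, 0.1) for _ in range(n_factors)] for _ in range(n_users)]
    Q = [[rng.uniform(-0.1, 0.1) for _ in range(n_factors)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            pred = sum(pu * qi for pu, qi in zip(P[u], Q[i]))
            err = r - pred
            for f in range(n_factors):
                pu, qi = P[u][f], Q[i][f]
                # Gradient step on squared error with L2 regularization.
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q


def rmse(ratings, P, Q):
    """Root mean squared error of the dot-product predictions."""
    squared = 0.0
    for u, i, r in ratings:
        pred = sum(pu * qi for pu, qi in zip(P[u], Q[i]))
        squared += (r - pred) ** 2
    return (squared / len(ratings)) ** 0.5


# Toy ratings (made up): 3 users x 3 items, 6 observed entries.
toy = [(0, 0, 5), (0, 1, 3), (1, 0, 4), (1, 2, 1), (2, 1, 2), (2, 2, 5)]
P0, Q0 = train_mf(toy, 3, 3, epochs=0)   # untrained factors
P, Q = train_mf(toy, 3, 3)               # trained factors
```

Unobserved user-item pairs are then scored with the same dot product, which is what makes the factorization usable as a recommender.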
Deep learning is a growing field, with applications spanning several use cases. This survey identifies 6 papers that use it. Collaborative deep learning combines CF and probabilistic matrix factorization (PMF) with a denoising autoencoder (DAE) [102], stacked denoising autoencoders [104], or deep feed-forward neural networks such as DeepSurv [50] and DEERS [54], in order to replace the dot product for modeling the user-item interactions in the latent space and to capture the complex user-item interactions in the hidden space, in a manner similar to word embeddings in NLP. Yang et al. [100] develop a two-stage framework: the first stage encodes the protein and nucleic acid sequences into distributed feature vectors, and the second stage recommends binding preferences for new proteins with a multilayer neural net [70]. Oszoy et al. [72] use the Pareto dominance and CF approaches to predict future venue preferences (i.e., check-in locations) of target users.

Content-based filtering. CB is most commonly used when a lot of attribute information is available. As in CF, it works with data that the user provides, either explicitly (rating) or implicitly (clicking on a link). As a result, CB is especially well adapted to making suggestions in text-heavy and unstructured domains. As expected, it appears mainly in the health field, since information about a patient's health is collected in the form of (i) an electronic medical record (EMR) [60,76,88] or (ii) an electronic health record (EHR) [12,28,44,71,79,99] in medical centers, hospitals, and pharmacies authorized to do so. Seko et al. [83] tailor a descriptor-based RS to estimate the relevance of chemical compositions where crystals can be formed, based on an existing inorganic crystal structure database. The model-based algorithms were logistic regression, gradient boosting, and random forest.

Hybrid. Several RSs combine CF and CB methods, which helps to avoid certain limitations of each. Nouh et al. [69] propose a smart RS of hybrid learning (SRHL) for personal well-being services regarding health food service. SRHL includes: resolving the cold start problem for new users by transitioning between CB and CF; detecting user context inside dynamic filtering; and integrating profile learner approaches to reflect user input. Sosnina et al. [85] apply RS approaches in the antiviral drug discovery context, with the CF algorithm implemented in the Surprise package and a sparse-group inductive matrix completion (SGIMC) implementation of CB. The Surprise Python package operates only on the interaction matrix elements, using KNN, clustering algorithms, and matrix factorization. The RS proposed by Chen et al. [16] uses a hybrid ML algorithm, which combines CF and CB ranking to select the messages that are most suitable for individual smokers. Ammar et al. [4] describe a personal health library, the mHealth app, that provides hybrid RSs by incorporating KGs and linked data. For recommending chemical compounds, Barros et al. [8] developed a hybrid semantic recommendation model suitable for implicit feedback datasets and focused on retrieving ranked lists based on the relevance of the items. In this model, the authors incorporate CF for implicit feedback (ALS and Bayesian personalized ranking) and a new CB method based on semantic similarities among chemical compounds in the ChEBI ontology.
Others. In this survey, we found two more studies that used other RS techniques, such as the KB RS. Wang et al. [94] propose a KB RS for helping people with chronic diseases manage their health by recommending educational materials. Through an ontology, it can link patient characteristics to the content of the materials. Another one is proposed by Agapito et al. [2] for adaptive nutrition content delivery to patients with diet-related chronic diseases and healthy subjects (DIETOS).
Summary. Generally, the choice of RS technique depends on the biomedical application and the available data. Different techniques can be combined to improve the accuracy and effectiveness of recommendations. Among the papers reviewed, recommendation based on CF (more precisely, model-based CF) prevailed regardless of the field and the data type (health data or data from public databases, e.g., GDSC and CCLE). Because of its specificity, the CB model was expected to be used more often to recommend therapeutics. For example, a CB system can recommend a treatment based on its mechanism of action or reported side effects. This approach is advantageous when patient data is limited or when there are well-defined criteria for treatment. Another application could be drug discovery, to predict drug efficacy and safety based on a candidate's chemical and biological properties. In addition, KB will be increasingly used to model therapy-patient-disease relationships. KB can help identify patterns and relationships that may not be obvious from patient data alone, leading to more accurate and effective recommendations.

(RQ3) Knowledge Graph-Based Recommendation Algorithms
Knowledge-graph recommendation leverages the connections among the entities of the user, the items, and their interactions to determine the best recommendations. The algorithms use explicit or implicit connections to find items that may be interesting or valuable to the users. The relationships give the KG-based recommender essential extra information, allowing it to use inference between nodes to uncover novel connections. Three approaches were described above: (1) embedding-based, (2) connection-based, and (3) unified methods. As presented in Figure 6, 13 out of 60 studies use KGs to improve the results of the RS. Interestingly, the field of genetics makes use of this methodology to an extent that we do not find in other areas. It is also worth noting that hybrid techniques have a higher propensity to use KGs regardless of field. Embedding-based approaches can be divided into three phases: (i) representing entities and relations, (ii) constructing a scoring mechanism, and (iii) learning entity and relation representations. Wang et al. [93] categorize such embedding techniques into two groups: translational distance models and semantic matching [4,108] models.
For instance, Zheng et al. [108] first designed a pre-training method based on neural CF to get the initial embeddings for patients and drugs. The drug interaction graph is initialized using the medical records and domain knowledge. The proposed drug package recommendation aims to build a personalized scoring function for each patient. Ammar et al. [4] created a Resource Description Framework (RDF) representation of a personal KG that maintains a digital health state of each patient from a historical perspective.
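The scoring-mechanism phase of an embedding-based approach can be sketched with a translational distance model. TransE is used here only as a widely known example of that group; it is not named in the surveyed text, and the embedding vectors and entity names below are made up. A triple (head, relation, tail) is considered plausible when the head embedding translated by the relation embedding lands close to the tail embedding.

```python
import math


def transe_score(head, relation, tail):
    """Negative Euclidean distance between (head + relation) and tail.

    Higher (closer to zero) means the triple is more plausible.
    """
    return -math.sqrt(sum((h + r - t) ** 2
                          for h, r, t in zip(head, relation, tail)))


# Hypothetical 2-dimensional embeddings for illustration.
patient = [1.0, 0.0]
is_prescribed = [0.0, 1.0]
drug_plausible = [1.0, 1.0]      # exactly patient + is_prescribed
drug_implausible = [4.0, 5.0]    # far from patient + is_prescribed

score_good = transe_score(patient, is_prescribed, drug_plausible)
score_bad = transe_score(patient, is_prescribed, drug_implausible)
```

In a full system the embeddings would be learned (phase iii) so that observed triples score higher than corrupted ones; here they are fixed by hand to show only how the score is computed.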
Connection-based approaches use the user-item graph to find path-level similarities between items by pre-defining meta-paths or automatically mining connective patterns. Users can also get an explanation for the outcome using the path-based method. As mentioned above, two methods are described: meta-structure based [25,36,90,104,108] and path-embedding based [21,35,94].
For instance, Ezzat et al. [25] propose a method for tackling drug-target interaction prediction with GRMF to prevent overfitting. The authors derived a p-nearest neighbor graph from each drug and target similarity matrix. Wang et al. [90] integrate heterogeneous datasets from genomics (ZINC, ChEMBL, and DrugBank) into a multi-layered network model. In this model, each node is either a chemical entity (drugs and other chemicals), a biological entity (genes or the proteins they encode), or a phenotype entity (diseases and side effects). Nodes in the same entity class are linked by similarity (e.g., chemical-chemical similarity) or interactions (e.g., protein-protein interactions). Nodes that belong to different entity classes reside in different network layers and are linked by known associations (e.g., drug-target interactions, disease-gene associations). Chemical-chemical, gene-gene, and disease-disease similarity scores are inputs of the proposed tREMAP CF algorithm. Zeng et al. [104] adopt the Katz measure, a graph-based method that measures how similar two nodes are based on how many paths of different lengths exist between them. Ha et al. [36] use the miRNA network as supplementary data to improve prediction accuracy. The miRNA network can be defined as a graph in which each node represents a miRNA and each edge carries a similarity weight. Corrado et al. [21] propose a CF that can also be interpreted as a feed-forward neural network with a Kronecker layer (second-order units). Briefly, the matrix factorization maps users and items to a latent feature space where a significant correlation (dot product) between latent vectors predicts an interacting user-item pair. Wang et al. [94] use the KB RS with a combination of ontology and several NLP techniques to recommend Chinese educational materials to chronic disease patients.
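The Katz measure mentioned above can be sketched as a truncated sum over path lengths: the score between two nodes adds the number of paths of each length between them, damped by a factor beta per step. This is a generic illustration under assumed values of beta and the maximum path length, not the configuration used by Zeng et al. [104].

```python
def katz_scores(adjacency, beta=0.5, max_length=3):
    """Truncated Katz similarity: sum over l = 1..max_length of beta^l * A^l.

    adjacency: square list-of-lists adjacency matrix A of the graph.
    """
    n = len(adjacency)
    scores = [[0.0] * n for _ in range(n)]
    power = [row[:] for row in adjacency]  # holds A^l, starting with l = 1
    weight = beta                          # holds beta^l
    for _ in range(max_length):
        for i in range(n):
            for j in range(n):
                scores[i][j] += weight * power[i][j]
        # Advance to the next path length: A^(l+1) = A^l * A.
        power = [[sum(power[i][k] * adjacency[k][j] for k in range(n))
                  for j in range(n)] for i in range(n)]
        weight *= beta
    return scores


# Toy path graph 0 - 1 - 2 (made up).
path_graph = [[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]]
katz = katz_scores(path_graph, beta=0.5, max_length=2)
```

Nodes 0 and 2 are not directly connected, yet they receive a nonzero score through the single length-2 path via node 1, which is how the measure surfaces candidate associations.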
Summary. The papers discuss embedding-based approaches for representing entities and relations, constructing scoring mechanisms, and learning entity/relation representations. The techniques are categorized into translational distance and semantic matching models. On the one hand, based on pre-training methods and a drug interaction graph, these techniques can be used in personalized drug package recommendations for patients. On the other hand, they use an RDF representation to create personal KGs, historically maintaining the digital health status of each patient. Embedding-based approaches have shown promise in personalized drug RSs based on patient-specific information. A connection-based approach explores the user-item graph to find similarities between items by either mining connection patterns or predefining meta-paths. Two methods are used: meta-structure based and path-embedding based. For instance, drug-target interactions are addressed using meta-structure-based methods built on p-nearest neighbor graphs. Multi-layered network models are used with path-embedding-based methods for integrating heterogeneous genomic data.

(RQ4) Qualitative Evaluation Methods of Recommendation Systems
A recommendation system's purpose is to predict how likely users would be to appreciate unknown items based on what the system already knows about them. As illustrated in Figure 7, the most common evaluation method is offline evaluation, which uses existing datasets to estimate the accuracy measures of an RS.
Predictive accuracy represents the degree of similarity between the recommender's estimated ratings and the actual user ratings. This sort of measure is frequently used to evaluate non-binary ratings. For instance, Katzman et al. [50] evaluate the DeepSurv model with the concordance index (C-index) and the MSE to quantify the difference between the model's predicted log-risk function and the true log-risk values. In turn, the experimental results of the SRHL model [69] are evaluated using three error measures: MSE, MAPE, and MAE.
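For reference, the error measures named above follow their standard definitions, sketched here on made-up ratings. The function name is an assumption, and MAPE is reported as a percentage and is undefined when a true value is zero.

```python
def error_metrics(y_true, y_pred):
    """Return MAE, MSE, RMSE, and MAPE (in percent) for paired values."""
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    rmse = mse ** 0.5
    # MAPE divides by the true value, so true values must be nonzero.
    mape = 100.0 * sum(abs(e) / abs(t) for e, t in zip(errors, y_true)) / n
    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "MAPE": mape}


# Toy example (made up): two true ratings and their predictions.
metrics = error_metrics([2.0, 4.0], [1.0, 5.0])
```

Both predictions are off by one unit, so MAE, MSE, and RMSE coincide here, while MAPE penalizes the error on the smaller true rating more heavily.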
Classification metrics aim to determine a recommendation algorithm's decision-making success. The performance of an RS may also be represented graphically using an ROC curve, and the AUC indicates how well the model can distinguish between classes. As seen in Figure 7, more than 25% of the papers use the above metrics. For instance, Sadeghi et al. [80] propose an RS-based method for drug repurposing to predict novel drug indications by integrating drug- and disease-related data sources. The AUC performance is evaluated and compared with other methods using 5- and 10-fold cross-validation. Other works report (i) ROC curves, used to compare an ensemble extended neighborhood-based recommendation model [26], the CoDe-DTI method [102], the DS-CTR model [92], the IMIPMF method [35], and the efficiency of antiviral activity class prediction with hybrid techniques [85] against other advanced models; and (ii) ROC curves and AUPRC for highly imbalanced datasets, such as for predicting side effects of marketed drugs [29], the improved prediction of miRNA-disease associations (IMDN) framework [36], and the CFMM method [109]. Ezzat et al. [25] evaluate the GRMF model using 10-fold cross-validation in simulated "new drug" and "new target" cases. Hao and Blair [38] study a user-based CF on medical data with a categorical outcome in four publicly available datasets. The same evaluation metrics, recall and specificity, are applied for the DIETOS framework [2], a food RS for healthy people and individuals affected by diet-related chronic diseases. Pustozerov et al. [76] develop RS infrastructures that incorporate personalized blood glucose prediction algorithms for diabetes patients. The model performance is estimated using standardized metrics (RMSE, MAE, and MAPE).
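The AUC discussed above has a simple pairwise reading: it is the probability that a randomly chosen positive example is scored above a randomly chosen negative one, with ties counted as one half. The sketch below computes it directly from that definition on made-up labels and scores; it is quadratic in the number of examples and meant only for illustration.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")
    # Count the pairs where the positive outranks the negative (ties count 0.5).
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


# Toy example (made up): 1 = known interaction, 0 = no known interaction.
toy_labels = [1, 0, 1, 0]
toy_scores = [0.9, 0.8, 0.4, 0.1]
toy_auc = auc(toy_labels, toy_scores)
```

Three of the four positive-negative pairs are ordered correctly in this toy case, so the value lies between chance level (0.5) and a perfect ranking (1.0).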
Recall and precision are the traditional evaluation metrics and the most widely used recommendation quality measures [11], as shown in Figure 7, with approximately 33% for both. Precision measures the fraction of recommended items that are relevant, whereas recall measures the fraction of relevant items that appear among the recommended items. Both help construct an "unbiased" test dataset and then score the resulting test dataset using a model [22]. Using the above metrics, Macedo et al. [60] evaluate a software framework in the biomedical domain that recommends related scientific information to alert health professionals and promote preventive healthcare. However, a qualitative analysis is carried out in addition to the quantitative indicators. Zeng et al. [104] evaluate the performance of the deep CF model with five real-world datasets in biology (see Appendix Table 6) and compare it with other algorithms, such as the graph-based method, the bagging SVM classifier, and many others. Other works additionally report (i) the F1-measure [55,61,62,107,108]; (ii) AUROC [17,18,21]; and (iii) the aforementioned AUPRC [72,106], used to weight the evaluation of the recommendation results. For instance, the performance of RNAcommender [21] is computed in leave-one-protein-out experiments. The model-based CF technique is something all the mentioned papers have in common. For the DSPLMF model [24], evaluated on two datasets (GDSC and CCLE), the metrics are ACC, RE, PPV, SP, F1, MCC, and AUC.
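The definitions of precision and recall given above, together with the F1-measure that combines them, can be written down in a few lines. The cutoff k, the function name, and the item identifiers in this sketch are assumptions for illustration.

```python
def precision_recall_f1_at_k(recommended, relevant, k):
    """Precision, recall, and F1 for the top-k items of a ranked list.

    recommended: ranked list of item identifiers (best first).
    relevant: set of item identifiers that are truly relevant.
    """
    top_k = recommended[:k]
    hits = sum(1 for item in top_k if item in relevant)
    precision = hits / len(top_k) if top_k else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy example (made up): four recommended items, three relevant ones.
precision, recall, f1 = precision_recall_f1_at_k(
    ["a", "b", "c", "d"], {"a", "c", "e"}, k=4)
```

Two of the four recommended items are relevant, and two of the three relevant items are retrieved, so precision and recall differ even though the number of hits is the same.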
Ranking accuracy, also known as rank correlation measurement, measures the ability of a recommender to estimate the correct order of items based on the user's preferences. The PCC is one of the most popular means to evaluate how much two users are related in a CF approach. An example of this is the HyperRecSysPA model [28]. The HCFMH and cpHCFMH models [77] are evaluated with HR@k for comparison with other recommendation methods. The authors of CaDRReS [87] computed the Spearman correlation to assess its performance and robustness in ranking cell lines for each drug, and they reported the average correlation across drugs. Additionally, for model evaluation, the authors employed nDCG across 10 runs of 5-fold cross-validation and HR (the number of sensitive drugs identified).
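Two of the ranking measures named above, nDCG and HR@k, can be sketched as follows. The definitions used are the common textbook ones (log2 discount for nDCG; for HR@k, the fraction of users whose single held-out item appears in the top k) and are assumptions here, since the cited papers may define them differently, for example HR as a count of sensitive drugs identified.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain of relevance values listed in ranked order."""
    return sum(rel / math.log2(rank + 1)
               for rank, rel in enumerate(relevances, start=1))


def ndcg_at_k(ranked_relevances, k):
    """DCG of the top-k ranking divided by the DCG of the ideal ordering."""
    ideal = sorted(ranked_relevances, reverse=True)
    ideal_dcg = dcg(ideal[:k])
    return dcg(ranked_relevances[:k]) / ideal_dcg if ideal_dcg > 0 else 0.0


def hit_rate_at_k(recommendations, held_out, k):
    """Fraction of users whose held-out item appears in their top-k list."""
    hits = sum(1 for user, items in recommendations.items()
               if held_out[user] in items[:k])
    return hits / len(recommendations)


# Toy data (made up).
ndcg_value = ndcg_at_k([1, 0, 1], k=3)
hr_value = hit_rate_at_k({"u1": ["a", "b", "c"], "u2": ["d", "e", "f"]},
                         {"u1": "b", "u2": "z"}, k=2)
```

Unlike precision and recall, nDCG rewards placing relevant items earlier in the list, so the toy ranking above scores below 1 even though it contains every relevant item.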
Besides classification metrics, error metrics were employed to measure the error made by an RS when predicting an item rating [66,70]. For instance, Mustaqeem et al. [66] present a hybrid model that gives cardiac patients illness predictions and treatment advice, using a clinical dataset collected and labeled in consultation with medical experts. The prediction results are evaluated using three metrics: ACC, Kappa statistics, and RMSE. Ochoa et al. [70] developed an RS that analyzes the frequency of medical events in the EHR. The quality of the model was assessed using metrics such as PPV, RE, ACC, and RMSE.
On the other hand, both classification and ranking metrics are used to evaluate the relevance of the recommended items [7,8,12,37,39,56,74]. In Barros et al. [7,8], the recommendation ranking of chemical compounds is evaluated with six metrics: PPV, RE, F1, MRR, nDCG, and lAUC. Lim and Xie [56] identify target genes of transcription factors and compare the performance of two methods, WINTF and REMAP, with four different metrics: AUROC, MAP, HLU, and MPR.
The system for data-driven therapy decision support developed by Gräßer et al. [32] considers three different evaluation metrics. First, the individual RS produces a prediction of how the patient will respond to specific therapies, which is evaluated with RMSE. Second, the top-ranked therapies based on the affinity predictions, which are usually presented to the user after selection in the consultation, are evaluated with precision. Third, for the similarity computation, the Gower coefficient, cosine, and Pearson and Spearman correlations are applied. The Gower coefficient has the advantage of allowing for missing values and permits the introduction of a user-defined weighting scheme. To avoid the assumptions of a probability distribution with zero mean and constant variance, Gräßer et al. [32] applied the Spearman correlation. The online framework Yum-me [99] is evaluated both offline and online. Regarding offline evaluation, classification and error metrics are applied.
Online. RSs emerged to model user preferences in various online applications in order to tackle the information overload problem. In general, RSs have been developed to solve the problem of information proliferation and enhance the user experience on various online applications. Examples in health fields are PepperRec [44], PHIR [44], MUBS [79], HERS [71], CTHC [16], and the personal health library-enabled mHealth app [4], with personal and clinical data.
Summary. Several papers discuss attempts to improve the accuracy of RS results as measured by, for example, RMSE and MAE. Additionally, it is common to try to improve recommendations as measured by PPV, RE, and AUROC, for instance. Recall and PPV (35%) and AUROC (28%) were the most commonly used offline evaluation metrics, as shown in Figure 7. Other popular offline evaluation metrics are accuracy-related measurements, such as F1 (16%), PCC (16%), RMSE (14%), MAE (12%), and SP (12%). The use of the other metrics is inconsistent. Table 5 shows the 10 most common metrics by field. Classification metrics are predominant in all fields and, as we expected, online feedback is exclusive to Health.

CONCLUSIONS
In this survey paper, we examine RSs for biomedical items and summarize the previous efforts on this topic. Several papers published between 2015 and November 2021 from five scientific databases are retrieved for this purpose. After examining and selecting publications, 60 papers are
We hope this survey paper can help readers better understand work in this area. Hopefully, new work can emerge in this area, minimizing the cold-start problem that is currently highly prevalent.

Future Directions and Challenges of Knowledge Graph-Based Recommendation
The future and potential of BMKGs are vast and exciting. Given the rapid growth of biomedical data and the increasing need for personalized medicine and precision healthcare, KGs provide a powerful way to organize, integrate, and analyze biomedical data meaningfully. They are a powerful tool for understanding complex relationships between entities in biomedicine. By representing these relationships as nodes and edges in a graph, KGs support applications such as identifying new drug targets, predicting drug-drug interactions, and developing personalized patient treatment plans. In addition to their research and clinical applications, BMKGs also have the potential to transform healthcare delivery by improving the interoperability of disparate electronic health record systems and enabling more accurate and efficient diagnosis and treatment.

One of the challenges is the need for accurate and comprehensive data. While a wealth of biomedical data is available, much of it is still siloed in different databases and formats, and there are significant challenges in integrating and harmonizing this data. However, with the increasing adoption of standard data formats and the development of new data integration and analysis technologies, the potential for BMKGs is enormous.

Several challenges are associated with using BMKGs, the major ones being (1) data integration, (2) KG quality, (3) scalability, (4) domain expertise, (5) interpretability, and (6) evaluation. BMKGs contain heterogeneous data from multiple sources, and integrating this data into a single KG can be challenging due to differences in data formats, quality, and completeness. Incomplete or inaccurate information can lead to poor recommendations, and a growing KG can lead to slower recommendation times and reduced usability. Building and maintaining a KG requires significant biomedical and data science expertise, which can be a barrier for organizations with limited resources or expertise. As with traditional RSs, the interpretability of the recommendation results is critical, and healthcare professionals need to understand how the recommendations are generated. Evaluating the performance of KG-based RSs can be challenging. Developing appropriate evaluation metrics and benchmark datasets is critical to ensuring the quality and reliability of recommendations.
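As a small illustration of what representing biomedical relationships as nodes and edges can look like in practice, the sketch below stores a KG as (head, relation, tail) triples and derives candidate items that share a neighbor with a given item. The class, the relation names, and the entities (drug_A, protein_X, and so on) are hypothetical and do not correspond to any specific BMKG or surveyed system.

```python
from collections import defaultdict


class TripleStore:
    """Tiny in-memory KG: directed, labeled edges stored as (head, relation, tail)."""

    def __init__(self):
        self._out = defaultdict(set)  # (head, relation) -> set of tails
        self._in = defaultdict(set)   # (relation, tail) -> set of heads

    def add(self, head, relation, tail):
        self._out[(head, relation)].add(tail)
        self._in[(relation, tail)].add(head)

    def tails(self, head, relation):
        """Entities reachable from `head` via `relation`."""
        return set(self._out.get((head, relation), set()))

    def heads(self, relation, tail):
        """Entities that point to `tail` via `relation`."""
        return set(self._in.get((relation, tail), set()))

    def related_by_shared_tail(self, head, relation):
        """Other heads that share at least one tail with `head` under `relation`."""
        related = set()
        for tail in self.tails(head, relation):
            related |= self.heads(relation, tail)
        related.discard(head)
        return sorted(related)


# Hypothetical toy graph: two drugs share a target, a third does not.
kg = TripleStore()
kg.add("drug_A", "targets", "protein_X")
kg.add("drug_B", "targets", "protein_X")
kg.add("drug_C", "targets", "protein_Z")
kg.add("protein_X", "associated_with", "disease_Y")

candidates = kg.related_by_shared_tail("drug_A", "targets")
```

Because each candidate is reached through an explicit path (here, a shared target), the path itself can be shown to the user as an explanation, which relates to the interpretability challenge discussed above. The in-memory dictionaries would not address the scalability challenge for a large KG.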

AUTHOR CONTRIBUTIONS STATEMENT
M.B. started by creating the dataset from a set of related papers. Then, M.P. added updated documents and categorized them by areas, metrics, and methods. M.B. and M.P. designed the survey structure. M.P. wrote the manuscript. All authors participated in the design and validation of the study.

Fig. 1. Biology, Chemistry, Genetic, and Health are the user-defined fields from the items and users found in the surveyed papers.

Fig. 3. Pair <users, items> interactions in the recommender system techniques used in this survey.

Fig. 5. Overview of the different recommender techniques by field (above) and in total (below).

Fig. 6. Knowledge graph in numbers: overview of the distribution of papers by field and recommender system method.

Table 1. Main Approaches of Recommender System Techniques

Table 2. Total Number of Papers Found for Each Search Term in the Five Databases: ACM Digital, IEEEXplore, PubMed, ScienceDirect, and Springer

Table 3. Top@10 of the Datasets by Each User-Defined Field

build a clinical patient drug RS; they suggest having only one drug model for each training sample instead of having multiple models for different drugs. Personalized therapies have emerged to tailor medical treatments based on a patient's unique characteristics, including genetics, medical history, and lifestyle. Some examples,

Survey on Recommender Systems for Biomedical Items in Life and Health Sciences. ACM Comput. Surv., Vol. 56, No. 6, Article 149. Publication date: February 2024.

Table 4. Overview of All Available Software

Table 3 - Continued from previous page

illustrated above, include CaDRReS

Table 5. Top@10 of the Evaluation Metrics by Each User-Defined Field

The results show that in the last decade, the digital information (e.g., laboratory results, treatment plans, and medical reports) available for patient-oriented decision-making has increased dramatically. Because this information is scattered everywhere and in text form, one solution was to centralize it into personal health record systems, which can be managed like a classical information retrieval (IR) problem. An RS provides its users with medical information intended to be highly relevant for healthcare development.

Most health datasets are not publicly available (30%) because they are sensitive and derived from private clinical data. In contrast, 60% of the datasets are available, but most lack key characteristics to enable good reproducibility and extensibility. These values result from some studies in chemistry with open databases such as CCLE and GDSC. The most significant data source is presented, with a prevalence of the GDSC, CCLE, and DrugBank databases.

In general, the RS field has made remarkable progress in recent years, with the development of various RS tools and datasets. However, amid this progress, the poor availability and quality of documentation for RS tools and datasets pose a significant challenge to the reproducibility of RS research. Replicability is essential for corroborating and expanding the impact of research findings, but the lack of comprehensive and reliable documentation hinders this process. Poor documentation of RS tools makes it difficult for researchers to understand their functionality and replicate experiments. Similarly, poor documentation of datasets makes it difficult to assess their quality and reproduce experiments using the same data. Enforcing documentation standards, encouraging detailed information in research papers, and collaborating on best practices can improve the reproducibility and impact of RS research. As mentioned above, of the 60 articles analyzed, only 14 provide access to the source code on GitHub. An international shared evaluation would also help mitigate this problem, as the Message Understanding Conferences (MUC) and Text REtrieval Conference (TREC) challenges did for information retrieval, following good examples such as BioCreative and BioASQ for biomedical text mining.

An outstanding feature in this study is that most datasets did not follow the standard format <user, item, rating> commonly used in RSs. Another relevant point that deserves mention is that model-based CF is the most used approach regardless of the field. This fact is primarily due to the application of ML algorithms (totaling 19 papers). Regarding the performance measurement of the recommendation techniques, the metrics remain offline, and the accuracy of recommended items is based on classification metrics (precision, recall, and AUROC).

The research examines different approaches to utilizing KGs as supplementary data to improve recommendation results and provide interpretable information in the recommendation process based on real-life experiences. New methods are emerging, proving that KG-based recommendation is a viable solution. Despite the numerous studies conducted in recent years, this is still an emerging field of research. More comprehensive studies are needed.

Table 6. Overview of All Surveyed Papers that Apply Recommender Systems and Their Approaches for Biomedical Items