The Role of Explainable AI in the Research Field of AI Ethics

Ethics of Artificial Intelligence (AI) is a growing research field that has emerged in response to the challenges related to AI. Transparency poses a key challenge for implementing AI ethics in practice. One solution to transparency issues is AI systems that can explain their decisions. Explainable AI (XAI) refers to AI systems that are interpretable or understandable to humans. The research fields of AI ethics and XAI lack a common framework and conceptualization, and the depth and versatility of the field remain unclear. A systematic approach to understanding the corpus is needed. A systematic review offers an opportunity to detect research gaps and focus points. This article presents the results of a systematic mapping study (SMS) of the research field of the Ethics of AI. The focus is on understanding the role of XAI and how the topic has been studied empirically. An SMS is a tool for performing a repeatable and continuable literature search. This article contributes to the research field with a Systematic Map that visualizes what, how, when, and why XAI has been studied empirically in the field of AI ethics. The mapping reveals research gaps in the area. Empirical contributions are drawn from the analysis. The contributions are reflected on with regard to theoretical and practical implications. As the scope of the SMS is the broader research area of AI ethics, the collected dataset opens possibilities to continue the mapping process in other directions.


INTRODUCTION
Artificial Intelligence (AI) is one of the most prominent and influential technologies of modern times. Its rapid development and increasing human dependency on it have facilitated the adoption of AI in almost all imaginable sectors of life [13]. Furthermore, AI's proliferation in critical areas, its speed of development, and the race between nations and companies to build robust AI tools have increased the need to set ethical guidelines and principles for AI development and deployment.
AI ethics is a burgeoning research field that has emerged in response to the challenges related to the impact of AI. The challenges posed by AI include data bias, privacy, and fairness issues, in addition to the requirement for AI practitioners to gain better knowledge about the impact of the technology. As such, the subject of AI ethics itself is versatile, ranging from highly technical issues to understanding human behavior in the research, interaction, development, and usage of AI [113]. AI ethics is often broken down into principles, such as transparency, responsibility, trust, privacy, sustainability, autonomy, and dignity. Five principles have emerged as dominant: transparency, justice and fairness, non-maleficence, responsibility, and privacy [89]. Transparency, which is arguably the most prevalent [89], is often viewed as a pro-ethical principle and an enabler for ethical AI [166]. Consequently, transparency plays an important role in AI ethics, where it covers a broad scope that includes XAI [101]. XAI refers to an interpretable system that provides an understandable explanation of the system output [2]. XAI draws attention to the area of AI ethics research focused on how AI systems make decisions, the explanations of the decisions, and how the decisions are communicated to relevant stakeholders [91].
XAI is a growing area of research, especially as AI systems are implemented in critical sectors that warrant transparency for AI actions. One example of this is medical AI, in which the need for an understandable system is tied to the core ethical values of medicine [11]. Here, expectations for explainability are high [83]. However, due to its novelty, the field still lacks clarity and structure. Despite its importance, the role of transparency is not well defined in AI ethics. Moreover, XAI currently suffers from a lack of commonly agreed definitions of core concepts [54,89]. Most of the research and reviews of XAI in view of AI ethics are tailored toward a particular aspect of explainability, such as algorithm explanations [155,191], black box explanations [71], and methods that aim to describe explainability [179]. A recent systematic review [184] helped to explore current approaches and limitations for XAI. However, the review focuses on the area of reinforcement learning with no recourse to its role in AI ethics. Consequently, there is currently limited research that explores XAI and its specific part in AI ethics in depth.
Given the gap in previous studies, this paper examines the research field of XAI and its role in AI ethics scholarship. The paper's research question, "What is the role of XAI in the AI ethics research field?", requires an overview of the corpus of academic literature on AI ethics. The focus of the paper is on concrete, actionable issues rather than philosophical discussion, with the main emphasis on empirical research studies.
The paper adopts an SMS to map the research literature of AI ethics. SMS is a form of Systematic Literature Review (SLR) [94]. SLR and SMS are secondary studies where the attention is placed on analyzing the evidence of previous research. SLR aims to find and evaluate the relevant papers, which are called primary studies, on a specific research area. SMS aims to identify and categorise the existing literature more generally [94]. High-quality SMSs can have a significant benefit for the research area in establishing baselines for future research [94].
To understand the role of XAI in the research field of AI ethics, SMS methodology represents a better approach than SLR. The infancy and lack of coherence of the AI ethics research area support the use of SMS. The size of the research area is unknown, and the role of XAI is new. The conceptual ambiguity of the research area [89] necessitates SMS usage. Several SMSs were studied, and existing guidelines were utilized. The most influential papers for this study are the guidelines of [130] and the SMS of [128]. This paper builds on the SMS of Vakkuri and Abrahamsson [168].
The rest of this paper proceeds as follows: Section 2 serves as a background for XAI and related AI topics, machine learning, and the principles for ethical AI. Section 3 reports the literature search process. The section starts with a theoretical framework of SMS and continues with reporting the use of SMS in this paper. The literature search process results in primary studies (n = 142) that form the scope of this study. Section 4 presents the classification schema and the numeric results of classification. Section 5 presents the systematic mapping, where the results are analyzed and compared, and the annual trends and the publication venues are investigated. Section 6 proposes theoretical and practical implications of primary empirical contributions. Section 7 proposes some future research topics. Finally, Section 8 draws some final conclusions.
issues related to AI ethics. The Massachusetts Institute of Technology's "Moral Machine" research [18] collected 40 million answers to its online experiment, which studied decisions in ethical situations related to autonomous driving. In recent years, the discussion around AI ethics has opened to incorporate a broader scope.
Governments and regulators like the European Union (EU) are increasingly becoming interested in the topic of AI ethics. The European Commission's AI High-Level Expert Group [5] has identified "Trustworthy AI" as the EU's foundational ambition for ethical AI. Companies and private organizations are also establishing ethical frameworks and principles. Large practicing organizations, such as Google, Intel, and Microsoft, have also presented their guidelines concerning ethics in AI [169]. In academia, guidelines and principles aim to structure the research field. One notable example is the IEEE standard for Ethically Aligned Design [120].
Frameworks and guidelines may be a good starting point for the conversation, but they are not sufficient to solve the challenge of AI ethics without other measures in place. The challenge of frameworks is that they tend to lack practices and modeled behavior upon which to implement them. Furthermore, they often require more work to be production-ready [118]. Often, the principles and associated frameworks presented in the literature are not actively used in practice [171]. By the end of 2020, there were over 100 sets of principles, many of which were vaguely formulated [42]. Hence, choosing the right framework from all available ones may be a challenging decision because AI ethics lacks a commonly agreed ethical framework [60]. Also, there is a lack of existing methodology for identifying the relevant frameworks for AI development in the context of implementing explainability [177]. The choice of suitable methods to create AI with the desired outcome extends beyond frameworks and must be made in each case individually, considering the needs of the relevant stakeholders and the desired explanation method properties [177].
One notable connection to AI ethics is the concept of Responsible AI, a paradigm to ensure that fairness, model explainability, and accountability are included in the practical implementation of AI methods. Besides AI principles, Responsible AI practices include technical and non-technical training, guidance and tools to avoid and mitigate issues that may arise, and a governance model to assign responsibilities and accountabilities. While many organizations list their AI principles, there are fewer examples of how to implement those principles in practice. For practical implementation, a list of principles alone is not enough; Responsible AI practices are also required [14].

2.2 XAI
XAI refers to an AI system that can explain its decisions [146]. AI technologies such as machine and deep learning techniques are used for automating and optimizing predictive data patterns to achieve better or faster decision-making. However, the complexity of techniques such as deep learning makes the resulting decisions hard for humans to understand. Thus, explanations can help communicate the justification behind a decision or action. This can engender trust in the decision [80]. Such transparency can also ensure that the complexity of the explanation matches the complexity capacity of the consumer [80].
Understanding human decision-making and how explanations are defined provides good grounds for XAI, which requires multidisciplinary collaboration and the use of existing research from social sciences, such as philosophy, psychology, and cognitive science [114]. Explainability is viewed as important in assigning responsibility in cases of a system failure [141], such as a collision involving a self-driving car. To ensure the right to explanation, legislation such as the General Data Protection Regulation (GDPR) outlines individuals' right to a meaningful explanation of decisions made by automated systems. However, while calls for XAI have increased, there have also been some arguments against it. Some AI researchers have argued that since humans are unable to provide exact explanations for their decisions, AI systems should not be expected to do so either [56,80].
Another aspect of XAI is interpretability. AI models are expected to be interpretable, which means that they can explain the decision in understandable terms to a human [82]. Interpretability deals with understanding the algorithm output to be implemented for end users [62]. Sophisticated knowledge extraction and preference elicitation are required to extract a meaningful explanation from the raw data used in the decision-making process [146]. This often means that a trade-off must be made between accuracy, effectiveness, and interpretability [2]. Interpretability is not merely a technical problem; to gain interpretability of machine learning systems, it is necessary to focus on humans rather than technical aspects and provide personalized explanations to individuals [146].
Interpretability may not be expected from AI systems when users trust the system, even if it is known to be imperfect, or when the consequences of a wrong decision are considered insignificant [82]. Interpretability has divergent requirements depending on the stakeholders involved [82]. Overall, interpretability requires explanations at varying degrees to help illuminate decisions made by AI [141]. Reasons behind the need for XAI vary. Based on Wachter et al. [180], the reasons may be as follows: (1) to inform the subject of the reasoning for a particular decision or explain the reasons for rejection; and (2) to understand how the decision-model needs to be changed to receive the desired decisions in the future. Overall, the application area and purpose may determine the need for interpretability.
Explainable and understandable systems are required for society to trust and accept algorithmic decision-making systems [180]. Better explanations can also improve existing models and open new opportunities, such as the use of machines for teaching humans [146]. XAI is also a potential tool to detect flaws in the system, decrease biases in the data, and gain new insights into the problem at hand [141]. This can help ensure transparency of the system.

2.2.1
Transparency. The meaning of transparency varies depending on the subject. As a result, the concept is vague, making misinterpretations likely. In the discipline of information management, transparency often refers to the form of information visibility, such as access to information [166]. In computer science and IT disciplines, transparency often refers to a condition of information visibility, such as the transparency of a computer application to its users, as well as how much and what information is made accessible to a particular user by the information provider [166]. In this paper the term transparency is used in the sense of the condition of information visibility.
Although transparency is often required, it is not easy to provide. The information provider (e.g., company or public institution) must define who has the right to access the information and the accessibility conditions for it [166]. Legislation such as GDPR may control the access and sharing of a specific type of information between users.
As mentioned above, transparency is listed as one of the primary principles of AI ethics [89]. At the same time, transparency can actually be seen as the pro-ethical circumstance that makes the implementation of AI ethics possible in the first place. Without understanding how the system works, it is impossible to understand why it malfunctioned, and consequently, to establish who is accountable for the malfunction's effects. Instead of seeing transparency as an ethical principle, it would be more accurate to treat it as an ethically enabling or impairing factor, or as described above, a pro-ethical condition. Information transparency enables ethical implementation when the system provides the information necessary for the endorsement of ethical principles or when it provides details on how information is constrained. Transparency can impair ethical principles if it gives misinformation or inadequate information or exposes an excessive amount of information. The impairing of ethical principles could lead to challenges, for example, with discrimination, privacy, and security [166]. Transparency is normally associated with the black box problem in AI ethics.

2.2.2
Black box problem. The term "black box" is used when the AI model is not understandable and cannot provide a suitable explanation for its decisions [2]. A black box refers to a model that is either too complicated for any human to comprehend or proprietary to someone [139]. To understand the black box, the model needs to be built to be interpretable, or a second model must be created that explains the first black box model [139].
Interpretability in the AI context refers to the capability to understand the overall operational logic in machine learning algorithms, not just the answer [2]. The terms interpretability and explainability are often used as synonyms [2], but this can be problematic because there is a subtle difference between them related to the level of required understandability. In public discussions, the term "Explainable" AI is referred to more often than "Interpretable" AI, whereas in academic discourse the situation is the reverse [2]. Current regulation, such as GDPR, requires a right to explanation, not an interpretable model. This might cause problems: requiring only an explanation does not require the explanation to be accurate or complete, and therefore the right to explanation is an incomplete requirement [139].
A second, post hoc explanatory model may provide explanations that do not make sense or that are not detailed enough to understand what the black box is doing. In order to acquire a full understanding of the model, the information provided by its transparency should also be interpretable. Secondary explanatory models are often incompatible with information outside the black box. The lack of transparency in the whole decision process may prevent interpretation by human decision-makers. Secondary models can also lead to overly complicated decision pathways when transparency is actually required from two models (i.e., the original black box and the explanatory model) [139].
Interpretable machine learning models are not challenge-free either. First, it is a computational challenge to build such a model. Second, the AI system's total transparency can jeopardize the system owner's business logic because the system owner must give away intellectual property [45]. In addition, constructing an interpretable model is often expensive because this requires domain-specific knowledge, and there are no general solutions that would work in different use cases. In creating an interpretable model, it is a challenge to find the balance between interpretability and accuracy because interpretable models tend to reveal hidden patterns in data that are not relevant to the subject [139,140].

Accountability and Algorithmic Bias. In addition to interpretable machine learning and black box problems, core concepts around XAI include AI's accuracy, a performance metric that compares the number of correct predictions to all predictions, and responsible AI [2]. Accountability refers to an actor who is accountable for the decisions made by AI. To establish accountability, the system must be understandable. A lack of transparency and accountability in predictive models can cause serious problems, such as discrimination in the juridical system, endangering a person's health, or misuse of valuable resources [171]. Based on Vakkuri's [171] research, transparency is the enabler for accountability, and together, transparency and accountability motivate responsibility. Finally, responsibility produces fairness. Fairness is often linked with algorithmic biases. In other words, an AI system might repeat and magnify biases in our society, such as by segregating groups with a history of being marginalized (e.g., in preferring men over women or discriminating against people of color).
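As a small illustration of the accuracy metric mentioned above (correct predictions divided by all predictions), a minimal Python sketch follows. The function name and the toy labels are hypothetical and chosen only for illustration; they do not come from the reviewed studies.

```python
def accuracy(predictions, labels):
    """Share of predictions that equal the true labels."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        raise ValueError("at least one prediction is required")
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


# Toy example (made-up data): 3 of 4 predictions are correct.
print(accuracy([1, 0, 1, 1], [1, 0, 0, 1]))  # 0.75
```

Accuracy summarizes overall performance only; it does not show how errors are distributed across groups.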
Machine learning bias is defined as "any basis for choosing one generalization over another, other than strict consistency with the instances" [117, p. 1]. Machine learning systems are neutral and do not have opinions, but the models are not used in a void, which makes them vulnerable to human bias. In the context of machine learning models, discrimination and unfairness in the models can be caused by unfairness in the data, in the collection and processing of data, or in the selected machine learning system. The practical deployment of the system may reveal biases that were invisible during the development process. Ultimately, there is no easy solution to ensure fairness of algorithmic decisions [175]. Nevertheless, there is an interest in finding a working solution.
Veale and Binns [175] identified three distinctive approaches to ensure fairer machine learning. The first is the third-party approach, where an outside organization manages data fairness for the main organization. The second is the collaborative knowledge base approach, where linked databases containing fairness issues are flagged by researchers and practitioners. The third is an exploratory approach, where exploratory fairness analysis of the data is performed before training or practically implementing the model. In this paper, the interest is in the exploratory approach because it is connected to the black box problem [175]. The biases are studied from the perspective of XAI, which aims to bring transparency to the AI system. Less emphasis is dedicated to research on how data can be collected or processed to avoid biases.

Summary of Emerging Issues
AI ethics research lacks harmony and a standard agreement on defining the core principles of the field [45,89]. Moreover, the research field of XAI is complex and is in need of a common vocabulary and formalization [53]. This paper aims not to solve the issue of definitions of fairness and transparency but rather to investigate the existing research connected to transparency as understood in this paper, as a requirement for the AI system to provide an understandable explanation if needed in the context of the application. This requirement applies to systems that are non-explainable because of the training method, or biased as a result of bias in the training data. This paper takes no stand upon ranking the principles. Instead, it aims to provide a more in-depth understanding of what has been studied and how in terms of transparent and explainable AI systems.
The research field of XAI, studied as a sub-field of AI ethics, examines the challenges and looks for potential solutions for transparent machine learning models, aiming to enable the fulfillment of such ethical principles as accountability, responsibility, and fairness [157]. XAI can benefit a broad range of domains relying on AI systems. The need for XAI is emphasized especially in domains such as law, finance, military, and transportation [2]. In such areas, AI systems have a direct influence on the physical conditions of people and can cause injuries [2]. In other domains, transparency may not be a critical requirement. There is no one-for-all framework or solution available for transparency issues. Hence, domain-specific solutions and frameworks are required.
Adadi et al.'s [2] research showed that the impact of XAI spans a broad range of application domains. However, the lack of formalism regarding problem formulation, the divergence in explanation methods and results [96], and the absence of clear, unambiguous definitions burden the research field. Moreover, they noted that the human's role is not sufficiently studied [45,56]. A recently published paper recognized the same challenge with the lack of user-centric design in XAI [58]. For implementation, it is important to understand user requirements and needs, to ensure trust and acceptance of algorithmic decision systems [155]. In addition to understanding the user's needs, the research field lacks knowledge on industrial practices with AI ethics [171] and knowledge on how different explanation methods lead to varied results. Overall, there is a concern that the XAI field suffers from a distance from real-world problems [139].
AI ethics and XAI are broad, versatile topics with increasing importance. The present SMS is timely, as it enables an understanding of what has been studied in AI ethics. Understanding what is studied in AI ethics research is required to clarify the role of XAI. More systematic research is required for this purpose, and in the next sections, an SMS is used to understand the study field of AI ethics and how XAI is manifested in the research.

LITERATURE SEARCH FOR PRIMARY STUDIES
This study employed the SMS method. The main focus of SMS is to "provide an overview of a research area, and identify the quantity and type of research and results available within it" [130, p. 2]. The SMS aims to identify the potential research gaps and trends, including the understudied topics and research types. The expected outcome for SMS is to identify and choose the primary studies and map the literature [94].
The research builds on an SMS developed by Vakkuri and Abrahamsson [168], who studied the key concepts in the field of AI ethics. For this paper, the research was updated twice, first midway through 2020 and later in the last quarter of 2021. In this paper, the goal is to analyze how XAI is researched in the study field of AI ethics. The interest is in practical implementation and connection to real-world issues. Thus, the focus is on empirical studies, and papers without data analysis, such as literature reviews, were considered theoretical. We included papers analysing empirical data regardless of the data type, or data collection or analysis method.
The research question for an SMS can cover issues such as what topics are addressed, what empirical methods are used, and what sub-topics have been sufficiently empirically studied [94]. This guideline forms the basis of the current research question, "What is the role of explainable AI in the AI ethics research field?" and its three sub-questions:
R1 What has been empirically researched in the field of AI ethics?
R2 What is the state of published research on XAI in the field of AI ethics in the past 10 years?
R3 Where are the research gaps in the field?
To answer the main research question, it is first important to answer the first sub-question [R1]. In this paper, the question is studied on a superficial level to offer enough background to understand the main research question. The major topics are noted and the research field's size and proportion of empirical research from the existing academic literature are delineated.
To address the second question [R2] and to understand XAI's role and importance in AI ethics, research with XAI as the focus is reflected against a full data-set of empirical studies. More in-depth analysis and classification are performed on papers focusing on XAI to understand what, how, and why it has been studied in the past 10 years. The analysis includes investigation of research methods, contributions, focus, and pertinence to XAI. In addition, the annual changes in the research field are studied to reveal trends. The connection to real-world issues is also reviewed. This paper investigates the current research corpus with empirical evidence to understand the AI ethics research field in a way that is closer to real-world issues.
The third question [R3] can be addressed based on a background literature review and an in-depth SMS. The background literature review revealed gaps, such as the lack of understanding of the human role in XAI [2], that were also highlighted in the SMS analysis.
The process of building an SMS is cumulative, and it includes several rounds of screening papers. The process steps and outcomes are presented in Figure 1 based on [130]. The headline of each block describes the process step, and the body reflects this study. The figure guides the reader through the entire study.
Because an SMS's goal is to understand the research area rather than give evidence, the articles do not need to feature in-depth examination. Thus, the number of articles included can be larger [130]. The total number of papers included from five databases, after deleting duplicates, was 4,411. After applying the inclusion and exclusion criteria, the sample was narrowed to 142 papers. In the following, each step is further explained based on the theoretical framework.

Primary search
The first step in an SMS is to identify the primary studies that contain relevant research results [37]. This paper builds on the SMS of Vakkuri and Abrahamsson [168], and the search strings and selected databases were adopted from their research. With the research question, "What topics are covered in AI ethics research?", the search string consisted of two parts: (1) AI and its synonyms (robotics, artificial, intelligence, machine, and autonomous); and (2) ethics and its synonyms (morals). The search string was as follows:
(AI OR artificial* OR auto* OR intelligen* OR machine* OR robo*) AND (ethic* OR moral*)
The search was narrowed to include only the title and abstract. The search was performed in the following five electronic databases: IEEE, ACM, Scopus, ProQuest, and Web of Science. In total, there were 221,363 results. Table 1 shows the results of the primary search per database.
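To make the logic of the search string concrete, the following minimal Python sketch approximates it with regular expressions. It is an illustration only: each database implements wildcards and field restrictions in its own way, and the function names and sample titles below are hypothetical rather than taken from the study.

```python
import re

# Terms copied from the search string; a trailing * stands for any word ending.
AI_TERMS = ["AI", "artificial*", "auto*", "intelligen*", "machine*", "robo*"]
ETHICS_TERMS = ["ethic*", "moral*"]


def term_to_regex(term):
    """Translate one database-style term into a regex fragment (an approximation)."""
    if term.endswith("*"):
        return r"\b" + re.escape(term[:-1]) + r"\w*"
    return r"\b" + re.escape(term) + r"\b"


def build_pattern(terms):
    """OR the terms together, case-insensitively."""
    return re.compile("|".join(term_to_regex(t) for t in terms), re.IGNORECASE)


AI_PATTERN = build_pattern(AI_TERMS)
ETHICS_PATTERN = build_pattern(ETHICS_TERMS)


def matches_search_string(title, abstract):
    """True when title plus abstract satisfy (AI terms) AND (ethics terms)."""
    text = f"{title} {abstract}"
    return bool(AI_PATTERN.search(text)) and bool(ETHICS_PATTERN.search(text))


# Hypothetical records used only to exercise the function.
print(matches_search_string("Moral dilemmas of autonomous vehicles", ""))  # True
print(matches_search_string("Robots in factories", "Throughput analysis"))  # False
```

The broad stems (for example, auto* also matches "automobile") hint at why the raw hit count is so large and why manual screening was needed afterwards.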
Because of rapid progress in the development of AI in the early 2010s, earlier studies, such as those carried out before 2012, are often not as relevant as more current research. Thus, these were excluded from the results. Since the aim is to understand the state of academic research related to the topic, only peer-reviewed articles were included [20]. The search with four filters (document type, publication year, peer-reviewed, and language) performed in five databases resulted in 49,333 papers. All the abstracts of the resulting papers were screened manually to exclude papers that were irrelevant to the study. The primary search was done first in 2016 and updated in 2019 and 2021. Manual screening was executed by the first four authors. At this stage, each paper was screened once. To guarantee consistency between readers, a paper was included whenever the reader was uncertain. The primary search resulted in 7,048 papers, which were combined into one data-set, and duplicates were deleted. The remaining 4,411 papers were left for closer review in the inclusion and exclusion process.
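The text does not state how duplicates were identified when the database exports were combined. The Python sketch below shows one plausible approach, assuming that two records count as duplicates when their normalized titles and publication years match. The matching rule, the function names, and the sample records are all assumptions made for illustration.

```python
import re


def normalize_title(title):
    """Lowercase and collapse punctuation and whitespace so trivial variants compare equal."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def deduplicate(records):
    """Keep the first record for each (normalized title, year) key (assumed rule)."""
    seen = set()
    unique = []
    for record in records:
        key = (normalize_title(record["title"]), record["year"])
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


# Hypothetical exports from different databases; the first two rows describe the same paper.
merged = [
    {"title": "Ethics of AI: A Mapping Study", "year": 2019, "source": "Scopus"},
    {"title": "Ethics of AI - a mapping study.", "year": 2019, "source": "IEEE"},
    {"title": "Explaining Black Box Models", "year": 2020, "source": "ACM"},
]
unique = deduplicate(merged)
print(len(merged), "->", len(unique))  # 3 -> 2
```

A stricter key, such as a DOI where available, would reduce false matches; title-based matching is shown here because it works even when identifiers are missing.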

Inclusion and Exclusion
The second step of SMS is to examine the selected papers and find the primary studies [37]. This process requires defining a greater number of narrower inclusion criteria. The inclusion process is guided by the research goal and desirable contribution [128]. The inclusion and exclusion criteria are presented in Figure 2.
The study's aim is to map the relevant research area of the ethics of AI in the domain of information system science. Hence, in this step only papers focusing on the ethics of AI [I1] were included. Because many papers were . White literature refers to full papers published in venues of high control and credibility, and it excludes pre-prints, technical reports, blogs, and other types of publications that are referred to as grey and black literature [65].
In SMS studies, exclusion criteria may require excluding papers that only mention the main interest area in the abstract. General concepts are often used in abstracts, even if the paper focuses on something else [130]. The irst exclusion criterion [E1] is the exclusion of papers that do not contribute to AI ethics research and only mention the potential ethical issues related to AI in the general introduction. Moreover, in this paper, the interest is in practical AI implementation rather than a philosophical concern. Therefore, papers without empirical research, were excluded from the study [E2]. In the inal screening, papers that did not focus on XAI or related topics were excluded [E3].
The inclusion and exclusion criteria were established and deined during the screening process. The inclusion criteria provided the general boundary and quality conditions, and the exclusion criteria gave more detailed limitations to distinguish the sample relevant for this paper.
For the irst screening round, three quality inclusion rules were applied: language [I4]; access to full text [I5]; and suiciently used references as well as overall academic quality [I6]. This means that workshop, keynote, panel, and paper presentations were excluded, along with short papers, tutorials and abstracts. In addition, papers that did not focus on the ethics of AI were excluded [E1]. During the screening round, the quality of each paper was validated. Papers that did not meet the academic peer-review standards, such as short papers, tutorials, and panel/keynote/workshop presentations, were excluded from the study.
The included papers were clustered into two categories, theoretical and empirical, to separate the empirical papers that were meaningful for this paper's goal. The empirical papers were manually separated during the screening, because this was considered the most reliable way to ensure the sample would include all the relevant papers. The screening was executed by the first four authors. Each paper was screened by one or two authors. If the first reader was uncertain, a second opinion was provided. From the total of 2,192 papers that met the inclusion criteria, 503 used empirical material. The theoretical papers consist of reports, opinions, philosophical papers, problem descriptions, proposals, and academic literature reviews. For the second screening, the papers were skimmed and scanned for keywords based on the focus area to find the papers connected to XAI. As described in the second section, XAI is a vague concept, and there is no commonly agreed framework on which topics should be included under the term. Thus, papers focusing on responsible AI, algorithmic bias, or black box models were included to ensure the inclusion of all relevant papers. The excluded papers are visualized in Table 2.
The primary studies (n = 142) included in the SMS are further classified and analyzed in the next section. The full sample of papers with empirical evidence (n = 503) was further reviewed to understand the overall field of AI ethics described in them. However, the analysis was done on a superficial level because a more thorough investigation was outside the scope of this study.

Short analysis of AI ethics research field with empirical evidence
Future studies are required to understand the research area of AI ethics more comprehensively. Yet, this short analysis gives sufficient background to reflect the role of XAI against the full sample of AI ethics research with empirical evidence (n = 503). The empirical papers represent 23% of the whole sample of manually included papers (n = 2,192). This finding forms the first empirical contribution (EC).
• EC1: Most of the research papers in the field of AI ethics do not use empirical evidence. Only 23% of the papers provide empirical evidence.
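The proportions reported for the sample follow directly from the paper counts. The following is a minimal Python sketch of that arithmetic, assuming only the counts stated in the text; the constant and function names are illustrative and are not part of the study.

```python
# Paper counts as reported in the text.
INCLUDED_PAPERS = 2192   # papers that met the inclusion criteria
EMPIRICAL_PAPERS = 503   # included papers that used empirical material
PRIMARY_STUDIES = 142    # empirical papers connected to XAI


def share(part: int, whole: int) -> float:
    """Return part/whole expressed as a percentage."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return 100.0 * part / whole


# Share of empirical papers among all included papers.
empirical_share = share(EMPIRICAL_PAPERS, INCLUDED_PAPERS)
# Share of XAI-related primary studies among the empirical papers.
xai_share = share(PRIMARY_STUDIES, EMPIRICAL_PAPERS)

print(f"Empirical papers: {empirical_share:.0f}% of included papers")
print(f"XAI-related papers: {xai_share:.0f}% of empirical papers")
```

Keeping the raw counts next to the derived percentages makes it straightforward to recompute the shares if the mapping is continued with a later search.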
The two following dimensions were observed within the entire sample: emerging themes and the year of publication. The theme analysis was done during the keywording process described in the next section. A more profound analysis would require a more systematic approach.
Since the research area is in its infancy, the year of publication can provide insight into the research area's growth. The papers published per year are visualized in Figure 3. The size of the bar represents the number of papers published each year.
The visualization reveals significant growth starting from 2018. This growth coincides with public discussion, as discourse on AI ethics grew significantly in the media in 2018 [124]. This finding forms the second empirical contribution.
• EC2: Empirical research on AI ethics grew significantly in 2018, corresponding with trends in public discourse.
Based on the shallow categorization of the topics during the classification, most papers focused on general issues and challenges related to AI ethics. Some notable topics in the research field were human-robot interaction for both physical and virtual robots (focus in 77 of 503 papers), autonomous vehicles (58 of 503 papers), health and care (54 of 503 papers), education (31 of 503 papers) and governance/regulation (28 of 503 papers). The papers
• PEC1: XAI is a significant research focus in the study field of AI ethics. Of the empirical research papers published after 2012, 28% are related to XAI.
Since the inclusion of XAI did not require the paper to have full dedication and focus on XAI, the number of papers engaging with XAI is not comparable to other emerging themes. In addition, papers with partial and marginal input to XAI were included if they contributed to the topic. No further examination was performed on excluded papers.

CLASSIFICATION
Classification uses a systematic process where the classification schema evolves and is specified during the process [130]. The first step, keywording, reduces the time required for building the classification schema and ensures that the classification schema represents existing studies [130]. The process was initiated during the last stage of the inclusion process and continued with the final sample, the primary studies (n = 142), during the classification. Next, the classification schema, classification results, and the overview of the primary studies are presented.

Classification schema
For the classification schema, the papers were examined in terms of the four facets adopted from the SMS of Paternoster et al. [128]. These facets were research, contribution, focus, and pertinence.
(1) Research facet. The research type is used to distinguish between different types of studies and chosen research methodology. A research type proposal of a solution refers to papers proposing a novel solution technique and arguing for its relevance, without full justification. At best, such papers provide a narrow proof of concept. Validation research papers investigate the properties of their or others' proposals of solutions that are not implemented in practice. The investigation is performed in a methodologically sound research setup. Philosophical papers propose new conceptual frameworks and structures. Finally, experience papers describe the implementation in practice, such as listing the lessons learned. The experience may be the author's or that of the person studied [185].
(2) Contribution facet. The aim is to identify the tangible contribution of the paper. This can be an operational procedure for development or analysis to provide a new and more effective way to do something, such as a design framework. Alternatively, it can be a model representing the observed reality and structuring the problem area, an implemented computational tool to solve a particular problem, or a specific solution for a specific application problem. The contribution can also be a piece of generic advice with a less systematic approach than the model. It often focuses on one example case and is more vaguely directive than the procedure is. The contribution facet is based on Shawn's [153] research.
(3) Focus facet. Keywording that was performed during the last screening round revealed focus themes that were highlighted during the classification process. The focus themes detected were algorithmic bias, or the challenges with fairness because of biased and discriminative training data or model; black box, or the challenges with non-transparent systems; and accountability, with papers studying when and how the accountability of a non-transparent system is divided. Some papers focused on understanding the attitudes, expectations, and trust toward non-transparent systems. These papers were categorized as attitude.
(4) Pertinence facet. The pertinence facet shows the level of relation to XAI, which is the research focus of this paper. The levels are as follows: full, where XAI or transparency issues are the main focus of the paper; partial, where the paper is partially related to XAI or transparency; and marginal, where the paper's primary research focus is outside transparency or XAI themes.
In all facets, the same paper can fit into several categories. In such situations, the best possible fit was chosen. The process was highly opinion-based, and the evaluation of one individual could impair the study's quality and reliability. Classification was done by the first author, and the classification schema was presented to and evaluated by two reviewers to ensure the research quality.
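The four facets can be read as a small data model in which each primary study receives exactly one best-fit class per facet. The following Python sketch illustrates that structure under this assumption; the class and field names are illustrative, and the example record is hypothetical rather than a paper from the sample.

```python
from dataclasses import dataclass
from enum import Enum


class ResearchType(Enum):
    PROPOSAL = "proposal"
    VALIDATION = "validation"
    PHILOSOPHICAL = "philosophical"
    EXPERIENCE = "experience"


class Contribution(Enum):
    PROCEDURE = "procedure"
    MODEL = "model"
    TOOL = "tool"
    SPECIFIC_SOLUTION = "specific solution"
    ADVICE = "advice"


class Focus(Enum):
    BIAS = "algorithmic bias"
    BLACK_BOX = "black box"
    ACCOUNTABILITY = "accountability"
    ATTITUDE = "attitude"


class Pertinence(Enum):
    FULL = "full"
    PARTIAL = "partial"
    MARGINAL = "marginal"


@dataclass(frozen=True)
class ClassifiedPaper:
    """One primary study with a single best-fit class in each facet."""
    reference: str
    year: int
    research: ResearchType
    contribution: Contribution
    focus: Focus
    pertinence: Pertinence


# Hypothetical example record; not a paper from the actual sample.
example = ClassifiedPaper(
    reference="[hypothetical]",
    year=2020,
    research=ResearchType.PROPOSAL,
    contribution=Contribution.TOOL,
    focus=Focus.BIAS,
    pertinence=Pertinence.FULL,
)
print(example.focus.value, example.pertinence.value)
```

Using one field per facet mirrors the best-fit rule described above: a paper that could fit several categories is still recorded under a single class in each facet.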

Results of Classification
After the classification schema was established, the actual data extraction took place, and the articles were sorted into different classes. A significant portion of papers focused on biased algorithms. These papers were classified under the pertinence facet as "full" if the papers focused on making the whole system more transparent. Papers that focused on cleaning and fixing biased data-sets were classified as having a "partial" pertinence toward XAI. They were considered to have a main focus that related more to data science. The pertinence facet helped clarify whether the paper has a strong focus on XAI and transparency issues. Papers with a marginal focus on XAI were seen to contribute to the topic even if the main focus was elsewhere, and therefore, they were kept in the sample.
After the classification, the papers in each class were counted and visualized with the number of papers in each facet's class and the percentage of the class compared with the full sample (n = 142). This highlights what has been emphasized in past research, revealing potential research gaps and possibilities for future research [130]. The classification results are presented in Table 3. In the research facet, the proposal class was significantly emphasized, with 59% of the studies proposing a technical, mathematical, or design solution. The main contribution classes were tools (computational solutions to a particular problem) and models (structuring the problem area). Many papers proposing a new computational tool suggested a new algorithm or mathematical solution.
Many papers focused on biased algorithms (46%). Papers where the main focus was to understand developers' and users' expectations, attitudes, and trust toward XAI systems represented 28% of the whole sample. From the attitudes category, only 14 papers (10% of the sample) focused on practitioners' expectations and opinions, and the remaining 26 papers focused on understanding how the general public sees the issue.
In addition to classification, the papers were clustered based on the publication venue (journal/conference) and type of data used (real-life or synthetic). Most papers (99 papers, 69%) were published at conferences. Only 10 papers (7%) used synthetic data, which indicates that the research on XAI is closely connected to real-life issues.
The overview of the primary studies (n = 142) in light of the classification results is presented in the Appendices. All the papers are found in the reference list at the end of this paper. In the next section, the classified data are analyzed and visualized. The analysis aims to elucidate the study field of XAI and its role in AI ethics research.

SYSTEMATIC MAP
There are several ways to visualize the results of an SMS. The two most common approaches are bar plots and bubble plots [131]. Bubble plot visualization is exceptionally well-suited to illustrating the number of studies for a combination of categorizations [131]. Because the classification schema applied in this study includes several categories, the bubble diagrams were built to visualize the number of papers in different classes and investigate correlations between them. Since there were four main facets in the classification schema, it was necessary to create several diagrams to avoid over-complicating the view. Different types of visualizations were constructed based on the area of inspection. In the next sections, the results of the classification schema, pertinence, impact, annual change, and the venue of the study field are visualized and analyzed.

Systematic Map in the Bubble Plot Visualization
A bubble plot diagram gives a quick overview of the research field and supports the analysis more effectively than frequency tables [130]. Here, the bubble plot diagram was built using the summary statistics presented in the previous section (Table 3). The diagram visualizes the frequencies and correlations between categories and facets. The bubble plot diagram comprises two x-y scatterplots with bubbles in category intersections. The same idea is used twice, on opposite sides of the same diagram, to show the intersection with the third facet on the x-axis [130].
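The size of each bubble is the number of papers at the intersection of two facet categories. The following Python sketch shows one way such counts could be derived, assuming each paper carries one label per facet. The label pairs are hypothetical and do not reproduce the study's data, and the function names are illustrative.

```python
from collections import Counter

# Hypothetical (focus, contribution) labels for a handful of papers;
# illustrative only, not taken from the primary studies.
labels = [
    ("bias", "tool"),
    ("bias", "tool"),
    ("bias", "model"),
    ("black box", "tool"),
    ("attitude", "model"),
    ("attitude", "model"),
]


def bubble_sizes(pairs):
    """Count the papers at each intersection of two facet categories."""
    return Counter(pairs)


def share_of_sample(pairs):
    """Express each bubble as a percentage of all labeled papers."""
    sizes = bubble_sizes(pairs)
    total = len(pairs)
    return {cell: 100.0 * n / total for cell, n in sizes.items()}


sizes = bubble_sizes(labels)
shares = share_of_sample(labels)
for cell, n in sizes.most_common():
    print(cell, n, f"{shares[cell]:.0f}%")
```

The resulting counts can be passed to any plotting library as marker sizes, with the two facet categories as the x and y coordinates.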
In the first bubble plot, the contribution and research facets are compared to the focus facet. The size of a bubble indicates the number of papers that are at the intersection of the coordinates. Next to a bubble, there is the percentage of the total amount (n = 142) in the represented category of the x-axis. The bubble plot is presented in Figure 4.
• EC3: The most popular paper type in the research facet is a proposal for solving algorithmic bias.
In addition, the proposals for black box issues are highlighted. Proposal research studies new and novel techniques to solve a particular issue. When compared to validation research, which studies a specific solution that has already been implemented in practice, the size of the proposals bubble is much larger, which indicates the research field's freshness. It may be that there are few proper practical solutions to fix the ethical issues related to XAI, that these solutions are not yet implemented in practice, or that the practical implementation has not yet been studied. The scarcity of validation research is probably partly due to all the reasons mentioned above.
• PEC2: In the study field of Ethical XAI, the most common type of empirical research is studying a novel technique that can solve a computational challenge.
From the contribution facet, the largest bubble can be found at the intersection of bias and tools. Nearly one-fourth (23%, 32 papers) of the whole sample contributes to the research field with a computational solution to solve algorithmic biases. A computational tool to solve black box issues was proposed in 12 papers.
• EC4: Almost one-quarter of the papers in the sample contribute to the research field with a computational solution to solve algorithmic biases.
In the contribution facet, the second-largest bubble (21 papers) can be found at the intersection of the attitude facet and the model facet. The bubble visualizes how the research field is modeled and structured by providing a better understanding of users and practitioners. Procedures, which contribute by proposing a new way to solve issues, such as design frameworks, receive roughly equal interest in each focus facet relative to the number of papers categorized per focus facet.
• EC5: Half of the papers interested in users' and practitioners' attitudes and perceptions related to XAI and AI ethics are contributing by modeling and structuring the research area.
There is no strong weighting on any of the contribution types in the black box's focus facet. In the bias category there is an apparent weighting in the contribution of computational tools, and in the attitudes category there is weighting placed on modeling the problem area. From 32 papers that focus on bias and contribute with a computational tool, 30 papers (20% of the whole sample) have a research facet proposal. This is the most prevalent type of paper in the present study.
• EC6: The most prevalent paper type is that of a computational tool proposing a solution to a problem with bias. Every fifth paper presents this type of research.
From the bubble plot visualization, it can be concluded that the most common type of paper is a computational tool proposing to solve problems with biases, and in general, most papers look for novel techniques and solutions to computational problems. The results may indicate that the focus is slightly monotonous. Papers concerning black boxes, accountability, or attitudes are more dispersed, with the exception of the strong emphasis on proposals as a research type in the black box papers. In addition, the results indicate immaturity in the research field.
• EC7: The research field seems to be somewhat monotonous and immature when considering the variety of topics, research methods used, and contributions of the papers.

Pertinence Mapped in a Bubble Plot
Since pertinence indicates how closely a paper relates to the XAI topic, pertinence was visualized with a bubble plot corresponding to the focus and research facets. The bubble plot visualization in Figure 5 aims to show which focus areas and types of research have full pertinence on XAI and transparency-related topics, and in which focus areas the pertinence remains elsewhere. Out of the papers focusing on algorithmic biases, 44% had full focus on XAI. Many of the papers with partial focus had the main emphasis on cleaning data and fixing the data-sets that are causing the discriminating and unfair decisions. These papers were considered to have their main pertinence in data science and fairness rather than in XAI. Not surprisingly, most of the papers (26 out of 31 papers) focusing on the black box were categorized to have full focus on XAI. The black box is one of the core concepts in XAI research [2]. In the results, 43% of papers with full pertinence were proposals of a novel solution. This again reflects the freshness of the research field, and it may indicate that the research done in the field is solution-oriented.
• EC9: The research field of XAI seems to be solution-oriented, and the research corpus with empirical evidence focuses more on finding solutions than exploring challenges.
Interestingly, only six papers with the main focus on attitudes and expectations of practitioners, users, and the public had full pertinence toward XAI. The results indicate a research gap in understanding people's perceptions of the topic. Similarly, papers with experience as a research focus had mainly partial or marginal focus on XAI.
• PEC3: The human perspective toward XAI is not well known. There is no in-depth understanding of the practitioners' and users' expectations and attitudes toward XAI.
It could be assumed that in research on XAI in AI ethics, there is a lack of understanding of the issues related to users' and practitioners' attitudes. Only four papers [41,170,171,182] studied the current state of industrial implementation of AI ethics in general, and none of these had full pertinence to XAI. No paper studied the managerial or business perspectives of XAI.
• PEC4: Industrial implementation of XAI is not yet profoundly studied in the research field of AI ethics. There is a research gap in the managerial perspective and the business implications of XAI.
In total, only six papers presented a main focus on accountability. Although accountability was mentioned in several papers, it is interesting that it has not been more profoundly studied. Only one of the papers [138] had full pertinence toward XAI, and the rest related to AI ethics more generally. There seems to be a research gap in terms of understanding who takes responsibility and how this is decided if biased or non-transparent systems are not working as expected.
• EC10: There is a research gap in understanding who is responsible for the actions of non-transparent systems and how the responsibility is decided and communicated.
In conclusion, the pertinence was strongest in black box research, and it was strongly present in the bias category. The attitudes category had a relatively weak connection to XAI. This indicates a need to understand better how people, including practitioners, businesses, and the public, perceive XAI.

Visualization of Annual Changes in the Research Field
The year range for the SMS described in this paper was 2012–2021, but none of the papers from 2012–2016 were included in the study after the inclusion and exclusion processes were implemented. One paper from 2017, 16 papers from 2018, 40 papers from 2019, 52 papers from 2020, and 33 papers from 2021 were included. Notably, the primary search was performed between September and December 2021. Hence, the record for 2021 is incomplete.
• EC11: XAI is a young but growing empirical research area in the field of AI ethics.
The growth of the research area seems to be stabilizing. Of the empirical papers (visualized in section 3.3, Figure 3) published in 2019 (n = 93), 43% were connected to XAI; among those published in 2020 (n = 167), 31% (52 papers) were connected to XAI. Among empirical papers published in 2021 (n = 170), only 19% were related to XAI. This could be due to the faster growth of other research interests in the field of AI ethics or separation related to individual research agendas that were not so tightly connected to AI ethics. However, this study is only focused on XAI papers that are related to the research interest of AI ethics.
• EC12: The research interest in XAI compared with all published empirical papers on AI ethics was highest in 2019. Since then, the interest in XAI has grown yearly but not as rapidly as the empirical research on AI ethics has in general.
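The yearly shares behind this observation can be recomputed from the counts given in the text. The following is a minimal Python sketch, assuming the reported number of XAI-related primary studies and empirical AI ethics papers per year; the variable names are illustrative.

```python
# Year -> (XAI-related primary studies, all empirical AI ethics papers),
# using the counts reported in the text. The 2021 record is incomplete.
papers_by_year = {
    2019: (40, 93),
    2020: (52, 167),
    2021: (33, 170),
}

# Share of XAI-related papers among empirical papers, as whole percentages.
xai_share_by_year = {
    year: round(100 * xai / total)
    for year, (xai, total) in papers_by_year.items()
}

# Year in which the relative interest in XAI was highest.
peak_year = max(xai_share_by_year, key=xai_share_by_year.get)

for year, pct in xai_share_by_year.items():
    print(year, f"{pct}%")
print("Peak year:", peak_year)
```

Separating the absolute counts from the relative shares keeps the two claims distinct: the number of XAI papers and the proportion of XAI papers among all empirical papers can move in different directions.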
To visualize the annual changes in the research field, Figure 6 shows the annual changes and evolution in the contribution and research facets. The motivation for generating bubble plots was to detect trends in the research field. However, as the research field is still emerging, the trends might only be seasonal changes. Moreover, because the year 2021 cannot be evaluated entirely, the results per year are not fully comparable.

The bubble plot reveals that the proposal has been the most popular category from the research facet every year. Experience and validation papers seem to be growing in popularity as the research field matures. Simultaneously, the number of philosophical papers is decreasing. The research trend seems to be toward more practical understandings and less philosophical framing, as well as structuring of the focus area.
• EC13: The trend is toward more practical implications and less philosophical framing of the focus area.
In the contribution facet, the division between categories is more even. The strongest growth is in procedures, which are proposals for better ways of doing something. Interestingly, discussions on tools and computational solutions showed a decreasing trend in 2020 and 2021. This could indicate that the research field is evolving to become more holistic and not as intensely focused on finding technical solutions. Moreover, the growth in specific solutions could indicate that the computational tools are proposed to fix specific application issues. However, more research is required to verify this conclusion.
• EC14: The research contribution and interest seems to be shifting from proposing general computational solutions to proposing more holistic design/framework level solutions and tools for specific application issues.
Another interesting observation is that advice papers seem to be decreasing in prevalence as the research field is maturing. This might be connected to the same trend of moving from general advice to more application-specific or problem-specific solutions.

Venue and focus of the research
The research venue was studied to understand the quality and depth of the research area. All the papers were published either in conferences or journals. The papers published in journals should include the most mature research [86]. In addition, a higher degree of empirical evidence is expected from papers published in journals than from conference or workshop proceedings [86].
As mentioned above, most papers were conference proceedings, representing 99 papers (69.7%). The most popular venue was the AAAI/ACM Conference on AI, Ethics, and Society (AIES). Thirty-nine papers (28%) of the total sample (n = 142) were published in AIES.
• EC15: The most popular publication venue is AIES, with 28% of papers published in it.
The annual variation of the publication venue and focus facet is visualized in Figure 7 with a view to elucidating how the research area has been evolving.
Interestingly, in 2021 almost as many papers were published in journals as in conferences, but since the primary search was performed during late 2021, the incomplete nature of the data may have affected the result. The division between conference proceedings and journals since 2020 seems as expected, given that conferences are the main publication venues in information systems. The growth in interest in publishing in journals could indicate a shift in the depth of the research.
• EC16: Nearly similar numbers of papers were published in journals and conferences during 2021.
No significant trends can be detected from annual changes in the research focus. The research focusing on black boxes seemed to gain in popularity, whereas the research with the main focus on biases seemed to decrease in popularity. The number of papers focusing on attitudes seemed to grow relatively steadily. From the attitude papers, the annual division of papers focusing on understanding the developers and practitioners was as follows: one paper in 2018, two papers in 2019, six papers in 2020, and four papers in 2021. Understanding the expectations, needs, and opinions of practitioners seems to be a slowly growing trend. This could indicate that the research field is increasingly interested in practical implementation.
• EC17: There is a growing interest in practical implementation and understanding of the needs and expectations of users and practitioners.
Out of 43 papers published in journals, 18 focused on attitudes. This is a large proportion of attitude papers, reaching 45% (n = 40). Since the rigor in journal publications is higher than that of conference papers [86], this indicates that although the field lacks a plurality of studies on humans' role and attitudes, the quality of this type of research is high.
• EC18: The studies on the role and expectations of users and practitioners represent high-quality research.
This observation may be explained by the type of data used in the research. User research usually requires more time-consuming research methods. Therefore, the originality and quality of the evidence are higher, which fits better with the publication criteria of journals. This can be compared to the black box papers, where 26% (8 papers) were published in journals, and the bias papers, where 23% (15 papers) were published in journals.

Analysis of connection to real-world problems
To understand whether the study field focuses on real-world problems, the papers were evaluated based on the use of real-world data versus synthetic data. As mentioned at the end of section four, only 7% of papers (10 papers) used synthetic data. In addition, most of the papers described the connected real-world challenges in the introduction and background sections. Overall, the research field is close to real-world problems.
• PEC5: XAI researchers are interested in real-world problems and applications, not only technical aspects of the topic.
If the field of XAI research had been studied independently without the association of AI ethics, the connection to real-world problems may have been different.

Summary of empirical contributions
Next, we summarize the empirical contributions and primary empirical contributions of this paper. The paper's main theoretical contribution is to map the research area, which supports future research by framing and visualizing the existing research. The secondary contribution comprises the PECs derived from the maps. The PECs are supplemented with ECs. ECs that were highlighted from the text body in previous sections are listed below.
• EC1: Most of the research papers in the field of AI ethics do not use empirical evidence. Only 23% of the papers provide empirical evidence.
• EC2: Empirical research on AI ethics grew significantly in 2018, corresponding with trends in public discourse.
• EC3: The most popular paper type in the research facet is a proposal for solving algorithmic bias.
• EC4: Almost one-fourth of the papers in the whole sample contribute to the research field with a computational solution to solve algorithmic biases.
• EC5: Half of the papers interested in users' and practitioners' attitudes and perceptions related to XAI and AI ethics are contributing by modeling and structuring the research area.
• EC6: The most prevalent paper type is a computational tool proposing a solution to a problem with bias. Every fifth paper presents this type of research.
• EC7: The research field seems a bit monotonous and immature when considering the variety of topics, research methods used, and contributions of the papers.
• EC8: Out of the papers focusing on black box (n = 31), 84% had full pertinence on XAI.
• EC9: The research field of XAI seems to be solution-oriented, and the research corpus with empirical evidence focuses more on finding solutions than exploring challenges.
• EC10: There is a research gap in understanding who is responsible for the actions of non-transparent systems and how the responsibility is decided and communicated.
• EC11: XAI is a young but growing empirical research area in the field of AI ethics.
• EC12: The research interest in XAI compared with all published empirical papers on AI ethics was highest in 2019. Since then, the interest in XAI has grown yearly but not as rapidly as the empirical research on AI ethics in general has.
• EC13: The trend is toward more practical implications and less philosophical framing of the focus area.
• EC14: The research contribution and interest seems to be shifting from proposing general computational solutions to proposing more holistic design/framework level solutions and tools for specific application issues.
• EC15: The most popular publication venue is AIES, with 28% of papers published in it.
• EC16: Fairly similar numbers of papers were published in journals and conferences during 2021.
• EC17: There is a growing interest in practical implementation and understanding the needs and expectations of users and practitioners.
• EC18: Studies on the role and expectations of users and practitioners represent high-quality research.
The primary empirical contributions are listed below. In previous sections, the primary empirical contributions were listed from the text body to bring them to the reader's attention and ensure easy accessibility when skimming the paper. Primary empirical contributions are written in a context-enriched manner to support the understanding of readers who are not familiar with the full paper.
• PEC1: XAI is a significant research focus in the study field of AI ethics. Of the empirical research papers published after 2012, 28% are related to XAI.
• PEC2: In the study field of Ethical XAI, the most common type of empirical research is studying a novel technique that can solve a computational challenge.
• PEC3: The human perspective toward XAI is not well known. There is no in-depth understanding of the practitioners' and users' expectations and attitudes toward XAI.
• PEC4: Industrial implementation of XAI is not yet profoundly studied in the research field of AI ethics. There is a research gap in the managerial perspective and the business implications of XAI.
• PEC5: XAI researchers are interested in real-world problems and applications, not only technical aspects of the topic.
Theoretical and practical implications of the primary empirical contributions are evaluated next.

DISCUSSION
This section lists the proposals for the theoretical and practical implications of the PECs, which were the SMS process outcomes. In theoretical implications, PECs are reflected against the existing research. The practical implications are proposals and ideas for how the conclusions could be implemented in practice. The limitations of the research are discussed at the end of the section.

Theoretical Implications
The main theoretical implication of this paper is the mapping of the research area presented in section 5. In this section, the key outcomes of the analysis of the mapping process are mirrored against existing research. The PECs are evaluated as to whether they contradict or correspond to the existing research or provide a novel perspective. As the focus of this paper is to understand the research area's scope and depth, rather than the quality of the articles, the primary empirical contributions are related to those factors. The summary of the results is presented in Figure 8.

The significant proportion of XAI-related papers in the empirical research of AI ethics (PEC1) corresponds to the research of Jobin et al. [89], who noted that transparency is the most frequently highlighted principle in AI ethics. The result also reflects the overall importance of and interest in XAI. At the same time, it illustrates XAI's connection to real-world problems, as the topic is studied with empirical methods.
The interest in proposing novel computational solutions (PEC2) suggests that the field is still young and lacks practical results to validate. The broader research area of AI ethics is likewise interested in finding technical solutions to ethical problems [34], which is consistent with this finding. To our knowledge, no previous research has analyzed the type of research done in the field, so the relation to existing research may be shallow.
Previous research has shown that the human role and perspective are understudied subjects, both from the user's and the practitioner's point of view [2,45,57,58]. The same finding was evident in this SMS (PEC3). Related to the lack of research on users' and practitioners' expectations, a more specific gap was the lack of research on XAI's industrial implementation (PEC4). Vakkuri et al. [171] pointed out an analogous dilemma with AI ethics. Their research is one of the few papers cited in this SMS that aims to understand the current state of the practical implementation of ethical principles.
Unlike black box problems, where the research field is distanced from real-world problems [139], XAI makes a strong contribution to addressing real-world problems (PEC5). The vast majority of the papers focusing on black boxes used real-world data in their research. In addition, in most of the papers, societal issues were highlighted in background sections or introductions. This paper has brought some novel perspectives to the research area, contributed to existing research, and contradicted some prior perspectives. It is important to remember that in an SMS, the papers are not studied as profoundly as they are in an SLR. To form a more in-depth conclusion, the research should be continued with an SLR, which could provide new insights.

Practical Implications
Only some of the PECs had a clear practical contribution; hence, not every PEC is analyzed for its relevance to practitioners. The research field has a close connection to real-world problems (PEC5). The research provides knowledge and perspective to regulators and communicators by contributing to the field and tying the research to societal issues. For practitioners looking for specific solutions, the research area offers open-source models tested with real-world data that practitioners can benchmark and modify to fit their needs (PEC2 and PEC5). There are many practical solutions and models built in academia; hence, the collaboration potential between academia and practitioners is significant (PEC2).
In contrast to the above points, since the research field is new and emerging, a shortage of practical implementation is recognizable (PEC3 and PEC4). There is no guarantee that the research area's solution proposals have the potential to serve practitioners and users or will ever be implemented in practice (PEC3). The current practical implementation level of XAI solutions is unknown, as are the expectations and interests of business decision-makers. If decision-makers do not understand the need for XAI, the practical implementation of XAI in businesses is not likely to happen on a larger scale (PEC4). The summary of results is presented in Figure 9.
In conclusion, the analysis of practical implementation revealed the potential for even closer collaboration between practitioners and academia. At the same time, the research gap in understanding the perspectives of practitioners, users, and business decision-makers can hinder the practical implementation of XAI solutions. Overall, more research is required to advance knowledge and further develop the field.

Limitations of the Research
A common bias that systematic reviews suffer from is that positive outcomes are more likely to be published than negative ones [20]. Especially in the corpus of empirical research, this may lead to a lack of validation studies and leave out solutions that did not work as expected. The inclusion of conference proceedings is one way to mitigate publication bias [20]. Thus, this bias should be reduced in this paper.
The framing of the research question posed limitations to this study. Since the focus of the paper was to understand the research field of AI ethics and the role of XAI in the field, the mapping undertaken for the present paper provided this specific viewpoint. However, this viewpoint has its challenges because the definition excludes all research papers with a focus on AI's interpretability without a clearly visible relation to ethical concerns.
Due to the variety of vocabulary used in these research topics, there is also uncertainty as to how accurately the search keywords reflect the underlying research area. As the keywords were limited to ethics and its synonyms and AI and its synonyms, there is a chance that key papers have been missed. These papers may be relevant, yet if there was no mention of AI and ethics in the abstract, the papers were not included. Thus, it is important to note that for the primary search, the keywords could have been expanded to include concepts and principles related to responsible AI, such as transparency and accountability.
A larger scope of technology-centered terminology, for example, computer ethics, could also have been included in the search. However, while computer ethics and AI ethics are related fields, they are distinct from one another. Computer ethics is a branch of applied ethics that focuses on the ethical issues related to computer technology. It encompasses a wide range of issues, including privacy, security, intellectual property, access to information, and the impact of computer technology on society [38]. AI ethics, on the other hand, is a more specific field that focuses on the ethical issues arising from the development and use of artificial intelligence (AI) systems [89]. While both fields are concerned with ethical issues related to technology, AI ethics is a narrower and more specialized field, focusing specifically on the unique ethical challenges presented by AI systems.
We chose to scope our research to a 10-year period, which raises some potential limitations. By leaving out research done prior to 2012, the study may miss historical events and developments that have shaped the research field, which may lead to an incomplete understanding of the field's evolution. We acknowledge that our study may lack historical context, but our focus was on recent trends and developments. It is possible that a 10-year period may not be enough to fully cover all the topics and issues in the field, potentially leading to a narrow or incomplete analysis. Also, the findings may not be generalizable to the entire AI ethics field. For these reasons, we narrowed down the scope to the role of XAI in the AI ethics research field, to gain a more detailed view of the state of this sub-field. Through doing this, we found that the oldest papers added to the final sample were from 2017. Hence, leaving out research done prior to 2012 is justified, as it is unlikely to significantly affect the final sample.
During the primary search, some limitations were faced with the databases. Each database was screened, starting from the oldest papers, to track its potential changes during the screening process. However, with the larger databases of Scopus, Web of Science, and ProQuest, the number of hits varied between searches. Because of these problems, there is a chance that not all relevant papers were included. However, including all three multi-disciplinary databases in the study reduces the possibility that any relevant paper was missed.
After the primary search, the sample size (N = 4,411) was larger than expected, which limited the amount of attention dedicated to each paper during the screening and inclusion process. In other SMS studies, the initial take-in from separate databases has been significantly lower, for example 1,062 papers [168], 1,769 papers [128], and 2,081 papers [27]. Due to the large sample, the literature search and inclusion processes were conducted mostly by one reviewer per paper. Thus, there is a chance for human error and false classification during the screening process. To ensure better quality, if the reviewer felt uncertain about a paper, the paper was tagged, and another reviewer provided a second opinion. The papers included after each inclusion phase were re-evaluated during the following phases. However, the papers excluded during the early inclusion were not further evaluated, increasing the possibility that a suitable paper would be missing from the final study because of manual labeling failure.

FUTURE RESEARCH
There is potential to continue the SMS with the collected dataset to gain a more in-depth understanding of the AI ethics research field. The literature search and inclusion process were performed with clear guidelines and in a disciplined manner, following a stringent search process, which enables future use of the research material [94]. One potential research direction is to use the collected dataset with empirical evidence (n = 503) to observe other emerging themes in the research field of AI ethics, such as health, education, or regulation. In the future, the results could be extended by expanding the dataset via adding new keywords in the primary search. Closely connected terms such as transparency and responsibility would provide deeper insight into the ethical perspective.
The SMS revealed research gaps in the existing corpus. There is a need to study how humans perceive XAI, what they expect from XAI systems, and whether they value them at all. That knowledge could guide the research area to search for solutions that are needed. Cross-disciplinary research between computer scientists and humanists could continue to provide exciting insights to the field, as already demonstrated in research on the human perspective in AI (e.g., [114]).
There is a shortage of understanding regarding users' and practitioners' expectations, needs, and attitudes toward XAI, and no research on the managerial perspective of XAI was identified. A more profound understanding of the current implementation level is needed to ensure that the research has value for practitioners and business decision-makers. The research area would benefit from a more advanced understanding of the industrial implementation of, and especially the managerial perspective on, transparent systems in companies using AI solutions. Top managers are the final decision-makers, and they are accountable for their products' actions. Moreover, they are the gatekeepers of funding for development. To ensure the solutions proposed in papers are implemented in practice, it is necessary to understand what business decision-makers want and where they are ready to invest.

CONCLUSION
In this paper, the SMS method was utilized to visualize how XAI is researched in the field of AI ethics. SMS was chosen to provide a broader perspective on AI ethics, elucidate the research area in a more profound way, and clarify the role of XAI in the research area. There is potential to continue the SMS compiled in this paper to gain a more in-depth understanding of the AI ethics research field or other emerging topics in the research area.
The expected findings included a mapping of the covered topic and an analysis of when, how, and why the research was done to reveal potential research gaps. The research question was, "What is the role of XAI in the research field of AI ethics?" and the following three sub-questions were identified:
R1 What has been empirically researched in the field of AI ethics?
R2 What is the state of published research on XAI in the field of AI ethics in the past 10 years?
R3 Where are the research gaps in the field?
The main interest behind this paper was XAI's practical implications. Hence, the research was narrowed to empirical papers.
A quick analysis of the dataset of empirical research in AI ethics (n = 503) revealed that overall, AI ethics research is rather theoretical, as only 23% of manually included papers (n = 2,192) used empirical evidence. Empirical research grew significantly in 2018 and has kept growing each year since. Similarly, the research focus on XAI grew significantly in 2018 and has kept growing ever since. XAI is a significant area in AI ethics research with empirical evidence, as 28% of the papers (n = 503) contributed to issues related to XAI.
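The shares quoted above can be checked with a few lines of arithmetic. The following is a minimal sketch in Python; the variable and function names are illustrative and not part of the study, and the XAI paper count it derives is only an approximation implied by the rounded 28% figure, not a number reported in this paper.

```python
# Counts reported in the text; the names are illustrative assumptions.
manually_included = 2192   # papers kept after manual inclusion
empirical_papers = 503     # papers that used empirical evidence


def share(part: int, whole: int) -> float:
    """Return part / whole expressed as a percentage."""
    return 100.0 * part / whole


# Share of manually included papers that used empirical evidence.
empirical_share = share(empirical_papers, manually_included)

# The text reports that 28% of the empirical papers relate to XAI.
# The implied count is an estimate derived from that rounded percentage.
xai_share_reported = 0.28
xai_estimate = round(xai_share_reported * empirical_papers)

print(f"Empirical share of included papers: {empirical_share:.1f}%")
print(f"Approximate number of XAI-related papers: {xai_estimate}")
```

Under these assumptions the empirical share comes out at roughly 22.9%, which rounds to the reported 23%.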
In terms of its current state, XAI is a growing research area that is close to real-world problems. Most of the papers were more concerned with the technical or design perspective of the problem than with the practical challenges of implementation. This indicates that XAI is still mainly interpreted as an academic challenge. The field would benefit from a more robust understanding of the needs, expectations, and attitudes of users and practitioners. Future research is required to understand how XAI is perceived by business decision-makers. This could help to take research findings and solutions to practice.

[112] | Philosophical | Advice | Attitudes (users) | Partial
Mitchell et al. (2020) [115] | Proposal | Procedure | Bias | Partial
Nirav et al. (2020) [6] | Philosophical | Procedure | Attitudes (users) | Marginal
Oppold et al. (2020) [121] | Proposal | Procedure | Bias | Partial
Orr et al. (2020) [122] | Experience | Model | Attitudes (practitioners) | Partial
Paraschakis et al. (2020) [126] | Proposal | Specific solution | Bias | Partial
[190] | Proposal | Procedure | Attitudes (users) | Full
Yoshikawa et al. (2021) [194] | Proposal | Specific solution | Bias | Partial
Yu et al. (2021) [195] | Proposal | Specific solution | Bias | Partial
Zicari et al. (2021) [200] | Proposal | Procedure | Accountability | Partial
Aïvodji et al. (2021) [17] | Proposal | Tool | Bias | Full
Blanes-Selva et al. (2021) [29] | Proposal | Specific solution | Black Box | Marginal
[165] | Proposal | Procedure | Bias | Partial