Discovering Explainability Requirements in ML-Based Software

As the demand for Machine Learning (ML)-based software continues to grow across industries such as healthcare, automotive, energy, and banking, there is an increasing need for explainability requirements. Domain experts such as doctors must have confidence in ML-based software before integrating it into their professional practice. This requires developers to produce clear explanations of how the underlying ML models work as they build these systems. While numerous philosophies and techniques for eliciting user requirements have been extensively studied within Requirements Engineering (RE), scholars argue that new approaches are needed, tailored to eliciting explainability requirements. This PhD research aims to conduct empirical studies examining emerging methodologies and philosophies for identifying explainability requirements. The objective is to bridge theoretical insights and the practical approaches adopted by practitioners in this rapidly evolving field.


INTRODUCTION
As ML components become integral parts of software systems, a distinct category of requirements is gaining prominence: the need for explainability [1, 7]. Users now seek comprehensive explanations of how ML-based systems function in order to trust the systems' outputs. This is particularly important for domain experts, such as doctors who rely on ML-based software for patient diagnoses [8], or police officers deciding which neighborhoods to patrol [20]. When these users consider trusting ML models over their own professional judgment, it is reasonable to assume that they require explanations that build trust in the system. This research explores the key attributes of effective explanations and how developers can identify and address these explanation requirements.
Developing systems from upfront requirements often leads to a familiar problem: the system does not meet users' expectations at launch. Three critical problems underlie this: communication flaws between developers and users, incomplete or hidden requirements, and changing requirements [4]. It is reasonable to believe that the same problems affect the elicitation of explainability requirements. In response, Software Engineering methods such as Continuous Software Engineering (CSE) address these RE challenges by deploying code continuously, on a weekly, daily, or even hourly basis, right from the start of development [5]. This approach frequently exposes users to the latest updates, allowing their reactions to be monitored continuously. Evaluating how users respond to these updates throughout the software's lifetime is essential, as it can unveil previously unknown requirements. Each deployment provides an opportunity to gain fresh insights into user preferences and requirements.
In this landscape, the development of ML-based software is drawing inspiration from CSE principles [16], particularly continuous feedback. The goal is to consistently gather user feedback to refine ML models and adapt the software to meet user needs. This iterative process may unveil new requirements as users integrate ML-based software into their work processes, daily lives, and habits.
Thus, user feedback can provide developers not only with system requirements but also with explainability requirements. User feedback can give clues about what explanations users need in order to trust the software. For developers, qualitative feedback processes are necessary to grasp how the software impacts users [17]. Such qualitative feedback can take the form of, for example, interviews and user observations, and it plays a pivotal role in gaining a deep understanding of the user domain and uncovering the consequences of introducing software [15, 17]. However, data scientists, who are often responsible for developing the ML component and its explanations, typically come from a background rooted in quantitative computational methods. They may tend to overlook qualitative methods, perceiving them as subjective and lacking generalizability. Despite this perception, qualitative thinking plays an implicit role in data science, as it influences the formulation of problems, the definition of variables, and the interpretation of patterns [9]. In a somewhat contradictory scenario, developers from a quantitative tradition find themselves needing to conduct qualitative inquiries to discover explainability requirements and to understand the impact of ML-based software on their users.

RELATED WORK
A common issue in requirements elicitation is that users often struggle to articulate their needs clearly. A recent literature review found that this problem is amplified in the development of ML-based software due to users' limited literacy and understanding of ML models, coupled with unrealistic expectations of ML capabilities [11]. The review also highlights that the crucial aspect of explainability is frequently neglected during requirements elicitation. This becomes problematic because users tend to distrust ML if they cannot comprehend its reasoning process, leading to low user satisfaction. Furthermore, a mapping study on Requirements Engineering for AI systems underscores a fundamental distinction: traditional non-AI software is deterministic, while AI systems can produce unexpected outcomes [1]. That study concludes that further research is warranted in the areas of explainability requirements and data requirements. Additionally, a recent study conducted in Finland [2] sheds light on the practices of companies that utilize or develop ML-based software. It reveals that these companies consistently rank explainability as one of the most critical factors for gaining the trust of their customers and users. This underscores the significance of explainability as a crucial industry topic.
In several fields, there is an emphasis on qualitative sensemaking during the development and adoption of ML-based software, as advocated in the literature [3, 17-19]. This body of literature highlights the importance of qualitative investigations in comprehending the complex challenges users face when adopting ML-based software and in discovering explainability requirements. Describing and understanding the methods developers use for this purpose is therefore important.

RESEARCH PROBLEM
Developing ML-based software differs significantly from developing traditional software [1], requiring new strategies for eliciting requirements, especially explainability requirements. In this PhD study, I propose conducting explorative case studies to answer the following questions. RQ1: How do developers apply requirements engineering strategies to continuously elicit explainability requirements throughout the development process of ML-based software? Specifically, I aim to scrutinize the feedback loop between developers and users, exploring how it differs, if at all, from the processes in typical software engineering. This includes investigating whether there is a preference for quantitative feedback over qualitative feedback, and if so, the reasons behind it.
RQ2: Why do different domain experts require distinct types of explanations? Each domain possesses its own set of experts and contextual factors. This inquiry delves into the reasons behind the varying explanatory needs of, for instance, doctors as opposed to police officers. Additionally, it seeks to investigate whether domain experts can be categorized into distinct archetypes based on their expectations regarding explanations.

RESEARCH PLAN
To address the research questions, I propose qualitative empirical case studies within the PhD period 2023-2027. This approach aims to investigate a strategic sample of Norwegian enterprises developing ML-based software. I adhere to Yin's guidelines for case studies and Runeson's protocols for case studies in software engineering [12, 21]. The study is structured into four phases. First, I will conduct a literature review to gather existing knowledge on requirements elicitation strategies and explainability requirements for ML-based software. Second, an exploratory multiple-case study will generate assumptions and initial insights; this phase explores various organizations to gain a comprehensive understanding of diverse requirements strategies. Third, a descriptive multiple-case study, encompassing two or three teams in distinct companies, will illustrate the use of user feedback under different practical conditions and facilitate comparative analysis. Lastly, an explanatory single-case study will be conducted to comprehend the underlying reasons for the observed phenomena and move toward formulating theoretical propositions. In this manner, the study design allows for an in-depth exploration of the phenomena, followed by their description, and ultimately an attempt to discern why they manifest as they do.
The selection of appropriate cases will adhere to several criteria. First, the enterprises must develop ML-based software. Second, they must develop software for users who are accessible for research purposes. Moreover, a preferable condition is that the software development takes place in-house. This preference stems from the understanding that when a company opts to develop its own ML-based software, this underscores the software's strategic significance. Such a company, by choosing to directly manage the development process, provides an assurance of continuity and depth for the study.
For each case study, the research plan involves conducting 10 to 15 interviews and engaging in approximately four weeks of participatory observation. This would culminate in a total of 40 to 60 interviews and three to four months of observational study. I will primarily draw upon development teams as informants, focusing on roles close to the user insight work. Further, the inclusion of users as informants is essential, as their unique perspectives may enrich the explanatory components of this thesis, given their direct involvement with the phenomenon under investigation. Qualitative methods offer valuable insights into the intricate dynamics surrounding human behavior and enable the understanding of emerging phenomena in the context of new technologies [14].
To interrogate entrenched beliefs and assumptions prevalent in areas such as Requirements Engineering, the Phenomenon-focused Problematization Framework will be employed to identify fundamental assumptions in prior research and to form research questions that challenge them, as opposed to spotting gaps in the literature [6, 13]. This approach, which facilitates a critical examination of the underlying assumptions that ground existing theories, was recently advocated in the Editor's Comments of the December 2022 issue of MISQ [10]. My ambition is to contribute to the discussion on the future of RE methods within SE by publishing four papers at high-ranking conferences such as ICSE, ESEM, REFSQ, XP, and ICIS, and two papers in high-ranking journals such as EMSE, JSS, IST, RE, ACM TOSEM, and IEEE TSE.
Should my PhD research prove successful, it will advance our understanding of the methods by which developers of ML-based software can elicit explainability requirements and offer actionable guidance on how to execute these processes in practice.