Dealing with Uncertainty: Understanding the Impact of Prognostic Versus Diagnostic Tasks on Trust and Reliance in Human-AI Decision Making

While existing literature has explored and revealed several insights pertaining to the role of human factors (e.g., prior experience, domain knowledge) and attributes of AI systems (e.g., accuracy, trustworthiness), there is a limited understanding around how the important task characteristics of complexity and uncertainty shape human decision-making and human-AI team performance. In this work, we aim to address this research and empirical gap by systematically exploring how task complexity and uncertainty influence human-AI decision-making. Task complexity refers to the load of information associated with a task, while task uncertainty refers to the level of unpredictability associated with the outcome of a task. We conducted a between-subjects user study (N = 258) in the context of a trip-planning task to investigate the impact of task complexity and uncertainty on human trust and reliance on AI systems. Our results revealed that task complexity and uncertainty have a significant impact on user reliance on AI systems. When presented with complex and uncertain tasks, users tended to rely more on AI systems while demonstrating lower levels of appropriate reliance compared to tasks that were less complex and uncertain. In contrast, we found that user trust in the AI systems was not influenced by task complexity and uncertainty. Our findings can help inform the future design of empirical studies exploring human-AI decision-making. Insights from this work can inform the design of AI systems and interventions that are better aligned with the challenges posed by complex and uncertain tasks. Finally, the lens of diagnostic versus prognostic tasks can inspire the operationalization of uncertainty in human-AI decision-making studies.


INTRODUCTION AND BACKGROUND
With the emergence of human-AI decision-making as a prominent paradigm across various domains, numerous investigations have been dedicated to understanding the factors that can impact trust and reliance on AI systems [84,138,142]. Such factors can be broadly classified into three primary categories: human-related factors [35,95,96], attributes of the AI systems [94,98], and characteristics of the decision-making tasks [16,56,126]. Human factors such as prior experience [110,119], cognitive biases [85,102], and AI literacy [25] can shape individuals' perceptions of and interactions with AI systems. Attributes of the AI system include aspects such as predictions generated by AI [66,76,99], information about model predictions [11,31,93], as well as interventions that impact cognitive processes [17]. Furthermore, the level of trust and reliance on AI may differ across various domains and applications due to the attributes associated with decision tasks [42,127].
The characteristics of tasks have been demonstrated to play a pivotal role in determining the level of reliance on AI systems, emphasizing the importance of methodically recognizing and comprehending these features in the human-AI decision-making context. However, only a limited set of task characteristics has been systematically explored, and their impact on human reliance on AI systems is not yet fully understood [68,109]. Although a few studies have included multiple tasks with varying attributes [6,14,131], a systematic and empirical understanding of task features is notably absent from existing literature [68,109]. Additionally, it remains unclear whether task attributes chosen in existing empirical studies have been appropriately considered, in a manner that is commensurate with the claims of the studies [42,68,75]. These limitations have the potential to undermine the credibility and generalizability of research findings, hindering our progress in developing effective strategies for human-AI decision-making [68,109].
In this work, we propose empirically examining task complexity and task uncertainty as two essential objective task characteristics that are manipulable from the task's standpoint. Task complexity pertains to the characteristics of a task that contribute to an increased load of information [133], and it is distinct from task difficulty [100], which relates to an individual's perception of the task based on their capabilities and previous experience [133]. It has been shown that task complexity is a crucial factor in determining both human performance and behaviour [3,23,83], as well as the success of human-AI teams [9]. Additionally, prior work has demonstrated that individuals tend to rely more heavily on AI systems when confronted with more complex tasks [28] due to the challenges associated with analyzing large volumes of information [23]. In line with work by Parkes [100] and Vasconcelos et al. [126], we operationalize task complexity as an objective task-related characteristic that can be measured based on the number of constraints involved in the task. On the other hand, the level of task uncertainty refers to the extent of unpredictability inherent in a given task [29]. We operationalize uncertainty in our study using diagnostic and prognostic tasks to capture different levels of uncertainty. Diagnostic tasks involve situations where participants are provided with detailed and comprehensive information about the task, (theoretically) enabling them to make accurate decisions. Prognostic tasks, on the other hand, involve situations where participants must make predictions about future events based on incomplete or limited information. By operationalizing uncertainty in this manner, we can effectively capture the diverse levels of uncertainty that arise from the inherent nature of a task and its connection to information availability. Intuitively, in prognostic tasks, users can benefit from using AI systems due to their ability to reduce uncertainties, particularly when choosing the optimal route for a future trip by considering anticipated weather and traffic conditions. Unlike planning immediate trips, this task entails a greater degree of uncertainty owing to the unpredictability of future events.
Prior work has highlighted that appropriate trust and reliance play a critical role in achieving complementary human-AI team performance [58,90,139,141]. Thus, it is essential to comprehend how task-related factors impact human trust and reliance on AI systems, as separate constructs [63,90,111], to foster successful collaboration between humans and AI. We thereby address the following research questions:
RQ1: How does task complexity influence user trust and reliance on an AI system?
RQ2: How does task uncertainty, characterized by prognostic versus diagnostic tasks, influence user trust and reliance on an AI system?
RQ3: How does task complexity interact with task uncertainty to shape user trust and reliance on an AI system?
To address these research questions, we selected the real-world scenario of trip-planning, where both task complexity and uncertainty are prominent factors. In such scenarios, individuals are confronted with circumstances that necessitate a choice between relying on an imperfect AI system or exercising their own judgment. We conducted a 3 (task complexity) × 2 (task uncertainty) between-subjects study with 258 participants recruited from the Prolific crowdsourcing platform.
We found that users' reliance on the AI system varied depending on the level of complexity and uncertainty in the task. Individuals facing tasks characterized by medium complexity and uncertainty (i.e., prognostic tasks) tended to rely excessively on the AI system. However, their ability to differentiate accurate AI advice from misleading advice was compromised, leading to relatively low appropriate reliance, higher over-reliance on AI, and subsequently lower overall task performance. Nevertheless, we observed a point of transition where participants started to increase their appropriate reliance on the AI system. This led to enhanced overall performance in prognostic tasks with high complexity, revealing a significant interaction between complexity and uncertainty.

RELATED WORK
2.1 Human-AI Collaborative Decision-Making
In recent years, the use of AI technologies has evolved to encompass more collaborative approaches that involve both humans and AI systems working together [5,21,22,73,129]. Fully automated decision-making by AI systems may not always be appropriate, as certain tasks still require human judgment. For example, in high-stake scenarios such as in the medical [39,61,67,97], legal [6,81,86,131], and financial [27,36,43,45,46] domains, individuals tend to exhibit a preference for human decision-makers over AI systems. This preference could be motivated by ethical and legal concerns [68,74,104], as well as a desire for individual agency and accountability [54,70,81,117]. Additionally, it may also stem from the limited trust [18,19] surrounding AI systems, coupled with concerns about potential biases or errors in algorithms [77,120], particularly when human lives or ethical considerations are at stake due to possible failures of AI systems [68,74,104].
The primary objective of integrating humans and AI is to unite their respective strengths, resulting in enhanced decision outcomes through complementary capabilities [17,51]. To this end, previous research has focused on identifying the factors that influence human-AI decision-making. Recent studies have explored variables that contribute to the fairness [31,76,124,130] and trustworthiness [34,48,80,139] of AI systems, as well as the impact of assigning different decision-making roles to humans and AI on the reliance on such systems [52,103,122,144]. Prior work has also been dedicated to developing and evaluating interfaces [15,30,87,89] and visualizations [43,49,134,137,140] aimed at improving human-AI collaboration.

Trust and Reliance on AI Systems
It is important to distinguish between trust and reliance, as they have different implications for the context of human-AI decision-making. Lee and See [71] proposed the following definition of trust, which we adopt for the scope of our work: Trust is an attitude that an agent will achieve an individual's goal in a situation characterized by uncertainty and vulnerability.
Reliance, on the other hand, refers to the extent to which individuals rely on AI systems [71,128]. When user decisions differ from AI advice, there are mainly three discernible patterns of reliance behavior [7,112,115]: (i) appropriate reliance, switching to the AI advice when it is correct and overriding it when it is incorrect; (ii) over-reliance, excessively relying on AI advice even when it is incorrect; and (iii) under-reliance, not fully utilizing AI advice even when it is correct. While trust is an essential factor in determining the level of reliance on AI systems [55,63,71,111], it does not guarantee reliance. Prior studies have shown that individuals may not necessarily increase their reliance on AI systems even if they trust them [62,63,90]. Instead, they might rely more on their own judgments despite acknowledging the capabilities of the AI system. This highlights that the trusting behavior of users can differ from their trusting beliefs. Individuals' evaluation of a system's trustworthiness, which establishes its perceived trustworthiness, significantly influences (subjective) trust and trusting behaviour (i.e., objective reliance) [113]. Even if a system is trustworthy, it does not automatically ensure accurate perceived trustworthiness [8,113]. To align the perceived trustworthiness of AI systems with their actual value, it is essential to consider aspects like the availability and relevance of system information and the detection and utilization of this information by human decision-makers [113]. Trust in AI systems, namely perceived trustworthiness, can be evaluated through different methodologies, including subjective self-reported measures [26,59,63,132] and relatively objective trust-related behavioral measures [51,128,138,142], such as agreement and compliance.
Through a wide range of studies, researchers have consistently found that reliance on AI systems is influenced by various factors including human-related aspects [37,51,81,101,120], attributes of the AI systems [43,79,106,107], and characteristics of the decision-making tasks [9,12,16,46,126]. Human factors encompass a variety of individual characteristics, including previous experience [95,110], cognitive biases [85,102], and AI knowledge [25]. For instance, cognitive [35,68,96] or meta-cognitive biases [51] have the potential to influence how individuals comprehend and appraise the outcomes generated by AI systems, which in turn can affect their reliance on AI. In addition, the attributes of AI systems can enhance decision-making outcomes [68]; these attributes include aspects such as predictions generated by AI [66,76,99], information about AI predictions or AI systems themselves [13,70,118,136], and interventions that impact cognitive processes [65,99,105]. For instance, various explanation methods have been explored to enhance the interpretability and transparency of AI algorithms, allowing humans to better understand AI advice [2,50,66]. Banovic et al. [8] discovered that reliance on AI systems is negatively affected when untrustworthy AI systems overstate their capabilities compared to trustworthy ones. This is primarily because users struggle to differentiate between the competence of trustworthy and untrustworthy AI systems, leading to deception and excessive reliance on the untrustworthy system. Moreover, the characteristics of the decision-making tasks can also significantly impact human reliance on AI systems [68,109]. Hence, the level of reliance may differ across various domains and applications due to the attributes associated with decision-making tasks [42,127]. For instance, in high-stake fields like healthcare or finance, individuals may exhibit distinct behaviours compared to low-stake areas such as entertainment [89,136].
Recent research has revealed several challenges in fostering appropriate reliance on AI systems. Prior work has shown that depending on different factors [126,143], users may blindly follow AI advice, leading to over-reliance [17], or underestimate the capabilities of AI, resulting in under-reliance [37,131]. To overcome such challenges and improve performance-related outcomes, it is important to ensure that users can strike a balance between utilizing AI effectively while also considering the limitations of a given AI system. To this end, researchers and practitioners have explored the use of explanation methods [66,94,126], interventions such as tutorials [25,84], and cognitive forcing functions [17] to foster appropriate reliance on AI systems with varying degrees of success.
Building on this body of literature, our study aims to enhance the comprehension of appropriate reliance on AI systems in human-AI decision-making by investigating how task complexity and uncertainty influence user trust and reliance. To this end, we conducted a between-subjects study in the context of a trip-planning task. We measured the extent to which individuals rely on AI systems for decision-making in various conditions by leveraging a series of common metrics in the field.

Task Characteristics in Human-AI Decision-Making
Although much attention has been given to the effect of human and AI-related factors in shaping human reliance on AI, few studies have explored the influence of task characteristics. Lee [75] found that individuals exhibited lower trust in AI systems in tasks that involve human skills, such as work evaluation, compared to tasks that require more analytical skills. Additionally, Vasconcelos et al. [126] have examined the concept of task difficulty by considering the cognitive load required. Their findings indicate that as tasks become more difficult, there is a tendency among users to rely excessively on AI advice, leading to over-reliance.
A few studies have also explored the effect of task features on human-AI team performance. Bansal et al. [9] conducted a study where participants had to assess whether objects passing through a pipeline were defective or not. They manipulated the complexity by changing the number of task features, such as color, shape, and size. They found that an excessive number of task features diminished the performance of human-AI teams significantly. Similarly, in a study by Poursabzi-Sangdeh et al. [105], participants were presented with varying numbers of features to predict apartment selling prices. The features included variables such as the number of rooms, area size, days on the market, distance to amenities, and building maintenance fees. They also found that participants struggled to distinguish AI errors in tasks with more features, leading to decreased performance. In contrast, Tolmeijer et al. [120] showed that the complexity of tasks did not significantly impact human-AI performance due to a learning effect. They conducted an experiment in which participants were tasked with finding a suitable house based on a set of constraints. The complexity of the tasks was manipulated, with some scenarios having three constraints (such as rent type, budget, and registration condition), while others had five constraints (including rental duration and proximity to amenities). Buçinca et al. [16] conducted a study examining the influence of proxy tasks, where participants were tasked to anticipate AI advice, compared to actual tasks where participants directly received AI advice. Their results indicate that participants' behavior in proxy tasks did not align with their behaviour in actual tasks, underscoring the importance of carefully designing experiments to draw valid conclusions. Additionally, high-stake [6,45,46,97] tasks and low-stake [44,46,66] tasks have been studied individually in literature in relation to human reliance on AI systems.
Furthermore, there is a lack of comprehensive investigations into categorizing task attributes and their specific implications for human-AI decision-making [109]. Lai et al. [68] proposed a framework that categorizes task characteristics in terms of their domain, required expertise, risk, and subjectivity. According to Lai et al. [69], tasks can also be differentiated based on whether they are emulating human intelligence, like object recognition [20], or based on discovered patterns in data such as recidivism prediction [86]. Some prior works have also provided a taxonomy of task types existing in the literature [1,92]. However, these taxonomies often focus on general task types rather than specifically addressing the impact of these characteristics on human-AI decision-making. De-Arteaga et al. [29] introduced the distinction between diagnostic and prognostic tasks: diagnostic tasks have a clear ground truth, while prognostic tasks involve making predictions about future outcomes. They emphasized that the level of inherent uncertainty in predicting future outcomes is a crucial factor that can impact human reliance on AI systems. Inspired by this work, we operationalize task uncertainty in our study using the distinction between diagnostic and prognostic tasks.
In this paper, we aim to fill an empirical and research gap by examining the impact of task complexity and uncertainty, as important attributes in decision-making in real-world contexts. By providing application-grounded evaluation [32] with users relying on an AI system for assistance in practical tasks, our work is the first to explore task uncertainty and how task uncertainty interacts with task complexity in shaping human-AI decision-making.

HYPOTHESES AND TASK DESIGN
3.1 Hypotheses
The degree of task complexity is deemed one of the primary indicators for determining the success of human-AI teams [3,9,23,83]. Consequently, it can be anticipated that as tasks become more complex, their influence on human reliance on AI systems increases [23,82,105]. More complex tasks tend to require more cognitive effort [23], making individuals more likely to rely on AI systems for assistance. Moreover, as task complexity increases, the verifiability [40] and plausibility [57,60] of AI advice tend to decrease. This can pose challenges for individuals in distinguishing misleading AI suggestions, leading to reduced levels of appropriate reliance on AI systems. Although there may not be a correlation between trust and reliance on AI systems [63,71,90,114], prior work suggests a higher likelihood of individuals placing greater trust in AI systems for more complex tasks [53,71].
When faced with prognostic tasks, individuals are likely to perceive them as more complex and unpredictable, thus increasing their reliance on AI systems for assistance. With the presence of uncertainty in a task, individuals may lack sufficient capability to verify the correctness of AI advice and therefore rely more heavily on the AI systems [29], leading to reduced appropriate reliance on AI systems. Previous research has also demonstrated the influence of uncertainty on trust formation in AI systems [121]. Considering highly complex and prognostic tasks, we hypothesize that individuals exhibit higher levels of trust and reliance on AI systems while showing a decrease in appropriate reliance. This could be due to the high cost of engaging cognitively in complex decision-making processes, leading to a greater reliance on AI systems for guidance [126]. Therefore, we formulate our hypotheses as shown in Table 1.

Trip-Planning Task
We chose trip-planning as the scenario for our study for two primary reasons. First, trip-planning is a common real-world problem that individuals frequently encounter and for which they seek assistance from AI systems. Second, this task allows us to meaningfully manipulate complexity levels (e.g., the number of constraints) and uncertainty levels in our experimental conditions, thereby enhancing the ecological validity of our findings. In our study, participants are presented with a practical scenario where external assistance is potentially useful to successfully accomplish the task. We utilized an imperfect AI system with a 66.7% accuracy rate for trip-planning and manipulated its features accordingly (cf. section 4.1). This setup with the necessary complexity creates the desired sense of vulnerability and uncertainty, making it a suitable situation for analyzing human trust and reliance on AI systems [58,71]. Note that while trip planning is a frequently encountered real-world task, the inclusion of time and budget limitations makes it unique, affecting how individuals rely on AI assistance.
Figure 1 (caption fragment): ... 5) two-stage decision-making. Note that this screenshot is meant to convey a bird's-eye view of the interface. The interface shown corresponds to a highly complex scenario encompassing all constraints in the prognostic experimental condition with high uncertainty.
Planning a trip involves determining the most suitable route for travel, taking into account factors such as time limitations and budget constraints. Participants are tasked with selecting the trip that minimizes both travel time and expenses. Each task typically consists of multiple components that support participants in making well-informed decisions, as depicted in a bird's-eye view of the task interface in Figure 1.
Quality Control: To ensure the accuracy and reliability of the collected data in our study, we employed multiple methods. First, we offered instructional materials on the interface and task-related features, followed by a training session for participants that included both theoretical instruction and hands-on practice. Second, we evaluated participants' comprehension by administering a quiz on task-related constraints. Individuals who scored below a certain threshold were excluded from the study to maintain the quality of data. Lastly, we incorporated four attention-check questions in the pre-questionnaire and post-questionnaire to screen out individuals who may not be fully engaged or attentive throughout the study. Detailed explanations of these methods are publicly available on our companion page. 1

Table 1: Hypotheses.
H1a: Users demonstrate a lower level of appropriate reliance on AI systems for complex tasks compared to relatively less complex tasks.
H1b: Users trust AI systems to a greater extent in complex tasks compared to relatively less complex tasks.
H2a: Users demonstrate a lower level of appropriate reliance on AI systems in tasks with high levels of uncertainty compared to tasks with low levels of uncertainty.
H2b: Users trust AI systems to a greater extent in tasks with a high degree of uncertainty (prognostic) compared to tasks with lower levels of uncertainty (diagnostic).
H3: Users demonstrate a relatively low level of appropriate reliance on AI systems in tasks with relatively high complexity and uncertainty.

Design Considerations and Setups: Task Complexity vs. Task Uncertainty
Wood's seminal work [133] proposed that task complexity consists of three constructs: component, coordinative, and dynamic complexities. Component complexity relates to the number of features in a task, while coordinative complexity pertains to executing sequences or steps within the task. Dynamic complexity arises from changing world states requiring further considerations at the point of decision-making. We utilized component complexity to define task complexity and operationalized uncertainty as incomplete information in our setup. In dynamically complex tasks, decision-making must adapt as the situation changes, with all information accessible at each point. However, uncertain tasks involve incomplete information at the point of decision-making, setting them apart from dynamically complex tasks. Therefore, it is valid to consider these factors as separate dimensions, although task uncertainty can increase task complexity.

Task Complexity:
To operationalize task complexity in our experimental conditions, we manipulated the number of constraints that are given to participants. This approach has been used in previous studies to control the level of complexity for a given task [9,105,120]. We categorized the tasks into three levels of complexity: low, medium, and high. In low-complexity tasks, participants are presented with four features to consider, while in medium-complexity tasks, eight features are provided. High-complexity tasks entail twelve different features that must be taken into account. This design choice is guided by prior neuroscience research by Miller [91], suggesting that human cognitive capacity for processing information is limited to around seven (± two) chunks of information at a time. Hence, we established five to nine task features as representative of a medium level of complexity based on this finding. Any number exceeding nine would be classified as high complexity, while four or fewer would indicate low complexity [109].
1 https://osf.io/kt8m4/?view_only=c6930ba990c8412cb3948c2cf2b0a39c
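The thresholds described above can be summarized in a few lines of code. The following is a minimal sketch, not the study's implementation: the function name `complexity_level` and the constant `STUDY_FEATURE_COUNTS` are hypothetical, and only the cut-offs (four or fewer, five to nine, more than nine) and the feature counts per condition (4, 8, 12) come from the text.

```python
def complexity_level(num_features: int) -> str:
    """Map a number of task features to a complexity level.

    Cut-offs follow the text: four or fewer features are low complexity,
    five to nine are medium, and more than nine are high.
    """
    if num_features < 1:
        raise ValueError("a task needs at least one feature")
    if num_features <= 4:
        return "low"
    if num_features <= 9:
        return "medium"
    return "high"


# Feature counts reported for the three complexity conditions
# (the constant name itself is hypothetical).
STUDY_FEATURE_COUNTS = {"low": 4, "medium": 8, "high": 12}

for level, count in STUDY_FEATURE_COUNTS.items():
    print(level, count, complexity_level(count))
```

Each reported feature count falls inside the band of its own condition, so the three conditions are separated by the stated cut-offs rather than sitting on a boundary between two levels.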
3.3.2 Task Uncertainty: Diagnostic tasks entail circumstances where participants are given access to well-defined and comprehensive information about the current task, allowing them to make precise judgments [29]. Prognostic tasks, on the other hand, involve scenarios in which participants are presented with restricted or unclear data and need to generate predictions regarding future outcomes [29]. The necessity to anticipate uncertain results gives rise to increased uncertainty throughout the process of making decisions. To operationalize uncertainty in the contrasting experimental conditions pertaining to diagnostic and prognostic tasks, we employed various strategies. For diagnostic tasks, participants are instructed to schedule a trip for the present moment within the narrative, while for prognostic tasks, participants are assigned to plan a trip that will take place two weeks later. Next, we customized the way task attributes are presented to align with the level of uncertainty. In situations involving diagnostic tasks, participants are given precise values for each constraint, eliminating any potential ambiguity. On the other hand, in prognostic tasks, a certain degree of uncertainty is introduced by offering participants ranges or estimates instead of exact values for each attribute. We also presented the probability of different outcomes for certain constraints. For example, we highlighted the high likelihood of encountering traffic congestion during the rush hour or the low chance of experiencing rain during the scheduled trip.
We created one task scenario for each task. In total, we generated 24 different scenarios, with four scenarios in each experimental condition that differed in terms of task complexity and uncertainty. The full list of these task scenarios and all code for our implementation is publicly accessible for the benefit of the research community and in the spirit of open science. 2

Task Features:
We designed task features to impart and define constraints in the decision-making tasks such that they do not affect each other and can be independently manipulated and measured. We communicated this independence explicitly and implicitly by ensuring that each feature is presented separately and does not rely on or interact with other features. All task features were inspired by considerations typical in real-world trip-planning contexts. In our research, we can classify task characteristics from two different viewpoints. First, each feature has the potential to influence either the overall duration of travel, the associated expenses, or both factors. Second, each feature can be categorized as being either time-dependent or time-independent. Time-dependent features, such as traffic conditions and weather patterns, are prone to temporal changes based on external factors, and their presentation differs between diagnostic and prognostic tasks. In tasks that have low complexity, we designed an equal distribution of time-dependent and time-independent features. However, for tasks with medium or high complexity, we increase the number of time-dependent features to enhance the degree of uncertainty that needs to be considered in decision-making processes. Detailed explanations of all features are publicly available on our companion page. 2

STUDY DESIGN
4.1 Experimental Conditions
Our study was approved by our institutional ethics board. We designed a between-subjects study with a 3×2 factorial design. The three levels for task complexity were categorized as low, medium, and high, while the two distinct levels for uncertainty were diagnostic and prognostic tasks. We refer to these conditions as LowDiag, LowProg, MedDiag, MedProg, HighDiag, and HighProg. Participants were randomly assigned to one of the six experimental conditions while ensuring a balanced distribution of participants across the different task complexity and uncertainty levels. In each condition, participants were presented with three different task instances to complete with the assistance of an AI system. The three task instances were determined based on each condition's assigned complexity and uncertainty levels. Detailed explanations regarding the complexity and uncertainty levels are provided in section 3.3.
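To make the 3×2 design concrete, the sketch below enumerates the six condition labels and deals participants into them in a balanced, randomized way. This is an illustrative assumption: the text does not describe the actual assignment procedure, and the function name `assign_conditions`, the seed, and the round-robin dealing are hypothetical. Only the condition labels come from the text.

```python
import random
from collections import Counter
from itertools import product

COMPLEXITY = ["Low", "Med", "High"]
UNCERTAINTY = ["Diag", "Prog"]
# Yields LowDiag, LowProg, MedDiag, MedProg, HighDiag, HighProg.
CONDITIONS = [c + u for c, u in product(COMPLEXITY, UNCERTAINTY)]


def assign_conditions(participant_ids, seed=None):
    """Randomly assign participants to conditions with balanced group sizes.

    Hypothetical procedure: shuffle the participants, then deal them
    round-robin so that group sizes differ by at most one.
    """
    rng = random.Random(seed)
    ids = list(participant_ids)
    rng.shuffle(ids)
    return {pid: CONDITIONS[i % len(CONDITIONS)] for i, pid in enumerate(ids)}


assignment = assign_conditions(range(12), seed=7)
print(Counter(assignment.values()))
```

Shuffling before dealing keeps the assignment random for each individual while still guaranteeing near-equal cell sizes, which is one simple way to satisfy both requirements stated above.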
We fine-tuned the AI system to suggest routes that satisfy the given criteria with an accuracy of 66.7% across all experimental conditions. This level of accuracy was chosen since it is helpful if the system is relied on but still involves some risks. Hence, it calls for appropriate reliance instead of blindly following the AI system's advice. This design choice is motivated by prior work emphasizing the role of uncertainty in dictating the need to facilitate appropriate reliance [71]. This implies that within each batch of three task instances that a participant completes, the AI system offers incorrect advice exactly once; to control for potential ordering effects, the position of the incorrect advice is chosen at random.
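The relationship between the batch size and the 66.7% accuracy can be illustrated with a short sketch. It assumes, as a reading of the text, that exactly one of the three task instances receives incorrect advice at a randomly chosen position; the function name `advice_schedule` and its parameters are hypothetical.

```python
import random


def advice_schedule(num_tasks=3, num_incorrect=1, rng=None):
    """Return a list of booleans, True where the AI advice is correct.

    Exactly `num_incorrect` positions are chosen uniformly at random
    to receive incorrect advice (an assumed procedure).
    """
    rng = rng or random.Random()
    wrong = set(rng.sample(range(num_tasks), num_incorrect))
    return [i not in wrong for i in range(num_tasks)]


schedule = advice_schedule(rng=random.Random(0))
accuracy = sum(schedule) / len(schedule)
print(schedule, f"{accuracy:.1%}")
```

With two correct recommendations out of three, every participant sees the same 2/3 (about 66.7%) accuracy regardless of which position carries the incorrect advice.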

Measures
We leveraged a set of objective metrics to quantify participants' reliance on the AI system (cf. Table 2) [58,88,90,113,139,141]. These metrics include Agreement Fraction and Switch Fraction [51,138,142], Accuracy with Disagreement [51], and Relative Positive AI Reliance and Relative Positive Self-Reliance [112]. These measures are commonly adopted in the literature to capture the level of reliance within the human-AI interaction context. In addition to these measures of reliance, we also evaluated participants' decision-making accuracy, which reflects human-AI team performance [11,108]. By measuring trust and reliance variables alongside human-AI team performance, we can gain a deeper understanding of whether performance outcomes result from under-reliance, appropriate reliance, or over-reliance on AI systems.
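As an illustration of how such metrics can be computed from two-stage decisions (an initial decision, the AI advice, and a final decision), consider the sketch below. Table 2 is not reproduced in this section, so the formulas follow definitions that are common in the cited literature and should be read as assumptions rather than the study's exact definitions; the `Trial` record and all function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Trial:
    initial: str   # participant's decision before seeing AI advice
    advice: str    # option recommended by the AI system
    final: str     # participant's decision after seeing AI advice
    correct: str   # ground-truth best option


def _fraction(hits: int, total: int) -> Optional[float]:
    # Undefined (None) when no trial qualifies for the denominator.
    return hits / total if total else None


def reliance_metrics(trials: List[Trial]) -> Dict[str, Optional[float]]:
    disagree = [t for t in trials if t.initial != t.advice]
    # Initial disagreement where the AI is right (so the initial decision is wrong).
    ai_right = [t for t in disagree if t.advice == t.correct]
    # Initial disagreement where the participant is right (so the AI is wrong).
    self_right = [t for t in disagree if t.initial == t.correct]
    return {
        "agreement_fraction": _fraction(
            sum(t.final == t.advice for t in trials), len(trials)),
        "switch_fraction": _fraction(
            sum(t.final == t.advice for t in disagree), len(disagree)),
        "accuracy_wid": _fraction(
            sum(t.final == t.correct for t in disagree), len(disagree)),
        "rair": _fraction(
            sum(t.final == t.advice for t in ai_right), len(ai_right)),
        "rsr": _fraction(
            sum(t.final == t.initial for t in self_right), len(self_right)),
        "accuracy": _fraction(
            sum(t.final == t.correct for t in trials), len(trials)),
    }


example = [
    Trial("A", "B", "B", "B"),  # switched to correct advice
    Trial("A", "B", "A", "A"),  # kept own correct answer
    Trial("A", "A", "A", "A"),  # agreed from the start
    Trial("C", "B", "B", "C"),  # switched to incorrect advice
]
print(reliance_metrics(example))
```

Under these assumed definitions, a high Relative Positive AI Reliance together with a low Relative Positive Self-Reliance would indicate over-reliance, while the reverse pattern would indicate under-reliance.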
Subjective trust in the AI system was assessed using the Trust in Automation questionnaire (TiA) [63], which is a commonly employed and validated tool for measuring trust [78,116,120]. The questionnaire comprises multiple items that evaluate participants' perceptions of Reliability/Competence (TiA-R/C), Understanding/Predictability (TiA-U/P), Familiarity (TiA-Familiarity), Intention of Developers (TiA-IoD), Propensity to Trust (TiA-PtT), and the overall level of trust placed in the AI system, Trust in Automation (TiA-Trust).
We collected information about participants' perceived numeracy skills as well as their affinity for technology in the pre-task questionnaire. To measure numeracy skills, we employed the Subjective Numeracy Scale [38], which is a self-report measure of perceived ability to perform various mathematical tasks and preference for the use of numerical information. Additionally, we administered the Affinity for Technology Interaction Scale (ATI) [41] to determine participants' level of comfort and familiarity with technology [120].

Participants
We first estimated the required sample size using G*Power software, considering a medium effect size of 0.25, a power of 0.90, and a significance level of 0.05, leading to a recommended minimum sample size of 210 participants, i.e., 35 participants in each of our experimental conditions. To obtain a sufficient sample for our study while accounting for potential exclusions, we recruited 285 individuals using the Prolific crowdsourcing platform. To ensure the reliability of the data gathered, we applied inclusion criteria designed to select native English speakers with a minimum approval rate of 95% on the platform and at least 100 completed studies. A total of 27 participants who failed any attention-check question or the quiz were excluded from the study, resulting in a final sample size of 258 participants. On average, participants took approximately 25 minutes to complete the entire study. All participants were compensated at a fixed rate of 8 GBP per hour regardless of their performance in the study. Additionally, participants received a bonus of 0.2 GBP for each accurate response they provided during the study. Overall, participants earned an average of 8.44 GBP per hour, well over the wage considered to be 'good' and recommended by the Prolific platform.

Procedure
The entire workflow of the study is illustrated in Figure 2. When participants entered the study, they were first presented with an informed consent form, a brief overview of the study's goals, and instructions on how to complete the tasks (step 1). If they consented to participate, they were directed to the pre-task questionnaire in step 2, where they answered a series of questions related to their numeracy skills and affinity for technology. Participants were then randomly assigned to one of the six experimental conditions. According to the assigned condition, participants were presented with an interface tutorial and a task tutorial that provided step-by-step instructions on how to navigate and complete the task, followed by a training session on a sample task. Participants were given sufficient time to familiarize themselves with the sample task and the interface. To ensure that they understood the task, participants were required to answer a quiz related to the task features before proceeding to the main task. Participants who did not pass the quiz were excluded from the study. Those who passed received immediate feedback on their quiz performance, so that they proceeded to the main task with a complete understanding of the task and free of familiarity-related or comprehension-related biases.
Participants were then asked to complete three trip-planning tasks. Each task instance consisted of a decision-making scenario in which participants had to analyze the information provided and make an AI-assisted decision. Lastly, participants were directed to fill out a post-task questionnaire to assess their perception of the task features and their trust in the AI system.
Figure 2: Illustration of the procedure participants followed within our study.

Hypothesis Tests
H1a. Impact of task complexity on appropriate reliance: To explore the main effect of complexity on appropriate reliance, we conducted a Kruskal-Wallis test (Table 3). Subsequently, we conducted Dunn's post-hoc test to determine which levels of complexity resulted in significant differences in appropriate reliance. We report adjusted p-values, calculated using Bonferroni correction to account for the increased likelihood of falsely declaring statistical significance when conducting multiple tests. If the adjusted p-value for an individual hypothesis is less than the significance level (0.05), then the null hypothesis is rejected, indicating a statistically significant result [135]. We first report the influence of complexity on reliance, followed by our examination of appropriate reliance. The observed significant difference in Switch Fraction between high-complexity and low-complexity tasks implies that task complexity does indeed exert an influence on reliance. In tasks with higher complexity levels, individuals tend to shift from relying on their own judgment to relying on the AI system. This can be attributed to a decrease in self-confidence regarding their decision-making abilities, which leads them to seek guidance from the AI system.
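The testing pipeline described above (Kruskal-Wallis omnibus test, Dunn's pairwise post-hoc test, Bonferroni adjustment) can be sketched without external dependencies for the three-group case. This is an illustrative implementation under stated assumptions, not the authors' analysis code: the function names are hypothetical, the closed-form omnibus p-value relies on the chi-square approximation with exactly two degrees of freedom (three groups), and the scores are made-up values. In practice one would more likely use `scipy.stats.kruskal` together with a Dunn implementation such as `posthoc_dunn` from the scikit-posthocs package.

```python
import math
from itertools import combinations

def _ranks(values):
    """Average (1-based) ranks plus the tie term sum(t^3 - t)."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    tie_term, i = 0, 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # mean rank of the tied run
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    return ranks, tie_term

def kruskal_dunn(groups):
    """groups: dict with exactly three entries, label -> list of scores."""
    labels = list(groups)
    if len(labels) != 3:
        raise ValueError("closed-form p-value below assumes three groups (df = 2)")
    pooled = [x for g in labels for x in groups[g]]
    n = len(pooled)
    ranks, tie_term = _ranks(pooled)
    sizes, mean_rank, start = {}, {}, 0
    for g in labels:
        sizes[g] = len(groups[g])
        mean_rank[g] = sum(ranks[start:start + sizes[g]]) / sizes[g]
        start += sizes[g]
    # Kruskal-Wallis H statistic with the standard tie correction.
    h = 12 / (n * (n + 1)) * sum(sizes[g] * mean_rank[g] ** 2 for g in labels)
    h = (h - 3 * (n + 1)) / (1 - tie_term / (n ** 3 - n))
    p_omnibus = math.exp(-h / 2)  # chi-square survival function for df = 2
    # Dunn's z-test on mean-rank differences, Bonferroni-adjusted.
    variance = n * (n + 1) / 12 - tie_term / (12 * (n - 1))
    pairs = list(combinations(labels, 2))
    adjusted = {}
    for a, b in pairs:
        se = math.sqrt(variance * (1 / sizes[a] + 1 / sizes[b]))
        z = (mean_rank[a] - mean_rank[b]) / se
        p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
        adjusted[(a, b)] = min(1.0, p * len(pairs))
    return h, p_omnibus, adjusted

# Made-up per-participant scores, NOT data from the study.
scores = {
    "low": [0.9, 0.8, 1.0, 0.7, 0.85],
    "medium": [0.5, 0.6, 0.55, 0.65, 0.45],
    "high": [0.1, 0.2, 0.3, 0.25, 0.15],
}
h, p_omnibus, adjusted = kruskal_dunn(scores)
print(h, p_omnibus, adjusted)
```

With these toy scores the omnibus test is significant, but after Bonferroni adjustment only the low versus high comparison stays below 0.05. This shows why the post-hoc step is needed to localize an omnibus effect to specific pairs of complexity levels.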
Tasks of higher complexity tend to diminish appropriate reliance on the AI system. Participants demonstrated significantly lower levels of Accuracy-wid in tasks with greater complexity compared to those with lower complexity. A similar trend is observed for RSR, wherein participants displayed significantly reduced confidence in themselves during tasks with higher complexity than during those with lower complexity. Consistent with these findings, participants exhibited the opposite trend for reliance on the AI system, relying on it significantly more in tasks that were more complex than in those of lower complexity, as indicated by higher RAIR. The rise in RAIR does not necessarily imply higher appropriate reliance on the AI system. Rather, it suggests that individuals under-rely on the AI system in tasks with relatively lower complexity, and over-rely on the AI system in tasks with relatively higher complexity without being able to recognize when the advice may be inaccurate. This excessive reliance can ultimately have a negative impact on performance by reducing appropriate reliance levels.
Furthermore, we found that the accuracy of participants is significantly lower in tasks with higher levels of complexity than in those with lower complexity.
H1b. Impact of task complexity on trust: We aimed to examine the main effect of task complexity on trust in the AI system. Therefore, we conducted a two-way ANCOVA to account for the potential confounding effects of the covariates, namely subjective numeracy skill, affinity for technology, TiA-Familiarity, and TiA-Propensity to Trust. We did not find a significant effect of task complexity on human trust in the AI system, leading us to reject our hypothesis H1b. However, this finding supports the view that subjective trust in the AI system does not always follow the objective measure of reliance on the AI system [90,114].
H2a. Impact of task uncertainty on appropriate reliance: We investigated the main effect of task uncertainty on reliance by conducting the Kruskal-Wallis test, reported in Table 4. We found that task uncertainty significantly affects participants' reliance on the AI system. Participants showed a significantly higher Switch Fraction when faced with prognostic tasks, indicating their tendency to rely more on the AI system due to lower self-confidence. Our findings further suggest that individuals can accurately assess the level of uncertainty in a task and adjust their reliance on the AI system accordingly.
Furthermore, our findings revealed that the degree of uncertainty in a task significantly influenced participants' appropriate reliance on the AI system. We found that participants were more likely to appropriately rely on the AI system in diagnostic tasks, leading to higher accuracy rates, as indicated by higher Accuracy-wid compared to prognostic tasks. In line with this finding, we also observed that participants exhibited a slightly higher level of reliance on their own decision-making skills (RSR) when faced with diagnostic tasks. On the other hand, in prognostic tasks, participants showed a significantly higher degree of reliance on the AI system, as indicated by higher RAIR. This finding suggests that participants tend to rely heavily on the AI system in uncertain situations. However, this does not necessarily lead to appropriate reliance. It can be challenging for them to distinguish between accurate and inaccurate AI advice in prognostic tasks, resulting in lower appropriate reliance on the AI system and decreased accuracy levels. As a result, our findings partially support the hypothesis H2a.
H2b. Impact of task uncertainty on trust: The main effect of task uncertainty on trust in the AI system was also examined through the ANCOVA test. The results indicated that there was no significant main effect of task uncertainty on any trust subscale. These findings indicate that participants' trust in the AI system remains relatively stable regardless of the level of uncertainty in the task. Thus, we reject our hypothesis H2b.
H3. Interaction effect of task complexity and uncertainty: We conducted an ANOVA to investigate the interaction effect of task complexity and uncertainty on appropriate reliance and trust. We found a significant interaction effect between task complexity and uncertainty on Accuracy-wid as a measure of appropriate reliance. Figure 3a illustrates the interaction effect of task complexity and uncertainty on Accuracy-wid, focusing on the different levels of complexity. We observed that Accuracy-wid decreases as uncertainty increases for tasks with low and medium complexity. However, for tasks with high complexity, the trend is the opposite: Accuracy-wid increases with increasing uncertainty. Although we found earlier that participants have a lower Accuracy-wid for prognostic tasks, the interaction effect suggests that the impact of uncertainty on appropriate reliance depends on the level of task complexity. This finding suggests that participants tend to engage more cognitively in tasks they perceive as less complex, believing they can make accurate judgments. This trend is also observed in diagnostic tasks with high complexity. However, when faced with highly complex and prognostic tasks, participants are more likely to relinquish some cognitive control and rely heavily on the AI system. This could be attributed to their perception that the task's complexity exceeds their own capabilities. Participants may also view the AI advice as being more reliable and trustworthy, resulting in increased agreement and appropriate reliance. This finding is further supported by the significant interaction effect identified for Accuracy (Figure 4a), demonstrating that participants' ability to make accurate predictions increases when they are faced with prognostic tasks with high complexity, compared to prognostic tasks with medium and low complexity. Consequently, their level of accuracy aligns with that of the AI system due to their increased appropriate reliance. Figures 5a and 5b illustrate the Accuracy and Accuracy-wid for different levels of task complexity and uncertainty.
We can observe the interaction effect of complexity and uncertainty for diagnostic and prognostic tasks in Figure 3b. For diagnostic tasks, Accuracy-wid decreases as the complexity of the task increases. However, for prognostic tasks, a different pattern is observed. Participants tend to have lower Accuracy-wid as we increase the complexity from low to medium, and Accuracy-wid reaches its minimum in medium-complexity tasks. As we further increase the complexity to the high level, Accuracy-wid starts to rise again, suggesting that participants rely more appropriately on the AI system and that their accuracy improves in highly complex prognostic tasks, aligning more closely with the accuracy of the AI system (cf. Figure 4b). Furthermore, we can see that appropriate reliance is always greater for diagnostic tasks than for prognostic tasks, except at high complexity, where the values for prognostic tasks surpass those for diagnostic tasks, further supporting our findings. In summary, we found that the interaction between complexity and uncertainty plays a significant role in human-AI decision-making, particularly in conditions with high complexity and high uncertainty.
While appropriate reliance drops as the complexity and uncertainty of a task increase, there is a turning point where participants start to rely more appropriately on the AI system, resulting in increased accuracy in prognostic tasks with high complexity. Thus, our findings reject hypothesis H3.
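The crossover pattern described for H3 can be read off a table of cell means by comparing the prognostic and diagnostic means within each complexity level. The sketch below shows that computation on made-up values that were chosen only to reproduce the direction of the reported pattern; they are not the study's data, and the function name and record layout are hypothetical.

```python
from collections import defaultdict
from statistics import mean

def simple_effects(records):
    """records: iterable of (complexity, uncertainty, score) tuples.

    Returns, per complexity level, the two cell means and the
    prognostic-minus-diagnostic gap (the simple effect of uncertainty).
    """
    cells = defaultdict(list)
    for complexity, uncertainty, score in records:
        cells[(complexity, uncertainty)].append(score)
    effects = {}
    for level in ("low", "medium", "high"):
        diag = mean(cells[(level, "diagnostic")])
        prog = mean(cells[(level, "prognostic")])
        effects[level] = {"diagnostic": diag, "prognostic": prog,
                          "gap": prog - diag}
    return effects

# Made-up Accuracy-wid scores, two hypothetical participants per cell.
records = [
    ("low", "diagnostic", 1.0), ("low", "diagnostic", 0.5),
    ("low", "prognostic", 0.5), ("low", "prognostic", 0.5),
    ("medium", "diagnostic", 0.5), ("medium", "diagnostic", 0.5),
    ("medium", "prognostic", 0.0), ("medium", "prognostic", 0.5),
    ("high", "diagnostic", 0.0), ("high", "diagnostic", 0.5),
    ("high", "prognostic", 0.5), ("high", "prognostic", 0.5),
]
effects = simple_effects(records)
for level, row in effects.items():
    print(level, row)
```

An interaction is present when the gap differs across complexity levels. A change in the sign of the gap, as between the medium and high levels in this toy example, is the crossover form in which uncertainty lowers the score at one level of complexity and raises it at another. Whether such a difference is statistically reliable still has to be established with a test such as the ANOVA reported above.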

DISCUSSION

Key Findings
Our study examined the impact of task complexity and uncertainty on human-AI decision-making. The results demonstrated that increasing the level of complexity and uncertainty in decision-making tasks led to significant differences in users' reliance on the AI system. In more complex and uncertain tasks, we found that users were often in initial disagreement with the advice provided by the AI system. However, they demonstrated a heavy reliance on AI advice during the second stage of the decision-making process, leading to a higher Switch Fraction. This can be attributed to the potential recognition that AI offers valuable insights for decision-making under complexity and uncertainty, coupled with a lack of confidence in their own judgment, corroborating what has been uncovered by other work in human-AI decision-making [23,105]. Furthermore, the greater cognitive effort linked to complex tasks may also be a contributing factor. The cost of relying on the AI system would be lower than the cost of evaluating the reliability of the AI advice, thereby prompting individuals to lean towards following AI advice [126]. Additionally, users showed higher engagement and information-gathering behavior in prognostic scenarios, demonstrated by significantly more clicks on route control buttons, indicating a greater inclination to explore different route options. We also found that appropriate reliance on the AI system varied significantly depending on task complexity and uncertainty. Users exhibited lower appropriate reliance on the AI system (lower Accuracy-wid), leading to lower accuracy, in tasks with medium complexity or uncertainty compared to those with low complexity or uncertainty. However, users demonstrated higher appropriate reliance on the AI system, resulting in improved accuracy, in the experimental conditions with high complexity or uncertainty compared to those with medium complexity or uncertainty. Users perceived that tasks with higher complexity and uncertainty required greater effort and information processing, making them more willing to rely on the AI system. In such scenarios, their performance approaches AI accuracy, indicating the effectiveness of integrating AI in decision-making.
Our findings showed that individuals generally place significantly more reliance on the AI system when faced with tasks characterized by high uncertainty. However, in such prognostic tasks, their ability to appropriately rely on AI advice is lower compared to diagnostic tasks, subsequently affecting their overall performance. Tasks that involve inherent uncertainty are often those where humans tend to rely on AI systems for advice, such as loan approval [27,34,124], recidivism prediction [31,47,86], house price estimation [2,13,25], and student admission [13,24]. Individuals may be more inclined to adhere to AI advice in these types of tasks. This could stem from the belief that AI systems possess advanced analytical abilities and have access to a greater amount of data [75]. On the other hand, when individuals are faced with tasks that have lower uncertainty, such as annotation and classification tasks [4,77,117], they tend to rely less on the AI advice and more on their own expertise and judgment. Since heavy reliance on AI systems in uncertain situations does not always lead to improved decision-making accuracy, several mechanisms have been proposed to optimize the combination of human and AI decisions to achieve the best outcomes and facilitate appropriate reliance on the AI system. These mechanisms include providing interpretable explanations for AI advice [21,72,123], using cognitive forcing functions [17,47,99], and incorporating feedback loops to enhance the interaction between humans and AI systems [10,11,139]. Although we implemented a two-stage decision-making process to encourage individuals to be cognitively involved in the procedure, and incorporated visual and textual explanations for increased transparency, our research emphasizes the necessity for additional exploration into strategies that can facilitate appropriate reliance on AI systems in contexts characterized by high levels of uncertainty.
The complexity of tasks plays a significant role in determining the degree of reliance on AI advice, consistent with the findings of [9,105]. The more complex a task is, the more individuals may be inclined to rely on the AI system. We use the number of features or constraints as the measure of task complexity, similar to previous studies [9,105,120]. Tasks with a larger number of constraints that need to be accounted for in decision-making are often more challenging for individuals to process, making them more likely to seek guidance from AI [63,90,111]. Our findings, which were based on objective measures, align with the study in [126] and suggest that users tend to rely more heavily on AI systems when faced with complex tasks that demand higher cognitive effort. This is further backed by [100], indicating that the complexity of a task can elevate its perceived difficulty, potentially resulting in greater reliance on AI systems. As shown by Salimzadeh et al. [109], the majority of tasks that have been studied in the context of decision-making are characterized by low and medium complexity. Prior studies that investigated tasks exceeding individual information processing capabilities (i.e., 9 constraints [91]) suggested employing visualization techniques to assist individuals in understanding the AI advice and the underlying decision-making process [43,137,140]. We used visual and textual techniques to support individuals in understanding the factors that shaped the given AI advice. However, in higher-complexity scenarios, individuals still lack cognitive engagement with the AI system and may be more likely to rely heavily on its advice. This is supported by the tendency of individuals to make their decision rapidly, within approximately twenty seconds after receiving advice from the AI system, without carefully reassessing the provided information or exploring alternative route options. Although these visual and textual strategies have shown promise in improving decision-making outcomes in the literature, they were not sufficient to mitigate over-reliance on AI advice in high-complexity tasks.
According to the Trustworthiness Assessment Model (TrAM) [111], accurate perceived trustworthiness of AI systems is essential for establishing meaningful trust and reliance on AI systems. Factors such as the relevance and availability of system information, as well as the ability of individuals to detect and utilize this information, play a crucial role in determining accurate perceived trustworthiness. In our study, we presented only relevant task features to participants, using visual and textual formats. We utilized user behavior metrics and validated participant perceptions through training and quizzes to ensure the detection of these features. We nevertheless expected the complexity and uncertainty of tasks to impact the availability and utilization of system information, thus affecting perceived trustworthiness [98]. Instead, participant trust remained consistent regardless of task complexity or uncertainty, in contrast to what the TrAM framework suggests.

Implications of Our Work
6.2.1 Implications for Methodology and the HCI Community. The implications for methodology in HCI research pertain to the design and analysis of studies [125]. These implications specifically address data collection methods and the construction of new knowledge. Our work has important implications for the methods used to study human-AI decision-making, for increasing the external validity of empirical work, and for strengthening the understanding of the transferability of findings across different studies. Task characteristics, such as complexity and uncertainty, are seldom examined or analyzed systematically in human-AI decision-making studies. While it may not be experimentally feasible to account for every facet of a task, our research emphasizes the significance of considering these factors when assessing human-AI collaboration. Future research should consider incorporating methodologies that take task-related features into account when evaluating human-AI decision-making. Our findings also contribute to the interpretation of human behavior and reliance on AI systems through the lens of task complexity and uncertainty. Current studies often focus on generic decision-making scenarios or tasks with low to medium complexity, which may not fully reflect or represent the challenges and dynamics of the full range of real-world scenarios. This is particularly important in highly complex tasks coupled with high uncertainty, where humans tend to require, appreciate, and rely on advice from an AI system. Future research should consider the systematic identification and inclusion of task-specific characteristics in the design of studies in the realm of human-AI decision-making.
To initiate a systematic evaluation of task characteristics, we propose the lens of diagnostic and prognostic tasks as a framework for modeling uncertainty in decision-making, which can be used as a basis for designing experiments and gathering data on human-AI interactions. This approach acknowledges the inherent uncertainty in determining or estimating the different constraints that influence decision outcomes. Additionally, it offers a relatively more precise representation of decision-makers' challenges. Incorporating this lens into research methodology would involve designing studies that specifically control the uncertainty inherent in diagnostic and prognostic tasks and exploring their impact on human-AI decision-making processes and outcomes. We also encourage researchers to consider highly complex tasks in their experiments to capture the challenges and nuances of decision-making in real-world scenarios. This can be achieved by developing scenarios or simulations that closely resemble complex decision-making situations in different domains. Our task details and all code for the interface are made publicly available to support future research in the community.
Our study also highlights the need for further examination and development of techniques tailored specifically to support high-complexity and prognostic tasks in human-AI decision-making. Although many interventions have been developed for decision-making in various domains, there is still a need to focus on the unique challenges posed by high-complexity and prognostic tasks. Such interventions could be targeted to offer users indicators that can help them accurately assess the reliability, plausibility, and verifiability of the AI advice. Consequently, these methods will promote appropriate reliance on the AI system in complex and uncertain decision-making scenarios. Developing these mechanisms is urgent in order to prevent potential deception arising from the complexity and uncertainty of tasks, which can make it challenging to detect untrustworthy AI systems [8]. By reducing the cost of assessing the verifiability and plausibility of such XAI techniques, decision-makers can gain a better understanding of the basis for AI advice based on their own expertise and judgment, potentially leading to improved performance and appropriate utilization of AI systems.
The decline in performance of human-AI teams when tackling tasks of medium complexity suggests that users may have faced challenges in accurately assessing their own abilities and the capabilities of AI systems, primarily by overestimating their own abilities [64]. This aligns with previous research findings, highlighting the need for interventions to assist users in evaluating their skills and appropriately adjusting their reliance on AI systems [25,51,69]. This may be particularly important in tasks with relatively moderate complexity, which may lead to illusory self-assessments among some users, compared to tasks with evidently low or discernibly high complexity.
6.2.2 Implications for Theory. Theoretical implications focus on the understanding of task characteristics and their impact on human-AI decision-making. Based on our findings, it is evident that the complexity and uncertainty of tasks significantly influence how humans rely on AI systems. This study serves as an application-grounded evaluation [32] in the context of trip-planning, centering on the individuals the system intends to support in actual tasks. It empirically validates the commonly held belief that task complexity and uncertainty play a crucial role in determining human reliance on AI systems. While the primary objective of combining humans and AI is to achieve enhanced performance through collaboration, over-reliance on AI can potentially impede the advantages offered by human judgment and decision-making abilities. Therefore, it is crucial for researchers to develop theoretical frameworks that can help identify and motivate the optimal balance between human and AI involvement in decision-making, taking into consideration task complexity and uncertainty.
Contrary to previous research suggesting that trust in AI systems increases with the complexity and uncertainty of tasks, our findings indicate that trust is orthogonal to these factors. These results suggest that trust is not the sole determinant of reliance on AI advice, and that other factors such as task characteristics play a significant role. This also indicates a difference between humans' trust beliefs and their behavior toward AI systems, where trust may not always translate into increased reliance, highlighting the need to measure, calibrate, and understand factors beyond trust that influence human-AI decision-making.

Caveats and Limitations
According to the checklist of cognitive biases provided by Draws et al. [33], it is important to acknowledge that humans are prone to cognitive biases. In our task, we identify the familiarity bias and availability heuristic, which can cause individuals to exhibit an inclination towards decisions that align with their pre-existing beliefs or past experiences. Although we created artificial routes, individuals may still tend to prefer familiar or known options, or prefer specific transport modes due to personal biases. Confirmation bias and overconfidence bias are other potential limitations, as individuals may be more likely to seek out and give more weight to information that confirms their preconceived notions or beliefs regarding AI capabilities and their own decision-making abilities. We should also consider the self-interest bias, where individuals may prioritize their own monetary reward over objective decision-making criteria.
The findings discussed in this paper are not universally applicable to all decision-making tasks. Different tasks may have varying characteristics and contexts that can influence human-AI decision-making. Although the distinction between diagnostic and prognostic tasks is a valid approach to operationalizing uncertainty, it is important to acknowledge that there could be other approaches to capturing task uncertainty that were not explored in this study (e.g., missing data or conflicting information). Future research should consider exploring different operationalizations of task complexity and uncertainty to further understand their impact on human reliance on AI systems. It is worth noting that we asked participants in our study to consider that the traffic features were unrelated to each other and carried equal weights in determining the best route. This may not always be the case in real-world contexts. We also considered traffic conditions in both diagnostic and prognostic scenarios, although, in the real world, traffic conditions can change over time and at the time of decision-making, making them predominantly prognostic.

CONCLUSION AND FUTURE WORK
In this study, we explored how task complexity (RQ1), task uncertainty (RQ2), and their interaction (RQ3) inform user trust and appropriate reliance on AI systems. To this end, we conducted a user study with 258 participants across six experimental conditions varying in three levels of task complexity (low, medium, and high) and two levels of task uncertainty (diagnostic and prognostic). We selected trip-planning as the decision-making task and evaluated participants' trust, reliance, and decision-making behaviors when interacting with an AI system. The study showed that task complexity and uncertainty significantly impact human reliance on AI systems. Participants tended to rely more on AI in tasks with higher complexity and uncertainty, with no significant differences in human trust across different levels of complexity and uncertainty.
Future studies should further explore the relationship between task complexity and uncertainty to better understand their interconnections in human-AI decision-making. Further research is needed across a range of domains and task types to fully understand the impact of task complexity and uncertainty. We encourage researchers to investigate the impact of other task characteristics, such as time pressure and information overload, on human-AI decision-making. Future work should also focus on understanding how to effectively present AI-generated predictions and explanations to enhance human understanding and decision-making, particularly in complex and uncertain situations. Given the increasing complexity and uncertainty of tasks, it becomes crucial to develop strategies that can help users evaluate the reliability and verifiability of AI advice in these scenarios.

Figure 1: An overview of the trip-planning task interface that participants used, including five components: (1) task scenario and description, (2) map, (3) route information, (4) general information, and (5) two-stage decision-making. Note that this screenshot is meant to convey a bird's-eye view of the interface. It shows a highly complex scenario encompassing all constraints in the prognostic experimental condition with high uncertainty.

Figure 3: Interaction effects between task complexity and uncertainty on the Accuracy-wid metric reflecting appropriate reliance.

Figure 5: Mean of Accuracy-wid and Accuracy across different levels of task complexity and uncertainty.

Table 1: Summary of Our Hypotheses.

Table 2: An overview of the different metrics that we considered in our user study.

Table 3: Kruskal-Wallis test for the main effect of task complexity on reliance. † indicates that the effect of the variable is significant in the comparisons shown in the 'Post-hoc Results' column.
Dependent Variable | adjusted-p | ± (Low) | ± (Medium) | ± (High) | Post-hoc Results

Table 4: Kruskal-Wallis test for the main effect of task uncertainty on reliance. † indicates that the effect of the variable is significant in the comparisons shown in the 'Post-hoc Results' column.
Dependent Variable | adjusted-p | ± (Diagnostic) | ± (Prognostic) | Post-hoc Results