Modelling Experts' Sampling Strategy to Balance Multiple Objectives During Scientific Explorations

Our analysis of human sampling decision data reveals that scientists adapt their sampling strategies to balance multiple objectives based on two key factors: the current level of information about the environment, and the availability of sampling location options with large potential rewards. While this work is only a first step towards the development of cognitively-compatible robotic decision algorithms, our findings show that, by better understanding human decision processes, robots can use extremely simple algorithms to connect experts' high-level objectives to desired sampling locations while balancing multiple objectives. Going forward, it would be valuable to explore how humans coordinate and prioritize multiple objectives under more sophisticated scientific exploration scenarios, such as those with multiple competing hypotheses, hypotheses regarding multiple variables, or additional sampling objectives. These understandings could help our robots produce explainable sampling strategies that are well-aligned with humans' high-level goals, and improve humans' trust and confidence during teaming. They could also allow robots to identify potential vulnerabilities in human decisions, such as biases and fatigue, and provide targeted support to enhance scientific outcomes. In addition, we expect that these cognitive insights could complement existing robotic decision methods by informing which algorithms to use, and eventually empower robots to become intelligent teammates that can truly participate in the decision-making process.


INTRODUCTION
Robots are increasingly being used for planetary science and terrestrial explorations [30,41]. The ability of mobile robots to traverse complex natural environments [34,38,41] and deliver high spatial-temporal resolution, multi-stream data with high accuracy [32,34,35] makes them perfectly suited to these tasks. However, even state-of-the-art robots still lack a good understanding of how human scientists connect high-level science objectives to low-level sampling plans [26]. Scientists in the field were found to update their hypotheses and priorities in real-time to facilitate scientific discoveries [47], which can be challenging to model. Such complexity, combined with the limited understanding of human decision processes, leads to most robots being used as simple mobile sensor platforms [50] that merely execute humans' low-level instructions, rather than as intelligent teammates that can participate in the decision-making process and flexibly support humans' high-level scientific goals.
While there is a large body of information-theory-based methods that allow robots to autonomously select a sampling plan based on scientific goals [9,17], the alignment between robot algorithms and human decision processes remains unclear. Without a better understanding of human decision making in response to incoming observations, a robot's suggestions are likely to be rejected or ignored by human scientists, and human-robot teams will continue struggling to work together on highly-coupled tasks without sacrificing robot autonomy. The sacrifice of robot autonomy can place a high cognitive burden on scientists, which has been demonstrated to be detrimental to scientists' data interpretation and sampling plan adaptation [11]. Increasing cognitively-compatible robot autonomy could remove the burden of low-level repetitive tasks and allow scientists to focus on the higher-level tasks that require their full expertise.
In this paper we take a key step towards the development of cognitively-compatible robot autonomy for scientific explorations, by understanding how human scientists make their sampling decisions to balance multiple, competing scientific objectives. We collect and analyze sampling decision data from 33 geoscience experts, and we use simple robotic algorithms as repeatable models to directly test human decision-making hypotheses. We find that experts exhibit distinct multi-objective strategies, and that their strategy choices are sensitive to both the stage of data sampling they are in and the availability of outstanding information rewards. Based on our findings, we implement a multi-objective balancing algorithm which predicts the adaptation of experts' multi-objective coordination strategy and allows the robot to suggest sampling locations aligned with humans' abstract decision making.

RELATED WORK

Robots for Earth and Planetary Sciences
Historically, most robots for Earth and planetary science have been designed for regions inhospitable or inaccessible to humans [13,41,42,49]. Rovers for planetary exploration and teleoperated deep-sea robots both fall into this category, and both require high cognitive effort on the part of the scientists who use them [7,33]. With new opportunities on missions for robots to accompany human explorers, recent work has focused on improving teleoperation [3,21,39] and mobile sensor platform approaches [50] for robotic exploration. While these approaches allow for increased scientific exploration and experimentation, as well as a reduction in physical burdens and dangers to human scientists, there is a notable gap in work on improving the autonomous science capabilities of robotic assistants on missions.

Human-In-The-Loop
Human-in-the-loop systems have become increasingly relevant as robotic systems have improved [28], with recent advances in coactive learning [42], learning from demonstration [4,6,15,18,20,22,29,45], and shared autonomy [8] allowing for new levels of collaboration between robots and humans. However, most of these areas run into issues with adaptability to new situations and individuals [18] and rely on methods that do not align with human thought processes [15]. While some recent work has started analyzing more human-compatible decision modeling methods [24], significantly more work needs to be done before robots can integrate seamlessly with human teammates.

Objective Balancing in Decision Making
A rich set of multi-objective coordination and balancing methods has been developed for resolving strategies under different conditions. These include population-based heuristics such as multi-objective evolutionary algorithms (MOEAs) [53], multiple-criteria decision-making methods such as TOPSIS [16] and PROMETHEE [37], scalarization-based methods [1,10], goal programming [43], and the Reference Point Method [48]. However, each of these methods struggles with one or more aspects of real-world situations. While many are capable of approximating the Pareto front [25] (the set of solutions in the objective space that are not dominated by any other solution in terms of all the objectives), they either fail to capture the entire front [19,52] or have requirements that are difficult to adapt, such as known objective weights [10,16,37] or predefined user goals [43]. Even when a method resolves the majority of these issues [48], it is unclear how well it matches actual human reasoning processes or whether it is robust to changes in an individual's reasoning process. On the other hand, cognition and decision science research has found that humans [2,14,36,44,46,51] and animals [31,40] can often use simple heuristics, or rules of thumb, in place of complex decision-making models, yet still achieve similar outcomes. However, these heuristics can introduce a level of unconscious bias that must be accounted for. Our work aims to bridge the gap between the two approaches, by using simple robotic algorithms as repeatable models to directly test human decision-making hypotheses.
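The Pareto-dominance relation underlying these methods can be made concrete with a short sketch (a minimal illustration for two or more maximized objectives, not a reproduction of any specific cited method):

```python
def pareto_front(points):
    """Return the points not dominated by any other point (maximization).

    A point q dominates p when q is at least as good as p in every
    objective and strictly better in at least one.
    """
    front = []
    for p in points:
        dominated = any(
            all(q[k] >= p[k] for k in range(len(p)))
            and any(q[k] > p[k] for k in range(len(p)))
            for q in points if q != p
        )
        if not dominated:
            front.append(p)
    return front
```

For example, `pareto_front([(1, 0), (0, 1), (0.5, 0.5), (0.2, 0.2)])` keeps the first three points and drops only `(0.2, 0.2)`, since `(0.5, 0.5)` dominates it.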

A SIMULATED SCENARIO TO COLLECT EXPERTS' SAMPLING DECISION DATA
In order to investigate the underlying cognitive dynamics of experts when they hold multiple objectives, we developed a simulated human-robot co-exploration scenario built on information-based objectives and discrepancy-based objectives. The simulated scenario is based on a real-world field campaign at White Sands National Park (Fig. 1A), a dune field in New Mexico [34,35], where a field-deployable robot, RHex [38] (Fig. 1B), helped scientists discover that the actual dependence of soil strength on soil moisture was different from what was hypothesized in the literature [27]. In our simulated scenario, we represent this hypothesized dependence as H1 (Fig. 1C, green curve), and the observed actual dependence as H2 (Fig. 1C, red curve). Participants would randomly receive one of two measurement sets, one supporting the given hypothesis, H1, and the other supporting an alternative hypothesis, H2. Circular markers labeled on the transect represent the 12 initial measurements given to the participants at the beginning of the exploration.
The hypothesized dependence, H1, was provided to all participants, but each participant was randomly assigned to receive one of the two data sets, supporting H1 or H2. See [47] for a more complete description of how each data set was computationally generated. Experts were asked to take measurements of soil strength and soil moisture along the dune transect, to evaluate whether the hypothesized dependence, H1, was supported by their measurements. They were allowed to continue taking measurements until they were confident enough to draw a conclusion. The detailed data collection procedure is described in Sec. 3.1.2. This scenario was interesting from a decision-making perspective because scientists' confidence in the hypothesis could vary dynamically in response to each measurement, which requires them to adjust their sampling strategies based on the updated confidence. This provided an opportunity to study how changes in hypothesis beliefs (measured by subjective confidence) alter the priority of sampling objectives and the corresponding sampling decisions. Furthermore, since the optimal sampling location for satisfying each objective was often not the same, this scenario allowed us to better understand how scientists choose sampling strategies to balance multiple conflicting objectives.

Data Collection
3.1.1 Participants. 33 expert geoscientists were recruited to participate in the simulated task, hosted online, including 14 female participants, 16 male participants, and 3 participants who chose not to report their gender identity or reported it as "other". Experts had to have obtained a bachelor's degree in a geoscience-related field to qualify for the study. Participants ranged in age from 24 to 68 years old.
3.1.2 Procedure. Before experts were shown any data, they were asked to report their initial confidence in the given hypothesis, choosing from seven levels ranging from low confidence to high confidence. Expert-provided initial confidence would not change the data collection procedure, but provided us information on how potential initial biases may influence sampling behavior.
At the beginning of the sampling process, experts were presented with 12 initial measurements of soil strength and soil moisture, collected by the robot at locations 2, 11, 15, and 20 (Fig. 1C, yellow to blue markers), with 3 samples at each location. To understand how experts' sampling location choices were affected by holding multiple objectives, we asked the experts to report their ranked sampling objectives prior to each sampling step. Four sampling objectives (Table 1) identified from our previous study [24] were provided for experts to choose from. Among the four objectives, I_s and I_m focused on increasing the information coverage in the spatial and parameter coordinates. We refer to these two objectives as "information-based objectives". D_v and D_i focused on testing observed differences between measurements and hypothesis. We refer to these two objectives as "discrepancy-based objectives". Experts were asked to rank their selected objectives from the most important to the least important. These self-reported ranked objectives allow us to understand how experts' sampling priorities change during the data collection, and how these changes affect experts' sampling behaviors.
Once the expert provided the ranked objectives, the robot would suggest 3 sampling locations by selecting peaks in the reward function corresponding to the expert's most important objective, following the method reported in [24]. The simulated scenario is available online at https://www.dataforaging.com/#/. (The bias effect analysis is not discussed in this paper.)

Table 1: Objective Representations and Descriptions

I_s: There are areas along the dune transect (between crest and interdune) where data is needed
I_m: There are portions of the dynamic range of the moisture variable (x axis of the data plot) where data is needed
D_v: There is a discrepancy between the data and the hypothesis that needs additional evaluation
D_i: The data seems to be supporting the hypothesis so far but additional evaluation is needed

Experts were given the option to accept one of the locations suggested by the robot, or to reject the suggestion and select a sampling location themselves. If the expert chose to accept a suggested location, they were asked to rate how well the robot-suggested location addressed each of their reported objectives, on a scale of 1-5 (1 being the least satisfied and 5 being the most satisfied). Note that these robot suggestions were not designed to accurately predict experts' choices; instead, they were used as a tool to probe how experts select among the different suggested options to simultaneously balance multiple objectives. After the expert selected the location, the measurements (i.e., the soil strength and soil moisture at the selected sampling location) were shown to the expert, and they were asked to update their confidence in the hypothesis. This process (i.e., ranking sampling objectives, selecting a sampling location, updating confidence) would repeat until the expert chose to stop the data collection and draw a conclusion about the hypothesis. For the analysis in this study (Sec. 4 and Sec. 5), we were not particularly concerned about whether the experts made a correct conclusion. Instead, we focused on understanding the underlying factors governing their sampling strategies when they hold multiple objectives.
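The peak-based suggestion step described above can be sketched as follows. This is a minimal sketch, not the exact procedure of [24]: the handling of plateaus and boundary locations is an assumption.

```python
def suggest_locations(reward, n_suggestions=3):
    """Suggest sampling locations at the peaks of a 1-D reward curve
    (one value per location along the transect), highest peaks first.
    """
    n = len(reward)
    # A location is a peak if it is not below its left neighbour and
    # strictly above its right neighbour (boundaries count one-sided).
    peaks = [i for i in range(n)
             if (i == 0 or reward[i] >= reward[i - 1])
             and (i == n - 1 or reward[i] > reward[i + 1])]
    peaks.sort(key=lambda i: reward[i], reverse=True)
    return peaks[:n_suggestions]
```

For instance, `suggest_locations([0.1, 0.5, 0.2, 0.8, 0.3, 0.6, 0.1])` returns the three local maxima ordered by reward value.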
Once a conclusion was made, the experts were asked to fill out a follow-up survey about their preferred strategies to balance multiple objectives, their overall impression of the suggested locations, and their demographic information (e.g., age, gender, years of experience). Experts were also asked to report how frequently (Always, Most of the time, Sometimes, Rarely, Never) they would hold multiple objectives during their sampling, and their preferred strategy for handling multiple objectives.

SAMPLING BEHAVIORS OBSERVED FROM HUMAN DECISION DATA

Experts often hold more than one objective during scientific explorations
Through analysis of the frequency with which experts selected multiple objectives, we found that experts held more than one sampling objective during a substantial fraction (32%) of the 141 total sampling steps. Among the 31 experts whose data are reported here, 18 experts (58%) reported more than one sampling objective at least once during the course of the exploration. Among these 18 experts, 10 (56%) consistently pursued multiple objectives at every step. Even among the experts who did not report multiple objectives simultaneously, 46% switched between the information-based and discrepancy-based objectives from one sampling step to another at least once during their exploration. In addition to their sampling behaviors, it is worth noting that almost half (48%) of the experts reported "always" or "most of the time" holding multiple sampling objectives during their explorations in the post-survey. Interestingly, we also found that experts' satisfaction with robot-suggested locations was lower when they held multiple sampling objectives. We analyzed the difference in satisfaction ratings between sampling steps where only one objective was reported and sampling steps where multiple objectives were reported, for all experts who reported both during their exploration. Results show that 75% of those participants reported lower satisfaction with the robot-suggested sampling location when multiple objectives were reported. The observation that experts often hold multiple objectives, yet were less satisfied with the robot's suggestions when they do, underscores the significance of understanding multi-objective decision-making for human-robot teaming.

Three types of sampling behaviors that experts use to balance multiple objectives
Here we analyze each decision-making step when participants held multiple objectives, to offer insights into the underlying cognitive model. The data suggested that there exist three distinct types of strategies (Fig. 2) that experts exhibit when managing multiple objectives:

• Focus type. An expert who uses the focus type strategy prioritizes selecting a location that maximizes the reward value of their highest-ranked objective. Fig. 2A shows an example of the focus type, where among the three robot-suggested locations (Fig. 2A-i, yellow stars) the expert accepted location 6 (Fig. 2A-i, red line). Notice that location 6 has the largest information-based reward (Fig. 2A-ii, green curve), which corresponds to the highest-ranked objective reported by the expert at this step. By selecting location 6, the expert prioritized their highest-ranked objective, even at the cost that the selected location has a relatively small reward value for their secondary objective (Fig. 2A-ii, purple curve).

• Hierarchy type. An expert who uses the hierarchy type strategy selects the sampling location by iteratively optimizing each objective based on its level of importance, i.e., first selecting candidate locations that satisfy the most important objective, then selecting from those a location that satisfies other less-important objectives. Fig. 2B shows such an example. Here the highest-ranked objective was also to increase information coverage, and the robot suggested 3 candidate locations based on the corresponding reward (Fig. 2B-ii, green curve). Unlike the focus type, the reward value at the selected location (location 13) was not the largest for the highest-ranked objective (Fig. 2B-ii, green curve). However, the reward for the secondary objective (Fig. 2B-ii, purple curve) was the largest among the three locations, suggesting a strategy of first satisfying the foremost important objective, then hierarchically satisfying the secondary objective.

• Trade-off type. An expert who uses the trade-off type strategy selects locations to satisfy multiple objectives simultaneously, without placing a priority on either. As shown in Fig. 2C, among the three suggested locations, location 4 had the largest reward for the primary objective (Fig. 2C, green curve) but a very small reward for the secondary objective (Fig. 2C, purple curve); on the other hand, location 10 had the largest reward for the secondary objective (Fig. 2C, purple curve) among the 3 locations, yet a relatively small reward for the primary objective (Fig. 2C, green curve). The expert ended up choosing location 13, which was not the best for either objective, but allowed the expert to strike a balance between the two.

Based on the three observed strategies, in the follow-up survey we asked experts to select their preferred strategy for handling multiple objectives from Table 2. We use this information in our later analysis to determine the key factors governing which type of multi-objective balancing strategy experts use under different sampling scenarios.

Table 2: Strategy Descriptions

Focus: I focused on one objective at a time. I selected a sampling location that addressed my most important objective.
Hierarchy: I weighted my objectives. I prioritized sampling locations that addressed my most important objective first, and then selected from these a location that also addressed my second most important objective.
Trade-off: I did not weight my objectives. I selected a sampling location that addressed my multiple objectives, even if it was not the ideal location to address my most important objective.
Other: Other

Quantitatively characterizing the multi-objective balancing types in the objective function space
To quantitatively characterize the three observed types from experts' sampling behaviors, we projected sampling locations into the objective function space (Fig. 2 iii). With two objectives, the objective function space can simply be spanned by two axes, each representing the reward values corresponding to one objective. Here we use the x axis to represent the normalized reward corresponding to the primary objective, R1 ∈ [0, 1], and the y axis to represent the normalized reward corresponding to the secondary objective, R2 ∈ [0, 1]. We then plot all sampling locations in the objective function space based on their reward values. The objective function space offers a way to effectively visualize and compare the reward values for multiple objectives. In addition, there exists a rich set of multi-objective optimization (MOO) methods [25] that can be leveraged to guide reward selections in the objective function space, once we understand how experts select different strategies during their exploration.

Figure 2 caption: In (i), red circular markers represent the existing measurements along the transect; the black dashed curve and the blue solid curve represent the given hypothesis and the estimated mean of the measurements using Gaussian processes, respectively. The differences between the two curves were used to compute the discrepancy-based reward. Yellow stars represent robot-suggested sampling locations, and the red vertical line represents the expert-accepted sampling location. In (ii), the green (left y-axis) and purple (right y-axis) curves represent the reward values corresponding to the primary and secondary objectives, respectively. In all three examples, the primary objective was information and the secondary objective was discrepancy. In (iii), black circles represent available sampling locations. The yellow and red stars correspond to the robot-suggested locations and the expert-accepted location, respectively.

Here, we use this space to characterize the three observed multi-objective balancing types:

• Focus type. The focus type is defined as a location excelling in the primary objective reward value. For a location accepted by the expert to be characterized as the focus type, it must be either the location with the largest R1 among the suggested locations, or a location with R1 ≥ 0.9. Fig. 2A-iii illustrates the focus type example corresponding to Fig. 2A-i,ii.

• Hierarchy type. The hierarchy type corresponds to a location with a prominently large R1, paired with a large value in R2. A location was characterized as the hierarchy type if it had the largest R2 among the 3 suggested locations (which were themselves based on large R1 values), or if it was a location with one of the top 2 largest R2 values within the range 0.7 ≤ R1 < 0.9. Fig. 2B-iii illustrates the hierarchy example corresponding to Fig. 2B-i,ii.

• Trade-off type. A location was characterized as the trade-off type if it satisfied neither the focus nor the hierarchy criteria, but had both R1 and R2 values above 0.2. The trade-off type allows experts to select a sampling location with a reasonable reward for all objectives, even if it was not the ideal location for either. Fig. 2C-iii illustrates the trade-off type example corresponding to Fig. 2C-i,ii.
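These criteria can be expressed as a small classifier. The 0.9, 0.7, and 0.2 thresholds come from the text above; the ordering of the checks and the tie-breaking are our simplification (for instance, we use the single largest R2 rather than the top-2 rule):

```python
def classify_strategy(accepted, suggested):
    """Classify an accepted location among robot-suggested ones.

    `accepted` is an (r1, r2) pair of normalized rewards; `suggested`
    lists the (r1, r2) pairs of all suggested locations.
    """
    r1, r2 = accepted
    best_r1 = max(s[0] for s in suggested)
    best_r2 = max(s[1] for s in suggested)
    # Hierarchy: large (but not dominant) primary reward, best secondary.
    if 0.7 <= r1 < 0.9 and r2 == best_r2:
        return "hierarchy"
    # Focus: the accepted location excels in the primary objective.
    if r1 == best_r1 or r1 >= 0.9:
        return "focus"
    # Trade-off: reasonable reward in both objectives, best in neither.
    if r1 > 0.2 and r2 > 0.2:
        return "trade-off"
    return "other"
```

With the Fig. 2-style examples, a location with the largest R1 classifies as focus, a large-R1 location with the best R2 as hierarchy, and a balanced mid-reward location as trade-off.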
The representation in the objective function space allowed us to quantitatively characterize the types of multi-objective balancing strategies from experts' sampling behaviors. In Sec. 5, we compare experts' self-reported preferences with the strategies identified from their sampling behaviors, to understand how humans choose different strategies to coordinate and prioritize multiple, conflicting objectives.

KEY FACTORS GOVERNING EXPERTS' ADAPTATION OF MULTI-OBJECTIVE BALANCING STRATEGIES
To understand whether experts always use a fixed type of strategy to coordinate multiple objectives, or whether they switch strategies under different scenarios, here we compare the strategy types characterized from their sampling behavior with their self-reported preferences in the follow-up survey. Interestingly, we found that experts used their preferred strategy to balance multiple objectives only approximately one-third of the time. During other sampling steps, experts' actual multi-objective balancing strategy exhibited a significant deviation from their reported preferences. This interesting observation prompted us to investigate what factors may have caused experts to switch strategy, and what benefits may be obtained through such adaptation. In the following subsections, we discuss our findings on the two key factors that primarily influence experts' multi-objective balancing strategies: (i) the sampling stage (Sec. 5.1), and (ii) the existence of outstanding rewards (Sec. 5.2).

Experts adapted towards diferent multi-objective balancing strategies based on the sampling stage
We noticed that when the reported primary objective was information-based, experts' strategies to balance multiple objectives shifted in two opposite directions during the early and late sampling stages (Fig. 3).
To characterize the sampling stage, we used an "information level" measure [24], which was found to reflect humans' mental representation of accumulated information coverage. The information level, Ī, generally increases with the number of samples that have been collected. Here we compare the expert-reported multi-objective balancing strategy preferences with the actual strategy characterized from their sampling behavior, for three different sampling phases: a low-info level (Ī < 0.51), qualitatively corresponding to the first few steps of the sampling phase; a high-info level (Ī > 0.74), qualitatively corresponding to the last few steps of the sampling stage prior to conclusion; and a mid-info level in between. Fig. 3 shows the comparison of experts' actual sampling behavior (plotted on the y axis) with their self-reported preferences (plotted on the x axis), when the primary objective was to gather information. The numbers in the cells along the diagonal dashed line represent the percentage of sampling steps where the type of strategy experts used to balance multiple objectives was consistent with their reported preferences. Cells below the diagonal line indicate that experts shifted their priority towards the primary objective, whereas cells above the diagonal line indicate a higher priority towards the secondary objective.
We found that at a low information level, experts' actual sampling behavior exhibited a strong shift downward, towards the primary objective of information gathering (Fig. 3A); whereas at a high information level, experts' actual sampling behavior exhibited a strong shift upward, towards the secondary objective of investigating discrepancies (Fig. 3B). This suggests that experts adapted their multi-objective balancing strategies to place more priority on information gathering (i.e., exploration) during the early sampling stage, and more priority on testing specific hypotheses (i.e., exploitation) at the later sampling stage. This is consistent with previous cognitive science and human-robot teaming studies, which suggested that humans' sampling behavior tends to shift from exploration-oriented objectives towards exploitation-oriented objectives through the sampling process [5,23,24].
The exploration-to-exploitation priority shift explained the observed deviation of experts' multi-objective balancing strategies at low and high information levels. This finding also allows robots to use a simple model to predict experts' sampling actions under multiple objectives. Fig. 4 shows that by simply applying the focus strategy to the information-based reward at the early sampling stage, and the focus strategy to the discrepancy-based reward at the later sampling stage, more than 70% of human sampling behaviors could be successfully captured.
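A minimal version of this stage-based model, using the Ī thresholds from the text (0.51 and 0.74), can be written as:

```python
def predict_sampling_location(info_level, info_reward, disc_reward):
    """Focus-strategy prediction from the sampling stage alone:
    argmax of the information reward early, argmax of the discrepancy
    reward late.  Mid-stage steps return None; Sec. 5.2 refines them.
    """
    if info_level < 0.51:          # early stage: explore
        reward = info_reward
    elif info_level > 0.74:        # late stage: exploit
        reward = disc_reward
    else:
        return None
    return max(range(len(reward)), key=reward.__getitem__)
```

The reward lists hold one value per candidate location; the function returns the index of the predicted sampling location.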

Experts adapted their multi-objective balancing strategies in response to outstanding reward values
While experts' multi-objective balancing strategies at low and high information levels were successfully explained by the influence of the sampling stage, at the mid-info level experts' strategies were observed to shift towards both information and discrepancy, which could not be simply explained by either exploration or exploitation behaviors. However, we noticed that experts' sampling behavior at the mid information level tended to exhibit a large shift when there existed large reward values for one of their objectives. We speculated that the existence of outstanding rewards may have drawn experts' attention towards the specific objective, resulting in the observed adaptation in their sampling behavior.
We then characterize the prioritization shift as the difference between an expert's actual sampling strategy tier and their self-reported strategy tier. If an expert's actual strategy was shifted towards information-focused, the shift is denoted as n→I, where n denotes the number of tiers shifted; vice versa, we denote the shift as n→D if the strategy was shifted towards discrepancy-focused. A shift of 0 indicates that the expert's sampling strategy was an exact match with their reported preference. Fig. 5 shows that the prioritization shift exhibited a strong correlation with the maximum reward values available. Experts were significantly more likely to shift towards an exploitation-focused prioritization (Fig. 5A, 2→D, 1→D) when a large maximum discrepancy reward was available, whereas a significant shift towards an exploration-focused prioritization (Fig. 5B, 2→I) was observed when a large maximum information reward was present.
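The shift measure can be sketched by placing strategies on an ordered ladder of tiers. The five-tier ordering below is our assumption, chosen only so that signed differences reproduce the n→I / n→D notation:

```python
# Assumed tier ladder, most information-focused first (our assumption,
# not the paper's exact tier definition).
TIERS = ["focus-information", "hierarchy-information", "trade-off",
         "hierarchy-discrepancy", "focus-discrepancy"]

def prioritization_shift(actual, reported):
    """Signed tier difference between actual and reported strategies:
    negative values read as n->I (towards information), positive as
    n->D (towards discrepancy), and 0 is an exact match.
    """
    return TIERS.index(actual) - TIERS.index(reported)
```

For example, an expert who reported trade-off but sampled with an information-focus strategy registers a 2→I shift.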
This observed multi-objective strategy adaptation in response to outstanding rewards could be explained by evidence reported in recent human behavior research [12], which discovered that humans can achieve more accurate and efficient decision making by only updating their beliefs when the reliability of incoming sensory information is sufficiently strong. We speculate that, by keeping their default multi-objective balancing strategy when neither reward is sufficiently strong, experts could potentially avoid prematurely locking onto a specific objective or assumption and making biased sampling decisions. In addition, by adapting their strategy only towards strong rewards, they could reduce the cognitive burden caused by frequently adapting sampling priorities.

COGNITIVELY-COMPATIBLE ALGORITHM FOR HUMAN-ROBOT TEAMING
Understanding when and how experts adapt their sampling strategies can allow robots to better anticipate humans' desired sampling actions in dynamic environments, and thus better support scientists' high-level goals during collaborative missions. Here we take a first step towards this goal by developing a simple, cognitively-compatible algorithm for robots to assist human scientists in adapting sampling plans under multiple objectives.
The proposed idea is that, based on humans' scientific objectives and incoming measurements, the robot considers the influences of the sampling stage and the availability of outstanding rewards, and selects sampling locations to satisfy scientists' high-level objectives. In Sec. 6.1 we implement the observed key factors to predict humans' multi-objective balancing strategy adaptation. In Sec. 6.2 we test the proposed algorithm in two human-robot teaming tasks during a planetary-analogue field exploration, to obtain scientists' feedback on the algorithm.

Encoding the effect of information level and outstanding reward to predict desired strategy adaptation
The algorithm uses the findings from Sec. 5 to predict when and how experts would adapt their default strategies in handling multiple objectives, and produces suggested sampling locations. The expert's preferred strategy to balance multiple objectives is taken as an input to the algorithm. However, instead of using the default strategy to generate sampling locations, the algorithm applies an adaptation to the expert's default strategy based on the current information level and maximum rewards.
Based on the finding that experts' sampling strategy was primarily governed by the information level (Sec. 5.1), if the information level is low, the algorithm applies a focus type on the exploration objective, regardless of the reported sampling objectives and the expert-reported preferred strategy. Similarly, if the information level is high, the algorithm applies a focus type on the exploitation objective, regardless of the reported objective or the preferred balancing type. According to the results in Fig. 4, this simple model was able to capture the majority of human sampling behaviors.
Based on the finding that experts were likely to place a higher priority on a specific objective if there existed a sampling location option with an outstanding reward value corresponding to that objective (Sec. 5.2), at mid information level, if a notable (i.e., exceeding a predetermined threshold) reward value is detected for either objective, an adaptation towards that specific objective is applied. Otherwise, the algorithm uses the preferred balancing type reported by the expert.
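The adaptation rule above can be expressed as a small decision function. The sketch below is illustrative only: the information-level boundaries and the outstanding-reward threshold (`low`, `high`, `notable`) are hypothetical placeholder values, not values fitted from our data, and here the exploration and exploitation objectives are labeled by their reward types (information-based and discrepancy-based, respectively).

```python
# Illustrative sketch of the strategy-adaptation rule; all numeric
# thresholds are hypothetical placeholders, not fitted parameters.

def adapt_strategy(info_level, max_info_reward, max_disc_reward,
                   preferred, low=0.3, high=0.7, notable=0.8):
    """Predict the adapted multi-objective balancing strategy.

    Returns a (type, objective) pair, e.g. ("focus", "information").
    """
    if info_level < low:
        # Low information: focus on the exploration (information) objective,
        # regardless of the expert-reported preference.
        return ("focus", "information")
    if info_level > high:
        # High information: focus on the exploitation (discrepancy) objective.
        return ("focus", "discrepancy")
    # Mid information: adapt toward an objective with an outstanding reward.
    if max_info_reward > notable:
        return ("focus", "information")
    if max_disc_reward > notable:
        return ("focus", "discrepancy")
    # Otherwise keep the expert's reported preferred balancing strategy.
    return preferred
```

For example, at a mid information level with no outstanding reward for either objective, the function simply returns the expert's reported preference.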
Once the adapted strategy is determined, the algorithm proceeds to select the sampling location based on the adapted strategy type. Specifically, the algorithm applies the Pareto front method [25] to identify a set of candidate locations that optimally balance the multiple objectives compared to the alternative options. Once the candidate locations are identified, the algorithm uses the adapted type of multi-objective balancing to determine which sampling location to choose from the Pareto set [19,52].
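A minimal sketch of this two-stage selection, assuming each candidate location is represented by a hypothetical pair of reward values (information-based, discrepancy-based); the dominance test and the tie-breaking for the trade-off type are simplified illustrations, not the exact rules of [25] or [19,52].

```python
# Illustrative two-objective Pareto front selection; candidate rewards
# are hypothetical (info_reward, disc_reward) pairs.

def pareto_front(candidates):
    """Return the candidates not dominated in both reward dimensions."""
    front = []
    for i, (a_i, b_i) in enumerate(candidates):
        dominated = any(
            a_j >= a_i and b_j >= b_i and (a_j > a_i or b_j > b_i)
            for j, (a_j, b_j) in enumerate(candidates) if j != i
        )
        if not dominated:
            front.append((a_i, b_i))
    return front

def select_from_front(front, strategy):
    """Pick one location from the Pareto set per the adapted strategy type."""
    kind, objective = strategy
    if kind == "focus" and objective == "information":
        return max(front, key=lambda r: r[0])
    if kind == "focus" and objective == "discrepancy":
        return max(front, key=lambda r: r[1])
    # Trade-off type: maximize the combined reward (simplified tie-breaker).
    return max(front, key=lambda r: r[0] + r[1])
```

Restricting the choice to the Pareto set first guarantees that whichever balancing type is applied, the selected location is never strictly worse than an available alternative on both objectives.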

Testing the proposed multi-objective balancing algorithm in a human-robot collaborative field sampling mission
In this section, we seek to evaluate whether sampling plans produced by the extremely simple algorithm align well with scientists' abstract reasoning in an actual field sampling mission. To do so, we deployed the algorithm during a human-robot collaborative field exploration mission at a planetary analogue, Mt. Hood. The overarching goal of the mission was to understand how the mechanical properties of an ice-regolith mixture are governed by ice content. Two sampling tasks were performed in which a human-robot team collaboratively adapted sampling strategies during exploration (Fig. 6). Two planetary scientists with previous rover mission experience participated in the human-robot teaming scenario.
For the first sampling task, the goal of the human-robot team was to test a hypothesis regarding the stiffness of icy regolith along a ∼10 m transect (Fig. 6A-C). For the second sampling task, the goal was to verify a hypothesized relationship between ice content and temperature (Fig. 6D-F) using thermal and microscopic imaging. In both sampling tasks, the scientists were asked to provide their preferred strategy for balancing multiple objectives, and their initial confidence in the hypothesis, before the sampling process began. Given the scientific hypothesis provided by the scientists (Fig. 6C, F), the robot (Fig. 6B) used the proposed algorithm to predict the adapted multi-objective strategy type and the resulting sampling location. The robot-suggested location was presented to the scientists one step at a time. The scientists were given the option to accept or reject the suggestion, and to provide feedback on it. An explanation of how the robot suggestion was produced was also provided to the scientists. Once a sampling location was accepted or selected by the scientists, the robot proceeded to collect measurements and present the data to the scientists along with its suggestion for the next sampling location. The process repeated until the scientists decided to stop the data collection and make a final conclusion about their hypothesis.
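The suggest-review-measure cycle described above can be sketched as a simple loop. All callable names below (`suggest`, `review`, `measure`, `update`, `done`) are hypothetical placeholders for the mission's actual components, not our implementation.

```python
# Minimal sketch of the iterative human-robot sampling loop; the
# injected callables are hypothetical stand-ins for the real system.

def sampling_loop(suggest, review, measure, update, done, max_steps=20):
    """Run the suggest -> review -> measure -> update cycle.

    suggest(): robot proposes (location, explanation).
    review(location, explanation): scientist accepts the suggestion or
        returns an alternative location of their own choosing.
    measure(location): robot collects a measurement at the location.
    update(location, value): scientist's beliefs are updated with the data.
    done(): scientist decides whether to stop and conclude.
    """
    visited = []
    for _ in range(max_steps):
        suggestion, explanation = suggest()
        location = review(suggestion, explanation)
        update(location, measure(location))
        visited.append(location)
        if done():
            break
    return visited
```

Keeping the scientist's accept/reject decision inside the loop means the robot's suggestions never override the human: every measurement is taken at a location the scientist has approved or chosen.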
The reported objectives from both sampling tasks exhibit a similar transition of priority as observed in our simulated data set. In both tasks, the scientists started with an information-based primary objective and switched to discrepancy-based objectives towards the final step. More than one objective was reported in 67% of the sampling steps. For all sampling steps where multiple objectives were reported, the scientists accepted the robot's suggestions.
The success of the highly simplified algorithm in generating sampling plans that were well received by scientists demonstrates the potential for a better understanding of human reasoning to improve human-robot teaming. In addition to the ratings, descriptive feedback and suggestions were also gathered from the scientists for future improvement of the algorithm. There were two sampling steps where the scientists reported that the robot's explanation or measurement capabilities could be improved. In the first case, the explanation was perceived as too complicated, and the scientist suggested more succinct or visual explanations. In the second case, the expert reported that the robot-suggested location was scientifically valuable yet challenging to access. Based on these suggestions, future work should explore alternative forms of robotic explanation to aid effective intention communication. In addition, it would be important to explore how humans form unexpected objectives (such as accessibility) during field operations, and to develop robotic teammates with environmental awareness to better support science explorations.

CONCLUSION AND FUTURE DIRECTIONS
Our analysis of human sampling decision data reveals that scientists adapt their sampling strategies to balance multiple objectives based on two key factors: the current level of information about the environment, and the availability of sampling location options with large potential rewards. While this work is only a beginning step towards the development of cognitively-compatible robotic decision algorithms, our findings show that, by better understanding human decision processes, robots can use extremely simple algorithms to connect experts' high-level objectives to desired sampling locations while balancing multiple objectives. Going forward, it would be valuable to explore how humans coordinate and prioritize multiple objectives in more sophisticated scientific exploration scenarios, such as those with multiple competing hypotheses, with hypotheses regarding multiple variables, or with additional sampling objectives. These understandings could help robots produce explainable sampling strategies that are well aligned with humans' high-level goals, and improve humans' trust and confidence during teaming. They could also allow robots to identify potential vulnerabilities in human decisions, such as biases and fatigue, and provide targeted support to enhance scientific outcomes. In addition, we expect that these cognitive insights could complement existing robotic decision methods by informing which algorithms to use, and eventually empower robots to become intelligent teammates that can truly participate in the decision-making process.

Figure 1 :
Figure 1: The data collection scenario used in this study. (A) Field site at White Sands, NM, a dune field in the southwest of the United States, where the RHex [38] robot (B) assisted human scientists by collecting soil property measurements along a sand dune. The sampling decision scenario and data set used in this study were based on this actual mission. (C) The simulated field sampling scenario, where participants were asked to evaluate whether the given hypothesis (H1) was supported by the robot measurements. Participants would randomly receive one of two measurements, one supporting the given hypothesis, H1, and the other supporting an alternative hypothesis, H2. Circular markers labeled on the transect represent the 12 initial measurements given to the participants at the beginning of the exploration.

Figure 2 :
Figure 2: Examples of three multiple-objective balancing modes: (A) Focus mode, (B) Hierarchy mode, and (C) Trade-off mode, illustrated in the (i) measurement, (ii) reward, and (iii) objective function spaces. In (i), red circular markers represent the existing measurements along the transect; the black dashed curve and the blue solid curve represent the given hypothesis and the estimated mean of the measurements using Gaussian processes, respectively. The differences between the two curves were used to compute the discrepancy-based reward. Yellow stars represent robot-suggested sampling locations, and the red vertical line represents the expert-accepted sampling location. In (ii), the green (left y-axis) and purple (right y-axis) curves represent the reward values corresponding to the primary and secondary objectives, respectively. In all three examples, the primary objective was information and the secondary objective was discrepancy. In (iii), black circles represent available sampling locations. The yellow and red stars correspond to the robot-suggested locations and the expert-accepted location, respectively.

Figure 3 :
Figure 3: Comparison between expert-reported multi-objective balancing strategy preference (x axis) and the strategy characterized from their sampling behaviors (y axis), for (A) sampling steps at low information level (corresponding to the early sampling stage), and (B) sampling steps at high information level (corresponding to the late sampling stage).

Figure 4 :
Figure 4: Accuracy of predicted expert sampling locations at different information levels, using (A) focus type with information-based objectives, and (B) focus type with discrepancy-based objectives. Accuracy was computed as one minus the normalized error between the robot-predicted and expert-selected sampling locations.

Figure 5 :
Figure 5: The relationship between maximum reward value and the observed prioritization shift in experts' sampling behaviors, measured from the mid information level range. (A) shows the observed prioritization shift when different values of maximal discrepancy-based reward were available. (B) shows the observed prioritization shift when different values of maximal information-based reward were available.

Figure 6 :
Figure 6: Deployment of the proposed algorithm for human-robot teaming in a planetary-analogue mission at Mt. Hood, OR. (A-C) Sampling scenario task 1, where the robot (B) assisted human scientists in adapting the sampling plan across a transect (A, red line) to test the hypothesis regarding the trend of icy regolith stiffness along the transect (C). (D-F) Sampling scenario task 2, where the human-robot team tested a hypothesized relationship (F) between ice content (D) and thermal readings (E) for ice-regolith mixtures.