Variable Autonomy through Responsible Robotics: Design Guidelines and Research Agenda

Physically embodied artificial agents, or robots, are being incorporated into various practical and social contexts, from self-driving cars for personal transportation to assistive robotics in social care. To enable these systems to better perform under changing conditions, designers have proposed to endow robots with varying degrees of autonomous capabilities and the capacity to move between them—an approach known as variable autonomy. Researchers are beginning to understand how robots with fixed autonomous capabilities influence a person’s sense of autonomy, social relations, and, as a result, notions of responsibility; however, addressing these topics in scenarios where robot autonomy dynamically changes is underexplored. To establish a research agenda for variable autonomy that emphasises the responsible design and use of robotics, we conduct a developmental review. Based on a sample of 42 papers, we provide a synthesised definition of variable autonomy to connect currently disjointed research efforts, detail research approaches in variable autonomy to strengthen the empirical basis for subsequent work, characterise the dimensions of variable autonomy, and present design guidelines for variable autonomy research based on responsible robotics.


INTRODUCTION
Robots are being incorporated into various practical and social contexts, from self-driving cars for personal transportation to assistive robotics in social care. There is an emerging understanding of how robots with fixed autonomy influence a person's sense of autonomy, social relations, and, as a result, notions of responsibility [48,72,120,123]. For example, some scholars have suggested that social robots in care homes can increase residents' feelings of autonomy by decreasing their dependence on staff [99] or helping them stay connected with friends and family through video [112]; meanwhile, others offer opposing critiques, claiming that social robots provide illusory and inauthentic social relations that may emotionally manipulate care home residents [44,129].
But, what happens when these robots are imbued with the potential to operate along a continuum of autonomous capabilities? We refer to this approach to robotics as variable autonomy [25]. Past work has shown that dynamically changing between levels of autonomy in complex settings can improve a robot's performance [87]. For example, a robot for disaster response may need to operate in environments with limited network conditions [e.g., 79]. When communication channels are operating properly, a remote human operator can directly control the robot; in this situation, the robot is in a teleoperated mode and consequently has lower levels of autonomous capabilities. Yet, when there are instances of low connectivity between the teleoperator and robot, the robot may have to transition to a state of greater autonomous capabilities to perform its rescue task without direct control from its human operator. Another example in which variable autonomy may be required comes from the domain of assistive robotics. Consider a care robot that supports medication management for older adults. For some adults, the robot may only need to provide auditory reminders at set times throughout the day. But, for those who suffer from dementia, the robot may have to perform a wider range of tasks at higher levels of autonomy, such as physically moving throughout a house. Apart from having to change autonomous capabilities in accordance with individual differences across a group of users, we can also imagine how the care robot's autonomy may have to adapt to the same individual user's condition if it were to deteriorate over time. These examples show how variable autonomy implementations lead to situations in which control authority over a robot shifts between a human and artificial agent in response to some set of conditions.
Recently, others have proposed variable autonomy as a means to operationalise responsibility in the design of autonomous systems [see 82]. Instead, we take the opposite direction: how can we ensure that robots with variable autonomy are designed and developed in a responsible manner? The preceding scenario of the care robot for medication management highlights the potential risks of introducing variable autonomy into sensitive environments: too great or too little autonomy under certain conditions may result in various harms, such as missed medication or a person losing their sense of independence. The capability to alter a robot's autonomous capabilities during interaction accentuates questions pertinent to responsible robotics, such as: under what environmental and social circumstances is variable autonomy appropriate? Who may be harmed and how? Who should be held accountable if control over a robot's capabilities can shift unexpectedly? To date, few, if any, studies have addressed the connection between responsibility and variable autonomy (see [116] for one such study), and none, as far as we are aware, have approached variable autonomy through the lens of responsible robotics.
Therefore, our objective in this paper is to construct a research agenda for variable autonomy based on responsible robotics. To do so, we must first establish a coherent representation of variable autonomy research. In its present state, this field lacks cohesive terminology, leading to disjointed research efforts; a detailed description of the field's research approaches, making it difficult for scholars to adopt similar designs, employ consistent and validated measures, and identify empirical gaps; and a clear discussion of variable autonomy design guidelines that can serve as a heuristic for engineers and researchers. From these gaps and in pursuit of our objective, we address the following research questions: RQ1: How is variable autonomy defined in the literature? RQ2: How is research into variable autonomy conducted? RQ3: How is variable autonomy implemented?
In answering these questions, we develop a novel model for the study and design of variable autonomy robotics that builds on prior empirical and conceptual research. This research model is articulated through clear, consistent terminology and guided by an in-depth understanding of past empirical approaches. Given these aims, we follow the "developmental review" method as described by Templier and Paré [126]. A developmental review is a structured literature review method from the field of information systems that is useful for developing novel conceptualisations, frameworks, and approaches from previous bodies of research. In our review, we survey 42 recent contributions to variable autonomy in robotics published in high-quality and high-impact venues; we expand upon our method in Section 3.
Based on our review, we make four contributions: we provide a synthesised definition of variable autonomy to connect currently disjointed research efforts, detail research approaches in variable autonomy to strengthen the empirical basis for subsequent work, characterise the dimensions along which variable autonomy is implemented, and present design guidelines for variable autonomy research based on responsible robotics.

BACKGROUND

Robot Autonomy
Robot autonomy is commonly understood as a robot's capacity to both deliberate and act upon the world. A framework of autonomy in human-robot interaction (HRI) by Beer et al. [9,77] provides the following definition: "The extent to which a robot can sense its environment, plan based on that environment, and act upon that environment with the intent of reaching some task-specific goal (either given to or created by the robot) without external control" [emphasis in original].
As per Beer and colleagues, any task is composed of three "primitives": sense, plan, act; a robot's ability to perform each of these facets independently determines how autonomous it is said to be. Since the degree to which a robot executes each task primitive autonomously can vary, researchers have conceptualised robot autonomy in a hierarchical structure of potential control modes.
Taxonomies for levels of autonomy (LoA) have a long history within the automation and HRI literature. We do not attempt to delineate them all here, but focus on a few key contributions that help explicate the concept of LoAs. Those interested in greater detail can refer to reviews by Vagia et al. [130] and Beer et al. [9].
One of the earliest comes from research on automation by Sheridan and Verplank [114]. Published in 1978, their report surveys the potential of teleoperated and supervisory control systems: teleoperation means, intuitively, that a vehicle is controlled remotely by a human operator, while supervisory control includes vehicles that can operate automatically for periods of time with intermittent intervention by a remote operator. These control modes represent two of ten potential levels; as one moves up the hierarchy, the extent to which human intervention is necessary decreases.
Building on this work over two decades later, Parasuraman et al. [98] expanded the framework to include both types and levels of automation. As before, automation varies across a continuum from manual performance to full automation. But, in this framework, the authors specified the classes of functions to which automation can be applied: information acquisition, information analysis, decision and action selection, and action implementation [98]. Automation is not all-or-nothing, and can be applied to varying degrees to certain types of functions.
Alongside the proliferation of such taxonomies in the automation literature, researchers in HRI have articulated their vision of robot autonomy, taking into consideration the idiosyncrasies of robotics technology such as physical embodiment and social situatedness [9]. From the perspective of military applications, Huang et al. [54] created a framework to describe levels of autonomy along three dimensions: the complexity of the mission, the difficulty of the environment, and the degree to which humans interface with the robot. Each axis contains a series of metrics which are used to calculate the robot's level of autonomy. In situations characterised by low mission complexity, simple environments, and a high degree of human interaction, the robot's autonomy is considered low; the more independently the robot can sense, plan, and act during complex missions in difficult environments, the higher its level of autonomy [54].
Beer et al.'s most recent work [9] sets out a framework that specifies in detail ten different levels of robot autonomy. Across each level, the framework states the roles performed by both the human and robot, as they relate to the primitives of sense, plan, and act. For example, in a level titled batch processing, "[b]oth the human and robot monitor and sense the environment. The human, however, determines the goals and plans of the task. The robot then implements the task" [9,87]. As one moves along the continuum from manual to full autonomy, the number of functions allocated to the robot increases.
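To make this allocation of task primitives concrete, the sketch below models a level of autonomy as an assignment of the sense, plan, and act primitives to the human, the robot, or both. It is a minimal illustration of the framework's logic rather than an implementation from any reviewed paper; the class names and the coarse scoring function are our own.

from dataclasses import dataclass
from enum import Enum

class Agent(Enum):
    HUMAN = "human"
    ROBOT = "robot"
    BOTH = "both"

@dataclass(frozen=True)
class Allocation:
    """Assignment of the sense, plan, and act primitives to agents."""
    sense: Agent
    plan: Agent
    act: Agent

    def robot_share(self) -> float:
        """Coarse indicator: fraction of primitives the robot performs, alone or jointly."""
        primitives = (self.sense, self.plan, self.act)
        return sum(p in (Agent.ROBOT, Agent.BOTH) for p in primitives) / len(primitives)

# "Batch processing" as described by Beer et al.: both agents sense,
# the human plans, and the robot acts.
batch_processing = Allocation(sense=Agent.BOTH, plan=Agent.HUMAN, act=Agent.ROBOT)
full_autonomy = Allocation(sense=Agent.ROBOT, plan=Agent.ROBOT, act=Agent.ROBOT)

assert batch_processing.robot_share() < full_autonomy.robot_share()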
Apart from its adoption in other academic disciplines, the level of autonomy concept has been profoundly influential in shaping international standards. For example, the SAE J3016 standard for "Levels of Driving Automation" depicts degrees of automation for vehicles [56], ranging from Level 0, in which the human manually operates all driver support features, to Level 5, where the automation drives the vehicle under any condition.
While influential, the LoA concept has been criticised by numerous authors. These critiques commonly take issue with the implied trade-off between human and autonomous control, albeit through slightly different formulations. For example, Bradshaw et al. [16] argue that increases in a system's autonomy do not necessarily entail a concomitant decrease in the need for human control. Ironically [5], the introduction of an autonomous system tends to create new kinds of cognitively demanding work for human operators to perform [16]. Relatedly, Endsley [41,8] points to the automation conundrum: "The more automation is added to a system, and the more reliable and robust that automation is, the less likely that human operators overseeing the automation will be aware of critical information and able to take over manual control when needed." Building on this line of critique levied against the LoA taxonomy, Shneiderman [115] proposes a two-dimensional framework in which high levels of human control and autonomous capabilities are simultaneously achievable.
In summary, frameworks for levels of autonomy originate in the field of automation research and have been influential in numerous areas. Those involved in HRI have adapted these taxonomies to fit the nuances of robotics technology. The continuum of autonomy supposes that as the degree to which a robot can sense, plan, and act in its environment increases, the level of human involvement subsides. Despite its adoption in technical standards and much academic writing, the uni-dimensional LoA concept is heavily criticised.

Variable Autonomy
A central assumption of these frameworks is that levels of autonomy are fixed at the design stage, what Parasuraman et al. [97] termed "static automation." The resultant rigidity of these robots comes with various challenges, such as ensuring operators can intervene during automation failures [36,37,97] and enabling human-robot teams to adapt to changing and complex environments [104]. To accommodate the challenges presented by fixed LoAs, substantial research has been directed towards approaches that aim to dynamically shift between modes of autonomous control [37,90], which we call variable autonomy. Since as early as the 1970s, variable autonomy has appealed to roboticists; it promises flexibility amid dynamic and hostile environments, reduced workload for human operators, and the ability to exploit the complementary skill sets of humans and robots [47,49,66,109]. The past four decades have seen a number of research groups investigate variable autonomy under many different labels, such as traded control [65], adaptive autonomy [35], adjustable autonomy [18,37], sliding autonomy [20,36], and dynamic autonomy [21]. The different uses of these terms are discussed in further detail in Section 5.1.
Despite their shared concern for the limitations posed by fixed LoAs in robots, these similar concepts are loosely defined and inconsistently compared and contrasted: some authors provide similar definitions for different terms, some create subtle distinctions between them, while others offer no definition at all. This semantic ambiguity complicates attempts to formally characterise variable autonomy and unnecessarily separates related research efforts. In this section, we provide a historical background on the concept of variable autonomy in robotics, point to seminal work in the field and its motivating problems, and outline limitations in current taxonomies of variable autonomy to emphasise the need for a robust definition and characterisation.
One of the earliest formulations of the notion that a robot can possess multiple LoAs comes from the previously discussed report by Sheridan and Verplank, who distinguished between two types of control, which they term shared and traded. As the authors wrote: "Here, to share control means that both human and computer are active at the same time. To trade control means that at one time the computer is active, at another the human is" [114, 6.1]. Shared control, as defined in a recent review, is a control mode in which "human(s) and robot(s) are interacting congruently in a perception-action cycle to perform a dynamic task that either the human or the robot could execute individually under ideal circumstances" [1,511]. As such, a robot with shared control is not necessarily one with variable autonomy; it is a form of collaboration, typically described as a specific LoA [9], that aims to achieve a given task through complementary human-robot capabilities. Meanwhile, the distinction by Sheridan and Verplank [114] implies that traded control is a type of variable autonomy in which control of a robot is at any time in one of two discrete states: fully autonomous or remotely controlled [65].
Beginning in the late 1990s, the concept of variable autonomy and its variants took hold in robotics research. A 1999 symposium titled Agents with Adjustable Autonomy hosted by the AAAI brought together early contributors and offered an initial definition. According to the symposium co-chairs, "adjustable autonomy means dynamically adjusting the level of autonomy of an agent depending on the situation" [90]; the authors go further and state that adjustments in autonomy can be initiated by either human or autonomous agents. Some of the earliest studies on variable autonomy addressed its applications in diverse contexts such as space missions [17,37] and urban search and rescue [21]; investigated the problem of coordinating control in human-robot teams [109,138]; evaluated how changes in LoA affect task performance, situation awareness, workload, and acceptance [49,77]; and designed user interfaces for controlling the autonomy levels of multiple robots [47], moving across a continuum of LoAs [36], and delegating planning tasks to autonomous agents [84]. As this research progressed, it began to revolve around several central problems: who initiates changes in autonomy, for what reason, and when [82,87,104].

Responsible Robotics
To achieve our objective of constructing a research agenda for variable autonomy based on responsible robotics, we must first define what responsible robotics is. In the past few years, numerous authors have attempted to provide a description that captures the dynamic and diverse landscape of research on the social and ethical issues associated with robotics. In a special issue of Frontiers in Robotics and AI, Brandão, Mansouri, and Magnusson [19] outline the aims of responsible robotics; as per these authors, the field "should focus both on identifying social and ethical issues, and on designing methods to account for (and alleviate) such issues" [emphasis in original]. Meanwhile, another special issue edited by van Wynsberghe and Sharkey [136] defines responsible robotics as "the responsible research and innovation of robot development processes as well as the resulting products of such processes." Along similar lines, Winfield et al. [141] provide the following definition for responsible robotics: "Responsible robotics is the application of Responsible Innovation in the design, manufacture, operation, repair, and end-of-life recycling of robotics, that seeks the most benefit to society and the least harm to the environment." From these three articulations, we see that responsible robotics is an instantiation of responsible (research and) innovation (RI) within the domain of robotics. RI, then, is described as an approach that aims to align the products and processes of research and innovation with societal values and expectations [see 106, 122]. Numerous authors have contributed to the conceptual foundations of RI over the past decade; therefore, we draw on this extensive corpus to sharpen the concept of responsible robotics. In doing so, we clarify terms in the preceding definitions that have multiple, and oftentimes opaque, meanings in the literature: responsibility, innovation, approach, and societal values.
In their synthesis of moral responsibility and responsible innovation, van de Poel and Sand [133] distinguish between two interpretations of responsibility. The first, backward-looking responsibility, focuses on assessing a past sequence of events to attribute blame or praise for some outcome. It requires "the ability and willingness to account for one's actions and to justify them to others" [133]. The second, forward-looking responsibility, entails an obligation to ensure that some future state comes about. This interpretation of responsibility implies anticipation of innovation outcomes on the part of those involved in the innovation process. Given the inherently uncertain nature of innovation and the unpredictability of its outcomes, attributing forward-looking responsibility for the breadth of an innovation's social, environmental, and ethical effects is challenging in practice [12].
The term innovation itself likewise has many faces in the RI literature. van den Hoven [134,80] offers one such definition: "Innovation is an activity or process which may lead to previously unknown designs pertaining either to the physical world (e.g., designs of buildings and infrastructure), the conceptual world (e.g., conceptual frameworks, mathematics, logic, theory, software), the institutional world (social and legal institutions, procedures and organization) or combinations of these, which – when implemented – expand the set of relevant feasible options for action, either physical or cognitive." From this articulation, at least two interpretations of innovation are apparent: innovation as both a product and a process. The latter represents the act of innovating, while the former is the result. Other scholars have extended that definition to include both the purpose (the reasons motivating innovators [122]) and people (those involved in innovation activities [58]).
Within the last decade, several academic and policy organisations have formulated multiple RI approaches. Two of the most prominent are those presented by von Schomberg [106] and Stilgoe et al. [122]. From the world of policy, the EPSRC, the UK's main funding body for engineering and physical sciences research, has assimilated the work of Stilgoe and colleagues into its "AREA" framework for RI [93], constituted by four dimensions: anticipate, reflect, engage, act [42]. For clarity in writing, we present the dimensions here as though they are discrete; in practice, they overlap and build on one another.
First, anticipation refers to structured processes to identify and evaluate potential future scenarios and their associated impacts: both intended and unintended, positive and negative [122]. As previously mentioned, innovation is rife with uncertainty; therefore, the goal is not accurate prediction, but anticipation of plausible and desirable futures towards which we guide innovation [74]. Second, reflection involves questioning underlying motivations, purposes, and assumptions, and understanding the boundaries of knowledge [122]. Third, engagement is the inclusion of diverse stakeholder groups throughout the innovation process, enabling deliberation and debate during anticipation and reflection. Despite the consensus in the literature that stakeholder engagement is essential for responsible innovation [74,132], questions remain on how to engage stakeholders with vastly distinct, and potentially incompatible, worldviews [12] and enable meaningful engagement [108]. Finally, acting is about using the insights gained from the three prior dimensions to guide innovation along desired trajectories.
Innovators are then tasked with shepherding innovations according to the values of various societal actors. But, what exactly are values, and how are innovators meant to identify them? Value sensitive design, an approach that seeks to engage with human values during design processes, offers some help; as per Friedman and Hendry [46,24], values are "what is important to people in their lives, with a focus on ethics and morality." Yet, as Boenink and Kudina [13,452] argue in their critique of values in RI, values are not "pre-given stable entities, ready made for reflection." The meaning of a given value varies from person to person, place to place, and time to time. The dynamism of values has implications for innovators' strategies to identify them. One method is to appeal to a priori defined lists of ethical principles. Such lists offer a helpful starting point and heuristic for dealing with values in design [46]; however, a strict reliance on so-called "universal" values neglects those that are culturally contingent [14]. Therefore, other authors advocate for an empirically led approach to the identification of values, engaging with people in their place and practice to understand what it is they find important [13]. A common critique against this line of thinking is that it falls victim to the naturalistic fallacy; that is, it assumes that the things people value are those they should value [14]. Our own perspective sees merits in both strategies. As mentioned, pre-defined ethical guidelines provide a helpful basis for agreed-upon values. Yet, we also acknowledge that they should not be used too rigidly; it is crucial to consider the actual experiences of those impacted by a technology. Therefore, we draw from both strategies, noting how ethical guidelines can inform our understanding of values, but they must be complemented with an empirical investigation of those involved.
Responsible robotics applies elements of RI to the robotics innovation lifecycle to reach societal and environmental objectives. Responsibility for events that have yet to occur and those that have already come about is essential; the former depends on anticipatory practices, and the latter on transparency into past events and a causal understanding that links actions and outcomes. Innovation in robotics refers to its dimensions of process, product, purpose, and people: the how, what, why, and who of innovation. And following an RI approach emphasises anticipation of potential pathways, reflection on motivations and assumptions, inclusive deliberation with impacted stakeholders, and responsiveness to the insights brought up through this process. We ground our approach to interpreting societal values in ethical guidelines for robotics, most of which agree that these systems should not harm individuals or the environment, should promote human rights and well-being, should maintain transparency, and should ensure that human designers and operators remain responsible and accountable [140]. International standards such as BS 8611:2016 Guide to the ethical design and application of robots and robotic systems [55], IEEE 7000-2021 Standard Model Process for Addressing Ethical Concerns During System Design [118], and IEEE 7001-2021 Standard for Transparency of Autonomous Systems [117] have been built on top of these shared principles. But, we equally emphasise that any study must include opportunities to reflect on stakeholder values as they exist in their time and place.

Past Reviews
Researchers have conducted reviews that address similar topics to those covered in this study, as shown in Table 2. In an early paper, Bradshaw et al. [15] conducted a narrative review to distinguish the dimensions along which autonomy can be adjusted. Per Bradshaw et al. [15], autonomy includes both actions that one is capable of performing and those that one is allowed to perform; as such, a robot's autonomy can be adjusted according to what it is allowed to do, what it is required to do, what others think it could plausibly do, and what it is able to do. This initial taxonomy provided a helpful conceptualisation of the elements of autonomy that can be altered, but it did not offer any insight into other dimensions of variable autonomy, such as who adjusts and why. More recently, Mostafa et al. [87] performed a systematic literature review to map the extent of research on variable autonomy for multi-agent systems. Their review specifies six design requirements: how autonomy is defined, measures to evaluate autonomy, available autonomy modes, which agent controls changes in autonomy states, patterns of human-agent interaction, and techniques to evaluate autonomy adjustments. Selvaggio et al. [110] provided a brief narrative review on shared control and shared autonomy in robotics. In this review, the authors' definitions of shared control and shared autonomy resemble the distinction between adjustable and adaptable autonomy, respectively, as detailed in Section 5.1. Finally, O'Neill et al. [94] conducted a critical review on teamwork in human-autonomy teams. Importantly, their work excluded research on robotics because of the idiosyncrasies that arise from physical embodiment.
Table 2. Summary of related work. ❍ indicates that a review does not focus on a given aspect, ◗ indicates that a review partially focuses on a given aspect, and • indicates that a review directly focuses on a given aspect.

This review differs from existing work across four aspects. (1) Period: This review focuses on recent developments in variable autonomy for robotics, extending 6 years beyond the review by Mostafa et al. [87]. While the review by Selvaggio et al. [110] aims to cover recent research, the authors did not intend to conduct a comprehensive survey and therefore did not include details on the time frame of papers included in their review. (2) Robotics: While others have included both embodied and non-embodied artificial agents in their reviews [15,87], we focus specifically on robotics. Robots' physically embodied nature allows them to move throughout and act upon an environment, as well as engage with people, in ways that traditional automation cannot [9]. Therefore, focusing on robotics specifically enables us to engage with the technology's idiosyncrasies.
(3) Responsible robotics: The objective of this review is to establish a research agenda for variable autonomy that is based on responsible robotics. In contrast, the objectives of related work have been to construct general frameworks [15,87] or synthesise existing research [94,110]. As far as we are aware, this is the first study to focus on how variable autonomy can be approached through a responsible robotics lens. (4) Methodology: This study reviews the research designs, empirical sites, and evaluation measures employed in variable autonomy robotics research. In this sense, this review is similar to O'Neill et al. [94]; yet, as mentioned, their review explicitly excluded research on robotics. Meanwhile, Mostafa et al. [87,181] only briefly touched on the methodology of variable autonomy for robotics, stating that "[m]ost of the adjustable autonomy research results are obtained based on simulation programs... [and] hence, the results might lack valid testing." We strengthen their claim by providing evidence that the results of variable autonomy research may lack ecological validity given that most studies have been conducted in artificial settings, such as simulations or contrived physical environments.

METHOD
Because of the unresolved conceptual and operational ambiguities surrounding variable autonomy, and our objective of specifying an approach to variable autonomy that is based on responsible robotics, we employ a "developmental review" as proposed by Templier and Paré [126].

Search Strategy
To account for the diverse terminology in variable autonomy, we employ three data collection strategies: database, backward, and forward searches. First, we query four databases: ACM Digital Library Full-Text Collection, IEEE Xplore, Elsevier Scopus (Scopus), and Clarivate Web of Science (WoS). The first two databases provide comprehensive coverage of papers published in ACM and IEEE conferences, prominent associations for computing and technology research. The latter two, Scopus and WoS, are likewise known to have extensive and high-quality coverage of journals and conferences [86]. We construct keyword searches for each database based on terms identified in previous reviews [87], consultations with researchers in variable autonomy, and informal database searches. The resultant keyword queries are shown in Appendix A. Searches were performed in January 2022. To focus on recent developments in the field, we restrict our search to the period 2010–2021. Additionally, we only include results published in journal articles or conference proceedings, and written in English. This strategy yields a total of 294 papers. Additionally, we conduct a backward search by reviewing the reference lists of previous reviews and papers recommended by colleagues to identify further references. In parallel, we record seminal early works in variable autonomy based on recurring citations in papers identified through the database search; these include the following works [3,6,17,18,21,37,47,59,60,76,77,84,90,103,104,109,138]. Next, we conduct forward searching in the Scopus database, retrieving papers that cite the previously stated seminal works or the review by Mostafa et al. [87]. Together, backward and forward sampling result in an additional 438 papers.

Data Selection
Overall, our three search strategies lead to 732 papers. We then employ a multi-stage selection approach to identify relevant and representative papers. First, we remove any duplicate entries. Then, we review the titles and abstracts according to the following inclusion criteria: (1) Primary research: conceptual or empirical.
(2) Full text is available.
After this initial inclusion review, we are left with 154 papers. Given that we do not intend to provide an exhaustive review of the literature, we prioritise studies based on their publication venue and citation counts, two fairly reliable indicators of influence [7]. Top-priority papers include those published in first and second quartile journals for their respective discipline, as per the Scimago Journal Rank scheme, along with those published in conferences sponsored by the ACM and IEEE, given that these are the venues in which leading contributions are likely to be found [7,126]. We make adjustments based on citation counts (as reported in the paper's respective database) to identify central contributions that were published in lesser-known venues. This approach strategically delimits the number of papers included in the review, while mitigating the bias towards highly cited publications or those published in prominent venues. At this stage, a total of 67 papers are chosen.
Finally, we perform full-text reviews of each of the 67 papers, excluding those that are irrelevant according to the initial inclusion criteria, are extended abstracts, are shorter than 4 pages, or are elaborated further by the same authors in a subsequent study. Ultimately, a sample of 42 papers is included for analysis. Figure 1 presents these reasons for exclusion in a PRISMA diagram [95].

Analysis
Our data analysis employs both deductive and inductive elements. The deductive elements are the categories delineated in Table 3; these were defined prior to data extraction. Meanwhile, the inductive elements were defined during analysis according to the data; these are represented throughout Sections 5.1–5.3. We also extract bibliometric information, such as title, author(s), publication venue, abstract, and year. This scheme is coded into NVivo 12 to facilitate structured data extraction and analysis.
Throughout our analysis, we continuously review extracted segments: conceptually relevant extracts are grouped together and assigned an inductive code; these codes are added, combined, separated, or removed as further studies are analysed; and patterns among inductive codes are identified to determine higher-level relationships that inform the development of our conceptual framework presented in Section 6 [7,83].
It is worth offering further clarification on the Architecture category. Initially, we gather sub-codes from previous reviews [15,82,87]. We follow a flexible approach in which new dimensions are added, while some dimensions found in previous reviews are excluded. As an example, past reviews do not differentiate between changes in autonomy determined before operation and those determined at run-time; our distinction between goal-oriented and stimulus-driven approaches captures this nuance. In Section 7, we expand on the similarities and differences between the dimensions of VA proposed in this paper and those found in past work.
In summary, we aim to reconcile the conceptual and operational ambiguity around implementations of variable autonomy to devise an approach relevant for responsible robotics. With this aim in mind, we employ a developmental review of recent work in the variable autonomy literature. We leverage three search strategies to ensure breadth of coverage, combined with a prioritisation strategy that delimited the corpus to a manageable number of prominent and representative publications. Finally, we employ an analysis approach that draws on deductive and inductive elements; the results of this analysis are presented in the following sections.

Limitations
We now deal with four limitations of our study. First, search queries are an inherently restrictive sampling strategy: only papers which use equivalent language will be returned as a result. Therefore, those which employ dissimilar language yet are still relevant will be excluded. We attempt to mitigate this risk by developing an extensive search query, as shown in Appendix A. The terms in the query were gathered inductively by the first author from early papers and past reviews on variable autonomy; the search query was then reviewed by the second and third authors and revised accordingly. Additionally, we use multiple sampling strategies, such as forward and backward searches, to further offset this limitation.
Second, the process of data selection and analysis includes numerous decisions that may impact the internal validity of results. Therefore, we iteratively develop a data selection and extraction protocol. The data selection protocol is encoded in Microsoft Excel and the data extraction protocol in NVivo 12 to support consistency.
Third, our search strategy draws from four sources of data: Scopus, WoS, IEEE Xplore, and ACM Digital Library. While each of these databases indexes high-impact conferences and journals, some relevant papers may be omitted. Nonetheless, the number of data sources in our review exceeds the minimum of two suggested by Shea et al. [113].
Finally, we build our research agenda from what is currently possible from the perspective of technical research on variable autonomy. As such, research that does not focus on the design and implementation of variable autonomy is excluded from our search strategy. An implication of this choice is that studies which adopt a qualitative orientation to human-robot interaction and social robotics may not be included. While there is a productive community of scholarship that takes a qualitative approach to the study of human interactions with robots [e.g., 81,142], as far as we are aware, such studies have not yet been extended to variable autonomy implementations.

DATA DESCRIPTION
Our review includes 42 papers published in journals and conferences spanning from 2010 to 2021 and covering a diversity of application domains and robot technologies. The list of publication venues covered in this review is included in Appendix B. In this section, we present a brief description of our dataset. The intention of these statistics is not to infer properties of variable autonomy research in general, but to depict the breadth of publications included within our review.
As shown in Figure 2, the number of publications is fairly constant across the 12-year period between 2010 and 2021. Our dataset is evenly distributed, with half of the papers (21 of 42) published between 2010 and 2015, and the remaining half published between 2016 and 2021.
Figure 3 shows the application domains addressed in the reviewed papers. The most common are search and rescue (13 of 42) [2, 22, 25–27, 38, 39, 45, 73, 79, 102, 124, 131] and military (9 of 42) [30,34,39,62,88,105,119,143,144] contexts: the former refers to the use of robotics to identify and rescue missing persons in, for example, disaster scenarios; the latter includes the use of robotics for military operations such as surveillance, reconnaissance, and defence. Eight of the 42 papers do not state a specific domain and are categorised as generic.

RESULTS
In this section, we present our results framed as responses to each of our three research questions. First, we review common definitions of variable autonomy in the literature and distil their central features. We contextualise these definitions with the motivations for conducting variable autonomy research, and present a comprehensive definition. Second, we describe the process of variable autonomy research, focusing on research designs, research sites, and evaluative measures. Third, we present a taxonomic representation of variable autonomy implementations across four dimensions, stated informally as questions: who initiates changes in autonomy, what aspects of autonomy are adjusted, when are changes determined, and why do changes occur? We provide formal characterisations of each dimension in turn.
5.1 RQ1: How is Variable Autonomy Defined in the Literature?
As alluded to in Section 2.2, the literature on variable autonomy lacks consistent terminology. Different terms are given equivalent definitions; similar terms are alternatively defined; and some terms are given no definition at all. Further, there is no central definition to which authors commonly refer. Therefore, we propose a comprehensive definition that, when combined with the dimensions of variable autonomy discussed in Section 5.3, offers precision when describing robots with variable autonomy.
Of the 42 papers, the authors of 30 explicitly define their conceptualisation of variable autonomy; across these 30 papers, 6 different terms appear. These terms, listed from highest to lowest number of appearances, include adjustable autonomy, adaptive autonomy, variable autonomy, sliding autonomy, adaptable autonomy, and dynamic autonomy. For some authors, the choice between these terms signals different approaches to variable autonomy. Adjustable and adaptable autonomy, on the one hand, may represent systems in which changes in a robot's autonomy are initiated by a human operator, whereas adaptive autonomy describes systems in which changes are triggered by the robot agent [51,62,131]. Valero-Gomez et al. [131,703] offer a representative distinction: "adjustable autonomy, in which the operator has initiative over the autonomy level; adaptive autonomy, in which the autonomy level is adjusted depending on the task and context" [emphasis in original]. From this definition, we see that adjustments in autonomy are associated with particular conditions of the context of use and can be initiated by either a human or artificial agent.
Most of the papers which identify their approach as adaptive autonomy align with this distinction [2,30,45]; meanwhile, those that employ adjustable autonomy use the term much more loosely. Specifically, these authors refer to adaptive and adjustable autonomy, along with other terms such as sliding autonomy, inconsistently or interchangeably [8,22,70,73,101]. For example, Basich et al. [8,124] define adjustable autonomy as "the ability of an autonomous system to alter its level of autonomy during plan execution, often by dynamically imposing or relaxing constraints on the extent of actions it can perform autonomously in a human-agent team." Similarly, Lewis et al. [70,1657] refer to adjustable autonomy as "having the robots alter their level of autonomy in a situationally-dependent manner." Next, as an example of the interchangeable use of terms, Roehr and Shi [101,508] state that "sliding autonomy also known as adaptive/adjustable autonomy and mixed initiative control is one area... [motivated by] increasing the efficiency of mixed [human-robot] teams by adjusting the autonomy level of individual robots." Here, sliding, adaptive, and adjustable are treated as equivalent terms, and the focus of the definition shifts to human-robot collaboration. These definitions complicate the adaptable/adjustable and adaptive autonomy distinction, and point to the dynamic nature of autonomy in variable autonomy systems.
The third most common term is variable autonomy, favoured by Chiou and colleagues [25–27, 100]. Chiou et al. [25,2] indicate that a "variable autonomy system is one in which control can be traded between a human operator and a robot by switching between different Levels of Autonomy." In comparison, this definition makes no claim as to who effects change; the emphasis is instead on what is changed.
Despite the inconsistent terminology, researchers' motivations for pursuing variable autonomy are fairly similar. Researchers position variable autonomy as a strategy for groups composed of both humans and robots to interact with one another, thereby balancing the strengths and limitations of autonomy with those of human operators. In particular, autonomous robot behaviour is seen to reduce operator workload, stress, and fatigue, and compensate for losses in an operator's situation awareness: the ability to sense and perceive the robot's operating environment [26,30,38,100]. Human operators, on the other hand, are valued for their ability to respond to and navigate complex and uncertain environments [27,70,73,87,91,116,131]. Researchers, implicitly or explicitly, view this capability balancing as a means to improve the effectiveness, efficiency, and safety of the joint human-robot team [25,31,85,88,101,107,116]. Two papers offer an alternative framing, instead stating that the motivation for variable autonomy is to enable automation to adapt to the needs of human operators [51,85].
From the preceding discussion and the results presented in Section 5.3, five fundamental concepts related to variable autonomy arise. The first two, levels of autonomy and dynamism, are closely linked: the robot must possess multiple LoAs and the capacity to move between them during operation. Importantly, these changes can be initiated by the human, the robot, or both. Next, variable autonomy is an interaction strategy for groups composed of both human and robot agents, each possessing distinct capabilities. As such, human-robot interaction considerations are central to the operationalisation of variable autonomy. Finally, changes in autonomy are deliberate: contextual cues trigger an adjustment from one LoA to another. Drawing together these concepts, we propose the following definition for variable autonomy in robotics.
An interaction strategy between human and robot agents in which the robot's level of autonomy varies during operation in response to changes in context.
This definition makes explicit the five fundamental concepts of variable autonomy, many of which are omitted from the reviewed definitions. Additionally, it encompasses systems in which changes in autonomy are initiated by the human, the robot, or a combination of both; the intention is that this merging will remove unnecessary separation between related research efforts.
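To illustrate how this definition might be operationalised, the following minimal sketch shows a robot that holds several levels of autonomy and shifts between them during operation in response to contextual cues raised by either party. It is our own illustration, not an implementation from any reviewed system; the level names, context fields, and threshold are invented.

from enum import IntEnum

class LoA(IntEnum):
    TELEOPERATION = 0
    SHARED_CONTROL = 1
    SUPERVISED_AUTONOMY = 2
    FULL_AUTONOMY = 3

class VariableAutonomyRobot:
    """Holds multiple LoAs and adjusts them deliberately in response to context."""

    def __init__(self, initial: LoA = LoA.TELEOPERATION):
        self.loa = initial

    def update(self, context: dict) -> LoA:
        # Human-initiated change: the operator explicitly requests a level.
        if "operator_requested_loa" in context:
            self.loa = LoA(context["operator_requested_loa"])
        # Robot-initiated change: degraded communication triggers more autonomy.
        elif context.get("link_quality", 1.0) < 0.3:
            self.loa = LoA.SUPERVISED_AUTONOMY
        return self.loa

robot = VariableAutonomyRobot()
print(robot.update({"link_quality": 0.1}))                           # robot-initiated shift
print(robot.update({"operator_requested_loa": LoA.TELEOPERATION}))   # human-initiated shift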

RQ2: How is Research into Variable Autonomy Conducted?
In this section, we discuss three features of variable autonomy research: the research design employed, the research site, and the measures used for evaluation. Reporting on the research design and site provides insight into the state of variable autonomy research and, relatedly, the robustness of results. Depending on how results are generated and in what context they arise, inferences can be made about their validity. And the measures researchers choose for evaluation and comparison reveal the qualities valued in variable autonomy implementations.

Research Design.
Variable autonomy researchers report a range of research designs, shown in Table 4 and ordered here from most to least common: experimental, simulation, field tests, conceptual, and surveys. All research designs besides those categorised as conceptual or survey were task-oriented: a human-robot team, whether real or simulated, had to complete some predefined task.
Experimental designs refer to studies in which human participants act as a robot operator and perform a series of tasks under varying experimental conditions. Many experimental studies involve participants operating a robot across multiple LoAs while performing a secondary task, such as responding to questions [73] or mentally rotating 3D objects [25]. Secondary tasks enable researchers to test operators' situation awareness [26] and induce cognitive load [25–27, 34,85]. The participants in these studies constitute a relatively homogeneous population: 11 of the 28 experiments rely on undergraduate and graduate students from the authors' respective universities [24,25,30,34,52,73,92,131,143,144] and 5 of the 28 employ members of the research team [23,61,88,102,128]. For 11 of the 28 experimental papers, the participant sampling strategy is unclear [26,27,31,38,62,70,91,111,116,124,137], and 1 paper recruits participants from the lead author's research institution [105].
There are three variations of experimental design: within-subjects, between-subjects, and single-subject; it is unclear which approach is followed in 6 papers. The difference between the first two concerns how many experimental conditions each participant experiences. A within-subjects design has each participant experience every condition, whereas a between-subjects design exposes each participant to only one condition; in both cases, either one or multiple independent variables can be tested. Most within-subjects experiments are single factor, meaning they test only one independent variable across each participant; these studies compare different implementations, such as teleoperation and variable autonomy [24,38,70], variable autonomy with other static levels of autonomy [27,92,116,124,144], or systems in which changes in autonomy are triggered by the system or the human operator [25,62,143]. Meanwhile, the remaining within- and between-subjects studies test multiple independent variables, such as implementation (e.g., static vs. variable autonomy), operator and robot workload, and task difficulty [26,30,34,73]. Three studies test unique conditions, such as differences in interfaces [105], alerts for changing autonomy [52], and number of robots [131]. Lastly, single-subject designs imply that the study includes only one participant, a design used in preliminary work [23] or as a supplement to field tests [102].

Simulations, on the other hand, rely on numerical experiments within a virtual environment. For 7 of the 9 papers that employ a simulation design, the simulation serves as preliminary validation for a proposed variable autonomy architecture [8,22,28,45,101,119,135]. In contrast, Miller et al. [85] compare the predictive performance of different information streams for triggering shifts in levels of autonomy, including signals from human control, autonomy, and the environment.
Of the remaining papers, 4 report on field tests, such as robotics competitions [79,91,102] or navigation through difficult terrain [107]; 3 papers introduce conceptual frameworks for performance measures to trigger adjustments in autonomy [2,100,145]; 1 paper presents the results of a survey that explores how older adults would respond to changes in a social robot's autonomy if triggered automatically or by the user; and for 1 paper it is unclear whether the results are from a simulation or an experiment [39].
In summary, the majority of studies in this review employ an experimental design. Across these studies, the participants come from a limited subset of possible populations. Additionally, the experimental design varies from study to study, making it difficult to compare results.

Research Site.
Most variable autonomy studies take place in artificial settings, whether in contrived physical environments or simulations. Variable autonomy implementations, therefore, are not evaluated in the naturalistic settings in which they are ultimately intended to operate.

Evaluation Measures.
Researchers who conduct experimental studies and field tests employ an array of constructs and associated measures to evaluate variable autonomy implementations. Within the reviewed studies, constructs fall into two categories: capability constructs, which focus on the performance of either the operator or the robot in completing a predefined task, and collaboration constructs, which characterise the quality of collaboration between the human and robot. Tables 5 and 6 detail the capability and collaboration constructs, respectively. For each construct, measures are either objective or subjective, a common distinction in the HRI literature: the latter refers to measures that draw from the experiences and perceptions of the participant, commonly recorded through Likert-style surveys administered after the experiment; the former refers to data that is "independent" of the participant, recorded manually by the researcher or through devices such as sensors and timers.
Capability constructs include effectiveness, efficiency, safety, situation awareness, adaptability, border-line functioning, and workload. Objective measures of effectiveness and efficiency, such as whether the primary task of operating the robot was successfully completed, the number of errors, and task completion time, are ubiquitous. Many of these are idiosyncratic to each study, such as the number of targets accurately identified in a surveillance mission [34] or the total area explored in a search and rescue simulation [131]. For studies in which errors are associated with vehicle collisions, researchers interpret primary task success rate as a measure of safety [25–27, 38,52]. Relatedly, Zieba et al. [144,381] employ two unique constructs of adaptability and border-line functioning, which refer to the ability of the system to manage issues and "border-line use conditions in a given operational mode," respectively. Whereas measures of effectiveness and efficiency are exclusively objective, workload and situation awareness include measures drawn both from the operator's experience and from behavioural data. A common instrument for measuring the subjective mental workload of task execution is the NASA Task Load Index (NASA TLX) method [25–27, 30,34,73,85,116,124]. The NASA TLX is a well-established survey composed of six dimensions: mental demand, physical demand, temporal demand (i.e., how rushed a participant felt), effort, performance, and frustration level. After completing a trial, participants rate their response to each question on a scale from low (1) to high (20). While not all papers directly use the NASA TLX survey, some include closely related questions covering task difficulty [62] and perceived stress [38]. These are combined with objective measures of workload, including operator energy expenditure calculated in terms of mechanical work [92], the amount of information exchanged between operator and robot [39,79], and time spent in each LoA [107]. Similarly, a combination of objective and subjective measures represents situation awareness. For example, Kidwell et al. [62] interpret participant performance on secondary tasks as an indication of situation awareness, while Côté et al. [31] infer situation awareness from the amount of environmental information displayed on a GUI throughout the duration of the experiment.
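For concreteness, the sketch below computes an unweighted ("raw") NASA TLX score from the six subscale ratings described above, using the 1-20 response scale reported in the reviewed studies. The weighted variant, which adds pairwise comparisons of the dimensions, is omitted, and the function and variable names are our own.

# Raw (unweighted) NASA TLX: the mean of the six subscale ratings,
# rescaled here to 0-100 for comparability across studies.
TLX_DIMENSIONS = (
    "mental_demand", "physical_demand", "temporal_demand",
    "effort", "performance", "frustration",
)

def raw_tlx(ratings: dict) -> float:
    """Return the mean of the six subscale ratings (1-20), mapped onto 0-100."""
    if set(ratings) != set(TLX_DIMENSIONS):
        raise ValueError("expected exactly the six NASA TLX subscales")
    mean_rating = sum(ratings.values()) / len(TLX_DIMENSIONS)
    return (mean_rating - 1) / 19 * 100

example = dict(zip(TLX_DIMENSIONS, (12, 4, 15, 14, 8, 10)))
print(round(raw_tlx(example), 1))  # overall workload for one trial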
Constructs that refer to human-robot collaboration include interaction effectiveness, interaction efficiency, automation reliance, trust, confidence, and acceptance. The number of LoA switches [25,62,101,105] and time spent in each LoA [25,52,62,91,105,131], along with the number of human-robot interactions [31] and operator reaction time [143,144], are collectively interpreted as reflecting the effectiveness and efficiency of interactions, how reliant participants are on automation, and the trust participants have in the robot. Owan et al. [92] evaluate participants' level of comfort engaging with the robot as a subjective measure of collaboration effectiveness. Similarly, three studies include questions to gauge participants' trust in automation [34,62,92]. Finally, measures of acceptance were mainly informal survey questions, asking participants to state their preferences between control modes [62,73,92,116,143], intention to use, and perceived usefulness [51].
The use and interpretation of measures varies significantly across the studies. For example, the number of LoA switches and time spent in each LoA is interpreted as an indicator of operator reliance on autonomy [105], trust [52], interaction efficiency [91,101], and interaction effectiveness [131]. Moreover, many studies do not explicitly state which constructs their measures are associated with. Additionally, the use of established subjective measures beyond the NASA TLX survey is limited. For other constructs, such as situation awareness, trust, and acceptance, researchers rely on informal measures developed for the study at hand. Two exceptions are the experiments by de Visser and Parasuraman [34] and Owan et al. [92]: the former draws from the Situation Awareness Rating Technique by Taylor [125] and the trust and self-confidence measure of Lee and Moray [69], and the latter adapts a questionnaire for human-robot collaboration fluency from Hoffman [53]. Finally, there are instances of the joint use of objective and subjective measures to converge on a given construct. For example, Schaefer et al. [105] infer trust in automation through both the number of LoA switches and responses to trust questionnaires.

RQ3: How is Variable Autonomy Implemented?
Implementations of variable autonomy differ across four dimensions, stated informally as questions: who initiates changes in autonomy, what aspects of autonomy are adjusted, when are changes determined, and why do changes occur? Each dimension includes several attributes. In this section, we detail the four dimensions in turn and describe the variety of considerations designers manage when constructing variable autonomy systems. An overview of the four dimensions and associated attributes is provided in Figure 5.

Initiative.
A longstanding concern in variable autonomy is who initiates changes in autonomy. Whether it is the human, robot, or a combination of both represents our first dimension. We distinguish between these three types – human initiative, system initiative, and mixed initiative – and reflect on the implications of each.
Human initiative (HI) refers to implementations in which the human operator has sole capacity to change the robot's autonomy. In one study, Lin and Goodrich [73] design an interface that enables an operator to manage the behaviour of a simulated UAV by setting the amount of time allocated to autonomy. In this instance, the human operator interprets information provided by the GUI to make a judgment on the appropriate level of autonomy during the task. While the information provided by Lin and Goodrich's interface is continuous, Bush et al. [22] present an architecture in which the robot issues a request for an autonomy switch based on the predicted likelihood of goal completion. Importantly, the robot could not initiate the change in autonomy itself, and the operator could reject requests for assistance. Therefore, humans retain full control of changes in autonomy in HI variable autonomy systems, and may receive information that guides their decision on when to initiate a change either through continuously available information on an interface or through discrete alerts sent by the robot. Besides serving as a medium for information on when to intervene, interface design also influences a human operator's propensity to initiate autonomy changes. Schaefer et al. [105] find that operators are more likely to adjust a robot's autonomy when the interface is familiar; drawing on past work in automation reliance, the authors suggest that the familiarity of interfaces mediates human trust in and reliance on robots.
Rather than relying on a human operator to adjust a robot's autonomy, system initiative (SI) implementations enable the robot's autonomy to change automatically. Specifically, a “control switcher” [25] – an artificial agent such as a learning algorithm [38, 61, 70, 92, 119], fuzzy controller [111], Markov Decision Process [135, 137], or finite state machine [79] – adjusts the robot's autonomy. For example, Doroodgar et al. [38] develop a hierarchical reinforcement learning algorithm that allocates control for performing a task to either a human operator or robot according to whichever agent is predicted to do so more efficiently. These systems obviate the need for human intervention in autonomy switches, yet still require human involvement. A transition from autonomous behaviour to teleoperation demands the availability and awareness of a human operator who is willing and able to assume a greater degree of control following a period of passivity. Research on self-driving cars discusses the risk of “vigilance decrement” on behalf of operators when they remain in a passive state for an extended period of time [64].
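To illustrate the role of a control switcher, the sketch below allocates control to whichever agent is predicted to complete the task more efficiently; it captures the allocation logic described for Doroodgar et al. [38] only in spirit. The predictors, context fields, and fallback rule are hypothetical, and this is not their hierarchical reinforcement learning algorithm.

from dataclasses import dataclass
from enum import Enum

class Controller(Enum):
    HUMAN = "teleoperation"
    ROBOT = "autonomous"

@dataclass
class TaskContext:
    # Hypothetical features a control switcher might observe.
    predicted_time_human: float   # seconds, e.g. from an operator model
    predicted_time_robot: float   # seconds, e.g. from a planner estimate
    operator_available: bool

def allocate_control(ctx: TaskContext) -> Controller:
    """System-initiative allocation: hand the task to whichever agent is
    predicted to complete it faster, falling back to autonomy when no
    operator is available to take over."""
    if not ctx.operator_available:
        return Controller.ROBOT
    if ctx.predicted_time_human < ctx.predicted_time_robot:
        return Controller.HUMAN
    return Controller.ROBOT

print(allocate_control(TaskContext(45.0, 30.0, operator_available=True)))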
Finally, mixed initiative (MI) implementations integrate the previous two types: both the human operator and the control switcher are able to initiate autonomy changes [25, 88, 100, 144]. The operator and robot must collaborate to determine the appropriate level of robot autonomy, with the most capable either seizing or being granted control [25]. As characterised by Chiou et al. [26], this implies that both the robot and human must have an understanding of the other's state, knowledge, and capabilities. Recent experimental work by Chiou et al. [25] finds that MI systems improve performance and reduce operator workload during navigation tasks compared to HI systems, at least in a simulated environment.

Specificity.
When developing a variable autonomy system, designers must specify what aspects of autonomy are subject to variation. Approaches found in the literature adjust autonomy either between two or more discrete operation modes or at a granular level of control over autonomous behaviour.
Traded control approaches shift between two extremes: manual and autonomous control [24, 26, 119]. A concern with this approach is that operators lose situation awareness during periods of inattention and struggle to regain control when the robot's autonomous behaviour decreases [52]. Cosenzo et al. [30] attempt to mitigate this risk by continuously reengaging the operator. Similarly, discretised control implementations include predefined LoAs with intermediate degrees of autonomy [28, 31, 85, 143]. As no studies in the review compare traded and discretised control implementations, evidence on the trade-offs associated with either approach is lacking.
Granular control implementations do not conceptualise operation modes in terms of discrete LoAs. Instead, they adjust autonomy by constraining or expanding the functions a robot and human are allowed to do, required to do, and able to do [73, 128, 144, 145]. The continuous-scale approach requires designers to exercise greater specificity in defining which autonomous behaviours will be adjusted; for example, Lin and Goodrich [73] set constraints on where a UAV could operate under autonomous behaviour and for how long.
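The distinction between discretised and granular control can be made concrete with a small sketch; the level names, constraint fields, and values below are illustrative assumptions rather than constructs taken from any reviewed system.

from dataclasses import dataclass
from enum import IntEnum

class LoA(IntEnum):
    # Discretised control: a small set of predefined operation modes.
    TELEOPERATION = 0
    SHARED_CONTROL = 1
    SUPERVISED_AUTONOMY = 2
    FULL_AUTONOMY = 3

@dataclass
class AutonomyEnvelope:
    # Granular control: expand or constrain what the robot may do,
    # rather than selecting a single discrete mode.
    allowed_region: tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)
    max_autonomous_duration_s: float
    may_replan_path: bool
    must_request_confirmation: bool

# A discretised switch selects one mode; a granular adjustment edits the envelope.
current_mode = LoA.SUPERVISED_AUTONOMY
envelope = AutonomyEnvelope((0.0, 0.0, 50.0, 50.0), 120.0, True, False)
envelope.max_autonomous_duration_s = 60.0  # tighten the envelope at run-time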

Flexibility.
A variable autonomy system is one in which the robot's autonomy changes during operation. Some variable autonomy implementations provide greater flexibility in the number and timing of these adjustments than others. Our third dimension differentiates between systems in which changes in autonomy are defined a priori or occur dynamically.
In goal-oriented variable autonomy systems, when and what autonomy changes occur are defined before operation. In a study by Small et al. [116], the authors introduce a goal-oriented variable autonomy system, termed “Assigned Responsibility,” in which various segments of a task are assigned an LoA before operation, and the robot monitors the progress of task completion to automatically change LoAs as it moves from one segment to the next. This approach imposes rigidity on the system but, as Small et al. [116] suggest, reduces the operator's cognitive load and enables designers to explicitly state when automation will be used to align with legal and ethical considerations.
Stimulus-driven autonomy adjustments imply that all decisions related to changes in autonomy take place at runtime [2, 22, 24, 28, 31]. The human operator or control switcher dynamically adjusts autonomy during task execution, without following a prescribed set of changes. These approaches enable greater flexibility and the ability to respond to unpredictable circumstances, but introduce a degree of uncertainty in robot behaviour.
Of course, the choice is not binary. Some implementations, such as those proposed by Romay et al. [102] and Mostafa et al. [88], adopt a hybrid approach in which designers define a relative LoA for various task segments during the design stage while the operator retains the ability to make adjustments on the fly.
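As a minimal sketch of this spectrum, the code below pre-assigns a level of autonomy to each task segment (the goal-oriented part) while allowing an operator override at run-time (the hybrid part); the segment names and LoA labels are illustrative and not drawn from Small et al. [116] or Romay et al. [102].

from dataclasses import dataclass

@dataclass
class Segment:
    name: str
    assigned_loa: str        # level of autonomy fixed at design time
    completed: bool = False

# Goal-oriented part: LoAs assigned to task segments before operation.
plan = [
    Segment("navigate_corridor", "autonomous"),
    Segment("open_door", "teleoperation"),
    Segment("inspect_room", "shared_control"),
]

def current_loa(plan: list[Segment], operator_override: str | None = None) -> str:
    """Hybrid flexibility: follow the pre-assigned LoA of the active segment,
    unless the operator overrides it on the fly."""
    if operator_override is not None:
        return operator_override
    for segment in plan:
        if not segment.completed:
            return segment.assigned_loa
    return "idle"

plan[0].completed = True                 # robot finishes the first segment
print(current_loa(plan))                 # -> "teleoperation"
print(current_loa(plan, "autonomous"))   # operator overrides at run-time

The override path is what preserves human authority in the hybrid pattern: the pre-assigned allocation constrains default behaviour, but it never removes the operator's ability to intervene.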

Trigger.
According to our definition of variable autonomy, autonomy adjustments occur because of some change in context. Influenced by previous taxonomies for triggers in adaptive systems [43, 97], we organise triggers for variable autonomy systems into four categories: task, operator, system, and environment.
Task triggers address aspects of the task which the human-robot team performs, relying either on a measurement of the task's state or on the properties ascribed to individual tasks by designers. Variable autonomy systems calculate task state indicators such as completion status [116, 124] and predicted likelihood of failure [22, 101]. In the goal-oriented approach by Small et al. [116], the system monitors task progress to automatically change LoA as the robot moves from one task to the next. Task completion is represented as an observable state of the world against which the current state is continuously compared. These triggers require the system to sense its surrounding environment and relate environmental conditions to the ongoing task. Another grouping of task triggers addresses properties of the task itself: some studies distinguish between types of tasks, labelling some as sensitive and therefore requiring human, rather than autonomous, control [102, 137]; others switch between control modes as the relative difficulty of a task changes [30, 34, 39, 88, 143]. For example, de Visser and Parasuraman [34] develop a system initiative architecture that moves from manual to autonomous control as task load, defined in terms of the number of vehicles under an operator's supervision, increases. Similarly, Mostafa et al. [88] develop a system that varies its autonomy according to a task's complexity, calculated as the number of individual actions required to complete it.
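The two task-property triggers above can be sketched as simple threshold rules; the thresholds, bands, and mode labels are illustrative assumptions, not values reported by de Visser and Parasuraman [34] or Mostafa et al. [88].

def loa_from_task_load(vehicles_supervised: int, threshold: int = 4) -> str:
    """Task trigger in the spirit of [34]: shift toward autonomy as the
    number of vehicles under supervision grows (threshold is illustrative)."""
    return "autonomous" if vehicles_supervised > threshold else "manual"

def loa_from_complexity(actions_required: int) -> str:
    """Task trigger in the spirit of [88]: grade autonomy by the number of
    individual actions a task requires (bands are illustrative)."""
    if actions_required <= 3:
        return "manual"
    if actions_required <= 8:
        return "shared_control"
    return "autonomous"

print(loa_from_task_load(6))       # -> "autonomous"
print(loa_from_complexity(5))      # -> "shared_control"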
Operator triggers reflect the states and decisions of the human operator. Several studies attempt to infer internal properties such as operator workload through physiological sensors [143] and competence level through the amount and quality of human input [70, 85, 111]. Zhao et al. [143] employ eye trackers and sensor-enabled wristbands to measure cognitive processing and stress levels, while Lewis et al. [70] develop a model of expert–novice differences to increase the degree of autonomy when lower-skilled operators engage with the system. Whereas such systems require the ability to sense aspects of the operator, others defer to an operator's own judgment. Some studies indicate that an operator's judgment on when to adjust a robot's autonomy may be influenced by individual characteristics such as personality, preferences, trust, and experience with robots [25–27, 62, 101].
System triggers refer to events and states internal to the robot. There are two varieties of system triggers: monitoring and error detection. The difference between the two is one of severity: monitoring approaches measure gradual changes in system performance; error detection focuses on discrete failures in autonomy. By comparing current to expected performance, monitoring techniques initiate changes in autonomy whenever system performance falls below a given threshold [24, 25, 61, 119]. A recent paper by Ramesh et al. [100, p. 303] proposes the concept of “robot vitals,” a composite measure of performance in multi-robot systems. The vitals include “rate of change of signal strength, sliding window average of difference between expected robot velocity and actual velocity, robot acceleration, rate of increase in area coverage, and localisation error.” The authors argue that the relative simplicity of their measure supports explainability in a robot's decisions. Meanwhile, error detection triggers changes whenever the autonomy fails [52, 91, 107, 124, 144].
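As an illustration of a monitoring-style system trigger, the sketch below combines a handful of normalised, vitals-like signals into a single score and requests an autonomy change when it drops below a threshold; the signal names, normalisation, weighting, and threshold are our assumptions and do not reproduce the computation used by Ramesh et al. [100].

def monitoring_trigger(vitals: dict[str, float],
                       weights: dict[str, float],
                       threshold: float = 0.5) -> bool:
    """Monitoring-style system trigger: combine normalised performance
    signals (each in [0, 1], higher is better) into a weighted score and
    request an autonomy change when it drops below the threshold."""
    total_weight = sum(weights.values())
    score = sum(weights[k] * vitals[k] for k in weights) / total_weight
    return score < threshold

vitals = {   # illustrative values, normalised so that 1.0 means nominal performance
    "signal_strength_trend": 0.8,
    "velocity_tracking": 0.3,
    "area_coverage_rate": 0.5,
    "localisation_confidence": 0.2,
}
weights = {k: 1.0 for k in vitals}
print(monitoring_trigger(vitals, weights))  # -> True: performance has degraded

Keeping the individual signals inspectable, rather than reporting only the composite score, is what allows such a trigger to support explanations of why an autonomy change occurred.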
Finally, environment triggers capture the circumstances of the robot's external environment. For example, a robot may enter a manual control mode when entering a novel environment or encountering unforeseen events [79]. Likewise, changing environmental conditions such as weather and obstacles may require an operator to take or relinquish control from the autonomy [92, 107, 135]. Robots must be able to sense their surrounding environment for these triggers to function.

DISCUSSION: DESIGN GUIDELINES FOR VARIABLE AUTONOMY THROUGH RESPONSIBLE ROBOTICS
We reviewed 42 recent papers on variable autonomy to investigate how variable autonomy is defined in the literature (RQ1), how research into variable autonomy is conducted (RQ2), and how variable autonomy is implemented (RQ3). Overall, our review makes four contributions. First, we provide a definition of variable autonomy synthesised from past definitions in the literature. As shown by our results in Section 5.1, the variable autonomy literature employs diverse and inconsistent terminology and definitions. We attempt to clarify the field's language by offering a synthesised definition that builds on past articulations and incorporates the four dimensions of variable autonomy. Second, we detail the research designs, sites, and measures employed in the literature to support rigorous empirical research. We provide evidence for the concern that the results of variable autonomy research may lack ecological validity given that most studies have been conducted in artificial settings, such as simulations or contrived physical environments [87]. As such, these studies have not been evaluated in contexts that reflect the dynamism and complexities of the real world. Additionally, we highlight how variable autonomy research follows a restrictive definition of relevant stakeholders, focusing only on the role of the operator rather than any other implicated group such as bystanders or passengers. Further, we point to the field's limited modes of evaluation; most empirical studies rely on homegrown measures, rather than utilising validated instruments, and do not include qualitative evidence surrounding people's experiences with variable autonomy robots. These challenges are not restricted to variable autonomy, but have been noted in the field of human-robot interaction more broadly [33, 63].
Third, we distil previous characterisations of variable autonomy to provide a heuristic for designers when defining requirements for variable autonomy robotics. In particular, we deepen the description of the triggers that initiate changes in autonomy and introduce the dimension of “flexibility” to distinguish between implementations that allow for changes in autonomy to be determined before operation or at run-time. Further, previous reviews include several dimensions which we argue are not specific to variable autonomy, but are relevant to autonomy more broadly; these include human-agent interaction, autonomy representation, and autonomy measurement [87]. Therefore, our taxonomy offers a concise formulation of the aspects that distinguish variable autonomy from other human-robot interaction strategies.
Finally, in this section, we draw inspiration from Jirotka et al. [58] and Amershi et al. [4] to present 11 design guidelines (DG1–DG11) that will help researchers approach variable autonomy through a lens of responsible robotics. These guidelines, depicted in Table 7, touch upon the product and process of innovation, as introduced in Section 2.3, and build upon the results from our review in Section 5.

DG1. Select ethical robotics principles.
There are several resources that outline ethical principles for robotics [see 140]. Select one as a basis for ethical reflection throughout the duration of the research and innovation process, while remaining flexible so the principles can be adapted to fit the circumstances of project stakeholders.
DG2. Determine the objectives of the robotic system. As shown in the discussion of researcher motivations and evaluation measures, the values underpinning variable autonomy research are predominantly performance-based. The concern is how to enable a human and robot to interact with one another to achieve some objective. Yet, a responsible robotics approach to variable autonomy entails a wider range of goals, such as supporting stakeholders' physical and psychological well-being and minimising environmental harm.
DG3. Identify relevant stakeholders beyond users. Stakeholders are “those who are or will be significantly implicated by the technology” [46, 35]. A stakeholder can be one who directly interacts with a technology or one who does not interact with it but is still impacted by its use – a distinction between direct and indirect stakeholders, respectively. Within variable autonomy research, most participants assume the role of operators. This presents an abstraction of how robots would be used in practical contexts; for example, there are networks of different humans who exist in the robot's operating environment. The IEEE 7001-2021 Standard for Transparency of Autonomous Systems [117] includes several categories of direct and indirect stakeholders to consider, such as non-expert users, domain expert users, superusers, the general public, and bystanders. Take the scenario of an assistive robot within a care home: the intended user may be an older adult with support needs, but she does not live in isolation. She is likely supported by a network of family members, friends, care workers, and physicians. Each of these groups may have separate experiences of and responses to the use of robots with variable autonomy.
DG4. Conduct ethical risk assessment. The British Standard 8611 (BS8611) outlines a systematic approach to identify, analyse, and mitigate ethical hazards associated with the design and application of robots [55]. It includes a taxonomy of twenty ethical hazards that designers can draw from to reduce the effect of ethical harms; that is, harms that compromise psychological, societal, or environmental well-being.
DG5. Sample representative participants from stakeholder populations. As shown in this review, research on variable autonomy – and human-robot interaction more generally [63] – relies on non-representative groups, namely university students and members of the research team, to act as prospective robot operators. These groups may not display the same characteristics as future relevant stakeholders given differences in age and professional history. Therefore, the preferences and attitudes towards robots expressed by these study populations may not represent those of other populations.
DG6. Create research design with stakeholder input. Collaborate with stakeholders to determine where the study will be conducted, the tasks to be performed, how different types of stakeholders will be included, and whether the approach is acceptable. From this process, researchers should clearly specify the research design employed. For example, when following an experimental set-up, researchers should articulate whether it follows a between-, within-, or single-subject(s) design, the independent variable(s), the evaluative measures (along with what construct each is meant to operationalise), and the research site. Additionally, researchers should use this as an opportunity to extend beyond the traditional experimental paradigm, towards studies that focus on “how real people, in real-world environments, would interact face to face with a real robot” [33]. That is, research should evaluate variable autonomy implementations in contexts that reflect the dynamism and complexities of the real world.
DG7. Use validated measures and qualitative methods. There are numerous reviews that outline common measures used in human-robot interaction research [e.g., 29, 32, 78, 89, 121]. While quantitative evaluation allows for comparison across individuals and the potential for generalisable knowledge, it misses the meanings and values people ascribe to phenomena within specific contexts. Qualitative methods such as interviews [10, 11] and ethnography [57] enable researchers to engage with such concepts.
DG8. Match initiative to context. Deciding who has the authority to initiate changes in autonomy has implications for the performance of the human-robot team, the experience of the human operator, as well as the experiences of other people who either directly or indirectly interact with the robot. The choice between human, system, and mixed initiative implementations entails trade-offs between factors such as human control, efficiency, and consistency, and should therefore be made in relation to the context in which the robot will be used.
DG9. Support specific control modes. Levels of autonomy are a useful construct for understanding variation in autonomous capabilities. But, in actual implementations, different autonomous capabilities are allocated to different functions and may change depending on the activity [16]. Therefore, greater specificity of choice in autonomous capabilities, such as in discretised and granular control, enables the robot and human to fine-tune autonomous capabilities to the current situation.
DG10. Enable flexible autonomy changes. The flexibility of the variable autonomy implementation concerns the designer's ability to specify a priori the types of behaviour that will be performed under certain control modes. Goal-oriented approaches enable designers to pre-define the allocation of autonomous capabilities. Defining exactly when the robot will operate with certain autonomous capabilities is useful in regulated or safety-critical contexts where the use of autonomy to perform certain tasks may be restricted. Yet, this regulation of autonomous behaviour increases rigidity and may preclude the operator or the robot from adapting dynamically in uncertain and unforeseen situations. Dynamic adjustments in autonomy, as stated before, imply greater variability and unpredictability in behaviour: an operator may be unprepared to regain control when it is handed back to her, or she may retake it when she is not suited to perform the task at hand. Hybrid approaches, therefore, provide a middle-ground route where certain behaviours can be assigned to autonomous capabilities beforehand, while retaining the system's and/or operator's ability to make adjustments as changes in context arise.
DG11. Select appropriate triggers. Responding to changes in context requires the use of sensing capabilities. These triggers, such as those that infer an operator's state or environmental conditions, may introduce privacy and security concerns depending on the type of data collected. Audio and video data of the operating environment may capture personal information if used in a sensitive context such as a person's home. Data collected within search and rescue and military applications may depict traumatising experiences or confidential national security details. Decisions on the types of triggers should be made on a case-by-case basis, as the operating environment determines the data that is likely to be collected. Different jurisdictions face different regulatory requirements for data collection and processing, and these should serve as a foundation for these decisions.
An important property of these design guidelines is that they are not speculative; several of these recommendations have already been successfully applied in robotics projects. We begin with the process-oriented design guidelines (DG1–DG7). In a project on accidents involving autonomous vehicles, Ten Holter et al. [127] describe how they based their approach on the AREA framework (DG1) and drew on the expertise of stakeholders such as insurers, scholars, engineers, pedestrians, and cycling groups to inform their research plan (DG3, DG6). Meanwhile, McGinn et al. [80] build on BS8611 to conduct an ethical assessment of a real-world disinfectant robot used in a hospital in Ireland (DG4). Moving towards the design guidelines focused on product (DG8–DG11), our recommendations are drawn from the technical literature: Small et al. [116] point to the utility of system initiative architectures in predictable environments (DG8); the work of Lin and Goodrich [73] introduces an innovative strategy to enable specific modes of autonomy adjustment (DG9); Romay et al. [102] and Mostafa et al. [88] enable flexible autonomy changes (DG10); and Ramesh et al. [100] propose a unique set of performance monitoring measures that support explainability in a robot's autonomy adjustment decisions (DG11).

FUTURE WORK
This paper's objective is to establish a research agenda for variable autonomy based on responsible robotics.The relationship between these two areas is in its early stages and has yet to be investigated through primary research.Therefore, we propose the following research agenda.
Responsibility. As discussed, there are two notions of responsibility: forward- and backward-looking [133]. Forward-looking responsibility depends on the anticipation of consequences. Therefore, we ask: what concerns and challenges do stakeholders anticipate regarding the use of variable autonomy robotics, particularly across different design configurations? This inquiry seeks to provide an empirical basis for our initial explorations of impacts discussed in this section and to conceptualise how variable autonomy design features can mitigate the adverse consequences of robotics in varied contexts. Next, backward-looking responsibility requires the ability to assess a past sequence of events. We are exploring the concept of an Ethical Black Box (EBB), a device similar to a Flight Data Recorder that continuously records sensor inputs, actuator outputs, and relevant internal status data to facilitate accident investigations involving robots (see [141] and [139] for further discussion of the concept of an EBB). As such, we are interested in how variable autonomy can be incorporated into EBB recordings and how relevant information can be interpreted during accident investigations.
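To indicate how variable autonomy might appear in such recordings, the sketch below logs one append-only record per autonomy transition; the field names and file format are our assumptions about what an accident investigator might need, not a specification drawn from [141] or [139].

import json
import time
from dataclasses import dataclass, asdict

@dataclass
class AutonomyTransitionRecord:
    # Illustrative fields an Ethical Black Box might log for each LoA change.
    timestamp: float          # seconds since epoch
    from_loa: str
    to_loa: str
    initiator: str            # "human", "system", or "mixed"
    trigger: str              # e.g. "operator_command", "vitals_below_threshold"
    sensor_snapshot: dict     # relevant sensor inputs at the moment of change

def append_record(path: str, record: AutonomyTransitionRecord) -> None:
    """Append one transition record as a line of JSON (append-only, so the
    log preserves the full sequence of events for later investigation)."""
    with open(path, "a", encoding="utf-8") as log:
        log.write(json.dumps(asdict(record)) + "\n")

append_record("ebb_log.jsonl", AutonomyTransitionRecord(
    timestamp=time.time(),
    from_loa="teleoperation",
    to_loa="autonomous",
    initiator="system",
    trigger="connectivity_loss",
    sensor_snapshot={"battery": 0.72, "link_quality": 0.1},
))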

Fig. 1. PRISMA-style flowchart depicting the sampling strategy for the developmental literature review. Figure is adapted from Page et al. [95].

Fig. 3. Application domains addressed in the reviewed papers. The sum exceeds the number of papers reviewed due to papers discussing multiple domains.

Table 4. Research designs employed by authors in the reviewed studies. The sum exceeds the number of papers reviewed due to papers reporting on results from multiple research designs.

Table 5. Capability constructs and associated objective and subjective measures focus on the performance of either the operator or the robot in completing a predefined task.

Table 6. Collaboration constructs and associated objective and subjective measures characterise the quality of collaboration between the human and robot.

Table 7. 11 design guidelines for variable autonomy research based on responsible robotics.