From Inanimate Object to Agent: Impact of Pre-beginnings on the Emergence of Greetings with a Robot

The very first moments of co-presence, during which a robot appears to a participant for the first time, are often “off-the-record” in the data collected from human-robot experiments (video recordings, motion tracking, methodology sections, etc.). Yet, this “pre-beginning” phase, well documented in the case of human-human interactions, is not an interactional vacuum: It is where interactional work from participants can take place so the production of a first speaking turn (like greeting the robot) becomes relevant and expected. We base our analysis on an experiment that replicated the interaction opening delays sometimes observed in laboratory or “in-the-wild” human-robot interaction studies—where robots can require time before springing to life after they are in co-presence with a human. Using an ethnomethodological and multimodal conversation analytic methodology (EMCA), we identify which properties of the robot's behavior were oriented to by participants as creating the adequate conditions to produce a first greeting. Our findings highlight the importance of the state in which the robot originally appears to participants: as an immobile object or, instead, as an entity already involved in preexisting activity. Participants’ orientations to the very first behaviors manifested by the robot during this “pre-beginning” phase produced a priori unpredictable sequential trajectories, which configured the timing and the manner in which the robot emerged as a social agent. We suggest that these first instants of co-presence are not peripheral issues with respect to human-robot experiments but should be thought about and designed as an integral part of those.


INTRODUCTION
A growing body of studies has recently approached the social status of a robot not as a categorical property of the robot's inside [1], or as a specifiable and implementable set of features [51] but, instead, as an emergent phenomenon [66]. These works follow an ethnomethodological perspective for which, like many seemingly objective and transsituational properties of social life, the social status of a social robot is not given but locally produced, incrementally developed, and, by extension, [...] transformable at any moment [32]. By closely studying the moment-to-moment interactions of humans with a robot, researchers using these approaches attempt to grasp through which local processes a robot can, momentarily, emerge as a social partner [91], a social actor [19,62], a subject or social being [92], a social agent [1], an artificial agent [51] or as a new (and evolving over time) ontological category [66]. Without necessarily sharing the same theoretical background, they rely on a common assumption that, in the same way that a situation becomes observable and is treated as the meeting of a jury when participants produce practices that others orient and respond to as practices of a jury [22,48], an entity becomes a social robot when it produces practices that humans orient and respond to as practices of a social robot.
Yet, this definition of sociality as an emergent feature raises a question about the minimal conditions for a social interaction [39]. Indeed, once a robot is demonstrably shown to be oriented to as a social agent, there remains the issue of what shifted this robot's interactional status, especially when it was approached as a non-social object at first. What properties of the local situation were observably oriented to as relevant for participants before a social interaction could emerge, suddenly or incrementally, momentarily or durably? Our study extends this ethnomethodological line of research by focusing on the initial emergence of a first, conditionally relevant, greeting. Based on an experiment from which 80 recordings of dyadic interactions between a human and a robot were collected, we try to identify 1) if and when a shift occurs from an inanimate artifact to an agent and 2) how this shift progressively emerges, when it does.
Of all the properties which could be constituted and oriented to as relevant features in the continuous flow of a local human-robot interaction, we wonder which of them were treated by participants as creating the adequate conditions to produce a first greeting move. That is, we attempt to describe the interactional work required before behaviors from the robot could be treated as actions which either established the adequate framework for the participant to initiate a first greeting sequence or, alternatively, produced a response slot that the participant was normatively pressured to complete with a return greeting. Overall, four typical paths to the emergence of a first greeting were identified, and one where the production of a greeting never became relevant. This typology is exemplified through the analysis of five fragments representative of our corpus.

The significance of greetings for human-robot interactions
Human greetings, as a canonical part of the opening sequence in many interactional settings, are among the most documented practices in conversation analysis: from the first analysis of telephone calls [78,79], to video calls [50,55], to co-present encounters [12,27,42,59,64,74]. Greeting sequences have been of special interest in the study of human-robot interactions, not only because of the amount of data available, but also because of what they accomplish in human-human interactions. Indeed, they simultaneously reflect and construct the mutual status of the co-interactants, by being tailored to (display) their own understanding/appraisal of who we are to one another right now [64], manifesting, in particular, that another is recognized and categorized as a possible partner for future interaction [58]. They also tend to be connected with observable changes in the structure of talk and in the physical configuration of participants, as they often form both the end of a phase of incipient interaction and the first exchange of a conversation [79]. In summary, greetings are critical for organizational reasons (coordinated, well-tuned, reciprocal engagement), social reasons (recognition, display of the type of relationship the participants entertain), and normative reasons (mutual trust) [58].
The previous approaches focus on what greetings do in an interaction. In parallel, several quantitatively oriented human-robot interaction (HRI) studies have highlighted the significance of greetings as an indicator. Humans who greeted a robot were commonly found to display specific behaviors during the rest of their interaction and/or to hold specific representations or perceptions of the robot. Notably, producing a greeting was observed to be a predictor of a more social script [46], of patterns of discourse [53] or of specific linguistic behaviors [18]. Greetings were also shown to correlate with the attribution of higher linguistic, perceptual, and cognitive competence to the robot [17] and, in particular, reacting to a robot's greeting wave was observed by Baddoura and Gentiane [5] to be significantly correlated with evaluating this robot as sociable, suggesting that responding to a greeting documents more than a simple mimetic reflex action. However, Holthaus [34] demonstrated that the multimodal behavior of a humanoid robot can heavily impact the number of participants who greet it, and the timing of their greetings, highlighting that the first moments of the interaction can play a heavy role in constituting a framework in which a greeting sequence becomes relevant.

Conditional relevance as a breaking point in the robot's status as an interactant
As highly ritualized actions [90], greetings are also typical cases involving conditional relevance. This concept refers to a property which binds together two turns at talk from different speakers in an interaction [44,84]: sequences of question-answer, invitation-acceptance (or refusal), etc. In each of these sequences, a first pair-part (e.g. a question) makes relevant for the recipient the production of "a second pair-part of the same sequence type" [44] (e.g. an answer). For example, a greeting creates expectations of a reciprocal greeting, whose absence can be accountably oriented to by participants as a meaningful departure from the norm [44]. Two turns united by conditional relevance (i.e. a first pair part and a second pair part) form an adjacency pair. Significantly, conditional relevance is not a mere statistical observation (that a first pair part tends to be followed by a second one of a certain type) but corresponds to the achievement of a normative organization [44] where a first action from a participant imposes constraints on the type and form of action with which the recipient should respond [44]. In this sense, [37] argue that when people respond to a social robot's greetings, they do not merely respond to the robot, but orient to the moral obligation involved in the normative practice of greetings.
Because of this documented property of greetings, the treatment of an action of a robot as initiating a first greeting pair part has been connected to the status of this robot as an interactant [1,62]. A response to a robot's greeting suggests that, at this specific moment, humans orient towards the robot as an entity which can impose normative constraints. Locally and momentarily, the robot stops being treated as an object performing an autonomous script which is not inserted in an orderly sequence of conversational turns [78] but is, rather, oriented to as an agent or partner [51,91] whose actions can have (normative) consequences for the recipient [15,37]. The emergence of a sequence of (mutual) greetings may therefore constitute an observable breaking point in the interaction dynamic. When a participant initiates a first conditionally relevant greeting pair or responds to a robot's greeting action, this enactment of the greeting ritual models the appropriate and expected way of acting and interacting that constitutes the addressee as a particular kind of entity [1]. Relying on the distinction from [37], a mutual greeting sequence with a robot can be said to display participants who are talking rather than using speech: at this instant, the robot is treated as producing actions discoverable within a normative order [37] and, conversely, as having the capabilities to interpret the normativity of other participants' actions [37].
Crucially, mutual greetings emerge from a preexisting situation; they do not appear out of an interactional vacuum. Precisely because they enact the existence of a normative order over both the greeter and the greeted [37], the appearance of greetings supposes a framework in which a greeting sequence is relevant and expectable by the participants [59], or a proper interaction frame [50], usually established as part of the pre-beginning [79] or pre-opening [55]. Yet, coparticipants accomplish various degrees of interactional work to establish such a framework. In particular, for humanoid robots, emerging as agents is not systematically granted by the sharing of a mutual space with other participants. Robots may require, even more than humans, to achieve the type of self-affirming done through language [which] is of a different nature from mere physical presence [15].

Pre-beginning designs in HRI.
Focusing exclusively on the moment at which the robot appears (footnote 1) to participants for the first time, HRI studies and datasets collected in controlled or natural settings can, at first glance, be sorted into two general categories.
1. Studies where the robot stands motionless when participants encounter it, without displaying any preexisting idle behavior nor adjustments to the participants' approach or presence: the Wizard of Oz has to seat participants in front of the robot before going behind a divider to send commands to the robot (e.g. [8]) or has to deal with a significant response time (e.g. [77,95]), the script is not launched yet (e.g. [62,93], this study), the autonomous robot's reactions are delayed, etc. In these situations, participants find themselves in physical copresence with the robot for a long period, before reciprocal exchanges and mutual identification become possible: there is a non-accounted for delay between entry into physical copresence and moves to enter into social copresence [65].
2. Studies where the robot, or virtual agent, is already observably involved in a preexisting activity when it appears to participants (including idling behaviors like simulated breathing, random head movements, etc.; e.g. [4,52,73,98]) and/or observably adjusts to the human's approach or physical co-presence (tracking their gaze, waving, producing a non-delayed greeting, approaching them, etc.; e.g. [7,30,34,40,46,67]). This includes any form of activity from the robot which may be witnessed by participants prior to their own interaction with it, similarly to human service-encounters where salespersons, doctors, help desk staff, sushi chefs, etc. are often already immersed in a (potentially competing) activity when they are sighted by the customer/patient/student [27,59,74,97].

Coming into sight and coming into existence.
The two types of HRI pre-beginnings described above make relevant an earlier distinction made by J.J. Gibson in his ecological psychology, regarding the way humans may appear on the social scene, and gradually achieve participant status in the pre-beginnings of encounters. In co-present encounters in relatively uncluttered spaces, co-participants usually get into a greeting position progressively, relying on the way they move, their gaze and gestures to continuously coordinate their getting-together, and make relevant interactional moves such as distant greetings [42]. Gibson calls this type of appearance a coming into sight [24]. This is the most common configuration in co-present encounters. He contrasts this with another type of appearance, in which the other person seems to materialize or come to life suddenly in the situation, as when someone hidden by features of the local environment suddenly becomes visible, which Gibson calls coming into existence [24] to account for its pop-up, quasi-instantaneous character. Other examples of coming into existence would be the initial connection in a video call [50], or in a co-present encounter, someone sleeping who suddenly wakes up after being approached. Because of its suddenness, the exact moment of a coming into existence can be difficult to anticipate, the causes and underlying processes for such a coming into existence are not apparent, and finally, the interactional status and competence of the potential co-participant at the moment of its coming into existence can be uncertain.

Off-the-record pre-beginnings in HRI.
Gibson's distinction may be highly applicable to HRI, for it now appears that in the pre-beginnings of encounters of the first type of studies mentioned above, more or less prepared subjects have to deal with a robot that comes into existence, while in studies of the second type, the robot may seem to come into sight and allow for some form of embodied mutual co-ordination in the pre-beginning phase. However, in most cases, studies neither analyze nor clarify the state of the robot when participants see it [4] or enter in physical co-presence with it. Indeed, as most experimental studies only start when the human is already placed in the appropriate starting position in front of the robot [33], methodology sections rarely cover the observable behavior of the robot when participants encounter it. HRI experiments display an orientation to the opening phase of the interaction as the first relevant moment and tend to set aside the pre-beginning phase, although, depending on the scenario, it may be crucial to the way subjects and robots achieve some form of co-participation status. These very first seconds, during which the robot appears to participants, are often, so to speak, off-the-record in the data which ends up being collected.
Here, we will analyze a relatively common HRI experimental setup in which subjects are brought into the presence of a robot which suddenly animates and comes into existence. This setup thereby imitates the interaction opening delays regularly observed in laboratory or "in-the-wild" human-robot interaction studies, where robots can require time before springing to life after they stand in co-presence with a human. We will show how this is consequential with respect to the way in which openings unfold, in which some moves such as greeting and waving may become interactionally relevant, and, ultimately, in which the robot emerges as a social agent. We conclude that pre-beginnings are not anecdotal or peripheral issues with respect to HRI experiments but should be thought about and designed as an integral part of those.

Ethnomethodology and Conversation Analysis.
Ethnomethodological Conversation Analysis [22,58] (EMCA) is a micro-sociological approach which studies the temporal unfolding of events in an interaction in order to understand, on a moment-to-moment basis, how participants' actions (turns-at-talk, embodied actions or complex multimodal moves) are produced and recognised as performing meaningful social actions [6,96]. EMCA offers methodological tools for understanding what is treated as publicly relevant by participants in a given situation (among the potentially inexhaustible number of features of this situation describable from an external perspective) without imposing the researcher's perspective on the data: it adopts an emic point of view rather than an etic point of view [96]. In sum, EMCA will consider as phenomena only those practices of members which are used by them to produce, accomplish, sustain, reproduce, recognize, and give account of, to and for themselves, social order [70].
This angle of analysis treats social order as a product of the local organization of participants: that is, as continuously maintained (or modified) through their local accomplishments [22]. EMCA multimodal approaches rely on the minute analysis of video recordings [29] and of their detailed transcriptions, in order to identify the fine-tuned temporal unfolding [57] of participants' actions with a much higher degree of precision than mere observation would allow [57]. The in-depth analysis of large collections of data can subsequently reveal recurring patterns in the procedures and methods through which social order is accomplished [28,66].

EMCA for HRI.
When applied to HRI more specifically, EMCA is suited to identify what, in a robot's multimodal behavior (its talk, gestures, flashing LEDs, sound signals, motor noise, etc.) [61], is treated by involved participants as social actions: that is, as actions which make relevant a set of potential next actions [96]. This becomes apparent when these behaviors from the robot are responded to (and in a certain way) by co-present participants, in the following turns-at-talk. In other terms, EMCA's micro-analytic level of description can be leveraged to explore what emerges as pragmatically consequential when humans' and robots' actions intertwine and respond to each other (often in ways unforeseen by designers [60]), i.e. to retrospectively unpack how humans and robots co-constructed the interactions [68] which ended up being captured on camera. Therefore, researchers drawing on EMCA will focus on behaviors produced by a robot that are made publicly recognizable and accountable (as an action of a certain type, e.g. an offer, a question, etc.) by co-present humans, as these humans are immersed in a situated activity, where they face specific practical problems [31], e.g. whether to respond to a robot producing a waving gesture at the start of an interaction and, if so, how to respond to it. In this sense, the methodological tools of EMCA can be mobilized by researchers faced with the issue of analyzing robots' and humans' micro-adjustments over time as something else than an inscrutable black box [31].

Studying the sociality of robots besides mental representations.
A result of this approach is to study the sociality of robots independently from mental representations. Our analytical focus will be exclusively limited to what is oriented to by participants as an action from the robot, i.e., the way in which some of its behaviors are responded to or, alternatively, what actions from the robot are accountably displayed as absent by humans. In particular, an ethnomethodological and conversation analytic methodology does not aim at establishing how the robot is otherwise mentally represented [26] by the participants (as a social agent, as an object, as a human, as an animal, as a new ontological category…) or if they engage in pretense towards it [87]. Unless it is made observably accountable, it makes no difference for the status occupied by the robot in the interaction that participants are behaving as or behaving as if this robot is a subject with internal states and perceptual experiences [87]. Similarly, this analytical perspective is independent from the set of questions related to whether participants' actions involve a mental representation of the robot as they interact with it, or if these behaviors are produced as part of a non-representational mindless coping with the situation [14] (footnote 2). In other words, EMCA takes an agnostic stance regarding cognition [37,54] as it is interested in participants' (humans or robots) "observable and hearable conduct [...] at the interactional surface" [66].

Participants
We base our analysis on 80 video recordings of dyadic interactions with an autonomous robot, which took place at the INSEAD-Sorbonne Université Behavioural Lab. Participants were all native French speakers aged between 18 and 30 years old. All participants were recruited by the INSEAD-Sorbonne Université Behavioural Lab under ethics approval by the INSEAD Institutional Review Board. Separate consent was obtained for the use of video data. The experiment took on average 20 minutes to complete, and each participant received a compensation of 6 €.
Footnote 2: In particular, we do not intend to suggest that participants discretize and categorize the stream of conduct of the robot into a preexisting list of action types [16]: the robot can be positioned interactionally as an agent without any of its behaviors being mentally constituted as actions by participants as they react to them.

Experimental setup
A humanoid Pepper robot, produced by SoftBank Robotics, was positioned in the middle of a room, standing at a three-quarter angle from participants when they entered by the door (see Figure 2). The interaction was filmed with two cameras: one behind the robot, one on the left of the robot. An additional webcam was placed in a corner of the room. For a detailed description of our experimental setup and of the design of the autonomous robot, see [93].

Instructions
Before entering the room, all participants were given the same verbal instructions:

1. You are going to have an interaction with a social robot. This robot will try to help you plan your holidays, for this summer. Please answer as if you were really planning these holidays.
2. Speak loudly. If the robot does not respond, it is possible that it didn't hear you. If you see a question mark displayed on the robot's tablet, it means your utterance wasn't understood: you can repeat or rephrase it.
3. The experiment should take 5 minutes to complete, then you will have to fill in a questionnaire in the next room.

Then, as they entered the room, participants were informed that:

4. The robot should start speaking to you in a few moments.
5. You can stand anywhere in the room.

These instructions characterized the robot and the task in ways that partially pre-configured the interaction. They created the expectation for an incoming, but delayed, first turn uttered by the robot. Doing so, they portrayed the robot as an entity which may not immediately be available as a co-interactant and which could potentially come into existence at some point. They also stated the robot may not hear the participant, depicting its perceptual abilities as imperfect. For these reasons, they should be treated as constitutive of pre-beginnings. In sum, the distribution of greetings (Fig. 1) on which we will focus is not to be understood as a direct reflection of the strength of a transsituational (greeting) norm but, rather, as connected to a specific experimental configuration.

Scenario
The robot was designed as a travel agent. Once participants had entered the room, the experiment followed a holiday planning scenario: the Pepper robot woke up by going through several activation steps, introduced itself, produced a “how are you” question, offered to take water, and, then, asked participants several questions aimed at understanding their preferred destinations. When the scenario reached its end, participants moved to a different room and completed a questionnaire composed of several psychometric scales.
All participants studied in this article faced the same initial behavior from the Pepper robot. Because our focus is on the earliest moments of the interaction, the different conditions in which these participants were placed had not yet impacted the multimodal behavior of the robot. However, as part of a larger study [93], participants were distributed across 5 experimental conditions, each one featuring a different multimodal behavior from the robot later in the interaction (no social gaze, no approach, etc.). A total of 101 valid participants took part in this experiment. In one of our experimental conditions, the robot didn't wave during the opening of the interaction: these 21 participants were removed from our analysis (since they couldn't possibly react to the robot's wave), leaving 80 remaining participants who all witnessed the same activation steps from the robot.

Activation steps achieved by the robot during the first seconds of the interaction
Immediately after each participant entered the room, the robot went through the same 5 activation steps (cf. Figure 1 for a detailed timeline):
1. Physical co-presence: When participants entered the room, the robot was motionless.
2. Gaze tracking: The robot started to track their gaze.
These steps exacerbated two features of the coming into existence often observed in natural or controlled human-robot interaction openings: the robot stood in physical co-presence with the participant for several seconds before a reciprocal interaction could start and it displayed no preexisting activity when first appearing to this participant.

DISTRIBUTION OF FIRST GREETING OCCURRENCES DURING THE EXPERIMENT
Out of a total of 80 participants, 62 (78%) produced a greeting utterance (e.g. “Hi”, “Hello”, “Hey”, “Good morning” [65]) or gesture (e.g. hand wave, palm display, head toss/bow, eyebrow flash [65]). A micro-analysis reveals that most of these utterances and gestures corresponded to greeting actions, based on the definition from [65]: discrete audible and visible (vocal, verbal/lexical, and embodied) actions that participants deploy to publicly mark the moment when they ratify another's social copresence. As the following fragments will illustrate, this does not imply the actions participants achieved through these utterances and gestures were limited to greeting the robot.
Among participants who produced a greeting utterance or gesture, the events or robot's behaviors (activation steps) which immediately preceded the production of their first greeting were distributed as displayed in Table 1. The average delay (footnote 4) between these activation steps (cf. Figure 1) was long enough to prevent misattributing which activation step preceded participants' greeting utterances or gestures.
Fig. 1: Distribution of all greetings produced by participants between activation steps, including multiple greetings by the same participants. Total first greetings = 62, total greetings overall = 85.
Footnote 4: The presence of a measurable standard deviation in this delay between activation steps is due to variations in the robot's CPU load between participants.
Most initial greeting utterances or gestures therefore occurred after the robot's own verbal greeting. However, overall, most greeting behaviors were produced after the robot's wave, as many participants among those who had previously greeted the robot produced a new greeting after the robot achieved this gesture (cf. Figure 1).
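The attribution of each first greeting to the activation step that immediately preceded it can be illustrated with a minimal sketch. It is not the procedure used in the study: the step labels are taken from the activation steps named in this article (the fifth step is not named here and is left out), while the onset times and the function names `preceding_step` and `tally_first_greetings` are hypothetical, chosen only for illustration.

```python
from bisect import bisect_right

# Hypothetical onset times, in seconds after the participant enters the room.
# These values are illustrative assumptions, not measurements from the study.
STEP_ONSETS = [
    (0.0, "physical co-presence"),
    (8.0, "gaze tracking"),
    (14.0, "verbal greeting"),
    (17.0, "wave"),
]


def preceding_step(greeting_time, step_onsets=STEP_ONSETS):
    """Return the label of the last activation step that started
    at or before greeting_time (step_onsets must be sorted by time)."""
    onsets = [onset for onset, _ in step_onsets]
    index = bisect_right(onsets, greeting_time) - 1
    if index < 0:
        raise ValueError("greeting occurred before the first activation step")
    return step_onsets[index][1]


def tally_first_greetings(first_greeting_times, step_onsets=STEP_ONSETS):
    """Count first greetings per preceding activation step."""
    counts = {label: 0 for _, label in step_onsets}
    for greeting_time in first_greeting_times:
        counts[preceding_step(greeting_time, step_onsets)] += 1
    return counts
```

Under these assumed onsets, a first greeting timed at 15.2 seconds would be attributed to the robot's verbal greeting. Such a tally only locates greetings in time; it says nothing about what participants oriented to, which is what the fragment analyses address.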

FIVE PATHS TO THE PRODUCTION OF A FIRST GREETING
The five fragments below, ordered chronologically relative to the robot's activation steps, are representative of this corpus (footnote 5). They are analyzed using an ethnomethodological and conversation analytic methodology (EMCA) in order to reveal, on a moment-to-moment basis, what interactional processes are aggregated in the statistical distribution presented above. That is, which events or actions were typically oriented to by the human participant as constructing an appropriate framework for the production of greetings. Our transcription conventions are detailed in the Appendix.
Footnote 5: Each fragment displays the most common way in which a greeting emerged for each activation step and a typical case in which greetings didn't emerge for fragment 5. However, very few greetings occurred after the first activation step (Motionless robot) and the second activation step (Mutual gaze), cf. Figure 1. In this sense, fragments 1 and 2 display rare occurrences in comparison with all 80 participants analyzed. Note that, even though these fragments were chosen as the most representative, some specificities of these participants' behavior (i.e., the distance at which they were standing from the robot) are still idiosyncratic and different from the average.

Description and analysis.
After laying her coat on the chair, the participant gazes at the robot and positions herself in front of it, facing it (L.1). Standing at 0.8 meter from the robot, she is comparatively close to it in regard to other participants (average initial distance = 1.3 meter; SD = 0.26). She immediately initiates a first greeting pair and an address term ( Pepper , L.2) and maintains her gaze, body orientation and posture slightly leaning towards the robot during the next 3.6 seconds. After this standstill, she produces a lateral head tilt for 3.8 seconds (L.3), while maintaining her gaze towards the robot s eyes. A few seconds later, the robot directs its face towards the human s eyes, establishing mutual gaze. This quick adjustment of the robot s head is associated with motor noise and plastic sounds (L.4). These so-called consequential sounds'' are known to be regularly oriented to by participants in human-robot interactions [20,85,94]. The robots arms also start shaking lightly (L.4), which will persist until the end of the experiment. After 3.5 seconds of this mutual gaze, the participant raises one eyebrow (L.5) and, after a second of silence, produces a new greeting ( hello Pepper , L.6). This action appears to account for the robot s continued silence and, in particular, to orient to the robot s alignment with the human s gaze as creating expectations for a next action on its part. However, the participant s greeting is immediately followed by a greeting from the robot (L.7). This implies that, in this fragment 6 , the robot s greeting turn is sequentially positioned as a second pair part responding to the participant greeting who, indeed, does not achieve a new greeting in return (L.8), suggesting that she orients to the robot s bonjour as a reaction to her own. After a short pause, the robot then starts to produce a greeting wave (L.8), with which the participant aligns by producing a similar wave. 
Once the robot starts to retract this waving gesture, the participant immediately begins to lower her own arm and finishes her retraction simultaneously with the robot. The previous fragment constitutes one of only three openings in our corpus where a greeting is initiated by the participant before the robot was activated (from an etic perspective). All along the interaction, the robot is constructed as a co-present interlocutor, even when it is not moving yet. Besides the bonjour that the robot produces after her greeting, the participant orients to every activation step displayed by the robot (physical co-presence, mutual gaze and wave) as opportunities to produce a first greeting. Thus, as soon as she is positioned in front of the robot, she treats the situation as a framework in which a greeting sequence is relevant [59] and the robot as able to react to the production of a greeting: by tilting her head 7 (L.3) and by raising one eyebrow (L.5), she makes accountable the non-answer of the robot to the first pair parts she produced and she displays expectations for a reaction.
Remarkably, even though the participant's last verbal greeting (which took place immediately before the robot's vocal greeting) ultimately positioned her as having initiated the greeting sequence (L.6 & 7), she instantly re-positions the robot as the anchor [83] or, more generally, as the speaker initiating new sequences. Indeed, after the robot greets her back, the participant stays silent (L.8) and does not use her position as first speaker to self-select [76] and initiate a new sequence. This silence by the participant leads the robot to produce an interlocked turn [82] which combines both its response to her "hello" and its initiation of a greeting wave (L.8). By staying silent, the participant thus provides the robot with the adequate position to initiate subsequent sequences [82] and, later, to initiate the topic of the interaction (not in transcript).

Description and analysis.
After positioning himself in front of the robot, the participant does some self-grooming [42] as he readjusts his clothes (L.1). At 0.8 meters from the robot, he stands closer to it than most participants. After the robot orients its face towards his eyes (L.2), with the matching motor noise and squeaking plastic sounds, the participant gazes back at it and produces a tongue smack (".tsk", L.4) followed by a first greeting ("hellow", L.6), interrupting his self-grooming (L.5). He then takes a step forward while staring at the robot and, after 2.1 seconds of silence, takes a step back to his original position (L.7). After 2.8 additional seconds of silent mutual gaze, the robot utters a greeting term ("bonjour", L.8) just as the participant finishes taking his arm out of his pocket. The participant briefly extends his hand toward the robot (L.9), then retracts it and produces a return greeting (L.10). The robot starts to raise its arm to prepare a waving gesture (L.10, image 2.6). Before this gesture has reached its apex, the participant starts to extend his own hand towards the robot (L.11, images 2.7 & 2.8); however, once the robot's arm becomes fully extended and starts the waving motion, the participant redirects his arm to produce a waving gesture (L.11, image 2.9). He simultaneously produces a new verbal greeting and smiles (L.12).
In this fragment, unlike the previous example, physical co-presence is not treated as sufficient to initiate the interaction. The robot is not oriented to as a conversational partner from the very start. This is especially visible through the participant's self-grooming, usually displayed during the approach between two interactants [30]. However, the status of the robot in the interaction shifts after the establishment of mutual gaze. The participant's interruption of his self-grooming (L.5), his greeting (L.6), and the step forward he takes (L.6) accentuate a shared inner space [42] and display the expectation of an imminent action from the robot. This reconfiguration results from the crucial analytic distinction [59] made by the participant about what the gaze from the robot is projecting: it is not oriented to as merely automatic "gaze tracking", nor as a mere look [45,59], but as a look projecting the initiation of an upcoming action. The participant's expectation is not met, however, as he goes back to his original spot. Mutual gaze therefore constitutes the first breaking point after which the robot becomes (momentarily) present as a potential interlocutor. As [66] notes in the case of young children interacting with a toy robot, even little sequential phenomena of the robot's timely conduct in relation to participants' actions can have a critical impact on the categorization and re-interpretation [66] of this robot. In our fragment, the redirection of the robot's gaze towards the participant after a silence is treated as a meaningful social action.
Incidentally, two reconfigurations could be observed in this participant's gestures. First, at L.9, he extends his hand towards the robot immediately after its first verbal greeting, before retracting this hand and producing a return "bonjour". The cancellation of his tentative gesture and, instead, his production of a verbal greeting appear to constitute alignments with the robot's (then verbal) mode of greeting. Later, as the robot starts to visibly raise its arm as part of its waving gesture, the participant's response gesture shifts from an apparent handshake gesture to a clearly observable wave (L.11 to L.12; images 2.4 to 2.7). These two episodes display quickly evolving interpretations of what action the robot is projecting during its first greeting and, later, during the preparation of its waving gesture. They highlight an online monitoring of the robot by the participant [68], which allows him to reconfigure his embodied course of action to align with the robot's co-occurring action⁸.

Description and analysis.
After entering the room, the participant positions herself in front of the robot, at a greater distance than most participants. She starts to swing from one leg to the other while looking around the room (L.1). After a few seconds, the robot gazes at the human's face and, in doing so, produces motor noise and squeaking plastic sounds (L.2). The participant instantly stops her swinging movement and initiates mutual gaze with the robot, while raising an eyebrow (L.3). She maintains this posture for the next 4 seconds of mutual silence, and even after the robot utters a first greeting ("bonjour", L.4). Because of a momentary processor overload, the robot's waving gesture takes 3.7 seconds to be triggered after its "bonjour", unlike the rest of the corpus where it took on average 3 seconds. After 3.5 seconds of silence, possibly orienting to the silence and the stillness of the robot as offering a response slot, the participant softly whispers a first greeting (L.6) with a rising vocal pitch right before the robot starts its wave. Once the robot initiates its waving gesture (L.6), the participant glances towards its waving arm (L.7) then utters a new greeting ("bonjour", L.8). This new greeting is uttered out loud and with a final continuing intonation, while the participant widens her smile (L.8, image 3.4).
In this fragment, the participant orients to the first greeting of the robot as sequentially equivocal [35]. Her first greeting (whispered, delayed, with a rising pitch) displays uncertainty regarding what the robot's greeting is projecting: the robot's "bonjour" is not oriented to by the participant as clearly constituting the first part of an adjacency pair which should be completed by a return greeting. The design of her first greeting thus appears to question the status of the interaction and even the existence of a stepwise process of mutual adjustments [67].
Conversely, the participant's second greeting turn appears to confirm what is going on [6] as an exchange of "mutual greetings": uttered out loud and simultaneous with a widened smile, it orients positively⁹ to the robot's gesture as initiating a second greeting sequence.
This supports an interpretation where each greeting produced by the participant accomplishes a different task [55]. The first greeting, uttered 3.5 seconds after the robot's own greeting, mainly checks the availability of the robot and its ability to perceive and react to the human's relevant actions¹⁰ (a form of device testing [71]), whereas the second greeting is a clear ratification of the start of the co-present interaction: it is a sociability practice [55]. Consequently, we observe a form of inertia in this fragment: the inanimate object that the robot is first oriented to as requires interactional work (lasting several seconds) before it is replaced by a conversational agent. The first greeting term produced by the robot neither immediately nor automatically institutes it as a conversational partner which can be greeted back.

ROB $bonjour
    hello
rob $opens its arms->
⁹ Smiling, which is "a principal way parties do displaying a positive stance toward encountering recipients" [65], suggests the situation is now being treated as the beginning of a socially co-present encounter.
¹⁰ This participant can be described as verifying whether the entity in front of her is an animate object that is able to engage with her in re-occurring interactional patterns [66].
[...] stays synchronized with the robot's wave, then stops immediately after the robot starts retracting it (L.7): 200 milliseconds after the robot's arm starts to lower, the participant also starts to retract her arm, as in fragment 1. Like the overwhelming majority of participants in our corpus, this participant focuses her gaze on the robot as soon as it moves its head to track her gaze, but she does not immediately produce a speaking turn. The lasting silence, mutual gaze and "consequential sounds" are not oriented to as opening a slot in which to self-select. Even after the robot utters a "bonjour", she returns no greeting and maintains her previous pose and gaze for the next few seconds.
However, once the robot starts a waving gesture, the participant silently observes its arm rise during the action's preparation. She then abruptly produces her own wave, which catches up with the robot's gesture, and simultaneously produces a smile and a verbal greeting (L.6). The speed of this return wave seems to indicate that the participant orients to the robot's gesture either as producing a normative obligation to achieve a return greeting or, alternatively, as upgrading a normative obligation to respond that she had previously failed to observe. In particular, based on the numerous occurrences of this situation in our corpus, we suggest that, in this fragment, the participant's hasty first greeting displays her alignment as normatively expected at an earlier point in the interaction. That is, she orients to the robot's wave as a second greeting sequence which reinforces the conditional relevance attached to its first vocal greeting ("bonjour"), which she didn't answer. The robot's wave is treated as making accountable the participant's non-answer to this first greeting sequence¹¹.

Description and analysis.
After entering the room and dropping her bag on the ground, the participant gazes at the robot (L.1). As she starts approaching it, she produces a speaking turn involving a deictic reference to the robot as "this thing" ("ce truc", L.2), referring to it in the third person, and qualifies it as "weird". This comment displays most of the typical properties of self-talk [41]: it is produced while the participant is leaning forward, her body not oriented towards the robot, and part of it is uttered in a low voice while looking at the ground. As a consequence, the participant does not manifest any expectation for an answer; her comment does not open a conversational sequence [41]. Once her approach is complete, the participant stands 1 meter from the robot, closer than the average initial distance of 1.3 meters for all participants. After a silence of 3.6 seconds, the robot shifts its gaze towards her eyes; in doing so, it produces soft motor noises and squeaking plastic sounds (L.4). The participant reacts with a short laugh (L.6). After a new silence of 4 seconds, the robot initiates a first greeting (L.8) and opens its arms, palms towards the ceiling. The participant produces another laugh, more audible and longer than her previous one (L.10). This laugh continues during most of the robot's hand wave and is followed by an in-breath when the robot's arm starts to retract (L.10).
The actions of this participant highlight a double orientation to the robot, as both normatively neutral and unable to react to human actions. First of all, she treats the robot as an autonomous script whose verbal utterances imply no normative obligation to be responded to, even after it produces a greeting term: no conditional relevance emerges from the behaviors produced by the robot in the course of the interaction. Simultaneously, the participant orients to the robot as unable to respond to (or to perceive) her own actions. The self-talk (L.2) and laughs (L.6 & 10) she produces in front of the robot do not manifest any expectation for an answer, and the absence of reaction from the robot is not visibly made accountable (unlike fragment 1). Her turns are not recipient designed to be registered as "inputs" in response to which the robot would produce, reconfigure [25] or interrupt speaking turns. In other words, the robot is never characterized as being able to perform reciprocal interactions [91], which is a prerequisite for the existence of a social encounter [91].
Neither the robot nor the human imposes a normative order on the other: there is no observable sequence organization [82] which exerts a constraint on their actions. Using the previously mentioned terminology from [37], this participant is merely using speech but not talking with the robot, in the sense that talking would imply producing actions that are discoverable within a normative order [37] and assuming that other participants are able to perceive them as actions within that order [37].
Additionally, no facework is achieved by the participant: in particular, her comments are not fully whispered (L.2) and her laughs are produced audibly and visibly while standing right in front of the robot. Her first utterance ("this thing is really weird") also explicitly characterizes the robot as an object and refers to it in the third person. As a whole, semantic content and sequential organization mutually reinforce each other to characterize and position the robot as a non-agent: the participant establishes herself as the spectator of a pre-recorded monologue, whose performer is not socially present with her in the room.
Last, we note that even though this participant came closer to the robot than average (i.e., stood at less than 1.3 meters), this unusual proximity was part of a sequence where she scrutinized and commented on the robot, treating it as an inanimate object instead of an interactant. This is unlike participants in fragments 1 and 2 (even though they also stood unusually close to the robot), for whom this proximity displayed a treatment of the robot as an ongoing (fragment 1) or imminent (fragment 2) interactant. This connects with the general observation that, on an individual level, the distance at which a participant stood from the robot was meaningful only in connection with a sequential context.

Sequential ambiguity
The previous fragments reveal the varied interactional work required before behaviors from the robot could be treated as actions which either 1) established the adequate framework for the participant to initiate a first greeting sequence or 2) produced a response slot that the participant was normatively pressured to complete with a return greeting. Even though, after they entered the room, all participants positioned themselves in front of the robot to form a vis-a-vis arrangement [36,42], we see that the mere utterance of a greeting term from the robot ("hello") didn't automatically and immediately establish a reciprocal interaction. There was a regular delay in the shift from the robot as a normatively neutral artifact (which was discovered completely motionless at the very beginning of the interaction) to a conversational partner producing sequentially implicative turns. For many participants (exemplified in fragments 3, 4 and 5), the robot's interactional status as a normatively neutral artifact persisted after it produced a first greeting.
An explanation for this delayed emergence of the robot as an agent is that participants found themselves confronted with an inert robot that suddenly animates and comes into existence (a confrontation for which the instructions described in section 3.4 had prepared them) and were therefore engaged in sequentially ambiguous situations [35] as to what actions the robot was projecting (or whether it was projecting anything): they had to entertain the full range of possibilities momentarily, using the immediately following talk to find out what sort of sequence is in progress [80]. During fragments 1, 2 and 3, participants' initial turns can be considered as practical attempts to probe the current status of the interaction: by trying to elicit a response from the robot, these actions clarified whether a phase of mutual adjustments, or any form of turn-based coordinated activity, was either ongoing or technically feasible.
We argue that these face-to-face encounters with a humanoid robot coming into existence disrupted background expectancies and methods at play in the accomplishment of commonplace activities, such as having a conversation [88].
Participants had to achieve greetings in a situation marked by otherness, which, as has been observed in different contexts, "throw[s] the greeters and their practices of greeting into crisis" [58]. This experiment thus made especially apparent the constant experiments in miniature [23,48] achieved by humans when interacting with robots (and, of course, with other humans), where each action tests the hypothesis a participant has about a co-participant's response to her/his action. In particular, in these human-robot interactions, participants faced the challenge of 1) identifying whether intersubjectivity [48] was even possible (i.e., whether the entity in front of them possessed the required properties for mutually achieving a reciprocity of perspectives [48]) and, then, of 2) establishing this intersubjectivity, for example by producing actions which displayed, and therefore tested, an orientation to the robot's previous turns as opening a greeting sequence. We suggest that this double challenge is a common trait of first encounters with humanoid robots, which may require the use of different resources to be overcome depending on the way in which the robot is first encountered by the human.

The waving gesture as a threshold
The robot's wave was often critical in clearing up this sequential ambiguity [35]. In several cases (although not systematically), this gesture offered a practical answer to the practical issue participants were encountering, namely to document "what is going on" within a given spate of talk [6]. For example, in the specific sequential contexts presented in fragments 3 and 4 (i.e., not a wave "as such", discretized and disconnected from local situations¹²), the wave functioned both as a clarification of the situation as an ongoing greeting sequence and, simultaneously, as a soft upgrade of the conditional relevance of the previous greeting turn produced by the robot ("hello"). In other terms, it manifested the normative obligation to produce a return greeting and retrospectively oriented to the participant's non-response (or non-proper response) to the robot's first vocal greeting. Therefore, in a similar way to the responses [60] observed after a two-part animation¹³ of the robot Cozmo, the etically designed two-part greeting achieved by our robot (vocal greeting, pause, wave) often led participants to reconsider their past actions. To paraphrase [16], there was an observable evolution in the multiple drafts these participants produced of the situation, as the robot achieved a waving gesture.
¹² Actions are intrinsically meaningful because they unavoidably participate in an organization of activity, not because there is an abstract, decontexted meaning which they have independent of their occurrence. Action is intrinsically meaningful, not because it is meaningful outside of any concrete situation, but because it is always embedded in a concrete situation. [63]
¹³ Focusing on the Cozmo robot's sad animation, designed to unfold in two parts, [60] observed that the second part of this animation could be treated by participants as the upgrade of an action projected by the first part, leading them to reconsider their prior actions.
In sum, more than any other of its behaviors, the robot's wave was frequently responded to as a conditionally relevant first greeting pair. The instants following this wave constituted a frequent (and momentary) threshold between the robot oriented to as a raw physical artifact [10] (of plastic, sensors, etc.) and the robot treated as a socially co-present entity whose actions could establish a set of normative constraints on the type and form of action with which the recipient should respond [44]. The wave regularly interrupted the persistence of the status of the robot as a non-agent: in these cases, the self-affirming done through language [15] emerged as a consequence of this gesture¹⁴.

Should a robot be designed to harness conditional relevance?
Antithetical design opportunities stem from the observation that behaviors from a robot can establish a normative pressure to produce an adequate social response (here, a return greeting action). Roboticists may use behaviors documented to produce alignment from humans to enforce a robot as a social agent at the beginning of an interaction or, conversely, design the robot to align with the way in which participants treat it from the very start. For example, one could imagine designing a robot which only produces a "reinforcement" wave when its verbal greeting is not answered with a greeting action after several seconds, to purposely pressure a human interlocutor into greeting it like a legitimate social agent. This raises the question of whether designers should leverage conditional relevance as a tool, i.e., harness the tacit normative order [49] of human-human sequence organization [82] to induce social treatments of the robot by participants, no matter how the robot is perceived by these participants. And, if so, to what extent?
Indeed, existing ethical and usability debates [21] can be connected to the legitimacy and to the capability of a robot to impact the degree to which it is being responded to as a social agent. As [19] notes, once a robot can identify that some of its actions (even a limited set of ritualized actions [90] like greetings or goodbyes) are not being treated as those of a social agent, this opens up new possibilities for the field of personalization in robotics [11]. Information about the ongoing interactional status of a robot makes it possible to trigger different behaviors from the robot when humans appear to be construing it exclusively as a raw physical artifact [10]. In particular, such information offers the possibility for a robot to adapt to the user's observable initial definition of the situation (for example, by aligning with a treatment of itself as a simple device by a utilitarian [47] or non-player [10] interactant who only uses keywords and does not greet the robot) or, on the contrary, to rely on different strategies to change the (e.g., nonsocial) way in which it is being treated.
The phenomenon of conditional relevance therefore constitutes another factor in the question of the degree of agency a robot should display (of which the interpretation of our robot's wave as a reinforcement of the normative pressure of the robot's first greeting is a striking example) and highlights how even very mundane and minute choices from designers have ethical ramifications. Human-human sequence organization in conversations has been argued to be the place for a form of proto-morality [9,75]: the treatment of a robot as involved (or not) in a sequence organization is directly connected with the (non-)attribution of specific rights and responsibilities to this robot in a micro-level moral order [89]. Consequently, an attempt to enforce this treatment has normative implications.
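To make the two antithetical design options discussed in this section concrete, the following minimal sketch contrasts a policy that "reinforces" an unanswered verbal greeting with a wave and a policy that aligns with the participant's treatment of the robot as a device. It is purely illustrative and is not the script used in our experiment: the function name, the policy labels, the returned action labels and the 3-second timeout are all hypothetical assumptions.

```python
# Illustrative sketch only: names, labels and the timeout value are
# hypothetical and do not describe the robot script used in the study.

def follow_up_action(seconds_since_greeting: float,
                     human_returned_greeting: bool,
                     policy: str,
                     timeout_s: float = 3.0) -> str:
    """Decide what the robot does after its own verbal greeting.

    policy = "reinforce": wave only if the greeting stays unanswered,
             upgrading the pressure to produce a return greeting.
    policy = "align":     never pressure; fall back to a device-like mode
             if the participant does not greet the robot.
    """
    if policy not in ("reinforce", "align"):
        raise ValueError(f"unknown policy: {policy}")
    if human_returned_greeting:
        return "proceed"          # greeting sequence completed
    if seconds_since_greeting < timeout_s:
        return "wait"             # response slot is still open
    # The verbal greeting has gone unanswered past the timeout.
    return "wave" if policy == "reinforce" else "device_mode"
```

Under these assumptions, an unanswered greeting after 4 seconds yields "wave" for the "reinforce" policy and "device_mode" for the "align" policy, while any return greeting yields "proceed" under both. The choice between the two branches is precisely the design decision whose normative implications are discussed above.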

CONCLUSION: ENTRY INTO PHYSICAL CO-PRESENCE AS A BLIND SPOT IN HRI
Based on the video data from 80 interactions, we observed that the sudden activation of our robot (its "coming into existence") was pragmatically consequential for participants. The intertwining between participants' actions and the activation steps displayed by the robot (including its original motionlessness, which was sometimes oriented to as meaningful by participants; see also [85]) led to the emergence of various sequential trajectories: some participants ended up orienting to the robot's gaze shift, to its wave or to its greeting as a response to a greeting they had just produced; others ended up orienting to these behaviors as initiating a greeting sequence, as reinforcing a previous greeting, and more generally as a slot for a next action. Nevertheless, coming back to our original question of how changes first emerge in the status of a robot during an interaction, it is not possible for us to say to what degree some patterns we identified in section 6 (e.g., participants' treatment of the robot's delayed waving gesture) might be generalizable or, instead, remain specific to the local configuration of our experiment. This point could be clarified by a systematic comparison of interactions with a humanoid robot which "comes into sight" versus a robot which "comes into existence", whether in a natural context or in a controlled experimental setting.
Crucially, when our robot started to move or to greet the human, it didn't do so in the middle of an interactional vacuum: participants were already building courses of action with it. A sole focus on the opening phase starting when the robot is alive and starts to greet the human would abstract these greetings from the preexisting sequential trajectories from which they emerged and in relation to which they can be understood: just because two participants greeted a robot at the same step in this robot's script, they didn't necessarily do the same thing. An EMCA approach allows us to analyze greetings not as synchronic snapshots [3] but, on the contrary, to get a diachronic understanding of how they emerged, as well as to clarify interactional phenomena (initiation, response, reinforcement, etc.) which were simultaneously taking place as these greetings unfolded. Even seemingly straightforward behaviors from the robot, like looking at the human, saying hello or waving at a certain point in the interaction, could be moulded into different sequential trajectories [72], in which what was projected by these (etically similar) behaviors was oriented to in a radically different manner by participants¹⁵.
The previous observations call into question the moment at which data collection should start (video recording, movement tracking, etc.) in human-robot experiments, especially those which deal with the topic of robots as agents or partners. These studies should pay close attention to the way in which their participants enter into physical co-presence with the robot and, in particular, whether the robot "comes into sight" or "comes into existence". Participants' orientations to the very first behaviors displayed by the robot can produce a priori unpredictable sequential trajectories which are, in turn, liable to configure the timing and the manner in which the robot emerges as a social agent, and possibly participants' behavior during the rest of the scenario. A robot oriented to by participants as already activated is not the same kind of entity as a robot which first appears to these participants as an immobile object and then "wakes up".
Therefore, we suggest that, each time it is relevant, researchers should take into account and describe the conditions in which robot and human were put into physical co-presence and regard pre-beginnings as an integral part of the experiment. Depending on a study's methodology and hypotheses, the state of the robot when it appears to participants could impact the comparability, replicability and explainability of the findings.