Learning Action Conditions for Automatic Behavior Tree Generation from Human Demonstrations

The multitude of possible tasks and user preferences in real-world human-robot interaction scenarios renders pure pre-programming of robotic tasks inadequate. Recently, Behavior Trees (BTs) have gained more attention as a modular internal task representation, and learning BTs directly from human video demonstrations in particular offers also non-programming experts an opportunity to conveniently teach robots. However, automatically building BTs from human task demonstrations requires task constraints in the form of action conditions. While existing work on automated BT generation often relies on pre-defined relevant features and heuristic condition computation, here we propose and evaluate different methods to automatically extract action pre- and post-conditions for BT generation from videos of human demonstrations. In particular, we first reduce the feature space using a correlation-based feature pre-selection as well as a rule-based pre-selection based on a Decision Tree. Then, we select features that are relevant pre- and post-conditions for particular actions based on three different variance-based methods. We compare the different methods for feature selection and condition extraction on two pick-and-place tasks and discuss advantages and shortcomings of all methods in the context of learning BTs from human demonstrations.


INTRODUCTION AND RELATED WORK
In real-world human-robot interaction scenarios, Learning from Demonstration becomes an essential capability for future robots to expand their skill set and to enable also non-expert users to teach robots new tasks [12]. Starting as an alternative to Finite State Machines in computer game programming [13], Behavior Trees (BTs) provide a modular, reactive, and interpretable task representation that has recently been more widely used in robotics [2, 8]. A Behavior Tree (BT) is a directed rooted tree that manages the hierarchical execution of behaviors (Figure 1). It serves as a control structure for transitions between distinct tasks within an autonomous agent. Due to its tree structure, a BT allows a modular design of complex decision-making systems composed of several simple, reusable behaviors with built-in reactivity [2].
While BTs can be hand-coded or constructed manually using supporting interfaces [9], there are only a few approaches that learn BTs directly from demonstrations, enabling non-expert programming [3, 5, 12-14]. These demonstrations can be provided in the form of kinesthetic teaching [5], step-by-step action advice via a user interface [3, 13], or human task demonstrations [4, 14]. There are different approaches to automatically build a BT based on these demonstrations. One such approach is to first learn a Decision Tree (DT) from demonstrated state-action pairs and then convert this DT into an equivalent BT [3, 4, 13]. Gustavsson et al. [5] introduce a method that merges task constraints and context inference to optimize BT creation through backchaining. Starting from the goal condition, actions are successively added based on their pre- and post-conditions. However, identifying relevant features and extracting action pre- and post-conditions is challenging, and action conditions are therefore often specified manually [5, 7]. To make the procedure usable for non-experts, it is desirable to determine the conditions automatically. Scherf et al. [14] automatically extract continuous action pre- and post-conditions from human task demonstrations to build a BT using backchaining. Nevertheless, they rely on pre-defining the set of task-relevant features instead of automatically selecting them based on the data. In addition, they use all features as pre-conditions, resulting in a high number of Behavior Tree nodes for more complex tasks.
In this paper, we propose two approaches to reduce a larger feature space to a task-relevant set of features, and three variance-based approaches to identify relevant features as pre- and post-conditions of demonstrated actions for Behavior Tree building. The variance-based approaches are inspired by Abdo et al. [1], who use variations between demonstrations as an indication of a feature's relevance, and Păiş et al. [11], who propose additionally computing the variance of a feature within a demonstration, assuming that variables with substantial within-demonstration variance but little across-demonstration variance are important for the task. We compare the different methods for feature selection and condition extraction on a stacking and a trash disposal task. Based on the experimental results, we discuss advantages and shortcomings of the different methods for learning Behavior Trees from human demonstrations.

LEARNING BT ACTION CONDITIONS
Here, we propose different methods for automatic action condition extraction from human demonstrations that can be used to automatically build a Behavior Tree as described in [14]. Formally, let D be a set of N human task demonstrations. For each demonstration d_n ∈ D with 1 ≤ n ≤ N, a set of features F, e.g. hand poses and distances between objects, is recorded. We define pre- and post-conditions as value ranges between minimum values c⁻(f, a) and maximum values c⁺(f, a) for a feature f and an action a, where F_pre(a) and F_post(a) are the feature subsets of pre- and post-conditions for an action a. A condition is true if all features lie within the condition ranges. A pre-condition has to be true before action execution and is checked during action execution to allow reactivity of the Behavior Tree. Post-conditions specify which features an action changes and to what range. The goal of this paper is to identify task-relevant features and define F_pre(a) and F_post(a) for each demonstrated action a. In particular, we propose two approaches to pre-select potentially task-relevant features F̂ ⊆ F, as well as three variance-based approaches to select relevant pre- and post-conditions for each action. Here, the feature pre-selection is an optional intermediate step before the actual condition computation that can help to reduce the feature space.
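As a minimal sketch of this condition representation, a condition can be stored as per-feature value ranges and checked against the current feature values (the data structure and function names below are our own illustration, not the paper's implementation):

```python
import numpy as np

def extract_condition_range(values):
    """Value range [min, max] of a feature over a condition phase."""
    return float(np.min(values)), float(np.max(values))

def condition_holds(feature_values, ranges):
    """A condition is true if all features lie within their value ranges."""
    return all(lo <= feature_values[f] <= hi for f, (lo, hi) in ranges.items())

# hypothetical pre-condition for a grasp action: hand-object distance
pre = {"hand_obj_dist": extract_condition_range([0.01, 0.03, 0.02])}
print(condition_holds({"hand_obj_dist": 0.02}, pre))  # True: within range
print(condition_holds({"hand_obj_dist": 0.50}, pre))  # False: out of range
```

During BT execution, such a check could serve both as a pre-condition guard and as a post-condition success test.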

Feature Pre-Selection
In order to pre-select features from human demonstrations and reduce the size of a larger feature set, we discuss two methods.
Rule Extraction from Decision Trees. Inspired by [3, 4, 13], we propose using Decision Trees (DTs) to pre-select features as potential candidates for action conditions. In particular, a DT is used to learn a mapping g_θ(Ψ): R^|F| ↦ A from the features Ψ of all human demonstrations to the actions A, where θ denotes the model parameters.
During training, the features most informative in distinguishing between different actions are used to build a hierarchical tree. This tree contains successive decision rules that lead to one of the actions in the leaf nodes. The nodes higher up in the tree represent decisions, i.e. feature thresholds, with the highest discriminatory power. This DT can then be traversed to extract all features that are used to distinguish between the actions as F̂.
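This pre-selection can be sketched with scikit-learn's `DecisionTreeClassifier` on toy state-action pairs (the data and the specific feature layout are assumptions for illustration):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
# toy state-action pairs: 3 features, actions depend only on features 0 and 2
X = rng.random((200, 3))
y = (X[:, 0] > 0.5).astype(int) + 2 * (X[:, 2] > 0.5).astype(int)

dt = DecisionTreeClassifier(random_state=0).fit(X, y)

# traverse the fitted tree and collect every feature used in a split node
selected = {int(dt.tree_.feature[i]) for i in range(dt.tree_.node_count)
            if dt.tree_.children_left[i] != -1}  # internal nodes only
print(sorted(selected))  # the pre-selected candidate feature set
```

Only features that appear in at least one split of the learned tree enter the candidate set, which discards features that never help discriminate between actions.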
Correlation-based Feature Pre-Selection. The idea in correlation-based feature pre-selection is to group similar features and select representatives from each group to reduce the feature set while ensuring that no vital information is lost. For this approach and all following condition selection approaches, the feature time series of all demonstrations are first transformed to the same length using uniform scaling to stretch all feature sequences to the longest one. Missing values are linearly interpolated. The correlation between all transformed features is calculated, resulting in a correlation matrix with dimensions |F| × |F|. Here, we use Kendall's Tau as a similarity measure between the feature time series. The correlation matrix is then used to perform hierarchical clustering with varying numbers of clusters. The final number of clusters is selected by the highest achieved silhouette score. A representative is selected from each cluster by choosing the feature with the lowest average intra-cluster distance, i.e. the lowest average distance to all other features within this cluster. All selected representatives form F̂.
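The pipeline can be sketched with SciPy and scikit-learn (the toy time series, the cluster-count range, and the distance 1 − |τ| are our assumptions, not the paper's exact choices):

```python
import numpy as np
from scipy.stats import kendalltau
from scipy.cluster.hierarchy import linkage, fcluster
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(1)
t = np.linspace(0, 1, 50)
# four toy feature time series (already uniformly scaled to equal length):
# features 0/1 and 2/3 form two highly correlated groups
feats = np.stack([t + 0.01 * rng.standard_normal(50),
                  2 * t + 0.01 * rng.standard_normal(50),
                  np.cos(6 * t) + 0.01 * rng.standard_normal(50),
                  np.cos(6 * t + 0.1) + 0.01 * rng.standard_normal(50)])

n = len(feats)
dist = np.zeros((n, n))  # turn Kendall's Tau into a distance matrix
for i in range(n):
    for j in range(n):
        tau, _ = kendalltau(feats[i], feats[j])
        dist[i, j] = 1.0 - abs(tau)

# hierarchical clustering; pick the cluster count with the best silhouette
Z = linkage(dist[np.triu_indices(n, 1)], method="average")
best_k = max(range(2, n), key=lambda k: silhouette_score(
    dist, fcluster(Z, k, criterion="maxclust"), metric="precomputed"))
labels = fcluster(Z, best_k, criterion="maxclust")

# representative per cluster: lowest average intra-cluster distance
reps = sorted(min(np.where(labels == c)[0],
                  key=lambda i: dist[i, labels == c].mean())
              for c in np.unique(labels))
print(best_k, reps)
```

For these toy series, the clustering recovers the two correlated groups and keeps one representative from each, reducing four features to two.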

Condition Selection
To decide whether a feature is a relevant pre- or post-condition for an action, metrics for the feature values at the beginning and end of an action are computed and compared to thresholds. Here, we define φ_pre(a) = [t_s,a, t_s,a + δ] and φ_post(a) = [t_e,a − δ, t_e,a] as the pre- and post-condition phases of an action a, where t_s,a and t_e,a mark the start and the end of the corresponding action and δ is a threshold for the number of values considered.
Variations across Demonstrations. The variations of a feature f across different demonstrations can provide insights into its importance w.r.t. the demonstrated task [1, 11]. We compute the variance across demonstrations VAD_t(f) for a feature f at time step t as VAD_t(f) = (1/N) Σ_{n=1}^{N} (f_{n,t} − f̄_t)², where f̄_t corresponds to the mean value of the feature across all N demonstrations and f_{n,t} represents the value of feature f in the n-th demonstration, both with respect to a specific time step t.
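The VAD computation can be sketched as follows (the toy demonstrations and the threshold value are assumptions):

```python
import numpy as np

def vad(F):
    """Variance across demonstrations per time step.
    F has shape (n_demonstrations, n_timesteps) for a single feature."""
    return ((F - F.mean(axis=0)) ** 2).mean(axis=0)

# one feature that is consistent across 3 demos, one that varies between them
consistent = np.tile(np.linspace(0.5, 0.0, 10), (3, 1))
varying = np.stack([np.full(10, v) for v in (0.1, 0.5, 0.9)])

theta_vad = 1e-3  # assumed threshold
print(bool(vad(consistent).max() < theta_vad))  # True: condition candidate
print(bool(vad(varying).max() < theta_vad))     # False: varies across demos
```

In the actual method, this check would be restricted to the time steps in φ_pre(a) or φ_post(a) rather than the whole demonstration.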
A feature f is a pre-condition of action a, i.e. f ∈ F_pre(a), if VAD_t(f) < θ_VAD for all t ∈ φ_pre(a), where θ_VAD is a threshold. Likewise, f is a post-condition of action a, i.e. f ∈ F_post(a), if VAD_t(f) < θ_VAD for all t ∈ φ_post(a).

Intra-Cluster Variations. To cover situations where an action is shown in different ways, leading to a high variance of a feature between those demonstrations, it has been proposed to first cluster the data points from the demonstrations [1] and compute the average squared intra-cluster distance SSD(K) = (1/P) Σ_{k=1}^{K} Σ_{p=1}^{P_k} (f_{k,p} − μ_k)², with the number of data points P, the number of clusters K, and the number of data points P_k assigned to cluster k. Furthermore, μ_k corresponds to the mean and f_{k,p} to the value of the p-th data point of cluster k. We propose to use the K-Means clustering approach with an increasing number of clusters and a maximum of √(P/2), which corresponds to a heuristic proposed by Mardia et al. [10]. Formally, feature f is a pre-condition of action a, i.e. f ∈ F_pre(a), if it holds that ∃ K ∈ [1, √(P/2)] : SSD(K) < θ_SSD for all data points in φ_pre(a) and a threshold θ_SSD. The same criterion is applied to all data points in φ_post(a) to determine whether a feature is a post-condition, i.e. f ∈ F_post(a).
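The intra-cluster criterion can be sketched with scikit-learn's KMeans (the toy feature values and the threshold are assumptions):

```python
import numpy as np
from sklearn.cluster import KMeans

def avg_intra_cluster_ssd(x, k):
    """Average squared intra-cluster distance for k-means with k clusters."""
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(x.reshape(-1, 1))
    return km.inertia_ / len(x)  # inertia = sum of squared distances to centroids

# feature values in the pre-condition phase, shown in two distinct ways
x = np.array([0.10, 0.11, 0.09, 0.10, 0.90, 0.91, 0.89, 0.90])

theta_ssd = 1e-3
k_max = max(1, int(np.sqrt(len(x) / 2)))  # heuristic upper bound [10]
is_precondition = any(avg_intra_cluster_ssd(x, k) < theta_ssd
                      for k in range(1, k_max + 1))
print(is_precondition)  # True: two tight clusters explain the variance
```

A feature that would be rejected by the plain VAD criterion because it was demonstrated in two distinct ways can thus still be accepted once the clusters themselves are tight.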
Variance Component Analysis. While the previous approaches only consider variations across demonstrations, the variance within a demonstration can also provide insights into the relevance of a feature [11, 15]. Relevant features might show a high variance within a demonstration while at the same time showing a low variance across demonstrations. The variance within a demonstration VWD_t(f) for a feature f at time step t can be calculated using a rolling window approach according to [15] as VWD_t(f) = (1/w) Σ_{i=t−w+1}^{t} (f_i − f̄_{t,w})², where w is the window size and f̄_{t,w} is the mean value within the window. Feature f is a pre-condition of action a, i.e. f ∈ F_pre(a), if its variance across demonstrations stays below a threshold θ_VCA for all t ∈ φ_pre(a) while the feature shows a high variance within the demonstrations. Similarly, f is a post-condition of action a, i.e. f ∈ F_post(a), if the same holds over φ_post(a). The calculation of the minimum and maximum values for the final action conditions, as well as the BT building based on these pre- and post-conditions and a backchaining approach, are adopted from [14] for the experimental evaluation. The thresholds θ_VAD, θ_SSD, and θ_VCA are manually tuned using a heuristic grid search.
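The rolling-window within-demonstration variance can be sketched as follows (the window size and the toy signal are assumptions):

```python
import numpy as np

def vwd(x, w):
    """Within-demonstration variance via a rolling window of size w."""
    return np.array([x[max(0, t - w + 1): t + 1].var() for t in range(len(x))])

# a feature that changes during the action but is constant at the end
x = np.concatenate([np.sin(np.linspace(0, 6, 40)), np.full(10, 0.5)])
v = vwd(x, w=5)
print(v[20] > 1e-3, v[-1] < 1e-6)  # high variance mid-motion, none at the end
```

Combined with the VAD signal, such a profile distinguishes features that actively change during an action from features that merely stay constant everywhere.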

EXPERIMENTAL EVALUATION
We evaluate the different approaches for feature pre-selection and condition selection on human demonstrations of two tasks, namely a trash disposal task and a stacking task. For the trash disposal data set from [14], 19 participants demonstrated how to pick up trash and dump it in a trashcan. In addition, we recorded videos of 10 participants performing a stacking task in which they had to stack three cubes on top of each other. Both data sets include object-object and hand-object distances as features and action labels such as Move-to-Trash and Grasp-Trash. Here, we use AR markers to track the positions of the relevant objects; in a real-world scenario, a proper object tracking algorithm can be used instead of markers. The feature sets include 67 features in total for the trash disposal task and 97 for the stacking task. For the trash disposal task, only three demonstrations were available per participant. Here, we used jittering [6] as a form of data augmentation to overcome the problem of low variation. For the stacking task, we recorded six demonstrations per participant and used these six demonstrations to evaluate all proposed methods. Tables 1 and 2 show the results of all three condition selection approaches (VAD, SSD, VCA) without feature pre-selection on the trash disposal and stacking task features. We report the condition match rate as the average percentage of extracted conditions that match manually defined baseline action conditions necessary to define the task, e.g. the hand-cube distance is a relevant pre-condition for the Grasp-Cube action. Here, we identified 13 necessary action conditions for the trash disposal task and 44 conditions for the stacking task.
The success rate describes the percentage of successful BT buildings over all participants based on the extracted action conditions, and the node count corresponds to the number of nodes in the resulting BT. For all approaches, we compare two exemplary thresholds: one restrictive threshold leading to a lower number of conditions and a more permissive threshold. The results show that, in general, there is always a trade-off between a high match rate and a low node count. For the trash disposal task, almost all methods are able to extract conditions leading to a successful BT building for all 19 participants. In comparison, the baseline condition computation from the ILBERT approach [14] leads to 18/19 successfully built BTs, a match rate of 0.69, and a very low node count of 37 using their pre-defined feature set of only seven features in total. For the more complex stacking task, only SSD with a threshold of 15 is able to extract conditions leading to a successful BT building for all 10 participants. However, in this case, the node count is very high, with 843 nodes. Even for the more restrictive thresholds, the node count is high for all approaches, resulting in large BTs.
Tables 3 and 4 show the results for the combination of the feature pre-selection based on Decision Trees (DT) or feature correlations (C) with the condition selection approaches. Both feature pre-selection methods, applied prior to condition selection, were able to drastically reduce the number of nodes in the resulting BT while still successfully building a BT for most participants. While the lowest node counts achieved without feature pre-selection were 120 and 456 for the two tasks (Tables 1 and 2), they could be reduced to 53 and 223 BT nodes, respectively.
Overall, a lower node count through the use of feature pre-selection methods can prevent BT failures due to irrelevant action conditions. However, there is a trade-off between a low node count and a high match rate, which can also result in failures during BT building or robot execution. While all condition selection approaches achieved good results in combination with a suitable feature pre-selection, VCA without pre-selection often led to problems during BT building, resulting in a lower success rate. While some features might not change during an action, resulting in a low within-demonstration variance, they can still be relevant for this action and a successful task execution.

CONCLUSION AND FUTURE WORK
We proposed and evaluated different methods for automatic pre- and post-condition extraction in order to learn Behavior Trees from human demonstrations. In particular, combining feature pre-selection with condition selection led to a significant reduction in tree size through the pre-selection methods. Our experimental evaluation on two tasks confirmed that the proposed methods are able to automatically select relevant features as action pre- and post-conditions, which are suitable to subsequently build a Behavior Tree as a robotic task representation.
While variance-based methods for condition computation showed promising first results, not all important task-relevant features might be discoverable through them. Therefore, in the future, we want to explore additional metrics. Moreover, additional experiments with more participants are necessary to evaluate the generalizability of the generated BTs and the influence of a larger number of demonstrations. Transferring discovered relevant features across similar tasks, automated tuning of the thresholds in our condition selection methods, and incorporating interactive user input into the feature discovery process and subsequent BT building are, in our opinion, interesting future research directions.

Figure 1: Overview of the proposed approaches including two feature pre-selection methods to reduce the feature space of human task demonstrations and three variance-based methods to select action conditions for Behavior Tree building.

Figure 2: The approaches are evaluated on demonstrations of a trash disposal task (A) and a stacking task (B).

Table 1: Results Trash Disposal Task Condition Selection.

Table 2: Results Stacking Task Condition Selection.

Table 3: Trash Disposal Task, Combinations of Feature Pre-Selection and Condition Selection Approaches.

Table 4: Stacking Task, Combinations of Feature Pre-Selection and Condition Selection Approaches.