Collective Anomaly Perception During Multi-Robot Patrol: Constrained Interactions Can Promote Accurate Consensus

An important real-world application of multi-robot systems is multi-robot patrolling (MRP), where robots must traverse an area at regular intervals. Motivations for MRP include the detection of anomalies that may represent security threats. While MRP algorithms show some maturity in development, a key potential advantage remains unexamined: the ability to exploit collective perception of detected anomalies to prioritize the location ordering of security checks. This is because noisy individual-level detection of an anomaly may be compensated for by group-level consensus formation regarding whether an anomaly is likely to be truly present. Here, we examine the performance of unmodified idleness-based patrolling algorithms when given the additional objective of reaching an environmental perception consensus via local pairwise communication and a quorum threshold. We find that, in general, MRP algorithms that promote physical mixing of robots, as measured by a higher connectivity of their emergent communication network, reach consensus more quickly. However, when there is noise present in anomaly detection, a more moderate (constrained) level of connectivity is preferable because it reduces the spread of false positive detections, as measured by a group-level F-score. These findings can inform user choice of MRP algorithm and future algorithm development.


INTRODUCTION
An important application for multi-robot systems (MRS) is multi-robot patrol (MRP), where multiple robots must coordinate their behaviour such that all points in the environment requiring surveillance are visited regularly. Several multi-robot patrol algorithms have been developed over the past 20 years or so, which commonly aim to minimize the maximum and average 'idleness' [20] (return times) on each vertex in a patrol graph, using a variety of centralized and distributed robot coordination methods [13,24]. Depending on the real-world deployment context, one aim of such patrols might be to detect subtle anomalies representing a potential security threat, for example the electromagnetic activity of an eavesdropping device planted by an adversary. Such anomalies, when detected, are quite likely to be false positives, i.e. false alarms caused by noise in sensors or in the environment. An important opportunity for MRS, then, is to help prioritize limited security resources by advising a user on system-level consensus on an anomaly's presence. In the course of MRP, one or more robots can make more than one inspection of an anomaly to obtain more information, and/or the detection system can benefit from the mechanisms of consensus formation to slow the spread of incorrect opinions ('misinformation'). Additional anomaly inspections may be deliberate (i.e. planned by the robots following an initial detection) or occur spontaneously in the course of ongoing patrol. In an adversarial context, it would be important not to unduly compromise idleness minimization to make additional inspections, as this could be a source of vulnerability in a multistage, multi-location attack [30]. In the first instance, then, one may wish to retain the primary objective of idleness minimization, while allowing a consensus formation process regarding anomaly location to take place during the course of normal, longer-term operations. Maintaining typical patrolling behaviours may also be less likely to alert an adversary to their discovery.
In this work, we use a realistic ROS-based MRP simulator to examine the performance of unmodified patrolling algorithms when given the secondary objective of reaching a perceptual consensus on anomaly presence, via local pairwise communication and a quorum decision threshold. We consider the accuracy of the system-level perception in relation to the level of noise present in the anomaly detection process. Overall, MRP algorithms that promote more physical mixing of robots have faster consensus, given the assumption of local communications. However, when anomaly detection noise increases, a more moderate communication connectivity is preferable to suppress false positives. These findings could help to inform user choice of patrolling algorithm, given expectations around detection noise and attack frequency. We provide more background on relevant research in Section 2, and detail our methodology in terms of simulator, collective decision-making mechanisms, and graph metrics in Section 3. In Section 4 we present simulation results and in Section 5 we provide some conclusions and plans for future work.

BACKGROUND
We briefly review MRP algorithms and collective decision-making and perception. We also consider a graph-based metric to describe the emergent communication network between patrolling robots with local belief exchange.

Multi-Robot Patrolling Algorithms
MRP generally focuses on a static environment to be patrolled. This is defined as an area of interest containing nodes that are to be visited, with the aim of minimizing the idleness of each node. The term instantaneous node idleness was proposed by Machado et al. [20] and is used extensively in the MRP field. It measures the number of cycles, or the time elapsed, since a node was last visited. In addition to this term there is also 'instantaneous graph idleness', which is the average idleness over all nodes in the entire region or graph. Given a map or graph to be patrolled by an MRS (e.g. Figure 1), one fundamental problem to solve is the apportionment of locations (nodes) to be visited by different robots, where efficient visitation will physically spread the agents between nodes. One approach is an auction-based system where each robot submits a bid either to a central auctioneer [16] or between robots in a local proximity [5]. Much work has been done to take developments from game theory and apply them to the robot patrol problem. Owing to the malicious nature of an attacker in a system, different patrolling strategy approaches are required [32]. With predictability comes weakness to attackers or intruders who may observe the pattern of patrol and gain access during robot downtime [3]. A more realistic algorithmic performance benchmark, beyond idleness minimization, needs to take such predictability into account [34]. Other recent approaches have taken advantage of machine learning capabilities to derive a 'learned' patrol strategy for adversarial patrolling, as traditional methods are subject to exploitation by attackers [4,40]. These learning-based methods offer benefits when it comes to managing potential attackers, but often do not generalize well to different environments [18]. An in-depth study of the latest approaches to the problem of multi-robot patrolling and their respective benefits and shortcomings can be found in [2].
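The two idleness terms described above can be illustrated with a minimal Python sketch. It assumes idleness is measured in elapsed seconds rather than cycles; the function names and the three-node graph are hypothetical and are not taken from any of the cited patrolling algorithms.

```python
from statistics import mean


def node_idleness(last_visit_time, now):
    """Instantaneous node idleness: time elapsed since each node was last visited."""
    return {node: now - t for node, t in last_visit_time.items()}


def graph_idleness(last_visit_time, now):
    """Instantaneous graph idleness: mean node idleness over the whole graph."""
    return mean(node_idleness(last_visit_time, now).values())


# Hypothetical three-node patrol graph; values are last-visit timestamps in seconds.
last_visit = {"a": 10.0, "b": 40.0, "c": 55.0}

print(node_idleness(last_visit, now=60.0))   # {'a': 50.0, 'b': 20.0, 'c': 5.0}
print(graph_idleness(last_visit, now=60.0))  # 25.0
```

A patrolling algorithm that minimizes idleness would tend to send a robot to node "a" next in this example, since it has been unvisited for longest.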
A broad selection of ten existing multi-robot patrolling algorithms from the literature, designed to minimize idleness, is used here without modification. These algorithms are structured in different ways and, as such, patrolling robots exhibit a variety of behaviors (Table 1). More detail on the structure and operation of the algorithms is presented in [20,24,25]. As we go on to discuss, the algorithms vary considerably in the extent to which robots become spatially mixed, which is significant when they have local communication.

Collective Decision-Making and Perception
One approach to managing the large quantity of information that is gathered in a multi-robot system is for each robot to contribute its 'opinions' or 'beliefs' about the state of the world and, according to some global or local mechanism, allow the group to determine which to collectively adopt, referred to as reaching a consensus. Often owing to communication constraints, and also to prevent rapid noise propagation, consensus formation can be performed at a local level, between individual robots. How the consensus between all robots is formed depends on the belief exchange operator that is used to respond to information. One approach is to use voting model methods, where input from each robot is submitted into a voting system in order to determine the opinion of the group [12,29], but this approach can often suffer from noisy measurements from individual robots [8]. An approach that seeks to solve the issue caused by noisy measurements was initially proposed by Perron et al. [21], whereby a third truth state of 'uncertain' is employed as a compromise between being certain of true and certain of false. This was applied directly to multi-agent systems in [6], which showed that the application of this Kleene uncertain ternary logic can result in better performance than a standard Boolean truth pair in noisy environments.
In the collective learning literature, a common task is for agents to perceive a feature of the environment and come to a consensus on the state of that feature. This has been abstracted as the perception of white and black floor tiles (e.g. [29]), with the ratio being the quantity voted on. This scenario shares similarities with the patrolling problem in regard to communication and data exchange [33], but the information is of a single dimension or value: the ratio of black to white. In the patrolling problem there is a single value for each area of interest, with all nodes having equal importance.
With the collective learning and communication of information comes the possibility that erroneous information will propagate throughout the system and cause a false consensus to be reached. Often in the literature an assumption is made that the world is static and fixed, that is, that there is no change in the real values of the measured world. In a dynamic environment, such as a realistic patrolling problem, there are changes in the world that need to be recorded, such as the activity of an attacker, or variability in the physical environment affecting sensor readings ('noise'). Some recent work has shown that robots adapt better in environments that are subject to change when they have constrained communication, preventing preemptive consensus formation [31].

Communication Graph Connectivity
Robots in a multi-robot system can be dispersed over a wide area, and the ability of robots to maintain communication with one another depends on staying within local communication range. If a robot is isolated or weakly connected to the communication graph, it will be unable to exchange information that it gathers or receive new information from other robots. It is therefore useful to use a measure of connectivity that adequately describes the interconnections between the robots and the strength of those connections. One of the most extensively used metrics for this in the literature is the algebraic connectivity or 'Fiedler value' [10]. Mathematically, it is the second smallest eigenvalue of the Laplacian matrix of the graph, where the Laplacian matrix is the adjacency matrix subtracted from the degree matrix ($L = D - A$). Conceptually, it is a measure of connections between nodes as well as the strength of the edges between nodes within a graph. The algebraic connectivity has a wide range of applications in determining the strength of connection of nodes within a system, most notably in wireless communications where adequate coverage and interconnections are highly desirable properties [17,35,38]. The measure of algebraic connectivity can be highly correlated with performance under certain circumstances, such that it can be used as an input when designing the control algorithms for mobile robots. Zavlanos and Pappas used the algebraic connectivity of a team of mobile robots' communications as a way of controlling their spatial distribution [37]. Other work has used algebraic connectivity as part of the feedback input to a system within applications such as formation control [22]. Here, we use it to assess the emergent communication networks resulting from robots moving according to various MRP algorithms.

Short name   Full name                                      Patrol strategy          Decentralized?
CBLS [27]    Concurrent Bayesian Learning Strategy          Bayesian Learning        Yes
CGG [23]     Cyclic Algorithm for Generic Graphs            Hamiltonian Path         No
CR [19]      Conscientious Reactive                         Reactive                 Yes
DTAG [9]     Dynamic Task Assignment Greedy                 Utility Function         Yes
DTAP [9]     Dynamic Task Assignment Auction                Utility Auctioneer       Yes
GBS [26]     Greedy Bayesian Strategy                       Bayesian                 No
HCR [1]      Heuristic Conscientious Reactive               Heuristic Reactive       Yes
HPCC [1]     Heuristic Pathfinder Conscientious Cognitive   Heuristic Pathfinder     Yes
RAND [20]    Random                                         Random Selection         Yes
SEBS [26]    State Exchange Bayesian Strategy               State & Bayes Exchange   Yes

Table 1: MRP algorithms examined in this study
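As an illustration of the algebraic connectivity discussed in this section, the following Python sketch computes the Fiedler value of a small undirected graph with NumPy. The function name and the three-robot example graphs are hypothetical; how the study itself built and weighted its communication graphs is not specified here, so this is only a sketch of the standard definition.

```python
import numpy as np


def algebraic_connectivity(adjacency):
    """Second-smallest eigenvalue of the graph Laplacian L = D - A."""
    A = np.asarray(adjacency, dtype=float)
    D = np.diag(A.sum(axis=1))           # degree matrix
    L = D - A                            # graph Laplacian
    eigenvalues = np.linalg.eigvalsh(L)  # ascending order for symmetric input
    return float(eigenvalues[1])


# Hypothetical communication graphs between three robots.
path = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]      # chain 0-1-2
complete = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]  # every pair communicates
split = [[0, 1, 0], [1, 0, 0], [0, 0, 0]]     # robot 2 is isolated

for name, graph in [("path", path), ("complete", complete), ("split", split)]:
    print(name, round(algebraic_connectivity(graph), 6))
```

The chain has a Fiedler value of 1, the fully connected graph 3, and the graph with an isolated robot 0 (up to floating-point error), which matches the intuition that a higher value indicates a better connected network and a value of zero indicates a disconnected one.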

EXPERIMENTAL SET UP
We examine the performance of a multi-robot system (MRS) undertaking multi-robot patrol (MRP) when there is one, and only one, true anomaly present. The anomaly is always located in the same place on the patrol graph ('node 30'). When there is detection noise, false positive anomaly detections occur, and we examine the ability of the collective opinion dynamics to suppress these.

Simulator
The simulation environment used in this study is ROS Patrolling Sim, which is described in [28]. The simulator is built using the Stage simulation package [11] and integrated into the ROS (Robot Operating System) framework, which enables easy transfer to real robotic systems. Each robot in the simulation is modeled as a differential drive robot, with a laser rangefinder and odometry from wheel encoders with a drift error model. As the modality of the actual sensor is not the focus of this paper, anomalies to be detected are modeled as measurements that are made with a generic sensor. It is assumed that each observation of an anomaly is instantaneous upon reaching a patrol graph node, and subject to a noise probability value set according to the experiment. Upon arrival at a node on the graph, the robot searches for an anomaly and records a true or false value depending on the world state and noise model for that node. Given a probability value of 5%, there is a 5% chance that the sensor records the opposite of the value that is present in the world. Upon revisiting that same node, there is a new chance of making an incorrect measurement. Robots are modeled as having an omni-directional antenna that is capable of communicating within a defined radius with other robots in the vicinity, regardless of obstacles or walls that may be between the robots. Communication is modeled as a one-to-one (pairwise) exchange, with a minimum timeout/delay period before repeated communications between the same robots. Upon initializing communication with another robot, a pairwise comparison of belief states occurs, resulting in both robots having identical beliefs.
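The sensing and communication assumptions described above can be sketched in a few lines of Python. This is not code from the simulator: the function names are hypothetical, and the default 5 m range and 30 s timeout are the values listed in Table 3.

```python
import random


def observe(anomaly_present, noise, rng):
    """Return a Boolean reading that is flipped with probability `noise`."""
    flipped = rng.random() < noise
    return (not anomaly_present) if flipped else anomaly_present


def in_range(pos_a, pos_b, radius=5.0):
    """Robots can communicate within `radius` metres, ignoring walls."""
    dx, dy = pos_a[0] - pos_b[0], pos_a[1] - pos_b[1]
    return dx * dx + dy * dy <= radius * radius


def may_exchange(now, last_exchange, timeout=30.0):
    """A given pair may only repeat an exchange after `timeout` seconds."""
    return last_exchange is None or now - last_exchange >= timeout


rng = random.Random(0)  # seeded so the sketch is repeatable
readings = [observe(True, noise=0.05, rng=rng) for _ in range(10_000)]
error_rate = readings.count(False) / len(readings)
print(f"empirical error rate: {error_rate:.3f}")  # close to the 5% noise level

print(in_range((0.0, 0.0), (3.0, 4.0)))            # True: exactly 5 m apart
print(may_exchange(now=40.0, last_exchange=20.0))  # False: only 20 s elapsed
```

Because every visit draws a fresh random number, a repeated visit to the same node gives an independent chance of error, as described in the text.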

Opinion Dynamics
In a world with $m$ nodes on the patrol graph, robots have a set of belief values that can be either 1, 0 or $\frac{1}{2}$, meaning true, false and uncertain respectively. Each robot maintains a set of measured belief states of the world according to the number of nodes that exist in the map, $B = \{b_0, \dots, b_m\}$, where $b_i$ is the robot's value for node $i$. When two robots compare beliefs, they update their belief values according to the fusion truth table shown in Table 2. Using this belief fusion provides benefits to the system as a whole, as each robot can gain information on regions of the world that it has not explored. As the simulation world is subject to anomaly detection noise at individual nodes, the ability to reject noise at the system level is highly desirable. When two robots that begin the communication process have conflicting belief states on the same node, the result is that they both become uncertain about the value of that node. This ability to reconcile conflicting information also extends to measuring a node directly. When an agent revisits a node and takes a measurement which does not match its prior belief, the belief fusion in Table 2 is performed between the prior and subsequent beliefs. By using pairwise communication, the system as a whole is able to collectively learn the state of the world faster than a single robot. The overall benefit of limiting the information breadth by using this ternary belief state model over, say, a voting model is that the three-value system is better suited to rejecting noisy measurements or information from faulty agents [8].
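The pairwise fusion described above can be sketched as follows. The contents of Table 2 are not reproduced in this text, so the rule below is an assumption inferred from the prose: agreeing beliefs are kept, an uncertain belief gives way to a definite one, and directly conflicting beliefs become uncertain. The names `fuse` and `exchange` are hypothetical.

```python
TRUE, UNCERTAIN, FALSE = 1.0, 0.5, 0.0


def fuse(b1, b2):
    """Assumed ternary fusion: conflict -> uncertain, uncertain yields to definite."""
    if b1 == b2:
        return b1
    if b1 == UNCERTAIN:
        return b2
    if b2 == UNCERTAIN:
        return b1
    return UNCERTAIN  # one belief is TRUE and the other FALSE


def exchange(beliefs_a, beliefs_b):
    """Pairwise exchange: both robots leave with the same fused beliefs."""
    fused = [fuse(x, y) for x, y in zip(beliefs_a, beliefs_b)]
    return fused, list(fused)


# Hypothetical four-node world; each list holds one robot's belief per node.
robot_a = [TRUE, UNCERTAIN, FALSE, UNCERTAIN]
robot_b = [FALSE, UNCERTAIN, FALSE, TRUE]

new_a, new_b = exchange(robot_a, robot_b)
print(new_a)  # [0.5, 0.5, 0.0, 1.0]
```

The same `fuse` function can be applied between a robot's prior belief and a fresh measurement at a revisited node, so a single contradictory reading moves the robot to uncertain rather than flipping its belief outright.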

Error and F-Score
To judge how well the system arrives at an accurate consensus on the world state, we use a metric referred to as system error, defined as

$$\text{error} = \frac{1}{n}\,\frac{1}{m}\sum_{r=1}^{n}\sum_{j=1}^{m}\left|B_r(p_j) - S^*(p_j)\right| \qquad (1)$$

where $m$ is the number of nodes on the graph, $n$ is the number of robots in the experiment, $B_r(p_j)$ is the belief of robot $r$ at node $p_j$, and $S^*(p_j)$ is the true world state at that node [7].
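A small worked example may help to make the system error concrete. The sketch below assumes that the error is the absolute difference between each robot's belief and the true state, averaged over all robots and all nodes; this averaging form is an assumption consistent with the definitions of m, n, B and S* given above, and the function name and data are hypothetical.

```python
def system_error(beliefs, truth):
    """Mean absolute difference between robot beliefs and the true world state.

    beliefs: one list per robot, holding a belief (0, 0.5 or 1) per node.
    truth:   the true state (0 or 1) per node.
    """
    n = len(beliefs)  # number of robots
    m = len(truth)    # number of nodes
    total = sum(abs(b - s) for robot in beliefs for b, s in zip(robot, truth))
    return total / (n * m)


# Hypothetical world with four nodes and a single anomaly at the third node.
truth = [0, 0, 1, 0]
beliefs = [
    [0, 0, 1, 0],    # robot 1 matches the world exactly
    [0.5, 0, 0, 0],  # robot 2 is uncertain about node 1 and wrong about node 3
]

print(system_error(beliefs, truth))  # 0.1875
```

Under this form an uncertain belief contributes half the error of an outright wrong one, so the error falls to zero only when every robot holds the correct definite belief about every node.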
Another system-level performance metric used is an F-score, which is introduced here in Equation 2. This represents the harmonic mean of the precision and recall of the system, which gives a way to compare the recorded results with the real world in order to determine the effect of noise on the performance of each algorithm. The terms in Equation 2 are the counts of true positives, true negatives, false positives and false negatives, together with the number of unmeasured nodes. The unmeasured nodes are given a half weighting, as per the uncertain belief in Table 2, to penalize the performance metric if nodes remain unknown. The use of this system-level F-score provides a useful insight into the performance of the system with regard to the precision of the data that is collected. This score is given for each algorithm at the end of the experiment, by which point a consensus is expected to have been reached.
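For reference, the standard F-score as the harmonic mean of precision and recall can be computed as in the sketch below. This is the textbook form only: it does not include the half-weighted term for unmeasured nodes that the text attributes to Equation 2, and the function name and counts are hypothetical.

```python
def f_score(true_pos, false_pos, false_neg):
    """Standard F-score: harmonic mean of precision and recall."""
    predicted = true_pos + false_pos  # nodes flagged as anomalous
    actual = true_pos + false_neg     # nodes that truly hold an anomaly
    precision = true_pos / predicted if predicted else 0.0
    recall = true_pos / actual if actual else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical run: the one real anomaly is found, alongside three false alarms.
print(f_score(true_pos=1, false_pos=3, false_neg=0))  # 0.4

# The same detection with no false alarms scores perfectly.
print(f_score(true_pos=1, false_pos=0, false_neg=0))  # 1.0
```

With only one true anomaly in the world, recall is either 0 or 1, so differences in score between algorithms that find the anomaly are driven by how many false positives they accumulate.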

Experiment Parameters
For each experiment run, each robot is initialized at the same position regardless of patrol algorithm to ensure comparability. Each set of experiments is run on the same default map ('Cumberland'), which has 40 patrol graph nodes (Figure 1). This map is representative of a realistic building environment. The simulations have a system of eight robots, a group size that is both realistic and reduces the number of interference or collision events that can occur at higher robot densities [25]. There were 20 simulation runs per algorithm. Table 3 shows some further simulation parameters.

Parameter                      Value
Robot communication range      5 m
Communication timeout period   30 s

Table 3: Simulated experiment parameter values

In order to compute the simulation runs, the software packages were 'containerized' and executed in parallel on a cloud-based compute service. The cloud service utilizes a 22-core Intel Xeon Gold 6238 with 16 GB of RAM.

SIMULATION RESULTS AND DISCUSSION
A boxplot of the algebraic connectivity for each algorithm is presented in Figure 3. This shows a wide variety in communication intensity between robots, with the HPCC, HCR, CR and CGG algorithms leading to noticeably higher algebraic connectivity than the other six algorithms tested. At the other end of the scale, DTAP and DTAG have very low connectivity, while GBS, RAND, SEBS and CBLS have intermediate connectivity. Figure 3 seems to suggest, then, three approximate groupings of algorithms of high, intermediate and low connectivity.
There was a negative correlation between a system's algebraic connectivity and the convergence time to 85% consensus on an accurate world view, i.e. at least 7 of 8 robots completely matching the true state of the world in their beliefs (Figure 4). There was a Pearson correlation coefficient of −0.377 ($p \ll 0.001$) for 0% noise, −0.313 ($p \ll 0.001$) for 5% noise, and −0.299 ($p \ll 0.001$) for 10% noise. The 20% noise level is omitted from this figure because a majority of the simulations did not reach accurate consensus within the experiment run time. While faster convergence is welcome if it is to a true positive, it may also have its disadvantages if it comes at the expense of larger numbers of false positives.
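The 85% quorum criterion used above can be sketched as a simple check. The sketch assumes that a robot counts towards the quorum only if its beliefs match the true state at every node, as the text describes; the function name and the four-node example world are hypothetical.

```python
def accurate_consensus(beliefs, truth, quorum=0.85):
    """True if at least `quorum` of robots match the true state at every node."""
    matching = sum(1 for robot in beliefs if list(robot) == list(truth))
    return matching / len(beliefs) >= quorum


# Hypothetical four-node world with one anomaly, patrolled by eight robots.
truth = [0, 0, 1, 0]
correct = [0, 0, 1, 0]
mistaken = [0, 1, 1, 0]  # holds a false positive at the second node

print(accurate_consensus([correct] * 7 + [mistaken], truth))      # True  (7/8 = 87.5%)
print(accurate_consensus([correct] * 6 + [mistaken] * 2, truth))  # False (6/8 = 75%)
```

With eight robots, the 85% threshold is first met at seven matching robots, which is why the text equates the two.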
Figure 5 shows that there are several algorithms that reliably give a consensus for a true positive (i.e. a correctly detected anomaly) on the correct node for different noise levels, producing a consensus 100% of the time, or at least on over 90% of occasions. This is valuable information that could help to prioritize a security professional's inspections after time away from the building. However, in some cases there were also large numbers of false positives produced elsewhere on the patrol graph (Figure 5 bottom), averaging around 6 per run in the worst case (CGG) down to much less than 1 per run with DTAG, with 5% noise. As detection noise increases, as anticipated, the number of false positives also increases, and thus it becomes apparent that it might be preferable to employ an algorithm that has good, but not the best, true positive detection performance, in return for cutting down false positives. SEBS and CBLS are notable examples of this, with much lower false positives than some alternatives that perform well in true positives, such as CGG. This trade-off is captured by the F-score.
Table 4 shows both the average idleness and F-scores for different noise levels, across the different algorithms tested. Standard deviations ($\sigma$) for the results are also shown. The best two algorithms are shown for each metric: very good performance of idleness minimization is found for CBLS and SEBS. For 0% noise, where false

CONCLUSION AND FUTURE WORK
Multi-robot patrolling (MRP) is a well-studied problem with several effective control algorithms for a user to choose from. However, when there is anomaly detection noise, there is considerable untapped potential for a multi-robot system to help a user to prioritize their security inspections. This is because robots can share their detection information and form a collective consensus about which locations are most likely to be worth checking; such locations could be the first stops on a regular human security patrol, for example. Here, we found that MRP algorithms controlling robots with local, pairwise belief exchange resulted in systems that tended to converge to consensus more quickly when there was higher connectivity in their emergent communications network, that is, when they made robots physically pass by each other more often. However, for some algorithms, easy consensus formation also led to high numbers of false positive results. We developed a multi-robot F-score to obtain an overall view of system accuracy for each algorithm and noise level. We found that some of the leading MRP algorithms with respect to idleness minimization [27], which also had intermediate connectivity, also had good F-scores, with SEBS (State Exchange Bayesian Strategy, [26]) and CBLS (Concurrent Bayesian Learning Strategy, [27]) having good all-round performance. Their coordination of robots to enhance efficient patrolling also has the advantage of reducing the mixing of robots, and hence somewhat constrains their local communication.
The algorithms and their emergent consensus formation behaviors have been studied here on a moderately-sized map (40 nodes, average degree 2.2). This investigation will be extended further for different maps. Given a patrol graph (map) with a different level of connectivity or a larger number of nodes for the same number of robots to patrol, different interaction levels may result. This would require further investigation into the relationship between the number of robots and the number of nodes that each robot is expected to visit. Given a larger map for the same number of robots, it could be anticipated that there will be fewer interactions due to the larger spatial separation between the robots (i.e. lower robot density). This may result in a slower consensus, or no consensus at all. Patrol graph connectivity (e.g. average degree) is also likely to be a relevant factor in emergent communication network connectivity.
We have examined the performance of unmodified MRP algorithms, where robot behavior does not change upon initial anomaly detection. A natural next step would be to add in a behavioral response such as a re-inspection or communication to a nearest neighbor to invite them to also visit that location. With such interaction, the spatial distribution of robots could itself indicate anomaly distribution [14,15]. Given the good performance of algorithms resulting in intermediate network connectivity [31], this could also be used as a control input [22,37], with global measurement or local connectivity estimation in a decentralized implementation [36,38,39]. We will also explore further scenarios in terms of anomaly regularity, given that real-world adversary activity is likely to be transient or intermittent. In such a case, it may well be preferable to err on the side of false positives, and prefer higher communication connectivity. Our results should help inform understanding of the trade-offs involved in using multi-robot systems to aid the perception of difficult-to-detect environmental features.

Figure 1: Trajectory plot of a system of eight robots executing the SEBS patrol algorithm for over an hour on the 'Cumberland' map, with each unique color representing a single robot and its patrol route.

Figure 2: Example social network of eight robots executing the SEBS algorithm on the 'Cumberland' map for 1 hour. Nodes are individual robots, with edges representing the number of communications between respective robots. A darker shade indicates more frequent communication between robots.

Figure 3: Algebraic Connectivity results for each algorithm tested

Figure 4: Time to reach 85% consensus on accurate view of world state against algebraic connectivity for each algorithm and noise level.

Figure 5: Results over 20 simulation runs. Top: proportion of runs that reached consensus (> 85%, i.e. at least 7 of 8 robots) on the node containing the true positive (real security threat). Bottom: average count of false positive consensus on other nodes.

Table 2: Belief fusion truth table for beliefs $b$ and $b'$

Table 4: Performance (average node idleness, F-score) for each algorithm tested at 0, 5, 10 and 20% noise in measurement, averaged across 20 simulation runs. The best two performing algorithms in each category are highlighted in bold text.