Testing Cluster Properties of Signed Graphs

This work initiates the study of property testing in signed graphs, where every edge has either a positive or a negative sign. We show that there exist sublinear query and time algorithms for testing three key properties of signed graphs: balance (or 2-clusterability), clusterability and signed triangle freeness. We consider both the dense graph model, where one queries the adjacency matrix entries of a signed graph, and the bounded-degree model, where one queries for the neighbors of a node and the sign of the connecting edge. Our algorithms use a variety of tools from unsigned graph property testing, as well as reductions from one setting to the other. Our main technical contribution is a sublinear algorithm for testing clusterability in the bounded-degree model. This contrasts with the property of k-clusterability in unsigned graphs, which is not testable with a sublinear number of queries in the bounded-degree model. We experimentally evaluate the complexity and usefulness of several of our testers on real-life and synthetic datasets.


INTRODUCTION
A signed graph is a graph in which every edge carries either a positive or a negative label. Formally, it is denoted as $G = (V, E, \sigma)$ with node set $V = [n]$, edge set $E \subseteq V \times V$ and edge labelling $\sigma : E \to \{+, -\}$. Such graphs model a variety of different scientific phenomena. The widely studied correlation clustering problem [7,14] was originally motivated by a document classification problem, where one has knowledge of pairwise similarities between documents, and the goal is to cluster the documents into (an unspecified number of) groups such that within a group the documents are similar to each other, while across groups they are less similar. Several authors [9,37] have focused on the related problem of finding large polarized communities in signed graphs. The survey of [36] lists several other important mining tasks in signed social media platforms.
A second example is web applications such as online social networks. Interactions between individuals on these platforms can often be categorized into binary categories: trust versus distrust, friendly versus antagonistic, etc. An important aspect in the edge formation of social networks is the sign of triangles. According to structural balance theory from social psychology [10], triangles with either one or three positive edges are more plausible, and this prevalence has been observed in real-life social networks [30,36]. Many methods and algorithms for link and sign prediction try to capitalize on this [11,30,32]. For a given sign configuration of a triangle (e.g., a {+, +, −} triangle) we say that a signed graph is signed triangle free if no such triangle is present in the graph. The presence of signed triangles is relevant in structural balance theory [10], and {+, +, −} triangle packings have been used as lower bounds in designing approximation algorithms for correlation clustering problems [1,7,38]. A first natural question we ask is whether we can quickly decide if a signed graph is signed triangle free or "far" from it. This turns out to be very similar to the unsigned case, so we shift our focus to cluster properties inherently specific to signed graphs.
Since signed graphs generalize unsigned graphs, they can have properties without a natural unsigned counterpart. One important example is the property of clusterability or weak balance, which was first introduced in [13] and is the subject of the correlation clustering problem [7]. A signed graph is clusterable if there exists a partitioning of the nodes into an a priori unknown number of components such that (i) every positive edge connects two nodes in the same component, and (ii) every negative edge connects two nodes in different components. An equivalent characterization, in terms of forbidden subgraphs, is that the signed graph contains no cycles with exactly one negative edge [13,14]. In Section 3.2 we utilize this forbidden subgraph characterization to design a clusterability tester in the bounded-degree model with query and time complexity $\tilde{O}(\sqrt{n}/\mathrm{poly}(\varepsilon))$. Clusterability does not appear to have a meaningful interpretation in the case of unsigned graphs. For example, if one views an unsigned graph as a signed graph with the restriction that all the edges have the same sign, whether positive or negative, then clearly any such graph is clusterable since it contains no cycles with exactly one negative edge.
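The forbidden-cycle characterization also gives a simple exact check, which may help to fix ideas before turning to sublinear testers. The following is a minimal sketch, not one of the testers of this paper: it reads the whole graph, merges the endpoints of every positive edge with a union-find structure, and reports a violation as soon as a negative edge has both endpoints in the same positive component. The function name `is_clusterable` and the edge encoding `(u, v, sign)` with `sign` in {+1, -1} are assumptions made for illustration only.

```python
from typing import List, Tuple

Edge = Tuple[int, int, int]  # (u, v, sign) with sign = +1 or -1 (assumed encoding)


def is_clusterable(n: int, edges: List[Edge]) -> bool:
    """Exact check that reads every edge; this is not a sublinear tester."""
    parent = list(range(n))

    def find(x: int) -> int:
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Merge the endpoints of every positive edge: the resulting components
    # are the only candidate clusters.
    for u, v, sign in edges:
        if sign > 0:
            parent[find(u)] = find(v)

    # A negative edge inside a positive component closes a cycle with
    # exactly one negative edge, which is the forbidden subgraph.
    return all(find(u) != find(v) for u, v, sign in edges if sign < 0)
```

For instance, a {+, +, −} triangle is not clusterable, while an all-negative triangle is clusterable into three singleton clusters.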
Other signed graph properties are more closely related to unsigned graph properties. For example, the property of balance (or 2-clusterability) in signed graphs [23,25] in fact generalizes that of bipartiteness in unsigned graphs. A signed graph is balanced if it is clusterable into exactly two components, with only positive edges inside the components and negative edges between the components. It follows that a signed graph with only negative edges is balanced if and only if the underlying unsigned graph is bipartite. There is also a distance-preserving reduction in the opposite direction, transforming a signed graph to an unsigned graph by replacing each positive edge by a path of two negative edges, and afterwards omitting all the signs of the edges [39]. We use this reduction to show that in the bounded-degree model we can reduce the problem of testing balance to that of testing bipartiteness.

Graph Property Testing: What and Why?
Graph property testing was formally introduced in the seminal work of Goldreich, Goldwasser and Ron [20]. As input we are given query access to an (unsigned) graph $G = (V, E)$ with node set $V = [n]$ and edge set $E \subseteq V \times V$. We would like to decide whether the graph obeys a certain property P, or whether it is "far" from any graph having that property. This is a relaxation of deciding P exactly, and it often allows for algorithms that have sublinear query and/or time complexity. Such sublinear algorithms have been proposed for a wide range of graph properties such as bipartiteness [20,21], $k$-colorability [3,20], cycle-freeness [12,22] and more generally monotone graph properties [4] and minor-closed properties [26,27]. The precise definition of "far", however, depends on the type of query access that we have to the graph:
(1) In the dense graph model [19] we are able to query the adjacency matrix entries. A query takes the form $(u, v) \in [n] \times [n]$ and the reply is 1 if there is an edge between $u$ and $v$, and 0 otherwise. Two graphs $G = (V, E)$ and $G' = (V, E')$ are said to be $\varepsilon$-far from each other if they differ in at least an $\varepsilon$-fraction of the adjacency matrix entries. Equivalently, at least $\varepsilon n^2$ edges have to be added or removed to turn $G'$ into $G$.
(2) In the bounded-degree graph model we are given an upper bound $d$ on the degrees of the graph, and we are given access to the adjacency list of $G$. A query takes the form $(v, i)$, with $v \in [n]$ and $i \in [d]$. If the degree of $v$ is at least $i$, then the query is answered with the $i$-th neighbor of node $v$ (in arbitrary order). If $v$ has degree smaller than $i$, an error symbol is returned. Two graphs $G = (V, E)$ and $G' = (V, E')$ are $\varepsilon$-far from each other if at least $\varepsilon d n$ edges have to be modified (added or removed) to turn $G$ into $G'$.
Using these definitions, a graph $G$ is $\varepsilon$-far from having property P if it is $\varepsilon$-far from any graph having property P.
A property testing algorithm for P is a randomized algorithm that, given query access to $G$ and an error parameter $\varepsilon$, must behave as follows [20]: (1) if $G$ has property P, then the algorithm should accept with probability at least 2/3; (2) if $G$ is $\varepsilon$-far from having property P, then the algorithm should reject with probability at least 2/3. If $G$ satisfies neither condition, then the algorithm can behave arbitrarily. This is the main reason why testing algorithms are often far more efficient than algorithms for exactly deciding whether $G$ has property P or not. If a property tester always accepts graphs having property P (i.e., it never falsely rejects), then it is called a one-sided tester for P. Otherwise it is called two-sided. If P can be tested by a tester with query complexity independent of $n$ (only dependent on $\varepsilon$), then P is said to be easily testable.
There are several potential benefits of using a property testing algorithm (see also [20]). First, a tester can be run before an algorithm that decides P exactly. If a graph is far from satisfying P, we obtain a proof of this with good probability, and save the time of running the exact algorithm. If the tester accepts, we may still run the exact algorithm at little extra cost. Second, if one is promised that input graphs are either good or very bad, then there is no need for the exact algorithm at all. Third, one might not have the time to run the exact algorithm before making design decisions. This is helpful in scenarios where a good design consists of an input satisfying P. A tester will accept good designs and reject designs that need many edge modifications to become good. The tester may still accept designs that are not good, but then at least we know that only a small cost is needed to turn them into good designs. Lastly, the ideas in the design and analysis of testing algorithms can sometimes be used to design distance approximation algorithms. We refer the interested reader to the survey by Goldreich [18] for more connections to sublinear time approximation, streaming algorithms and computational learning theory. We also mention the important result of [16] that in dense graphs, if a property is easily testable, then it is possible to estimate how far an input graph is from satisfying the property (i.e., how many edge modifications are required), within additive error $\varepsilon n^2$, in time that depends only on $\varepsilon$.
Signed Graph Testing Framework. The definitions of distance and query access for unsigned graphs are easily extended to signed graphs:
(1) In the dense signed graph model, adjacency matrix queries are now answered by an element from {0, −, +}. A signed graph is $\varepsilon$-far from property P if at least $\varepsilon n^2$ edge modifications have to be made to obtain a graph that satisfies P.
(2) In the bounded-degree signed graph model, a query $(v, i) \in [n] \times [d]$ is now answered either by the $i$-th neighbor $u$ of $v$ and the sign $\sigma(v, u)$ of the corresponding edge, or by an error symbol if $v$ has fewer than $i$ neighbors. A signed graph is $\varepsilon$-far from P if at least $\varepsilon d n$ edge modifications have to be made to obtain a graph that satisfies P.
In both models the edge modifications consist of edge additions, removals, as well as sign switches. Since the properties discussed in this paper (signed triangle freeness, balance and clusterability) are all monotone, we restrict our attention to edge removals.
[...] time complexity is at most exponential in the query complexity.
In the rest of the section we give a sketch of our techniques.
Dense signed graph model. The property of signed triangle freeness can be efficiently tested by interpreting the signed graph as an edge-colored graph. We can then use Fox's edge-colored triangle removal lemma [17], similar to the case of triangle freeness in unsigned graphs. For the property of balance or 2-clusterability we can use a reduction to a Constraint Satisfaction Problem (CSP): every node corresponds to a Boolean variable (which indicates its cluster), a positive edge imposes an equality constraint between its endpoints and a negative edge imposes an inequality constraint. We can then test balance by using a property testing algorithm for CSPs [2,6,35]. The property of clusterability can also be cast as a CSP in which the node variables now take arbitrary integer values in $[n]$ (indicating their cluster). However, the aforementioned CSP testers [2,6,35] are not efficient in such a regime. We circumvent this problem by proving that a signed graph that is clusterable is necessarily $\varepsilon/10$-close to being clusterable into $O(1/\varepsilon)$ clusters. Using this we reduce the problem of testing clusterability to that of distinguishing graphs that are $\varepsilon/10$-close to being $O(1/\varepsilon)$-clusterable from those that are $\varepsilon$-far from being $O(1/\varepsilon)$-clusterable. This problem corresponds to tolerantly testing a CSP where the variables now take values in $[O(1/\varepsilon)]$, and this can be done efficiently using an algorithm by Andersson and Engebretsen [6].
Bounded-degree signed graph model. Testing signed triangle freeness is trivial in this model, similar to the unsigned case [18]. Testing balance requires more care. While we can again cast the problem as a CSP, we are not aware of any appropriate property testing algorithms for CSPs in the bounded-degree model. Rather, we reduce the problem of testing balance for signed graphs to that of testing bipartiteness for unsigned graphs, for which we can use the algorithm of Goldreich and Ron [21]. The reduction is based on a transformation described by Zaslavsky [39], which maps balanced (resp. unbalanced) signed graphs to bipartite (resp. nonbipartite) unsigned graphs. The resulting algorithm's query complexity has an optimal $\sqrt{n}$-dependence, which follows from the $\Omega(\sqrt{n})$ lower bound for testing bipartiteness.

Results and Techniques
Finally, and this is our main technical contribution, we describe a property testing algorithm for clusterability. While we can again reduce the problem to testing $O(1/\varepsilon)$-clusterability, similar to the dense case, the problem is that $k$-colorability (which is a special case of $k$-clusterability) is not testable in the bounded-degree model [8]. Rather, we base our algorithm on the forbidden subgraph characterization by Davis [13], which states that a signed graph is clusterable if and only if it has no cycles with exactly one negative edge. We then use random walks to find such a cycle: first we pick a random initial node and perform a large number of random walks on the positive edges of $G$, then we check for the existence of a negative edge between any pair of nodes that were visited by a random walk. Such a negative edge necessarily yields a bad cycle. The correctness of this algorithm is straightforward to prove when the positive edges in $G$ induce an expander. For the general case we build on the (unsigned) graph decomposition results of [21].

DENSE SIGNED GRAPH MODEL
Testing signed triangle freeness in the dense model can be done by using Fox's edge-colored triangle removal lemma [17], just as testing triangle freeness in unsigned graphs is a direct application of the triangle removal lemma [17,34]. More details and the proof of Theorem 1 can be found in Appendix A.1.
Theorem 1. There exists a one-sided tester for signed triangle freeness in the dense signed graph model with query complexity $\tilde{O}(\mathrm{tower}(\log(1/\varepsilon)))$.

Balance
We cast balance or 2-clusterability of a signed graph $G = (V, E, \sigma)$ as a satisfiability problem. Associate with each node $v$ a variable $x_v \in \{0, 1\}$. With every edge $(u, v) \in E$ we associate a constraint on $x_u$ and $x_v$: if $\sigma(u, v) = +$ (positive edge) the constraint is satisfied if $x_u = x_v$; if $\sigma(u, v) = -$ (negative edge) the constraint is satisfied if $x_u \neq x_v$. The graph is balanced if and only if there exists an assignment of the $x_v$'s such that all constraints are satisfied. Moreover, if $G$ is $\varepsilon$-far from being balanced, then we have to remove at least $\varepsilon n^2$ constraints from the satisfiability problem for it to become satisfiable. As a consequence, the problem reduces to testing whether the satisfiability problem is in fact satisfiable. For this we can use the work by Sohler [35], which describes a one-sided tester with query complexity $\tilde{O}(1/\varepsilon^2)$.
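The constraint system above can be solved exactly by propagating the equality and inequality constraints with a breadth-first search. The sketch below is an exact decision procedure that reads the whole graph, not the sublinear tester of [35]; the function name `balanced_assignment` and the edge encoding `(u, v, sign)` with `sign` in {+1, -1} are assumptions made for illustration.

```python
from collections import deque
from typing import List, Optional, Tuple

Edge = Tuple[int, int, int]  # (u, v, sign) with sign = +1 or -1 (assumed encoding)


def balanced_assignment(n: int, edges: List[Edge]) -> Optional[List[int]]:
    """Return a 0/1 assignment satisfying every edge constraint, or None."""
    adj: List[List[Tuple[int, int]]] = [[] for _ in range(n)]
    for u, v, sign in edges:
        adj[u].append((v, sign))
        adj[v].append((u, sign))

    x: List[Optional[int]] = [None] * n
    for start in range(n):
        if x[start] is not None:
            continue
        x[start] = 0  # free choice once per connected component
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v, sign in adj[u]:
                # Positive edge: same value; negative edge: opposite value.
                want = x[u] if sign > 0 else 1 - x[u]
                if x[v] is None:
                    x[v] = want
                    queue.append(v)
                elif x[v] != want:
                    return None  # violated constraint: graph is not balanced
    return x
```

A returned assignment directly gives the two clusters; `None` certifies a cycle with an odd number of negative edges.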

Clusterability
A relaxation of $k$-clusterability for signed graphs is the notion of weak balance or clusterability [7,13]. A signed graph is clusterable if it is $k$-clusterable for some (a priori unknown) $k \in [n]$. Since there can be at most $n$ clusters, we could test clusterability by testing $n$-clusterability. The satisfiability reduction from the last section fails in this case, however, because typical satisfiability testers have a bad dependence on the domain size of the variables. Rather, we argue that testing clusterability can be reduced to tolerantly testing $k$-clusterability for $k \in O(1/\varepsilon)$. A tolerant tester [33] is required to accept inputs that are $\varepsilon_1$-close to some property P, while rejecting inputs that are $\varepsilon_2$-far from P, for some parameters $\varepsilon_1 < \varepsilon_2$. Tolerant testing is closely related to approximating the distance from an object to a property (see [33]). We use the following lemma.
[...] and $2\varepsilon n$ (which is always possible). This yields a new partition with at most $1/\varepsilon$ components. Between the components there are only negative edges, and there are at most $(2\varepsilon n)^2/\varepsilon = 4\varepsilon n^2$ edges within the components. Hence if we remove all the edges within the new components, then we obtain a new graph for which the new partition describes a clustering with at most $1/\varepsilon$ clusters, and which is $4\varepsilon$-close to the original graph. □
Now if a signed graph is $\varepsilon$-far from being clusterable, then clearly it is also $\varepsilon$-far from being, say, $(8/\varepsilon)$-clusterable. On the other hand, by this lemma, a graph that is $\varepsilon/4$-close to being clusterable will be $(\varepsilon/4 + \varepsilon/2) = 3\varepsilon/4$-close to a graph that is $(8/\varepsilon)$-clusterable. Hence we can use a tolerant tester for $O(1/\varepsilon)$-clusterability to tolerantly test clusterability. Equivalently, we can use an additive estimate (with error $\pm\varepsilon n^2$) on the number of edges that need to be removed in order to make a signed graph $k$-clusterable, for $k \in O(1/\varepsilon)$. Now we are in better shape to cast the problem as a satisfiability problem, similar to the last section. In Appendix B.1 we detail how to use the algorithm of Andersson and Engebretsen [6] to tolerantly test for $O(1/\varepsilon)$-clusterability using $\tilde{O}(1/\varepsilon^7)$ queries. This yields the following theorem.
Theorem 4. There exists a two-sided tolerant tester for clusterability in the dense signed graph model with query complexity $\tilde{O}(1/\varepsilon^7)$.
Moreover, since the tester is tolerant, it also estimates (within additive error $\varepsilon n^2$) the minimum number of edge deletions such that the remaining graph is clusterable, using $\tilde{O}(1/\varepsilon^7)$ queries.
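The parameter bookkeeping behind the reduction to tolerant testing can be spelled out as follows. This is only a worked restatement of the arithmetic in the text, under the assumption that the merging lemma is applied in the form suggested by its proof (at most $1/\delta$ clusters at distance at most $4\delta$); the auxiliary symbol $\delta$ is introduced here for illustration and does not appear in the original.

```latex
\[
\delta = \frac{\varepsilon}{8}
\;\Longrightarrow\;
\frac{1}{\delta} = \frac{8}{\varepsilon} \ \text{clusters},
\qquad
4\delta = \frac{\varepsilon}{2},
\qquad
\underbrace{\frac{\varepsilon}{4}}_{\text{distance to clusterable}}
+ \underbrace{\frac{\varepsilon}{2}}_{\text{cost of merging clusters}}
= \frac{3\varepsilon}{4} < \varepsilon .
\]
```

The strict gap between $3\varepsilon/4$ and $\varepsilon$ is what leaves room for a tolerant tester to separate the two cases.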

TESTING IN THE BOUNDED-DEGREE SIGNED GRAPH MODEL
Testing signed triangle freeness is similar to the unsigned case [18]. The algorithm picks $\tilde{O}(1/\varepsilon)$ nodes and rejects the input graph if any of them is part of a signed triangle. The proof of Theorem 5 can be found in Appendix A.2.
Theorem 5. There exists a one-sided tester for signed triangle freeness in the bounded-degree model with query complexity $\tilde{O}(1/\varepsilon)$.

Balance
Our algorithm for testing balance of bounded-degree signed graphs reduces the problem to testing bipartiteness in a related unsigned graph. Consider the following mapping (Figure 1) from a signed graph $G$ to an unsigned graph $G'$: (i) for every positive edge $(u, v)$ create a new node $w_{(u,v)}$ and replace the edge $(u, v)$ by two unsigned edges $(u, w_{(u,v)})$ and $(w_{(u,v)}, v)$, and (ii) replace each of the remaining negative edges by an unsigned edge. The unsigned graph $G'$ has an odd cycle if and only if $G$ has a cycle with an odd number of negative edges. As a consequence, $G'$ is bipartite if and only if $G$ is balanced. The signed frustration index of a signed graph is the minimum number of edge deletions required to make the resulting graph balanced. The unsigned frustration index of an unsigned graph is the minimum number of edge deletions required to make it bipartite. In [39, Proposition 2.2] a stronger result was proven, namely that the (signed) frustration index of $G$ is equal to the (unsigned) frustration index of $G'$. This implies the following lemma.
Lemma 6. If $G$ is $\varepsilon$-far from balanced then $G'$ is $\varepsilon/(d + 1)$-far from bipartite.
Proof. Let $G$ be $\varepsilon$-far from being balanced, so that it has signed frustration index $\geq \varepsilon d n$. The unsigned graph $G'$ then has unsigned frustration index $\geq \varepsilon d n$. Now if $G$ has $m^+$ positive edges, then $G'$ has exactly $n + m^+ \leq (d + 1) n$ vertices while keeping the same degree bound $d$. As a consequence, we can bound its frustration index by $\varepsilon d n = \frac{\varepsilon}{d+1} d (d + 1) n \geq \frac{\varepsilon}{d+1} d (n + m^+)$, so that $G'$ is indeed $\frac{\varepsilon}{d+1}$-far from being bipartite. □
Lemma 6 states that we can test the balancedness of $G$ with parameter $\varepsilon$ by testing the bipartiteness of $G'$ with parameter $\varepsilon/(d + 1)$. Testing bipartiteness can be achieved by Algorithm 1, which uses random walks to find odd cycles. It was proven to be a one-sided bipartiteness tester in the bounded-degree model [21].

4: Output reject if a vertex $v$ is reached by both an even-length and an odd-length path $s \rightsquigarrow v$.
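The rejection rule of this final step can be illustrated with a small random-walk sketch. This is a simplified illustration in the spirit of the tester of [21], not a faithful reimplementation: the function name `odd_cycle_tester`, the parameters `num_starts`, `num_walks` and `walk_len`, and the choice of a lazy walk that stays put with probability 1/2 are all assumptions, and the parameter values required by the analysis are not reproduced. The sketch records, for every vertex reached from a start node, the parity of the number of moves used to reach it, and rejects once some vertex has been reached with both parities.

```python
import random
from typing import Dict, List, Set


def odd_cycle_tester(adj: Dict[int, List[int]], num_starts: int,
                     num_walks: int, walk_len: int,
                     rng: random.Random) -> bool:
    """Return True (accept) unless an odd cycle is witnessed."""
    nodes = list(adj)
    for _ in range(num_starts):
        s = rng.choice(nodes)
        # parity_seen[v] holds the parities of walk prefixes that reached v.
        parity_seen: Dict[int, Set[int]] = {s: {0}}
        for _ in range(num_walks):
            v, parity = s, 0
            for _ in range(walk_len):
                if not adj[v] or rng.random() < 0.5:
                    continue  # lazy step: stay in place, parity unchanged
                v = rng.choice(adj[v])
                parity ^= 1
                seen = parity_seen.setdefault(v, set())
                seen.add(parity)
                if len(seen) == 2:
                    # v is reached by an even and an odd path from s.
                    return False
    return True
```

On a bipartite graph every walk from `s` to `v` has the same parity, so this sketch never rejects such a graph, which matches the one-sidedness of the tester.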
It remains to prove that we can efficiently implement this tester on $G'$.
Lemma 7. It is possible to implement Algorithm 1 on $G'$ using $\sqrt{n}\,\mathrm{poly}(\log(n)/\varepsilon)$ adjacency list queries to $G$.
Proof. First note that the degree $d(v)$ of any node $v$ can be determined with $O(\log d(v))$ queries using binary search. We need to be able to select a uniformly random node from $G'$, and to implement a random walk on $G'$. The latter is easy:
• If we are on an original node $v$ in $G'$, then pick a random neighbor $u$ of $v$ in $G$. If $(v, u)$ is negative, go to $u$; otherwise go to the new node $w_{(v,u)}$.
• If we are on a new node $w_{(u,v)}$, go to either $u$ or $v$ with probability 1/2.
To select a uniformly random node from $G'$, do the following:
(1) Pick $(v, i) \in [n] \times [d]$ uniformly at random and query for the $i$-th neighbor $u$ of $v$ in $G$. If $v$ has fewer than $i$ neighbors we reject.
(2) If $\sigma(v, u) = -$, with probability $1/(4 d(v))$ output a random endpoint of $(v, u)$ and terminate. Otherwise, go to the next step.
(3) If $\sigma(v, u) = +$, with probability 1/4 output $w_{(v,u)}$ and terminate. Otherwise, with probability $1/(3 d(v))$ output a random endpoint of $(v, u)$.
With probability $\geq 1/(4d)$, this scheme returns a uniformly random node from $G'$ (and otherwise it rejects). To see this, first consider any original node $v$ in $G'$. Any of its $d(v)$ incident edges is picked with an equal probability $1/(dn)$. If a negative incident edge is picked, then $v$ is returned in step (2) with probability $1/(4 d(v))$; if it is a positive incident edge, then $v$ is returned in step (3) with probability $(1 - 1/4)/(3 d(v)) = 1/(4 d(v))$. Hence the total probability that $v$ is returned is $d(v) \cdot \frac{1}{dn} \cdot \frac{1}{4 d(v)} = \frac{1}{4dn}$. Now consider a new node $w_{(u,v)}$ in $G'$. In step (1) the edge $(u, v)$ is picked with probability $1/(dn)$, after which $w_{(u,v)}$ is returned with probability 1/4 in step (3), yielding a total probability $1/(4dn)$. Since there are $n + m^+ \geq n$ nodes in $G'$, the total probability of returning a node is $\geq n/(4dn) = 1/(4d)$. The sampling scheme only requires a single query, and so we can sample a uniformly random node from $G'$ using $4d \in O(1)$ queries in expectation. By Chebyshev's inequality the total number of queries will be close to its expectation with high probability. □
This proves the following theorem.
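The first observation of the proof, that a degree can be found with logarithmically many adjacency list queries, can be sketched as follows. The class `SignedAdjacencyOracle` is a hypothetical stand-in for query access in the bounded-degree signed model (it is not part of the paper), and the search uses a doubling phase followed by a binary search so that the number of queries grows with $\log d(v)$ rather than $\log d$.

```python
from typing import Dict, List, Optional, Tuple


class SignedAdjacencyOracle:
    """Hypothetical oracle: query(v, i) returns (i-th neighbor, sign) or None."""

    def __init__(self, adj: Dict[int, List[Tuple[int, int]]], d: int) -> None:
        assert all(len(nbrs) <= d for nbrs in adj.values())
        self.adj, self.d, self.queries = adj, d, 0

    def query(self, v: int, i: int) -> Optional[Tuple[int, int]]:
        self.queries += 1  # count every adjacency list query
        nbrs = self.adj[v]
        return nbrs[i - 1] if 1 <= i <= len(nbrs) else None  # None = error symbol


def degree(oracle: SignedAdjacencyOracle, v: int) -> int:
    """Find d(v) with O(log d(v)) queries."""
    lo, hi = 0, 1  # invariant: d(v) >= lo
    # Doubling phase: grow hi until the query fails or hi exceeds the bound d.
    while hi <= oracle.d and oracle.query(v, hi) is not None:
        lo, hi = hi, 2 * hi
    hi = min(hi, oracle.d + 1)  # invariant: d(v) < hi
    # Binary search for the largest index that still has a neighbor.
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if oracle.query(v, mid) is not None:
            lo = mid
        else:
            hi = mid
    return lo
```

The `queries` counter makes it easy to check empirically how many oracle calls a procedure built on top of this access model uses.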
Theorem 8. There exists a one-sided tester for balance in the bounded-degree model with query complexity $\tilde{O}(\sqrt{n}/\mathrm{poly}(\varepsilon))$.
Since balancedness of signed graphs generalizes bipartiteness of unsigned graphs, the $\tilde{\Omega}(\sqrt{n})$ lower bound for testing bipartiteness in the (unsigned) bounded-degree model [22, Theorem 7.1] also applies to testing balancedness in the (signed) bounded-degree model. As a consequence, the $\sqrt{n}$-dependency of our tester is optimal.
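The mapping from a signed graph to an unsigned graph used in this section can be sketched explicitly. The code below builds the whole unsigned graph, which the sublinear tester never does; it is meant only to make the construction concrete. The function names `to_unsigned` and `is_bipartite`, the edge encoding `(u, v, sign)`, and the tuple `("mid", u, v)` used as the label of the new node on a positive edge are assumptions for illustration, and the input is assumed to be a simple graph.

```python
from collections import deque
from typing import Dict, Hashable, List, Tuple

Edge = Tuple[int, int, int]  # (u, v, sign) with sign = +1 or -1 (assumed encoding)


def to_unsigned(n: int, edges: List[Edge]) -> Dict[Hashable, List[Hashable]]:
    """Subdivide every positive edge once; keep negative edges as plain edges."""
    adj: Dict[Hashable, List[Hashable]] = {v: [] for v in range(n)}
    for u, v, sign in edges:
        if sign > 0:
            w = ("mid", u, v)  # new node placed on the positive edge (u, v)
            adj[w] = [u, v]
            adj[u].append(w)
            adj[v].append(w)
        else:
            adj[u].append(v)
            adj[v].append(u)
    return adj


def is_bipartite(adj: Dict[Hashable, List[Hashable]]) -> bool:
    """Exact 2-coloring by breadth-first search."""
    color: Dict[Hashable, int] = {}
    for start in adj:
        if start in color:
            continue
        color[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in color:
                    color[v] = 1 - color[u]
                    queue.append(v)
                elif color[v] == color[u]:
                    return False  # odd cycle found
    return True
```

A {+, +, −} triangle becomes a cycle of length 5 and is therefore mapped to a nonbipartite graph, while a {+, −, −} triangle becomes a cycle of length 4.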

Clusterability
In this section we prove the existence of a one-sided property tester for clusterability in the bounded-degree signed graph model. We first note that, similar to the dense case, we can reduce the problem to testing $O(1/\varepsilon)$-clusterability. However, $k$-colorability of unsigned graphs is a special case of $k$-clusterability, and it is known not to be testable in the bounded-degree model [8], requiring $\Omega(n)$ queries. Instead, we use the forbidden subgraph characterization of clusterability by Davis [13]:
Theorem 9 ([13, Theorem 1]). A signed graph $G$ is clusterable if and only if $G$ contains no cycle with exactly one negative edge.
We call such a cycle a bad cycle. Algorithm 2 tries to find bad cycles by performing many random walks on the subgraph $G^+ = (V, E^+)$ induced by the positive edges $E^+$. Starting from a random initial node, we perform many such walks and we check for the existence of a negative edge between distinct walks. Such a negative edge necessarily yields a bad cycle, in which case we reject the graph. The claim about the query complexity is easy to check. The total number of random walk steps is $\tilde{O}(\sqrt{n}/\mathrm{poly}(\varepsilon))$, and a single random walk step can be implemented with $\tilde{O}(1)$ queries. To check whether there exists a negative edge between any pair of nodes in the set $W$ of visited nodes, it suffices to query the full (bounded) neighborhood of every node in $W$. This takes $d|W| \in \tilde{O}(\sqrt{n}/\mathrm{poly}(\varepsilon))$ queries. The remainder of this section is used to prove correctness of the tester, which ultimately follows from Theorem 14.
Intuition for expanders. We first describe the intuition behind the tester. To this end, assume that there is a decomposition $V = V_1 \cup \dots \cup V_k$ as in Figure 2 such that for each $V_i$ the following holds: (1) $V_i$ has few positive outgoing edges, at most $\varepsilon d n/2$ in total over all parts; (2) a random walk of length $\tilde{O}(1/\mathrm{poly}(\varepsilon))$ on $G^+$, starting from any $v \in V_i$, ends uniformly at random inside $V_i$. While such a decomposition does not generally exist, the existence of a closely related decomposition was proven in [21]. Now assume that $G$ is $\varepsilon$-far from being clusterable. Then we claim that there must be at least $\varepsilon d n/2$ negative edges inside the partitions $V_1, \dots, V_k$. Indeed, if this were not the case, then we could find a valid clustering by removing these fewer than $\varepsilon d n/2$ negative edges together with the at most $\varepsilon d n/2$ positive edges between the partitions. This contradicts the fact that $G$ is $\varepsilon$-far from being clusterable. Now make the additional assumption that the number of negative edges $|E^-(V_i)|$ inside each partition is $\Omega(\varepsilon |V_i|)$, and consider an arbitrary node $s \in V_i$. The probability that a pair of random walks on the positive edges, starting from $s$, results in a bad cycle can be lower bounded by the probability that the random walk endpoints $u$ and $v$ form a negative edge $(u, v) \in E^-$. Since $u$ and $v$ are distributed uniformly over $V_i$, this probability is of order $|E^-(V_i)|/|V_i|^2 \in \Omega(\varepsilon/n)$. Taking $\sqrt{n/\varepsilon}$ independent random walks (and ignoring correlations), the total probability of finding a bad cycle then becomes $\Omega(1)$.
General graphs. While the former section correctly captures the intuition behind the tester, the full proof of correctness is significantly more involved. We utilize various ideas from [21] regarding the graph decomposition, but also regarding bounds on the correlation between distinct random walks. The main idea of the decomposition is that for most of the vertices $s$ in $G^+$, we can find a subset $S$ so that $S$ has few outgoing edges and a short random walk from $s$ mixes approximately uniformly over $S$. We can hence set the first partition $V_1 = S$. Now we would like to repeat the argument for the remaining graph $H^+ = G^+[V - V_1]$, induced on the node subset $V - V_1$. The problem is that the next subset $S' \subseteq V - V_1$ will be "good" for random walks in $H^+$, but these can behave very differently from the original random walks in $G^+$. This problem is dealt with by defining a Markov chain $M(H^+)$ on the unpartitioned subgraph $H^+$ such that (i) we can use $M(H^+)$ to cut off a new partition $S'$ from $H^+$, but also (ii) the behavior of walks according to $M(H^+)$ is related to the behavior of the original random walks in $G^+$. Details of the Markov chain $M(H^+)$ are given in Appendix B.1. Lemma 11 is the key lemma in the graph decomposition of [21]. Here $p_{s,v}(\ell)$ denotes the probability that after $\ell$ steps of the Markov chain $M(H^+)$ starting from $s$ we end in $v$.
Lemma 11 ([21, Corollary 3 and Lemma 4.3]). Let $H^+$ be a subgraph of $G^+$ with at least $\varepsilon n/4$ vertices. Then for at least half of the vertices $s$ in $H^+$ there exists a subset $S$ of vertices in $H^+$, a value $\alpha \in \tilde{\Omega}(\varepsilon^2)$ and an integer $\ell \in O(1/\varepsilon^3)$ such that (1) the number of edges between $S$ and the rest of [...]
Under these conditions, we can prove that if the original signed graph $G$ has many negative edges inside the subset $S$ then the modified Markov chain $M(H^+)$ will find a bad cycle. Lemma 12 is new, and we prove it in Appendix C.2.
Lemma 12. Let $H^+$ be a subgraph of $G^+$, $s$ a vertex in $H^+$ and $S$ a subset of vertices in $H^+$. Assume that there exist [...] $> 0$, [...] $\geq 1$, such that [...] $\leq p_{s,v}(\ell) \leq$ [...] for every $v \in S$. [...] $M(H^+)$ over $\ell$ steps and starting from $s$ will return a bad cycle with probability at least 0.99.
Ultimately we are of course interested in the behavior of random walks in $G^+$, rather than that of $M(H^+)$. The following statement (Appendix C.1) shows that both are closely related.
Claim 13. Assume that there exist $s$ and $H^+$ such that walks of $M(H^+)$ of length $\tilde{O}(1/\varepsilon^3)$ and starting from $s$ result in a bad cycle w.p. $\geq 0.99$. Then random walks in $G^+$ of length $\tilde{O}(1/\varepsilon^8)$ and starting from $s$ will also result in a bad cycle w.p. $\geq 0.99$.
Algorithm correctness. Instead of proving that we reject if $G$ is $\varepsilon$-far from clusterable, we prove the contrapositive: if $G$ is accepted with large probability, then $G$ must be close to being clusterable.
Theorem 14. If Algorithm 2 accepts a graph $G$ with probability greater than 1/3, then $G$ must be $2\varepsilon$-close to being clusterable.
Proof. Let $G$ be a graph that is accepted with probability greater than 1/3. We say that a vertex $s$ is good if bad-cycle($s$) in Algorithm 2 returns a bad cycle with probability at most 0.1. Otherwise it is bad. Since we reject with probability less than 2/3, and we consider $\Omega(1/\varepsilon)$ starting vertices, there can be at most $\varepsilon n/16$ bad vertices (for the appropriate constant in the $\Omega(\cdot)$ notation). We show that under these circumstances we can find a valid clustering by removing fewer than $2\varepsilon d n$ edges. To this end, we iteratively separate a subset $S$ that has at most $\varepsilon d |S|/2$ positive outgoing edges and at most $\varepsilon d |S|/2$ negative internal edges. We call such a subset an $\varepsilon$-good cluster. At a given step, let $H^+$ denote the unpartitioned graph. We wish to invoke Lemma 11. Call a vertex $s$ for which the lemma holds a "useful" vertex with respect to $H^+$. While $|H^+| \geq \varepsilon n/4$, the lemma ensures that there are at least $\varepsilon n/8$ useful vertices. Since there are at most $\varepsilon n/16$ bad vertices, this implies that there exists a vertex $s$ that is both good and useful. By Lemma 11 there exists a subset $S$ in $H^+$ that has at most $\varepsilon d |S|/2$ (positive) edges to the rest of $H^+$. Moreover, by Lemma 12 and Claim 13, the set $S$ is such that if the original signed graph $G$ has at least $\varepsilon d |S|/2$ negative edges inside $S$ then bad-cycle($s$) will return a bad cycle with probability at least 0.99. However, we assumed that $s$ is a good vertex and so the latter probability can be at most 0.1. This implies that $G$ must have fewer than $\varepsilon d |S|/2$ negative edges inside $S$, and hence $S$ is an $\varepsilon$-good cluster.
We repeat this process until $|H^+| < \varepsilon n/4$. If $V_1, \dots, V_k$ denote the $\varepsilon$-good clusters that we have cut off, then we have a partition $V = V_1 \cup \dots \cup V_k \cup R$, with $R$ the set of remaining vertices, such that the positive edges between the partitions and the negative edges inside the partitions together number fewer than $2\varepsilon d n$. Removing these edges yields a valid clustering, so that $G$ must be $2\varepsilon$-close to clusterable. This proves Theorem 14. □
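The overall shape of the tester analysed in this section can be sketched as follows. This is a simplified illustration rather than Algorithm 2 itself: the names `finds_bad_cycle` and `clusterability_tester`, the parameters `num_starts`, `num_walks` and `walk_len`, and the use of a plain walk that moves to a uniformly random positive neighbor are assumptions, and the walk lengths and counts required by the analysis are not reproduced. The sketch collects all nodes visited by positive random walks from one start node and rejects if a negative edge joins two of them, since all visited nodes are connected to the start node by positive paths.

```python
import random
from typing import Dict, List, Tuple

SignedAdj = Dict[int, List[Tuple[int, int]]]  # node -> [(neighbor, sign)]


def finds_bad_cycle(adj: SignedAdj, s: int, num_walks: int,
                    walk_len: int, rng: random.Random) -> bool:
    """Walk on positive edges from s; look for a negative edge among visited nodes."""
    visited = {s}
    for _ in range(num_walks):
        v = s
        for _ in range(walk_len):
            positive = [u for u, sign in adj[v] if sign > 0]
            if not positive:
                break  # no positive edge to follow
            v = rng.choice(positive)
            visited.add(v)
    # Every visited node has a positive path to s, so a negative edge between
    # two distinct visited nodes closes a cycle with exactly one negative edge.
    return any(sign < 0 and u in visited and u != v
               for v in visited for u, sign in adj[v])


def clusterability_tester(adj: SignedAdj, num_starts: int, num_walks: int,
                          walk_len: int, rng: random.Random) -> bool:
    """Return True (accept) unless a bad cycle is found from some start node."""
    nodes = list(adj)
    return not any(
        finds_bad_cycle(adj, rng.choice(nodes), num_walks, walk_len, rng)
        for _ in range(num_starts)
    )
```

On a clusterable graph no negative edge lies inside a positive component, so this sketch never rejects such an input, in line with the one-sidedness of the tester.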

EXPERIMENTS
All experiments are performed on an Intel Core i5 machine at 1.8 GHz with 16 GB RAM, and the methods are publicly available.

Query complexity
We experimentally verify the query complexity of three of our testers in a more practical scenario. For example, the theoretical query complexity of testing signed triangle freeness in the dense graph model is a tower function of height $\log(1/\varepsilon)$. We experimentally verify whether such a query complexity is actually necessary in realistic networks, or whether a smaller number of queries is sufficient.
Signed triangle freeness. We start by testing the practical dependency on the parameter $\varepsilon$ of our signed triangle freeness tester in the dense model. The triangle sign configuration of interest is {+, +, −}. First, we aim to generate synthetic complete graphs that are within a controllable distance of a graph that does not have any such triangles. In order to achieve this, we start from an initial complete graph that has no such triangles. Then we randomly switch the sign of an $\varepsilon$-fraction of the edges, in the hope that the distance of the resulting graph to being signed triangle free is roughly equal to the number of signs that we have switched.
Of course, this will not always be true, since multiple sign switches can nullify each other and a switch may induce no new triangles. However, for small $\varepsilon$ it is a reasonably good approximation for certain types of graphs. This is more likely to be true if the graph has a large share of positive edges, which has been observed in many real-life networks, where often more than 80% of the interactions are positive [15,30]. In such a case, a random flip will most likely turn a positive edge into a negative one, and induce several {+, +, −} triangles. After switching, we empirically verify how many vertex triples we need to sample until we detect a triangle. The smaller $\varepsilon$ is, the fewer such triangles are present in the graph, and the harder detection should be.
The expected number of samples required until a triangle is detected is estimated by noting that this number follows a geometric distribution, whose expectation is the reciprocal of the probability that a randomly chosen triplet is indeed a {+, +, −} triangle. This probability is estimated by sampling a pool of 25 random triplets, and returning the fraction of the pool that induces a triangle. We consider two types of initial graphs that have no {+, +, −} triangles: (a) a complete graph with only positive edges, and (b) a complete signed graph that is clusterable into a constant number of clusters of equal size, each containing one tenth of the vertices. The latter models networks with cluster structure. For both these types of graphs, randomly switching an ε-fraction of the signs should induce a graph that is O(ε)-far from being signed triangle free, provided that ε is not too large. We set = 10 and ε ∈ {0.01, 0.02, . . ., 1}. Figure 3 shows the results. In Figure 3a the initial graph is a complete signed graph with all positive edges. The increase in the number of samples required to detect a triangle for large ε is an artefact of our experimental setup. Indeed, as discussed before, sign switches might nullify each other. E.g., for ε = 1 the resulting graph is the all-negative graph and thus contains no {+, +, −} triangles. This blow-up is observed in the tail of the plot. Figure 3b shows the result for an initial graph with cluster structure and a fixed number of clusters. In this case switching the sign of all the edges (ε = 1) does not give a graph without triangles. For ε approaching one the number of required samples does increase again slightly, but does not blow up. For small ε we observe that we need to sample more triplets to encounter a triangle than in Figure 3a. In both cases, we also remark that the number of vertices has  accepts if and only if the induced subgraph is balanced. We generate graphs similarly to the signed triangle freeness experiment, but now the second type of graph is clustered into exactly two clusters. Edges inside clusters are positive and edges across clusters are negative. We set = 10 and let ε ∈ {0.01, 0.02, . . ., 1}. We randomly switch an ε-fraction of the edges. For each value of ε, we then incrementally search for the size of the smallest induced subgraph that correctly rejects in 2/3 of the cases over a sample of 5k subgraphs. Figure 4a shows the results when the initial graph is a complete graph with all positive edges. Figure 4b shows the results when the initial graph is a balanced graph with two clusters of equal size. Interestingly, the results are nearly identical.
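The acceptance rule of the balance experiment (accept if and only if the sampled induced subgraph is balanced) can be sketched as below. This is an illustrative sketch under stated assumptions: the graph is accessed through a hypothetical callback `sign(u, v)` returning +1, −1, or 0 for a non-edge, and balance is checked by a breadth-first 2-coloring in which positive edges must join equal colors and negative edges must join different colors. The names `induced_subgraph_is_balanced` and `balance_tester` are chosen for the example only.

```python
import random
from collections import deque


def induced_subgraph_is_balanced(sign, nodes):
    """Check balance of the subgraph induced on `nodes` by 2-coloring:
    positive edges need equal colors, negative edges different colors."""
    nodes = list(nodes)
    color = {}
    for root in nodes:
        if root in color:
            continue
        color[root] = 0
        queue = deque([root])
        while queue:
            u = queue.popleft()
            for v in nodes:
                if v == u:
                    continue
                s = sign(u, v)
                if s == 0:  # no edge between u and v
                    continue
                expected = color[u] if s > 0 else 1 - color[u]
                if v not in color:
                    color[v] = expected
                    queue.append(v)
                elif color[v] != expected:
                    return False  # an unbalanced cycle was found
    return True


def balance_tester(sign, n, sample_size, rng):
    """Sample `sample_size` nodes uniformly at random and accept (True)
    iff the induced signed subgraph is balanced."""
    sample = rng.sample(range(n), sample_size)
    return induced_subgraph_is_balanced(sign, sample)
```

Because every induced subgraph of a balanced graph is itself balanced, this rule never rejects a balanced input, which matches the one-sided behaviour discussed in the text.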
Clusterability. Finally, we test Algorithm 2 for testing clusterability in the bounded-degree model. We check the query complexity's dependency on N, for fixed ε = 0.15. We generate asymptotically uniformly random sparse d-regular graphs of varying sizes ∈ {10, 20, . . ., 100 }, with fixed d = 4, using the method from [24]. The initial graphs have only positive edges. For each randomly generated graph, we switch the signs of randomly chosen edges from positive to negative. Again the intuition is that for this relatively small number of switches, the resulting graph should be ε-far from being clusterable. The question is whether the detection of a bad cycle gets more difficult as N increases. We implement the bad-cycle routine from Algorithm 2, and we check how many fixed-length random walks (we set the length to 10) we need to start from a randomly chosen node such that there is a negative edge among the nodes encountered in the walks. We take the average over 10 repeats. Figure 5 shows that the results are in accordance with the theoretical √N-dependency.
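The bad-cycle check used in this experiment can be sketched as follows. This is a simplified illustration and not the authors' implementation: it assumes the graph is given as two hypothetical adjacency dictionaries, `pos_adj` for positive neighbours and `neg_adj` for negative neighbours, and it uses plain (non-lazy) random walks on the positive edges. It reports a bad cycle when some negative edge joins two nodes visited by the walks.

```python
import random


def random_walk(pos_adj, start, length, rng):
    """Walk `length` steps along positive edges only; return visited nodes."""
    visited = [start]
    node = start
    for _ in range(length):
        neighbours = pos_adj.get(node, [])
        if not neighbours:  # isolated in the positive subgraph
            break
        node = rng.choice(neighbours)
        visited.append(node)
    return visited


def bad_cycle(pos_adj, neg_adj, start, num_walks, walk_length, rng):
    """True if a negative edge joins two nodes visited by the walks.
    Such an edge closes a cycle whose other edges are all positive."""
    visited = set()
    for _ in range(num_walks):
        visited.update(random_walk(pos_adj, start, walk_length, rng))
    return any(v in visited for u in visited for v in neg_adj.get(u, ()))
```

In the experiment one would increase `num_walks` until `bad_cycle` first returns True and record that count; the walk length of 10 quoted in the text would be passed as `walk_length`.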

Insights on real datasets
In a final experiment, we run the balance tester in the dense model on five real-life networks in Table 2. For each dataset, we list the
What do we learn from this? Recall the definition of a property tester in Section 1.1. The contrapositive of a one-sided tester implies two things: (i) if we reject, then we learn that the graph does not have the property, and (ii) if we accept with probability greater than 1/3, then we learn that the graph is ε-close to having the property. So the question is whether we drew false conclusions when running this experiment (e.g., due to a bad choice of constants in the sampling, or due to the density of the graphs). Outputting reject is never a wrong answer, since none of the networks is balanced. More importantly, the algorithm correctly rejected in most trials whenever = /10, which is clearly the desired outcome. Moreover, the algorithm accepted most graphs whenever = 10, which does not contradict the interpretation of . Naturally, the denser a graph, the more accurately the notion of distance in the dense model corresponds to the actual spectral signed frustration index .

ACKNOWLEDGMENTS
This work has benefited from discussions with Jop Briët, Aristides Gionis, Oded Goldreich and Christian Sohler. Florian Adriaens is currently supported by Helsinki Institute for Information Technology HIIT, and while at KTH supported by the ERC Advanced Grant REBOUND (834862), the EC H2020 RIA project SoBigData (871042), and the Wallenberg AI, Autonomous Systems and Software Program (WASP) funded by the Knut and Alice Wallenberg Foundation.

We now define a Markov chain ( + ) on + such that length-ℓ 1 walks of ( + ) have the same distribution Pr ( ). In the definition of ( + ) we use the quantity , () for , ∈ + , which denotes the probability that a random walk from will take − 1 steps outside of + and end in at the -th step. The Markov chain ( + ) is defined as follows:
• For every , ∈ + : , = Σ =1 ℓ 2 −1 , ().
• For every ∈ ( + ):
chain, then we will also find a bad cycle using the original random walk. We say that a set of walks results in a bad cycle if the original graph has a negative edge between two distinct vertices of the walks.

C.2 Sufficient condition for bad cycle
Proof of Lemma 12. For 1 ≤ , ≤ , let , be the random variable so that , = 1 if the -th and -th walk form a bad cycle and otherwise , = 0. We now bound the probability that we do not find a bad cycle, which is Pr(Σ < , = 0). To this end we use the bound which can be derived from Chebyshev's inequality [5,  . Now we bound the second term, where we let denote the endpoint of the -th walk: E ¯1,2 ¯2,3 ≤ E

Lemma 3. If a signed graph is clusterable then it is 4ε-close to being clusterable into at most 1/ε clusters.
Proof. Let the partition = 1 ∪ 2 ∪ • • • ∪ denote a valid clustering of the graph. We define a new partition by merging different components: keep all components of size | | ≥ , and merge the remaining components into components of size between

3: Output reject if bad-cycle() = True

bad-cycle():
1: Perform O(√N/poly(ε)) random walks of length Õ(1/poly(ε)) on + , starting from . Let denote the set of all the nodes that are visited.
2: If there is a negative edge between any pair of nodes in , then bad-cycle() ← True.

Theorem 10. Algorithm 2 is a one-sided tester for clusterability with query complexity Õ(√N/poly(ε)).
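For comparison with the sublinear tester, the exact property can be decided in near-linear time when the whole edge list is available. The sketch below is not part of the paper's algorithms; it relies on the classical characterization that a signed graph is clusterable if and only if no negative edge joins two nodes in the same connected component of the positive subgraph (equivalently, no cycle has exactly one negative edge). The function name `is_clusterable` and the `(u, v, sign)` edge-list format are assumptions made for the example.

```python
def is_clusterable(n, signed_edges):
    """Exact clusterability check on nodes 0..n-1.

    signed_edges: iterable of (u, v, s) with s = +1 or -1.
    Returns True iff no negative edge lies inside a connected
    component of the positive subgraph.
    """
    signed_edges = list(signed_edges)
    parent = list(range(n))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Merge the endpoints of every positive edge.
    for u, v, s in signed_edges:
        if s > 0:
            parent[find(u)] = find(v)

    # A negative edge inside a positive component closes a bad cycle.
    return all(find(u) != find(v) for u, v, s in signed_edges if s < 0)
```

When the check succeeds, the positive components themselves form a valid clustering; the bad-cycle routine above looks for a witness of failure of exactly this condition using only a small number of random walks.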

Figure 2: Signed graph decomposition. Positive edges + depicted as solid lines, negative edges − depicted as dashed lines. is -close to clusterable if there are ≤ 2 positive edges between the partitions and ≤ 2 negative edges inside the partitions.
If the original graph has at least | |/2 negative edges inside then ∈ Ω ( √ ) runs of | |

Figure 3: Empirical query complexity for testing signed triangle freeness in the dense model on two types of synthetically constructed graphs.

Figure 4: Empirical sampling complexity for testing balance in the dense model on two types of synthetically constructed graphs.
All-positive initial d-regular graph, switching random edges to negative with ε = 0.15 fixed.

Figure 5: Empirical query complexity for testing clusterability in the bounded degree model.

2 = 1 /
− denote the set of negative edges with both endpoints inside then we can lower bound ∑ | |), which is ≤ 0.01 for the right choice of constants.□

Table 1: Query complexity of the three testers in the two models. All testers are one-sided except for the clusterability tester in the dense model. The function tower(log(1/ε)) denotes a power tower of 2's of height O(log(1/ε)).

Table 2: Overview of the real-life datasets. The statistic denotes the spectral signed frustration index.

Table 3: Output of the balance testing algorithm. A checkmark (✓) denotes an accept, a crossmark (×) denotes a reject.