Understanding the Cluster Linear Program for Correlation Clustering

In the classic Correlation Clustering problem introduced by Bansal, Blum, and Chawla (FOCS 2002), the input is a complete graph where edges are labeled either + or −, and the goal is to find a partition of the vertices that minimizes the number of +edges across parts plus the number of −edges within parts. In recent years, Chawla, Makarychev, Schramm, and Yaroslavtsev (STOC 2015) gave a 2.06-approximation by providing a near-optimal rounding of the standard LP, and Cohen-Addad, Lee, Li, and Newman (FOCS 2022, 2023) finally bypassed the integrality gap of 2 for this LP, giving a 1.73-approximation for the problem. While introducing new ideas for Correlation Clustering, their algorithm is more complicated than typical approximation algorithms in the following two aspects: (1) it is based on two different relaxations with separate rounding algorithms connected by the round-or-cut procedure; (2) each of the rounding algorithms has to separately handle seemingly inevitable correlated rounding errors, coming from correlated rounding of Sherali-Adams and other strong LP relaxations. In order to create a simple and unified framework for Correlation Clustering, similar to those for typical approximate optimization tasks, we propose the cluster LP as a strong linear program that might tightly capture the approximability of Correlation Clustering. It unifies all the previous relaxations for the problem. It is exponential-sized, but we show that it can be (1+ε)-approximately solved in polynomial time for any ε > 0, providing a framework for designing rounding algorithms without worrying about correlated rounding errors; these errors are handled uniformly in solving the relaxation. We demonstrate the power of the cluster LP by presenting a simple rounding algorithm and providing two analyses, one analytically proving a 1.49-approximation and the other solving a factor-revealing SDP to show a 1.437-approximation.
Both proofs introduce principled methods by which to analyze the performance of the algorithm, resulting in a significantly improved approximation guarantee. Finally, we prove an integrality gap of 4/3 for the cluster LP, showing that our 1.437 upper bound cannot be drastically improved. Our gap instance directly inspires an improved NP-hardness of approximation with ratio 24/23 ≈ 1.043; no explicit hardness ratio was known before.


INTRODUCTION
Clustering is a classic problem in unsupervised machine learning and data mining. Given a set of data elements and pairwise similarity information between the elements, the task is to find a partition of the data elements into clusters that achieves the (often contradictory) goals of placing similar elements in the same cluster and separating dissimilar elements into different clusters. Introduced by Bansal, Blum, and Chawla [7], Correlation Clustering elegantly models this tension and has become one of the most widely studied formulations for graph clustering. The input of the problem consists of a complete graph (V, E+ ⊎ E−), where E+ ⊎ E− is the set of all vertex pairs, with E+ representing the so-called positive edges and E− the so-called negative edges.
The goal is to find a clustering (partition) of V, namely (V_1, ..., V_k), that minimizes the number of unsatisfied edges, namely the +edges between different clusters and the −edges within the same cluster. Thanks to the simplicity and modularity of the formulation, Correlation Clustering has found a number of applications, e.g., finding clustering ensembles [12], duplicate detection [5], community mining [22], disambiguation tasks [36], automated labelling [1, 16], and many more.
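As a concrete reference point, the objective can be computed directly from a partition. The following stdlib-only sketch (the instance and helper names are ours, not from the paper) counts the unsatisfied edges:

```python
from itertools import combinations

def cc_cost(vertices, plus_edges, clustering):
    """Number of disagreements: +edges across clusters plus -edges inside clusters."""
    # Map each vertex to the index of its cluster.
    label = {v: i for i, cluster in enumerate(clustering) for v in cluster}
    cost = 0
    for u, v in combinations(vertices, 2):
        same = label[u] == label[v]
        positive = frozenset((u, v)) in plus_edges
        # A +edge is unsatisfied across clusters; a -edge is unsatisfied inside one.
        if positive != same:
            cost += 1
    return cost

# Toy instance: a +triangle {1,2,3} with vertex 4 attached to 3 by a +edge.
V = [1, 2, 3, 4]
Eplus = {frozenset(e) for e in [(1, 2), (1, 3), (2, 3), (3, 4)]}
print(cc_cost(V, Eplus, [{1, 2, 3}, {4}]))  # cuts the +edge (3,4) -> cost 1
```

Keeping {1,2,3} together and isolating 4 pays only for the cut +edge (3,4); merging everything would instead pay for the two −pairs (1,4) and (2,4).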
This problem is APX-hard [18], and various O(1)-approximation algorithms [7, 18] have been proposed in the literature. Ailon, Charikar, and Newman introduced an influential pivot-based algorithm, which leads to a combinatorial 3-approximation and a 2.5-approximation with respect to the standard LP relaxation [4]. The LP-based rounding was improved by Chawla, Makarychev, Schramm, and Yaroslavtsev to a 2.06-approximation [21], nearly matching the LP integrality gap of 2 presented in [18].
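The pivot-based algorithm admits a very short sketch (illustrative code in the spirit of Ailon-Charikar-Newman, not the paper's implementation):

```python
import random

def pivot(vertices, plus_edges, rng=random.Random(0)):
    """Pivot sketch: repeatedly pick a uniformly random pivot, cluster it with
    its surviving +neighbors, and recurse on the remaining vertices.
    This combinatorial scheme is a 3-approximation in expectation."""
    remaining = set(vertices)
    clusters = []
    while remaining:
        u = rng.choice(sorted(remaining))          # random pivot among survivors
        cluster = {u} | {v for v in remaining
                         if v != u and frozenset((u, v)) in plus_edges}
        clusters.append(cluster)
        remaining -= cluster
    return clusters

V = [1, 2, 3, 4]
Eplus = {frozenset(e) for e in [(1, 2), (1, 3), (2, 3), (3, 4)]}
print(pivot(V, Eplus))  # a partition of V; e.g. pivoting on 3 yields one cluster
```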
It turns out that (a high enough level of) the Sherali-Adams hierarchy can be used to design a strictly better than 2-approximation. Cohen-Addad, Lee, and Newman [28] showed that O(1/ε²) rounds of the Sherali-Adams hierarchy have an integrality gap of at most (1.994 + ε). This approximation ratio was improved by Cohen-Addad, Lee, Li, and Newman [27] to (1.73 + ε) in n^{poly(1/ε)} time, via an algorithm that combines pivot-based rounding and set-based rounding.
One undesirable feature of [27] is the lack of a single convex relaxation with respect to which the approximation ratio is analyzed. For technical reasons, it combines the two rounding algorithms via a generic round-or-cut framework. Given a point x ∈ [0, 1] indexed by the vertex pairs, each of the two rounding algorithms outputs either an integral solution with some guarantee or a hyperplane separating x from the convex hull of integral solutions; if both algorithms output integral solutions, one of them is guaranteed to achieve the desired approximation factor. Though each of the rounding procedures is based on some LP relaxation, the two relaxations are different, so there is no single relaxation whose value can be compared to the value of the final solution.
In this work, we propose the cluster LP as a single relaxation that captures all of the existing algorithmic results. Based on this new unified framework, we design a new rounding algorithm as well as principled tools for the analysis that significantly extend the previous ones, ultimately yielding a new approximation ratio of 1.437 + ε. The study of the cluster LP sheds light on the hardness side as well, as we prove a 4/3 ≈ 1.33 integrality gap for the cluster LP and a 24/23 ≈ 1.043 NP-hardness of approximation.

Our Results
We first state the cluster LP here. It is similar to the configuration LPs used for scheduling and assignment problems [8, 31]. In the cluster LP, we have a variable z_S for every S ⊆ V, S ≠ ∅, that indicates whether S is a cluster in the output clustering or not. As usual, x_uv for every vertex pair uv denotes the extent to which u and v are separated; it is determined by the z variables via (2):

  minimize  obj(x) = Σ_{uv ∈ E+} x_uv + Σ_{uv ∈ E−} (1 − x_uv)
  subject to  Σ_{S ∋ u} z_S = 1   for every u ∈ V   (1)
              x_uv = 1 − Σ_{S ⊇ {u,v}} z_S   for every pair uv   (2)
              z_S ≥ 0   for every S ⊆ V, S ≠ ∅

The objective of the LP is to minimize obj(x), which is a linear function. (1) requires that every vertex u appears in exactly one cluster, and (2) gives the definition of x_uv using the z variables. The idea behind this LP was used in [27] to design their set-based rounding algorithm, though the LP was not formulated explicitly in that paper. Moreover, that paper did not provide an efficient algorithm to solve it approximately. Our first result shows that we can approximately solve the cluster LP in polynomial time, despite it having an exponential number of variables. We remark that, unlike the configuration LPs for many problems, we do not know how to solve the cluster LP simply by considering its dual.

Theorem 1. Let ε > 0 be a small enough constant and opt be the cost of the optimum solution to the given Correlation Clustering instance. In time n^{poly(1/ε)}, we can output a feasible cluster LP solution (z_S)_{S⊆V}, (x_uv)_{uv} with obj(x) ≤ (1 + ε)opt, described using a list of non-zero coordinates.¹

The cluster LP is the most powerful LP that has been considered for the problem. Indeed, the previous algorithms in [28] and [27] can be significantly simplified if one is given a (1 + ε)-approximate solution to the LP. A large portion of the algorithms and analysis in [28] and [27] is devoted to handling the additive errors incurred by the correlated rounding procedure, which is inherited from the Raghavendra-Tan rounding technique [41]. Instead, we move the complication of handling rounding errors into the procedure of solving the cluster LP relaxation.
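To make the LP concrete, here is a stdlib-only sketch (our own illustration, not code from the paper) that checks feasibility of constraints (1)-(2) and evaluates obj(x) on a small instance, a 5-cycle of +edges, where a fractional solution beats every integral one:

```python
from itertools import combinations

def x_val(z, u, v):
    # Constraint (2): x_uv = 1 - sum of z_S over clusters S containing both u and v.
    return 1 - sum(w for S, w in z.items() if u in S and v in S)

def check_feasible(z, V, tol=1e-9):
    # Constraint (1): every vertex fractionally belongs to exactly one cluster.
    return all(abs(sum(w for S, w in z.items() if u in S) - 1) < tol for u in V)

def obj(z, V, Eplus):
    # obj(x) = sum over +edges of x_uv plus sum over -edges of (1 - x_uv).
    total = 0.0
    for u, v in combinations(V, 2):
        x = x_val(z, u, v)
        total += x if frozenset((u, v)) in Eplus else 1 - x
    return total

# 5-cycle of +edges: weight 1/2 on each +edge, viewed as a two-element cluster,
# is feasible and costs 2.5, while the best integral clustering costs 3.
V = list(range(5))
Eplus = {frozenset((i, (i + 1) % 5)) for i in range(5)}
z = {frozenset((i, (i + 1) % 5)): 0.5 for i in range(5)}
assert check_feasible(z, V)
print(obj(z, V, Eplus))  # 2.5
```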
With this single powerful relaxation, we believe that Theorem 1 provides a useful framework for future work that may use more ingenious roundings of the exponential-sized cluster LP without worrying about errors. Indeed, the constraints in the cluster LP imply that the matrix (1 − x_uv)_{u,v∈V} is PSD,² and thus the LP is at least as strong as the natural SDP for the problem. For the complementary version of maximizing the number of correctly classified edges, the standard SDP is known to give a better approximation guarantee of 0.766 [18, 42]. For the minimization version, the standard SDP has an integrality gap of at least 1.5 (see the full paper), but it is still open whether this program has an integrality gap strictly below 2 or not.
We demonstrate the power of the cluster LP by presenting and analyzing the following algorithm, significantly improving the previous best 1.73-approximation.

Theorem 2. There exists a (1.49 + ε)-approximation algorithm for Correlation Clustering that runs in time n^{O(poly(1/ε))}.

This is achieved by a key modification of the pivot-based rounding algorithm, which is used in conjunction with the set-based algorithm as in [27]. In combination with a more careful analysis, which involves principled methods to obtain the best budget function, we obtain a significantly improved approximation ratio.

¹We remark that the obj(x) given by the theorem is at most (1 + ε) times opt, instead of the value of the cluster LP. This is sufficient for our purpose. One should also be able to achieve the stronger guarantee of a (1 + ε)-approximation to the optimum fractional solution: instead of dealing with the optimum clustering C* in the analysis, one deals with the optimum fractional solution to the LP. For simplicity, we choose to prove the theorem with the weaker guarantee.

²Consider the matrix J ∈ [0, 1]^{V×V} where J_uv = 1 − x_uv for every u, v ∈ V (J_uu = 1 for all u ∈ V). For every w ∈ R^V, we have wᵀJw = Σ_S z_S (Σ_{u∈S} w_u)² ≥ 0, since 1 − x_uv = Σ_{S⊇{u,v}} z_S and Σ_{S∋u} z_S = 1.
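The PSD argument in the footnote can be checked numerically: since 1 − x_uv = Σ_{S⊇{u,v}} z_S and Σ_{S∋u} z_S = 1, the quadratic form wᵀJw equals Σ_S z_S (Σ_{u∈S} w_u)², a nonnegative sum of squares. A small sanity check of this identity (illustrative only):

```python
import random
from itertools import combinations

V = list(range(5))
# A feasible cluster LP solution: half-weight on every edge of a 5-cycle.
z = {frozenset((i, (i + 1) % 5)): 0.5 for i in range(5)}

def quad_form(w):
    # w^T J w with J_uv = sum_{S containing u and v} z_S and J_uu = 1.
    total = sum(w[u] * w[u] for u in V)
    for u, v in combinations(V, 2):
        j = sum(wt for S, wt in z.items() if u in S and v in S)
        total += 2 * j * w[u] * w[v]
    return total

def sum_of_squares(w):
    # Equivalent form: sum_S z_S * (sum_{u in S} w_u)^2, clearly nonnegative.
    return sum(wt * sum(w[u] for u in S) ** 2 for S, wt in z.items())

rng = random.Random(1)
w = [rng.uniform(-1, 1) for _ in V]
assert abs(quad_form(w) - sum_of_squares(w)) < 1e-9 and quad_form(w) >= -1e-9
```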
In order to obtain an even tighter analysis of the same algorithm, we introduce a new factor-revealing SDP that searches over possible global distributions of triangles in valid Correlation Clustering instances. By numerically solving this SDP, we can further improve the approximation ratio of the same algorithm.

Theorem 3. There exists a (1.437 + ε)-approximation algorithm for Correlation Clustering that runs in time n^{O(poly(1/ε))}.
While the proof includes a feasible solution to a large SDP and is not human-readable, we prove that our SDP gives an upper bound on the approximation ratio, so it is a complete proof modulo the SDP feasibility of the solution.Our program and solution can be found at https://github.com/correlationClusteringSDP/SDP1437code/.
We also study lower bounds, and prove the following lower bound on the integrality gap of the cluster LP.

Theorem 4. For any ε > 0, the integrality gap of the cluster LP is at least 4/3 − ε.
This integrality gap for the cluster LP, after some (well-known) loss, directly translates to NP-hardness. It is the first hardness of approximation with an explicit ratio, beyond the APX-hardness of [18].

Theorem 5. Unless NP ⊆ BPP, for any ε > 0, there is no (24/23 − ε)-approximation algorithm for Correlation Clustering.

Further Related Work
The weighted version of Correlation Clustering, where each pair of vertices has an associated weight and unsatisfied edges contribute a cost proportional to their weight to the objective, is known to be equivalent to the Multicut problem [30]; this implies an O(log n)-approximation, while no constant-factor approximation is possible under the Unique Games Conjecture [20].
In the unweighted case, a PTAS exists when the number of clusters is a fixed constant [32, 37]. Much study has been devoted to the minimization version of Correlation Clustering in various computational models, for example in the online setting [24, 38, 39], as well as in other practical settings such as distributed, parallel, or streaming [3, 6, 10, 11, 13-15, 17, 23, 25, 40, 43, 44]. Other recent work involves settings with fair or local guarantees [2, 29, 35].

ALGORITHMIC FRAMEWORK AND SETUP FOR ANALYSIS
In this section, we describe our algorithm for obtaining the improved approximation ratio for Correlation Clustering. We solve the cluster LP using Theorem 1 to obtain a fractional solution z = (z_S)_{S⊆V} with obj(x) ≤ (1 + ε)opt; the theorem will be proved in Section 4. With z, we then run two procedures: the cluster-based rounding and the pivot-based rounding with threshold 1/3. We select the better result as the final clustering. The two procedures are defined in Algorithms 1 and 2, respectively. We use N+(u) and N−(u) to denote the sets of + and −neighbors of a vertex u ∈ V, respectively.
[Algorithm 2, excerpt: for every v ∈ V′ ∩ N−(u), independently add v to S with probability 1 − ….]

Analysis of the Cluster-Based Rounding Procedure. The cluster-based rounding procedure is easy to analyze. The following lemma suffices.

Lemma 6. For every pair uv, the probability that u and v are separated in the clustering C output by the cluster-based rounding procedure is 2x_uv/(1 + x_uv). So the probability that they are in the same cluster is (1 − x_uv)/(1 + x_uv).

Proof. We consider the first set S chosen in the cluster-based rounding algorithm such that {u, v} ∩ S ≠ ∅. The vertices u and v will be separated iff |S ∩ {u, v}| = 1. By (1) and (2), the z-weight of the sets containing exactly one of u, v is 2x_uv, and the z-weight of the sets containing both is 1 − x_uv; so the probability of separation is 2x_uv/(2x_uv + (1 − x_uv)) = 2x_uv/(1 + x_uv). □

Therefore, a +edge uv will incur a cost of 2x_uv/(1 + x_uv) in expectation in the cluster-based rounding procedure, and a −edge will incur a cost of (1 − x_uv)/(1 + x_uv). The approximation ratios for a +edge uv and a −edge uv are respectively 2/(1 + x_uv) and 1/(1 + x_uv). Notice that the latter quantity is at most 1.
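Lemma 6 can be verified numerically on any feasible solution: conditioned on the first sampled set hitting {u, v}, the separation probability is the z-weight of sets containing exactly one of u, v divided by the z-weight of sets hitting {u, v}. A stdlib-only check on the 5-cycle solution (our illustration):

```python
from itertools import combinations

V = list(range(5))
z = {frozenset((i, (i + 1) % 5)): 0.5 for i in range(5)}  # feasible for the +5-cycle

def separation_prob(u, v):
    # One draw S with probability proportional to z_S, conditioned on hitting {u, v}.
    one = sum(w for S, w in z.items() if (u in S) != (v in S))   # |S ∩ {u,v}| = 1
    both = sum(w for S, w in z.items() if u in S and v in S)     # {u,v} inside S
    return one / (one + both)

for u, v in combinations(V, 2):
    x = 1 - sum(w for S, w in z.items() if u in S and v in S)
    assert abs(separation_prob(u, v) - 2 * x / (1 + x)) < 1e-9
print("Lemma 6 verified on the 5-cycle solution")
```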
Notation and Analysis for the Pivot-Based Rounding Procedure. We now proceed to the pivot-based rounding procedure in Algorithm 2. We remark that to recover the correlated rounding algorithm of [28] and [27], we can use the empty set in Step 4, i.e., skip the threshold step. Then we obtain their approximation ratios without the complication of handling rounding errors. The errors are handled in [28] by distinguishing between short, medium, and long +edges. In our algorithm, we also distinguish between short +edges (those with x_uv ≤ 1/3) and long +edges (those with x_uv > 1/3); however, the purpose of this distinction is to obtain an improved approximation ratio, rather than to bound rounding errors.
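The overall structure of a pivot-based rounding with a threshold step can be sketched as follows; note that the membership probability used for non-threshold vertices here (1 − x_uv) is a hypothetical placeholder, not the rounding functions actually used in Algorithm 2:

```python
import random
from itertools import combinations

def pivot_round(V, x, plus, threshold=1/3, rng=random.Random(0)):
    """Illustrative sketch only: pick a random pivot, add its short +neighbors
    deterministically (the threshold step), add everyone else with a placeholder
    probability, and recurse on the remaining vertices."""
    remaining = set(V)
    clusters = []
    while remaining:
        u = rng.choice(sorted(remaining))
        cluster = {u}
        for v in sorted(remaining - {u}):
            e = frozenset((u, v))
            if e in plus and x[e] <= threshold:
                cluster.add(v)       # threshold step: short +edges join for sure
            elif rng.random() < 1 - x[e]:
                cluster.add(v)       # placeholder membership probability
        clusters.append(cluster)
        remaining -= cluster
    return clusters

V = list(range(4))
x = {frozenset(p): 0.5 for p in combinations(V, 2)}
x[frozenset((0, 1))] = 0.2                        # a short +edge
plus = {frozenset((0, 1)), frozenset((1, 2))}
print(pivot_round(V, x, plus))
```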
Our high-level setup of the analysis follows [27, 28], which in turn are based on [4] and [21]. We consider a general budget for every edge. We shall define two budget functions y+ and y−; they determine the budget Δ_uv for the edge uv: if uv ∈ E+, then Δ_uv := y+(x_uv), and if uv ∈ E−, then Δ_uv := y−(x_uv).
We now focus on one iteration of the while loop in Algorithm 2. Suppose u, v, w ∈ V′ at the beginning of the iteration, and let S be the cluster constructed at the end. We use E_w to denote the event that w is chosen as the pivot. We say uv incurs a cost in the iteration if uv ∈ E+ and |S ∩ {u, v}| = 1, or uv ∈ E− and {u, v} ⊆ S. We show the following lemma in our full paper.
Lemma 7. Suppose that for every V′ ⊆ V, the expected cost incurred by each edge within one iteration of Algorithm 2 is at most the corresponding expected budget released. Then the expected cost of the clustering output by Algorithm 2 is at most Σ_{uv} Δ_uv.

To obtain an approximation ratio α ∈ [1, 2), we consider a variant of our algorithm in which we run the cluster-based rounding procedure (Algorithm 1) with probability α/2, and the pivot-based rounding procedure with threshold 1/3 (Algorithm 2) with the remaining probability 1 − α/2. Clearly, the actual algorithm, which picks the better of the two clusterings generated, can only do better. We set up the budget functions y+ and y− such that every edge pays a cost of at most α times its LP cost in expectation; that is, conditions (4) are satisfied for every x ∈ [0, 1]. This gives us the definitions (5) of the budget functions y+_α and y−_α.

Lemma 8. If the budget functions y+_α and y−_α satisfy (4) for some α ∈ [1, 2), then our algorithm has an approximation ratio of α.
Proof. Consider the variant of the algorithm where we run the cluster-based rounding procedure with probability α/2, and the pivot-based procedure with threshold 1/3 with the remaining probability 1 − α/2. By Lemma 7, the expected cost of the clustering given by the variant is at most α · obj(x). The actual algorithm we run can only be better than this variant. □

As a baseline, we provide a per-triangle analysis leading to an approximation ratio of 1.5 in the full paper:

Lemma 9. For the budget functions y+ ≡ y+_{1.5} and y− ≡ y−_{1.5}, we have cost(T) ≤ Δ(T) for every triangle T.
Clearly, the lemma implies that (4) holds for y+ ≡ y+_{1.5} and y− ≡ y−_{1.5}. By Lemma 8, our algorithm gives an approximation ratio of 1.5. We remark that 1.5 is the best possible ratio we can achieve using the per-triangle analysis. For a ++− triangle with length 1/2 for the +edges and length 1 for the −edge, we need to pay a factor of 2 for each of the length-1/2 +edges. The cluster-based rounding algorithm gives factors of 2 and 4/3 for +edges of lengths 0 and 1/2, respectively. For the pivot-based rounding algorithm, the factors are at least 0 and 2, respectively. A combination of the two algorithms can therefore only lead to a factor of 1.5.
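The balancing computation behind this 1.5 barrier can be replayed explicitly (illustrative arithmetic only): mixing the two procedures with probability p for the cluster-based one, a length-0 +edge pays factor 2p, while a length-1/2 +edge pays at least (4/3)p + 2(1 − p); equalizing the two gives p = 3/4 and factor 1.5:

```python
# Per-edge approximation factors in the mix: the cluster-based procedure gives
# 2/(1+x) for a +edge of length x (so 2 at x = 0 and 4/3 at x = 1/2), while the
# pivot-based lower bounds quoted above are 0 at x = 0 and 2 at x = 1/2.
def mixed_factor_at_0(p):
    return 2 * p + 0 * (1 - p)

def mixed_factor_at_half(p):
    return (4 / 3) * p + 2 * (1 - p)

p = 3 / 4   # the balancing point: 2p = (4/3)p + 2(1 - p)
print(mixed_factor_at_0(p), mixed_factor_at_half(p))  # both 1.5 up to rounding
```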
To get a better approximation ratio, we provide two analyses that use global distributions of triangles; the former is purely analytic, while the latter relies on solving a factor-revealing SDP. The corresponding two lemmas are proved in the full paper; combined with Lemma 8, they imply Theorems 2 and 3, respectively.

OVERVIEW OF TECHNIQUES
In this section, we provide overviews of the techniques used in our results.
Simpler and Better Preclustering Procedure. The concept of preclustering was introduced in [27]. In a preclustered instance, we predetermine the fate of some edges: for some edges uv, the vertices u and v must be in the same cluster; for some other edges uv, they must be separated. Since the relation of being in the same cluster is transitive, we define a preclustered instance using a pair (K, E_adm), where K is a partition of V into so-called atoms and E_adm is a set of admissible edges. An atom cannot be broken. If u and v are not in the same atom and uv ∉ E_adm, then u and v must be separated. [27] showed how to construct a preclustered instance (K, E_adm), losing only a (1 + ε) factor in the optimum cost, while at the same time guaranteeing that |E_adm| ≤ O(opt/ε^{12}). This is crucial for their correlated rounding algorithm, as it loses an additive error depending on |E_adm|. In this work, we still need the preclustering procedure to bound the rounding error, but now it sits inside the procedure for solving the cluster LP.
We greatly simplify the preclustering procedure from [27], and as a result, we achieve a much better bound of O(opt/ε²) on |E_adm|.
[27] used the agreement graph to construct the atoms; roughly speaking, two vertices are in agreement if their neighborhoods are similar. The analysis uses many technical structural lemmas from [25], which solves Correlation Clustering in the online setting. In contrast, our construction of atoms is simple: we compute an O(1)-approximate clustering C, mark the vertices whose costs are large, and obtain K from C by removing the marked vertices and creating singletons for them. The set of admissible edges is roughly defined as follows: we construct a graph (V, E_1) in which two vertices are neighbors if their +degrees are similar. Then an edge uv is admissible if u and v have many common neighbors in the graph (V, E+ ∩ E_1).
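The simplified atom construction described above can be sketched in a few lines; the input clustering, the per-vertex cost oracle, and the marking threshold are all assumed placeholders here, not the paper's exact definitions:

```python
def build_atoms(clustering, vertex_cost, threshold):
    """Sketch: start from an O(1)-approximate clustering, break off the vertices
    whose cost is large as singleton atoms, and keep the rest of each cluster
    as one atom. `vertex_cost` and `threshold` are hypothetical inputs."""
    atoms = []
    for cluster in clustering:
        marked = {v for v in cluster if vertex_cost(v) > threshold}
        kept = cluster - marked
        if kept:
            atoms.append(kept)           # the surviving part of the cluster
        atoms.extend({v} for v in marked)  # marked vertices become singletons
    return atoms

# Vertex 3 is expensive, so it is split off as a singleton atom.
print(build_atoms([{1, 2, 3}, {4, 5}], lambda v: 10 if v == 3 else 0, 5))
```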
Solving the Cluster LP by Preclustering. As mentioned, we move the complication of handling rounding errors to the step of solving the cluster LP. As in [27], we construct a preclustered instance (K, E_adm) and formulate an LP relaxation, which we call the bounded sub-cluster LP, aimed at finding a (1 + ε)-approximate good clustering for (K, E_adm). In contrast to [27], which solves many instances of this LP embedded in their round-or-cut framework, we only solve the LP once, thereby avoiding this heavy framework. With a solution to the LP, we run a procedure that constructs a single cluster S randomly. The probability that any fixed vertex is in S is precisely 1/N_∅, where N_∅ is the fractional number of clusters. The probabilities that exactly one of u and v is in S, and that both of them are in S, are governed by x_uv/N_∅ and (1 − x_uv)/N_∅, respectively, up to some error terms arising from the Raghavendra-Tan rounding procedure. As usual, x_uv is the extent to which u and v are separated.
To construct the solution z = (z_S)_{S⊆V} for the cluster LP, we generate N_∅·Δ clusters S independently, for a large enough polynomial Δ. Roughly speaking, the solution z is 1/Δ times the multi-set of clusters we generated. The error incurred by the Raghavendra-Tan rounding procedure can be bounded in terms of |E_adm|, and the error from sampling can be bounded using concentration bounds.
1.49-approximation. We start with the algorithm of [27], but make several key modifications, both in the design and in the analysis. This allows us to significantly improve the approximation ratio, first to 1.5 and eventually to 1.49, which shows that, perhaps surprisingly, even the rather low approximation factor of 1.5 is not tight for Correlation Clustering. The first key ingredient is a principled budget function for the pivot-based rounding procedure, defined earlier in (5), which is designed to optimally balance the approximation factor of edges between the two rounding procedures. This new budget function is better than the one used in [27], but does not allow us to reach 1.5 without changing the algorithm. Indeed, the budget for the short +edges in +++ triangles is still too low to reach the approximation ratio 1.5. Thus, the second key ingredient is to add the threshold step to the pivot-based rounding procedure for the short +edges (i.e., +edges uv with x_uv ≤ 1/3). By adding this threshold step, the cost of the triangles containing such edges decreases; for example, a +++ triangle with all short edges now has cost zero. This allows us to use the new budget function and still reach 1.5. Notice that making the threshold too large would result in too much cost for ++− triangles.
Finally, we observe that, analogously to the correlated rounding approach of [28], only the bad triangles are tight, meaning that their cost equals their budget. Roughly speaking, a bad triangle is a ++− triangle whose two +edges have values very close to one half and whose −edge has value close to one. This allows us to apply a charging argument in which tight triangles have part of their cost paid for by triangles that are not tight (i.e., that have extra budget). After the charging, there are no tight triangles (i.e., all triangles have some unused budget), and we can decrease the α in the budget function from 1.5 to 70/47 ≈ 1.489. As in previous work [4, 21, 27, 28], the analysis necessary to reach 1.5 and go below requires a case-by-case analysis of triangle types to ensure that the budget allocated to each triangle covers its cost. Both the new threshold step and the new budget functions result in an analysis that is more involved than the one required in [27], but it is still feasible.
1.437-approximation. The above charging argument between different types of triangles can be expressed more systematically by a factor-revealing SDP. Given a cluster LP solution z and vertices u, v, w, we define p_uv := Σ_{S⊇{u,v}} z_S (resp. p_uvw := Σ_{S⊇{u,v,w}} z_S) to be the probability that u, v (resp. u, v, w) end up in the same cluster. Given any quadruple q = (a, b, c, d) ∈ [0, 1]^4 and a cluster LP solution z, let n_q be the number of triangles (u, v, w) such that p_uv = a, p_uw = b, p_vw = c, and p_uvw = d. The above 1.49-approximation analysis can be regarded as putting one constraint on the distribution of n_q. To improve the approximation ratio and reduce the budget function, we opt for a more detailed categorization of triangles, imposing stronger constraints on n_q.
Consider an imaginary rounding procedure where, given a pivot u, the cluster S containing u is simply chosen with probability z_S (note that Σ_{S∋u} z_S = 1). Let X_v denote the indicator of the event that node v is included in the cluster of node u in this rounding. We can show that the covariance matrix of (X_v)_v must be positive semidefinite (PSD). This PSD constraint on the covariance matrix enforces a stronger constraint on n_q. For instance, if all non-degenerate triangles centered at u were ++− triangles with q-value (p_uv = 0.5, p_uw = 0.5, p_vw = 0, p_uvw = 0), then the covariance matrix of (X_v)_v could not be PSD, because Cov(X_v, X_w) = p_uvw − p_uv·p_uw = −0.25 for almost all off-diagonal entries.
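The infeasibility in this example can be exhibited with a single test vector: for three non-pivot nodes, the covariance matrix has 0.25 on the diagonal and −0.25 off the diagonal, and w = (1, 1, 1) gives wᵀCw = −0.75 < 0, whereas a PSD matrix would give a nonnegative value (illustrative check only):

```python
# Covariance matrix of the indicators (X_v, X_w, X_t) in the ++- example:
# Var[X_v] = 0.5 * (1 - 0.5) = 0.25, and Cov(X_v, X_w) = p_uvw - p_uv * p_uw = -0.25.
C = [[0.25, -0.25, -0.25],
     [-0.25, 0.25, -0.25],
     [-0.25, -0.25, 0.25]]
w = [1, 1, 1]
quad = sum(C[i][j] * w[i] * w[j] for i in range(3) for j in range(3))
print(quad)  # -0.75 < 0, so C is not PSD: this triangle profile is infeasible
```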
Despite there being infinitely many types of triangles in each range of (p_uv, p_uw, p_vw), our key observation is that p_uvw − p_uv·p_uw is multilinear. Therefore, we only need a few triangles in each range to represent all possible triangles. We mention that the triangles we need are fixed, and so can be precomputed; the only unknown variable is n_q. To compute a lower bound on Σ_T (Δ(T) − cost(T)), we set up a semidefinite program (SDP) under the constraint that the covariance matrix is PSD. This SDP is independent of the cluster LP and depends only on the chosen intervals and the budget function. By employing a practical SDP solver, we demonstrate that Σ_T (Δ(T) − cost(T)) ≥ 0.

Gaps and Hardness.
A high-level intuition for the cluster LP gap is the following: LPs cannot distinguish between a random graph and a nearly bipartite graph. For the cluster LP, given a complete graph H = (V_H, E_H) with n = |V_H|, our Correlation Clustering instance is G = (V_G, E_G) where V_G = E_H, and e, f ∈ V_G are joined by a +edge in G if they share a vertex in H. Consider the vertices of H as ideal clusters in G, each containing its incident edges. Fractionally, the LP will think that H is nearly bipartite, implying that the entire V_G can be partitioned into n/2 ideal clusters of the same size. Of course, integrally, such a partition is not possible in a complete graph.
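This construction is easy to carry out explicitly. The sketch below (our illustration) builds G from H = K_6, puts weight 1/2 on each "star" cluster, and confirms that this fractional solution is feasible and pays exactly half of the +pairs:

```python
from itertools import combinations

n = 6
VG = [frozenset(e) for e in combinations(range(n), 2)]      # vertices of G = edges of K_n
plus = {frozenset((e, f)) for e, f in combinations(VG, 2) if e & f}

# Fractional cluster LP solution: weight 1/2 on each star S_v (edges incident to v).
z = {frozenset(e for e in VG if v in e): 0.5 for v in range(n)}

# Constraint (1): each vertex e = {u, v} of G lies in exactly two stars, S_u and S_v.
assert all(abs(sum(w for S, w in z.items() if e in S) - 1) < 1e-9 for e in VG)

def obj(z):
    total = 0.0
    for e, f in combinations(VG, 2):
        x = 1 - sum(w for S, w in z.items() if e in S and f in S)
        total += x if frozenset((e, f)) in plus else 1 - x
    return total

# Each +pair shares exactly one vertex of K_n, so x = 1/2 there, while disjoint
# pairs have x = 1 and pay nothing: the LP pays |E+|/2 in total.
print(obj(z), len(plus) / 2)  # 30.0 30.0
```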
For the cluster LP, it suffices to consider a complete graph instead of a random graph. We believe (but do not prove) that such a gap instance can be extended to stronger LPs (e.g., Sherali-Adams strengthenings of the cluster LP), because it is known that Sherali-Adams cannot distinguish between a random graph and a nearly bipartite graph [19].
The idea for the NP-hardness of approximation is the same.The main difference, which results in a worse factor here, is that other polynomial-time algorithms (e.g., SDPs) can distinguish between random and nearly bipartite graphs!So, we are forced to work with slightly more involved structures.
Still, we use a similar construction for 3-uniform hypergraphs: let H = (V_H, E_H) be the underlying 3-uniform hypergraph and G = (V_G, E_G) be the plus graph of the final Correlation Clustering instance, where V_G = E_H and e, f ∈ V_G are joined by an edge in G if they share a vertex in H. We use the hardness result of Cohen-Addad, Karthik, and Lee [26], which shows that it is hard to distinguish whether H is nearly bipartite, which implies that half of the vertices intersect every hyperedge, or close to a random hypergraph.
Organization. We show how to solve the cluster LP in Section 4, proving Theorem 1. We give the (4/3 − ε)-integrality gap of the cluster LP (Theorem 4) in Section 5, and the improved hardness of 24/23 − ε (Theorem 5) in Section 6.
Global Notations. For two sets A and B, we use A△B = (A \ B) ∪ (B \ A) to denote their symmetric difference. We use N+(u) and N−(u) to denote the sets of + and −neighbors, respectively, of a vertex u in the Correlation Clustering instance. For a clustering C of V, we define obj(C) to be the objective value of C. For any x ∈ [0, 1] indexed by the vertex pairs, we have already defined obj(x) = Σ_{uv∈E+} x_uv + Σ_{uv∈E−} (1 − x_uv). Recall that we defined cost_T(u, v), Δ_T(u, v), cost(T), and Δ(T) for a triangle T = (u, v, w) or a degenerate triangle T = (u, v) in Section 2; they depend on the budget functions y+ and y−.

SOLVING CLUSTER LP RELAXATION APPROXIMATELY
In this section, we show how to solve the cluster LP in polynomial time, proving Theorem 1, which is restated below.
Theorem 1. Let ε > 0 be a small enough constant and opt be the cost of the optimum solution to the given Correlation Clustering instance. In time n^{poly(1/ε)}, we can output a feasible cluster LP solution with obj(x) ≤ (1 + ε)opt, described using a list of non-zero coordinates.

Preclustering
We use the definition of a preclustered instance from [27], with some minor modifications.
Definition 12. Given a Correlation Clustering instance (V, E+ ⊎ E−), a preclustered instance is defined by a pair (K, E_adm), where K is a partition of V (which can also be viewed as a clustering), and E_adm is a set of vertex pairs such that for every uv ∈ E_adm, the vertices u and v are not in the same set of K.
Each set K ∈ K is called an atom. An (unordered) pair uv between two vertices u and v in the same K ∈ K is called an atomic edge; in particular, a self-loop uu is an atomic edge. A pair that is neither an atomic nor an admissible edge is called a non-admissible edge.
There are two minor differences between our definition and the one in [27]. First, we require that K form a partition; this can be guaranteed by adding singletons. Second, we do not require an edge between two different non-singleton atoms to be non-admissible. Our construction can guarantee this condition, but it is not essential.

Definition 13. Given a preclustered instance (K, E_adm) for a Correlation Clustering instance (V, E+ ⊎ E−), a clustering C of V is called good with respect to (K, E_adm) if
• u and v are in the same cluster of C for every atomic edge uv, and
• u and v are not in the same cluster of C for every non-admissible edge uv.
The following theorem, with a worse bound on |E_adm|, was proved in [27]. We give a cleaner proof of the theorem in the full paper; as a byproduct, it achieves a better bound on |E_adm|.

Theorem 14. For any sufficiently small ε > 0, there exists a poly(n, 1/ε)-time algorithm that, given a Correlation Clustering instance (V, E+ ⊎ E−) with optimal value opt (which is not given to us), produces a preclustered instance (K, E_adm) such that
• there exists a good clustering w.r.t. (K, E_adm) whose cost is at most (1 + ε)opt, and
• |E_adm| ≤ O(1/ε²)·opt.

We can assume that in the preclustered instance (K, E_adm), the edges between two different atoms K and K′ are either all admissible or all non-admissible: if one edge between them is non-admissible, we can change all the other edges to non-admissible edges. This does not change the set of good clusterings, and it only decreases |E_adm|.
We apply Theorem 14 to obtain a preclustered instance (K, E_adm), with an unknown good clustering C*_1. We define K_u to be the atom that contains u, and n_u = |K_u|. We use E_adm(u) to denote the set of vertices v such that uv ∈ E_adm; note that E_adm(u) = E_adm(v) if v ∈ K_u. We further process the good clustering C*_1 using the following procedure from [27]. This procedure is not a part of our algorithm; it is used only for the analysis.
[Procedure, excerpt: while there exists some atom K_u in a cluster C ∈ C*_1 violating the size condition below, split C into K_u and C \ K_u.]

Proof. Whenever we break a cluster C into K_u and C \ K_u in the procedure, the cost increase is bounded in terms of ε_1 and |E_adm(u)|. We separate each atom K_u at most once, so the total cost increase is at most O(ε_1)|E_adm|. Hence, the cost of C*_1 after the procedure will be at most (1 + ε)opt + O(ε_1)|E_adm|. Crucially, the following property is satisfied:

(A1) For every u ∈ V, the atom K_u is either itself a cluster in C*_1, or contained in a cluster of size more than n_u + ε_1·|E_adm(u)|.

Bounded Sub-Cluster LP Relaxation for Preclustered Instances
Following [27], we formulate an LP relaxation aimed at finding the good clustering C*_1. In the LP, we have a variable for every size i ∈ [n] and every set T ⊆ V of size at most k (recall that k = poly(1/ε)), subject to constraints (6)-(14), which we summarize as follows. (6) gives the definition of x_uv, (7) requires every vertex u to be contained in some cluster, and (8) gives the definition of the size variables. (9) says that if a set is chosen as a cluster of size i, then it contains exactly i elements (the case T = ∅ is an exception, but the equality also holds there). (10) is the non-negativity constraint. (11) and (12), which in particular set x_uv = 1 for every non-admissible edge uv, follow from the fact that C*_1 is a good clustering, and (13) follows from (A1). The left-hand side of (14) counts the clusters of size i that contain u but do not contain any vertex of T, so the inequality holds; these are the Sherali-Adams-style constraints needed for the correlated rounding [41] (see Lemma 16). The running time for solving the LP is n^{O(k)} = n^{O(1/ε^{12})}.

Sampling One Cluster Using LP Solution to the Bounded Sub-Cluster LP
We solve the bounded sub-cluster LP to obtain a fractional solution. Given this solution, we can use the procedure construct-cluster described in Algorithm 3, which is from [27], to produce a random cluster S.
As in [27], we define err_{i|u} to be the error generated by the procedure when we choose i as the cardinality and u as the pivot. Notice that all these quantities are expectations of random variables, and thus deterministic.
Lemma 18 ([27]). Focus on an edge uv. [The lemma bounds the probabilities that the procedure places exactly one, or both, of u and v into S in terms of x_uv and the error terms err; its precise statement is in the full paper.]

A lemma similar to the following one is proved in [27]. The parameters we use here are slightly different, and we provide a proof for completeness.

Lemma 19.
Proof. Throughout the proof, we assume u, v, p are all in V, and uv ranges over pairs of distinct vertices. Fix some i ∈ [n] and p ∈ V with y^i_{K_p} > 0, and we now bound E[err^i_{uv|p}]. If i = |K_p|, then S = K_p; no errors will be created and the quantity is 0. So assume i > |K_p|; by (13), this forces i > |K_p| + ε₁·|N_adm(p)|. The first equality is by (9) and the fact that y^i_{K_p ∪ {v}} = y^i_{K_p} for every v ∈ K_p. (To see this, notice that y^i_{K_p ∪ {v}} ≤ y^i_{K_p} is implied by (14), while the reverse inequality holds since v ∈ K_p.) Considering the inequalities over all p ∈ V yields the penultimate bound; the same inequality holds with v in place of u. Finally, we take all cardinalities i into consideration to bound the total error.
To see the last inequality, we notice that the event S ∩ {u, v} ≠ ∅ is the union of the 3 disjoint events: u ∈ S and v ∉ S; u ∉ S and v ∈ S; and {u, v} ⊆ S. Applying Lemma 18 to each of these events proves the lemma. □

Construction of Solution to the Cluster LP Using Independently Sampled Clusters
With all the ingredients, we can now describe our algorithm for solving the cluster LP approximately, finishing the proof of Theorem 1. We set Δ∅ to be a large enough constant multiple of |E_adm|, with Δ∅ being an integer. (We assume |E_adm| ≥ 1, since otherwise the preclustered instance is trivial.) We run Algorithm 3 Δ∅ times independently to obtain clusters S_1, …, S_{Δ∅}. We use the following variant of the Chernoff bound.

Let X_1, …, X_Δ be independent random variables taking values in [0, 1], let X = Σ_j X_j, and let μ′ ≥ E[X] be a real. Then for any δ ∈ (0, 1), we have Pr[|X − E[X]| ≥ δμ′] ≤ 2 exp(−δ²μ′/3). Using this Chernoff bound and a union bound, we can prove that with probability at least 1 − 1/n, the following conditions hold.
• For every u ∈ V, the set I_u := {j : u ∈ S_j} has size close to its expectation.
• For every uv ∈ E⁻, the overlap |I_u ∩ I_v| is close to its expectation.
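The role of the Chernoff bound here, concentration of counts over many independent runs, can be illustrated with a generic Monte Carlo sketch (the function `deviation_rate` and its parameters are our illustration, not from the paper):

```python
import random

def deviation_rate(p, delta, trials, seed=0):
    """Fraction of trials in which the empirical mean of `delta`
    independent Bernoulli(p) samples deviates from p by more than 10%."""
    rng = random.Random(seed)
    bad = 0
    for _ in range(trials):
        mean = sum(rng.random() < p for _ in range(delta)) / delta
        if abs(mean - p) > 0.1 * p:
            bad += 1
    return bad / trials

few = deviation_rate(0.5, delta=20, trials=200)     # few samples: deviates often
many = deviation_rate(0.5, delta=2000, trials=200)  # many samples: rarely deviates
```

With Δ independent copies, the probability of a large relative deviation decays exponentially in Δ, which is what makes the union bound over all u ∈ V and all minus edges affordable.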
From now on we assume the conditions hold. For every u ∈ V, we let I′_u be the set of the ⌈(1 − ε)Δ⌉ smallest indices in I_u. Clearly, |I′_u| ≤ |I_u|.

Proof. For convenience, we use B to denote the upper bound. We think of I′_u (resp. I′_v) as obtained from the set I_u (resp. I_v) by removing the largest indices one by one. Wlog we assume |I_u| ≥ |I_v|. We remove the elements from I_u and I_v in two stages. In the first stage, while |I_u| > |I_v|, we remove the largest index from I_u; this cannot increase |I_u \ I_v|. In the second stage, we remove the largest index from both sets until they reach their target sizes. For a plus edge uv ∈ E⁺ and for a minus edge uv ∈ E⁻, the resulting contributions are bounded as claimed; the second inequality is due to Lemma 19, and the third one uses that |E_adm| is at most a constant (depending on ε₁ and ε₂) times opt. By scaling ε, the upper bound can be made (1 + ε)opt. This finishes the proof of Theorem 1.

1.33-GAP FOR CLUSTER LP
In this section, we show that the cluster LP has an integrality gap of 4/3, proving Theorem 4, restated below.

Theorem 4. For any ε > 0, the integrality gap of the cluster LP is at least 4/3 − ε.
The graph of the plus edges of our gap instance is the line graph of a base graph: given a base graph G = (V_G, E_G), our correlation clustering instance is H = (V_H, E_H), where V_H = E_G, and e, f ∈ V_H have a plus edge in H if they share a vertex in V_G.
A high-level intuition is the following: LPs cannot distinguish between a random graph and a nearly bipartite graph. Consider vertices of G as ideal clusters in H containing their incident edges. Given a random graph G, the LP fractionally will think that it is nearly bipartite, implying that almost the entire V_H can be partitioned into n/2 ideal clusters. Of course, integrally, such a partition is not possible in random graphs. For the cluster LP, it suffices to consider a complete graph instead of a random graph. We believe (but do not prove) that such a gap instance can be extended to stronger LPs (e.g., the Sherali-Adams strengthening of the cluster LP), because it is known that Sherali-Adams cannot distinguish a random graph from a nearly bipartite graph [19].
Proof of Theorem 4. Let G = (V_G, E_G) be a complete graph on n vertices, and let d = n − 1 be the degree of G. Our correlation clustering instance H = (V_H, E_H) is the line graph of G: V_H = E_G, and e, f ∈ V_H have a + edge in H if and only if they share a vertex in V_G. The + degree of each e ∈ V_H in H is 2d − 2.
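As a concrete sanity check (our illustration, not part of the proof), the instance H can be built directly and its plus degrees verified:

```python
from itertools import combinations

def line_graph_instance(n):
    """Build the correlation clustering instance H from K_n: vertices of H
    are the edges of K_n, and two of them are joined by a plus edge iff
    the corresponding edges of K_n share an endpoint."""
    verts = list(combinations(range(n), 2))
    plus = [(e, f) for e, f in combinations(verts, 2) if set(e) & set(f)]
    return verts, plus

verts, plus = line_graph_instance(6)
deg = {v: 0 for v in verts}
for e, f in plus:
    deg[e] += 1
    deg[f] += 1
# Every vertex of H has plus degree 2d - 2 = 2n - 4.
```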
Consider the following solution for the cluster LP: for every v ∈ V_G, let E_v ⊆ E_G be the d edges containing v. The cluster LP solution has y_{E_v} = 1/2 for every v ∈ V_G. Each e ∈ V_H belongs to two fractional clusters, each of which contains d − 1 of its plus neighbors, so fractionally d − 1 plus edges incident on it are violated. Since each violated edge is counted twice, the LP value is |V_H|(d − 1)/2 = nd(d − 1)/4. Let us now consider the integral optimal correlation clustering of H, and a cluster C in it. Note that every vertex in C has at least |C|/2 plus neighbors in C (otherwise making it a singleton improves the solution), which implies |C| ≤ 4d. We apply the following procedure to C to partition it further. Claim 22. There is a partition of C into C_1, …, C_t such that (1) each C_j is a subset of E_v for some v ∈ V_G, and (2) replacing C by C_1, …, C_t in the correlation clustering solution increases the objective function by at most 35|C|.
Proof. Let v_1, …, v_8 contain every vertex v ∈ V_G with |C ∩ E_v| ≥ |C|/4; since Σ_v |C ∩ E_v| = 2|C|, there are at most 8 such vertices. An edge e = (u, w) ∈ C with |C ∩ E_u|, |C ∩ E_w| < |C|/4 would have fewer than |C|/2 plus neighbors in C, so it should not exist in C. So, every edge is incident on v_j for some j ≤ 8.
Let us make the at most (8 choose 2) = 28 edges in C between v_1, …, v_8 singleton clusters; the objective function increases by at most 28|C|. Then partition the remaining C into C_1, …, C_8, where C_j := C ∩ E_{v_j} (assigning each edge to the smallest such j). Each e ∈ C_j has at most seven plus neighbors in ∪_{j′≠j} C_{j′}, so the objective function increases by at most 7|C|. So, we partitioned C into C_1, …, C_t where all the edges in each C_j share a common endpoint, and we increased the objective function by at most 35|C|. □ After we apply the above procedure to every cluster C, we have increased the cost by at most 35|V_H| ≤ 35n², and all the edges in a cluster share a common endpoint. For v ∈ V_G, let C_v be the cluster in the solution whose common endpoint is v. (If there are many of them, merging them will strictly improve the objective function value.) Without loss of generality, there are k such clusters C_{v_1}, …, C_{v_k}; let c_j := |C_{v_j}|.

Claim. Σ_{j∈[k]} c_j²/2 ≤ n³/6.

Proof. The LHS is monotone in (c_1, …, c_k), and if there is an edge (v_i, v_j) ∈ C_{v_j} with j > i (which implies c_i ≥ c_j), the LHS strictly improves by moving (v_i, v_j) to C_{v_i}. Therefore, the configuration that maximizes the LHS is when k = n and C_{v_j} contains all the edges of E_{v_j} not incident on v_1, …, v_{j−1}. In that case, the LHS is Σ_j (n − j)²/2 ≤ n³/6, as desired. □ Using this, we can prove a lower bound on the cost of our near-optimal clustering. Note that every cluster is a clique of +edges. Thus, the only edges violated are +edges. Moreover, there are at most Σ_{j∈[k]} c_j²/2 ≤ n³/6 correctly clustered +edges. The cost of our near-optimal clustering is the total number of +edges of H minus the number of correctly clustered +edges, namely at least nd(d − 1)/2 − n³/6 = n³/3 − o(n³). Since the cost of the optimal clustering is at most 35n² lower than ours, it is still n³/3 − o(n³). The fractional solution has value at most n³/4, so the gap is at least 4/3 − o(1). □
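The counting in this proof can be replayed numerically. The sketch below (our illustration; `gap_ratio` is a hypothetical name) uses the extremal cluster sizes c_j = n − j from the claim's maximizing configuration:

```python
def gap_ratio(n):
    """Ratio (integral cost lower bound) / (fractional LP value) for the
    line graph of K_n, following the counting in the proof."""
    total_plus = n * (n - 1) * (n - 2) // 2             # n * C(d, 2), d = n - 1
    correct = sum((n - j) * (n - j - 1) // 2 for j in range(1, n + 1))
    integral_cost = total_plus - correct                # about n^3/3
    fractional = n * (n - 1) * (n - 2) / 4              # value of y_{E_v} = 1/2
    return integral_cost / fractional
```

Under this counting the ratio equals 4/3 for every n ≥ 3, matching the claimed gap.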

1.04 NP-HARDNESS
In this section, we show that it is NP-hard (under randomized reductions) to obtain an algorithm with an approximation ratio of 24/23 > 1.043, proving Theorem 5, restated below. The idea is similar to the gap for the cluster LP in Section 5, which is based on the fact that LPs generally cannot distinguish nearly bipartite graphs from random graphs. The main difference, which results in a worse factor here, is that other polynomial-time algorithms (e.g., SDPs) can distinguish between them! So, we are forced to work with slightly more involved structures.
Still, we use a similar construction for 3-uniform hypergraphs; let G = (V_G, E_G) be the underlying 3-uniform hypergraph and H = (V_H, E_H) be the plus graph of the final Correlation Clustering instance, where V_H = E_G and e, f ∈ V_H have an edge in H if they share a vertex in V_G. We use the following hardness result of Cohen-Addad, Karthik, and Lee [26], which shows that it is hard to distinguish whether G is nearly bipartite or close to a random hypergraph. Theorem 24. For any ε > 0, there exists a randomized polynomial-time algorithm that receives a 3-CNF formula φ as input and outputs a simple 3-uniform hypergraph G = (V_G, E_G) where the degree of each vertex is (1 ± o(1))d for some d = poly(|V_G|), such that the following properties are satisfied with high probability. Proof. The same reduction as in Theorem 4.1 of (the arXiv version of) [26] yields the desired hardness. In the following, we highlight the differences between the statement of Theorem 4.1 of [26] and our Theorem 24 and briefly explain how our additional properties are satisfied by their reduction. We analyze the expected cost of this clustering. For each e ∈ V_H, let s(e) be (the number of plus neighbors in the same cluster) minus (the number of minus neighbors in the same cluster). Intuitively, it is the amount of saved cost between e and its neighbors, compared to the situation where e is a singleton cluster. Then, the cost of our clustering is the total number of plus edges of H, namely (1 ± o(1))nd²/2, minus half the total saving Σ_e s(e). Therefore, the total saving is at least nd²(7/30 − O(ε)) and the final cost is at most nd²(1/2 − 7/60 + O(ε)) = nd²(23/60 + O(ε)).
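The accounting identity used above, cost = (#plus edges) − ½·Σ_e s(e), can be checked on a toy instance (our illustration; the instance and the function name are hypothetical):

```python
from itertools import combinations

def cc_cost_and_saving(V, plus, clustering):
    """Compute the correlation clustering cost and the total saving
    sum_e s(e), where s(e) = (plus neighbors of e in its own cluster)
    minus (minus neighbors of e in its own cluster)."""
    cluster_of = {v: i for i, C in enumerate(clustering) for v in C}
    cost = 0
    saving = {v: 0 for v in V}
    for u, v in combinations(V, 2):
        same = cluster_of[u] == cluster_of[v]
        is_plus = frozenset((u, v)) in plus
        if is_plus != same:          # plus edge cut, or minus edge inside
            cost += 1
        if same:
            delta = 1 if is_plus else -1
            saving[u] += delta
            saving[v] += delta
    return cost, sum(saving.values())

V = list(range(5))
plus = {frozenset(p) for p in [(0, 1), (1, 2), (0, 2), (3, 4), (2, 3)]}
cost, total_saving = cc_cost_and_saving(V, plus, [{0, 1, 2}, {3, 4}])
```

Here the only violated edge is the cut plus edge (2, 3), so the cost is 1, while the total saving is 8, and indeed 5 − 8/2 = 1.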
NO case. Our analysis will be similar to that of the gap instance, made slightly more complicated by the fact that we are working with a non-complete hypergraph. Consider the optimal correlation clustering and consider one cluster C. Each e ∈ C has at most (3 + o(1))d plus edges in H and at least |C|/2 plus neighbors within C (otherwise, it is better to make e a singleton cluster), so |C| ≤ (6 + o(1))d. We prove that if C is large, then we can partition C into smaller clusters, where each cluster consists of hyperedges sharing the same vertex in V_G. For v ∈ V_G, let E_v ⊆ E_G be the set of hyperedges containing v.
Then one sees that the optimal f satisfies either f(x) = 1 or ∫₀ˣ f(t) dt = x − x² + x³/3 for every x ∈ [0, 1). If it is not satisfied at some x, we can increase f(x) while decreasing f(y) for some y > x, which will still satisfy the constraints and increase the objective. Using this, we can prove a lower bound on the cost of our near-optimal clustering. Note that every cluster is a clique of +edges. Thus, the only edges violated are +edges. Moreover, there are at most Σ_{j∈[k]} c_j²/2 ≤ nd²(0.1 + O(√ε)) correctly clustered +edges. The cost of our near-optimal clustering is the total number of +edges of H minus the number of correctly clustered +edges, namely at least nd²(1/2 − 0.1 − O(√ε)) = nd²(0.4 − O(√ε)). Since the cost of the optimal clustering is at most O(nd) lower than ours, it is still nd²(0.4 − O(√ε)), using d = ω(1). Since the value in the YES case is at most nd²(23/60 + O(ε)), the gap is almost (2/5)/(23/60) = 24/23 > 1.043.
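Reading f(t) = (1 − t)² as the extremal profile suggested by the NO-case guarantee (a set of δn vertices misses a (1 − δ)³ − ε fraction of hyperedges; this identification is our inference, not stated explicitly above), the integral condition can be verified directly:

```latex
\int_0^x (1-t)^2 \, dt
  \;=\; \Bigl[\, t - t^2 + \tfrac{t^3}{3} \,\Bigr]_0^x
  \;=\; x - x^2 + \frac{x^3}{3}.
```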

(1) Regularity of G: Section 4.5 of [26], based on an earlier weighted hard instance, constructs the final hard instance G = (V_G, E_G) as a certain random hypergraph where the degree of each vertex is the sum of independent {0, 1} variables with the same expected value. This expected value is Θ(|V_G|^{1.5}), so the standard Chernoff and union bound argument shows that the degree of each vertex is almost the same with high probability. (2) In the (YES) case, for every v ∈ T, |{e ∈ E_G : e ∩ T = {v}}| ≥ (1/2 − ε)d: It follows from their construction in Section 4.1. The construction is analogous to Håstad's celebrated result on Max-3SAT [34], where in the (YES) case, almost three quarters of the clauses have one true literal and almost one quarter have three true literals, so that for each true literal ℓ, roughly half of the clauses containing ℓ have it as the only true literal. (3) In the (NO) case, the guarantee holds for any value of δ ∈ [0, 1] instead of just 0.5: One can simply change 1/2 to 1 − δ in the proof of Lemma 4.4 in Section 4.3. It is analogous to the fact that all nontrivial Fourier coefficients vanish in Håstad's result on Max-3SAT and Max-3LIN [34]. □

Given such G = (V_G, E_G), let n := |V_G|. Our correlation clustering instance H = (V_H, E_H) is the line graph of G: V_H = E_G, and e, f ∈ V_H have a plus edge in H if they share a vertex in V_G. This means that every e ∈ V_H has (3 ± o(1))d plus edges incident on it; we used the fact that d = ω(1) and that e has at most o(d) other hyperedges intersecting it in at least two points (which would cause double counting). YES case. Consider T ⊆ V_G guaranteed by Theorem 24. Our (randomized) clustering is the following: randomly permute the vertices of T to obtain T = {t_1, …, t_{n/2}}, and let C_j := {e ∈ E_G : t_j ∈ e and e ∩ {t_1, …, t_{j−1}} = ∅}. Since T intersects every e ∈ E_G, (C_1, …, C_{n/2}) forms a partition of V_H.
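The YES-case clustering by random permutation can be sketched in code (our illustration on a toy hypergraph; names are hypothetical):

```python
import random

def permutation_clustering(hyperedges, T, seed=0):
    """Randomly order the transversal T and put each hyperedge into the
    cluster of the first ordered vertex of T it contains; hyperedges
    missing T entirely (none in the YES-case analysis) become singletons."""
    order = list(T)
    random.Random(seed).shuffle(order)
    rank = {t: i for i, t in enumerate(order)}
    clusters = {}
    for e in hyperedges:
        hits = [rank[v] for v in e if v in rank]
        key = order[min(hits)] if hits else ('singleton', e)
        clusters.setdefault(key, []).append(e)
    return clusters

# Toy 3-uniform hypergraph on {0,...,5}; T = [0, 1, 2] meets every hyperedge.
H = [(0, 3, 4), (1, 3, 5), (2, 4, 5), (0, 1, 5)]
C = permutation_clustering(H, [0, 1, 2])
```

The output is always a partition of the hyperedges, with each cluster keyed by a vertex of T, matching the definition of C_1, …, C_{n/2} above.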
The variable y^i_S denotes the number of clusters in C*_1 of size i containing S as a subset. When S ≠ ∅, there is at most one such cluster, and thus y^i_S ∈ {0, 1} indicates whether S is a subset of a cluster of size i in C*_1. For every S ⊆ V of size at most r, let z_S := Σ_i y^i_S denote the number of clusters (of any size) in C*_1 containing S as a subset. Again, if S ≠ ∅, then z_S ∈ {0, 1} indicates whether S is a subset of a cluster in C*_1. For every pair uv of distinct vertices, we have a variable x_uv indicating whether u and v are separated in C*_1. We call the LP the bounded sub-cluster LP relaxation, as we have variables indicating whether a small set S is a subset of a cluster or not. By default, any variable of the form z_S or y^i_S has |S| ≤ r; if not, we do not have the variable, nor the constraints involving it.
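As a sanity check on these definitions, the following sketch (our illustration, not from the paper) builds the integral y and z values induced by a fixed clustering; note that z_∅ counts the total number of clusters, matching the S = ∅ exception noted above:

```python
from itertools import combinations

def lp_point(clustering, r):
    """Integral solution of the bounded sub-cluster LP induced by a list
    of disjoint clusters: y[(i, S)] = 1 iff S is contained in a cluster
    of size i (for |S| <= r), and z[S] = sum_i y[(i, S)]."""
    y = {}
    for C in clustering:
        i = len(C)
        for k in range(min(r, i) + 1):
            for S in combinations(sorted(C), k):
                y[(i, frozenset(S))] = 1
    z = {}
    for (i, S), val in y.items():
        z[S] = z.get(S, 0) + val
    return y, z

clusters = [{0, 1, 2}, {3, 4}]
y, z = lp_point(clusters, r=2)
```

For nonempty S the values are 0/1 indicators, while z for the empty set equals the number of clusters, exactly as in the definitions above.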
We use the following type of shorthand: z_u for z_{{u}}, z_uv for z_{{u,v}}, and y^i_{Sv} for y^i_{S∪{v}}. The bounded sub-cluster LP is defined as follows. In the description, we always have i ∈ [n], u ∈ V, and uv a pair of distinct vertices; for convenience, we omit these restrictions.

Lemma 16 ([41]). In Step 4 of Algorithm 3, one can sample a set S ⊆ V that does not break atoms in time n^{O(r)}, such that the marginals match the LP values and the pairwise correlation errors are small.

1: randomly choose a cardinality i (with the probability given by the LP solution)
2: randomly choose a pivot vertex p ∈ V (with the probability given by the LP solution)

We have that i > |K_p| + ε₁·|N_adm(p)|, since otherwise we would have y^i_{K_p} = 0. By the second property of Lemma 16, we have E[err^i_{uv|p}] ≤ ε_rt²·|N_adm(p)|². (Notice that if one of u and v is not in N_adm(p), then err^i_{uv|p} = 0.) Recall that ε_rt = ε₁². Therefore, summing over all pivots and cardinalities gives the claimed bound.

(NO) If φ is unsatisfiable, any set of δ|V_G| vertices (δ ∈ [0, 1]) does not intersect at least a (1 − δ)³ − ε fraction of hyperedges in E_G.