Cluster Editing Parameterized above Modification-disjoint P3-packings

Given a graph G = (V,E) and an integer k, the Cluster Editing problem asks whether we can transform G into a union of vertex-disjoint cliques by at most k modifications (edge deletions or insertions). In this paper, we study the following variant of Cluster Editing. We are given a graph G = (V,E), a packing ℋ of modification-disjoint induced P3s (no pair of P3s in ℋ share an edge or non-edge) and an integer ℓ. The task is to decide whether G can be transformed into a union of vertex-disjoint cliques by at most ℓ + |ℋ| modifications (edge deletions or insertions). We show that this problem is NP-hard even when ℓ = 0 (in which case the problem asks to turn G into a disjoint union of cliques by performing exactly one edge deletion or insertion per element of ℋ) and when each vertex is in at most 23 P3s of the packing. This answers negatively a question of van Bevern, Froese, and Komusiewicz (CSR 2016, ToCS 2018), repeated by C. Komusiewicz at Shonan meeting no. 144 in March 2019. We then initiate the study to find the largest integer c such that the problem remains tractable when restricting to packings such that each vertex is in at most c packed P3s. Here packed P3s are those belonging to the packing ℋ. Van Bevern et al. showed that the case c = 1 is fixed-parameter tractable with respect to ℓ and we show that the case c = 2 is solvable in $|V|^{2\ell + O(1)}$ time.


Introduction
Correlation Clustering is a well-known problem motivated by research in computational biology [9] and machine learning [8]. In this problem we aim to partition data points into groups or clusters according to their pairwise similarity. It has been intensively studied in the literature; see, for example, [2,4,5,8,9,18].
In this paper, we study Correlation Clustering from a graph-based point of view, resulting in the following problem formulation. A graph H is called a cluster graph if H is a union of vertex-disjoint cliques; we also call these cliques clusters. Given a graph G = (V, E), in the optimization version of Cluster Editing we ask for a minimum-size cluster-editing set S, that is, a set $S \subseteq \binom{V}{2}$ of vertex pairs such that G△S := (V, E△S) is a cluster graph. Here E△S is the symmetric difference of E and S, that is, E△S = (E \ S) ∪ (S \ E). We also sometimes refer to vertex pairs as edits. Cluster Editing is NP-hard [48]. Constant-ratio approximation algorithms have been found for the optimization variant [2,8,18] but it is also APX-hard [18]. We focus here on exact algorithms and the decision version of Cluster Editing.
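To make the definitions concrete, the following minimal Python sketch applies a set of edits to a graph and tests whether the result is a cluster graph, using the standard characterization that a graph is a cluster graph exactly when it contains no induced path on three vertices. The function names `apply_edits` and `is_cluster_graph` are illustrative choices and do not come from the paper.

```python
from itertools import combinations


def apply_edits(edges, edits):
    """Return the edge set E △ S as a set of frozensets (unordered pairs)."""
    def normalize(pairs):
        return {frozenset(pair) for pair in pairs}
    return normalize(edges) ^ normalize(edits)


def is_cluster_graph(vertices, edges):
    """Check that no vertex has two non-adjacent neighbors (no induced P3)."""
    adjacency = {v: set() for v in vertices}
    for edge in edges:
        u, v = tuple(edge)
        adjacency[u].add(v)
        adjacency[v].add(u)
    for center in vertices:
        for x, z in combinations(adjacency[center], 2):
            if z not in adjacency[x]:
                return False  # x - center - z is an induced P3
    return True
```

For a single path a, b, c, each of the three possible edits (deleting ab, deleting bc, or inserting ac) yields a cluster graph, which is why an induced P3 forces at least one edit.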
Given a natural number k and a graph G = (V, E), the decision version of Cluster Editing asks whether there exists a cluster-editing set S such that |S| ≤ k. Exact parameterized algorithms for Cluster Editing and some of its variants have been extensively studied [32,10,47,22,15,35,13,14,24,34,11,39,28,45,16,1,7,6,26]. Cluster Editing is but one of a large group of edge modification problems that [...]
In other words, given a graph G and a packing ℋ of modification-disjoint P3s in G, it is NP-hard to decide if one can delete or insert exactly one edge per element of ℋ to obtain a cluster graph. Proving Theorem 1 was surprisingly nontrivial. A straightforward approach would be to amend the known reductions [39,27] that show NP-hardness for constant maximum vertex degree by specifying a suitable packing of P3s. However, an argument based on the linear-programming relaxation of packing modification-disjoint P3s shows that the graphs produced by these reductions do not admit tight P3 packing bounds. We did not find a way around this issue and thus developed a novel reduction based on new gadgets.
The verdict spelt by Theorem 1 is unfortunately quite damning. It indicates that even just reaching the lower bound given by a modification-disjoint P3 packing already captures the algorithmic hardness of the problem. However, there may be a way out of this conundrum: Call a modification-disjoint P3 packing 1/c-integral if each vertex is in at most c packed P3s (and say integral in place of 1-integral and half-integral in place of 1/2-integral). The case c = 1 is just the case of vertex-disjoint packings, for which van Bevern et al. [49] showed that Cluster Editing parameterized by the excess over integral P3 packings is fixed-parameter tractable. Thus it becomes an intriguing question to find the largest c < 23 such that CEaMP remains tractable with respect to the excess over 1/c-integral packings. We provide progress towards answering this question here. The problem Cluster Editing above Half-Integral Modification-Disjoint P3 Packing (CEaHMP) is defined in the same way as CEaMP except that the input packing ℋ is half-integral. It turns out that the complexity of the problem indeed drops when making the packing half-integral:
Theorem 2. Cluster Editing above Half-Integral Modification-Disjoint P3 Packing parameterized by the number ℓ of excess edits is in XP. It can be solved in $n^{2\ell + O(1)}$ time, where n is the number of vertices in the input graph.
A straightforward idea to prove Theorem 2 would be to adapt the fixed-parameter algorithm for vertex-disjoint packings given by van Bevern et al. [49]. Their main idea is to show that if a packed P3 P of the input graph G admits a solution that is optimal for P and that respects certain conditions on the neighborhood of V(P) in G then this solution can be used in an optimal cluster-editing set for G. Afterwards, each packed P3 P either needs an excess edit in V(P) or an edit incident with V(P) in G. Since the P3s in the packing are vertex-disjoint, an edit incident with V(P) will be in excess over the packing lower bound as well. It then follows that the overall number of edits is bounded by a function of the excess edits.
Unfortunately, the above idea fails for modification-disjoint packings for two reasons. First, the property that packed P3s have an edit incident with them is not helpful anymore, because these edits may be part of other packed P3s and hence not be in excess. Second, if we would like to preserve that these edits are excess, we need to check the special neighborhood properties of van Bevern et al. [49] for arbitrarily large connected components of packed P3s efficiently. We did not see a way around these issues and instead designed an algorithm from scratch: A straightforward guessing of the excess edits reduces the problem to the case where we need to check for zero excess edits. This case is then solved by an extensive set of reduction rules that exploit the structure given by the half-integral packing. Essentially, we successively reduce the maximum size of clusters in the final cluster graph. This then allows us to reduce the problem to Cluster Deletion. Together with the properties of the packing, this problem allows a formulation as a 2-SAT formula which we then solve in polynomial time.
Organization. After brief preliminaries in Section 2, we give some intuition about CEaMP in Section 3. Then we proceed to the reduction used to show Theorem 1 in Section 4.1 (containing the construction) and Section 4.2 (containing the correctness proof). Section 5 then contains the proof of Theorem 2.

Preliminaries
In this paper, we denote an undirected graph by G = (V, E), where V = V(G) is the set of vertices, E = E(G) is the set of edges, and $\binom{V}{2} \setminus E$ is the set of non-edges. An undirected edge between two vertices u and v will be denoted by uv where we put uv = vu. An undirected non-edge between two vertices x and y will be denoted by xy, where we put xy = yx, and we will explicitly mention that xy is a non-edge in case of confusion with the notation of an edge. If uv is an edge in the graph, we say u and v are adjacent. We denote a bipartite graph by B = (U, W, E), where U, W are the two parts of the vertex set of B and E is the set of edges of B. We say that a bipartite graph is complete if for every pair of vertices u ∈ U and w ∈ W, uw ∈ E. For a non-empty subset of vertices X ⊆ V, we denote the subgraph induced by X by G[X]. A clique Q in a graph G is a subgraph of G in which any two distinct vertices are adjacent. A cluster graph is a graph in which every connected component is a clique. A connected component in a cluster graph is called a cluster.
Let G′ be a cluster graph and let S be a cluster-editing set such that G△S = G′. We say that two cliques $Q_1$ and $Q_2$ of G are merged (in G′) if they belong to the same cluster in G′. We say that $Q_1$ and $Q_2$ are separated (in G′) if they belong to two different clusters in G′. When mentioning the edges or non-edges between the vertices of the clique $Q_1$ and the vertices of the clique $Q_2$, we refer to the edges or non-edges between the clique $Q_1$ and the clique $Q_2$ for short. Let ℓ, r ∈ ℕ. We denote a path with ℓ vertices by $P_\ell$ and a cycle with r vertices by $C_r$.
Let x, y, z be vertices in a graph G. We say that xyz is an induced P3 of G if xy, yz ∈ E(G) and xz ∉ E(G). Vertex y is called the center of xyz. We say that vertices x, y, z belong to xyz or x, y, z are incident with xyz. We also say that xyz is incident with the vertices x, y and z. In this paper, all P3s we mention are induced P3s; we sometimes skip the qualifier "induced" for convenience.
Given an instance (G, ℋ, ℓ) of CEaMP, if xyz is a P3 in G and xyz ∈ ℋ, we say that xyz is packed, and we say that the edges xy, yz are covered by xyz and the non-edge xz is covered by xyz. If an edge xy is covered by some P3 of ℋ, we say that xy is a packed edge. Otherwise we say that xy is a non-packed edge. If a non-edge uv is covered by some P3 of ℋ, we say that uv is a packed non-edge. Otherwise we say that uv is a non-packed non-edge. If none of the edges of a path P is packed, we say that the path P is non-packed.
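The notions of a packed P3 and of covered vertex pairs translate directly into a validity check for a packing. The sketch below is an illustration only: it assumes each packed P3 is given as a triple (x, y, z) with center y, and the helper names `covered_pairs`, `is_induced_p3`, and `check_packing` are hypothetical. It verifies that every packed triple is an induced P3 and that no vertex pair is covered twice, and it also reports the smallest c for which the packing is 1/c-integral.

```python
from collections import Counter


def covered_pairs(p3):
    """The two edges and the one non-edge covered by the P3 (x, y, z)."""
    x, y, z = p3
    return [frozenset((x, y)), frozenset((y, z)), frozenset((x, z))]


def is_induced_p3(edge_set, p3):
    """xy and yz must be edges, xz must be a non-edge."""
    x, y, z = p3
    return (frozenset((x, y)) in edge_set
            and frozenset((y, z)) in edge_set
            and frozenset((x, z)) not in edge_set)


def check_packing(edges, packing):
    """Return (valid, c): modification-disjointness and max P3s per vertex."""
    edge_set = {frozenset(e) for e in edges}
    pair_count = Counter(pair for p3 in packing for pair in covered_pairs(p3))
    valid = (all(is_induced_p3(edge_set, p3) for p3 in packing)
             and all(count == 1 for count in pair_count.values()))
    vertex_load = Counter(v for p3 in packing for v in p3)
    c = max(vertex_load.values(), default=0)
    return valid, c
```

On the path a, b, c, d, e the triples (a, b, c) and (c, d, e) form a modification-disjoint packing in which vertex c lies in two packed P3s, so the packing is half-integral but not integral.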
If xyz is a P3 in G and $Q_1$, $Q_2$, and $Q_3$ are pairwise non-intersecting vertex sets of G, we say that xyz connects $Q_1$ and $Q_3$ via $Q_2$ if the center y of xyz belongs to $Q_2$ and x, z belong to $Q_1$ and $Q_3$, respectively.
We sometimes need finite fields of prime order. Let p be some prime. By $\mathbb{F}_p$ we denote the finite field with the p elements 0, ..., p − 1 with addition and multiplication modulo p. Let $x \in \mathbb{F}_p$. Where it is not ambiguous, $-x$ and $x^{-1}$ will denote the additive and multiplicative inverse, respectively, of x in $\mathbb{F}_p$.
When we say that we relabel the vertices of a graph, we use v ← u to denote that we relabel the vertex v by the new label u.

Intuition
Before giving the hardness proof, it is instructive to determine some easy and difficult cases when solving CEaMP with ℓ = 0. This will give us an intuition about the underlying combinatorial problem that we need to solve.
Let (G, ℋ, 0) be an instance of CEaMP. It is helpful to consider the subgraph $G_{\mathrm{fix}}$ of G that contains only those edges of G that are not contained in any P3 in ℋ, that is, the non-packed edges. Suppose that (G, ℋ, 0) has a solution S and let $G_{\mathrm{sol}}$ be the associated cluster graph. Observe that each connected component of $G_{\mathrm{fix}}$ is part of a single cluster in $G_{\mathrm{sol}}$. Let us hence call the connected components of $G_{\mathrm{fix}}$ proto-clusters. Our task in finding $G_{\mathrm{sol}}$ is thus indeed to find a vertex partition 𝒫 that is coarser than the vertex partition given by the proto-clusters and that satisfies certain further conditions. The additional conditions herein are given by the P3s in G and also by the non-edges of G which are not contained in any P3 in ℋ, that is, by the non-packed non-edges. A non-packed non-edge between two proto-clusters implies that these proto-clusters cannot be together in a cluster in $G_{\mathrm{sol}}$. Hence, we are searching for a vertex partition 𝒫 as above subject to the constraints that certain proto-cluster pairs end up in different parts.
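The proto-clusters can be computed by discarding the packed edges and taking connected components of what remains. The following Python sketch does this with a small union-find structure; it assumes packed P3s are given as triples (x, y, z) with center y, and the function name `proto_clusters` is an illustrative choice rather than notation from the paper.

```python
def proto_clusters(vertices, edges, packing):
    """Connected components of G_fix, the graph of non-packed edges."""
    packed_edges = set()
    for x, y, z in packing:
        packed_edges.add(frozenset((x, y)))
        packed_edges.add(frozenset((y, z)))

    parent = {v: v for v in vertices}

    def find(v):
        # Path halving keeps the union-find trees shallow.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for edge in edges:
        if frozenset(edge) not in packed_edges:
            a, b = tuple(edge)
            parent[find(a)] = find(b)

    components = {}
    for v in vertices:
        components.setdefault(find(v), set()).add(v)
    return list(components.values())
```

For the path a, b, c, d with the single packed P3 (a, b, c), only the edge cd is non-packed, so the proto-clusters are {a}, {b}, and {c, d}.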
The constraints on 𝒫 given by P3s in G can be distinguished based on the intersection of the P3s with the proto-clusters. We only want to highlight two situations that are most relevant for the hardness construction. The first situation is when a P3, name it P, intersects with three proto-clusters $D_1$, $D_2$, and $D_3$, each in exactly one vertex and with center vertex in $D_2$. The corresponding constraint on 𝒫 is that either $D_1$ and $D_2$ are merged or $D_2$ and $D_3$ are merged into one cluster. We can satisfy such constraints easily, in the absence of further constraints, by merging all proto-clusters into one large cluster. However, together with [...]
The second case is when there is a P3 in G and also in the packing ℋ that has an edge contained in one proto-cluster A and the remaining vertex in a different proto-cluster B. Call this P3 P. Intuitively, regardless of whether A and B are merged into one cluster in $G_{\mathrm{sol}}$, P can be edited without excess cost over ℋ to accommodate this choice. In our hardness reduction, a main difficulty will be to pad subconstructions with P3s in the packing ℋ, so that we are able to find a solution with zero excess edits. For this we will heavily use P3s of the form that we just described.

NP-hardness for tight modification-disjoint packings
In this section, we prove Theorem 1 by showing a reduction from the NP-hard problem of deciding satisfiability of 3-CNF formulas. Given a 3-CNF formula Φ, we construct a graph G = (V, E) with a modification-disjoint packing ℋ of induced P3s such that Φ has a satisfying assignment if and only if G has a cluster editing set S which consists of exactly one vertex pair of each P3 in ℋ. In other words, the CEaMP instance (G, ℋ, 0) is a YES-instance. We assume that every clause of Φ has exactly 3 literals of pairwise different variables as we can preprocess the formula to achieve this in polynomial time otherwise. Similarly, we can assume that every variable of Φ appears at least twice. In the following, we let m denote the number of clauses in Φ, denote the clauses of Φ by $\Gamma_0, \ldots, \Gamma_{m-1}$, let n be the number of variables, and denote the variables of Φ by $x_0, \ldots, x_{n-1}$. Furthermore, we let $m_i$ denote the number of clauses that contain the variable $x_i$, i = 0, ..., n − 1.

Construction
The outline of our construction is as follows. In Sections 4.1.1 and 4.1.2 we explain the basic construction of the variable and clause gadgets. In these two sections we first show how to construct a subgraph of the final construction that enables us to show the soundness, that is, if the CEaMP instance is a yes-instance, then Φ is satisfiable. The main difficulty is then to extend this construction so that the completeness also holds. This we do in Sections 4.1.3 and 4.1.4. Sections 4.2.1 and 4.2.2 then contain the correctness proof.
Both the variable gadget and the clause gadget rely on some ideas outlined in Section 3. Our basic building blocks will be proto-clusters. A proto-cluster is a subgraph that is connected through edges that are not contained in any P3 in the constructed packing ℋ. The proto-clusters then have to be joined into larger clusters in a way that represents a satisfying assignment to Φ. The variable gadget basically consists of an even-length cycle of proto-clusters, connected by P3s so that either odd or even pairs of proto-clusters on the cycle have to be merged. These two options represent a truth assignment. The construction of the variable gadget is more involved than a simple cycle of proto-clusters, however, because of the connection to the clause gadgets: We need to ensure that all vertex pairs between certain proto-clusters of a variable and clause gadget are covered by P3s in ℋ, so as to be able to merge these clusters in the completeness proof. The way in which we cover these vertex pairs imposes some constraints on the construction of the variable gadgets, making the gadgets more complicated.

Variable gadget
As mentioned, a variable will be represented by a cycle of proto-clusters such that any solution needs to merge either each odd or each even pair of consecutive proto-clusters. These two options represent the truth value assigned to the variable. In order to enable both associated solutions with zero edits above the packing lower bound, we build an associated packing of P3s such that all vertex pairs between consecutive proto-clusters are covered by a P3 in the packing. It would be tempting to make each proto-cluster a single vertex. However, due to the connections to the clause gadget later on, we need proto-clusters containing five vertices each.
Throughout the construction, the cliques we have just introduced will remain proto-clusters, that is, they contain a spanning tree of edges that are not covered by P3s in the packing ℋ. We now add pairwise modification-disjoint P3s so as to cover all edges between the cliques $K^i_j$ we have just introduced. Recall that $\mathbb{F}_5$ is the finite field of the integers modulo 5. We take three consecutive cliques and add P3s with one vertex in each of the three cliques. To do this without overlapping two P3s, we think about the cliques' vertices as elements of $\mathbb{F}_5$ and add a P3 for each possible arithmetic progression. That is, in each added P3 the difference of the first two elements of the P3 is equal to the difference of the last two elements. In this way, each vertex pair is contained in a single P3 since the third element is uniquely defined by the arithmetic progression.
Formally, for each $j = 0, 2, \ldots, 4m_i - 2$ and every triple of elements $p, q, r \in \mathbb{F}_5$ satisfying the equality $q - p = r - q$ over $\mathbb{F}_5$, we add to the graph the edges $v^i_{j,p} v^i_{j+1,q}$ and $v^i_{j+1,q} v^i_{j+2,r}$ and we add to the packing ℋ the P3 given by $v^i_{j,p} v^i_{j+1,q} v^i_{j+2,r}$. Note that in this manner the clique $K^i_{j+1}$ becomes fully adjacent to $K^i_j$ and to $K^i_{j+2}$ while $K^i_{j+1}$ stays anti-adjacent to all other cliques $K^i_{j'}$. Observe that the P3s given by $v^i_{j,p} v^i_{j+1,q} v^i_{j+2,r}$ for $j = 0, 2, \ldots, 4m_i - 2$ such that $q - p = r - q$ are pairwise modification-disjoint: For each $j = 0, 2, \ldots, 4m_i - 2$, an arbitrary edge just introduced between $K^i_j$ and $K^i_{j+1}$ has the form $\{v^i_{j,p}, v^i_{j+1,q}\}$ for some $p, q \in \mathbb{F}_5$. It belongs to the unique P3 given by $v^i_{j,p} v^i_{j+1,q} v^i_{j+2,r}$, where $r = 2q - p$. Similarly, an arbitrary edge $\{v^i_{j+1,q}, v^i_{j+2,r}\}$ for $q, r \in \mathbb{F}_5$ belongs to the unique P3 given by $v^i_{j,2q-r} v^i_{j+1,q} v^i_{j+2,r}$ and an arbitrary non-edge $\{v^i_{j,p}, v^i_{j+2,r}\}$ for $p, r \in \mathbb{F}_5$ belongs to the unique P3 given by $v^i_{j,p} v^i_{j+1,(p+r) \cdot 2^{-1}} v^i_{j+2,r}$, where $2^{-1}$ is the multiplicative inverse of 2 over $\mathbb{F}_5$, that is, $2^{-1} = 3$. After this construction, we set the modification-disjoint packing of the variable gadget to be $\mathcal{H}_{\mathrm{var}} = \{$P3 given by $v^i_{j,p} v^i_{j+1,q} v^i_{j+2,r} \mid i = 0, \ldots, n - 1;\ j = 0, 2, \ldots, 4m_i - 2;\ p, q, r \in \mathbb{F}_5;$ and $q - p = r - q\}$.
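The uniqueness argument for the arithmetic-progression packing can be checked by exhaustive enumeration for one triple of consecutive cliques. The Python sketch below is only an illustration: it identifies the vertices of the three cliques with elements of $\mathbb{F}_5$ and labels them by position 0, 1, 2 instead of the indices j, j+1, j+2, and the function names are hypothetical. It counts how often each edge and non-edge between the three cliques is covered.

```python
from collections import Counter
from itertools import product

P = 5  # order of the field F_5 used in the variable gadget


def progression_p3s(p=P):
    """All triples (a, b, c) over F_p with b - a = c - b, i.e. c = 2b - a."""
    return [(a, b, (2 * b - a) % p) for a, b in product(range(p), repeat=2)]


def coverage(p=P):
    """Count how often each vertex pair is covered by a progression P3.

    Position 0, 1, 2 stands for the first, middle, and last clique.
    """
    count = Counter()
    for a, b, c in progression_p3s(p):
        count[("edge", (0, a), (1, b))] += 1
        count[("edge", (1, b), (2, c))] += 1
        count[("non-edge", (0, a), (2, c))] += 1
    return count
```

With p = 5 there are 25 such P3s and they cover each of the 3 · 25 = 75 relevant vertex pairs exactly once, which is the modification-disjointness claimed in the text.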
This finishes the first stage of the construction. Notice that the cliques $K^i_j$ form a cyclic structure. Intuitively, every second pair of cliques needs to be merged into one cluster by any solution due to the P3s we have introduced, and we will see that the two resulting solutions are in fact the only ones. The truth values of the variable are then represented as follows. For every variable $x_i$, i = 0, ..., n − 1, if $K^i_j$ and $K^i_{j+1}$ are merged for $j = 0, 2, \ldots, 4m_i - 2$, then this represents the situation that we assign false to the variable $x_i$. If $K^i_{j+1}$ and $K^i_{j+2}$ are merged for $j = 0, 2, \ldots, 4m_i - 2$, then this represents variable $x_i$ being true. We will make minor modifications to the variable gadgets and $\mathcal{H}_{\mathrm{var}}$ in the following section, so as to transmit the choice of truth value to the clause gadgets.

Figure 2: [...], and $Q^4_d$ are in one connected component. A pair of incident brown thick lines indicates a set of four transferring P3s used to connect a clause gadget to a variable gadget. The cycles made from cliques and gray thick lines represent variable gadgets, where a dashed gray line indicates an omitted part of the cycle. The cycle for variable $x_a$ is shown completely, where we assume that $m_a = 3$, that is, variable $x_a$ is in three clauses. Labels T and F on thick gray edges indicate the pairs of cliques that shall be merged into one cluster if the variable is to be set to true or false, respectively.

Skeleton of the clause gadget
In order to introduce the construction of the clause gadget, we first give a description of the skeleton of the clause gadget. The skeleton is a subgraph of the final construction that allows us to prove the soundness. The final construction is given in the succeeding sections. We give a picture of the skeleton in Fig. 2. The basic idea is a generalization of the idea explained in Section 3: A clause $\Gamma_d$ is represented by four proto-clusters (cliques), $Q^i_d$, i = 1, ..., 4, as in Fig. 2. The proto-clusters are connected by a path P of length 5 containing vertices of [...]

Main gadget. Formally, for each variable $x_i$, i = 0, 1, ..., n − 1, we fix an arbitrary ordering of the clauses that contain $x_i$. If a clause $\Gamma_j$ contains a variable $x_i$, let $\pi(i, j) \in \{0, \ldots, m_i - 1\}$ denote the position of the clause $\Gamma_j$ in this ordering. Let initially $\mathcal{H}_{\mathrm{tra}} = \emptyset$. For each clause $\Gamma_d$ (d = 0, ..., m − 1) proceed as follows. We first introduce four cliques $T^a_d$ [...]

Figure 3: Connection of a clause gadget with a variable gadget for a variable $x_a$ which appears positively in the clause. White ellipses represent cliques. The vertices in the cliques in the variable gadget are ordered from top to bottom according to the elements of $\mathbb{F}_5$ which they represent. For example, the topmost vertex in $K^a_{4\pi(a,d)}$ is $v^a_{4\pi(a,d),0}$ (corresponding to $0 \in \mathbb{F}_5$) and the bottom-most is $v^a_{4\pi(a,d),4}$ (corresponding to $4 \in \mathbb{F}_5$). The gray lines adjacent to cliques in the variable gadget represent some of the P3s that were introduced into the variable gadgets in the beginning. (Some gray lines are superseded by edges of other colors.) The P3s represented by the gray lines have the associated arithmetic progression "+0", that is, $q - p = r - q = 0$ in the definition of the P3s. The P3s for the remaining arithmetic progressions are omitted for clarity. In colors red, black, green, and blue we show the P3s that connect the transferring clique $T^a_d$ with the variable gadget of variable $x_a$. Herein, dotted lines are non-edges and solid lines are edges. Note that these connecting P3s supplant some of the edges of previously present P3s in the variable gadget; the previously present P3s are then removed from both G and ℋ. For example the green P3 replaces the edge $v_2 v_3$ of the P3 given by $v_6 v_2 v_3$ that was previously present. To maintain that each vertex pair between consecutive cliques in the variable gadget is covered by some P3 in the packing, we add the two brown P3s.

Connection to the variable gadgets. Next we connect the transferring cliques $T^a_d$, $T^b_d$, and $T^c_d$ to the variable gadgets of $x_a$, $x_b$, and $x_c$, respectively. To avoid additional notation, we only explain the procedure for $T^a_d$ and $x_a$; the other pairs are connected analogously. We connect $T^a_d$ to the variable gadget of $x_a$ by a set of four modification-disjoint P3s as shown in Fig. 3 and explained formally below. The centers of these P3s are in $K^a_{4\pi(a,d)+1}$. For each of these four P3s, exactly one endpoint is an arbitrary distinct vertex in $T^a_d$ which is different from the endpoints of the P3s connecting $T^a_d$ to $Q^1_d$; we denote these endpoints as $w_1, w_2, w_3, w_4$. The other endpoint is in $K^a_{4\pi(a,d)+2}$ if $x_a$ appears positively in $\Gamma_d$ and the other endpoint is in $K^a_{4\pi(a,d)}$ otherwise. The precise centers and endpoints in the cliques $K^a_{4\pi(a,d)+2}$ or $K^a_{4\pi(a,d)}$ are specified below. Since these newly introduced P3s use edges that belong to some P3s in $\mathcal{H}_{\mathrm{var}}$ that were introduced while constructing the variable gadgets, we will remove such P3s in the variable gadget from $\mathcal{H}_{\mathrm{var}}$, remove their corresponding edges from the graph, and add some new P3s to $\mathcal{H}_{\mathrm{var}}$ as described below. As a result, the clique $K^a_{4\pi(a,d)+1}$ may no longer be fully adjacent to $K^a_{4\pi(a,d)}$ or $K^a_{4\pi(a,d)+2}$. We will however maintain the invariant that each vertex pair between $K^a_{4\pi(a,d)+1}$ and $K^a_{4\pi(a,d)}$ or $K^a_{4\pi(a,d)+2}$ is covered by a P3 in the packing and that all the P3s of $\mathcal{H}_{\mathrm{var}}$ are pairwise modification-disjoint.
Formally, if $x_a$ appears positively in $\Gamma_d$, we denote: [...] If $x_a$ appears negatively in $\Gamma_d$, we swap the roles of $K^a_{4\pi(a,d)}$ and $K^a_{4\pi(a,d)+2}$, that is: [...] As shown in Fig. 3, we remove the P3s given by [...] from $\mathcal{H}_{\mathrm{var}}$ and we remove their corresponding edges from the graph. Then we add the P3s given by $v_5 v_6 v_2$ and $v_1 v_7 v_8$ to the graph and to $\mathcal{H}_{\mathrm{var}}$. Finally, we connect $T^a_d$ via $K^a_{4\pi(a,d)+1}$ by adding the P3s given by $w_1 v_1 v_3$, $w_2 v_2 v_4$, $w_3 v_2 v_3$, and $w_4 v_1 v_4$ to the graph and to $\mathcal{H}_{\mathrm{tra}}$. Note that, indeed, each vertex pair between $K^a_{4\pi(a,d)+1}$ and $K^a_{4\pi(a,d)}$ and between $K^a_{4\pi(a,d)+1}$ and $K^a_{4\pi(a,d)+2}$ remains covered by a P3 in the packing after replacing all P3s. This finishes the construction of the skeleton of the clause gadgets.
The intuitive idea behind the connection to the variable gadget and how it is used in the soundness proof is as follows. Recall from above that we need to delete at least one of three sets of edges in the solution, namely the edges between Q [...]
The P3s added so far are indeed sufficient to conduct a soundness proof of the above reduction: They ensure that there exists a satisfying assignment to the input formula provided that there exists an appropriate cluster editing set. However, the completeness is much more difficult: We need to add some more "padding" P3s to the packing (and edges to the graph between the cliques that can be potentially merged) to ensure that a satisfying assignment can always be translated into a cluster-editing set. The goal of the next two sections is to develop a methodology of padding such cliques with P3s in the packing. The padding will rely on the special structure of P3s that we have established above in the clause gadget and connection between clause and variable gadget.

Merging model of the clause gadget
In the sections above, we have defined all proto-clusters of the final constructed graph: As we will see in the correctness proof, each clique will be a proto-cluster in the end. Thus, all solutions will construct a cluster graph whose clusters represent a coarser partition than the partition given by the proto-clusters, or cliques.

Figure 4: [...] The number i ∈ {0, 1, 2, 3, 4} beside a vertex v denotes that $v \in L_i$. The placement of vertices corresponds to the placement of the cliques in Fig. 2. For example, the two vertices of level 1 on the top correspond to $Q^1_d$ and $Q^4_d$. We assume that $m_a = 3$.
What remains is to ensure that the proto-clusters indeed can be merged as required to construct a solution from a satisfying assignment to Φ in the completeness proof. To do this, we pad the proto-clusters with P3s (in the graph and packing ℋ). To simplify this task we now divide the set of proto-clusters into five levels $L_0, \ldots, L_4$. Then, we will go through the levels in increasing order and add padding P3s from proto-clusters of the current level to proto-clusters of all lower levels if necessary.
There are two issues that we need to deal with when introducing the padding P3s. For the padding, we will use a number-theoretic tool that we introduce in Section 4.1.4 which has the limitation that, when padding a proto-cluster D with P3s to some sequence $D_1, \ldots, D_s$ of proto-clusters of lower level, we need to increase the number of vertices in D to be roughly 2 [...] Hence, first, we need to make sure that the number of levels is constant since the number of size increases of proto-clusters compounds exponentially with the number of levels. Second, we aim for the property that each vertex is only in a constant number of P3s in ℋ and thus, we need to ensure that the number s of lower-level proto-clusters and their size is constant.
To achieve the above goals, we introduce an auxiliary graph H, the merging model, which will further guide the padding process. The merging model has as vertices the cliques that were introduced before and an edge between two cliques if we want it to be possible that they are merged by a solution. Formally, [...] and the edge set, E(H), is defined as follows. See also Fig. 4. First, it shall be possible to merge the cliques in the variable gadget in a cyclic fashion, that is, we add [...] to E(H). Second, it shall be possible to merge transferring cliques of a clause gadget to any of the relevant cliques of the associated variable gadget, that is, we add to E(H) the set [...] Third, it shall be possible to merge subsets of [...] and hence we add to E(H) the set [...] Finally, it shall be possible to merge the transferring cliques to subsets of [...] Hence, we add to E(H) the set [...] Note that this construction is slightly asymmetric (see Fig. 4). Now we define the levels $L_0$ to $L_4$ such that orienting the edges in H from higher to lower level gives an acyclic orientation when ignoring the edges in level $L_0$.
• $L_0$ contains all cliques in variable gadgets.
We now orient all edges in H from higher-level vertices to lower-level vertices. Edges in level $L_0$ remain undirected. Observe that, apart from edges in $L_0$, all edges in H are between vertices of different levels and, indeed, ignoring edges in $L_0$, there are no cycles in H when orienting the edges from higher to lower level. In the following section, we will look at each clique R in levels $L_1$ and higher, and add P3s to the packing ℋ so as to cover all vertex pairs containing a vertex of R and an out-neighbor of R in the merging model H.

Implementation of the clause gadget
In this section, we first introduce a number-theoretical construction (Lemma 1) that serves as a basic building block for "padding" P3s in the packing. Then we use this construction to perform the actual padding of P3s.
The abstract process of padding P3s works as follows. It takes as input a clique R in the merging model H (represented by W in Lemma 1 below), and a set of cliques that are out-neighbors of R in H (represented by V). Furthermore, it receives a set of vertex pairs between R and its out-neighbors that have previously been covered (represented by F). The goal is then to find a packing of P3s that cover all vertex pairs except the previously covered pairs. The previously covered vertex pairs have some special structure that we carefully selected so as to make covering of all remaining vertex pairs possible in a general way: The construction so far was carried out in such a way that the connected components induced by previously covered vertex pairs are P3s or C8s.
In Lemma 1 we will indeed pack triangles instead of P3s because this is more convenient in the proof. We will replace the triangles by P3s afterwards: Recall the intuition from Section 3 that P3s in the packing ℋ which have exactly one endpoint in one clique T and their remaining two vertices in another clique R can accommodate both merging R and T or separating R and T without excess edits. Hence, we will replace the triangles by such P3s. Recall that we aim for each clique to be a proto-cluster in the final construction, that is, each clique contains a spanning tree of edges which are not contained in P3s in ℋ. Since putting the above kind of P3s into the packing ℋ allows in principle to delete edges within R, we need to ensure that R remains a proto-cluster. We achieve this via the connectedness property in Lemma 1.
Lemma 1. Let p be a prime number with p ≥ 2. Let B = (V, W, E) be a complete bipartite graph such that [...] ) is either a singleton, a P3 with a center in V, or a C8. Then there exists an edge-disjoint triangle packing τ in [...]
Proof. First, we divide W into two parts $W_1$ and $W_2$ of equal sizes such that if two vertices w, w′ ∈ W are connected to the same vertex v ∈ V by edges in F, then w and w′ are in different parts. Note that this is easy for a connected component of (V ∪ W, F) if it is a P3. For a connected component of (V ∪ W, F) which is a C8, this is also doable as shown in Fig. 5, where [...] We now label the vertices by elements from the finite field $\mathbb{F}_p$ of size p (recall that $\mathbb{F}_p$ consists of the elements {0, 1, ..., p − 1} with addition and multiplication modulo p). To each vertex v ∈ V, each vertex $w \in W_1$, and each vertex $w' \in W_2$, we will assign a unique label $v_i$, $w_j$, and $w'_k$, respectively, with $i, j, k \in \mathbb{F}_p$. In other words, we construct three bijections that map $\mathbb{F}_p$ to V, $W_1$, and $W_2$, respectively.
First, we label the vertices from the connected components of (V ∪ W, F ) (and some singleton vertices) by going through the connected components one-by-one.For each yet-unlabeled connected component of (V ∪ W, F ) that is a P 3 given by wvw ′ such that v ∈ V, w ∈ W 1 , w ′ ∈ W 2 , we label vertex w as w j , vertex v as v j and vertex w ′ as w ′ j for the smallest j from F p which is not yet used in the labeling of vertices of V .For each yet-unlabeled connected component C in (V ∪ W, F ) that is a C 8 we proceed as follows.By the way we have divided vertices from W into W 1 and W 2 , we can assign, to each such connected component C, four vertices which have degree zero in (V ∪ W, F ): two in W 1 and two in W 2 ; see also Fig. 5.We thus label the vertices in C and the four degree-zero vertices assigned to C as in Fig. 5, for the smallest integer i from F p such that i, i + 1, i + 2 and i + 3 are not used in the labeling of vertices of V .
Second, we label the remaining unlabeled vertices that are not in the connected components of (V ∪W, F ).For an unlabeled vertex w ∈ W 1 , label it as w k for an arbitrary integer k from F p which is not used in the labeling of vertices in W 1 .Similarly, for an unlabeled vertex v ∈ V , we label it as v h for an arbitrary integer h from F p which is not used in the labeling of vertices in V and for an unlabeled vertex w ′ ∈ W 2 , we label it as w ′ s for an arbitrary integer s from F p which is not used in the labeling of vertices in W 2 .After the labeling, the vertices in V, W 1 and W 2 are v 1 , . . ., v p−1 , w 1 , . . ., w p−1 and w ′ 1 , . . ., w ′ p−1 , respectively.We now proceed to constructing the packing τ .First, let In the following, for any triangle packing τ , by E(τ ) we will denote the union of the edge sets of the triangles in τ .
We claim that the triangles in τ cover are edge-disjoint and cover all edges of E. Consider an arbitrary edge v i w j ∈ E between V and W 1 for i, j ∈ F p .According to the definition of τ cover , each triangle v i w j w ′ x ∈ τ cover that covers edge v i w j satisfies x = 2j − i (over F p ).Since F p is a field, there is thus exactly one such triangle.Similarly, each edge h+1 satisfies the conditions in the definition of τ cover .Moreover, F covers all edges of F .Furthermore, each edge in the edge set F is either in F or between W 1 and W 2 .(See also Fig. 5.) Thus, E \ F has an empty intersection with E(τ ).It follows that τ covers all edges of E \ F .It remains only to show that τ satisfies the connectedness condition.Since τ cover does not cover any edge of W1 ), then at least one of these triangles is removed from τ cover to obtain τ .If v is in a C 8 of (V ∪ W, F ), then at least two of the triangles in τ cover that contain v are removed to obtain τ .This concludes the proof.
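The counting argument for τ_cover can be checked numerically. The sketch below assumes, based only on the relation x = 2j − i stated in the proof, that τ_cover consists of the triangles v_i w_j w′_{2j−i} for i, j ∈ F_p. The function names and the tuple encoding of vertices are illustrative choices, and the positive check is meant for odd primes, where an edge v_i w′_x determines j uniquely because 2 is invertible modulo p.

```python
from collections import Counter
from itertools import product

def tau_cover(p):
    """Triangles (v_i, w_j, w'_x) with x = 2j - i over F_p (assumed definition)."""
    return [(("v", i), ("w", j), ("w'", (2 * j - i) % p))
            for i, j in product(range(p), repeat=2)]

def edge_usage(triangles):
    """Count how often each vertex pair occurs as an edge of some triangle."""
    usage = Counter()
    for a, b, c in triangles:
        for e in ((a, b), (a, c), (b, c)):
            usage[frozenset(e)] += 1
    return usage

def covers_exactly_once(p):
    """True iff every V-W1 and V-W2 edge lies in exactly one triangle
    and no vertex pair at all is used by two triangles."""
    usage = edge_usage(tau_cover(p))
    bipartite = [frozenset((("v", i), (side, j)))
                 for i, j in product(range(p), repeat=2)
                 for side in ("w", "w'")]
    return all(usage[e] == 1 for e in bipartite) and max(usage.values()) == 1
```

For an odd prime p the packing has p² triangles, one for each pair (i, j), matching the "exactly one such triangle" step of the proof.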
The following corollary is slightly easier to apply than Lemma 1.
Corollary 1. Let p be a prime and let B = (V, W, E) be a complete bipartite graph with |V| ≤ p, |W| = 2p. Let F ⊆ E be a nonempty set of edges such that every connected component of (V ∪ W, F) is either a P3 with a center in V or a C8. Then there exists an edge-disjoint triangle packing τ in
Proof. Add p − |V| extra dummy vertices to V, obtaining a complete bipartite graph B′ = (V′, W, E), apply Lemma 1 to B′, p, and F, obtaining a packing τ′, and return the sub-packing τ ⊆ τ′ containing only triangles with vertices in B. Since every triangle in τ′ contains exactly one vertex of V′, τ satisfies all the required properties.
Concluding the construction. Equipped with Lemma 1 and Corollary 1, we can finish the construction of the clause gadgets and indeed the whole instance (G, H, 0) of CEaMP. We now specify the exact size of each clique introduced above and add padding P3s to G and H so as to cover all vertex pairs between cliques that are adjacent in the merging model H. Initially, let the set H_pad of padding P3s be empty, that is, H_pad = ∅. We start with levels 0 and 1. We do not change the size of any clique on level 0. That is, as shown in the variable gadget, there are five vertices in every clique of level 0. Moreover, we set the size of every clique of level 1 to one. Note that no cliques of levels 0 and 1 are adjacent in the merging model H, that is, no two of them need to be merged in the solution. Hence, it is not necessary to add padding P3s within these levels.
Now we turn each level i, i ≥ 2, in order of increasing i.For each clique Q of level i, we apply Corollary 1 in the following scenario.Let V be the union of all cliques of levels j < i that are out-neighbors of Q in the merging model H. Let p be the smallest prime with p ≥ |V | and 2p ≥ |Q|.Introduce 2p − |Q| new vertices, put them into Q, and make We claim that Corollary 1 is applicable to p, graph B = (V, W, E), and F .To see this, we need to show that each connected component in . ., m − 1} and j ∈ {1, 2, 3, 4}, then each connected component in (V ∪ W, F ) consists of two edges of two different transferring P 3 s with the same center in V , as claimed (see also Fig. 2).If Q is a transferring clique, then each connected component of (V ∪ W, F ) consists either of two edges of two different transferring P 3 s with the same center in some Q j d ⊆ V for some j ∈ {1, 3, 4}, or of some vertex pairs of transferring P 3 s between Q and the cliques of a variable gadget.In the first case, the claim clearly holds.In the second case, observe that the edges and non-edges between V and W in the transferring P 3 s are each incident with one of w 1 , w 2 , w 3 , w 4 and one of v 1 , v 2 , v 3 , v 4 as defined when connecting variable and clause gadgets.These edges and non-edges indeed induce a C 8 given by v 1 w 1 v 3 w 3 v 2 w 2 v 4 w 4 v 1 (see also Fig. 3).Thus, Corollary 1 is applicable.
Corollary 1 gives us an edge-disjoint triangle packing ) is connected.Note that every triangle vw 1 w 2 ∈ τ has one vertex v ∈ V and two vertices w 1 , w 2 ∈ W .For every triangle vw 1 w 2 ∈ τ , we add a P 3 to G by using exactly two edges of the triangle in G; more precisely, we put {v, w 1 }, {w 1 , w 2 } ∈ E(G), vw 2 / ∈ E(G), and then add the P 3 of G given by vw 1 w 2 into H pad .Finally, let H = H var ∪ H tra ∪ H pad .Note that H is a modification-disjoint packing of P 3 s: This is by construction for H var ∪ H tra and, by Corollary 1, no P 3 in H pad shares a vertex pair with any P 3 in H var ∪ H tra .This concludes the construction of the CEaMP instance (G, H, 0).
To see that the construction takes polynomial time and to see that indeed each vertex is in some constant number of P 3 s in H, let us now derive the precise sizes of each clique in the construction.Recall that the cliques on level 0 are exactly those in the variable gadgets, and these have exactly five vertices each.The cliques on level 1 are Q By the bounds on the number of triangles in the packing, each vertex is in at most 23 P 3 s of H.It also follows that the construction takes overall polynomial time.

Correctness
We now prove the correctness of the reduction given in Section 4.1.

Completeness
Now we show how to translate a satisfying assignment of Φ into a cluster editing set of size |H| for the constructed instance.
Lemma 2. If the input formula Φ is satisfiable, then the constructed instance (G, H, ℓ = 0) is a YES-instance.
Proof.Assume that there is a satisfying assignment α for the formula Φ. Recall that n is the number of variables of Φ and m is the number of clauses of Φ.Instead of building the solution directly, we build a partition P of V (G) into clusters.Then, we argue that the number of edges between clusters and the number of non-edges inside clusters is at most |H|.Thus, the partition P will induce a solution with the required number of edge edits.
Recall that H denotes the merging model of our hardness construction.The basic building blocks of our vertex partition P are the cliques in G that correspond to the vertices of V (H).We will never separate such a clique during building P, that is, P corresponds to a partition of V (H).For simplicity, we will slightly abuse notation and indeed also treat P as a partition of V (H).We build P by taking initially P = V (H) and then successively merging parts of P, which means to take the parts out of P and replace them by their union.Each vertex of H is a clique of G, so has no non-edges in G. Thus, below it suffices to consider edges and non-edges between pairs of cliques corresponding to vertices in V (H) to determine the number of edits in the solution corresponding to P.
We start with the variable gadgets. Consider each variable x_i, i = 0, 1, . . ., n − 1. Call a pair of cliques K^i_j, K^i_{j+1} in x_i's variable gadget even if j is even and odd otherwise (indices are taken modulo 4m_i). If α(x_i) = true, then merge each odd pair. If α(x_i) = false, then merge each even pair. We will not merge any further pair of cliques contained in variable gadgets. Now consider each clause Γ_d, d = 0, . . ., m − 1, in some arbitrary order. Let x_a, x_b, and x_c be the variables in Γ_d. We use the same notation as when defining the clause gadgets. See Fig. 2 for the skeleton of the clause gadget of Γ_d, up to variables appearing positively instead of negatively or vice versa. We choose an arbitrary variable that satisfies Γ_d. The basic idea is to separate (that is, to not merge) the transferring clique from the cliques in the satisfying variable's gadget by deleting some edges of the transferring P3s. This will induce at most one edit for each transferring P3 since the remaining edge in a transferring P3 will be part of a cluster in P. Then we cut from the clause gadget all transferring cliques belonging to variables that have not been chosen. Since we do not spend edits inside of transferring P3s in this way, this allows us to merge the transferring cliques to the variable gadgets regardless of whether the variable was set to true or false.
Formally, we perform the following merges in P.
If we have chosen x a from the variables satisfying the clause Γ d : .This concludes the definition of the vertex partition P. Let us denote the corresponding cluster editing set by S. That is, S contains all edges in G between parts of P and all non-edges within parts of P.
We claim that (c1) each edit in S is contained in a P 3 of H and (c2) every P 3 of H is edited at most once by S. Note that the claim implies that S is a solution to (G, H, 0).We first prove part (c1) of the claim.Note that each edit in S is between two cliques in V (H).There are three types of edits in H: within a variable gadget, between a clause and a variable gadget, and within a clause gadget.
Consider first the edits contained in the variable gadget of an arbitrary variable x_i. Observe that each such edit is contained in an odd or an even pair of x_i's gadget. Such an edit is contained in a P3 in H because, by construction of the variable gadgets, all edges and non-edges between the cliques of an odd or an even pair are covered by P3s in H.
For the edits in S which are not contained in variable gadgets, observe that between each pair of cliques in a single level L s , s > 0, there are no edges in G. Whenever we merge two or more parts during the construction of P, we either merge a clique on level L 4 to two cliques on level L 0 or we merge cliques on pairwise different positive levels.Hence, each edit e ∈ S which is not in a variable gadget is between two cliques on different levels.Moreover, observe that the cliques containing the endpoints of e are adjacent in V (H).Thus, by the way we have defined H pad via Corollary 1, there is a P 3 in H pad containing e.We have thus shown that claim (c1) holds.
For part (c2) of the claim, we first observe the following.Each P 3 in H that intersects only two cliques in V (H) contains at most one edit of S. Let P be such a P 3 and let D 1 , D 2 be the two cliques in V (H) that intersect P .Note that H tra does not contain P 3 s that intersect only two cliques in V (H) and thus either P ∈ H var or P ∈ H pad .In both cases, there is exactly one edge and one non-edge of P between D 1 and D 2 : This is clear if P ∈ H pad .If P ∈ H var then P was introduced when connecting a clause gadget to a variable gadget.In the notation used there, either P = v 5 v 6 v 2 or P = v 1 v 7 v 8 , both of which have the required form.Thus, as D 1 and D 2 are either merged or not in P, there is at most one edit in P .
To prove (c2) it remains to consider P3s in H that intersect three cliques in V(H). Let P be such a P3. Note that P ∉ H_pad. If P ∈ H_var, then it connects K^i_j to K^i_{j+2} via K^i_{j+1} for some even j and some variable index i ∈ {0, 1, . . ., n − 1}. Since we merge either all odd or all even pairs in x_i's variable gadget to obtain P, indeed exactly one edge of P is edited, as claimed. If P ∈ H_tra, then we distinguish two cases.
First, P does not contain a vertex of some variable-gadget clique. Then, P connects some clique Second, P contains a vertex of some variable-gadget clique. Then, by construction of G and H, path P indeed contains two vertices of two variable-gadget cliques, say K^i_j and K^i_{j+1}, and one vertex of a transferring clique, say T^i_d. Assume that variable x_i appears positively in clause Γ_d; the other case is analogous. Then the center of P is in K^i_j and moreover j is odd. If x_i was not chosen among the variables satisfying clause Γ_d when constructing P, then T^i_d and K^i_j are in the same part Q of P. Furthermore, K^i_{j+1} is either in a part different from Q or also in Q. In both cases, there is at most one edit from S in P. If x_i was chosen among the variables satisfying clause Γ_d when constructing P, then T^i_d is in a part of P which is different from the one(s) containing K^i_j and K^i_{j+1}. However, since x_i satisfies Γ_d, we have α(x_i) = true and thus K^i_j and K^i_{j+1} are merged (recall that j is odd). Thus, indeed, the claim holds, that is, each edit in S is contained in a P3 in H and every P3 of H is edited at most once by S.

Soundness
Before we show how to translate a cluster editing set of size |H| for the constructed instance into a satisfying assignment of Φ, we make some structural observations.
Recall the definition of a proto-cluster, a connected component of the subgraph of G whose edge set contains precisely those edges of G which are not contained in any P 3 in H. Lemma 3. V (H) is precisely the set of proto-clusters of G with respect to H.
Proof. By construction, all edges in G between two cliques in V(H) are in a P3 in H. Thus each proto-cluster is contained in some clique in V(H). We claim that each clique C ∈ V(H) contains a spanning tree of edges which are not contained in a P3 in H. If C ∈ L_1, then this is clear; such a C contains only a single vertex and a trivial spanning tree. If C ∈ L_0, then there are only two P3s in H that contain edges of C: the one given by v_5 v_6 v_2 and the one given by v_1 v_7 v_8 as defined in Section 4.1.2 when connecting variable and clause gadgets. Since |C| = 5, indeed C contains the required spanning tree. If C ∈ L_i for i ≥ 2, then by the connectedness property of Corollary 1, C has the required spanning tree.
Recall that each solution S to (G, H, 0) cannot remove any edge from G which is not contained in a P 3 in H. Thus, since V (H) is a vertex partition of G, each solution S generates a cluster graph G△S whose clusters induce a coarser vertex partition than V (H).This leads to the following.
Observation 1.For each solution S to (G, H, 0), each cluster in G△S is a disjoint union of cliques in V (H).
Using the above structural observations, we are now ready to prove the soundness of the construction.Lemma 4. If the constructed instance (G, H, ℓ = 0) is a YES-instance, then the formula Φ is satisfiable.
Proof.Suppose that there exists a set of vertex pairs S ⊆ V 2 so that G∆S is a union of vertex-disjoint cliques and |S| − |H| = 0.In other words, there exists a solution that transforms G into a cluster graph G ′ by editing exactly one edge or non-edge of every P 3 of H.We will construct a satisfying assignment α : {x 0 , x 1 , . . ., x n−1 } → {true, false} for the formula Φ.
By Observation 1, the set of clusters in G ′ induces a partition of the cliques in V (H).Recall that we say that two cliques in V (H) are merged if they are in the same cluster in G ′ and separated otherwise.
To define α, we need the following observation on the solution. Consider variable x_i and the cliques K^i_j, j = 0, 1, . . ., 4m_i − 1, in x_i's variable gadget. Call a pair K^i_j, K^i_{j+1} even if j is even (where j + 1 is taken modulo 4m_i) and call this pair odd otherwise. We claim that either (i) each even pair is merged and each odd pair is separated, or (ii) each odd pair is merged and each even pair is separated (and not both). Note that, for each even j, pair K^i_j, K^i_{j+1} is merged or pair K^i_{j+1}, K^i_{j+2} is merged, because there is a P3 in G containing vertices in these cliques with center in K^i_{j+1}. To show the claim, it is thus enough to show that it is not the case that both an odd pair and an even pair are merged.
For the sake of contradiction, suppose that an odd pair is merged and an even pair is merged.Then, there exists an index j ∈ {0, 1, . . ., 4m i − 1} and a cluster C in G ′ such that K i j , K i j+1 , K i j+2 ⊆ C, where here and below the indices are taken modulo 4m i .Observe that there are no edges between K i j and K i j+2 in G.If j is odd, then all of these non-edges are non-packed.All of these non-edges are thus in S.This is a contradiction to the fact that S contains at most |H| vertex pairs.Thus, j is even.We now show that for each k ∈ N ∪ {0}, pair K i j+1+2k , K i j+2+2k is merged by induction on k.Clearly, for k = 0, this holds by supposition.If k > 0 then, by the construction of H var , there are non-packed non-edges between K i j+2k−1 and K i j+2k+1 .Combining this with the fact that K i j+1+2(k−1) = K i j+2k−1 and K i j+2+2(k−1) = K i j+2k are merged by inductive assumption, it follows that K i j+2k and K i j+2k+1 are separated.Since there is a P 3 in G connecting K i j+2k , K i j+2k+1 , and K i j+2k+2 with center in K i j+2k+1 and S contains at most one edit in this P 3 , it follows that K i j+2k+1 , K i j+2k+2 are merged, as required.It now follows in particular that K i j−1 and K i j are merged (recall that indices are taken modulo 4m i ).Since by assumption also K i j and K i j+1 are merged, we have that K i j ′ , K i j ′ +1 , and K i j ′ +2 are contained in the same cluster in G ′ for some odd j ′ .As already argued, this leads to a contradiction.Thus the claim holds.
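The claim can be sanity-checked on a clique-level abstraction of a single variable gadget. The sketch below is a simplification and not the gadget itself: it records only, for each consecutive pair of cliques on the cycle, whether that pair is merged, and it imposes the two local constraints used in the argument above (for even j, at least one of the pairs j and j + 1 is merged; for odd j, not both, because of the non-packed non-edges between K^i_j and K^i_{j+2}). The function name and the Boolean encoding are hypothetical.

```python
from itertools import product

def feasible_merge_patterns(length):
    """Enumerate merge patterns on a cycle of `length` cliques (length = 4 * m_i).

    pattern[j] is True iff the pair (K_j, K_{j+1}) is merged, indices mod length.
    """
    patterns = []
    for pattern in product((False, True), repeat=length):
        ok = True
        for j in range(length):
            a, b = pattern[j], pattern[(j + 1) % length]
            if j % 2 == 0 and not (a or b):
                ok = False  # the packed P3 centered in K_{j+1} would need two edits
            if j % 2 == 1 and a and b:
                ok = False  # K_j and K_{j+2} would be merged across non-packed non-edges
        if ok:
            patterns.append(pattern)
    return patterns
```

Under these two constraints, exhaustive enumeration on short cycles leaves only the pattern that merges all even pairs and the pattern that merges all odd pairs, in line with cases (i) and (ii).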
We define the assignment α as follows.For each variable Otherwise α(x i ) = true.We now show that α satisfies Φ.Consider an arbitrary clause Γ d of Φ containing the three variables x a , x b , and x c .We use the same notation as when defining the clause gadget and its connection to the variable gadget.Since there are non-packed non-edges between cliques Q We now show that case (i), (ii), and (iii) imply that variable x a , x b , and x c , respectively, is set by α so as to satisfy Γ d .We only give the proof showing that case (i) implies that x a is set accordingly.The other cases are analogous.
Assume that case (i) holds.Then, by the constraints imposed by the two transferring P 3 s P (for example, the P 3 given by w 1 v 1 v 3 ).It follows that K a 4π(a,d)+1 , and K a 4π(a,d) are merged, showing that at least one even pair is merged in x a 's variable gadget.Thus, α(x a ) = false.
Thus each clause Γ d is satisfied, finishing the proof.

XP-algorithm for half-integral packings
In this section, we study CEaMP in the special setting where every vertex is incident with at most two P 3 s of the packing H.More precisely, we consider the following variant of CEaMP.
Cluster Editing above Half-Integral Modification-Disjoint P3 Packing (CEaHMP)
Input: A graph G = (V, E), a modification-disjoint packing H of induced P3s of G such that every vertex v ∈ V is incident with at most two P3s of H, and a non-negative integer ℓ.
Question: Is there a cluster editing set, i.e., a set S of vertex pairs such that G△S is a union of disjoint cliques, with |S| − |H| ≤ ℓ?
We give a polynomial-time algorithm to solve CEaHMP when ℓ is a fixed constant, in contrast with the NP-hardness of the general version of CEaMP when ℓ = 0.
Theorem 2 (Restated). Cluster Editing above Half-Integral Modification-Disjoint P3 Packing parameterized by the number ℓ of excess edits is in XP. It can be solved in n^{2ℓ+O(1)} time, where n is the number of vertices in the input graph.
The main tool in proving Theorem 2 is a polynomial-time algorithm for the case where ℓ = 0:
Theorem 3. Cluster Editing above Half-Integral Modification-Disjoint P3 Packing can be solved in polynomial time when ℓ = 0, that is, when no excess edits are allowed.
The proof of Theorem 3 will be given in Section 5.1.With this tool in hand, we can show Theorem 2.
Proof of Theorem 2. Let (G, H, ℓ) be an instance of CEaHMP.The algorithm is given in Algorithm 1. Essentially, it guesses (by trying all possibilities) the number, ℓ a , of excess edits that are not contained in any P 3 in H and guesses the concrete edits to be made (Lines 1-4).Then it guesses the P 3 s in H that harbor the remaining excess edits and it guesses how these P 3 s are resolved (Lines 5-9).Then it checks whether the remaining instance has a cluster-editing set without excess edits over the remaining P 3 packing H ′ using the algorithm from Theorem 3.
For the running time, observe that there are at most n^{2ℓ_a} choices for S_a. Since each vertex is in at most two P3s in H and each P3 covers exactly three vertices, we have 3|H| ≤ 2n and thus there are in total at most n P3s in H. Thus, there are O(n^{ℓ_b}) choices for H_b. Since there are four possibilities to select a set of at least two vertex pairs in the vertex set of a P3, there are O(4^{ℓ_b}) possibilities for S_b in Line 6. Hence, overall the running time is O(4^{ℓ_b} · n^{2ℓ_a+ℓ_b+O(1)}) ≤ n^{2ℓ+O(1)}.
Algorithm 1: Solve CEaHMP.
Output: Whether (G, H, ℓ) is a YES-instance.
1 foreach ℓ_a = 0, 1, . . ., ℓ do
It remains to prove the correctness. If the algorithm accepts, then there is a cluster-editing set S_0 for G_b with |H′| edits. Since S_0 is contained in the vertex sets of the P3s in H′, set S_0 is disjoint from S_a and S_b. Thus, G△S⋆ is a cluster graph where a be the subset of S⋆ that contains precisely those edits in S⋆ that are not contained in P3s of H. In one of the iterations of Algorithm 1, ℓ_a = |S⋆_a| and S_a = S⋆_a. Now let H⋆_b be the subset of H that contains precisely those P3s P such that S⋆ contains at least two edits in V(P).
Thus, in that iteration the algorithm proceeds to the if-condition in Line 10. Again since each edit in S ⋆ 0 is contained in a unique P 3 in H \ H ⋆ b , this set witnesses that (G b , H ′ , 0) is a YES-instance and thus the algorithm accepts.Hence, the algorithm is correct.
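The guessing scheme behind Algorithm 1 can be sketched as follows. This is only an illustration under stated assumptions: the names `solve_ceahmp`, `zero_excess_bruteforce`, `norm`, and `p3_pairs` are hypothetical, a P3 is encoded as a triple (x, y, z) with center y, and the zero-excess check is an exponential-time exhaustive stand-in for the polynomial-time algorithm of Theorem 3, which is not reproduced here.

```python
from itertools import combinations, product

def norm(u, v):
    """Canonical representation of the vertex pair {u, v}."""
    return (u, v) if u < v else (v, u)

def p3_pairs(p3):
    x, y, z = p3
    return [norm(x, y), norm(y, z), norm(x, z)]

def is_cluster_graph(n, edges):
    """A graph is a cluster graph iff it contains no induced P3."""
    adj = [set() for _ in range(n)]
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return all(norm(a, b) in edges
               for v in range(n) for a, b in combinations(adj[v], 2))

def zero_excess_bruteforce(n, edges, packing):
    """Stand-in for Theorem 3: try exactly one edit per packed P3."""
    for choice in product(*(p3_pairs(p) for p in packing)):
        if is_cluster_graph(n, edges ^ set(choice)):
            return True
    return False

def solve_ceahmp(n, edges, packing, ell, zero_excess=zero_excess_bruteforce):
    edges = {norm(u, v) for u, v in edges}
    packed = {q for p in packing for q in p3_pairs(p)}
    free = [q for q in combinations(range(n), 2) if q not in packed]
    for ell_a in range(ell + 1):                      # excess edits outside H
        for s_a in combinations(free, ell_a):
            for size_b in range(ell - ell_a + 1):     # P3s with >= 2 edits
                for h_b in combinations(range(len(packing)), size_b):
                    options = [[set(c) for k in (2, 3)
                                for c in combinations(p3_pairs(packing[i]), k)]
                               for i in h_b]          # four options per P3
                    for s_b in product(*options):
                        if ell_a + sum(len(s) - 1 for s in s_b) > ell:
                            continue                  # budget exceeded
                        edits = set(s_a).union(*s_b)
                        rest = [p for i, p in enumerate(packing) if i not in h_b]
                        if zero_excess(n, edges ^ edits, rest):
                            return True
    return False
```

For instance, on a claw with center 0 and the packed P3 (1, 0, 2), the sketch rejects for ℓ = 0 and accepts for ℓ = 1, since two edits are needed while the packing has size one.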

Polynomial-time algorithm for zero excess edits
Let Cluster Editing Matching Half-Integral Modification-Disjoint P 3 Packing (CEMHMP) be the special case of CEaHMP where ℓ = 0.That is, an instance of CEMHMP is given by a tuple (G, H) of a graph G and a half-integral P 3 packing H in G.In this section we give a polynomial-time algorithm for CEMHMP.Again, we use the term proto-clusters to denote the connected components of the graph obtained by removing the edges of all packed P 3 s.
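As a concrete reading of the definition of proto-clusters, the following sketch computes them with a union-find structure over the non-packed edges. The function name and the encoding (vertices 0, . . ., n − 1, P3s as triples with the center in the middle) are assumptions made for illustration.

```python
def proto_clusters(n, edges, packing):
    """Return, for each vertex, a representative of its proto-cluster.

    Proto-clusters are the connected components of the graph that remains
    after removing the edges of all packed P3s.
    """
    packed = set()
    for x, y, z in packing:
        packed.update({frozenset((x, y)), frozenset((y, z)), frozenset((x, z))})

    parent = list(range(n))

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for u, v in edges:
        if frozenset((u, v)) not in packed:
            parent[find(u)] = find(v)      # union along a non-packed edge
    return [find(v) for v in range(n)]
```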
The intuition behind the polynomial-time result is that, with the constraint that every vertex v ∈ V (G) is incident with at most two packed P 3 s, we cannot freely merge or separate two large proto-clusters without excess edits as in the NP-hardness proof of Section 4. This is because the triangles formed by the packed P 3 s cannot cover every vertex pair between two large proto-clusters.Thus we can separate the large proto-clusters and deal with them separately.
The polynomial-time algorithm mainly proceeds by applying reduction rules that simplify the instance step by step.Herein, our first goal is to eliminate proto-clusters of size at least four, which can be done by a series of straightforward reduction rules (Section 5.1.1).We then look at proto-clusters of size three and observe that their connections to the rest of the graph have quite a limited structure.This observation can be used to eliminate proto-clusters of size three as well (Section 5.1.2).The reduction rules we have developed at this point give more structural observations on smaller proto-clusters which can be used to show that the size of solution clusters is at most four (Section 5.1.3).Afterwards, we show that the only situation in which solution clusters of size four can occur is when there is a certain path-like structure in the instance.A final, quite involved reduction rule takes care of such path-like structures (Section 5.1.4).This then results in an instance with a solution whose clusters have size at most three.Using this cluster-size bound we can finally show that, if there is a solution, then there is also one that only deletes edges.This then leads to a formulation as an instance of 2-SAT (Section 5.1.5),which is well-known to be polynomial-time solvable.
We use the following notation. We say a proto-cluster C is isolated from a proto-cluster D if there are no edges of G between C and D. We classify the P3s of H into four types. For an induced P3 xyz ∈ H:
• if x, y belong to one proto-cluster and z belongs to another proto-cluster, or symmetrically y, z belong to one proto-cluster and x belongs to another proto-cluster, then xyz is a type-α P3;
• if x, z belong to one proto-cluster and y belongs to another proto-cluster, then xyz is a type-β P3;
• if x, y, z belong to three distinct proto-clusters, then xyz is a type-γ P3; and
• if x, y, z belong to one proto-cluster, then xyz is a type-δ P3.
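The four types can be read off directly from the proto-cluster of each vertex. A minimal sketch, assuming a P3 is given as a triple (x, y, z) with center y and that `comp` maps each vertex to an identifier of its proto-cluster (both the function name and this encoding are hypothetical):

```python
def p3_type(p3, comp):
    """Classify a packed P3 xyz (center y) as 'alpha', 'beta', 'gamma', or 'delta'."""
    x, y, z = p3
    cx, cy, cz = comp[x], comp[y], comp[z]
    if cx == cy == cz:
        return "delta"   # all three vertices in one proto-cluster
    if cx == cz:
        return "beta"    # both endpoints together, center elsewhere
    if cx == cy or cy == cz:
        return "alpha"   # center together with exactly one endpoint
    return "gamma"       # three distinct proto-clusters
```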
As mentioned, in the following, we present a series of reduction rules, which are algorithms that take an instance of CEMHMP and produce a new instance of CEMHMP.By saying that a reduction rule is safe, we mean that the instance before applying this reduction rule is a YES-instance if and only if the instance after applying this reduction rule is a YES-instance.Since the P 3 s of H are modification-disjoint, we have the following handy observation.
Observation 2. A solution S to an instance of CEMHMP must edit exactly one edge or non-edge of every P 3 of H, and neither non-packed edges nor non-packed non-edges can be edited by S.

Simple reduction rules
We start by getting rid of several simple situations.
Reduction Rule 1. For any proto-cluster C, if there are two vertices u, v ∈ V(C) such that uv is a non-packed non-edge, i.e., uv is a non-edge that is not covered by any P3 of H, then return NO.
Proof. Given an instance (G, H) of CEMHMP satisfying the condition of Reduction Rule 1, suppose for contradiction that there is a solution S to this instance. Since u, v belong to the same proto-cluster, there is a non-packed path P from u to v. By Observation 2, uv ∉ S and none of the edges of P is edited by S. Thus u and v are connected but non-adjacent in G△S, so G△S is not a cluster graph, contradicting that S is a solution. This completes the proof.
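The condition of Reduction Rule 1 is a direct scan over vertex pairs. The sketch below assumes vertices 0, . . ., n − 1 and a list `comp` giving the proto-cluster identifier of each vertex; the function name is hypothetical.

```python
from itertools import combinations

def rule1_violation(comp, edges, packing):
    """Return a non-packed non-edge inside a proto-cluster, or None if there is none.

    If a pair is returned, Reduction Rule 1 answers NO.
    """
    edge_set = {frozenset(e) for e in edges}
    packed = {frozenset(q)
              for x, y, z in packing
              for q in ((x, y), (y, z), (x, z))}
    for u, v in combinations(range(len(comp)), 2):
        pair = frozenset((u, v))
        if comp[u] == comp[v] and pair not in edge_set and pair not in packed:
            return (u, v)
    return None
```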
The second reduction rule handles type-β and type-δ P 3 s (see Fig. 6).

Reduction Rule 2. If there is a type-β or type-δ P3 xyz ∈ H, insert the edge xz and remove xyz from H.
Lemma 6. Reduction Rule 2 is safe.
Proof. Suppose that the given instance of CEMHMP is (G, H) and that there exists a type-β P3 xyz ∈ H. After inserting the edge xz and removing xyz from H, we get an instance (G′, H′). We claim that (G, H) is a YES-instance if and only if (G′, H′) is a YES-instance. On the one hand, suppose that (G′, H′) is a YES-instance and S′ is a cluster editing set of G′ such that |S′| = |H′|. Then S′ ∪ {xz} is a cluster editing set for G and |S′ ∪ {xz}| = |H|. On the other hand, suppose that (G, H) is a YES-instance and S is a cluster editing set of G such that |S| = |H|. We show that xz ∈ S, so that S \ {xz} is a solution for (G′, H′). For contradiction, suppose that xz ∉ S. Then, by Observation 2, either xy ∈ S or yz ∈ S holds. Without loss of generality we assume that xy ∈ S. Suppose that after deleting xy from G and removing xyz from H, we get an instance (G′′, H′′). Since x, z belong to one proto-cluster of G, there is a non-packed path P from x to z in G. Thus x, z belong to one proto-cluster of G′′. Since xyz is removed from H, xz becomes a non-packed non-edge. By Reduction Rule 1, (G′′, H′′) is a NO-instance, contradicting that S is a solution to (G, H).
A similar analysis applies to the case that xyz ∈ H is a type-δ P 3 .This completes the proof for the lemma.
After applying Reduction Rules 1 and 2 exhaustively, if the algorithm did not return NO, then there are no type-β or type-δ P3s in the instance. The next reduction rule applies to the case in which there is both a non-packed non-edge and a packed edge between two proto-clusters; see Fig. 7 for an illustration.
Reduction Rule 3. For any two proto-clusters A and B, if there is a non-packed non-edge uv such that u ∈ V(A) and v ∈ V(B), and there is a packed edge xy such that x ∈ V(A) and y ∈ V(B) (not necessarily distinct from u or v), then delete xy and remove the corresponding packed P3 from H.
Lemma 7. Reduction Rule 3 is safe.
Proof. Let (G, H) be an instance of CEMHMP satisfying the condition of Reduction Rule 3, and first assume that xy is covered by a type-γ P3 xyz. The case where x is the center vertex of the P3 instead of y is symmetric, so we do not analyze it separately. Let (G′, H′) be the instance of CEMHMP obtained by deleting xy and removing xyz from H. We claim that (G, H) is a YES-instance if and only if (G′, H′) is a YES-instance. For the soundness, assume that (G′, H′) is a YES-instance and S′ is a cluster editing set of size |H′| for G′. Then S′ ∪ {xy} is a solution to (G, H). For the completeness, assume that (G, H) is a YES-instance and S is a cluster editing set of size |H| for G. We claim that xy ∈ S. Suppose for contradiction that xy ∉ S. Then xy remains an edge in G△S. Since u, x ∈ V(A) and v, y ∈ V(B), there is a non-packed path P_A from u to x and a non-packed path P_B from v to y in G. By Observation 2, the edges of P_A and P_B are not edited by S and uv ∉ S. Thus there is a path from u to v in G△S. Since uv is a non-edge in G△S, G△S is not a cluster graph, contradicting the assumption that S is a solution to (G, H). Hence xy ∈ S, and S \ {xy} is a cluster editing set of size |H′| for G′.
A similar analysis applies to the case in which xy is covered by a type-α P 3 xyz (and its symmetrical case where x is the center vertex instead of y).This concludes the proof for the lemma.
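One application of Reduction Rule 3 might look as follows. This is a sketch under assumptions: vertices are 0, . . ., n − 1, `comp` lists the proto-cluster identifier of each vertex, P3s are triples with the center in the middle, and the function name is hypothetical. The proto-clusters would have to be recomputed before the next rule application, since removing a P3 from H turns its remaining edge into a non-packed edge.

```python
def apply_rule3(comp, edges, packing):
    """Apply Reduction Rule 3 once; return (new_edges, new_packing) or None."""
    edge_set = {frozenset(e) for e in edges}
    owner = {}  # packed vertex pair -> index of the P3 covering it
    for idx, (x, y, z) in enumerate(packing):
        for q in ((x, y), (y, z), (x, z)):
            owner[frozenset(q)] = idx

    n = len(comp)
    blocked = set()  # pairs of proto-clusters joined by a non-packed non-edge
    for u in range(n):
        for v in range(u + 1, n):
            pair = frozenset((u, v))
            if comp[u] != comp[v] and pair not in edge_set and pair not in owner:
                blocked.add(frozenset((comp[u], comp[v])))

    for e in sorted(edge_set, key=sorted):  # fixed order for reproducibility
        x, y = sorted(e)
        if e in owner and frozenset((comp[x], comp[y])) in blocked:
            new_packing = [p for i, p in enumerate(packing) if i != owner[e]]
            return edge_set - {e}, new_packing  # delete xy, drop its P3
    return None
```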
The next reduction rule deals with isolated cliques in graph G.
Reduction Rule 4. If there is a proto-cluster C which is an isolated clique of G, then remove C from the graph.
Lemma 8. Reduction Rule 4 is safe.
Proof.Given an instance (G, H) of CEMHMP such that there is a proto-cluster C which is an isolated clique, we remove C from G and get an instance (G ′ , H).We claim that (G, H) is a YES-instance if and only if (G ′ , H) is a YES-instance.On one hand, assume that (G ′ , H) is a YES-instance.Then obviously (G, H) is a YES-instance.On the other hand, assume that (G, H) is a YES-instance and S is a solution.Since C is an isolated clique, by Observation 2, neither edges of C nor non-edges between V (C) and V (G) \ V (C) are edited by S. Thus S is also a solution to (G ′ , H).This completes the proof for the lemma.
In later analysis, we will see that some constant-size configurations cannot be connected to the rest of the graph.To remove such configurations, we introduce the following reduction rule.
Reduction Rule 5. If there is a connected component C in G of size at most 6, then check by brute force whether there is a cluster editing set F for C such that |F| is equal to the number of packed P3s incident with a vertex of C. If there is such a cluster editing set F, then apply the edits of F to C and remove the corresponding packed P3s from H. Otherwise, return NO.
Lemma 9. Reduction Rule 5 is safe.
Proof.Given an instance (G, H) of CEMHMP such that there is a connected component C in the graph of size at most 6, suppose that there is a cluster editing set F for C satisfying the condition of Reduction Rule 5.After performing the operations of F , we get an instance (G ′ , H ′ ) of CEMHMP.We claim that (G, H) is a YES-instance if and only if (G ′ , H ′ ) is a YES-instance.On one hand, assume that (G ′ , H ′ ) has a solution S ′ .Obviously, S ′ ∪ F is a cluster editing set for G and |S ′ ∪ F | = |H|.On the other hand, assume that (G, H) has a solution S. By Observation 2, no vertex pair between V (C) and V (G) \ V (C) is edited by S. Let S 1 ⊆ S be the set of vertex pairs which are edges or non-edges of C. Then S \ S 1 is a solution to (G ′ , H ′ ).
Suppose that there is no such cluster editing set F for C. We claim that (G, H) is a NO-instance.For contradiction, assume that (G, H) has a solution S. Let S 1 ⊆ S be the set of vertex pairs which are edges or non-edges of C. Then S 1 is a cluster editing set for C and |S 1 | is equal to the number of packed P 3 s incident with a vertex of C by Observation 2, a contradiction.Thus (G, H) is a NO-instance.
The component C is of size at most 6 so we can do brute force in constant time.This completes the proof for the lemma.
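The brute-force step of Reduction Rule 5 can be sketched as follows. This is an illustrative sketch and not the authors' implementation: the names `is_cluster_graph` and `brute_force_component` are hypothetical, vertex labels are assumed to be sortable, and `budget` stands for the number of packed P3s incident with the component. A component on at most 6 vertices has at most 15 vertex pairs, so the enumeration is bounded by a constant.

```python
from itertools import combinations


def is_cluster_graph(vertices, edges):
    """True iff the graph has no induced P3, i.e. it is a union of disjoint cliques."""
    adj = {v: set() for v in vertices}
    for e in edges:
        a, b = tuple(e)
        adj[a].add(b)
        adj[b].add(a)
    for v in vertices:
        # Two non-adjacent neighbours of v would form an induced P3 centred at v.
        for a, b in combinations(sorted(adj[v]), 2):
            if b not in adj[a]:
                return False
    return True


def brute_force_component(vertices, edges, budget):
    """Try every set of exactly `budget` vertex pairs as an edit set.

    Returns one cluster editing set of that size, or None if none exists.
    """
    edge_set = {frozenset(e) for e in edges}
    pairs = [frozenset(p) for p in combinations(sorted(vertices), 2)]
    for candidate in combinations(pairs, budget):
        edited = edge_set ^ set(candidate)  # symmetric difference E △ F
        if is_cluster_graph(vertices, edited):
            return set(candidate)
    return None
```

As a usage example, take the five-vertex configuration with a triangle u1, u2, u3, a vertex v adjacent to u2 and u3, and a vertex w adjacent to v. With budget 2 the search succeeds, while with budget 1 it returns None.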
We now move to analyzing the size of the remaining proto-clusters.
Lemma 10. After applying Reduction Rules 1 to 4 exhaustively, if the algorithm did not return NO, then there is no proto-cluster of size at least 5.
Proof.Suppose for contradiction that there is a proto-cluster C of size at least 5.If C is a proto-cluster which is isolated from other proto-clusters, then C must be a clique since otherwise Reduction Rule 1 or Reduction Rule 2 can be applied, a contradiction.Then Reduction Rule 4 can be applied and C will be removed from the graph.Thus C is not an isolated proto-cluster.
Let D be a proto-cluster such that there is an edge uv between C and D, say u ∈ V (C) and v ∈ V (D).If uv is covered by a type-β P 3 , then Reduction Rule 2 can be applied, a contradiction.Thus we assume that uv is covered by a type-α or a type-γ P 3 .Since v is incident with at most two packed P 3 s, there must be one vertex w ∈ V (C) such that wv is a non-packed non-edge.Then Reduction Rule 3 can be applied, a contradiction.As a result, there is no proto-cluster of size at least 5.This completes the proof for the lemma.
Next we focus on proto-clusters of size 4.
Lemma 11. After applying Reduction Rules 1 to 3 exhaustively, if there is a proto-cluster C of size 4 which is not an isolated clique of G, then there is a proto-cluster D of size 1 such that the vertex pairs between C and D are covered by two type-α P3s. In addition, V(C) ∪ V(D) forms a connected component in the graph.
Proof.After applying Reduction Rules 1 to 3 exhaustively, let C be a proto-cluster of size 4 and V (C) = {v 1 , v 2 , v 3 , v 4 }.See Fig. 8 for an illustration.Let w be a vertex such that there is an edge between w and V (C).If the vertex pairs between V (C) and w are not covered by two type-α P 3 s, then either there is a non-packed non-edge between C and D or there is a type-β P 3 between C and D. Thus Reduction Rule 2 or 3 can be applied, a contradiction.Without loss of generality, suppose that v 1 v 2 and v 3 v 4 are covered by these two type-α P 3 s.Assume for contradiction that there is another vertex u such that u and (without loss of generality) v 1 are adjacent, and uv 1 is a packed edge.Since we have applied Reduction Rule 2 exhaustively, there are neither type-β nor type-δ P 3 s in the graph.Thus uv 1 must be covered by a type-α or a type-γ P 3 .We claim that there must be a non-packed non-edge from u to a vertex of C. For contradiction, suppose this is not true.Then either v 1 v 4 , v 2 v 3 are covered by two type-α P 3 s respectively, or v 1 v 3 , v 2 v 4 are covered by two type-α P 3 s respectively.In both cases, v 1 , v 2 , v 3 and v 4 are not in one proto-cluster anymore since after removing the packed edges, v 1 , v 2 , v 3 and v 4 are not in one connected component, a contradiction.Thus there must be a non-packed non-edge between V (C) and u.Since uv 1 is a packed edge, Reduction Rule 3 can be applied to C and the proto-cluster containing u, a contradiction.Thus there are no edges between V (C) and any other vertices except w.
Suppose that w belongs to a clique of size at least two.Then there must be a non-packed non-edge and a packed edge between C and D (there cannot be more than two packed P 3 s between a proto-cluster of size 4 and another proto-cluster).Thus Reduction Rule 3 can be applied, a contradiction.Thus w belongs to a proto-cluster of size one and let this proto-cluster be D. Since w is already incident with two packed P 3 s, w is isolated from any other proto-clusters except C. Obviously, V (C) ∪ V (D) forms a connected component in the graph.This completes the proof for the lemma.
Lemma 12.After applying Reduction Rules 1 to 5 exhaustively, there is no proto-cluster of size 4.
Proof. Suppose for contradiction that there is a proto-cluster C of size at least 4. By Lemma 10, C has size exactly 4. If C is an isolated proto-cluster, C must be a clique since otherwise Reduction Rule 1 or 2 can be applied, a contradiction. Then Reduction Rule 4 can be applied and C will be removed from the graph. Thus C is not an isolated proto-cluster. By Lemma 11, there is a proto-cluster D of size 1 such that V(C) ∪ V(D) forms a connected component of size 5 in the graph. Then Reduction Rule 5 can be applied, a contradiction. As a result, there is no proto-cluster of size at least 4. This completes the proof for the lemma.
Summarizing, using Reduction Rules 1 to 5 we have removed all proto-clusters of size at least four.
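To make the size bounds on proto-clusters concrete, the following sketch computes proto-clusters under an assumption consistent with how the proofs above use the term: a proto-cluster is a connected component of G after all packed edges are removed. The function name and the representation of a packed P3 as a triple (x, y, z) with center y are hypothetical.

```python
def proto_clusters(vertices, edges, packing):
    """Connected components of G restricted to non-packed edges.

    `packing` is a list of triples (x, y, z); y is the centre of the P3,
    so xy and yz are its packed edges and xz is its packed non-edge.
    """
    edge_set = {frozenset(e) for e in edges}
    packed_edges = set()
    for x, y, z in packing:
        packed_edges |= {frozenset((x, y)), frozenset((y, z))}
    adj = {v: set() for v in vertices}
    for e in edge_set - packed_edges:
        a, b = tuple(e)
        adj[a].add(b)
        adj[b].add(a)
    seen, clusters = set(), []
    for start in vertices:
        if start in seen:
            continue
        comp, stack = set(), [start]
        seen.add(start)
        while stack:
            x = stack.pop()
            comp.add(x)
            for y in adj[x]:
                if y not in seen:
                    seen.add(y)
                    stack.append(y)
        clusters.append(comp)
    return clusters
```

On a triangle u1, u2, u3 with v adjacent to u2 and u3, w adjacent to v, and the packing {u1u3v, u2vw}, the sketch returns one proto-cluster of size 3 and two of size 1.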

Decreasing the proto-cluster size and structural observations
Next, we focus on the structure of proto-clusters of size three and how to remove them as well. First, we observe what the connections around proto-clusters of size three look like. See Fig. 9 for an illustration of these connections.
Lemma 13.After applying Reduction Rules 1 to 4 exhaustively, if there is a proto-cluster C of size 3, then there must be a proto-cluster B of size 1 and a proto-cluster A of size 1, such that the vertex pairs between C and B are covered by a type-α P 3 and a type-γ P 3 , and the type-γ P 3 connects C and A via B. In addition, C is isolated from any other proto-clusters except B, and B is isolated from any other proto-clusters except A and C.
Proof.After applying Reduction Rules 1 to 4 exhaustively, let C be a proto-cluster of size 3.If C is isolated from other proto-clusters, then C must be a clique since otherwise Reduction Rule 1 can be applied.However, then Reduction Rule 4 can be applied, a contradiction.Thus we assume that C is not an isolated proto-cluster.
Let the three vertices of C be u 1 , u 2 , and u 3 .Let v be a vertex such that there is an edge between v and V (C).If the vertex pairs between V (C) and v are not covered by a type-α P 3 and a type-γ P 3 , then Reduction Rule 2 or 3 can be applied as v can be incident with at most two packed P 3 s, a contradiction.Without loss of generality, suppose that u 1 , u 3 , and v belong to a type-α P 3 .Assume for contradiction that there is another vertex w such that w is adjacent to some vertex of V (C) (w can either belong to the same proto-cluster as v or belong to a different proto-cluster from v).If the vertex pairs between V (C) and w are not covered by a type-α P 3 and a type-γ P 3 , then Reduction Rule 2 or 3 can be applied to the corresponding P 3 or proto-clusters, a contradiction.If the vertex pairs between V (C) and w are covered by a type-α P 3 and a type-γ P 3 , say u 1 , u 2 and w belong to the type-α P 3 , then u 1 , u 2 , and u 3 are not in one proto-cluster, a contradiction.It follows that there is no vertex adjacent to one of the vertices of V (C) except v.
Let B be the proto-cluster to which v belongs.Assume for contradiction that |B| > 1 and there is another vertex y belonging to B. As argued above, y is not adjacent to any vertex of V (C) and there is a non-packed non-edge between V (B) and V (C).Thus Reduction Rule 3 can be applied, a contradiction.It follows that |B| = 1 and C is isolated from any other proto-clusters except B. We have assumed that u 1 , u 3 and v belong to a type-α P 3 .As argued above, u 2 v is covered by a type-γ P 3 .Let u 2 vx be that type-γ P 3 where x belongs to a proto-cluster A. We claim that |A| = 1.Suppose for contradiction that |A| > 1 and there is another vertex z ∈ V (A).Then vz must be a non-packed non-edge since v is already incident with two packed P 3 s.Thus Reduction Rule 3 can be applied, a contradiction.It follows that |A| = 1.This concludes the proof for the lemma.
Lemma 13 now suffices to determine a solution around proto-clusters of size three.See Fig. 9 for an illustration of the following Reduction Rule 6.
Reduction Rule 6. After applying Reduction Rules 1 to 4 exhaustively, if there is a proto-cluster C of size 3, a proto-cluster B of size 1 and a proto-cluster A of size 1 such that C is not isolated from B, and a type-γ P3 connects C and A via B, then delete the packed edge between A and B, insert an edge at the packed non-edge between C and B, and remove the corresponding P3s from H.
Lemma 14. Reduction Rule 6 is safe.
Proof.Given an instance (G, H) of CEMHMP satisfying the condition of Reduction Rule 6, let u 1 , u 2 , and u 3 be the three vertices of C, let v be the vertex of B and w be the vertex of A. Without loss of generality, let u 1 u 3 v and u 2 vw be two packed P 3 s.After applying Reduction Rule 6, we get an instance (G ′ , H ′ ) of CEMHMP.We claim that (G, H) is a YES-instance if and only if (G ′ , H ′ ) is a YES-instance.
For the soundness, suppose that (G′, H′) is a YES-instance and S′ is a cluster editing set of G′ such that |S′| = |H′|. Then S′ ∪ {u1v, vw} is a cluster editing set of G of size |H|, so (G, H) is a YES-instance.
For the completeness, suppose that (G, H) is a YES-instance and S is a cluster editing set of G such that |S| = |H|. If vw ∈ S, then u2v becomes a non-packed edge between C and B after removing the P3 u2vw from H. Thus, in this case we have u1v ∈ S as well by Reduction Rule 2, that is, {u1v, vw} ⊆ S. Then S′ = S \ {u1v, vw} is a solution to (G′, H′) because by Lemma 13, C and B are isolated from the rest of the graph.
Thus, assume vw ∉ S from now on. Then, either u2w ∈ S or u2v ∈ S. First, we assume that u2w ∈ S, and after inserting u2w and removing u2vw from H we get an instance (G′′, H′′) of CEMHMP. Observe that since C is a proto-cluster and u1u3 is packed, u2u3 is not packed. Thus, u3u2w is a non-packed path in G′′ and u3w is a non-packed non-edge. Thus Reduction Rule 1 can be applied to (G′′, H′′) and (G′′, H′′) is a NO-instance. This contradicts the fact that S is a solution to (G, H). Thus, we have u2v ∈ S. After deleting u2v and removing u2vw from H, u2v becomes a non-packed non-edge. Thus Reduction Rule 3 can be applied, showing u3v ∈ S. By Lemma 13, C is isolated from any other proto-clusters except B, and B is isolated from any other proto-clusters except A and C. It follows that in G△S, u1, u2 and u3 form a clique of size 3 while v and w form a clique of size 2. Furthermore, V(G) \ {u1, u2, u3, v, w} forms a cluster graph in G△S. Let Ŝ = (S \ {u2v, u3v}) ∪ {vw, u1v}. Then G△Ŝ is also a cluster graph and |Ŝ| = |H|. Thus Ŝ is also a solution to (G, H). It follows that Ŝ \ {vw, u1v} is a solution for (G′, H′). This completes the proof for the lemma.
Corollary 2. After applying Reduction Rules 1 to 6 exhaustively, there are no isolated cliques in the instance and every proto-cluster of the instance is of size at most 2.Moreover, since the edge in a proto-cluster of size 2 cannot be a packed edge, every packed P 3 in the remaining graph is a type-γ P 3 .

Reducing the size of solution clusters
In the previous section we have successfully removed all proto-clusters of size at least 3. Suppose that after applying Reduction Rules 1 to 6 exhaustively, we have an instance (G, H) of CEMHMP.Suppose that S is a solution to (G, H).Now we consider the size of the clusters in the cluster graph G△S.We first show that the largest clique in this graph has size at most 6.
Lemma 15.After applying Reduction Rules 1 to 6 exhaustively, we have an instance (G, H) of CEMHMP.Suppose that S is a solution to (G, H).Then there is no clique of size larger than 6 in G△S.
Proof.Suppose for contradiction that A is a clique of size at least 7 in G△S and let u be a vertex in A.
Then there are at least six vertex pairs between {u} and V(A) \ {u}, each of which is either a non-packed edge or covered by a packed P3. Since u is incident with at most two packed P3s, at most four vertex pairs between {u} and V(A) \ {u} are covered by a packed P3. Thus at least two vertex pairs between {u} and V(A) \ {u} are non-packed edges, so u lies in a proto-cluster of size at least 3. However, by Corollary 2, every proto-cluster in G is of size at most 2, a contradiction. This completes the proof for the lemma.
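The counting in this proof can be written out as a short worked inequality. The symbol p(u), denoting the number of vertex pairs between u and V(A) \ {u} that are covered by packed P3s, is introduced here only for illustration. Each packed P3 containing u covers at most two pairs incident with u, and u lies in at most two packed P3s.

```latex
\[
  |V(A)\setminus\{u\}| \ge 6, \qquad
  p(u) \le 2 \cdot 2 = 4, \qquad\text{hence}\qquad
  |V(A)\setminus\{u\}| - p(u) \;\ge\; 6 - 4 \;=\; 2 .
\]
```

So at least two pairs at u are non-packed edges, which is what forces a proto-cluster of size at least 3.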
We can now determine more precisely the structure of potential cliques of size 6 in G△S.See Fig. 10 as an example.
Lemma 16.Let (G, H) be an instance of CEMHMP such that the size of every proto-cluster in G is at most 2. Let S be a solution to (G, H) and suppose that A is a clique of size exactly 6 in G△S.Then the following statements hold: • Every vertex pair between C 1 and C 2 , between C 1 and C 3 , and between C 2 and C 3 is covered by some P 3 of H.
Proof.Suppose for contradiction that u ∈ V (A) belongs to a proto-cluster of size 1 in G. Then there are five vertex pairs between {u} and V (A) \ {u}, which are covered by packed P 3 s.Since u belongs to at most two packed P 3 s, at most four vertex pairs between {u} and V (A) \ {u} are covered by a packed P 3 , a contradiction.
Next we show that the vertices of V (A) belong to three proto-clusters C 1 , C 2 , and C 3 of size 2 in G; see also Fig 10.We see that for every vertex v ∈ V (A), four of the vertex pairs between {v} and V (A) \ {v} are covered by packed P 3 s and the other one is a non-packed edge.Thus every vertex v ∈ V (A) belongs to two packed P 3 s.It follows that for each i ∈ [3] the proto-cluster C i is isolated from any other proto-cluster in G \ (V (C 1 ) ∪ V (C 2 ) ∪ V (C 3 )).Note that there are no type-α, type-β, or type-δ P 3 s in H anymore. Thus the edges between the proto-clusters in A are covered by type-γ P 3 s.Thus, without loss of generality, let xyz be a P 3 such that x ∈ V (C 1 ), y ∈ V (C 2 ) and z ∈ V (C 3 ).Thus, V (C 1 ) ∪ V (C 2 ) ∪ V (C 3 ) forms a connected component.This completes the proof for the lemma.
By the reduction rule that solved small connected components it follows that cliques of size 6 cannot exist in G△S.
Lemma 17.After applying Reduction Rules 1 to 6 exhaustively, we have an instance (G, H) of CEMHMP.Suppose that S is a solution to (G, H).Then there is no clique of size exactly 6 in G△S.
Proof. Suppose for contradiction that A is a clique of size exactly 6 in G△S. According to Lemma 16, V(A) induces a connected component of size exactly 6 in the input graph. Then Reduction Rule 5 or Reduction Rule 4 can be applied, a contradiction. This completes the proof for the lemma.
Now we consider the structure of potential cliques of size 5 in G△S. See Fig. 11 for examples.
Lemma 18.After applying Reduction Rules 1 to 3 exhaustively, let (G, H) be an instance of CEMHMP such that the size of every proto-cluster in G is at most 2 and S is a solution to (G, H).Suppose that A is a clique of size exactly 5 in G△S.Then there are four proto-clusters C i for i ∈ [4] such that the following statements hold: • The vertices of A belong to the three proto-clusters C 1 , C 2 , and C 3 or to the three proto-clusters C 2 , C 3 , and C 4 .
• Every vertex pair between C i and C j (i, j ∈ {1, 2, 3, 4}, i ̸ = j) is covered by a packed P 3 except that the vertex pair between C 1 and C 4 is a non-packed non-edge.
Proof.Suppose for a contradiction that at least three vertices of V (A) belong to proto-clusters of size 1 in G; say u, v, w ∈ V (A) belong to three distinct proto-clusters of size one, respectively, and two vertices of V (A), say x, y ∈ V (A) \ {u, v, w}, belong to a proto-cluster of size two or belong to two distinct proto-clusters of size one, respectively.It follows that every vertex pair of V (A) 2 is either a non-packed edge or covered by some P 3 of H. Then uv, wv, xv, yv are four vertex pairs that are covered by packed P 3 s.Since v is incident with at most two packed P 3 s, there are the two following cases: (a) We assume that u, v, x belong to a packed P 3 and w, v, y belong to another packed P 3 .We omit the symmetric case that u, v, y belong to a packed P 3 and w, v, x belong to another packed P 3 since the analysis is analogous.(b) We assume that u, v, w belong to a packed P 3 and x, v, y belong to another packed P 3 .
For case (a), uw, uy are also covered by one packed P 3 or two distinct packed P 3 s.If uw and uy are covered by one packed P 3 , then this P 3 is not modification disjoint with the packed P 3 covering w, v, y, a contradiction.If uw and uy are covered by two distinct packed P 3 s, then u is incident with three packed P 3 s, a contradiction.
For case (b), ux and uy are also covered by one packed P3 or by two distinct packed P3s. If ux and uy are covered by one packed P3, then it is not modification-disjoint with the packed P3 covering x, v, y, a contradiction. If ux and uy are covered by two distinct packed P3s, then u is incident with three packed P3s, a contradiction. As all cases lead to a contradiction, it follows that the vertices of V(A) belong to one proto-cluster of size 1 and two proto-clusters of size 2.
Next we show that the vertices in V Without loss of generality, let x, u 1 , v 1 belong to a packed P 3 and x, u 2 , v 2 belong to another packed P 3 .Then u 1 v 2 and u 2 v 1 must be covered by packed P 3 s since otherwise Reduction Rule 3 can be applied to C 2 and C 3 .
For a contradiction, assume that there are two vertices y 1 , y 2 such that y 1 , u 1 , v 2 belong to one packed P 3 and y 2 , u 2 , v 1 belong to another packed P 3 .Then y 1 u 2 , y 1 v 1 are non-packed non-edges since u 2 and v 1 are each already incident with two packed P 3 s.It then follows that Reduction Rule 3 can be applied, a contradiction.It follows that there is a single vertex y such that {y, u 2 , v 1 } and {y, u 1 , v 2 } are vertex sets of P 3 s in H. Let C 4 be the proto-cluster to which y belongs.
If |C4| > 1, then there must be a non-packed non-edge between C4 and C2 and a non-packed non-edge between C4 and C3. Thus Reduction Rule 3 can be applied, a contradiction. Thus |C4| = 1. Since u1, u2, v1, v2, x, y are all incident with two packed P3s, the subgraph induced by V(C1) ∪ V(C2) ∪ V(C3) ∪ V(C4) is isolated from the other parts of the graph. We can view the graph induced by V(C1) ∪ V(C2) ∪ V(C3) ∪ V(C4) as a complete graph on 6 vertices with five missing edges. Note that the edge between x and y is missing by the condition of this lemma. Suppose that {u1, u2, v1, v2, x, y} does not induce a connected component in G. This is only possible when every edge incident to x (symmetrically, y) is missing because a cut of a complete graph on 6 vertices minus one edge is of size at least 4. However, x (symmetrically, y) is incident with two packed P3s and thus at most two of the edges incident to x (symmetrically, y) are missing, a contradiction.
This completes the proof for the lemma.
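Several steps of the preceding proof rely on two properties of the packing: the packed P3s are modification-disjoint, and every vertex lies in at most two of them. A minimal sketch of a checker for these properties is given below. The function name `check_packing`, the parameter `max_per_vertex`, and the representation of a P3 as a triple (x, y, z) with center y are assumptions made for illustration.

```python
def check_packing(edges, packing, max_per_vertex=2):
    """Check that `packing` consists of induced P3s that are pairwise
    modification-disjoint and that no vertex lies in too many of them."""
    edge_set = {frozenset(e) for e in edges}
    used_pairs = set()   # edges and non-edges already claimed by some P3
    load = {}            # number of packed P3s per vertex
    for x, y, z in packing:
        xy, yz, xz = frozenset((x, y)), frozenset((y, z)), frozenset((x, z))
        # Induced P3 with centre y: xy and yz are edges, xz is a non-edge.
        if len({x, y, z}) < 3 or xy not in edge_set or yz not in edge_set or xz in edge_set:
            return False
        pairs = {xy, yz, xz}
        if used_pairs & pairs:
            return False  # shares an edge or non-edge with an earlier P3
        used_pairs |= pairs
        for v in (x, y, z):
            load[v] = load.get(v, 0) + 1
    return all(count <= max_per_vertex for count in load.values())
```

For instance, two P3s that both contain the non-edge u1v are rejected, which mirrors the contradictions derived in cases (a) and (b).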
As for cliques of size 6, the reduction rule that solved small connected components thus took care of cliques of size 5.
Lemma 19. Let (G, H) be an instance of CEMHMP obtained after applying Reduction Rules 1 to 6 exhaustively. Suppose that S is a solution to (G, H). Then there is no clique of size exactly 5 in G△S.
Proof. Suppose for contradiction that A is a clique of size exactly 5 in G△S. According to Lemma 18, V(A) belongs to a connected component of size 6 in the input graph. Then Reduction Rule 5 or Reduction Rule 4 can be applied, a contradiction. This completes the proof for the lemma.
Summarizing, after applying our reduction rules the cliques in G△S have size at most 4.

Path-like structures
Next, we aim to get rid of cliques of size 4.This will later enable us to reduce the instance of CEMHMP to 2-SAT.To take care of cliques of size 4, we use a similar strategy as for cliques of size 5 or 6.We first consider the structure of the proto-clusters taking part in the clique and we then devise reduction rules that remove or simplify these proto-clusters.The structure here is more involved.In particular, it is in general not true anymore that cliques of size 4 are contained in small connected components.However, as we will see, these cliques take part in a path-like structure that can either be solved locally, or that behaves analogously to a P 4 , see Fig. 13 later on.The following lemma formalizes the underlying structure that may contain cliques of size 4.
Lemma 20.After applying Reduction Rules 1 to 6 exhaustively, let (G, H) be an instance of CEMHMP.
Let S be a solution to (G, H).Suppose that A is a clique of size 4 in G△S and V (A) = {x, y, z 1 , z 2 }.Then the following statements hold: (1) Three vertices of V (A), say x, y, z 2 , belong to one packed P 3 in G, and one vertex of x, y, z 2 , say z 2 , together with z 1 forms a proto-cluster C 1 of size 2 in G.
(2) Vertices x and y form a proto-cluster C 2 of size 1 and a proto-cluster C 3 of size 1 in G, respectively.
(3) There are two vertices u and v such that x, u, z 1 belong to a packed P 3 in G and y, v, z 1 belong to another packed P 3 in G.
(4) Vertices u and v form a proto-cluster C 4 of size 1 and a proto-cluster C 5 of size 1 in G, respectively.
(5) u, v, z 2 cannot belong to the same packed P 3 .
Proof. We first show the part of Items (1) and (2) about the partition of V(A) into proto-clusters. For contradiction, suppose that V(A) does not belong to one proto-cluster of size 2 and two proto-clusters of size 1 in G. Then there are two cases: (i) Two vertices of V(A), say x1, x2, belong to a proto-cluster C2 of size two and the other two vertices of V(A), say y1, y2, belong to a proto-cluster C3 of size 2. (ii) All four vertices x1, x2, y1, y2 of V(A) belong to four distinct proto-clusters C1, C2, C3, and C4 of size 1, respectively.
Case (i): Since all vertex pairs between C2 and C3 need to be covered to form a clique of size 4, without loss of generality, assume that there is a vertex u ∉ V(A) such that u, x1, and y1 belong to a packed P3. Suppose that there is another vertex u′ ∉ V(A) ∪ {u} such that u′, x2, and y2 belong to a packed P3. Since neither u, x2, y1 nor u, x1, y2 could belong to a packed P3 (uy1 and uy2 are already covered by the assumed P3s), one of the vertex pairs ux2 and uy2 must be a non-packed non-edge and thus Reduction Rule 3 can be applied, a contradiction. Thus u, x2 and y2 belong to a packed P3. Similarly, we can show that there is another vertex v such that v, x1, y2 belong to a packed P3 and v, x2, y1 belong to a packed P3. It follows that each vertex of {x1, x2, y1, y2, u, v} is incident with two packed P3s. First we assume that u and v belong to two different proto-clusters, say C
Case (ii): Since the vertex pair between each pair of C1, C2, C3, and C4 needs to be covered to form a clique of size four and each vertex can be in at most two P3s, without loss of generality, assume that x1, x2, y1 belong to a packed P3. Pair x1y2 also needs to be covered by a packed P3; observe that by modification-disjointness of the packed P3s, the third vertex in this P3 cannot be contained in V(A). Thus, there is another vertex y3 ∉ V(A) such that x1, y2, y3 belong to a packed P3. The vertex pairs x2y2 and y1y2 cannot be covered by one packed P3 since x2y1 is already covered by a packed P3. Thus x2y2 and y1y2 need to be covered by two distinct P3s. However, then y2 is incident with three packed P3s, a contradiction. Therefore Case (ii) does not happen either. It follows that V(A) consists of one proto-cluster of size 2 and two proto-clusters of size 1.
Next we show that the claims on the P 3 s in Item (1) as well as Items ( 3) and ( 4) are true.Suppose that A is a clique of size 4 in G△S.Let V (A) = {x, y, z 1 , z 2 }.By the analysis above, we get that two vertices of A belong to a proto-cluster of size 2 and the other two vertices of A belong to two distinct proto-clusters of size 1 respectively.Without loss of generality, assume that z 1 , z 2 form a proto-cluster C 1 of size 2 in G while x and y form a proto-cluster C 2 of size 1 and a proto-cluster C 3 of size 1 in G respectively.See Fig. 12 for an illustration.
Since there are three vertex pairs, i.e., {xy, xz1, xz2}, between x and V(A) \ {x}, two of the three vertex pairs are covered by one packed P3. Moreover, this P3 cannot contain two vertices of C1. Without loss of generality, let thus x, y, z2 belong to a packed P3. Since xz1 is also covered by a P3 and this P3 is modification-disjoint to the one containing x, y, z2, there is another vertex u ∉ V(A) such that x, u, z1 belong to a packed P3 in G.
Figure 13: Examples of Reduction Rule 7. Vertices z1, z2 form a proto-cluster of size 2 and each of the other vertices belongs to a proto-cluster of size 1. Note that in Item (3) the P3 y, x, z2 is not fully specified by the conditions, that is, its packed non-edge could also be between different vertices.
Also yz1 needs to be covered by a packed P3, so there is another vertex v such that y, v, z1 belong to a packed P3 (u and v are different as otherwise the P3s induced by y, v, z1 and x, u, z1 are not modification-disjoint). Suppose that u and v belong to the same proto-cluster of size at least 2. By Corollary 2, this proto-cluster has size exactly 2. Since x and y are each incident with two packed P3s, uy and vx are two non-packed non-edges. Thus Reduction Rule 3 can be applied to the proto-clusters adjacent to these non-edges, a contradiction. It follows that u and v must belong to two distinct proto-clusters. Assume that there is a vertex u′ such that u′ and u belong to one proto-cluster of size at least two. Since x and z1 are each already incident with two packed P3s, u′x and u′z1 must be non-packed non-edges. Then Reduction Rule 3 can be applied since ux or uz1 is a packed edge. It follows that u belongs to a proto-cluster of size one, say C4. Similarly, we can show that v belongs to a proto-cluster of size one, say C5.
Finally we show that Item (5) is true. Suppose for contradiction that u, v, z2 belong to the same packed P3. Then every vertex of {u, v, x, y, z1, z2} is incident with two packed P3s. It follows that the subgraph induced by {u, v, x, y, z1, z2} is a connected component in G, which can be handled by Reduction Rule 5, a contradiction. Thus u, v, z2 cannot belong to the same packed P3. This completes the proof for the lemma.
We next leverage the structure observed in Lemma 20 in a reduction rule. Essentially, all the possible ways to realize the structure of Lemma 20 result in a situation that can either be solved directly, or can be replaced by a P5 with suitable new packed P3s.
Reduction Rule 7. After applying Reduction Rules 1 to 6 exhaustively, let C1, C2, C3, C4, and C5 be five proto-clusters such that
• x, y, z2 belong to a packed P3,
• x, u, z1 belong to a packed P3, and
• y, v, z1 belong to a packed P3.
Check which of the following conditions are satisfied and apply the corresponding data reduction.
If uz 2 and vz 2 are non-packed non-edges, then are neither edges nor packed non-edges from a, b, c to other vertices except to v, w.It follows that {v, a} and {c, w} are two clusters in G ′ △S ′ .Thus G△S is also a cluster graph and |S| = |H|.Thus (G, H) is a YES-instance.
For completeness, suppose that (G, H) has a solution S. We can check that there are only three possible cases: (1) vy ∈ S, uw ∉ S; (2) uw ∈ S, vy ∉ S; (3) vy ∉ S, uw ∉ S. The remaining case, in which vy ∈ S and uw ∈ S, is invalid as there is no such cluster editing set S.
(1) F 1 = W u ∪ ({yx, yz 1 , yz 2 } \ W y ) ∪ {vy} ⊆ S. Since vertices u, x, y, z 1 , z 2 are not adjacent to any vertex of V (G) \ {u, v, w, x, y, z 1 , z 2 } in G, {x, y, z 1 , z 2 } induces a clique of size four which is a connected component and {u, w} induces a clique of size two which is also a connected component in G△S.Let S ′ = S\F 1 ∪{va, bc}.It follows that G ′ △S ′ is a cluster graph and (2) F 2 = W y ∪({ux, uz 1 , uz 2 }\W u )∪{uw} ⊆ S. Since vertices u, x, y, z 1 , z 2 are not adjacent to any vertex of V (G) \ {u, v, w, x, y, z 1 , z 2 } in G, {u, x, z 1 , z 2 } induces a clique of size four which is a connected component and {v, y} induces a clique of size two which is also a connected component in G△S.
As a result, Item ( 5) is safe.This completes the proof for the lemma.
After applying Reduction Rule 7, Reduction Rule 4 can be applied to remove the isolated cliques.
Lemma 22.After applying Reduction Rules 1 to 7 exhaustively, let (G, H) be an instance of CEMHMP which has a solution S. Then there is no clique of size at least 4 in G△S.
Proof. By Lemmas 15, 17, and 19, there is no clique of size at least 5 in G△S. Suppose for contradiction that A is a clique of size 4 in G△S. Let V(A) = {x, y, z1, z2}. Then by Lemma 20, three vertices of V(A), say x, y, z2, belong to one packed P3 in G, and one vertex of x, y, z2, say z2, forms with z1 a proto-cluster C1 of size two in G. Meanwhile, x and y form a proto-cluster C2 of size one and a proto-cluster C3 of size one in G, respectively. Moreover, there are two vertices u and v such that x, u, z1 belong to a packed P3 in G, y, v, z1 belong to another packed P3 in G, and u and v form a proto-cluster C4 of size one and a proto-cluster C5 of size one in G, respectively. There are five cases:
(1) uz2 and vz2 are non-packed non-edges. Then Item (1) of Reduction Rule 7 can be applied.
(2) uz2 is a packed edge and vz2 is a non-packed non-edge. Then one of Items (2)-(5) of Reduction Rule 7 can be applied.
(3) uz2 is a packed non-edge and vz2 is a non-packed non-edge. By Item (5) of Lemma 20, u, v, z2 cannot belong to one packed P3. Thus there is another vertex w such that u, w, z2 belong to a packed P3 and uz2 is a packed non-edge. Thus wz2 is a packed edge. Since z1 is in a proto-cluster of size two and z1 is already incident with two packed P3s, wz1 must be a non-packed non-edge. Since z1, z2 belong to one proto-cluster, Reduction Rule 3 can be applied.
(4) vz2 is a packed edge and uz2 is a non-packed non-edge. Then one of Items (6)-(9) of Reduction Rule 7 can be applied.
(5) vz2 is a packed non-edge and uz2 is a non-packed non-edge. By Item (5) of Lemma 20, u, v, z2 cannot belong to one packed P3. Thus there is another vertex w′ such that v, w′, z2 belong to a packed P3 and vz2 is a packed non-edge. Thus w′z2 is a packed edge. Since z1 is in a proto-cluster of size two and z1 is already incident with two packed P3s, w′z1 must be a non-packed non-edge. Since z1, z2 belong to one proto-cluster, Reduction Rule 3 can be applied.
It follows that there is no clique of size 4 in G△S. This completes the proof for the lemma.
It remains to show that S′ is indeed a cluster deletion set, that is, there is no induced P3 in G′△S′. We show this by going over the possibilities of such an induced P3 for whether its edges are packed or not. Before that, for every induced P3 uvw in G′ such that uv and vw belong to two distinct P3s of H′, let uv = e_p and vw = e_q for some p, q ∈ {0, ..., λ − 1}. By the construction, (x_p ∨ x_q) is a clause of Φ so it is satisfied by α. Thus at least one edge of uvw belongs to S′.
First, by Corollary 2, there is no proto-cluster of size at least three in G′. Thus there is no induced P3 abc in G′△S′ such that ab and bc are non-packed edges in G′.
Second, we claim that there is no induced P3 xyz in G′△S′ such that both xy and yz are packed edges in G′. Suppose for a contradiction that there is an induced P3 xyz in G′△S′ such that both xy and yz are packed edges in G′. Then xy and yz must be covered by two distinct packed P3s, since otherwise xy or yz belongs to S′ by the definition of S′. We contend that xz must be a packed edge covered by another packed P3 in G′, i.e., xy, yz and xz are covered by three distinct packed P3s in G′. First of all, xz is an edge of G′, because otherwise xyz would be an induced P3 in G′. Then xy or yz would belong to S′ by the definition of S′, a contradiction. If xz is a non-packed edge in G′, then xz is an edge in G′△S′ since S′ can only contain vertex pairs covered by packed P3s. However, this contradicts the assumption that xyz is an induced P3 in G′△S′. Therefore, xz is indeed a packed edge in G′.
By the construction of S′, no two of xy, yz, and xz are covered by the same packed P3, as otherwise one of the three edges belongs to S′. Thus xy, yz and xz are covered by three distinct packed P3s in G′. Suppose without loss of generality that xz is covered by uxz ∈ H′. Note that ux, xy, yz ∉ S′ since, by our assumption, xyz is an induced P3 in G′△S′. Since y is already incident with two packed P3s, uy is either a non-packed non-edge in G′ or a non-packed edge in G′. If uy is a non-packed non-edge in G′, then uxy is an induced P3 in G′. Let ux = e_i and xy = e_j; then the clause (x_i ∨ x_j) of Φ is not satisfied, a contradiction. Thus uy is a non-packed edge.
By the analysis above, there is a vertex w such that x, y, w belong to a packed P3 and there is a vertex w′ such that y, z, w′ belong to a packed P3. We have the following subcases: (1) The subgraph induced by {x, y, z, u, w, w′} is isolated from G′ \ {x, y, z, u, w, w′}. Then Reduction Rule 5 can be applied. (2) Either wy is a non-packed non-edge and wu is a packed edge, or w′y is a non-packed non-edge and w′u is a packed edge in G′. Then Reduction Rule 3 can be applied, as uy is a proto-cluster of size 2 in G′ by our analysis above. (3) Subcases (1) and (2) do not hold. Then we can check that one of the items of Reduction Rule 7 can be applied (which item applies depends on the structure of the subgraph we are considering): There could be another vertex a such that a, w, u belong to one packed P3 or another vertex a′ such that a′, w′, u belong to one packed P3. If no such vertices a and a′ exist, then Item (1) of Reduction Rule 7 applies. Otherwise, one of the other items applies. To see this more clearly, we relabel the vertices as follows: y ← z_1, u ← z_2, w ← u, z ← y, x ← x, w′ ← v, a ← w, a′ ← w′. Thus, all three subcases above contradict the assumption that no reduction rules can be applied in G′. Therefore, the claim holds, that is, there is no induced P3 xyz in G′△S′ such that both xy and yz are packed edges in G′.
Third and finally, we claim that there is no induced P3 in G′△S′ such that one edge of this P3 is a non-packed edge in G′ and the other edge is a packed edge in G′. Suppose for a contradiction that there is such a P3 uvw in G′△S′ such that uv is a non-packed edge and vw is a packed edge in G′. Then there is another vertex x such that v, w, x belong to a packed P3 in G′. Since Reduction Rule 3 cannot be applied to (G′, H′), uw must be covered by a packed P3 in G′, i.e., there is a vertex y such that u, w, y belong to a packed P3 in G′. We contend that at least one of vy and ux is covered by a packed P3. Suppose for contradiction that both vy and ux are non-packed non-edges. Then, if uy is a packed edge, Reduction Rule 3 could be applied. Thus we can assume that uy is a packed non-edge. Since uvw is an induced P3 in G′△S′, uw, wx ∈ S. Then vwy is an induced P3 in G′. Assume that vw = e_p and wy = e_q. Then the assignment α cannot satisfy (x_p ∨ x_q), which is a clause of Φ, contradicting that α is a satisfying assignment to Φ. Thus we can assume that there is a vertex z such that v, y, z belong to a packed P3 in G′ (the analysis for the case that there is a vertex z′ such that u, x, z′ belong to a packed P3 in G′ is similar).
We have the following subcases: (1) The subgraph induced by {x, y, z, u, v, w} is isolated from G′ \ {x, y, z, u, v, w}. Then Reduction Rule 5 can be applied. (2) vz is a non-packed non-edge and uz is a packed edge. Then Reduction Rule 3 can be applied, as uv is a proto-cluster of size 2 in G′. (3) Subcases (1) and (2) do not hold. Then we can check that one of the items of Reduction Rule 7 can be applied (which item applies depends on the structure of the subgraph we are considering). There could be another vertex a such that a, x, u belong to one packed P3 or another vertex a′ such that a′, z, u belong to one packed P3. If no such vertices a and a′ exist, then Item (1) of Reduction Rule 7 can be applied. Otherwise, one of the other items applies. To see more clearly that Reduction Rule 7 applies, we relabel the vertices as follows: v ← z_1, u ← z_2, z ← v, w ← x, x ← u, y ← y, a ← w, a′ ← w′. All three subcases above contradict the assumption that no reduction rules can be applied in G′. It follows that there is no induced P3 in G′△S′ such that one edge of this P3 is a non-packed edge in G′ and the other edge of this P3 is a packed edge in G′.
As a result, S′ is a solution to the instance (G′, H′) of CDaMP. By Lemma 23, (G, H) is a YES-instance. This concludes the proof of the lemma.
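The property established in this proof, that G′△S′ contains no induced P3, can be checked mechanically on small instances. The following is a minimal sketch, assuming vertices are comparable labels (for example integers) and edges are given as vertex pairs; the function names `apply_edits`, `induced_p3s` and `is_cluster_graph` are hypothetical and not taken from the paper.

```python
from itertools import combinations


def apply_edits(edges, edits):
    """Return the symmetric difference E △ S as a set of frozenset pairs."""
    def normalize(pairs):
        return {frozenset(p) for p in pairs}
    return normalize(edges) ^ normalize(edits)


def induced_p3s(vertices, edges):
    """Yield triples (u, v, w) such that uv and vw are edges and uw is a non-edge."""
    adj = {v: set() for v in vertices}
    for e in edges:
        a, b = tuple(e)
        adj[a].add(b)
        adj[b].add(a)
    for v in vertices:
        # v is the middle vertex; check every pair of its neighbours.
        for u, w in combinations(sorted(adj[v]), 2):
            if w not in adj[u]:
                yield (u, v, w)


def is_cluster_graph(vertices, edges):
    """A graph is a cluster graph exactly when it has no induced P3."""
    return next(induced_p3s(vertices, edges), None) is None


# Usage: the path 1-2-3 is itself an induced P3; one deletion or one insertion fixes it.
V = [1, 2, 3]
E = [(1, 2), (2, 3)]
print(is_cluster_graph(V, E))                         # the path is not a cluster graph
print(is_cluster_graph(V, apply_edits(E, [(1, 2)])))  # delete edge {1, 2}
print(is_cluster_graph(V, apply_edits(E, [(1, 3)])))  # insert edge {1, 3}
```

For Cluster Deletion, the edit set passed to `apply_edits` would contain only existing edges; the sketch does not enforce this.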
The above lemma shows that there is a polynomial-time algorithm for the special instances of CDaMP with ℓ = 0 that our reduction rules produce.
We can now prove that, without excess edits, CEMHMP can be solved in polynomial time.
Theorem 3 (Restated). Cluster Editing above Half-Integral Modification-Disjoint P3 Packing can be solved in polynomial time when ℓ = 0, that is, when no excess edits are allowed.
Proof. By Lemma 24, given an instance (G, H) of CEMHMP, after applying Reduction Rules 1 to 7 exhaustively, we reduce it to an equivalent instance of 2-SAT in polynomial time. We then decide the 2-SAT instance with any standard 2-SAT algorithm; it is well known that 2-SAT can be solved in polynomial time. This completes the proof of the theorem.
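For illustration, the final step of this proof can be carried out with the textbook implication-graph method for 2-SAT. The sketch below is not the paper's implementation: it assumes clauses are given as pairs of nonzero integers (literal +i stands for x_i, literal −i for ¬x_i), and the function name `solve_2sat` is hypothetical. It uses Kosaraju's algorithm for strongly connected components.

```python
def solve_2sat(num_vars, clauses):
    """Return a dict {i: bool} satisfying all clauses, or None if unsatisfiable.

    Variables are numbered 1..num_vars; each clause is a pair of literals.
    """
    n = 2 * num_vars

    def node(lit):
        # Even index: positive literal, odd index: negated literal.
        return 2 * (abs(lit) - 1) + (1 if lit < 0 else 0)

    graph = [[] for _ in range(n)]
    rgraph = [[] for _ in range(n)]
    for a, b in clauses:
        # (a or b) is equivalent to (not a -> b) and (not b -> a).
        for u, v in ((node(-a), node(b)), (node(-b), node(a))):
            graph[u].append(v)
            rgraph[v].append(u)

    # Pass 1: iterative DFS on the implication graph, recording finishing order.
    order, seen = [], [False] * n
    for s in range(n):
        if seen[s]:
            continue
        seen[s] = True
        stack = [(s, 0)]
        while stack:
            u, i = stack.pop()
            if i < len(graph[u]):
                stack.append((u, i + 1))
                v = graph[u][i]
                if not seen[v]:
                    seen[v] = True
                    stack.append((v, 0))
            else:
                order.append(u)

    # Pass 2: label components on the reversed graph; labels follow a
    # topological order of the condensation of the implication graph.
    comp, label = [-1] * n, 0
    for s in reversed(order):
        if comp[s] != -1:
            continue
        comp[s] = label
        stack = [s]
        while stack:
            u = stack.pop()
            for v in rgraph[u]:
                if comp[v] == -1:
                    comp[v] = label
                    stack.append(v)
        label += 1

    assignment = {}
    for i in range(1, num_vars + 1):
        pos, neg = comp[node(i)], comp[node(-i)]
        if pos == neg:
            return None  # x_i and its negation imply each other
        assignment[i] = pos > neg
    return assignment
```

In the setting of Lemma 24, one would presumably introduce one variable per packed edge e_i and one clause (x_p ∨ x_q) per induced P3 whose two edges lie in distinct packed P3s; any further clauses of Φ depend on the construction given earlier in the paper and are not reproduced here.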

Conclusions
Unfortunately, the lower bound that we have obtained is a major roadblock in designing fixed-parameter algorithms for Cluster Editing parameterized above modification-disjoint P3s. On the positive side, Cluster Editing above Half-Integral Modification-Disjoint P3 Packing (CEaHMP) admits an XP-algorithm with respect to the number of excess edits. We have left open whether CEaHMP is fixed-parameter tractable. Towards this, on the one hand, the half-integral P3 packings provide quite strong structure that can be exploited to design several branching rules. On the other hand, when attacking this question from several angles we discovered large grid-like structures that seemed difficult to overcome in fixed-parameter time, and a corresponding W[1]-hardness result would also not be surprising.
A different future research direction is to deconstruct our hardness reduction by examining which of its substructures are rare in practical data. Forbidding such substructures may destroy the already somewhat fragile hardness construction, perhaps paving the way for fixed-parameter algorithms.
Finally, it would be interesting to see how modification-disjoint P3 packings look in practice. If it is true that only few vertices are in a large number of packed P3s and most of them are in a small constant number, then a strategy that combines settling the clustering around the vertices with a large number of P3s and applying reduction rules from Section 5 could be efficient.
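One simple way to start such an empirical investigation is to compute some modification-disjoint packing greedily and count, for each vertex, how many packed P3s contain it. The sketch below is only a heuristic under stated assumptions: it yields a maximal, not necessarily maximum, packing; vertices are assumed to be comparable labels; and the names `greedy_p3_packing` and `packed_p3s_per_vertex` are hypothetical.

```python
from collections import Counter
from itertools import combinations


def greedy_p3_packing(vertices, edges):
    """Greedily collect induced P3s no two of which share an edge or a non-edge."""
    adj = {v: set() for v in vertices}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    used = set()   # vertex pairs already covered by some packed P3
    packing = []
    for v in vertices:
        for u, w in combinations(sorted(adj[v]), 2):
            if w in adj[u]:
                continue  # u, v, w form a triangle, not an induced P3
            pairs = {frozenset((u, v)), frozenset((v, w)), frozenset((u, w))}
            if pairs.isdisjoint(used):
                used |= pairs
                packing.append((u, v, w))
    return packing


def packed_p3s_per_vertex(packing):
    """Count how many packed P3s each vertex belongs to."""
    return Counter(x for p3 in packing for x in p3)


# Usage: a star with centre 0 and four leaves.
V = [0, 1, 2, 3, 4]
E = [(0, 1), (0, 2), (0, 3), (0, 4)]
packing = greedy_p3_packing(V, E)
print(packing)
print(packed_p3s_per_vertex(packing))
```

On real data, the histogram of the counts returned by `packed_p3s_per_vertex` would indicate how many vertices exceed a given bound c on the number of packed P3s per vertex.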

Figure 1: Five proto-clusters A through E and two P3s in the underlying graph and in the P3-packing that connect A to C via B and C to E via D, respectively. The dashed edge between B and D means that there is a non-packed non-edge between B and D.

Figure 4: Merging model of a clause Γ_d = (x_a ∨ ¬x_b ∨ ¬x_c). The number i ∈ {0, 1, 2, 3, 4} beside a vertex v denotes that v ∈ L_i. The placement of vertices corresponds to the placement of the cliques in Fig. 2. For example, the two vertices of level 1 on the top correspond to Q^1_d and Q^4_d. We assume that m_a = 3.

 2 foreach ℓ_b = 0, 1, ..., ℓ − ℓ_a do
 3   foreach set S_a of ℓ_a vertex pairs {u, v} ∈ (V(G) choose 2) such that ∀P ∈ H: |{u, v} ∩ V(P)| ≤ 1 do
 4     G_a ← G△S_a
 5     foreach set H_b of ℓ_b distinct P3s in H do
 6       foreach set S_b containing for each P ∈ H_b at least two vertex pairs in V(P) do
 7         if |S_a| + |S_b| ≤ |H_b| + ℓ then
 8           G_b ← G_a△S_b
 9           H′ ← H \ H_b
10           if G_b has a cluster-editing set with |H′| edits then /* Using Theorem 3 */
11             accept and halt
12 reject

Observe that |H⋆_b| ≤ ℓ − ℓ_a. Thus, in one of the iterations of Algorithm 1, we have ℓ_b = |H⋆_b| and H_b = H⋆_b. Moreover, in one of the iterations S_b = S⋆_b, where S⋆_b is the subset of S⋆ that contains precisely those edits that are contained in the P3s in H_b. Let S⋆_0

Figure 9: An example for Lemma 13 and Reduction Rule 6.

Figure 10: An example of forming a clique of size 6 in G△S. The black edges are non-packed edges. Vertex pairs of the same non-black color belong to the same packed P3, and dashed edges represent non-edges. The same notation applies to the following pictures.

Figure 11: Some examples of Lemma 18. In Case (1), C_1 is separated from C_2 and C_3, and C_2, C_3, C_4 are merged into a clique of size 5 in G△S. In Case (2), C_4 is separated from C_2 and C_3, and C_1, C_2, C_3 are merged into a clique of size 5 in G△S. In Case (3), C_1, C_2 are merged into a clique of size 3 and C_3, C_4 are merged into a clique of size 3 such that these two cliques of size 3 are separated from each other. In Case (4), the instance is a NO-instance. Case (3) and Case (4) are not touched by Lemma 18 but they can be handled by Reduction Rules 5 and 4.
1 and C_4, respectively. If |C_1| > 1 or |C_4| > 1, then there is a non-packed non-edge involving C_1 or C_4 and thus Reduction Rule 3 can be applied. Thus |C_1| = |C_4| = 1. It follows that V(C_1) ∪ V(C_2) ∪ V(C_3) ∪ V(C_4) induces a connected component and Reduction Rule 5 can be applied, a contradiction. Assume that u and v belong to one proto-cluster, say C_1. If |C_1| > 2, then Reduction Rule 3 can be applied for the same reason as above. Thus |C_1| = 2 and V(C_1) ∪ V(C_2) ∪ V(C_3) ∪ V(C_4) induces a connected component. It follows that Reduction Rule 5 can be applied to this connected component, a contradiction. Therefore, Case (i) does not happen.

5.1.5 Reduction to 2-SAT
First, we introduce a new problem called Cluster Deletion above modification-disjoint P3 packing. The formal definition is as follows:
. . ., 6 to H_tra. We call the P3s of H_tra transferring P3s.

a is true and it satisfies the clause Γ_d.
According to the construction of P, either T^δ_d and Q^{s′}_d are in different parts of P and Q^{s′}_d and Q^s_d are merged, or T^δ_d and Q^{s′}_d are merged and Q^s_d and Q^{s′}_d are in different parts of P. In both cases, there is at most one edit of S in P.
Consider the case that x_a appears positively in Γ_d. Then, when connecting the variable gadget of x_a to the clause gadget of Γ_d we have introduced into G a P3 connecting +2 are merged. There is thus at least one odd pair in x_a's variable gadget that is merged and thus α(x_a) = true. The case where x_a appears negatively in Γ_d is similar: We have introduced into G a P3 connecting T a