Adaptive Massively Parallel Connectivity in Optimal Space

We study the problem of finding connected components in the Adaptive Massively Parallel Computation (AMPC) model. We show that when we require the total space to be linear in the size of the input graph, the problem can be solved in O(log* n) rounds in forests (with high probability) and in 2^{O(log* n)} expected rounds in general graphs. This improves upon an existing O(log log_{m/n} n)-round algorithm. For the case when the desired number of rounds is constant, we show that both problems can be solved using Θ(m + n log^{(k)} n) total space in expectation (in each round), where k is an arbitrarily large constant and log^{(k)} is the k-th iterate of the log_2 function. This improves upon existing algorithms requiring Ω(m + n log n) total space.


Introduction
The Adaptive Massively Parallel Computation (AMPC) model is a computation model that captures the capabilities and challenges of modern platforms for processing massive data [BDE+20, BDE+21]. In the AMPC model we have M machines that communicate with each other in synchronous communication rounds, each equipped with local space of size S. The machines communicate using a shared distributed hash table (DHT), that is, a distributed key-value store. Within each round there is a read-only DHT containing the input to the round, and a write-only DHT where the machines write the output of the round. Once the round completes, a new round begins and the output DHT from the previous round becomes the read-only input DHT for the next round.
The model has three challenging restrictions. First, the space available to each machine, S, is strictly sublinear in the input size, N. Second, each machine can only read and write data of size at most S within each round. Third, the total space of all machines should be barely big enough to store all the input, that is, M · S = O(N).

The AMPC model is an extension of the widely studied MPC model. The models differ in how the machines are allowed to communicate. Specifically, in the MPC model, instead of writing data to a DHT, within a round each machine can send messages to other machines, which are delivered at the beginning of the following round. The only restriction is that the total size of all messages sent to all machines in a round is at most S. That is, the difference between the models is that in the MPC model each machine in each round is given a chunk of data to process, i.e., the messages it receives, while in the AMPC model each machine can choose what data to read from the DHT. In particular, the machine may use any value read within a round to adaptively decide what to read next (within the same round).
The AMPC model is particularly well suited to studying graph algorithms, and indeed several algorithmic problems have been solved more efficiently in AMPC than in the MPC model, including connected components [BDE+21, BDE+20], maximal matching and independent set [BDE+20, Beh22, HKSS22], and minimum cut [HKOS22]. All of these results are obtained in the regime where the available space per machine is sublinear in the number of vertices of the input graph, that is, S = n^δ for a constant 0 < δ < 1 for an input graph with n vertices. This regime is the most challenging (and the most desirable) regime for studying graph algorithms in the MPC model. At the same time, some of the fundamental unconditional lower bounds carry over from MPC to AMPC [CMT20, RVW18b].
Recent work on both AMPC and MPC algorithms focused primarily on optimizing the number of rounds, motivated by the fact that synchronization in distributed systems is often an expensive step [DG08, SV11]. However, if we consider the motivation behind the AMPC and MPC models, the total space usage should also be highly correlated with empirical performance. This is because in a vast majority of AMPC and MPC algorithms, the total space usage is determined by the maximum amount of communication that happens in any round. In fact, in the usual case when each machine uses space that is linear in its input and output size, the total space usage and the total communication are equal up to constant factors. As a result, the total space usage and the total communication can often be considered very good measures of how expensive a single round is.
This motivates the following question: what is the best round complexity that we can achieve if we require the space usage to be optimal, that is, linear in the input size? We address this question for the fundamental problem of finding connected components and give improved AMPC algorithms which use optimal space.

Our Contributions
We give improved algorithms for finding connected components in the AMPC model with sublinear space per machine, i.e., S = n^δ for any constant δ ∈ (0, 1). Our first result is an algorithm for finding connected components in forests. Note that we say that an event holds with high probability (w.h.p.) if it holds with probability at least 1 − 1/n^c, for a constant c that we can choose.
Theorem 1.1. There exists a randomized O(log* n)-round AMPC algorithm that w.h.p. computes the connected components of an n-vertex forest and uses optimal total space. More generally, there exists a randomized O(k)-round AMPC algorithm that w.h.p. computes the connected components of an n-vertex forest and uses O(n log^{(k)} n) total space, for any 1 ≤ k ≤ log* n.
This algorithm directly improves upon an existing result using O(1) rounds and O(n log n) total space [BDE+21]. We note that forest connectivity was used as a subroutine in some other AMPC algorithms [HKSS22, BDE+20]. We also give a new algorithm for the case of general graphs.
Theorem 1.2. There exists a randomized AMPC algorithm for computing the connected components of an n-vertex, m-edge graph G. The algorithm runs in 2^{O(k)} rounds, each using O(m + n log^{(k)} n) total space in expectation, for any 1 ≤ k ≤ log* n.
By setting k = log* n in Theorem 1.2 we obtain an algorithm using 2^{O(log* n)} rounds and optimal space in expectation (in each round). We note that for any constant k, the algorithm runs in a constant number of rounds using O(m + n log^{(k)} n) total space. This result improves upon two existing AMPC algorithms. The first one is an O(log log_{T/n} n)-round algorithm which uses O(T) total space [BDE+21]. The second one is an O(1)-round algorithm which uses either O(m log n) or O(m + log^2 n) space [BDE+20]. We note that our algorithm improves the space/round complexity tradeoff compared to both algorithms.

Theoretical Motivation & Related Work
The problem of finding the connected components of an undirected graph is one of the central graph problems, with many practical applications [SMS+20]. To understand the role of total space in algorithm design in AMPC, let us first discuss the state of the art for the connected components problem in MPC. In the MPC model the problem can be solved in O(log D + log log_{T/n} n) rounds using total space T, when the diameter of the input graph is at most D [ASS+18, BDE+19, CC22]. Note that this running time becomes O(log D) when the total space is polynomially larger than the input size. Under the widely believed 1-vs-2-cycles conjecture [RVW18a], the Ω(log D) round complexity is also the best one can hope for. In AMPC, this conditional hardness does not hold. The DHT in AMPC alleviates the dependency on D and hence, given enough total space, the runtime collapses to O(1). Therefore, in AMPC we focus solely on the interplay between the total space and the dependence on n in the runtime, and significantly improve the existing tradeoff.
The fact that additional total space makes MPC and AMPC algorithmic questions significantly easier seems to be a recurring theme for multiple problems in both models. As an example, the commonly used exponentiation and round compression techniques in MPC inherently require super-linear total space [LW10, CŁM+18, GU19]. These techniques are frequently used, and even when combined with sophisticated additional techniques one still requires ω(n) total space; for example, see [CDP21b]. Furthermore, somewhat surprisingly, it was recently shown that by increasing the available space significantly, yet still by a polynomial factor, one can essentially derandomize any MPC algorithm [CDP21a].
Algorithms with optimal memory have recently received attention in the MPC and AMPC models. In a very recent (and rather involved) result, it was shown that connected components in forests can be computed in O(log D) rounds using optimal space [BLM+23]. This algorithm meets the conditional Ω(log D) lower bound. It improves over the O(log D + log log_{T/n} n)-round algorithm, which can achieve O(log D) running time only at the cost of using much larger total space T = n^{1+Ω(1)}. As another example, there are efficient algorithms with optimal space for local constraint satisfaction problems, such as vertex coloring or maximal independent set, in the case of constant-degree forests [BBF+22].
In the case of the AMPC model, it was shown that maximal matching can be computed in O(log log n) rounds using optimal space, or in O(1) rounds using Ω(m + n^{1+Ω(1)}) space [BDE+20]. Very recently, the O(1) runtime was also shown to be achievable with optimal space [Beh22], thanks to highly involved new ideas.

Technical Challenges
The common building block of our algorithms is the following insight. Assume that, given a graph with n vertices and total space T, in a single round we can reduce the problem to a problem on a graph of only n/exp(T/n) vertices. We observe that by iterating this algorithm we can increase the amount of available space per vertex extremely quickly, even if initially the total available space is only linear in the number of vertices. Our first technical contribution is showing that such a reduction is indeed possible, both in the case of forests and general graphs.
The reduction in the number of nodes is achieved by contracting sets of nodes of the graph. The challenging part is symmetry breaking and ensuring that different vertices agree on which contractions should be performed. Observe that on average the amount of communication per vertex is exponentially smaller than the size of the contracted set it belongs to.
To illustrate some of the challenges involved, consider a path of length l. A natural solution is to sample vertices of the path uniformly and contract each vertex of the path to the nearest sampled vertex (which can be done even in optimal total space). If our goal is to shrink the size of this path by a factor of 2^B, we could try sampling each vertex uniformly with probability 1/2^B. However, when l = Θ(2^B), this would imply that with constant probability no vertex along the path is sampled, and so the expected length of the path after the shrinking step is still Ω(l). As a result, we need to use more involved sampling schemes. In fact, we use two different methods for the two cases we consider. We propose a new shrinking algorithm for forests and improve the space usage of an existing algorithm for general graphs.
Let us now explain how the shrinking procedure is helpful in obtaining low round complexity. Consider the case of forests and assume that the available space per vertex is B. In one iteration, we can shrink the number of vertices by a factor of roughly 2^B, which increases the available space per vertex to 2^B. This additional space budget allows us to run the shrinking procedure with a much larger sampling rate and increase the available space per vertex to 2^{2^B}. By continuing this process, in only O(log* n) rounds we reach a state where the available space per vertex is polynomial, in which case existing algorithms can solve the connected components problem in O(1) rounds and optimal space.

Preliminaries
For any x ∈ R we define log x as follows. For x ≥ 1, log x = log_2 x, and for x < 1, log x = 1. For any integer k ≥ 0, by log^{(k)} we denote the k-th iterate of the log function. That is, log^{(0)} n = n, and for k > 0, log^{(k)} n = log log^{(k−1)} n. Moreover, we define log* n to be the minimum k ≥ 0 such that log^{(k)} n ≤ 1. We also define the "inverse" of log* n, denoted by 2↑↑k. We have 2↑↑0 = 1, and for any integer k > 0, 2↑↑k = 2^{2↑↑(k−1)}.
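These definitions translate directly into code. The following sketch is our own illustration (the function names are not from the paper) of the truncated log, its iterates, log*, and the tetration 2↑↑k:

```python
import math

def safe_log(x):
    """log x as defined above: log_2(x) for x >= 1, and 1 for x < 1."""
    return math.log2(x) if x >= 1 else 1.0

def log_iter(k, n):
    """log^(k) n: the k-th iterate of log, with log^(0) n = n."""
    x = float(n)
    for _ in range(k):
        x = safe_log(x)
    return x

def log_star(n):
    """log* n: the minimum k >= 0 such that log^(k) n <= 1."""
    k, x = 0, float(n)
    while x > 1:
        x = safe_log(x)
        k += 1
    return k

def tetrate(k):
    """2 ↑↑ k, the 'inverse' of log*: 2↑↑0 = 1, 2↑↑k = 2^(2↑↑(k-1))."""
    v = 1
    for _ in range(k):
        v = 2 ** v
    return v
```

For example, `tetrate(4)` is 2^(2^(2^2)) = 65536, and iterating log on 65536 reaches 1 after exactly four steps.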
For a graph G, we say that a connected components labeling (or CC-labeling for short) of G is a mapping M : V(G) → A (where A is an arbitrary set), such that for any two u, v ∈ V(G) we have M(u) = M(v) if and only if u and v belong to the same connected component of G.
Definition 2.1. We say that an algorithm is connected-component shrinking (or CC-shrinking for short) if it takes as input a graph G and outputs a graph H and a mapping M, such that given a CC-labeling of H and the mapping M, one can compute a CC-labeling of G in O(1) AMPC rounds using optimal space.
A CC-shrinking algorithm essentially reduces the problem of finding connected components in G to solving the problem on H. For any CC-shrinking algorithm, we generally refer to the O(1)-round operation that produces the CC-labeling of G from the CC-labeling of H as Compose(H, G). So, the operation Compose can be seen as the inverse operation of a CC-shrinking algorithm.
In our algorithms we use multiple CC-shrinking algorithms. One of them is a standard vertex contraction algorithm, which we denote by Contract(G, C). It takes a graph G and a mapping C : V(G) → A, where A is an arbitrary set, and contracts (merges together) groups of vertices that are assigned the same value by C. Any resulting parallel edges are merged into one, and self-loops are removed. This is a commonly used subroutine, which can be implemented in O(1) (A)MPC rounds using optimal space [BDE+19].
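As a point of reference, the sequential semantics of Contract can be sketched as follows. This is a single-machine illustration of what the O(1)-round primitive computes on an edge set, not the distributed implementation:

```python
def contract(edges, C):
    """Merge vertices mapped to the same value by C; drop self-loops
    and merge parallel edges. Returns the contracted edge set."""
    out = set()
    for u, v in edges:
        cu, cv = C[u], C[v]
        if cu == cv:
            continue  # edge inside a contracted group becomes a self-loop
        out.add((min(cu, cv), max(cu, cv)))  # canonical order merges parallel edges
    return out
```

For instance, contracting the path 1-2-3-4 plus the edge {1, 3} under C = {1: 'a', 2: 'a', 3: 'b', 4: 'b'} leaves a single edge between the groups 'a' and 'b'.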
For simplicity, we assume that Contract only returns a graph, and not the mapping mentioned in Definition 2.1, as the mapping that can be used to recover the CC-labeling of its input is actually one of its parameters. Another essential tool used throughout the paper is the following concentration bound.

Lemma 2.3 (Hoeffding's inequality). Let X_1, . . ., X_ℓ be independent random variables such that X_i ∈ [a_i, b_i] for all i ∈ [ℓ], and let S = Σ_{i=1}^{ℓ} X_i. Then for any t > 0,

    Pr(S − E[S] ≥ t) ≤ exp(−2t² / Σ_{i=1}^{ℓ} (b_i − a_i)²).
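As a numerical sanity check, the standard Hoeffding tail bound exp(−2t² / Σ_i (b_i − a_i)²), which the paper invokes as Lemma 2.3, can be evaluated directly; the helper name below is ours:

```python
import math

def hoeffding_tail(t, ranges):
    """Upper bound on Pr(S - E[S] >= t) for a sum S of independent
    random variables X_i with X_i in [a_i, b_i].
    ranges is a list of (a_i, b_i) pairs."""
    denom = sum((b - a) ** 2 for a, b in ranges)
    return math.exp(-2.0 * t * t / denom)
```

For 100 variables in [0, 1] and deviation t = 50, this evaluates to exp(−50), illustrating how the bound sharpens as the number of bounded summands grows relative to the deviation.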

Forest Connectivity
In this section, we present an algorithm for solving forest connectivity in O(log* n) rounds using optimal space. The forest connectivity problem is the undirected graph connectivity problem when the input graph is a forest. More formally, we prove the following theorem.
Theorem 1.1. There exists a randomized O(log* n)-round AMPC algorithm that w.h.p. computes the connected components of an n-vertex forest and uses optimal total space. More generally, there exists a randomized O(k)-round AMPC algorithm that w.h.p. computes the connected components of an n-vertex forest and uses O(n log^{(k)} n) total space, for any 1 ≤ k ≤ log* n.
Algorithm 1 Finding connected components in a forest.
1: procedure ForestConnectivity(G)
2:     Reduce to cycle-connectivity (Observation 3.1).
3:     ShrinkLargeCycles (Lemma 3.2).
4:     B ← a sufficiently large constant.
5:     while more than n/log n vertices remain do
6:         ShrinkSmallCycles with parameter B (Figure 1).
7:         B ← 2^B (every second iteration).
8:     Run Standard-Cycle-CC (Lemma 3.3).

High-level view of the algorithm: See Algorithm 1 for the pseudocode of the algorithm. Throughout this section let ε = δ/10; recall that the local space of a machine is S = n^δ. The algorithm is a sequence of several CC-shrinking algorithms, which conceptually produce a sequence of graphs G′_1, . . ., G′_r, i.e., G′_{i+1} is the output of a CC-shrinking algorithm running on G′_i. At the very end (Line 8) it computes the connected components of the graph G′_r. Since the sequence of graphs is obtained by running CC-shrinking algorithms, we can then obtain the connected components of the input graph by a proper sequence of Compose calls. However, we skip these calls in the pseudocode for simplicity. Note that since Compose runs in O(1) AMPC rounds, the running time of each Compose call can be charged to the step which produces one of the graphs G′_i. Algorithm 1 starts with a couple of easy reductions. The first step (Line 2) is to reduce the forest-connectivity problem to the cycle-connectivity problem by transforming each tree into a cycle using an Eulerian tour. As observed by [BDE+21], this reduction can be done in O(1) AMPC rounds by directly implementing the classic PRAM construction [TV85].
Let us describe the high-level idea behind the construction. First, replace each edge with two oppositely directed edges. Consider a vertex v of degree d (in the undirected graph). Denote its incident edges as ←e_0, →e_0, . . ., ←e_{d−1}, →e_{d−1}, where ←e_i and →e_i are the pair of edges to and from the i-th neighbor of v. We then replace v with d vertices v_0, . . ., v_{d−1}, where each v_i has two incident edges ←e_i and →e_{(i+1) mod d}. The algorithm applies this vertex splitting to all vertices in parallel, and then makes each edge undirected. This maps a tree containing k > 1 vertices to a cycle of length 2k − 2 [TV85].
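The vertex-splitting construction above can be sketched sequentially. The sketch below (our own illustration, not the AMPC implementation) represents the resulting cycle as a successor map on directed arcs: the arc arriving at v from its i-th neighbor continues toward its ((i + 1) mod d)-th neighbor:

```python
from collections import defaultdict

def tree_to_cycle(edges):
    """Map a tree to a cycle via Euler-tour vertex splitting.
    Each undirected edge {u, v} becomes arcs (u, v) and (v, u); a vertex v
    of degree d is split so that the arc from its i-th neighbor is routed
    to the arc toward its ((i + 1) mod d)-th neighbor.
    Returns the successor map of the resulting cycle of arcs."""
    nbrs = defaultdict(list)
    for u, v in edges:
        nbrs[u].append(v)
        nbrs[v].append(u)
    succ = {}
    for v, lst in nbrs.items():
        d = len(lst)
        for i in range(d):
            # arc arriving from lst[i] continues toward lst[(i + 1) % d]
            succ[(lst[i], v)] = (v, lst[(i + 1) % d])
    return succ

def cycle_length(succ, start):
    """Length of the cycle through the arc `start`."""
    cur, steps = succ[start], 1
    while cur != start:
        cur, steps = succ[cur], steps + 1
    return steps
```

On a tree with k vertices the cycle visits all 2(k − 1) arcs, matching the 2k − 2 bound: a path on 3 vertices yields a cycle of length 4, a star on 4 vertices a cycle of length 6.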

Observation 3.1 (Forests to Cycles). There is a deterministic CC-shrinking algorithm which takes a forest on n vertices and outputs a collection of vertex-disjoint cycles on at most 2n vertices. It can be implemented in O(1) AMPC rounds using optimal space.
The second step (Line 3) is to ensure that each cycle has length at most O(n^{ε/2}) by applying the following lemma.

Lemma 3.2 (Corollary 8.1, [BDE+21]). There is a randomized CC-shrinking AMPC algorithm (ShrinkLargeCycles) which can be applied to a set of cycles to reduce the size of each individual cycle to O(n^{ε/2}) w.h.p. It can be implemented in O(1) AMPC rounds using optimal space.

For every vertex v assigned to a machine, sample a rank r(v) from π_B.
Step 1. For each vertex v assigned to a machine and both directions of the cycle, traverse until (i) v loops back to itself or (ii) v encounters a vertex u such that r(u) ≥ r(v). While traversing, vertex v stamps every vertex it encounters with its rank r(v). In the former case, v contracts* the whole cycle. In the latter case, for all highest-rank nodes v and u connected via a segment of strictly lower-rank nodes, w.l.o.g. ID(v) > ID(u), v contracts* the segment between v and u in the cycle.
Step 2. For each vertex v assigned to a machine, traverse its 16B-hop neighborhood. If it contains the whole cycle and v is its highest-ID vertex, v contracts* the whole cycle. If it does not contain the whole cycle and v is the highest-ID vertex in its 16B-hop neighborhood, v contracts* its 4B-hop neighborhood.
(*) apply Contract from Preliminaries (Section 2).
Figure 1: The procedure ShrinkSmallCycles.

Now comes the most challenging part of our algorithm (Lines 5 to 7), which further contracts the cycles so that the number of remaining vertices sees a significant drop compared to the overall global memory of Θ(n), while never exceeding the total space bound of O(n). Once we have reduced the number of vertices to n′ = O(n/log n), we can finish the remaining instances with the algorithm Standard-Cycle-CC of [BDE+21]; see Lemma 3.3.
Let us now detail the main part of our algorithm (Line 6: ShrinkSmallCycles, Figure 1), which works in O(log* n) iterations; after the i-th iteration we guarantee that the number of remaining alive vertices has dropped to n_i ≤ n · 1/(2↑↑i). For the first iteration, let B be a sufficiently large constant. Now, every vertex picks one out of B ranks according to a truncated geometric distribution. Ignoring the rescaling factor ensuring that we obtain a proper probability distribution, this means that a vertex picks rank i with probability 1/2^i. These ranks are chosen independently for all alive vertices. Then, each vertex probes the cycle around it in a single AMPC round. The probing of v stops in one direction when v sees a vertex of the same or higher rank. This results in every highest-rank vertex knowing the neighboring cycle segment(s) of vertices of lower rank. Breaking symmetry by IDs, the highest-rank vertices can (collectively) contract all other vertices in the cycle. Hence, the number of vertices reduces to the number of vertices with the highest rank. The highest rank on a cycle can be any of the ranks 1, . . ., B and depends on the randomness of the vertices.
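Sampling a rank from the truncated geometric distribution π_B can be sketched via the wrap-around coin-tossing game of Claim 3.11 below; the function name is ours:

```python
import random

def sample_rank(B, rng=random):
    """Sample a rank from the truncated geometric distribution pi_B,
    where pi_B(i) = C_B / 2^i for i in {1, ..., B}, via the coin-tossing
    game: start at q = 1 and, while a fair coin keeps succeeding, move
    q to (q mod B) + 1; the first failure ends the game."""
    q = 1
    while rng.random() < 0.5:  # success of a fair coin flip
        q = (q % B) + 1
    return q
```

Lower ranks are (roughly geometrically) more likely than higher ones, which is what makes the highest-rank vertices on a cycle sparse.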
We show that overall the number of queries of this algorithm is O(n′ · B) if there are n′ vertices remaining in the graph at the start of the iteration (Lemma 3.7). Additionally, we show that two of these iterations w.h.p. reduce the number of remaining vertices in the graph from n to n/2^B (Lemma 3.12). The main benefit of this reduction is that we obtain (on average) 2^B words of memory per vertex that we can leverage in the next iteration. We do so by increasing B to 2^B every second iteration. Increasing B exponentially ensures that after O(log* n) iterations, we have reduced the number of vertices to O(n/log n).
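The doubly exponential growth of B is what yields the O(log* n) iteration count. A small sketch (our own illustration; it counts B → 2^B doublings until the per-vertex budget reaches log n, i.e., until O(n/log n) vertices remain) makes this concrete:

```python
import math

def doublings_until_polylog(n):
    """Count iterations of B -> 2^B until B >= log2 n, i.e., until the
    per-vertex space budget suffices to finish with Standard-Cycle-CC.
    The count grows like log* n."""
    B, rounds = 2, 0  # B starts as a (sufficiently large) constant
    while B < math.log2(n):
        B = 2 ** B
        rounds += 1
    return rounds
```

For n = 2^16 the budget path is 2 → 4 → 16, i.e., two doublings; even for the astronomically large n = 2^65536 only three doublings are needed (2 → 4 → 16 → 65536), mirroring the tower-function behavior of log*.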
The initial reduction of the maximum cycle length ensures that no vertex ever queries more than n^ε ≤ n^δ vertices in one iteration (one AMPC round) of the algorithm. The most challenging part is to bound the query complexity in Lemma 3.7 and the vertex drop in Lemma 3.12. For both of them, we first analyze the expectation of the respective quantity, which is then turned into a w.h.p. guarantee via an application of Hoeffding's concentration bound. We cannot obtain a w.h.p. guarantee on each individual cycle. But, as we have more than n/log n vertices left in the graph (recall that otherwise we can use the algorithm of Lemma 3.3) and each cycle is of length at most n^ε, we have ℓ = Ω(n^{1−ε}/log n) cycles left in the graph, providing the necessary handle for concentration.
In particular, the vertex drop is challenging, as we can only bound the expected vertex drop on a cycle of k vertices by O(k/2^B + 2B). Even if we met this expectation on all cycles, the additive 2B term would be insufficient for obtaining a global drop in the number of vertices by a factor of O(1/2^B). Intuitively, this is because the additive term of 2B has a significant (relative) impact on small cycles. Hence, each iteration is additionally equipped with a deterministic phase that removes min{k, 8B} vertices on a cycle of size k. It is difficult to analyze the (expected) vertex reduction of this second, deterministic phase in isolation, as it depends on whether the first phase reduced the number of vertices on a cycle to fewer than 8B vertices or not (which happens according to some difficult-to-grasp probability distribution). Hence, in our analysis we analyze both steps (the randomized rank-based one and the deterministic one) combined, which shows the desired (expected and w.h.p.) drop in the number of vertices.
We now state the result from prior work that solves the cycle-connectivity problem when an additional Θ(log n)-factor of global memory is available (Line 8).

Lemma 3.3 ([BDE+21]). There is a randomized AMPC algorithm (Standard-Cycle-CC) that w.h.p. computes the connected components of a collection of cycles on n vertices in O(1) rounds using O(n log n) total space.

The remainder of the section is dedicated to proving the most involved part of our algorithm, ShrinkSmallCycles in Figure 1. We refer to one execution of ShrinkSmallCycles as an iteration. First we show that the rank-picking procedure in ShrinkSmallCycles actually follows a probability distribution.
Claim 3.4. π_B is a probability distribution.

Proof. We have π_B(i) ≥ 0 for any i, and Σ_{i=1}^{B} π_B(i) = C_B Σ_{i=1}^{B} 1/2^i = C_B (1 − 2^{−B}) = 1.

We continue with a claim that we need in order to obtain the bounds in our probabilistic analysis.
2. Re-ordering the sums, which does not change the limit as the series is absolutely convergent, and using the geometric sum formula, we obtain

Query complexity
We begin by bounding the expected number of queries per vertex.
Lemma 3.6 (Expected queries per vertex). In Step 1 of an iteration, the number of queries made by a vertex until it hits a vertex with higher or equal rank is at most 4B in expectation.
Proof. Consider an arbitrary vertex v in a cycle of length k and let X be the random variable describing the number of queries made by v in one direction. The probability that we have to query i vertices before finding a vertex with higher or equal rank is the probability that the i-th vertex has a higher or equal rank than v and that all the i − 1 vertices in between have a strictly lower rank than v. Let p_j denote C_B/2^j, which is the probability that a vertex draws rank j from π_B. If vertex v has rank j < B, the expected number of queries is at most

    E[X | r(v) = j] ≤ Σ_{i=1}^{k} i · (1 − p_j)^{i−1} · 2p_j,

where (1 − p_j)^{i−1} is an upper bound on the probability that the i − 1 vertices between v and the i-th vertex have rank < j, and 2p_j is an upper bound on the probability that the i-th queried vertex has rank between j and B (inclusive). If v has rank B, the expected number of queries is

    E[X | r(v) = B] ≤ Σ_{i=1}^{k} i · (1 − p_B)^{i−1} · p_B,

where (1 − p_B)^{i−1} is the probability that the i − 1 vertices between v and the i-th vertex have rank < B, and p_B is the probability that the i-th queried vertex has rank exactly B. By combining the aforementioned cases and applying the law of total expectation, we obtain the following upper bound:

    E[X] ≤ Σ_{j=1}^{B−1} p_j Σ_{i=1}^{k} i(1 − p_j)^{i−1} · 2p_j + p_B Σ_{i=1}^{k} i(1 − p_B)^{i−1} · p_B
         ≤(*) Σ_{j=1}^{B} 2p_j² Σ_{i=1}^{∞} i(1 − p_j)^{i−1}
         =(†) Σ_{j=1}^{B} 2p_j² · 1/p_j² = 2B.

At (*) we combine the two terms and sum to infinity instead of k, and at (†) we apply Claim 3.5 with x = 1 − p_j. Vertex v queries in both directions of the cycle, so the expected number of queries is at most 4B.
By linearity of expectation, from Lemma 3.6 we deduce that the global expected query complexity is O(n′ · B) when n′ alive vertices are left at the beginning of the iteration. We use Hoeffding's inequality to turn this expected guarantee into a w.h.p. bound on the global number of queries. The quality of Hoeffding's bound depends on the range of the random variables used. The query complexity of a vertex is in {1, . . ., n^ε}, as each cycle is of length at most n^ε. As the outcomes of the queries of vertices on the same cycle are not independent, we need to apply Hoeffding's inequality with one random variable measuring the number of queries on each cycle; these variables are indeed independent. Intuitively, the large number of cycles (recall that each cycle has length ≤ n^ε and we have Ω(n/log n) vertices remaining) provides the necessary concentration around the expected query complexity.

Lemma 3.7 (Global number of queries). Let n′ be the number of vertices at the beginning of one iteration. Then, w.h.p., the total number of queries used in the iteration by all vertices is at most O(n′ · B).
Proof. We first focus on Step 1 of an iteration. Let ℓ ≥ n^{10ε}/n^ε = n^{9ε} be the number of cycles and let k_1, . . ., k_ℓ be the numbers of vertices in these cycles before the current iteration. Let S_1, . . ., S_ℓ be the random variables (depending on the randomness of the current iteration) that describe the total number of queries performed by all vertices in the respective cycle (in the current iteration). Note that S_i ∈ {k_i, . . ., k_i²}, as none of the k_i vertices of the cycle performs more than k_i queries. Due to Lemma 3.6 and linearity of expectation we have E[S_i] ≤ 4B · k_i for all i ∈ [ℓ], and the random variables S_i are independent. Let n′ = Σ_{i=1}^{ℓ} k_i and define S = Σ_{i=1}^{ℓ} S_i. Let μ = n′ · 4B and observe that E[S] ≤ μ by linearity of expectation. We apply Hoeffding's inequality (Lemma 2.3) to the ℓ independent random variables and obtain that w.h.p. the total number of queries in the first step is bounded by n′ · 8B. In more detail, we obtain

    Pr(S ≥ 8B · n′) ≤ Pr(S − E[S] ≥ 4B · n′) ≤ exp(−2(4Bn′)² / Σ_{i=1}^{ℓ}(k_i² − k_i)²) ≤ exp(−32B²n′/n^{3ε}) ≤ exp(−n^{7ε}),

where we used that (k_i² − k_i)² ≤ n^{3ε} · k_i (as k_i ≤ n^ε) and n′ ≥ n^{10ε}. In the second step of an iteration each vertex performs at most 32B queries, or O(B · n′) queries in total. Hence, the total query complexity over both steps combined is at most O(B · n′).

Measure of progress (vertex drop per iteration)
We first prove that the second step of an iteration removes at least min{8B, k} vertices from a cycle of length k, as this property will be used in the analysis of the total vertex drop per iteration (Lemma 3.10).
Lemma 3.8. Step 2 of an iteration removes at least min{8B, k} vertices from a cycle of length k.
Proof. If k ≤ 32B, then the cycle is within the 16B-hop neighborhood of every vertex, and the highest-ID vertex will compress the whole cycle, effectively removing min{8B, k} vertices. If k > 32B, then at the very least the highest-ID vertex of the cycle will compress its 4B-hop neighborhood (8B vertices). The compressions do not overlap due to the condition that a compressing vertex has to be the highest-ID vertex in its 16B-hop neighborhood.
Next, we analyze both steps of an iteration simultaneously. Recall that in the first step, the highest-rank nodes in a cycle contract all other nodes; the following claim shows that these contractions do not overlap.

Claim 3.9. In Step 1 of an iteration, the highest-rank nodes in a cycle contract all other nodes, and the contractions do not overlap.
Proof. We have to prove that the highest-rank nodes know that they are the highest-rank nodes (so that they can perform contractions) and that no node gets contracted by two different highest-rank nodes.
The former holds because every node v stamps every node it visits with its rank r(v). This implies that every node in the cycle will be stamped with the highest rank in the cycle. Knowing this, every node knows whether or not it is a highest-rank node. The latter holds due to the symmetry breaking via IDs.

Lemma 3.10. Consider a cycle with k vertices at the beginning of one iteration. The expected number of vertices of the cycle after the iteration is bounded by 2k/2^B + 1/2^B.
Proof. Consider a cycle with k vertices at the beginning of the iteration. Let τ be the random variable describing the largest rank on the cycle and let 1 ≤ X ≤ k be the random variable describing the number of vertices whose rank equals the largest rank τ. We aim to find an expression for the expectation of X. For that purpose, fix some j ∈ {1, . . ., k} and bound the probability that X equals j. We consider the cases τ < B and τ = B separately.
We begin with bounding Pr(X = j ∧ τ < B). Fix some i < B and some set M of j vertices in the cycle. Set p_i = 1/2^i. The probability that τ = i is the maximum rank appearing on the cycle and is attained exactly by these fixed j vertices is at most 2 · 2^{−j} p_i^j (1 − p_i)^{k−j}. Excluding the leading factor 2, the previous expression is exactly the scenario of k players playing the coin-tossing game of Claim 3.11 for B rounds, where j players get the highest value i. The factor 2^{−j} appears because the (i + 1)-th coin toss has to fail for those j vertices, the factor p_i^j appears because the first i coin tosses have to succeed for these j vertices, and the factor (1 − p_i)^{k−j} appears because all other vertices should have one of the ranks 1, . . ., i − 1, which happens with probability 1 − p_i (independently) for each vertex. Due to the leading coefficient (1/2)^{j·B} in Equation (1) of Claim 3.11, we can upper bound Pr(X = j ∧ τ < B) using a factor 2 in the expression.
There are (k choose j) different sets of size j. Hence, we obtain the following probability:

    Pr(X = j ∧ τ < B) ≤ 2 Σ_{i=1}^{B−1} (k choose j) 2^{−j} p_i^j (1 − p_i)^{k−j}.

The expectation of X is

    E[X] = Σ_{j=1}^{k} j · Pr(X = j) = Σ_{j=1}^{k} j · (Pr(X = j ∧ τ < B) + Pr(X = j ∧ τ = B)).
Recall that after the randomized procedure (which contracts the cycle into the highest-rank vertices in Step 1), there is a deterministic procedure (Step 2), which removes at least min{8B, k} vertices from a cycle of length k (Lemma 3.8). Hence, the expected number of remaining vertices after ShrinkSmallCycles splits into the contribution of the event τ < B and the contribution of the event τ = B, and we bound the two terms separately. For the first term, at (*) we use that (k choose j) p_i^j (1 − p_i)^{k−j} ≤ 1, which holds as this value equals the probability of having j successes in k Bernoulli trials with success probability p_i, and at (†) we apply Claim 3.5 with x = 8B + 1 and y = k.
For the second term, let 1 ≤ Y ≤ k be the random variable describing the number of vertices that pick rank B. Note that {Y = j} and {X = j ∧ τ = B} are the same event. Let S be the set of vertices of the cycle. We obtain the following.
In total, we obtain that the expected number of remaining vertices is at most 2k/2^B + 1/2^B.

Claim 3.11. Consider the geometric distribution π with π(i) = 1/2^i for i ≥ 1 and π(i) = 0 otherwise. The probability of a player sampling i from distribution π is equal to the probability of obtaining value i in the following coin-tossing game. A player starts with value 1 and repeatedly tosses a fair coin. Upon succeeding a flip, she increases her value by 1. Upon failing, the game ends.
The analogy can be extended to the truncated geometric distribution π_B with π_B(i) = C_B/2^i for i ∈ {1, . . ., B} and π_B(i) = 0 otherwise, where C_B = 1/(1 − 2^{−B}) (Claim 3.4 proves that π_B is a distribution). The probability of a player sampling i ∈ {1, . . ., B} from distribution π_B is equal to the probability of obtaining value i in the following coin-tossing game. A player initializes q = 1 and then repeatedly tosses a fair coin. Upon succeeding a flip, she changes q to (q mod B) + 1. Upon failing, the game ends.
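The equivalence in Claim 3.11 can be verified numerically by summing the coin-game outcome probabilities over all wrap-arounds; the helper names below are ours:

```python
def pi_B(i, B):
    """The truncated geometric distribution: pi_B(i) = C_B / 2^i
    for i in {1, ..., B}, with C_B = 1 / (1 - 2^(-B))."""
    C = 1.0 / (1.0 - 2.0 ** (-B))
    return C / 2 ** i if 1 <= i <= B else 0.0

def coin_game_prob(i, B, max_flips=200):
    """Probability that the wrap-around coin game ends with value i:
    value i is reached after i - 1 + t*B successful flips (t = 0, 1, ...)
    followed by one failing flip; sum the (truncated) series."""
    total, flips = 0.0, i - 1
    while flips <= max_flips:
        total += (0.5 ** flips) * 0.5  # `flips` successes, then one failure
        flips += B
    return total
```

For B = 4, for instance, both formulas assign probability 8/15 to rank 1, and the π_B values sum to 1 as Claim 3.4 requires.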
Proof. For π, the probability of a player obtaining value i ≥ 1 via the coin tossing game is (1/2)^{i−1} · (1/2) = 1/2^i = π(i), as the player must succeed i − 1 flips and then fail once. For π_B, the value i ∈ {1, . . ., B} is obtained after i − 1 + tB successful flips (for some integer t ≥ 0) followed by a failure, hence the probability is Σ_{t≥0} (1/2)^{i−1+tB} · (1/2) = 2^{−i} · 1/(1 − 2^{−B}) = C_B/2^i = π_B(i).

Lemma 3.12. Let B ≤ (ε log n)/100. Consider some iteration of the algorithm and let n′ be the number of vertices in connected components (cycles) with more than one vertex.
If n′ ≥ n^{10ε}, then w.h.p. the number of vertices in connected components with more than one vertex at the end of the iteration is at most 6n′/2^B.

Proof. Fix one iteration of the algorithm. Let ℓ be the number of remaining connected components (cycles) at the beginning of the iteration. Due to Lemma 3.2, each cycle is of length at most n^ε.
Let k_1, . . ., k_ℓ be the numbers of vertices in these cycles before the current iteration. Let k̂_1, . . ., k̂_ℓ be the independent random variables (depending on the randomness of the current iteration) that describe the number of vertices in the respective cycle after Step 2 of the iteration. Due to Lemma 3.10, we have E(k̂_i) ≤ 2k_i/2^B + 1/2^B. Let K = k_1 + · · · + k_ℓ and let K̂ = k̂_1 + · · · + k̂_ℓ be the number of remaining vertices after Step 2. By linearity of expectation, we obtain E(K̂) ≤ 2K/2^B + ℓ/2^B ≤ 3K/2^B =: µ, where we used that the number of cycles is upper bounded by the number of vertices, i.e., ℓ ≤ K.
We apply Hoeffding's inequality (Lemma 2.3) on the ℓ independent random variables k̂_i, which have range {0, . . ., k_i − 1} (it is deterministically guaranteed that we always remove at least one vertex from each cycle), and obtain the desired concentration bound, where we used K ≥ n^{10ε} and B ≤ (ε log n)/100 in the last step. This proves the claim.

Proof of Theorem 1.1
Let us put everything together and prove the following theorem.
Theorem 1.1. There exists a randomized O(log* n)-round AMPC algorithm that w.h.p. computes the connected components of an n-vertex forest and uses optimal total space. More generally, there exists a randomized O(k)-round AMPC algorithm that w.h.p. computes the connected components of an n-vertex forest and uses O(n log^{(k)} n) total space, for any 1 ≤ k ≤ log* n.
Proof. We apply Algorithm 1. As the first step, we perform the reduction from the forest connectivity problem to the cycle connectivity problem as described in Observation 3.1. By Lemma 3.2, after invoking ShrinkLargeCycles, we have a bound of O(n^δ) on the longest remaining cycle. For the rest of the proof, suppose that the total number of remaining vertices n′ (ignoring cycles with a single node) is at least n^{δ/10}, i.e., we satisfy the requirement of Lemma 3.12. Otherwise, we can collect the remaining graph onto a single machine and solve the problem locally.
Denote by n_i the number of vertices after iteration i and notice that n_0 ≤ 2n due to the reduction of Observation 3.1. Furthermore, let B_i := 2↑↑i. Due to the design of Algorithm 1, the value of B in iteration 2i is more than min{(ε log n)/100, B_i}. As long as the cut-off at (ε log n)/100 does not happen, due to Lemma 3.12 and a union bound, we have w.h.p. that the number of vertices decreases accordingly. But if B = (ε log n)/100, we obtain by Lemma 3.12 that w.h.p. the number of vertices is at most n · 6/2^B ≤ 6n^{1−ε/100} ≤ n/log n. Hence, regardless of whether the value of B is capped at (ε log n)/100 or not, after at most O(log* n) iterations the number of vertices is at most O(n/log n). Then we can apply Standard-Cycle-CC from Lemma 3.3 to finish the algorithm.
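To see why O(log* n) iterations suffice, note that the tower function 2↑↑i exceeds any fixed fraction of log n within roughly log* n steps. A small sketch (the helper names are ours):

```python
import math

def tower(i):
    # 2 ↑↑ i: a tower of i twos, e.g. tower(3) = 2^(2^2) = 16
    v = 1
    for _ in range(i):
        v = 2 ** v
    return v

def log_star(n):
    # log* n: how many times log2 must be iterated until the value is <= 1
    count = 0
    while n > 1:
        n = math.log2(n)
        count += 1
    return count
```

For every n ≥ 2 we have tower(log_star(n)) ≥ n, so the sequence B_i = 2↑↑i reaches the cap (ε log n)/100 after at most log* n + O(1) increments.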
Total Space: By the analysis of [BDE + 21], the application of Lemma 3.2 requires O(m) total space. From Lemma 3.7, we get an upper bound on the number of queries to the AMPC hashtable, i.e., the required total space, in any iteration i. Let us consider two cases. First, suppose that i = 2j, for some integer j. Then, by the design of our algorithm and by Lemma 3.12, we have that n_i ≤ 6n_{i−1}/2^{B_j}, where B_j corresponds to the current value of B in iteration 2j. Hence, by Lemma 3.7, we have that the required total space is O(n_i · B_j) = O(m) = T, where m corresponds to the number of edges in the input.
Then, suppose that i = 2j + 1, for some integer j. In this iteration, we do not increase B and hence, its value corresponds to B_j. Then, we can use the same calculations as above.
Local space: By the analysis of [BDE + 21], the application of Lemma 3.2 requires O(n^δ) space per machine. Afterwards, all cycles are of length O(n^δ) and hence, no vertex needs to query more than O(n^δ) vertices in its component. Combining this with the total space bound, we get the O(n^δ) bound on the required memory per machine.

Compose: Finally, we need to keep track of the mapping we create, as specified in Definition 2.1. In a contraction step, each vertex can keep a pointer to the vertex remaining after the contraction. These pointers are then updated after any successive contractions, requiring O(1) rounds. The pointers do not affect the asymptotic runtime.
Trading time for global memory: The algorithm finishes in O(k) rounds, given an additional factor of O(log^{(k)} n) global memory, if we initialize B = 2↑↑(c · log* n − k), where c is the constant in the running time of the previous algorithm. Note that the arguments about global memory and the total number of queries per iteration stay intact, but the number of iterations until we have reduced to at most n/log n vertices drops to O(k).
Algorithm 2: Algorithm for finding connected components in general graphs. T is the total amount of available space and S is the amount of available space per machine.

General Graphs

In this section we show our algorithm for general graphs, and prove the following.
Theorem 1.2. There exists a randomized AMPC algorithm for computing connected components of an n-vertex and m-edge graph G. The algorithm runs in 2^{O(k)} rounds, each using O(m + n log^{(k)} n) total space in expectation, for any 1 ≤ k ≤ log* n.
Let us first describe the high-level ideas behind our algorithm. Similarly to the case of forests, we follow the general idea of trying to rapidly decrease the number of nodes, or equivalently, as we put it in this section, increase the amount of available space per vertex in the graph. Once the space per vertex is large enough, we can simply use an existing algorithm that requires large total space.

Theorem 4.1 ([BDE + 21]). There exists an algorithm which computes connected components of an undirected graph in O(log log_{T/n} n) AMPC rounds using total space T = Ω(n + m).
Observe that when T/n = n^{Ω(1)}, we have O(log log_{T/n} n) = O(1). The starting point for increasing the amount of available space per vertex is the following lemma. We use ShrinkGeneral to refer to the algorithm described in the above lemma. The lemma with t = Θ(√S) was proven in the prior work [BDE + 20], where it was used to obtain a constant-round AMPC algorithm for finding connected components using logarithmic space per vertex. In Section 4.3, we extend the algorithm and the analysis to handle the case when 1 ≤ t = O(√S). The challenge with applying Lemma 4.2 is that it does not reduce the number of edges in the graph, and at the same time it outputs a graph whose number of vertices depends on the number of edges in the input graph. Hence, repeated applications of Lemma 4.2 do not provide stronger guarantees than a single application. Moreover, if our goal is to use optimal space, we can only apply it with constant t, which does not imply any reduction in the graph size.
To address the former problem, we reduce the problem of finding connected components in a graph with average degree r to two instances of the connected components problem in graphs with the same number of vertices and average degree O(√r). This is achieved by uniformly sampling edges, as shown in the following theorem.

Theorem 4.3 ([KKT95]). Let G be a graph without multi-edges and let p ∈ (0, 1). Assume that H is a random subgraph of G obtained by sampling each edge of G independently with probability p. Then, the expected number of edges of G which connect distinct connected components of H is at most n/p.
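A quick empirical sanity check of Theorem 4.3 (the random graph generator, the parameter values, and the union-find helper below are our own illustrative assumptions, not part of the paper's algorithm): sample H from a random graph and count the edges of G that cross between components of H; the average stays well below n/p.

```python
import random

class DSU:
    # Minimal union-find to extract the connected components of H.
    def __init__(self, n):
        self.parent = list(range(n))
    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x
    def union(self, a, b):
        self.parent[self.find(a)] = self.find(b)

def crossing_edges(n, edges, p, rng):
    # Form H by keeping each edge independently with probability p,
    # then count edges of G whose endpoints lie in different
    # connected components of H.
    dsu = DSU(n)
    for u, v in edges:
        if rng.random() < p:
            dsu.union(u, v)
    return sum(1 for u, v in edges if dsu.find(u) != dsu.find(v))

rng = random.Random(42)
n, m, p = 500, 5000, 0.25
edges = set()
while len(edges) < m:               # a simple random simple graph
    u, v = rng.randrange(n), rng.randrange(n)
    if u != v:
        edges.add((min(u, v), max(u, v)))
edges = sorted(edges)
avg = sum(crossing_edges(n, edges, p, rng) for _ in range(20)) / 20
# Theorem 4.3 bounds the expectation by n/p = 2000.
```

With these (arbitrary) parameters the observed average is far below the worst-case bound n/p, since H already contains a large connected component.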
In our algorithm we use the following simple corollary. By alternating Lemma 4.2 and uniform edge sampling, we can show that the number of vertices decreases very quickly. That is, roughly speaking, in one step we can increase the amount of available space per vertex of the graph from roughly T/n to 2^{√(T/n)}. As a result, even if we start with only constant space per vertex, we can show that in O(log* n) rounds we get to the case when the available space per vertex is polynomially large and we can apply the algorithm of Theorem 4.1.
The pseudocode of our algorithm is given as Algorithm 2. Let us now describe the subroutines it uses. Recall that Shrink and Compose are defined in Section 2. Moreover, we use ShrinkGeneral to refer to the CC-shrinking algorithm of Lemma 4.2.

Lemma 4.5. Algorithm 2 correctly computes a CC-labeling of the input graph G.
Proof. The lemma follows directly from an inductive argument. The base case holds thanks to Theorem 4.1, and the inductive step follows from the fact that both Contract and ShrinkGeneral are CC-shrinking algorithms (see Observation 2.2 and Lemma 4.2). We will separately prove that the algorithm terminates.

Running Time
In this section we prove the following bound on the size of the recursion in Algorithm 2. The proof is independent of the model in which the algorithm is run. We will discuss the aspects related to the AMPC implementation in the next section.

Lemma 4.6. Assume that Algorithm 2 is run on an m-edge graph G with T = Ω(m + n log^{(k)} n) total space for k ≥ 1. Then, the expected number of recursive ConnectedComponents calls is 2^{O(k)}.
Algorithm 2 is a recursive procedure, which either returns immediately or makes exactly two recursive calls. Let us now present the high-level idea behind the proof of Lemma 4.6. For simplicity, let us for now assume that the bounds of Corollary 4.4 and Lemma 4.2 hold deterministically (rather than in expectation), and ignore constant factors.
We can now prove Lemma 4.6.

Proof of Lemma 4.6. For k ≥ 1, let us denote by T(k) the number of recursive calls of the algorithm when the available space per vertex is Ω(log^{(k)} n). Moreover, let T(0) denote the number of recursive calls when the available space per vertex is n^{Ω(1)}.
We have T(0) = 1, and without loss of generality, we can assume that T(k) is nondecreasing. Thanks to Lemma 4.8, for k ≥ 1 we obtain a recurrence for T(k): with probability at least 4/5 we increase the available space as needed, and with the remaining probability (which we pessimistically upper bound by 1/5) we, again pessimistically, assume that we make no progress in the amount of space per vertex. By subtracting 4/5 · T(k) from both sides we get the claimed 2^{O(k)} bound.

Algorithm 2 in the AMPC model
We first show that the number of recursive ConnectedComponents calls in Algorithm 2 directly translates to the number of rounds in the AMPC model.We note that while each ConnectedComponents call makes two recursive calls, they cannot be run in parallel, as the result of the first recursive call is needed before the second recursive call can be started.
Lemma 4.9. Algorithm 2 can be implemented in the AMPC model, such that each ConnectedComponents call, excluding its recursive calls, takes O(1) AMPC rounds.
Proof. Once we have at least n^{Ω(1)} space per vertex, we use the algorithm of Theorem 4.1, which runs in O(1) AMPC rounds. In the remaining case, we first need to sample the graph H, which can clearly be done in O(1) rounds. In addition to that, we make a constant number of calls to ShrinkGeneral, which uses O(1) AMPC rounds (see Lemma 4.2), as well as to Contract and Compose, both of which use O(1) AMPC rounds as well.
Lemma 4.10.Algorithm 2 can be implemented in the AMPC model using O(T ) total space.
Proof. There are two functions called by ConnectedComponents which may use space super-linear in their input sizes. We reason that these calls are still upper bounded by O(T).

Proof of Lemma 4.2
In this section we show a CC-shrinking algorithm, which is one of the building blocks of our algorithm.
The starting point is Algorithm 1 of [BDE + 20], which provides the guarantees of Lemma 4.2, but uses O(log n) additional space (regardless of the choice of t).Let us now describe it briefly.
It begins by transforming the input graph G into a graph G_3 with maximum degree 3. This is achieved by replacing each vertex v of degree d > 3 with a cycle of length d. Each edge incident to v is then connected to a different vertex of the cycle.
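The degree-reduction step can be sketched as follows (a minimal illustration with our own helper names; the paper does not prescribe an implementation). Each vertex of degree d > 3 becomes a cycle of d copies, and the i-th incident edge attaches to the i-th copy, so every vertex of the resulting graph has degree at most 3.

```python
def reduce_to_degree_3(n, edges):
    # Build incidence lists: which edge indices touch each vertex.
    inc = [[] for _ in range(n)]
    for idx, (u, v) in enumerate(edges):
        inc[u].append(idx)
        inc[v].append(idx)
    port, origin, new_edges, nxt = {}, [], [], 0
    for v in range(n):
        d = len(inc[v])
        if d <= 3:
            ids = [nxt]                      # keep v as a single vertex
        else:
            ids = list(range(nxt, nxt + d))  # replace v by a cycle of d copies
            new_edges += [(ids[i], ids[(i + 1) % d]) for i in range(d)]
        nxt += len(ids)
        origin += [v] * len(ids)
        for i, idx in enumerate(inc[v]):
            port[(idx, v)] = ids[min(i, len(ids) - 1)]
    # Reattach every original edge to the ports of its two endpoints.
    for idx, (u, v) in enumerate(edges):
        new_edges.append((port[(idx, u)], port[(idx, v)]))
    return nxt, new_edges, origin
```

On a star with center degree 5, for example, the center becomes a 5-cycle and every new vertex has degree at most 3, while connectivity (tracked via the returned origin map) is preserved.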
After that, the algorithm picks a uniformly random rank r(v) ∈ [0, 1] for each vertex v and runs a BFS from each vertex, which stops as soon as one of the following conditions holds: (1) the search from v explored t vertices, or (2) the connected component of v was fully explored, or (3) a vertex w of rank lower than v was reached. Whenever the search stops due to case (3), we add a directed super-edge from w to v.
One can show that the super-edges induce a forest of rooted trees, and the probability that a vertex is a root of a tree is O(1/t) (Lemma 3.3 in [BDE + 20]). The last step of the algorithm is to compute a CC-labeling C of the graph defined by the super-edges and return Contract(G_3, C). Since the number of vertices in G_3 is O(m), it follows directly that the expected number of vertices in the resulting graph is O(m/t).
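The rank-based truncated search can be sketched as follows (our own minimal illustration; ranks are drawn ahead of time and `t` caps the exploration):

```python
import random
from collections import deque

def local_search(adj, v, t, rank):
    # BFS from v that stops when (1) t vertices have been explored,
    # (2) v's component is exhausted, or (3) a vertex of rank lower
    # than rank(v) is reached; in case (3) that vertex is returned,
    # corresponding to a super-edge w -> v.
    seen = {v}
    queue = deque([v])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w in seen:
                continue
            if rank[w] < rank[v]:
                return w                 # case (3)
            seen.add(w)
            queue.append(w)
            if len(seen) >= t:
                return None              # case (1)
    return None                          # case (2)
```

Super-edges always point from lower to higher rank, so following them can never close a cycle; this is the rooted-forest structure used above.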
We improve upon the analysis of [BDE + 20] by showing that the space usage of the algorithm outlined above is O(m log t).

Proof. Let us analyze the amount of communication used by the BFS starting at some vertex v. Assume that the connected component containing v has size at least t (otherwise the communication can only be lower). Observe that the BFS explores exactly k ≤ t vertices when the k-th explored vertex has the smallest rank among all k vertices and vertex v has the smallest rank among the first k − 1 explored vertices. This happens with probability 1/(k(k − 1)). Hence, the expected number of explored vertices is Σ_{k=2}^{t} k · 1/(k(k − 1)) + O(1) = Σ_{k=2}^{t} 1/(k − 1) + O(1) = O(log t).
Since each vertex has constant degree we get that running BFS from all vertices requires O(m log t) expected space.
Claim 4.12. There exists an algorithm which can compute connected components of the forest defined by all super-edges in O(1) rounds and optimal space.
Proof. We observe that the problem of finding connected components in the forest of super-edges is not a general forest connectivity problem, but rather a rooted forest connectivity problem. In particular, each tree of the forest has exactly one marked vertex (the root). The forest connectivity algorithm in [BDE + 20] first maps each tree to a cycle (i.e., its Euler tour), which can be done in O(1) MPC rounds. Then, it shrinks each cycle to ensure it has length O(n^ε). These transformations can be done in O(1) rounds and optimal space; see also Section 3 for more details on these operations. At this point we observe that, given that we start with a collection of trees in which each tree has a single marked vertex, we can also ensure that after the transformations we are left with a collection of cycles of length O(n^ε), in which each cycle has a single marked vertex. This connected components problem can be solved in a single round, as each marked vertex can simply traverse the entire cycle it belongs to and discover its whole connected component.
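The final single-round step can be sketched as follows (assuming, as our own simplification, that each cycle is given by successor pointers together with its marked root):

```python
def label_rooted_cycles(succ, roots):
    # Each marked vertex walks its whole cycle via successor pointers
    # and labels every vertex it visits with its own id, producing a
    # CC-labeling in a single pass per cycle.
    label = [None] * len(succ)
    for r in roots:
        label[r] = r
        u = succ[r]
        while u != r:
            label[u] = r
            u = succ[u]
    return label
```

Because every cycle contains exactly one marked vertex, each vertex receives exactly one label, namely the id of its cycle's root.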
This concludes the last step in proving Theorem 1.2, which follows directly by combining Lemmas 4.5, 4.6, 4.9 and 4.10.
Lemma 3.3 (Theorem 5, [BDE + 21]). There is a randomized AMPC algorithm (Standard-Cycle-CC) that solves the connectivity problem on a collection of disjoint cycles on n vertices w.h.p. The algorithm runs in O(1) AMPC rounds and uses O(n log n) total space.
if T/n = n^{Ω(1)} then: Compute connected components of G using the algorithm of Theorem 4.1.

H := graph obtained by sampling each edge of G independently with probability 1/d.

Corollary 4.4. Let n = |V(G)| and m = |E(G)|, and let C be a CC-labeling of H. If we set p = √(n/m), then the expected number of edges in both H and Contract(G, C) is O(√(mn)).
First, there is the algorithm of Theorem 4.1, which uses O(T) space. Second, we call ShrinkGeneral. Thanks to Corollary 4.4, the expected number of edges passed in the argument of Shrink is O(√(mn)), and the second argument is upper bounded by 2^{√(T/n)}. By Lemma 4.2, the expected space usage is O(√(mn) · log(2^{√(T/n)})) = O(√(mn) · √(T/n)) = O(√(Tm)) = O(T).
Claim 4.11 ([BDE + 21]). The total expected space used by the BFS step is O(m log t).