On Parallel k-Center Clustering

We consider the classic $k$-center problem in a parallel setting, on the low-local-space Massively Parallel Computation (MPC) model, with local space per machine of $\mathcal{O}(n^{\delta})$, where $\delta \in (0,1)$ is an arbitrary constant. As a central clustering problem, the $k$-center problem has been studied extensively. Still, until very recently, all MPC algorithms for it required $\Omega(k)$ or even $\Omega(k n^{\delta})$ local space per machine. While this setting covers the case of small values of $k$, for a large number of clusters these algorithms require large local memory, which makes them scale poorly. The case of large $k$, $k \ge \Omega(n^{\delta})$, has been considered recently for the low-local-space MPC model by Bateni et al. (2021), who gave an $\mathcal{O}(\log \log n)$-round MPC algorithm that produces $k(1+o(1))$ centers whose cost is within a multiplicative factor of $\mathcal{O}(\log\log\log n)$ of the optimum. In this paper we extend the algorithm of Bateni et al. and design a low-local-space MPC algorithm that in $\mathcal{O}(\log\log n)$ rounds returns a clustering with $k(1+o(1))$ clusters that is an $\mathcal{O}(\log^*n)$-approximation for $k$-center.


Introduction
Clustering large data is a fundamental primitive, extensively studied because of its numerous applications in a variety of areas of computer science and data science. It is a central type of problem in modern data analysis, including the fields of data mining, pattern recognition, machine learning, networking and social networks, and bioinformatics. In a typical clustering problem, the goal is to partition the input data into subsets (called clusters) such that the points assigned to the same cluster are "similar" to one another, and data points assigned to different clusters are "dissimilar".
The most extensively studied clustering problems are k-means, k-median, k-center, various notions of hierarchical clustering, and also variants of these problems with some additional constraints (e.g., fairness or balance).
While originally the clustering problems have been studied in the context of classical sequential computation, most recently a large amount of research has been devoted to the non-sequential computational settings such as distributed and parallel computing, mainly because these are the only settings capable of performing computations in a reasonable time on large inputs, and because data is frequently collected on different sites and clustering needs to be performed in a distributed manner with low communication.
In this paper we consider one of the most fundamental clustering problems, the $k$-center problem, on the Massively Parallel Computation (MPC) model. MPC is a modern theoretical model of parallel computation, inspired by frameworks such as MapReduce [DG08], Hadoop [Whi15], Dryad [IBY+07], and Spark [ZCF+10]. Introduced just over a decade ago by Karloff et al. [KSV10] (and later refined, e.g., in [ANOY14, BKS17, CŁM+18, GSZ11]), the model has been the subject of an increasing amount of fundamental research in recent years and has become the standard theoretical model for the study of parallel algorithms.
MPC is a parallel system with $m$ machines, each with $s$ words of local memory. (We also consider the global space $g$, which is the total space used across all machines, $g = s \cdot m$.) Computation takes place in synchronous rounds: in each round, each machine may perform arbitrary computation on its local memory, and then exchange messages with other machines. Each message is sent to a single machine specified by the machine sending the message. Each machine may send and receive at most $s$ words per round. The messages are processed by recipients in the next round. At the end of the computation, machines collectively output the solution. The goal is to design an MPC algorithm that solves a given task in as few rounds as possible.
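To make the round structure concrete, the following is a minimal sequential simulation of an MPC computation, assuming a toy task (finding a global maximum) that is not taken from the paper. The function name `mpc_max` and the routing rule "machine i sends to machine i // s" are illustrative assumptions; the point is only that no machine ever sends or receives more than $s$ words in a round.

```python
def mpc_max(values, s):
    """Simulate computing a global maximum on an MPC with local space s (in words).

    Hypothetical toy example: every machine holds at most s values, reduces them
    locally to one word, and sends that word to a machine of the next level.
    Returns (maximum, number_of_communication_rounds).
    """
    if s < 2 or not values:
        raise ValueError("need local space s >= 2 and a non-empty input")
    # Initial distribution: consecutive chunks of at most s words per machine.
    machines = [values[i:i + s] for i in range(0, len(values), s)]
    rounds = 0
    while len(machines) > 1:
        # Local computation: each machine compresses its memory to a single word.
        messages = [max(local) for local in machines]
        # Communication: machine i sends its word to machine i // s,
        # so each receiving machine gets at most s words.
        machines = [messages[i:i + s] for i in range(0, len(messages), s)]
        rounds += 1
    return max(machines[0]), rounds
```

For example, with $n = 1000$ values and $s = 10 = n^{1/3}$, the simulation needs two communication rounds, which matches the usual intuition that aggregation with fan-in $n^{\delta}$ takes $O(1/\delta)$ rounds.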
If the input is of size $n$, then one wants $s$ to be sublinear in $n$ (for if $s \ge n$ then a single machine can solve any problem without any communication, in a single round), and the total space across all the machines to be at least $n$ (in order for the input to fit onto the machines) and ideally not much larger. In this paper, we focus on the low-local-space MPC setting, where the local space of each machine is strongly sublinear in the input size, i.e., $s = O(n^{\delta})$ for some arbitrary constant $\delta \in (0, 1)$. This low-local-space regime is especially attractive because of its scalability. At the same time, this setting is particularly challenging in that it requires extensive inter-machine communication to solve clustering problems for input data scattered over many machines.
In recent years we have seen a number of very efficient, often constant-round, parallel clustering algorithms that rely on a combination of a core-set and a "reduce-and-merge" approach. In this setting, one gradually filters the data set, typically by reducing its size on every machine to $O(k)$, continuing until all the data can be stored on a single machine, at which point the problem is solved locally. Observe that this approach has an inherent bottleneck: every machine must be able to store $\Omega(k)$ data points. Intuitively, this follows from the fact that if a machine sees $k$ data points that are all very far away from each other, it needs to keep track of all $k$ of them, for otherwise it might miss all the information about one of the clusters, which in turn could lead to a large miscalculation of the objective value. Similar arguments could also be used to argue that each machine needs to communicate $\Omega(k)$ points to the others (see [CSWZ16] for a formalization of such intuition for a worst-case partition of points for the $k$-median, $k$-means, and $k$-center problems, though the worst-case partition assumption means that this bound does not extend directly to MPC). Because of that, most of the earlier clustering MPC algorithms, especially those working in a constant number of rounds (see, e.g., [EIM11, MKC+15]), require $\Omega(k)$ or even $\Omega(k) \cdot n^{\Omega(1)}$ local space. Therefore, in the setting considered in this paper, of MPC with local space per machine of $s = O(n^{\delta})$, the approach described above cannot be applied when the number of clusters is large, that is, when $k = \omega(s)$. This naturally leads to the main challenge in the design of clustering algorithms for MPC with low local space: how to efficiently partition the data into $k$ good-quality clusters on an MPC with local space $s \ll k$. We believe that this setting is quantitatively different from (and more challenging than) the setting in which $k$ is smaller than (or even comparable to) the amount of local space $s$.
In this paper, we focus on the $k$-center clustering problem, a standard, widely studied, and widely used formulation of metric clustering. The problem is, given a set of $n$ input points, to find a subset of size $k$ of these points (called centers) such that the maximum distance of a point to its nearest center is minimized. Specifically, in this work, we focus on the case where $k \gg s$, and hence where $k$ is quite large relative to $n$: one can think of these problem instances as "compressing" the input set of $n$ points into $k$ points. Very recently this problem has been addressed by Bateni et al. [BEFM21], who showed that one can design an $O(\log\log n)$-round MPC algorithm, with local space $s = O(n^{\delta})$ and global space $g = O(n^{1+\delta})$, that returns an $O(\log\log\log n)$-approximate solution with $k + o(k)$ centers. Our main result is an improved bound in the MPC model:
Theorem 1.1 (Main result stated informally). In $O(\log\log n)$ rounds on an MPC, we can compute an $O(\log^* n)$-approximate solution to the $k$-center problem using $k(1 + o(1))$ centers. The MPC has local space $s = O(n^{\delta})$ and global space $g = O(n^{1+\rho})$ for any constant $\rho > 0$. The $n$ input points are in $\mathbb{R}^d$ for some constant $d$ and we assume that $k = \Omega(\log^c n)$ for a suitable constant $c$. Our algorithm succeeds with high probability.
The algorithmic framework is based on a repeated application of locally sensitive sampling: sampling a set of "hub" points, assigning all other points to a nearby hub, and then adding new hubs so that the point set is well approximated. We improve the approximation factor by a careful examination of the progress of clusters in some fixed optimal clustering over the course of the algorithm. Due to the depth of our iteration, clusters no longer satisfy certain properties with high probability, and carefully bounding the size of the clusters that fail to meet certain checks is an important challenge to overcome in our analysis. Additionally, we provide a more flexible guarantee on the global space, via an accuracy parameter which can be set to reduce the global space used at the expense of a larger approximation ratio (or vice versa). This is possible because of the way we implement locality-sensitive hashing (LSH) in MPC. We believe our implementation of LSH in MPC could potentially see further applications, e.g., for other geometric problems.

Related work
There has been a large amount of work on various variants of the clustering problems (see, e.g., [XW05] for a survey of research until 2005), including some extensive study of the $k$-center clustering problem. The $k$-center problem is well known to be NP-hard and simple algorithms are known to achieve a 2-approximation [DF85, Gon85, HS85]; this approximation ratio is tight unless P = NP [HN79].
The study of clustering in the context of parallel computing is extremely well-motivated: as the size of typical data sets continues to increase, it becomes infeasible to store input data on a single machine, let alone iterate over it many times, as greedy sequential algorithms require (see, e.g., [Gon85]). It therefore comes as no surprise that there has been a considerable amount of work on $k$-center clustering algorithms in MPC.
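For reference, the greedy sequential algorithm alluded to above can be sketched as the classical farthest-first traversal, which is the standard presentation of the 2-approximation of [Gon85]. The function name `farthest_first` and the choice of the first input point as the initial center are illustrative assumptions. Note that the loop makes $k$ passes over the whole input, which is exactly what becomes infeasible when the data does not fit on one machine.

```python
import math

def farthest_first(points, k):
    """Greedy 2-approximation for k-center (farthest-first traversal).

    points: list of coordinate tuples; k: number of centers.
    Returns (centers, cost), where cost is the largest distance from a point
    to its nearest chosen center.
    """
    centers = [points[0]]  # assumption: start from an arbitrary (here: the first) point
    # dist[i] = distance from points[i] to its nearest chosen center so far.
    dist = [math.dist(p, centers[0]) for p in points]
    while len(centers) < min(k, len(points)):
        # Pick the point that is currently farthest from all chosen centers.
        i = max(range(len(points)), key=dist.__getitem__)
        centers.append(points[i])
        # One full pass over the input to refresh the nearest-center distances.
        dist = [min(d, math.dist(p, points[i])) for d, p in zip(dist, points)]
    return centers, max(dist)
```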
In particular, several constant-round, constant-approximation algorithms in the MPC setting were given recently for general metric $k$-center clustering, see, e.g., [ √ nk) for the problem in metric spaces with doubling dimension $d$. As mentioned earlier, these algorithms are not scalable if $k$ is large relative to $n$ (for example, when $k = n^{1/3}$), making the case of large $k$ particularly challenging. Furthermore, as argued by Bateni et al. [BEFM21], the case of large $k$ appears naturally in some applications of $k$-clustering, including label propagation used in semi-supervised learning, or same-meaning query clustering for online advertisement or document search [WLF+09]. Unfortunately, we do not know of any $O(1)$-round, $O(1)$-approximation MPC algorithm that would use local space $s = o(k)$.
In order to address the case of large $k$, Bateni et al. [BEFM21] considered a relaxed version of $k$-center clustering for low-dimensional Euclidean spaces with constant dimension. The goal of that work was to design a scalable MPC algorithm for the $k$-center clustering problem with a sublogarithmic number of rounds of computation, sublinear space per machine, and small global space. Bateni et al. [BEFM21] showed that in $O(\log\log n)$ rounds on an MPC with $s = O(n^{\delta})$, one can compute an $O(\log\log\log n)$-approximate solution to constant-dimension Euclidean $k$-center with $k(1 + o(1))$ centers. Their algorithm uses $O(n^{1+\delta} \cdot \log\Delta)$ global space. Bateni et al. [BEFM21] complemented their analysis with an empirical study demonstrating that the designed algorithm performs well in practice.
Finally, in the related PRAM model of parallel computation, Blelloch and Tangwongsan gave a 2-approximation algorithm for $k$-center [BT10]. However, their algorithm requires $\Omega(n^2)$ processors, and it is therefore difficult to translate the approach to our setting.

Technical contributions
Our main result in Theorem 1.1 is an extension of the approach developed in Bateni et al. [BEFM21] that significantly improves the quality of the approximation guarantee. To present these two results in the right context, we will briefly describe the main differences between these two works at a high level.
The approach of Bateni et al. [BEFM21] starts with the entire point set $P$ as a set of potential centers (solution), and refines it over rounds into a sequence of sets $P = P_0 \supseteq P_1 \supseteq \cdots \supseteq P_{\tau}$. The final set $P_{\tau}$ is reported as the output. It is not difficult to see that if we take an optimal clustering $\mathcal{C}^*$ for $P$ (i.e., $\mathcal{C}^*$ is the optimal solution to the $k$-center problem for $P$), then the number of potential centers in any cluster $C \in \mathcal{C}^*$ does not increase over rounds (that is, $|P_{i+1} \cap C| \le |P_i \cap C|$). Let us define a cluster $C \in \mathcal{C}^*$ to be irreducible from round $i$ if $i$ is the minimum index such that $|C \cap P_i| \le 1$. Two central properties of the cluster refinement due to Bateni et al. [BEFM21] are that after $O(\log\log n)$ rounds the size of each cluster in $\mathcal{C}^*$ reduces to $O(\log n)$, and that after that, the total number of points in the reducible clusters in $\mathcal{C}^*$ reduces after each round by a constant factor, implying that another $O(\log\log n)$ rounds suffice to ensure the desired number of centers (at most $k$ due to the irreducible clusters and $o(k)$ due to the reducible clusters) and hence $\tau = O(\log\log n)$. This is then complemented by the analysis of the quality of the refinements, which guarantees that each new refinement adds an additive term of $O(\mathrm{opt})$ to the cost of the solution, giving in total a doubly logarithmic approximation ratio. They further gave a sketch of the analysis to get an approximation ratio of $O(\log\log\log n)$.
In our paper we substantially improve the approximation factor to $O(\log^* n)$ by extending the framework in the following sense. We show that, after $O(\log\log n)$ rounds, the size of each cluster in $\mathcal{C}^*$ reduces to $O(\log n)$, where the refinement in each round adds an additive error of $O(\mathrm{opt}/\log\log n)$ to the cost of the solution. Then, we show that after an additional $O(\log\log\log n)$ rounds, the sizes of almost all (but not all) clusters in $\mathcal{C}^*$ reduce to $O(\log\log n)$, where the refinement in each round adds an additive error of $O(\mathrm{opt}/\log\log\log n)$ to the cost of the solution. Next, we show that after another $O(\log\log\log\log n)$ rounds, the sizes of almost all clusters in $\mathcal{C}^*$ reduce to $O(\log\log\log n)$, where the refinement in each round adds $O(\mathrm{opt}/\log\log\log\log n)$ to the cost of the solution, and so on. We continue this until the sizes of almost all clusters in $\mathcal{C}^*$ reduce to $O(\log^* n)$. Observe that the total number of rounds taken so far is bounded by $O(\log\log n)$, and we can argue that the current solution has an approximation ratio of $O(\log^* n)$. An important challenge in analyzing this approach is that not all clusters satisfy these size guarantees with high probability. Indeed, we cannot obtain a high-probability guarantee by cluster refinement relying on random sampling of the already small clusters; we can ensure only that most of the clusters are getting small. Let $\mathcal{C}^{**} \subseteq \mathcal{C}^*$ be the clusters that satisfy the reduction property discussed above, that is, such that the number of points in each cluster of $\mathcal{C}^{**}$ is currently bounded by $O(\log^* n)$. We argue that the total number of points in the reducible clusters in $\mathcal{C}^{**}$ reduces by a constant factor after each successive round, adding an additive error of $O(\mathrm{opt})$ each time. This implies that another $O(\log(\log^* n))$ rounds suffice to ensure that we have the desired number of centers at the end. To bound the total number of centers, we also need to show that the number of centers in clusters in $\mathcal{C}^* \setminus \mathcal{C}^{**}$ (that is, the set of clusters which fail to adhere to a size guarantee at some point during the algorithm) is bounded. Note that we cannot track which clusters succeed or fail (doing so would require us to know an optimal clustering), and so we use $\mathcal{C}^*$ and $\mathcal{C}^{**}$ only for the analysis. In summary, the approach sketched above will reduce the number of clusters to $k(1 + o(1))$, and will ensure that the total number of rounds spent by our algorithm is $O(\log\log n)$ and the approximation ratio of our solution is $O(\log^* n)$. A more detailed overview is in Section 2.
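The accounting behind these bounds can be summarized by the following back-of-the-envelope calculation. It is only a sketch: it assumes that the hidden constants are uniform across levels, it ignores the clusters that fail the size checks, and it writes $\log^{(j)} n$ for the $j$-times iterated logarithm and $\alpha \le \log^* n$ for the number of levels (the indexing of levels by $j$ is introduced here for illustration).

```latex
\begin{align*}
\text{error accumulated over the } \alpha \text{ levels}
  &\le \sum_{j=1}^{\alpha}
       \underbrace{O\big(\log^{(j+1)} n\big)}_{\text{rounds in level } j}
       \cdot
       \underbrace{O\!\Big(\frac{\mathrm{opt}}{\log^{(j+1)} n}\Big)}_{\text{error per round}}
   = O(\alpha \cdot \mathrm{opt}) = O(\log^* n \cdot \mathrm{opt}), \\
\text{rounds used over the } \alpha \text{ levels}
  &= \sum_{j=1}^{\alpha} O\big(\log^{(j+1)} n\big) = O(\log\log n), \\
\text{error of the final } O(\log(\log^* n)) \text{ rounds}
  &= O\big(\log(\log^* n)\big) \cdot O(\mathrm{opt}) = o(\log^* n) \cdot \mathrm{opt}.
\end{align*}
```

The second sum is dominated by its first term because each term is the logarithm of the previous one, so the total number of rounds stays $O(\log\log n)$ while the total error stays $O(\log^* n \cdot \mathrm{opt})$.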
Our approach relies heavily on the use of LSH (locality-sensitive hashing), and we provide a flexible implementation of LSH in MPC which one can configure with an appropriate parameter $\rho$. Reducing the value of $\rho$ decreases the amount of global space used by the algorithm (the global space used is $O(n^{1+\rho})$) while increasing the approximation ratio.

Notation and preliminaries
We now introduce the notation used throughout the paper.
First, we present the setting of the parameters of our MPC. The $k$-center algorithm in this paper works for any local space $s = O(n^{\delta})$ for a constant $0 < \delta < 1$: the setting of $\delta$ has only a constant-factor impact on the running time. Similarly, the MPC can have any global space $g = O(n^{1+\rho})$ for some constant $\rho > 0$: $\rho$ can be made arbitrarily small, and its setting has a constant-factor impact on the approximation ratio. We sometimes refer to MPC with these choices of $s$ and $g$ simply as "MPC" in the rest of this paper.
Let us recall that certain operations, particularly sorting and prefix sum of n elements, and broadcasting a value of size < s, can be computed deterministically in O(1) rounds (see [GSZ11]).
The input to our problem is a set $P$ of $n$ points in $\mathbb{R}^d$, where $d$ is a constant, and an integer parameter $k < n$. We define $d(p, q)$ as the Euclidean distance between points $p$ and $q$ in $\mathbb{R}^d$. We generalize this notation to the distance between a point and a set: $d(p, S) := \min_{q \in S} d(p, q)$ is the minimum distance from $p$ to a point in $S$. We define $\mathrm{cost}(P, S) := \max_{p \in P} d(p, S)$ as the distance to $S$ of the point in $P$ which is "furthest away" from every point in $S$. Without loss of generality, we assume that the input set is re-scaled so that the minimum distance between any two points in $P$ is 1; then we let $\Delta$ be the maximum distance between any two points in $P$.
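As a small illustration of these definitions, the following sketch evaluates $d(p, S)$ and $\mathrm{cost}(P, S)$ directly. The function names `dist_to_set` and `cost` are chosen here for illustration; points are represented as coordinate tuples.

```python
import math

def dist_to_set(p, S):
    """d(p, S): Euclidean distance from point p to the nearest point of S."""
    return min(math.dist(p, q) for q in S)

def cost(P, S):
    """cost(P, S): the largest distance from a point of P to the set S."""
    return max(dist_to_set(p, S) for p in P)
```

For instance, for $P = \{(0,0), (3,4), (6,8)\}$ the single center $(3,4)$ has cost 5, whereas the single center $(0,0)$ has cost 10.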
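We denote the set $\{1, \ldots, t\}$ by $[t]$, and we write $\log^{(i)} n := \log \cdots \log n$ for the $i$-times iterated logarithm. Moreover, $\mathrm{cost}(P, S^*)$, where $S^*$ is an optimal set of $k$ centers, is defined as the (optimal) cost of the $k$-center problem for $P$.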
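The iterated logarithm and the related quantity $\log^* n$ (the number of times the logarithm must be applied before the value drops to at most 1) can be computed as follows. This is a sketch that assumes base-2 logarithms, which the text does not fix; the function names are illustrative.

```python
import math

def iterated_log(n, i):
    """log^{(i)} n: apply log2 to n exactly i times (assumes every intermediate value is positive)."""
    x = float(n)
    for _ in range(i):
        x = math.log2(x)
    return x

def log_star(n):
    """log* n: the number of times log2 must be applied to n until the result is at most 1."""
    count, x = 0, float(n)
    while x > 1:
        x = math.log2(x)
        count += 1
    return count
```

The function grows extremely slowly: under the base-2 assumption, `log_star(65536)` is 4 because $65536 \to 16 \to 4 \to 2 \to 1$.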
We assume throughout the paper that $k > s$. However, our algorithms work as described provided that $k = \Omega((\log n)^c)$ for a suitable constant $c$ (which is also the main focus of the work).

Our results - detailed bounds
We now present in detail the main result of this paper:
Theorem 1.4 (Main result). Let $P$ be any set of $n$ points in $\mathbb{R}^d$ and let $\mathrm{opt}$ denote the optimal cost of the $k$-center clustering problem for $P$. There exists an MPC algorithm that in $O(\log\log n)$ rounds determines with high probability a set
Theorem 1.4 follows directly from a more general theorem.
Theorem 1.5 (Generalization of Theorem 1.4). Let $\alpha$ be an arbitrary integer, $1 \le \alpha \le \log^* n - c_0$ for some suitable constant $c_0$. Let $P$ be any set of $n$ points in $\mathbb{R}^d$ and let $\mathrm{opt}$ denote the optimal cost of the $k$-center clustering problem for $P$. There exists an MPC algorithm that in $O(\log\log n)$ rounds determines with high probability a set $T \subseteq P$ of centers, such that cost(P,
Observe that in Theorem 1.5 we have $|T| = k + o(k)$, since we are assuming $k = \Omega((\log n)^c)$ for some suitable constant $c$.
Theorem 1.5 can be seen as a fine-grained version of Theorem 1.4: as $\alpha$ increases, the cost of the solution decreases and the number of centers increases (with the number of rounds always being $O(\log\log n)$). Therefore Theorem 1.5 is more amenable to practical scenarios in the following sense: $\alpha$ in Theorem 1.5 can be set to trade off between the quality of the solution and the number of centers in the solution. We would also like to highlight that the results of Bateni et al. [BEFM21] correspond to the special cases $\alpha = 1$ and $\alpha = 2$ of Theorem 1.5, which yield $O(\log\log n)$ and $O(\log\log\log n)$ approximations, respectively.

Organization of the paper
In Section 2 we give a proof of our main result predicated on the correctness of our main algorithm, and then give an overview of the subroutines which our main algorithm contains. In Section 3 we explain how LSH (locality-sensitive hashing) on MPC can be implemented to assign each point $p \in P$ to a hub in $H \subseteq P$ whose distance to $p$ is within a constant factor of the distance from $p$ to its closest hub. In Sections 4 and 5 we prove critical properties of subroutines used in our main algorithm, and then in Section 6 we prove the correctness of our main algorithm. Finally, Section 7 contains some conclusions.

Technical overview
Recall that Theorem 1.4 is our main result and Theorem 1.5 is its parameterized generalization. Our proof of Theorem 1.5 (and hence of Theorem 1.4) relies on the following main technical theorem.
Theorem 2.1 (Main technical theorem proved in this paper). Let $\alpha$ be an arbitrary integer, $1 \le \alpha \le \log^* n - c_0$ for some suitable constant $c_0$. Let $r$ be an arbitrary positive real. Let $P$ be any set of $n$ points in $\mathbb{R}^d$ and let $\mathcal{C}_r$ be a clustering of $P$ that has the minimum number of centers among all clusterings of $P$ with cost at most $r$ and $|\mathcal{C}_r| = \Omega((\log n)^c)$ for a suitable constant $c$. There exists an MPC algorithm Ext-k-Center (Algorithm 5) that with probability at least 1 − The MPC uses local space $s = O(n^{\delta})$ and global space $g = O(n^{1+\rho} \cdot \log\Delta)$.
Algorithm Ext-k-Center used in Theorem 2.1 takes two parameters, an accuracy parameter $\alpha$ and a cost parameter $r$, and produces the output in a form similar to that required in Theorem 1.5, except that the number of clusters is tied to the number of centers in an optimal clustering of $P$ with cost at most $r$. This is in contrast with a standard clustering setting where the number of clusters is given as input, with no relationship to the cost of the solution. Therefore, if we knew a constant-factor approximation of the optimal cost of the $k$-center problem, then by setting $r$ to that value in Theorem 2.1 we would get a solution as required in Theorem 1.5. This naturally suggests running Ext-k-Center multiple times in parallel in order to obtain Theorem 1.5. Note that the success probability of Theorem 2.1 is not high. Hence we first run Ext-k-Center a suitable number of times in parallel to get an algorithm Ext-k-Center′ whose output and space requirements are the same as those of Ext-k-Center, but whose success probability is high. Then we run Ext-k-Center′ for $O(\log\Delta)$ choices of $r$ (starting with $r = \Delta$ and decreasing by a constant factor each time) in parallel to get algorithm Ext-k-Center′′ (the algorithm of Theorem 1.5). Moreover, Ext-k-Center′′ reports the output of Ext-k-Center′ for the minimum $r$ for which the number of centers is $k + o(k)$. The details are in Appendix B.
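The reduction from "number of centers given" to "radius given" can be sketched as follows. This is a sequential illustration under stated assumptions, not the paper's Ext-k-Center′′: the stand-in solver `greedy_cover` is a simple greedy cover chosen here only because it returns centers of cost at most $r$, the guesses are tried one after another rather than in parallel, and `budget` is a hypothetical name for the allowed number of centers ($k + o(k)$ in the text). The input is assumed to be scaled so that the minimum pairwise distance is 1.

```python
import math

def greedy_cover(points, r):
    """Stand-in radius-based solver: returns centers C with cost(points, C) <= r."""
    centers = []
    for p in points:
        # Open a new center only if p is not already covered within radius r.
        if all(math.dist(p, c) > r for c in centers):
            centers.append(p)
    return centers

def smallest_feasible_radius(points, budget, solver=greedy_cover):
    """Try r = Delta, Delta/2, Delta/4, ... (O(log Delta) guesses) and return the
    pair (r, centers) for the smallest guess whose solution uses at most `budget`
    centers, or None if no guess qualifies."""
    delta = max(math.dist(p, q) for p in points for q in points)
    best = None
    r = delta
    while r >= 1.0:  # below the minimum pairwise distance every point is its own center
        centers = solver(points, r)
        if len(centers) <= budget:
            best = (r, centers)  # guesses decrease, so the last feasible one is the smallest
        r /= 2.0
    return best
```

All guesses are examined rather than stopping at the first infeasible one, since the number of centers returned by a heuristic solver need not be monotone in $r$.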

Overview of the proof of Theorem 2.1
The proof of Theorem 2.1 is based on a framework which we call locally sensitive sampling. We generate a set $H \subseteq P$ of points (called hubs) by sampling each point in $P$ independently with a suitable probability, and assign every other point to one of the hubs based on its locality. Let $B_h$ be the bag of the hub $h$, that is, the set of points associated with hub $h \in H$. We run a variation of a well-known greedy algorithm [Gon85] (for $k$-center in the sequential setting) for each bag in parallel to find a set of intermediate centers $C_h$ for hub $h$ such that $\mathrm{cost}(B_h, C_h) = O(r)$. We then repeat the procedure with $\bigcup_{h \in H} C_h$ as the point set. We continue this process a particular number of times with a particular choice of probability and radius parameters, and report the centers obtained at that point as the final solution.
This framework was recently used by Bateni et al. [BEFM21] to give an $O(\log\log n)$-round MPC algorithm with local space $s = O(n^{\delta})$ and global space $g = O(n^{1+\delta})$, which computes an $O(\log\log\log n)$-approximate solution to $k$-center with $k(1 + o(1))$ centers, with high probability. We extend their framework and generalize the analysis to give an $O(\log^* n)$-approximate solution as stated in Theorem 1.4. Note that Theorem 2.1 implies Theorem 1.4 via Theorem 1.5.
The algorithm corresponding to Theorem 2.1 is Ext-k-Center (Algorithm 5 in Section 6). Before describing Ext-k-Center, we describe and contextualize the three subroutines which it uses (Nearest-Hub-Search, Sample-And-Solve and Uniform-Center). The main algorithm of Bateni et al. [BEFM21] uses the subroutines Sample-And-Solve and Uniform-k-Center. Our algorithm uses the analogous subroutines Sample-And-Solve and Uniform-Center, respectively, to achieve the desired result; there are some differences, which we discuss when we describe Sample-And-Solve and Uniform-Center. Due to our implementation of Nearest-Hub-Search, we are able to give a more flexible bound on the global space. We can improve the approximation ratio mainly by generalizing their Uniform-k-Center to our Uniform-Center and by a more refined analysis of our main algorithm, which calls Uniform-Center.
Let us first discuss at a high level what these subroutines achieve in the context of the framework of locally sensitive sampling (discussed at the beginning of this section). Intuitively, the purpose of Sample-And-Solve is to sparsify dense regions of points: it samples points with a given probability and iteratively adds centers in order to ensure that the cost of the centers remains low. Uniform-Center repeatedly uses Sample-And-Solve: its main purpose is to guarantee that the number of centers in each cluster of some fixed optimal clustering decreases in a certain way over time.
Nearest-Hub-Search($Q$, $H$). Takes as input a set $Q$ of at most $n$ points and a set of hubs $H \subseteq Q$. For all points $q \in Q \setminus H$, it finds a point $\mathrm{close}(q) \in H$ such that $d(q, \mathrm{close}(q)) = O(d(q, H))$, with probability at least $1 - 1/n^{\Omega(1)}$. Nearest-Hub-Search can be implemented in MPC with local space $s = O(n^{\delta})$ and global space $g = O(n^{1+\rho} \cdot \log\Delta)$ in $O(1)$ rounds. Nearest-Hub-Search uses locality-sensitive hashing [HIM12] and its implementation in MPC. For details on Nearest-Hub-Search, see Section 3.

Sample-And-Solve
Takes a set $Q$ of at most $n$ points, a sampling parameter $p$, and a radius parameter $r$. It produces some set of centers $S \subseteq Q$ such that $\mathrm{cost}(Q, S) = O(r)$. Importantly, this can be implemented in an MPC with local space $s = O(n^{\delta})$ and global space $O(n^{1+\rho} \cdot \log\Delta)$ in $O(1)$ rounds (Lemma 4.1) as, aside from using Nearest-Hub-Search to assign points to hubs, the computation is all done locally. Sample-And-Solve first samples each point in $Q$ (independently) with probability $p$: let $H \subseteq Q$ be the set of sampled points, called hubs. Then Sample-And-Solve calls Nearest-Hub-Search with input point set $Q$ and hub set $H$. After getting $\mathrm{close}(q)$ for each $q \in Q \setminus H$, Sample-And-Solve collects the set $B_h \subseteq Q$ of all points assigned to a hub $h \in H$ (including the hub $h$ itself) and selects a set of centers $C_h$ from $B_h$ greedily using a variation of the sequential algorithm of [Gon85], such that $\mathrm{cost}(B_h, C_h) = O(r)$. Finally, the algorithm outputs $S = \bigcup_{h \in H} C_h$. However, there is a difficulty to overcome: note that $|B_h|$ may be $\omega(n^{\delta})$, so $B_h$ may not fit into the local memory of a machine. We show that this can be handled by distributing the points in $B_h$ over multiple machines, duplicating $h$ across all such machines. See Section 4 for more details about Sample-And-Solve.
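The steps of Sample-And-Solve can be illustrated with the following sequential sketch. It makes several simplifying assumptions that are not part of the paper's algorithm: exact nearest-hub assignment replaces Nearest-Hub-Search, a simple greedy cover (`greedy_cover`, a hypothetical helper) replaces the variation of [Gon85], every bag is assumed to fit in memory, the points are assumed to be distinct tuples, and if no point is sampled the first point is used as the only hub.

```python
import math
import random

def greedy_cover(points, r):
    """Greedily pick centers from `points` so that every point is within r of a center."""
    centers = []
    for p in points:
        if all(math.dist(p, c) > r for c in centers):
            centers.append(p)
    return centers

def sample_and_solve(Q, p, r, rng=random):
    """Sequential sketch: sample hubs, build one bag per hub, solve each bag locally."""
    # Step 1: sample each point independently with probability p.
    hubs = [q for q in Q if rng.random() < p]
    if not hubs:
        hubs = [Q[0]]  # assumption: fall back to a single hub
    hub_set = set(hubs)
    # Step 2: assign every other point to a hub (exact nearest hub stands in for LSH).
    bags = {h: [h] for h in hubs}
    for q in Q:
        if q not in hub_set:
            close = min(hubs, key=lambda h: math.dist(q, h))
            bags[close].append(q)
    # Step 3: choose centers inside each bag so that cost(B_h, C_h) <= r.
    S = []
    for bag in bags.values():
        S.extend(greedy_cover(bag, r))
    return S
```

Because the bags partition $Q$ and each bag is covered within radius $r$ by its own centers, the returned set satisfies $\mathrm{cost}(Q, S) \le r$ in this sketch, regardless of how the hubs were sampled.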
Algorithm Sample-And-Solve in our paper serves essentially the same purpose as the corresponding algorithm due to Bateni et al. [BEFM21]. The approximation guarantee and the number of rounds performed are the same in both cases. However, the global space used by our algorithm Sample-And-Solve is more flexible in the following sense: reducing the value of $\rho$ decreases the amount of global space used by the algorithm (the global space used is $O(n^{1+\rho} \cdot \log\Delta)$) while increasing the approximation ratio.
Uniform-Center($V_t$, $r$, $t$). Takes a set $V_t$ of at most $n$ points, a radius parameter $r$, and an additional parameter $t \le n$. It produces a set $S$ of centers by calling Sample-And-Solve $\tau = \Theta(\log\log t)$ times. $S_{i-1}$ is the input to the $i$-th call and $S_i$ is the output of the $i$-th call: overall we have $S_0 = V_t$ and $S_{\tau} = S$ (the output of Uniform-Center). The probability and radius parameters of the calls to Sample-And-Solve are set suitably. From the guarantees of Sample-And-Solve, we obtain the following guarantees for Uniform-Center: (i) it can be implemented in an MPC with local space $s = O(n^{\delta})$ and global space $g = O(n^{1+\rho} \cdot \log\Delta)$ in $O(\log\log t)$ rounds (Lemma 5.1), and (ii) $\mathrm{cost}(V_t, S) = O(r \cdot \tau) = O(r \log\log t)$ (Lemma 5.2). Uniform-Center guarantees a reduction in cluster sizes in an optimal clustering in the following sense. Consider a fixed clustering $\mathcal{C}^t_r$ of $V_t$ that has cost at most $r$.
This is formally stated in Lemma 5.3: note that this ceases to be a high-probability guarantee when $t = o(n)$. This guarantee on the size reduction plays a crucial role when bounding the number of centers reported by Ext-k-Center in Section 6. For more details on Uniform-Center, see Section 5.
Our Uniform-Center is a full generalization of the analogous Uniform-k-Center in Bateni et al. [BEFM21]. In particular, Uniform-k-Center is the special case of our Uniform-Center with $t = n$. This generalization plays a crucial role in the correctness of Ext-k-Center, which calls Uniform-Center multiple times. Uniform-k-Center is not robust enough to be called from Ext-k-Center multiple times to achieve the desired result.

Ext-k-Center
The algorithm consists of two phases, where Phase 1 consists of $\alpha$ subphases and Phase 2 consists of $\beta = \Theta(\log^{(\alpha+1)} n)$ subphases. In the $j$-th subphase of Phase 1, that is, in Phase 1.$j$, Ext-k-Center calls Uniform-Center($T_{j-1}$, $r_{j-1}$, $t_{j-1}$), where $T_0 = P$, $r_0 = r/\log\log n$, $t_0 = n$, $t_j = \Theta(\log^{(j)} n)$, and r j =
Combining the guarantees concerning the round complexity, global space and approximation guarantee of Phase 1 and Phase 2, we get the claimed guarantees on round complexity, global space and approximation in Theorem 2.1 (see Lemma 6.1 for the round and global space guarantees and Lemma 6.2 for the guarantee on the approximation factor).
Now, we discuss how we bound the number of centers that Ext-k-Center outputs, that is, $|T|$. Consider an optimal clustering $\mathcal{C}_r$ of $P$ with cost at most $r$. A cluster $C \in \mathcal{C}_r$ is said to be active (after Phase 1) if $|C \cap T_j| \le t_j$ for each $j$ with $1 \le j \le \alpha$. We say $C$ is inactive otherwise. Using the guarantee given by Uniform-Center concerning the reduction in cluster sizes, we can show that the total number of centers in Note that $T_{\alpha}$ denotes the set of intermediate centers we have after Phase 1. So, for any cluster $C \in \mathcal{C}_r$ that is active after Phase 1, it satisfies , we have the following: .
We define an active cluster We show that the total number of intermediate centers in large clusters reduces by a constant factor in Phase 2.$i$, with probability at least 1 . Note that the total number of intermediate centers in all active large clusters, just before ), and we are executing $\beta = \Theta(\log^{(\alpha+1)} n)$ subphases in Phase 2. We can show that the total number of centers in the active large clusters, after Phase 2, is at most (Lemma 6.7). Combining the fact that the number of active small clusters can be at most $|\mathcal{C}_r|$ with the bound on the number of inactive clusters in Phase 2, we obtain the desired bound on $|T|$. Full details of Ext-k-Center and its analysis are presented in Section 6.

Nearest Hub Search
Recall that our Nearest-Hub-Search algorithm takes a set $Q$ of points and a set $H \subseteq Q$ of hubs. For each $q \in Q \setminus H$, we want to find a hub $h \in H$ such that the distance between $q$ and $h$ is at most a constant factor larger than the distance between $q$ and the closest hub to $q$ in $H$: informally, $h$ is "almost" the closest hub to $q$ in $H$.
In this section, we use locality-sensitive hashing (LSH) [HIM12] to implement algorithm Nearest-Hub-Search($Q$, $H$) on MPC. Our implementation of locality-sensitive hashing is parameterizable: by setting the parameter $\rho$ appropriately, one can reduce the global space while increasing the approximation ratio, or vice versa.
We begin by recalling the definition of locality-sensitive hashing, introduced in [HIM12]:
Definition 3.1 (Locality-sensitive hashing [HIM12]). Let $r \in \mathbb{R}^+$, $c > 1$ and $p_1, p_2 \in (0, 1)$ be such that $p_1 > p_2$. A hash family $\mathcal{H} = \{h : \mathbb{R}^d \to U\}$ is said to be an $(r, cr, p_1, p_2)$-LSH family if for all $x, y \in \mathbb{R}^d$ the following hold: (i) if $d(x, y) \le r$, then $\Pr_{h \in \mathcal{H}}[h(x) = h(y)] \ge p_1$; and (ii) if $d(x, y) > cr$, then $\Pr_{h \in \mathcal{H}}[h(x) = h(y)] \le p_2$.
Consider the following proposition on the existence of a particular hash family, which will be useful to describe and analyse Nearest-Hub-Search($Q$, $H$) in Algorithm 1.
In Nearest-Hub-Search$(Q, H)$, $Q$ is a set of at most $n$ points and $H \subseteq Q$ is the set of hubs. Our objective is to find, for each point, a hub that is at most a constant factor further away than the nearest hub, rather than the closest hub itself. We do this by making $\log \Delta$ guesses about the distance to the nearest hub, and for each guess trying to find a hub within that distance.
For each of our $\log \Delta$ guesses for $r$ (the distance to the closest hub), we take (independently and uniformly at random) $L = \Theta(n^\rho)$ hash functions from an $(r, c_\rho r, (1/n)^\rho, 1/n)$-LSH family and use them to hash all the points, including the hubs. Then we gather all points with the same hash value on consecutive machines. We then need to find, for each point, a hub that is close to it. This is difficult if the number of hubs mapped to a given hash value is large: if $h$ hubs and $m$ points are mapped to the same hash value, then we have to perform $h \cdot m$ distance checks, which is potentially prohibitive if $h \cdot m > s$. To overcome this we show that, if many hubs are mapped to the same hash value, we are able to discard all but a constant number of them, and still retain for each point a hub that is within a constant factor of the distance to the closest hub. This works because of the choice of our hash function and the definition of LSH. The full algorithm Nearest-Hub-Search$(Q, H)$ is described in Algorithm 1, and its correctness is proved in Lemma 3.3.
Lemma 3.3 (Nearest hub search). Let $Q$ be a set of at most $n$ points in $\mathbb{R}^d$, let $H \subseteq Q$ denote the set of hubs, and let $c_\rho$ be a suitable constant depending on $\rho$. There exists an MPC algorithm Nearest-Hub-Search$(Q, H)$ (as described in Algorithm 1) that with high probability, in $O(1)$ rounds, for all $q \in Q \setminus H$, finds a hub $\mathrm{close}(q) \in H$ such that $d(q, \mathrm{close}(q)) \le 2c_\rho \cdot d(q, H)$. The MPC uses local space $s = O(n^\delta)$ and global space $g = O(n^{1+\rho} \cdot \log^2 n \cdot \log \Delta)$.
6       for ($\ell = 1$ to $L$) do
7         Determine $f_\ell(q)$ for each $q \in Q$.
8         Find the distance of each $q \in Q$ to at most a constant number (say 10) of hubs $h \in H$ such that $f_\ell(q) = f_\ell(h)$. If we find such an $h \in H$ with $d(q, h) \le c_\rho \cdot r$, then we set $\mathrm{close}_{ij\ell}(q) = h$; otherwise, we set $\mathrm{close}_{ij\ell}(q) = \mathrm{null}$.
9       end
10      Set $\mathrm{close}_{ij}(q) = \mathrm{null}$ if $\mathrm{close}_{ij\ell}(q) = \mathrm{null}$ for all $\ell \in [L]$. Otherwise, set $\mathrm{close}_{ij}(q) = \mathrm{close}_{ij\ell}(q)$ for some $\ell \in [L]$ with $\mathrm{close}_{ij\ell}(q) \ne \mathrm{null}$.
11    end
12    Set $\mathrm{close}_i(q) = \mathrm{close}_{ij^*}(q)$ such that $j^*$ is minimum among all $j$ for which $\mathrm{close}_{ij}(q)$ is not null.
13  end
14  If there exists a $q \in Q$ such that $\mathrm{close}_i(q)$ is null for all $i \in [I]$, then report Fail.
15  Otherwise, set $\mathrm{close}(q) = \mathrm{close}_i(q)$ for some $i \in [I]$ for which $\mathrm{close}_i(q)$ is not null.
end
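The guess-and-hash structure of Algorithm 1 can be illustrated by the following sequential sketch. It is not the MPC implementation and it does not use the LSH family of the paper: as a labeled assumption, a randomly shifted grid (`make_grid_hash`) stands in for the hash family, the value `c = 4 * dim ** 1.5` plays the role of $c_\rho$, and the parameters `num_hashes`, `repeats` and `max_dist` are hypothetical stand-ins for $L$, $I$ and $\Delta$. For brevity, later instances are only run for points that are still unresolved, and unresolved points are reported as `None` rather than as Fail.

```python
import math
import random
from collections import defaultdict

def make_grid_hash(dim, width, rng):
    """Randomly shifted grid; a stand-in LSH, NOT the family used in the paper."""
    shift = [rng.uniform(0.0, width) for _ in range(dim)]
    return lambda p: tuple(math.floor((p[i] + shift[i]) / width) for i in range(dim))

def nearest_hub_search(points, hubs, max_dist, num_hashes=8, repeats=5,
                       max_hubs=10, seed=0):
    """Return ({q: (hub, r) or None}, c) for every non-hub point q (distinct tuples)."""
    rng = random.Random(seed)
    dim = len(points[0])
    c = 4.0 * dim ** 1.5                      # hypothetical value in the role of c_rho
    hub_set = set(hubs)
    queries = [q for q in points if q not in hub_set]
    close = {q: None for q in queries}
    num_scales = max(1, math.ceil(math.log2(max_dist))) + 1
    for _ in range(repeats):                  # instances i = 1..I
        pending = [q for q in queries if close[q] is None]
        for j in range(num_scales):           # guesses r = 2^j, smallest first
            r = 2.0 ** j
            width = c * r / math.sqrt(dim)    # same cell implies distance <= c * r
            for _ in range(num_hashes):       # hash functions f_1..f_L
                f = make_grid_hash(dim, width, rng)
                buckets = defaultdict(list)
                for h in hubs:
                    bucket = buckets[f(h)]
                    if len(bucket) < max_hubs:    # keep only a constant number of hubs
                        bucket.append(h)
                for q in pending:
                    if close[q] is not None:
                        continue
                    for h in buckets.get(f(q), []):
                        if math.dist(q, h) <= c * r:
                            close[q] = (h, r)     # accepted at guess r
                            break
            pending = [q for q in pending if close[q] is None]
    return close, c
```

Each accepted pair `(h, r)` satisfies `dist(q, h) <= c * r` by construction; the approximation guarantee of Lemma 3.3 additionally needs the probabilistic argument of Lemma 3.4, which this sketch does not reproduce.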
From Algorithm 1, note that we repeat a procedure (lines 3-12, which finds an almost closest hub with probability at least 2/3) $I = \Theta(\log n)$ times, and report the output of any instance that succeeds. Lemma 3.4 below says that, in a single instance of Nearest-Hub-Search, each point $q \in Q \setminus H$ finds a hub satisfying the required property with probability at least 2/3. This will imply the correctness of Lemma 3.3. We then discuss the MPC implementation of Nearest-Hub-Search.
Note that $\mathrm{close}_i(q)$ (which is either null or a point in $H$ such that $d(q, \mathrm{close}_i(q)) = O(d(q, H))$) denotes the output of Nearest-Hub-Search for point $q \in Q \setminus H$ and the instance $i \in [I]$.
Lemma 3.4. For a particular $q \in Q \setminus H$ and $i \in [I]$, $\mathrm{close}_i(q) \in H$ is not null and $d(q, \mathrm{close}_i(q)) \le 2c_\rho \cdot d(q, H)$, with probability at least 2/3.
Proof. Consider $j^*$ such that $d(q, H) \le r = 2^{j^*} \le 2 \cdot d(q, H)$, and let $q_h \in H$ be a hub closest to $q$, so that $d(q, q_h) = d(q, H) \le r$. As each $f_\ell$, $\ell \in [L]$, is a function chosen from an $(r, c_\rho r, (1/n)^\rho, 1/n)$-LSH family, $\Pr(f_\ell(q) = f_\ell(q_h)) \ge 1/n^\rho$. As $L = \Theta(n^\rho)$, there exists an $\ell^* \in [L]$ such that $f_{\ell^*}(q) = f_{\ell^*}(q_h)$ with probability at least 9/10. But our algorithm may not find this particular $q_h$ while considering the hubs $h \in H$ such that $f_{\ell^*}(q) = f_{\ell^*}(q_h) = f_{\ell^*}(h)$ (see line 8 of Nearest-Hub-Search). Again, as $f_{\ell^*}$ is chosen from an $(r, c_\rho r, (1/n)^\rho, 1/n)$-LSH family, the expected number of hubs $h \in H$ with $d(q, h) > c_\rho r$ but $f_{\ell^*}(q) = f_{\ell^*}(h)$ is at most $|H| \cdot (1/n) \le 1$. By Markov's inequality, the probability that the number of such hubs is more than 10 is at most 1/10. So, by the union bound, with probability at least $1 - 1/10 - 1/10 \ge 2/3$, Nearest-Hub-Search sets $\mathrm{close}_{ij^*\ell^*}(q) = h$ for some $h \in H$ such that $d(q, h) \le c_\rho r$, that is, $d(q, h) \le 2c_\rho \cdot d(q, H)$. Now, considering the way we set $\mathrm{close}_{ij}(q)$ from the $\mathrm{close}_{ij\ell}(q)$'s ($\ell \in [L]$) in line 10 and $\mathrm{close}_i(q)$ from the $\mathrm{close}_{ij}(q)$'s ($0 \le j \le \log \Delta$) in line 12, we have that Nearest-Hub-Search sets $\mathrm{close}_i(q) \in H$ such that $d(q, \mathrm{close}_i(q)) \le 2c_\rho \cdot d(q, H)$ with probability at least 2/3.
Now, consider the way Nearest-Hub-Search sets the value of $\mathrm{close}(q)$ in lines 14-15 from the $\mathrm{close}_i(q)$'s. By Lemma 3.4, $\mathrm{close}(q)$ is not null and satisfies $d(q, \mathrm{close}(q)) \le 2c_\rho \cdot d(q, H)$ with probability at least $1 - 1/n^{\Omega(1)}$. This is because $I = \Theta(\log n)$. Applying the union bound over all points in $Q \setminus H$, we see that Lemma 3.3 is implied by Lemma 3.4, except for the details of the MPC implementation.
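The constants in this argument can be made explicit as follows. This is only a sketch under two assumptions that are not stated in the text: that the constant hidden in $L = \Theta(n^\rho)$ is at least $\ln 10$, and that $I = c' \log_3 n$ for a hypothetical constant $c' > 1$. The hash functions and the instances are chosen independently, so
$$\Pr\big[\forall \ell \in [L] : f_\ell(q) \ne f_\ell(q_h)\big] \le \left(1 - n^{-\rho}\right)^{L} \le e^{-L/n^{\rho}} \le \tfrac{1}{10},$$
$$\Pr\big[\text{a fixed instance } i \text{ fails for } q\big] \le \tfrac{1}{10} + \tfrac{1}{10} = \tfrac{1}{5} \le \tfrac{1}{3},$$
$$\Pr\big[\text{all } I \text{ instances fail for } q\big] \le \left(\tfrac{1}{3}\right)^{I} = n^{-c'}, \qquad \Pr\big[\text{some } q \in Q \setminus H \text{ fails}\big] \le n \cdot n^{-c'} = n^{1-c'}.$$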

MPC implementation of Nearest-Hub-Search
Without loss of generality, we assume that $\rho < \delta$, as otherwise we can set $\rho = \delta$. First, notice that, if we can implement lines 4-10 of Nearest-Hub-Search in MPC with local space $s = O(n^\delta)$ and global space $g = O(n^{1+\rho} \cdot \log n)$, then we can run these lines in parallel for each possible value of $i$ and $j$ (adding a factor of $O(\log n \cdot \log \Delta)$ to the global space). Then the results can be aggregated in $O(1)$ rounds using sorting and prefix sum [GSZ11].
It suffices then to show that lines 4-10 of Nearest-Hub-Search can be implemented in the desired rounds and space. The hash functions in line 5 can be generated locally by some "leader" machine and broadcast to the other machines in $O(1)$ rounds, since we assume $\rho < \delta$. We again perform lines 6-9 in parallel, giving each $f_\ell$, $\ell \in [L]$, its own set of machines to use.
We next consider the implementation of lines 7-8 given a specific $f = f_\ell$. Machines can compute $f$ locally and without communication. Each point $q \in Q$ is now represented by a tuple $(f(q), \mathrm{hub}(q), q)$, where $\mathrm{hub}(q) = 1$ if $q \in H$ and $0$ otherwise. Machines then sort these tuples lexicographically and remove (using prefix sum) all but 10 hubs for each value in $\mathrm{range}(f)$. For each $v \in \mathrm{range}(f)$, we now have to compute the distance between each point $q \in Q$ such that $f(q) = v$, and each hub $h \in H$ such that $f(h) = v$ and $h$ was not removed. It might be the case that some points in $Q$ are not located on the same machine as the hubs which are hashed to the same value (and in general, these points might not all fit on one machine, see the discussion in Section 4). However, all that is required in this case is that the hubs can be sent to all machines containing points hashed to the same value: this can be done using prefix sum in a constant number of rounds; since there are at most 10 hubs for each value in $\mathrm{range}(f)$, each machine receives at most 10 hubs. Now machines have the information necessary to locally compute $\mathrm{close}_{ij\ell}(q)$ for all points that they contain, and for each point $q \in Q$ the tuple $(q, \mathrm{close}_{ij\ell}(q))$ is generated.
Finally, observe that line 10 can be implemented in $O(1)$ rounds using sorting and prefix sum.
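The sort-and-prune step for a single hash function can be mimicked sequentially as below. This is a minimal sketch, not the MPC implementation: a plain in-memory sort replaces the distributed sort and prefix sum, the function name `prune_hubs_and_match` is hypothetical, and (as a convenience that differs from the lexicographic order in the text) hubs are sorted before non-hubs within each hash value.

```python
from itertools import groupby

def prune_hubs_and_match(hashed, max_hubs=10):
    """hashed: list of tuples (hash_value, is_hub, point_id) with is_hub in {0, 1}.

    Returns a dict mapping each non-hub point_id to the list of at most
    max_hubs retained hub ids that share its hash value.
    """
    # Sort by hash value; within a value, hubs (is_hub = 1) come first.
    ordered = sorted(hashed, key=lambda t: (t[0], -t[1], t[2]))
    candidates = {}
    for _, group in groupby(ordered, key=lambda t: t[0]):
        group = list(group)
        # Keep only a constant number of hubs per hash value.
        kept_hubs = [pid for _, is_hub, pid in group if is_hub][:max_hubs]
        for _, is_hub, pid in group:
            if not is_hub:
                candidates[pid] = kept_hubs   # hubs "sent" to this point
    return candidates
```

After this step each point only has to be compared against the retained hubs of its bucket, which is what keeps the number of distance checks per point constant.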

Sample and Solve
In this section, we describe Sample-And-Solve$(Q, p, r)$, which is a subroutine in Uniform-Center and Ext-k-Center in Sections 5 and 6, respectively. Sample-And-Solve$(Q, p, r)$ takes a set $Q$ of at most $n$ points, a sampling parameter $p$, and a radius parameter $r$; it relies on Nearest-Hub-Search, discussed in Section 3, and produces a set of centers $S \subseteq Q$ such that $\mathrm{cost}(Q, S) = O(r)$.
Algorithm 2: Greedy$(R, h, r)$
Here $c_\rho$ is the constant as in Lemma 3.3.
Report the set $G$ of centers.
end
Sample-And-Solve$(Q, p, r)$ calls algorithm Greedy$(R, h, r)$ as a subroutine, which produces a set of centers $G \subseteq R$ such that $\mathrm{cost}(R, G) = O(r)$. Greedy$(R, h, r)$ is a variation of a classic 2-approximation algorithm for $k$-center in the sequential setting [Gon85]. In Sample-And-Solve$(Q, p, r)$, the idea is to sample each point in $Q$ (independently) with probability $p$ to form a set of hubs $H$. Then each point $q \in Q$ is assigned to some hub $h \in H$ by using Nearest-Hub-Search (as described in Algorithm 1). For $h \in H$, let $B_h$ be the set of points assigned to $h$ (including $h$ itself). We run Greedy on the points in $B_h$ to produce a set of centers $S_h$. Finally, $\bigcup_{h \in H} S_h$ is the output reported by Sample-And-Solve. There is one further technicality: $|B_h|$ may be much larger than $s$. In that case, we distribute the points in $B_h \setminus \{h\}$ across a number of machines, but we send $h$ to each machine, ensuring that the total number of points assigned to a machine (including $h$) is less than $s$, and then we apply Greedy to the points on each of these machines.
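The sample, assign and solve pipeline described above can be sketched sequentially as follows. Several details are assumptions made for illustration only: the stopping rule of `greedy` takes a generic `threshold` (the proofs in this section suggest a value of the form $4c_\rho r$), an exact nearest-hub computation stands in for Nearest-Hub-Search, the points are assumed to be distinct tuples, and bags larger than the local space are not split.

```python
import math
import random

def greedy(R, h, threshold):
    """Farthest-point greedy started from hub h (a variation of Gonzalez's algorithm).

    Keeps adding the point furthest from the current centers until every
    point of R is within `threshold` of some center.
    """
    G = [h]
    while True:
        x = max(R, key=lambda p: min(math.dist(p, g) for g in G))
        if min(math.dist(x, g) for g in G) <= threshold:
            return G
        G.append(x)

def sample_and_solve(Q, p, threshold, rng):
    """Return a list of centers, or None to signal Fail (no hub was sampled)."""
    hubs = [q for q in Q if rng.random() < p]        # sample hubs independently
    if not hubs:
        return None
    hub_set = set(hubs)
    bags = {h: [h] for h in hubs}                    # bag B_h contains h itself
    for q in Q:
        if q not in hub_set:
            # Exact nearest hub; a stand-in for Nearest-Hub-Search.
            nearest = min(hubs, key=lambda g: math.dist(q, g))
            bags[nearest].append(q)
    centers = []
    for h, bag in bags.items():
        centers.extend(greedy(bag, h, threshold))    # solve each bag separately
    return centers
```

Every point ends up within `threshold` of some returned center, since Greedy only stops once this holds inside each bag.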
The formal algorithm for Sample-And-Solve is presented in Algorithm 3. The approximation guarantee, round complexity and space complexity of Sample-And-Solve are stated in Lemma 4.1.An additional property of Sample-And-Solve is stated in Lemma 4.4 which will be useful in both Section 5 and Section 6.
Lemma 4.1 (Approximation guarantee, round complexity and space complexity of Sample-And-Solve). Consider Sample-And-Solve$(Q, p, r)$, as described in Algorithm 3. With probability at least $1 - \min\{e^{-\Omega(p \cdot n^\delta)}, 1/n^{\Omega(1)}\}$, it does not report Fail, and moreover:
Remark 4.2. We call Sample-And-Solve from Uniform-Center (Algorithm 4) with a probability parameter for which the success probability of Sample-And-Solve in our case is always at least $1 - 1/n^{\Omega(1)}$.
Proof of Lemma 4.1. Note that Sample-And-Solve (Algorithm 3) crucially calls subroutine Greedy (Algorithm 2) multiple times, particularly in lines 2, 10 and 14. We start the proof with the following observation (about algorithm Greedy$(R, h, r)$) that follows from the description of Algorithm 2.
Call Greedy$(Q, q, r)$ for some arbitrary $q \in Q$, and report the set of centers output by it as $S$.
6 If $H = \emptyset$, report Fail.
7 For each point $q$ in $Q$, assign it to an (approximately) closest hub in $H$ by calling Nearest-Hub-Search$(Q, H, \rho)$. We call the set of points assigned to a hub $h \in H$ the bag corresponding to $h$, and denote it as $B_h$. Note that $B_h$ includes $h$.
end
Note that both (i) and (ii) of Lemma 4.1 are direct if $|Q| \le s$, since in that case we execute Greedy$(Q, q, r)$ for some $q \in Q$ locally on one machine, and report its output as $S$. By Observation 4.3, we have $\mathrm{cost}(Q, S) \le 4c_\rho r$. Now consider the case when $|Q| > s$. Note that Sample-And-Solve reports Fail only when the set of hubs $H$ is empty. As every point in $Q$ is added to $H$ with probability $p$ independently, the probability that Sample-And-Solve reports Fail is at most $(1 - p)^{|Q|} \le e^{-\Omega(p \cdot n^\delta)}$. Now, we argue (i) and (ii) separately. Recall the description of Algorithm 3 in lines 7-17.
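The bound on the failure probability can be spelled out as follows. This is a sketch assuming that the local space is $s = \Theta(n^\delta)$ (the text only states $s = O(n^\delta)$), so that $|Q| > s$ gives $|Q| = \Omega(n^\delta)$:
$$\Pr[H = \emptyset] = (1 - p)^{|Q|} \le e^{-p|Q|} \le e^{-p \cdot s} = e^{-\Omega(p \cdot n^\delta)},$$
where the first inequality uses $1 - p \le e^{-p}$.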
(ii) From Lemma 3.3, Nearest-Hub-Search can be implemented in MPC with local space $s = O(n^\delta)$ and global space $g = O(n^{1+\rho} \cdot \log \Delta)$ in $O(1)$ rounds. After Nearest-Hub-Search is performed, each point knows its assigned hub. Using sorting, we can place all points with the same hub on consecutive machines in $O(1)$ rounds, and using prefix sum, we can count the number of points assigned to each hub in $O(1)$ rounds. Now, we consider two cases. If $|B_h| \le s$ (that is, the bag fits on a single machine), then Greedy on $B_h$ can be performed on a single machine without communication, that is, in 0 rounds. If $|B_h| > s$ (that is, the bag does not fit on a single machine), then we arbitrarily partition the bag and perform Greedy on each part. Specifically, we send $h$ to each of the consecutive machines on which $B_h$ is stored, and these machines perform Greedy on the subset of the bag that they hold locally. This can be performed in $O(1)$ rounds.
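The handling of a bag that does not fit on one machine can be illustrated by the following sketch. The function name `split_bag` and the list-based representation of machines are hypothetical; the only property taken from the text is that every part contains the hub and holds fewer than $s$ points in total.

```python
def split_bag(hub, bag, s):
    """Partition `bag` into parts that each contain `hub` and fewer than s points.

    Assumes s >= 3, so that every part has room for the hub plus at least
    one other point.
    """
    others = [q for q in bag if q != hub]
    chunk = s - 2                          # hub + chunk points = s - 1 < s
    if not others:
        return [[hub]]
    return [[hub] + others[i:i + chunk] for i in range(0, len(others), chunk)]
```

Greedy is then run independently on each returned part, always starting from the shared hub.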
Lemma 4.4 (An additional guarantee of Sample-And-Solve). Let $\mathcal{C}_r$ be a clustering of $Q$ having cost at most $r$. Then, with high probability, the following holds for any $C \in \mathcal{C}_r$: if at least one hub is selected from $C$, then no point in $C \setminus H$ is selected as a center, that is, $|S \cap C| = |H \cap C|$.
Proof. Consider any point $q \in C \setminus H$. As at least one hub is selected from $C$, $d(q, H) \le 2r$. By the guarantee from Nearest-Hub-Search (see Lemma 3.3), with probability at least $1 - 1/n^{\Omega(1)}$, $q$ is assigned to some hub $h \in H$ such that $d(q, h) \le 2c_\rho \cdot d(q, H) \le 4c_\rho r$. So, when we call Greedy$(B_h, h, r)$, as $d(q, h) \le 4c_\rho r$, $q$ will not be selected as a center. This implies that $|S \cap C| \le |H \cap C|$. The claim follows as $H \subseteq S$.

Uniform Center algorithm
In this section, we describe Uniform-Center$(V_t, r, t)$, which iteratively refines a set of centers to a smaller set of centers, by calling Sample-And-Solve on a quadratically-increasing probability schedule. It calls Sample-And-Solve $\Theta(\log\log t)$ times. In particular, the $i$-th call to Sample-And-Solve is Sample-And-Solve$(S_{i-1}, p_{i-1}, r)$: it produces a set $S_i \subseteq S_{i-1}$ of centers as the output, where $S_0 = V_t$ and the probability parameters are set suitably. The formal algorithm is described in Algorithm 4. The round complexity, space complexity and approximation guarantee are stated in Lemmas 5.1 and 5.2; they follow from the guarantees we have for Sample-And-Solve in Lemma 4.1 and the fact that Uniform-Center$(V_t, r, t)$ calls Sample-And-Solve $O(\log\log t)$ times. Uniform-Center has an additional guarantee, stated in Lemma 5.3, relating to the reduction of cluster sizes, which plays a crucial role in proving the correctness of Ext-k-Center in Section 6. In particular, it is useful in bounding the number of centers output by Ext-k-Center.
Algorithm 4: Uniform-Center$(V_t, r, t)$
Input: A set $V_t$ of at most $n$ points, a radius parameter $r \in \mathbb{R}^+$, and an additional parameter $t \le n$.
8 end
Lemma 5.1 (Round complexity and global space of Uniform-Center). Consider Uniform-Center$(V_t, r, t)$, as described in Algorithm 4. The number of rounds taken by the algorithm is $O(\log\log t)$ and the global space used by the algorithm is $g = O(n^{1+\rho} \cdot \log \Delta)$.
Proof. As Uniform-Center$(V_t, r, t)$ calls Sample-And-Solve $O(\log\log t)$ times, this follows directly from Lemma 4.1 (ii).
Lemma 5.2 (Approximation guarantee of Uniform-Center). Consider Uniform-Center$(V_t, r, t)$ as described in Algorithm 4. It produces output $S$ such that $\mathrm{cost}(V_t, S) = O(r \cdot \log\log t)$.
Proof. In the $i$-th iteration of Uniform-Center$(V_t, r, t)$, we call Sample-And-Solve$(S_{i-1}, p_{i-1}, r)$, and get $S_i$ as the centers. By Lemma 4.1 (i), $\mathrm{cost}(S_{i-1}, S_i) = O(r)$, where $1 \le i \le \tau$. Hence, by the triangle inequality, $\mathrm{cost}(V_t, S) \le \sum_{i=1}^{\tau} \mathrm{cost}(S_{i-1}, S_i) = O(r \cdot \tau) = O(r \cdot \log\log t)$.
Using induction on $i$ ($i \in \mathbb{N}$), we will show that $|C \cap S_i| \le b_{i-1}$ for each $i$ with $1 \le i \le \tau$, with probability at least $1 - 1/t^{\Omega(1)}$. This will imply the desired result, as $S_i$ is the output after the $i$-th iteration. For the base case $i = 1$, we have $|C \cap S_1| \le |C \cap S_0| = |C \cap V_t| \le t \le b_0$. The first inequality follows as $S_1 \subseteq S_0$; the second equality follows as $S_0 = V_t$; the third inequality follows from the given condition that $|C \cap V_t| \le t$; and the fourth one holds by the definition of $b_0$. Suppose the statement holds for each $i$ with $1 \le i < \ell$. Consider the $\ell$-th iteration of Uniform-Center: it calls algorithm Sample-And-Solve$(S_{\ell-1}, p_{\ell-1}, r)$, and produces $S_\ell$ as the set of intermediate centers. Let $H_\ell \subseteq S_{\ell-1}$ be the set of hubs sampled in the call of Sample-And-Solve$(S_{\ell-1}, p_{\ell-1}, r)$, where each point in $S_{\ell-1}$ is (independently) included in $H_\ell$ with probability $p_{\ell-1}$.

Proof. By induction hypothesis, |C ∩ S
By using the Chernoff bound (Lemma A.1), the probability, that the number of points of

The main algorithm
In this section, we present our main algorithm Ext-k-Center. Recall the overall description of Ext-k-Center in Section 2. Ext-k-Center has two phases. In Phase 1, it calls Uniform-Center $\alpha$ times, and in Phase 2, it calls Sample-And-Solve $\beta$ times, where $\alpha$ is the input precision parameter and $\beta = \Theta(\log^{(\alpha+1)} n)$. The formal algorithm is described in Algorithm 5. We prove the round complexity and space complexity of Ext-k-Center in Lemma 6.1, the approximation guarantee in Lemma 6.2 and the bound on the number of centers in Lemma 6.3.
for ($j = 1$ to $\alpha$) do
5 Phase 1.$j$:
6 $T_j \leftarrow$ Uniform-Center$(T_{j-1}, r_{j-1}, t_{j-1})$.
7 $t_j = \Theta(\log t_{j-1} \cdot (\log\log t_{j-1})^{d+2})$.
For any j with 1 ≤ j ≤ α, note that Ext-k-Center (P, r) calls Uniform-Center(T j−1 , r j−1 , t j−1 ) in Phase 1.j and produces T j as the output.So, by Lemma 5.2, cost( For any i with 1 ≤ i ≤ β, note that Ext-k-Center (P, r) calls Sample-And-Solve(T α+i−1 , 1/2, r) in Phase 2.i and produces T α+i as the output.So, by Lemma 4.1 (i), cost(T α+i−1 , T α+i ) = O(r).Hence, as Lemma 6.3 (Number of centers reported by Ext-k-Center).Consider Ext-k-Center (P, r) as described in Algorithm 5.It produces output T such that, with probability at least 1 − Here, C r is a clustering of P that has the minimum number of centers among all possible clustering of P with cost at most r such that |C r | = Ω((log n) c ), where c is a suitable constant.Now, we introduce the notion of active and inactive clusters in the following definition, which is useful in proving Lemma 6.3.Inactive clusters are clusters which, at some point during Phase 1, fail to reduce in size sufficiently.After the sub-phase during which they fail to reduce in size sufficiently, we assume that they never reduce in size again (since this is the worst case).We are then able to bound the total number of centers in inactive clusters (Lemma 6.6).Active clusters, by contrast, always reduce in size as we expect: the number of centers in active clusters is therefore easy to bound.Definition 6.4.Let C r be an optimal clustering with cost at most r.For each C ∈ C r and j with 1 ≤ j ≤ α, Note that, in Lemma 6.3, we want to bound the number of centers in T = T α+β .We first observe that |T | can be expressed as the sum of three quantities: Proof.Observe that since T α+β ⊆ T α , we obtain,

To bound the second inequality by
which used that C ′ r ⊆ C r .This yields Observation 6.5.
In the following lemmas, we bound C∈Cr\C ′ r |C ∩ T α | and Y β , and (with Observation 6.5) the result of Lemma 6.3 immediately follows from these bounds.Lemmas 6.6 and 6.7 are technical that we will prove later.Lemma 6.6.With probability at least , that is, the number of points in T α that are present in clusters that are inactive after Phase 1 is O Proof of Lemma 6.3 using Lemma 6.6 and Lemma 6.7.From the above two lemmas along with Observation 6.5 and the fact t j = Θ(log (j) n), we have the following bound on |T | with probability at least 1 Hence, we are done with the proof of Lemma 6.3.
Proof of Lemma 6.6 We prove Lemma 6.6 by using the following lemma, which we prove later.
Consider the particular C ∈ B i .By Definition 6.4, |C ∩ T Consider Phase 1.i of Ext-k-Center: we get T i as the current of centers by calling Uniform-Center(T i−1 , r i−1 , t i−1 ) Let us apply Lemma 5.3 with t This is because z = O (log log t i−1 ) d+2 , and we are done with the proof of the claim.
This completes the proof of Lemma 6.6.
Proof of Lemma 6.7 We prove Lemma 6.7 by using the following lemma, which we prove later.
Lemma 6.10.Let ζ ∈ (0, 1) be a suitable constant, let i be such that Applying the union bound over all i's in 1 to β, with probability at least 1 − 1 where ζ ′ is a suitable constant.Recalling the definition of Y i−1 , we have Moreover, 0 ≤ Z C ≤ t α for each C ∈ Γ i−1 .Hence, applying a Hoeffding bound (Lemma A.2), we have α−1 .
We are left with only the proof of Claim 6.11.
Note that Ext-k-Center calls Sample-And-Solve T α+i−1 , 1 2 , r in Phase 2.i.In Sample-And-Solve T α+i−1 , 1 2 , r , let H i ⊆ T α+i−1 be the set of hubs sampled, where each point in T α+i−1 is (independently) included in H i with probability

Conclusions
In this paper we show that even for large values of $k$, the classic $k$-center clustering problem in low-dimensional Euclidean space can be approximated efficiently and well in the parallel setting of low-local-space MPC. While some earlier works (see, e.g., [EIM11, MKC+15, CPP19]) were able to obtain constant-round MPC algorithms, they relied on a large local space $s \gg k$, which allows one to successfully apply the core-set approach and permits only limited communication. On the other hand, the low-local-space setting considered in this paper seems to require extensive communication between the machines to achieve any reasonable approximation guarantees. Therefore we believe (without any evidence) that a number of rounds of order $O(\log\log n)$ may be almost as good as it gets. Also, we concede that our algorithm does not achieve a constant approximation guarantee, but we feel the approximation bound of $O(\log^* n)$ is almost as good. Finally, our algorithm does not resolve the exact setting of $k$-center clustering, in that it allows slightly more centers in the solution, namely $k + o(k)$ centers. Improving on these three parameters is the main open problem left by our work.
We believe that solely using the technique in this paper, improving the approximation factor and/or number of rounds may not be possible (a detailed explanation is in Appendix C), but the approach may be useful for related problems in MPC or other models.We remark that the extra space in global space complexity is mainly due to the use of LSH; note that, even in the RAM model setting, the use of LSH requires some extra space.
Our work naturally suggests some open directions for future research:
• Can we improve the approximation factor beyond $O(\log^* n)$ and/or the number of rounds beyond $O(\log\log n)$?
• In the large k regime, can we design an efficient algorithm that uses (almost) linear global space?
• In the large k regime, can we design an efficient algorithm that reports exactly k centers?
• Are similar results possible for the related k-means and k-medians problems for large k in MPC?
• What MPC results are possible when the points are in high-dimensional Euclidean space or in a general metric space? Our work does not extend beyond constant dimension, as we are not aware of any efficient LSH for high dimension.
for $O(\log^* n)$ rounds with $r$ as the radius, which leads to approximation ratio $O(\log^* n)$. One may think of applying Sample-And-Solve in Phase 2 with a radius parameter less than $r$. But in that case, guaranteeing the total number of centers to be $k + o(k)$ seems unlikely.

in the iterated logarithm of $n$. By convention $\log^{(0)} n := n$. The notations $\tilde{O}(f)$ and $\tilde{\Theta}(f)$ hide polynomial factors in $\log f$. We now define formally the $k$-center clustering problem.
Definition 1.2. Let $P$ be a set of points in $\mathbb{R}^d$. A clustering $\mathcal{C}$ of $P$ is a partition of $P$ into nonempty clusters $C_1, \ldots, C_t$. The radius of cluster $C_i$ is $\min_{x \in C_i} \max_{y \in C_i} d(x, y)$, and the cost of the clustering $\mathcal{C}$ is the maximum of the radii of the clusters $C_1, \ldots, C_t$.
Definition 1.3 ($k$-center clustering problem). Let $k, n, d \in \mathbb{N}$ with $k \le n$, and let $P$ be a set of $n$ points in $\mathbb{R}^d$. The $k$-center problem for $P$ is to find a set $S^* \subseteq P$ such that $S^* = \arg\min_{S \subseteq P : |S| = k} \mathrm{cost}(P, S)$.
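The notions defined above can be made concrete with a small sketch. Two points are assumptions rather than statements of the text: logarithms are taken to base 2, and $\mathrm{cost}(P, S)$ is read in the usual way as the largest distance from a point of $P$ to its nearest center in $S$. The function names are hypothetical.

```python
import math

def iterated_log(n, j):
    """log^(j) n: apply log2 j times; log^(0) n = n by convention."""
    x = float(n)
    for _ in range(j):
        x = math.log2(x)
    return x

def log_star(n):
    """Number of times log2 must be applied to n before the value drops to <= 1."""
    x, count = float(n), 0
    while x > 1.0:
        x = math.log2(x)
        count += 1
    return count

def radius(cluster):
    """Radius of a cluster as in Definition 1.2: min over x of max over y of d(x, y)."""
    return min(max(math.dist(x, y) for y in cluster) for x in cluster)

def cost(P, S):
    """Assumed meaning of cost(P, S): max over p in P of the distance to the nearest center."""
    return max(min(math.dist(p, s) for s in S) for p in P)
```

For example, with $P = \{(0,0), (3,4), (10,0)\}$, choosing $S = \{(0,0)\}$ gives cost 10, while $S = \{(0,0), (10,0)\}$ gives cost 5.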

$r / \log\log t_j$. Observe that the guarantees of Uniform-Center ensure the following: (i) Phase 1 can be implemented in an MPC with local space $s = O(n^\delta)$ and global space $g = O(n^{1+\rho} \cdot \log \Delta)$ in $\sum_{j=1}^{\alpha} O(\log\log t_{j-1}) = O(\log\log n)$ rounds; (ii) $\mathrm{cost}(T_{j-1}, T_j) = O(r_{j-1} \log\log t_{j-1}) = O(r)$ for each $j \in [\alpha]$. Hence, $\mathrm{cost}(P, T_\alpha) = O(r\alpha)$.
Now consider Phase 2 of Ext-k-Center. In the $i$-th subphase of Phase 2, that is, Phase 2.$i$, Ext-k-Center calls Sample-And-Solve$(T_{\alpha+i-1}, 1/2, r)$, where $T = T_{\alpha+\beta}$ is the final output of Ext-k-Center. From the guarantee of Sample-And-Solve, we have (i) Phase 2 can be implemented in an MPC with local space $s = O(n^\delta)$ and global space $g$

Algorithm 1: Nearest-Hub-Search$(Q, H)$
Input: A set $Q$ of at most $n$ points and a set of hubs $H \subseteq Q$.
Output: For each point $p \in Q$, report $\mathrm{close}(p) \in H$ such that $d(p, \mathrm{close}(p)) \le 2c_\rho \cdot d(p, H)$, where $c_\rho$ is a suitable constant depending only on $\rho$.
1 begin
2   for ($i = 1$ to $I = \Theta(\log n)$) do
3     for ($j = 0$ to $\log \Delta$) do
4       Set $r = 2^j$.
5       Take $L = \Theta(n^\rho)$ hash functions $f_1, \ldots, f_L$ (independently and uniformly at random) from an $(r, c_\rho r, (1/n)^\rho, 1/n)$-LSH family.

5 Let $x \in R$ be the point furthest from $G$; add $x$ to $G$.
6 end
$r)$, where $c_\rho$ is the constant as in Lemma 3.3; (ii) It takes $O(1)$ MPC rounds with local space $s = O(n^\delta)$ and global space $g = O(n^{1+\rho} \cdot \log \Delta)$.

4 end
5 Sample each point in $Q$ independently with probability $p$. The sampled points form the set of hubs $H$.
), $\Pr\left(Z_C \le \tfrac{3}{4}|C \cap T_{\alpha+i-1}|\right) \ge \Pr(Z_C = 1)$ is direct as $|C \cap T_{\alpha+i-1}| \ge 2$. From Lemma 4.4, $Z_C = |C \cap T_{\alpha+i}| = 1$ if $|H_i| = 1$. So, $\Pr(Z_C = 1) = |C \cap T_{\alpha+i-1}| / 2^{|C \cap T_{\alpha+i-1}|}$.
Now, we will prove (ii). From Lemma 4.4, $Z_C = |C \cap T_{\alpha+i}| = |H_i|$ if $|H_i| > 0$. Observe that $\Pr(|H_i| > 0) = 1 - 1/2^{|C \cap T_{\alpha+i-1}|}$. The expected number of points in $H_i$ is $|C \cap T_{\alpha+i-1}|/2$. Using the Chernoff bound (Lemma A.1), $\Pr\left(|H_i| > \tfrac{3}{4}|C \cap T_{\alpha+i-1}|\right) \le e^{-\Omega(|C \cap T_{\alpha+i-1}|)}$. Hence, putting things together, $\Pr\left(Z_C \le \tfrac{3}{4}|C \cap T_{\alpha+i-1}|\right) \ge \Pr(|H_i| > 0) \cdot \Pr(|H_i| \le$
$r \log\log t)$.
Lemma 5.3 (Reduction in cluster sizes). Consider Uniform-Center$(V_t, r, t)$ as described in Algorithm 4, and a fixed clustering $\mathcal{C}^t_r$ of $V_t$ that has cost $r$. It produces output $S \subseteq V_t$ such that the following holds for any $C \in \mathcal{C}^t_r$: if $|C \cap V_t| \le t$, then with probability at least $1 - 1/t^{\Omega(1)}$, we have $|C \cap S| = O(\log t \cdot (\log\log t)^2)$.
Proof. Let b i = (1 + η) i s i log t • (log log t) 2, where $i$ is a non-negative integer and η = Θ
Let $\mathcal{C}'_r \subseteq \mathcal{C}_r$ be the set of clusters that are active after Phase 1, that is, after Phase 1.$\alpha$. By the definition of active clusters, for each $C \in \mathcal{C}'_r$, $|C \cap T_\alpha| \le t_\alpha$. Note that Ext-k-Center goes over $\beta$ sub-phases in Phase 2. After Phase 1 and before the start of Phase 2, it has $T_\alpha$ as the set of intermediate centers. For $1 \le i \le \beta$, in Phase 2.$i$, we call Sample-And-Solve$(T_{\alpha+i-1}, 1/2, r)$, and get $T_{\alpha+i}$ as the intermediate centers. For $0 \le i \le \beta$, a cluster $C \in \mathcal{C}'_r$ is said to be $i$-large if $|C \cap T_{\alpha+i}| \ge 2$. Let $\Gamma_i \subseteq \mathcal{C}'_r$ denote the set of $i$-large clusters, and let $Y_i$ denote the total number of points that are in $i$-large clusters, that is, $Y_i = \sum_{C \in \Gamma_i} |C \cap T_{\alpha+i}|$.