Optimization with pattern-avoiding input

Permutation pattern-avoidance is a central concept of both enumerative and extremal combinatorics. In this paper we study the effect of permutation pattern-avoidance on the complexity of optimization problems. In the context of the dynamic optimality conjecture (Sleator, Tarjan, STOC 1983), Chalermsook, Goswami, Kozma, Mehlhorn, and Saranurak (FOCS 2015) conjectured that the amortized search cost of an optimal binary search tree (BST) is constant whenever the search sequence is pattern-avoiding. The best known bound to date is $2^{\alpha(n)(1+o(1))}$, recently obtained by Chalermsook, Pettie, and Yingchareonthawornchai (SODA 2024); here $n$ is the BST size and $\alpha(\cdot)$ the inverse-Ackermann function. In this paper we resolve the conjecture, showing a tight $O(1)$ bound. This indicates a barrier to dynamic optimality: any candidate online BST (e.g., splay trees or greedy trees) must match this optimum, but current analysis techniques only give superconstant bounds. More broadly, we argue that the easiness of pattern-avoiding input is a general phenomenon, not limited to BSTs or even to data structures. To illustrate this, we show that when the input avoids an arbitrary, fixed, a priori unknown pattern, one can efficiently compute a $k$-server solution of $n$ requests from a unit interval, with total cost $n^{O(1/\log k)}$, in contrast to the worst-case $\Theta(n/k)$ bound; and a traveling salesman tour of $n$ points from a unit box, of length $O(\log{n})$, in contrast to the worst-case $\Theta(\sqrt{n})$ bound; similar results hold for the euclidean minimum spanning tree, Steiner tree, and nearest-neighbor graphs. We show both results to be tight. Our techniques build on the Marcus-Tardos proof of the Stanley-Wilf conjecture, and on the recently emerging concept of twin-width.


Introduction
Modeling mathematical structure by a finite list of substructures that an infinite family of objects must avoid is a cornerstone of modern combinatorics. For instance, the graph minor theory of Robertson and Seymour characterizes graph properties closed under edge contractions and edge- and vertex-deletions by a finite list of forbidden minors (e.g., see [RS04, Lov06, Die17]). On the algorithmic side, forbidden minors imply the existence of small separators [AST90, KR10], which restrict the solution-structure of various optimization problems, leading to algorithmic improvements (e.g., see [Epp00, Gro03, Ree03, DH04, CFK+15]).
In permutations, a rich theory of pattern-avoidance has developed (e.g., see the surveys [Bón15, § 12], [Kit11, Bón22]); however, this theory has so far focused on enumeration and extremal questions, and algorithmic consequences have been much less studied. Two notable exceptions are algorithmic questions related to binary search trees (BST), and permutation pattern matching (PPM). In this paper we characterize the complexities of three fundamental optimization problems when the input is pattern-avoiding: BST, k-server on the line, and euclidean TSP. We build on the twin-width decomposition of permutations initially developed by Guillemot and Marx [GM14] for the PPM problem. We start by defining permutation pattern-avoidance, the central notion of our paper.

Binary search trees.
The BST problem concerns one of the most fundamental data structuring tasks: serving a sequence of accesses X = (x_1, ..., x_m) ∈ [n]^m in a binary search tree that stores the set [n]. Standard balanced trees achieve O(m log n) total cost, which is, in general, best possible [Knu98]. For some structured sequences X, however, the cost can be reduced by adjusting the BST between accesses, via rotations. More precisely, each access starts with a pointer at the root, and we may do rotations at the pointer, or move the pointer along an edge (both being unit-cost operations), so that the pointer visits the accessed node at least once. Denote by OPT(X) the lowest cost with which X can be served in such a BST. We call OPT(X) the offline optimum, since it can be computed with a priori knowledge of X (we make this standard model more precise in § 3.1).
The dynamic optimality conjecture [ST85] asks whether the splay tree, a simple and elegant online strategy for BST adjustment, achieves cost O(OPT(X)) for all X (online means that access x_i is revealed only when x_1, ..., x_{i−1} have already been completed). The conjecture is more recently understood as asking whether any online BST can achieve this; a prime candidate besides splay is the greedy BST [DHI+09]. The question has inspired four decades of research and remains open. The quantity OPT(X) is poorly understood; much of what we know about it comes from analysing online algorithms. For almost all X ∈ [n]^m, OPT(X) ∈ Ω(m log n) [BCK03, CGK+15a], which can be matched by a simple balanced tree. Dynamic optimality is thus mainly about sequences X whose structure allows BSTs to adapt to them via rotations, enabling OPT(X) ∈ o(m log n) or even OPT(X) ∈ Θ(m).
In [CGK+15b] it was observed that several classical structured sequences can be described in terms of permutation pattern-avoidance, and well-studied corollaries of dynamic optimality, e.g., deque-, traversal-, and sequential-access [ST85], correspond to the avoidance of concrete small patterns. Thus, as a broad generalization, [CGK+15b] conjectured that OPT(X) ∈ O(m) whenever X ∈ [n]^m is an access sequence that avoids an arbitrary constant-sized pattern.
It is a priori not obvious why the avoidance of an arbitrary pattern should make BST access easier. Our current understanding comes from special cases, and from analysing the online greedy BST [DHI+09], which appears particularly well-behaved in this setting. In [CGK+15b] it was shown that the total cost of greedy for serving an access sequence X that avoids a k-permutation is 2^{α(n)^{O(k)}} · m, which was recently improved to 2^{O(k²)+(1+o(1))α(n)} · m in [CPY23]; α(·) is the slowly growing inverse-Ackermann function. These were to date the best bounds on OPT(X), leaving open whether O(1) cost per access, i.e., independent of the tree size n, is possible. Our first main result answers this question, settling the complexity of BST access with pattern-avoiding input:

Theorem 1.1 (informal). A BST access sequence of length m that avoids a permutation pattern of size k can be served with total cost 2^{O(k)} · m.
The result can also be interpreted as a barrier towards dynamic optimality. Any candidate online BST that is to be competitive with OPT(X) must achieve O(m) cost when X avoids a fixed pattern.
For splay, no improvement over the trivial O(m log n) is known, apart from very special cases, e.g., when X = 1, 2, 3, ..., so X avoids (2, 1) [Tar85, LT19b]. For greedy BST, the above bounds are obtained by using a certain input-revealing property that allows relating the cost to a problem in extremal combinatorics [CGK+15b]. It is known that this type of analysis cannot give a linear bound for greedy [CGK+15b], and that the best known superlinear bound for the extremal problem is essentially tight [CPY23].
Our contribution is a different, perhaps more general approach: constructing and analysing an offline BST solution through the twin-width decomposition of permutations introduced by Guillemot and Marx [GM14].
Lower bounds and sorting. A landmark result of combinatorics is the proof of the Stanley-Wilf conjecture by Marcus and Tardos [MT04], also building on results by Füredi, Hajnal [FH92], and Klazar [Kla00]. The result states that the number of permutations of length n that avoid an (arbitrary) permutation π (the pattern) is at most s_π^n, i.e., single-exponential in n, with a base s_π that depends only on the pattern. Serving an n-permutation τ as an access sequence in a BST with total cost C gives an encoding of τ with O(C + n) bits [BCK03, CGK+15a]. To see this, observe that each unit-cost step in the execution corresponds to one of constantly many choices (e.g., move pointer up/left/right, rotate at the pointer, etc.). We can reconstruct τ by replaying the sequence of operations starting from the initial tree (encoded with another 2n bits). Theorem 1.1 implies a cost C ∈ O_π(n), and thus an encoding of τ with O_π(n) bits, which in turn implies that there are only 2^{O_π(n)} π-avoiding n-permutations, giving an "algorithmization" of the Stanley-Wilf / Marcus-Tardos theorem.
If splay or greedy attain the O(n) bound for serving an n-permutation τ that avoids a constant-sized π (this is an important question left open by our work), then τ can be sorted in time O(n) (by simple insertion sort, with insertions into a BST according to the splay or greedy strategy, or by simple heapsort, using the smooth heap data structure [KS20, HKST21, ST23], whose cost matches that of greedy [KS20]). By a result of Fredman [Fre76], a permutation from a family of size 2^{O(n)} can be sorted with O(n) comparisons (in contrast to general sorting, which requires Ω(n log n) comparisons). Fredman's argument, however, only implies the existence of a decision tree of depth O(n) and does not give an efficient algorithm for finding it. Our Theorem 1.1 shows that the decision tree of pattern-avoiding sorting can be efficiently implemented in the BST model, making linear-time sorting of such inputs a corollary of dynamic optimality.
Permutation pattern matching. The PPM problem asks whether a given n-permutation τ avoids a given k-permutation π. It is a well-studied algorithmic question (e.g., see [BBL98, AR08, GM14, BKM21, GR22]); a breakthrough result by Guillemot and Marx [GM14] achieved the runtime 2^{O(k² log k)} · n, with an improvement to 2^{O(k²)} · n by Fox [Fox13], showing that PPM is fixed-parameter tractable in terms of the pattern size k.
Twin-width. The PPM algorithm of Guillemot and Marx works by dynamic programming over a decomposition of permutations that they introduce. This decomposition has later been generalized to the concept of twin-width in permutations, graphs, as well as in more general structures [BKTW21, BNdM+21]. The Guillemot-Marx result (with the improvement by Fox) implies that if an n-permutation τ avoids a k-permutation π, then tww(τ) ≤ 2^{O(k)}, where tww(τ) is the twin-width of τ. We adapt the decomposition to compute a solution to the BST problem (i.e., a way to serve the access sequence τ via rotations and pointer moves) of cost O(n · tww(τ)²), implying Theorem 1.1. The discussion leading to Theorem 1.1 suggests the following question: Is the easiness of pattern-avoiding inputs specific to BSTs, or a broader phenomenon?
In this paper we emphatically argue for the latter, illustrating it via two representative problems from different areas: the k-server and the traveling salesman (TSP) problems.

k-server. This problem is one of the central problems in online optimization [MMS90, BE98, Kou09]. It asks to serve n requests (points in a metric space M) revealed one-by-one, by moving one of k servers (also located at points of M) to the request. The cost is the total movement of the servers (as distance in M); denote the optimum for a sequence X ∈ M^n of requests by OPT_k(X).
The famous k-server conjecture asks whether an online algorithm can achieve cost at most k · OPT_k(X) for all X. In general, this is still open (e.g., see [Kou09]), but when M is the real line, a k-competitive algorithm is known: the simple and elegant double coverage (DC) [CL91]. Our focus is this well-studied case of the k-server problem. In contrast to the BST problem, the input now is a sequence X of n reals (for simplicity, we let X ∈ [0, 1]^n). Simple examples show that in the worst case OPT_k(X) is linear, i.e., OPT_k(X) ∈ Θ(n/k). Pattern-avoidance in the sequence X of requests can be defined in the natural way. Our second main result shows that OPT_k(X) is significantly smaller if X is pattern-avoiding:

Theorem 1.2 (informal). A sequence of n requests from the interval [0, 1] that avoids a permutation pattern π of size t can be served with k servers at total cost k^{O(1)} · n^{O_t(1/log k)}.
We show that the exponent of n is best possible for almost all π. On the other hand, for certain avoided patterns π, we show a stronger upper bound of roughly n^{O(1/k)}. Note that our bound on the cost also applies to an efficient online algorithm (DC), with a cost-increase of at most a factor k. In our view, this exemplifies the usefulness of competitive analysis of algorithms: nontrivial structural properties shown for the offline optimum are automatically exploited by simple and general online algorithms not necessarily tailored to this structure.
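The double coverage rule on the line is simple enough to simulate in a few lines. The following Python sketch is ours and purely illustrative (the function name `double_coverage` is not from the paper): on a request between two servers, both neighbors move toward it by the same amount until one arrives; outside the convex hull, only the nearest server moves.

```python
def double_coverage(servers, requests):
    """Double coverage (DC) on the line: serve the requests online and
    return the total distance moved by the servers."""
    pos = sorted(servers)
    cost = 0.0
    for r in requests:
        if r <= pos[0]:            # request left of all servers
            cost += pos[0] - r
            pos[0] = r
        elif r >= pos[-1]:         # request right of all servers
            cost += r - pos[-1]
            pos[-1] = r
        else:                      # the two adjacent servers move toward r
            i = max(t for t in range(len(pos)) if pos[t] <= r)
            d = min(r - pos[i], pos[i + 1] - r)
            cost += 2 * d          # both neighbors move distance d
            pos[i], pos[i + 1] = pos[i] + d, pos[i + 1] - d
            pos.sort()             # the closer of the two has reached r
    return cost
```

For instance, with servers at 0 and 4 and a request at 1, both neighbors move by 1 (the left server reaches the request), for cost 2.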
Our first two theorems concern online problems, with patterns defined in the temporal order of the accesses/requests. Key to both results is a geometric view where the input is mapped to points in the plane, with the two axes being the time- and the key-value of the access/request sequence. It is natural to ask whether the approach can be adapted to purely geometric optimization problems that take planar point sets as input.
Traveling salesman problem. TSP is such a geometric problem. Let X be a set of n points in the plane (for simplicity, X ⊂ [0, 1]²) and denote by TSP(X) the (euclidean) length of the shortest tour visiting all of X. Assume X is in general position, i.e., no two points share an x- or y-coordinate. Then X can be seen as a permutation, by reading out the y-coordinates of the points in left-to-right order, and pattern-avoidance for permutations naturally extends to the point set X; see § 2 for precise definitions. A simple and well-known observation is that TSP(X) ∈ O(√n) for all X, and this is best possible; take a uniform √n × √n grid spanning [0, 1]². For point sets X that avoid an arbitrary permutation pattern, the situation changes dramatically, as our third main result shows.
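The classical O(√n) upper bound comes from the usual strip ("boustrophedon") tour: cut the unit square into about √n horizontal strips and traverse them in alternating directions. A small illustrative Python sketch of this argument (ours; `snake_tour_length` is a name of our choosing):

```python
import math

def snake_tour_length(points):
    """Boustrophedon tour through ~sqrt(n) horizontal strips; for any n
    points in the unit square its length is O(sqrt(n))."""
    n = len(points)
    s = max(1, round(math.sqrt(n)))               # number of strips
    strips = [[] for _ in range(s)]
    for x, y in points:
        strips[min(s - 1, int(y * s))].append((x, y))
    tour = []
    for i, strip in enumerate(strips):            # alternate direction per strip
        strip.sort(reverse=(i % 2 == 1))
        tour.extend(strip)
    return sum(math.dist(tour[t], tour[(t + 1) % len(tour)])
               for t in range(len(tour)))
```

On the uniform 10 × 10 grid in [0, 1]² the tour has length about 10.8, i.e., roughly √n, matching the lower bound example above.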
Theorem 1.3 (informal). A set of n points in the unit box [0, 1]² that avoids a permutation pattern of size k admits a TSP solution of length O_k(log n).
Theorem 1.3 implies a similar bound for related problems like euclidean minimum spanning tree and Steiner tree, whose optima are at most a constant factor away from TSP(X).
Discussion. We emphasize that our results (Theorems 1.1, 1.2, 1.3) hold for arbitrary avoided patterns. The three problems we consider are representative of their areas (data structures, online algorithms, geometric optimization). While adaptivity to structure has been extensively studied for all three, to our knowledge, pattern-avoidance was considered before only for BSTs. We study all three in a similar geometric view, but the transfer of techniques is not automatic, and the problems highlight specific technical challenges. (The effect of pattern-avoidance also differs across the problems, with the cost changing from Θ(m log n) to Θ(m), from Θ(n/k) to n^{O(1/log k)}, and from Θ(√n) to O(log n), respectively.)

In the BST problem, in both the time- and access-key-dimensions, only the ordering matters, with no meaningful concept of distance. In k-server, this is true for the time-dimension, but the requests are distance-sensitive (the specific distances between neighboring requests can affect the cost; the input cannot be freely "stretched"). The TSP problem is inherently geometric, with both dimensions distance-sensitive. This distinction is reflected in our solutions: for the BST problem the Guillemot-Marx decomposition can be used more or less directly; for the other two problems we develop a refined, "distance-balanced" decomposition (§ 5), which may be of independent interest.

Pattern-avoidance can be seen as a natural restriction of the input-generating process, to which algorithms can adapt for efficiency. But why does pattern-avoidance help in the first place? In online tasks it is intuitive that reducing the entropy of the distribution for the next access/request can reduce cost. However, directly taking advantage of this in an online manner seems very difficult (even when the avoided pattern is known). Another intuition is that pattern-avoidance imposes a global sparsity on the input; this is often studied via properties of random pattern-avoiding permutations [BBF+22]. Our results, e.g., for TSP, can be seen as capturing a certain sparsity at the level of individual pattern-avoiding point sets.
The effect of pattern-avoidance on algorithmic complexity is mysterious and fragile (in § 3.1 we also exhibit a natural optimization problem for which pattern-avoidance does not help).
Structure of the paper. In § 2 we review some important definitions related to permutations and pattern-avoidance. The reader may prefer to skim these and refer to § 2 when needed. In § 3 we define the three main problems we study and overview the results at a high level. In § 5 we give our distance-balanced decomposition used in the solutions. In § 4, § 6, and § 7 we give the detailed proofs for the BST, k-server, and TSP problems, respectively. In § 8 we conclude with open questions.

Preliminaries
Permutations, matrices, and point sets. An n-permutation π is an ordering π_1 π_2 ... π_n of [n] = {1, 2, ..., n} (as customary, we omit commas when there is no ambiguity). Two alternative representations of π (see Figs. 1a, 1b) are (i) as an n × n permutation matrix M_π where the π_i-th entry of the i-th column is 1 for all i ∈ [n], and all other entries are 0; and (ii) as a set P_π ⊂ ℝ² of n points such that there is a bijection f : [n] → P_π with f(i).x < f(j).x ⇔ i < j and f(i).y < f(j).y ⇔ π_i < π_j for all i, j ∈ [n], where p.x and p.y denote the x- and y-coordinates of point p.
Observe that n-permutations and n × n permutation matrices are in bijection, while infinitely many different point sets correspond to a single permutation. Note that in a point set that corresponds to a permutation, no two points can have the same x- or y-coordinate. We call this property general position. A point set in general position represents a unique permutation. We similarly say that a sequence of reals is in general position if its values are pairwise distinct.
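These representations are straightforward to convert between in code. The following Python sketch (ours; the names `perm_matrix`, `perm_points`, `points_to_perm` are our own) realizes the matrix and point-set views, and recovers the unique permutation represented by a general-position point set:

```python
def perm_matrix(pi):
    """n x n permutation matrix M_pi: the pi_i-th entry (counted from the
    bottom, 0-indexed here) of the i-th column is 1, all others are 0."""
    n = len(pi)
    M = [[0] * n for _ in range(n)]
    for i, v in enumerate(pi):      # column i carries value v
        M[v - 1][i] = 1
    return M

def perm_points(pi):
    """One point set in general position representing pi: point i -> (i, pi_i)."""
    return [(i + 1, v) for i, v in enumerate(pi)]

def points_to_perm(P):
    """The unique permutation represented by a general-position point set."""
    pts = sorted(P)                 # left to right by x-coordinate
    ranks = {y: r + 1 for r, y in enumerate(sorted(p[1] for p in pts))}
    return tuple(ranks[p[1]] for p in pts)
```

For example, `points_to_perm([(0.5, 10), (0.1, 3), (0.7, 7)])` reads off the permutation (1, 3, 2), illustrating that infinitely many point sets represent the same permutation.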
Pattern-avoidance. Two sequences x_1 x_2 ... x_m ∈ ℝ^m and y_1 y_2 ... y_m ∈ ℝ^m are order-isomorphic if for each i, j ∈ [m], we have x_i < x_j if and only if y_i < y_j. We say a sequence X contains another sequence Y if some subsequence (not necessarily contiguous) of X is order-isomorphic to Y. Otherwise, we say X avoids Y.
We focus on the case when the sequence Y is a permutation; for X we consider both permutations and general sequences. The definition easily extends to point sets: if P and Q are point sets in general position, then P contains Q (or a permutation π) if the permutation corresponding to P contains the permutation corresponding to Q (or π). Extending the definition to permutation matrices is immediate.
We also define pattern containment/avoidance for general 0-1 matrices.A 0-1 matrix M contains a 0-1 matrix P if P can be obtained from M by removing rows or columns (i.e., taking a submatrix) and turning zero or more entries from 1 to 0. If both M and P are permutation matrices, we recover pattern containment for permutations.
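The definitions of order-isomorphism and containment translate directly into a brute-force check, useful only for tiny inputs (deciding containment efficiently is exactly the PPM problem discussed in the introduction). A Python sketch of ours, with illustrative function names:

```python
from itertools import combinations

def order_isomorphic(x, y):
    """Same length and same pairwise order comparisons."""
    return len(x) == len(y) and all(
        (x[i] < x[j]) == (y[i] < y[j])
        for i in range(len(x)) for j in range(len(x)))

def contains(X, Y):
    """Does sequence X contain pattern Y, i.e., does some (not necessarily
    contiguous) subsequence of X order-isomorphic to Y exist?
    Brute force over all subsequences; exponential, for sanity checks only."""
    return any(order_isomorphic(sub, Y) for sub in combinations(X, len(Y)))

def avoids(X, Y):
    return not contains(X, Y)
```

For example, (2, 4, 1, 3) contains the pattern (2, 1, 3) (take the subsequence 2, 1, 3), while any increasing sequence avoids (2, 1).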
Operations on permutations. Let π be a k-permutation. The reversal of π is obtained by reading π backwards. The complement of π is obtained by replacing each π_i with k + 1 − π_i.
A shift by t of a permutation π is the sequence π_1 + t, ..., π_k + t. The inflation of π by permutations ρ_1, ρ_2, ..., ρ_k is the unique permutation obtained by replacing each entry π_i with a shift of ρ_i by t_i, for an appropriate integer sequence t_1, ..., t_k that is order-isomorphic to π (see Fig. 2c). In the point set view, this amounts to replacing each point of π by a small copy of ρ_i. Observe that the sum (skew sum) of permutations π and ρ is the inflation of 12 (respectively, 21) by π and ρ.
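These operations are easy to implement; the sketch below (ours) computes reversal, complement, and inflation, choosing the shift t_i of block i as the total size of the blocks whose π-value is smaller:

```python
def reversal(pi):
    return tuple(reversed(pi))

def complement(pi):
    k = len(pi)
    return tuple(k + 1 - v for v in pi)

def inflation(pi, blocks):
    """Inflate pi by blocks = (rho_1, ..., rho_k): replace entry pi_i by a
    shifted copy of rho_i, stacking the copies in the order given by pi."""
    k = len(pi)
    sizes = [len(b) for b in blocks]
    # shift for block i: total size of blocks with smaller pi-value
    shifts = [sum(sizes[j] for j in range(k) if pi[j] < pi[i])
              for i in range(k)]
    out = []
    for i in range(k):
        out.extend(v + shifts[i] for v in blocks[i])
    return tuple(out)
```

In particular, `inflation((1, 2), (pi, rho))` is the sum and `inflation((2, 1), (pi, rho))` the skew sum, as noted above.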
Special permutations. We call the unique 1-permutation trivial and all other permutations non-trivial. The identity or increasing permutation of length k is I_k = 12...k; its reversal D_k = k(k−1)...1 is the decreasing permutation. A permutation is a k × ℓ grid permutation if its point set can be partitioned by k − 1 vertical and ℓ − 1 horizontal lines into a k × ℓ grid of cells, each of which contains precisely one point (see Fig. 2d). Observe that a k × k grid permutation contains all k-permutations. The canonical k × ℓ grid permutation corresponds to the point set with coordinates ((i − 1)ℓ + (ℓ − j + 1), (j − 1)k + i) for i ∈ [k], j ∈ [ℓ] (see Fig. 2e). Each "row" of this permutation is increasing, and each "column" is decreasing.
Permutation classes. A permutation class is a set of permutations that is hereditary, i.e., closed under containment. It is easy to see that each permutation class can be characterized by avoidance of a (not necessarily finite) set of permutations. Given a set Π of permutations, we write Av(Π) for the class of permutations avoiding each π ∈ Π. A permutation class Av(π) defined by the avoidance of a single permutation π is called a principal permutation class.
We describe some simple and important permutation classes. A permutation is k-increasing (k-decreasing) if it avoids the permutation D_{k+1} (respectively, I_{k+1}). Observe that a permutation is k-increasing (k-decreasing) if and only if it is formed by interleaving k increasing (decreasing) sequences.
A permutation is separable if it is the trivial 1-permutation, or the sum or skew sum of two separable permutations (for example, the permutations depicted in Figs. 2a to 2c are separable). The set of separable permutations is known to equal Av(3142, 2413).
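The recursive definition yields a direct test for separability: look for a split point where a prefix block holds the smallest values (a sum) or the largest values (a skew sum), and recurse. A Python sketch of ours; note that if a valid split exists with a non-separable part, that part already witnesses a forbidden pattern, so no other split needs to be tried:

```python
def is_separable(pi):
    """pi is separable: trivial, or a sum/skew sum of two separable permutations."""
    n = len(pi)
    if n == 1:
        return True

    def pattern(seq):                 # renormalize a block to a permutation
        r = sorted(seq)
        return tuple(r.index(v) + 1 for v in seq)

    for cut in range(1, n):
        left, right = pi[:cut], pi[cut:]
        if max(left) < min(right) or min(left) > max(right):  # sum / skew sum
            return is_separable(pattern(left)) and is_separable(pattern(right))
    return False                      # simple (no block split): not separable
```

The forbidden patterns themselves, 2413 and 3142, are correctly rejected, while e.g. 2134 is accepted.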
Yet another characterization of separable permutations is as the set of inflations of 12 or 21 with two separable (or trivial) permutations. This definition can be generalized: a permutation is k-separable if it is the trivial 1-permutation or the inflation of a permutation of length at most k with k-separable permutations.
The classes Av(132), Av(213), Av(231), and Av(312) are important special cases, due to their recursive structure (see § 6.1.1, Fig. 10). Observe that all four are subclasses of the separable permutations. As the problems we study are invariant under reversal and complement, we often consider only one of the four classes, namely Av(231).
Füredi-Hajnal and Stanley-Wilf limits. Given a permutation π, let ex_π(n) be the maximum number of ones in an n × n 0-1 matrix that avoids π. Marcus and Tardos [MT04] proved that ex_π(n) ∈ O(n) for each fixed π (the Füredi-Hajnal conjecture [FH92]). The Füredi-Hajnal limit c_π of π is the constant hidden in the O-notation, i.e., c_π = lim_{n→∞} ex_π(n)/n. Cibulka [Cib09] observed that, due to the superadditivity of ex_π(·), the limit c_π indeed exists, and ex_π(n) ≤ c_π · n for all n. The extension to non-square matrices is immediate:

Lemma 2.1. Every m × n 0-1 matrix with strictly more than c_π · max(m, n) ones contains π.
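For intuition, ex_π(n) can be computed by brute force for very small n. The sketch below (ours; exponential in n², a sanity check only) implements 0-1 matrix containment as defined above; for the 2 × 2 identity pattern it recovers the "staircase" value 2n − 1:

```python
from itertools import combinations, product

def matrix_contains(M, P):
    """0-1 matrix containment: some submatrix of M dominates P entrywise."""
    m, n = len(M), len(M[0])
    p, q = len(P), len(P[0])
    for rows in combinations(range(m), p):
        for cols in combinations(range(n), q):
            if all(M[rows[i]][cols[j]] >= P[i][j]
                   for i in range(p) for j in range(q)):
                return True
    return False

def ex(n, P):
    """Brute-force ex_P(n): the maximum number of ones in an n x n
    0-1 matrix avoiding P. Only feasible for tiny n."""
    best = 0
    for bits in product((0, 1), repeat=n * n):
        M = [list(bits[i * n:(i + 1) * n]) for i in range(n)]
        if not matrix_contains(M, P):
            best = max(best, sum(bits))
    return best
```

E.g., `ex(3, [[1, 0], [0, 1]])` evaluates to 5 = 2·3 − 1: a full row plus a full column avoid the identity pattern, and no larger set does.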
The Stanley-Wilf conjecture states that the number of n-permutations avoiding a fixed pattern is single-exponential in n (i.e., 2^{O(n)} of the 2^{Θ(n log n)} n-permutations). Let Av_n(π) be the set of n-permutations that avoid the pattern π. The Stanley-Wilf limit s_π is defined as s_π = lim_{n→∞} |Av_n(π)|^{1/n}. As observed by Klazar [Kla00], the Füredi-Hajnal conjecture implies the Stanley-Wilf conjecture, and hence the existence of the Stanley-Wilf limits.
Merge sequences and twin-width. A rectangle family R is a set of axis-parallel rectangles. Note that a single point is a special kind of rectangle, and thus any point set can be interpreted as a rectangle family. Let S, T be two different rectangles of a rectangle family R. The merging of rectangles S, T is an operation that replaces S, T by the smallest axis-parallel rectangle enclosing S ∪ T, i.e., their bounding box.
A merge sequence of a point set P ⊂ ℝ² of size n is a sequence R_1, R_2, ..., R_n of rectangle families where R_1 = P is the original point set, R_n contains a single rectangle, and each R_{i+1} is obtained by merging two rectangles of R_i. Notice that each R_i consists of exactly n − i + 1 rectangles.
Two rectangles S and T are called homogeneous if their projections onto both the x- and y-axes are disjoint. Given a rectangle family R_i, we consider an auxiliary graph, called the red graph, whose vertices are the rectangles of R_i, with a (red) edge between every pair of distinct non-homogeneous rectangles S, T. Figure 3 shows an example merge sequence and its red graphs.
Let d be a positive integer. We say that a merge sequence R_1, ..., R_n is d-wide if max_{i∈[n]} Δ(R_i) < d, i.e., the maximum degree over all red graphs associated to this sequence is strictly less than d. The twin-width tww(P) of a point set P is then the minimum integer d such that there exists a d-wide merge sequence of P. For a permutation π, we let tww(π) = tww(P_π) for any point set P_π corresponding to π.
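The definitions of merging, red graph, and d-wideness can be checked mechanically for a given merge sequence. A Python sketch of ours (rectangles are tuples (x1, x2, y1, y2); the names `is_d_wide` etc. are illustrative):

```python
def red_degree_ok(rects, d):
    """True if every rectangle has red degree strictly less than d."""
    def non_homogeneous(a, b):   # x- or y-projections intersect
        overlap_x = not (a[1] < b[0] or b[1] < a[0])
        overlap_y = not (a[3] < b[2] or b[3] < a[2])
        return overlap_x or overlap_y
    for i, r in enumerate(rects):
        deg = sum(1 for j, s in enumerate(rects)
                  if j != i and non_homogeneous(r, s))
        if deg >= d:
            return False
    return True

def merge(rects, i, j):
    """Replace rectangles i and j by their bounding box (appended last)."""
    a, b = rects[i], rects[j]
    box = (min(a[0], b[0]), max(a[1], b[1]), min(a[2], b[2]), max(a[3], b[3]))
    return [r for k, r in enumerate(rects) if k not in (i, j)] + [box]

def is_d_wide(points, merges, d):
    """Run a merge sequence given as index pairs; True if every
    intermediate family (including the initial one) is d-wide."""
    rects = [(x, x, y, y) for x, y in points]
    if not red_degree_ok(rects, d):
        return False
    for i, j in merges:
        rects = merge(rects, i, j)
        if not red_degree_ok(rects, d):
            return False
    return True
```

For instance, merging the two rightmost points of {(1,2), (3,1), (2,3)} creates a rectangle whose y-projection overlaps the remaining point, so this merge sequence is 2-wide but not 1-wide.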
We make some simple observations about the behavior of twin-width.
• More generally, if σ is the inflation of π by ρ_1, ρ_2, ..., ρ_k, then tww(σ) ≤ max(k, tww(ρ_1), ..., tww(ρ_k)). In particular, a k-separable permutation has twin-width at most k.
The canonical k × ℓ grid permutation has twin-width exactly min(k, ℓ).

Proof. For the lower bound, observe that after the very first merge, the obtained rectangle is already non-homogeneous with at least min(k, ℓ) − 1 points.

For the upper bound, suppose k ≤ ℓ. Merge the lowest two points of each column i into a rectangle R_i, proceeding from left to right. Then merge the third lowest point of each column i into R_i, again from left to right, and so on. The rectangles stay homogeneous with all remaining points, so the degree of the red graph is always at most k − 1. After each column has become a single rectangle, merge these k rectangles in an arbitrary order.
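The merging strategy of the proof is easy to simulate. The following self-contained Python sketch (ours) replays it on small canonical grids, recording the largest red degree ever observed; the proof predicts at most k − 1 (for k ≤ ℓ):

```python
def grid_merge_max_degree(k, l):
    """Simulate the column-by-column merge strategy on the canonical
    k x l grid permutation (k <= l) and return the maximum red degree."""
    def pt(i, j):                         # column i, row j of the grid
        return ((i - 1) * l + (l - j + 1), (j - 1) * k + i)

    def max_red_degree(rects):
        def red(a, b):                    # x- or y-projections intersect
            return (not (a[1] < b[0] or b[1] < a[0])
                    or not (a[3] < b[2] or b[3] < a[2]))
        return max(sum(red(rects[i], rects[j])
                       for j in range(len(rects)) if j != i)
                   for i in range(len(rects)))

    def bbox(a, b):
        return (min(a[0], b[0]), max(a[1], b[1]),
                min(a[2], b[2]), max(a[3], b[3]))

    loose = {(i, j): (x, x, y, y) for i in range(1, k + 1)
             for j in range(1, l + 1) for (x, y) in [pt(i, j)]}
    cols, worst = {}, 0
    for j in range(1, l + 1):             # absorb row j into each column rect
        for i in range(1, k + 1):         # from left to right
            p = loose.pop((i, j))
            cols[i] = bbox(cols[i], p) if i in cols else p
            worst = max(worst, max_red_degree(
                list(loose.values()) + list(cols.values())))
    boxes = list(cols.values())           # finally merge the k column rects
    while len(boxes) > 1:
        boxes = [bbox(boxes.pop(), boxes.pop())] + boxes
        worst = max(worst, max_red_degree(boxes))
    return worst
```

On the 2 × 3 and 3 × 4 canonical grids the simulation records maximum red degrees 1 and 2, matching k − 1.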
The importance of twin-width for our work is derived from the following theorem.

Theorem 2.4 (Guillemot and Marx [GM14]). A point set that avoids a pattern π has twin-width at most O(c_π).

[GM14] originally stated the slightly weaker bound 2^{O(k log k)} for a k-permutation pattern π, but the bound of O(c_π) is implicit in their proof. The upper bound of Fox [Fox13] on c_π then implies an upper bound of 2^{O(k)} on the twin-width. In § 5 we describe a decomposition with additional properties that also implies this claim.

The merge sequence implied by Theorem 2.4 can be found in time O(n), if the points are accessible in sorted order by x- and y-coordinates. If this is not the case, and we view the coordinates as arbitrary comparable elements, an initial sorting step of O(n log n) time is necessary. The bound on twin-width in terms of c_π cannot be asymptotically improved: for the pattern π = I_k, we have c_π ∈ Θ(k) (Füredi and Hajnal [FH92]); on the other hand, the canonical (k − 1) × n grid permutation is π-avoiding and has twin-width k − 1. Whether for some pattern π the twin-width of all π-avoiding permutations can be significantly smaller than c_π is an intriguing open question.
Figure 4 shows an overview of several relevant permutation classes and their inclusions.

Overview of results
In this section we present our main results and sketch some of the proofs, deferring the details to the later sections.

BST and arborally satisfied superset
Let T be a binary search tree (BST) with nodes identified with the elements of [n], with the usual search tree property [Knu98]. Serving an access x ∈ [n] in T means visiting a subtree T′ of T such that T′ contains both the root of T and the node x. Then, T′ is replaced with another BST T″ on the same node set as T′ (all subtrees of T hanging off T′ are linked in the unique location of T″ given by the search tree property). We can think of replacing T′ by T″ as re-arranging the tree, in preparation for future accesses. The cost of the access is |T′|, the number of nodes touched. Note that T′ necessarily contains the search path of x in T. This common formulation of the BST model [DHI+09, LT19a] allows us to ignore individual pointer moves and rotations. It is well-known that any BST T′ can be transformed into any T″ (on the same nodes) with O(|T′|) rotations and pointer moves [STT86]; therefore the model is equivalent (up to a small constant factor) with most other reasonable models (see Fig. 5a for illustration).
Serving a sequence X = (x_1, ..., x_m) ∈ [n]^m, starting from an initial tree T_0, means serving access x_i in tree T_{i−1}, replacing the subtree T′_i with T″_i to obtain T_i, for all i ∈ [m]; the total cost of serving X is the sum of the costs |T′_i| of the individual accesses.

Arborally satisfied superset. Demaine, Harmon, Iacono, Kane, and Pǎtraşcu [DHI+09] give an elegant geometric characterization of the BST problem. A point set P is called (arborally) satisfied if for any two points p, q ∈ P one of the following holds: (a) p.x = q.x or p.y = q.y, or (b) the axis-parallel rectangle R with corners p, q contains some point of P \ {p, q} (possibly on the boundary of R). The (arborally) satisfied superset problem asks to find, given a point set P, the smallest point set P′ ⊇ P that is arborally satisfied (Fig. 5b).
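The satisfaction condition can be checked directly from the definition. A small Python sketch of ours (quadratic-pair brute force; `is_satisfied` is an illustrative name):

```python
def is_satisfied(P):
    """Arboral satisfaction: every pair of points not sharing a coordinate
    must span a closed rectangle containing a third point of P."""
    pts = list(P)
    for p in pts:
        for q in pts:
            if p == q or p[0] == q[0] or p[1] == q[1]:
                continue               # case (a): aligned pair, always fine
            x1, x2 = sorted((p[0], q[0]))
            y1, y2 = sorted((p[1], q[1]))
            if not any(r != p and r != q
                       and x1 <= r[0] <= x2 and y1 <= r[1] <= y2
                       for r in pts):  # case (b): rectangle must be non-empty
                return False
    return True
```

For example, the two points (1, 1) and (2, 2) alone are unsatisfied, while adding the corner (1, 2) satisfies the set.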
The following theorem shows the equivalence between this geometric problem and the BST model defined earlier.
Theorem 3.1 ([DHI+09]). Let X ∈ [n]^n be an n-permutation access sequence and let P_X be a corresponding point set. Then X can be served at cost C if and only if P_X admits a satisfied superset of size C.
More strongly, the satisfied superset can be assumed to consist of points aligned vertically and horizontally with P_X, and the points on the i-th vertical line in the solution exactly correspond to the nodes of T′_i touched during the i-th access. In the following we focus on the satisfied superset problem, and denote by OPT(P) the size of the smallest satisfied superset of P. We now state our main result.
Theorem 3.2. Let P be a set of n points in general position, of twin-width d. Then OPT(P) ∈ O(nd²).

By Theorems 2.4 and 3.2 it follows that if P avoids a permutation π, then OPT(P) ∈ O(c_π² · n). Sharpening the bounds in terms of c_π or |π| for all π is a challenging open question. Previously, linear bounds on OPT(P) for pattern-avoiding P were known only in special cases, e.g., when P is k-increasing, k-decreasing, or k-separable [CGK+15b, GG19, CGK+18, CGJ+23]. Curiously, these cases all have low twin-width, tww(P) ∈ O(k), but earlier studies did not use this fact.
Proof sketch. In the following we sketch the proof of Theorem 3.2, deferring the full details to § 4. At a high level, we use a merge sequence R_1, R_2, ..., R_n of P of width d to construct a satisfied superset P′ ⊇ P (initially P′ = P). We maintain the invariant (I_1) that P′ ∩ Q is satisfied for each Q ∈ R, for the current set of rectangles R. This is trivially true initially for R = R_1, and as R_n consists of a single rectangle containing the entire point set P, invariant (I_1) implies that we obtain a valid solution P′.
We also maintain two technical invariants: (I_2) states that (Q ∪ Q′) ∩ P′ is satisfied for each non-homogeneous pair Q, Q′ ∈ R, and (I_3) states that for each Q ∈ R, the point set P′ contains the intersections of Q with the grid formed by extending every side of every rectangle in R. Notice a subtlety here: as the rectangles grow, invariant (I_3) applies to larger areas, but at the same time, R shrinks, so the invariant becomes less stringent.
The key step is merging two rectangles Q_1, Q_2 ∈ R into a new rectangle Q while maintaining the invariants (I_1), (I_2), (I_3). We can achieve this by adding to P′ all points inside Q of the grid induced by the sides of the rectangles in R (including Q_1, Q_2); see Fig. 8 on page 19. The validity of the invariants can be verified through a case-analysis.
Observe that Q can "see" (is non-homogeneous with) at most d rectangles (including itself), so at most 2d side-extensions intersect Q. Therefore, we add at most O(d²) points in every step, for a total of O(nd²) during the entire merge sequence. We defer the detailed proof to § 4.

Sparse Manhattan network
We briefly mention a connection between the BST/satisfied superset problem, and a well-studied network design task.
If P is a set of points, then we call x, y ∈ P connected, and write x ∼_P y, if there is a monotone path T connecting x and y that consists of axis-parallel line segments, and each corner of T is contained in P. For two sets X, Y ⊆ P we write X ∼_P Y if x ∼_P y for all x ∈ X, y ∈ Y. The following equivalence was observed by Harmon.

Observation 3.3 ([Har06]). A set of points P is arborally satisfied if and only if P ∼_P P.
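Arboral satisfaction can be tested directly via the standard rectangle criterion from the BST literature (a set is arborally satisfied iff any two points not sharing a coordinate span an axis-parallel rectangle containing a third point of the set). A small checker, with names of our choosing:

```python
from itertools import combinations

def arborally_satisfied(P):
    """Standard rectangle criterion: every pair of points not sharing a
    coordinate must span a closed rectangle containing a third point."""
    S = set(P)
    for (x1, y1), (x2, y2) in combinations(S, 2):
        if x1 == x2 or y1 == y2:
            continue  # aligned pairs are trivially connected
        lox, hix = min(x1, x2), max(x1, x2)
        loy, hiy = min(y1, y2), max(y1, y2)
        if not any(p != (x1, y1) and p != (x2, y2)
                   and lox <= p[0] <= hix and loy <= p[1] <= hiy
                   for p in S):
            return False
    return True
```

This brute-force check runs in O(n^3) time and is meant only to make the definition concrete.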
In the satisfied superset problem we ask, given a point set X, for a superset that connects all pairs of points. In the relaxed variant considered here, we only require connecting the pairs of the input point set, but not necessarily the pairs involving newly added points. It is natural to ask for the smallest set Y with this property. We call this the sparse Manhattan network (sparseMN) problem, and denote its optimum as SP(X) (Fig. 6a).
Let P be a set of n points in the plane. It follows from the definitions that SP(P) ≤ OPT(P); in fact, SP(P) is equivalent to the well-known independent rectangle lower bound [Har06, DHI+09] (this is the best lower bound known for OPT(P), subsuming earlier bounds by Wilber [Wil89]). A central conjecture is that SP(P) ∈ Θ(OPT(P)) [DHI+09]. Its importance for dynamic optimality stems from the fact that for SP(P) a polynomial-time 2-approximation is known, whereas for OPT(P) the best known approximation ratio is O(log log n).
The quantity SP(P) has been studied in computational geometry [GKKS07, KS16]; it is known that in the worst case, SP(P) ∈ Θ(n log n). From Theorem 3.2 it immediately follows that SP(P) ∈ O_k(n) whenever P avoids a k-permutation. Such a bound was known before [CGK+15b], interestingly, via a different algorithm (geometric sweepline) than the one implicit in our current work.

Small Manhattan network
A closely related network design task is the small Manhattan network (smallMN) problem. Whereas sparseMN aims to minimize the number of corner points, in smallMN the goal is to minimize the total length of the network. More precisely, given a set P of n points in the plane, we look for a collection N of axis-parallel line segments such that each pair of points in P can be connected by a monotone path contained in the union of the segments in N. The task is to minimize the total length of the segments in N. The problem is well studied [GLN01, CGS11] and known to admit a 2-approximation [CNV08, GSZ11].
Surprisingly, despite the similarity of smallMN to the other problems studied in the paper, in this case, pattern-avoidance does not lead to an asymptotic improvement.
Given a set P of n points in [0, 1]^2, let SM(P) denote the length of the smallest MN for P (see Fig. 6b). It is easy to see that SM(P) ∈ O(n): extend each point p ∈ P by horizontal and vertical lines to the boundaries of the box [0, 1]^2. The resulting grid contains monotone paths between any pair p, q ∈ P, and is of total length at most 2n. The next observation indicates that smallMN is unaffected by non-trivial single-pattern avoidance.
Observation 3.4. For each n ∈ N+, there is a 321-avoiding set P ⊂ [0, 1]^2 of 2n points in general position where any Manhattan network of P has length at least n. See Fig. 7a for a sketch. Clearly, P is in general position and 321-avoiding.
Each pair p_i, q_i has to be connected by a path of length 1/(2n) + (2n^2 − n)/(2n^2) = 1. These paths cannot share common line segments (or even intersect), hence SM(P) ≥ n.
Perturbing differently, one can make the instance avoid 231 or its symmetries (Fig. 7b).

k-server on the line
We consider the offline k-server problem on the line, in the following called simply k-server. The input is a sequence of n reals (requests) in [0, 1], which we need to serve with k servers. Each of the servers has a position in [0, 1] throughout time, initially 0. We serve the n requests one-by-one by moving the servers, requiring that at least one server visits the requested value. More precisely, let X = (x_1, ..., x_n) ∈ [0, 1]^n be the sequence of requests and let p_i^j denote the position of server j after serving the request x_i, for j ∈ [k] and i ∈ [n]. We fix p_0^j = 0 and require p_i^j ∈ [0, 1] for all i, j. The values p_i^j represent a valid solution for X if for all i ∈ [n] there is some j ∈ [k] such that p_i^j = x_i. The cost of the solution is the total movement of all servers, i.e., Σ_{i∈[n]} Σ_{j∈[k]} |p_i^j − p_{i−1}^j|. Let OPT_k(X) denote the minimum cost of a valid solution for X using k servers; this is called the offline optimum, as it can be computed with full knowledge of X. It is well known that the offline optimum solution can be assumed to move at most one server for each request [BE98]. (Other moves may be postponed to future requests.) Nonetheless, it will be helpful for us to also make use of solutions that move multiple servers at once. In general, OPT_k(X) ≤ n/k. To see this, assign an interval of size 1/k to each server, with each server responsible for requests in its interval, so each individual move costs at most 1/k. This upper bound is tight up to a constant factor: consider m = n/(2k) repetitions of the sequence (1/(2k), 2/(2k), ..., 1). In every repetition, there must be at least k movements of length at least 1/(2k), for a total cost of n/(4k).
Observation 3.5. OPT_k(X) ≤ n/k for every input X ∈ [0, 1]^n, and for every n there exists an input X ∈ [0, 1]^n with OPT_k(X) ∈ Ω(n/k).

A slight modification of the lower bound sequence involves m = n/(k+1) repetitions of (0, 1/k, 2/k, ..., 1), with a total cost of at least n/(k(k+1)). Observe that this sequence can be made order-isomorphic to a (k+1)-increasing (or (k+1)-decreasing) permutation by slight perturbation. This shows that avoiding the pattern D_{k+2} (or I_{k+2}) does not help when we have k servers. On the other hand, only one additional server brings the cost down to virtually zero. In general, we can serve a k-increasing input X with k servers by partitioning X into k increasing subsequences and assigning a server to each; as each server moves only in one direction, their individual cost is at most 1. (This is of course more generally true for all sequences obtained by interleaving k monotone sequences.)

Observation 3.6. For each k-increasing input X ∈ [0, 1]^n, we have OPT_k(X) ≤ k. On the other hand, for each n, there is a (k+1)-increasing input X with OPT_k(X) ∈ Ω(n/k^2).
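The strategy behind the upper bound of Observation 3.6 is easy to implement. The sketch below (our own illustration, not the paper's pseudocode) greedily partitions the requests into increasing subsequences, patience-style, and charges each request to the server of its subsequence:

```python
import bisect

def serve_k_increasing(requests):
    """Greedy chain partition: each request joins the increasing
    subsequence with the largest feasible last value; one server per
    subsequence, starting at 0 and moving only upward.
    Returns (number of servers used, total movement cost)."""
    tails = []   # last value of each subsequence, kept sorted ascending
    cost = 0.0
    for x in requests:
        i = bisect.bisect_right(tails, x)
        if i == 0:
            tails.insert(0, x)          # new subsequence: server moves 0 -> x
            cost += x
        else:
            cost += x - tails[i - 1]    # move that server up to x
            tails[i - 1] = x            # tails stays sorted: next tail > x
    return len(tails), cost
```

For a k-increasing input this uses at most k servers, each moving monotonically upward, so the total cost is at most k, matching Observation 3.6.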
The upper bound O(k) is tight for all k: scale down a hard k-increasing sequence to [1/2, 1]; this forces all k servers to move a distance of at least 1/2 before serving their first request. In the following, we consider inputs that avoid one or more patterns. In the spirit of Observation 3.6, we assume that k is "large enough", i.e., at least some constant that we may choose depending on the avoided pattern(s). On the other hand, we also require k to be not too large (as a function of n), as otherwise the overhead of our techniques dominates the cost.
Our main result, stated in the introduction as Theorem 1.2, is (roughly) that OPT_k(X) ∈ O(n^ε) whenever the input X avoids a fixed pattern π, where ε depends on π and k. We further provide an almost complete characterization of OPT_k when the input is in a principal permutation class (i.e., with a single avoided pattern), which we now summarize.
Overview for principal classes. For a fixed pattern π, the worst-case cost of serving a π-avoiding sequence of n requests with k servers falls into one of three regimes, one of them of order log n (Theorems 3.7 and 3.11; Theorems 3.8 and 3.13; Theorems 3.7 and 3.13). The constants in the asymptotic notation all depend on π. Hence, as mentioned above, most of our results are non-trivial when k is at least some constant depending on π.
Observe that the starting position of the servers (fixed as zero above) changes the overall cost by at most k, and is hence irrelevant for our asymptotic results. Further observe that the cost of the k-server problem is (essentially) invariant under reversal and complement of the input sequence. Here, complement means replacing each value x with 1 − x, which clearly changes the overall cost by at most k (because of the starting positions). A reversed input sequence can be handled by a reversed solution, which again does not change the cost, up to starting positions.
The proofs of Theorems 3.8 and 3.9 do not rely on distance-balanced merge sequences. In fact, they are not based on twin-width at all, but use the decomposition inherent to 231-avoiding permutations (see § 6). The idea is to take the top-level decomposition and serve each part of the decomposition with a number of servers depending (mainly) on its size. A similar approach can be used when X is t-separable, sharpening the bound of Theorem 3.7 in this special case.
Theorem 3.10. Let t ≥ 2. Then, for every t-separable input X ∈ [0, 1]^n, we have:

Lower bounds. We present two constructions based on a similar general idea.
Start with an appropriate input that can be served efficiently by our k servers (or even fewer), but that restricts movement somehow. For example, in an efficient solution, some of the servers must remain in certain intervals most of the time. Then, repeatedly inflate some values in the input with a smaller copy of the original input, further restricting efficient solutions. At some point, the servers find themselves in a lose-lose situation: either they stick with the restrictions imposed by the "global" permutation, lacking enough servers in place for some of the small copies; or they make sure there are enough servers for most of the small copies, causing too much movement globally.
Theorem 3.11. For each n, k, d, there is an input:

Since separable inputs are precisely those of twin-width 1, Theorem 3.11 with d = 1 implies that Theorem 3.10 is tight up to a factor of t · k^4. Moreover, Theorem 3.7 is tight if the avoided pattern π is non-separable (up to π-dependent constants in the exponent). This is because the separable input obtained from Theorem 3.11 with d = 1 must avoid all non-separable patterns. Since almost all patterns are non-separable [Bri10], Theorem 3.7 is tight in this way for almost all π.
Corollary 3.12. For each n, k, d, there is a permutation π of size d^2 and a π-avoiding input:

Note that the corresponding upper bound (Theorem 3.7) is k^{O(1)} · n^{O(log c_π / log k)}. Corollary 3.12 implies that the dependence on π in the exponent of n is necessary, and we cannot hope for an equivalent of Theorem 3.10 for every avoided pattern.
Our second construction is specific to Av(231) and its symmetries.

Theorem 3.13. For each n, k there is an input X ∈ [0, 1]^n avoiding 231 or its symmetries such that:

This shows that Theorem 3.8 is tight up to a factor of O(k^2 · 4^k). Further, Theorem 3.9 reveals that Theorem 3.13 is, in a sense, best possible, since for every proper subclass of Av(231) or its symmetries, OPT_k(X) is already bounded.

Euclidean TSP
In this subsection we consider the following problem: given a set P ⊂ [0, 1]^2 of n points, find a tour that visits each point in P. Let us denote the shortest euclidean length of such a tour by TSP(P).
It is well known that TSP(P) ∈ O(√n) [BHH59]. For an easy argument, consider a uniform √n × √n grid inside [0, 1]^2, and observe that traversing the grid points and routing each input point to the nearest grid point has total cost O(√n).
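The grid-tour bound can be made explicit in a few lines (a sketch with invented names): a snake tour over the m × m grid of cell centers, m = ⌈√n⌉, has length at most m + 1, and each point costs a detour of at most √2/m to the center of its cell:

```python
import math

def grid_tour_length(points):
    """Upper bound on TSP(P) via the classic O(sqrt(n)) grid tour:
    snake through the centers of an m x m grid and detour from the
    center of each point's cell to the point and back."""
    n = len(points)
    m = max(1, math.ceil(math.sqrt(n)))
    snake = m + 1.0  # upper bound on the boustrophedon over all centers

    def center(p):   # center of the grid cell containing p
        cx = (min(m - 1, int(p[0] * m)) + 0.5) / m
        cy = (min(m - 1, int(p[1] * m)) + 0.5) / m
        return (cx, cy)

    detours = sum(2 * math.dist(p, center(p)) for p in points)
    return snake + detours
```

Since every point lies within L∞-distance 1/(2m) of its cell center, the returned value is at most m + 1 + n·√2/m ∈ O(√n).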
This bound is tight, as the example of the n grid points itself shows, since each point must be routed to a neighbor at cost at least 1/√n; in fact, a uniform random set of n points has w.h.p. cost Θ(√n) [BHH59]. The following warm-up example motivates the study of pattern-avoiding point sets.
If the point set P is k-increasing, then TSP(P) ∈ O(k). To see this, partition P into k increasing subsets, connect each subset by a path (of length at most 2), and connect the paths to each other at a further cost of at most k.
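A sketch of this construction (our own illustration): split the point set into increasing chains with the same greedy rule used for the k-server bound, walk each chain, and hop between consecutive chains:

```python
import math

def monotone_tour(points):
    """Tour of a k-increasing point set: greedily split into increasing
    chains, walk each chain, hop from the end of one chain to the start
    of the next.  Returns (number of chains, tour length)."""
    chains = []
    for p in sorted(points):                 # process by x-coordinate
        best = None                          # chain with largest tail y <= p.y
        for c in chains:
            if c[-1][1] <= p[1] and (best is None or c[-1][1] > best[-1][1]):
                best = c
        if best is None:
            chains.append([p])
        else:
            best.append(p)
    length = 0.0
    for c in chains:                         # walk each increasing chain
        length += sum(math.dist(a, b) for a, b in zip(c, c[1:]))
    for c1, c2 in zip(chains, chains[1:]):   # hop to the next chain
        length += math.dist(c1[-1], c2[0])
    return len(chains), length
```

Each increasing chain inside the unit square has length at most 2, and each of the at most k − 1 hops costs at most √2, so the total is O(k).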
Let us attempt to generalize the observation to arbitrary avoided patterns. We sketch a simple but suboptimal argument first. Let P ⊂ [0, 1]^2 be a set of n points avoiding π. Partition [0, 1]^2 into a uniform grid of (√n/c_π) × (√n/c_π) equal square cells. Observe that at most √n of these cells contain any point. Otherwise, by Lemma 2.1 we would have |π| nonempty cells forming the pattern π.
Construct a tour recursively, by constructing a tour of each nonempty cell, and connecting these mini-tours by a tour of a representative in each. Assume for simplicity that the n points are distributed equally among the nonempty cells (this can be shown to maximize the bound via Jensen's inequality; in this proof sketch we omit the formal justification). The total length t(n) can then be bounded by a recurrence, which already gives a non-trivial bound of the form O(log^{log_2(c_π+1)} n). Our main theorem (all proofs in § 7), however, will yield a significant improvement.
Theorem 3.15. Let π be a permutation with Füredi–Hajnal limit c_π. Then, for every π-avoiding point set P ⊂ [0, 1]^2 in general position, we have TSP(P) ∈ O_π(log n).

We give a lower bound that shows the result to be asymptotically tight for essentially all avoided patterns. The proof is by a "balanced tree" point set.
Observe that each non-monotone pattern π contains 231 or one of its symmetries (132, 213, 312). Hence, we have the dichotomy that if our point set P avoids a monotone pattern, then TSP(P) is constant, and if P avoids a non-monotone pattern π, then TSP(P) ∈ Θ_π(log n) in the worst case.
We also show a sharp dichotomy between Av(231) and its subclasses.
Finally, we show a lower bound implying that in general, an (almost) linear dependence on twin-width cannot be avoided.
Theorem 3.18. For each n and d ≥ 2, there is a set P ⊂ [0, 1]^2 of n points with twin-width d such that TSP(P) ∈ Ω((d/log d) · log n).
In particular, if γ_d is a d × d grid permutation, then, for a worst-case γ_d-avoiding point set P of n points, TSP(P) is between Ω((d/log d) · log n) and 2^{O(d)} · log n.

Symmetries and general position
We remark that all three problems are invariant under reversal and complement. For the BST and k-server problems, however, reversal amounts to running the operation sequence backwards, which is only allowed in the offline case. In addition, the TSP problem is invariant under rotation, and the BST problem (in the satisfied superset formulation) under 90-degree rotation; the latter implies a swap between time- and keyspace-dimensions, and is thus less easily interpretable in the BST view. The restriction of the input to [0, 1], resp. [0, 1]^2, is without loss of generality, as the offline algorithm can conceptually scale/shift the input.
In our bounds for all three problems, we usually require the input to be in general position.We sketch a way of extending our results to inputs that are not in general position.
Let P ⊂ [0, 1]^2 be a point set, not necessarily in general position, let D be the minimum distance between two points in P, and let 0 < ε < D. Construct P′ from P by replacing every point (x, y) with (x + ε·y, y + ε·x). Clearly, P′ is in general position; every "row" and every "column" of points is replaced by an increasing point set. Strict inequalities between point coordinates are preserved, but pattern-avoidance is not. However, it can be shown that if P avoids a permutation π, then P′ avoids a permutation σ, obtained by inflating π with |π| copies of the permutation 21. Note that for BST and k-server, the x-coordinates describe the temporal order of accesses/requests, and are thus already distinct; here, only the transformation of y-coordinates has any effect.
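The shear is a one-liner; below a quick sanity check (our own toy example) that two points sharing a row become strictly increasing after the perturbation:

```python
def perturb(points, eps):
    """Shear (x, y) -> (x + eps*y, y + eps*x).  For eps smaller than the
    minimum interpoint distance this yields general position, while
    moving each point of the unit square by at most eps * sqrt(2)."""
    return [(x + eps * y, y + eps * x) for (x, y) in points]
```

For example, the points (0.2, 0.5) and (0.4, 0.5) on a common horizontal line are mapped to points with distinct, increasing y-coordinates, i.e., the "row" becomes an increasing pattern.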
If P corresponds to a k-server or TSP input, then by choosing ε appropriately, we can keep the cost of the transformed point set P′ arbitrarily close to the cost of P. Hence, Theorems 3.7 and 3.15 generalize to inputs not in general position; only the constant c_π in the bounds is replaced.
For the BST (and satisfied superset) problem, we use the same transformation. Let X′ ∈ [m]^m be the permutation access sequence obtained by perturbing the point set corresponding to X ∈

Arborally satisfied superset
In this section, we prove:

Theorem 3.2. Let P be a set of n points in general position, of twin-width d. Then OPT(P) ∈ O(nd^2).
As defined in § 3.1.1, if P is a set of points and x, y ∈ P, then we call x and y connected and write x ∼_P y if there is a monotone path T connecting x and y that consists of axis-parallel line segments, and each corner of T is contained in P. For two sets X, Y we write X ∼_P Y if x ∼_P y for all x ∈ (X ∩ P), y ∈ (Y ∩ P). A set of points P is called arborally satisfied if every pair of points in P is connected in P (i.e., P ∼_P P). Let R_1, R_2, ..., R_n be a merge sequence of a point set P. For each rectangle family R_i, let G(R_i) be the set of all intersection points of lines v and h, where v is a vertical line containing a left or right side of a rectangle in R_i, and h is a horizontal line containing the top or bottom side of a rectangle in R_i. In particular, note that G(R_i) contains the corners of each rectangle in R_i.
We construct an arborally satisfied superset A ⊇ P as follows. Let A_0 = P. Consider a step from R_i to R_{i+1} in the merge sequence where two rectangles Q_1 and Q_2 are merged into a rectangle Q.
We set A_{i+1} = A_i ∪ (G(R_i) ∩ Q), and finally, A = A_n. In words, we add to the point set each point in Q that lies on the "grid" induced by rectangles in R_i (including Q_1 and Q_2), if that point is not already present (see Fig. 8).

Lemma 4.1. For each i ∈ {0, 1, ..., n}, the following two invariants hold: (i) G(R_i) ∩ Q ⊆ A_i for each Q ∈ R_i, and (ii) Q ∼_{A_i} Q′ for each pair Q, Q′ ∈ R_i that are equal or non-homogeneous.

Proof. Invariant (i) clearly holds initially. It is then maintained, since the required points are added whenever adding a new rectangle, points are never removed, and the set G(R_i) does not gain new points with increasing i.
To prove invariant (ii), we proceed by induction on i. In the beginning, each rectangle Q consists of a single point, hence trivially Q ∼_{A_0} Q. Moreover, any two distinct rectangles are homogeneous, so the claim holds. Now consider the i-th step, where two rectangles Q_1 and Q_2 are merged into a rectangle Q; let B = G(R_i) ∩ Q. Observe that B contains all newly added points.

We distinguish the following cases for the pair of points x, y to be connected.

1. x, y ∈ A_i. If x and y lie in equal or non-homogeneous rectangles of R_i, then x ∼_{A_i} y by invariant (ii), and hence x ∼_{A_{i+1}} y. Otherwise, there is nothing to prove.

2. x, y ∈ B. Since B forms a full grid inside Q, clearly x and y are connected, possibly via another point in B.

3. x ∈ B and y ∈ A_i ∩ Q′ for some Q′ ∈ R_i. If x ∈ Q′, then x ∈ A_i by invariant (i) and thus x ∼_{A_i} y by invariant (ii). Otherwise, w.l.o.g. Q′ is to the right of x (see Fig. 9a). Let h be the horizontal line through x. Since x and Q′ are not homogeneous, h must intersect Q′. Let c be the leftmost intersection point of h and Q′. Since x ∈ B ⊆ G(R_i), we know that h contains the top or bottom side of some rectangle in R_i, and hence c ∈ G(R_i) ∩ Q′ ⊆ A_i, so x and c are connected along h, and c ∼_{A_i} y by invariant (ii).

4. x, y ∈ A_{i+1} in the remaining configuration, with c as above: let v and h be the vertical and horizontal line through c. One of v, h intersects Q, say v. Let h′ be the horizontal line through x, and let {z} = v ∩ h′. We have z ∈ Q and z ∈ G(R_i), hence z ∈ B ⊆ A_{i+1} by invariant (i). Thus, x and c are connected via z, and c ∼_{A_i} y (shown in case 3), which implies x ∼_{A_{i+1}} y.

Observe that A_i ⊆ ⋃_{Q∈R_i} Q for all i. This means that the list of cases is exhaustive, up to swapping x with y and Q_1 with Q_2.
It remains to bound the number of points added in each step.
Proof. Consider a step where Q_1 and Q_2 are merged into Q. By assumption, there are at most d rectangles in R_{i+1} that are not homogeneous with Q (this includes Q). In R_i, there are two more, namely Q_1 and Q_2. Extending each side of these rectangles into lines yields at most 4d + 8 lines that intersect Q, creating at most (2d + 4)^2 intersection points within Q.

Distance-balanced merge sequences
We say that a tuple of intervals (I_1, ..., I_s) is a partition of the real interval [0, 1] if (i) the intervals I_j are disjoint, (ii) their union is equal to [0, 1], and (iii) the intervals appear in this left-to-right order. For two positive integers r, s, an r × s-gridding of the unit square [0, 1]^2 is a pair G = (P_1, P_2) where P_1 = (I_1, ..., I_r) and P_2 = (J_1, ..., J_s) are partitions of [0, 1]. We say that I_x is the x-th column of G, J_y is the y-th row of G, and I_x × J_y is the (x, y)-cell of G. A gridding G is a coarsening of a gridding G′ if G can be obtained from G′ by repeated merging of consecutive rows and columns.
Recall the definition of a d-wide merge sequence from § 2. We call a d-wide merge sequence R_1, ..., R_n of a point set P ⊂ [0, 1]^2, augmented with a sequence of griddings G_1, ..., G_n, distance-balanced if the following properties hold. Here p_i denotes the number of columns in G_i, and q_i denotes the number of rows in G_i.
(a) Each row (or column) of G_i contains at most d/2 non-empty cells.

(b) Each rectangle of R_i is contained in a single cell of G_i and has width smaller than 20/p_i and height smaller than 20/q_i.

(c) Each column of G_i wider than 40/p_i contains at most d/2 points of P, and each row of G_i taller than 40/q_i contains at most d/2 points of P.

(d) (9/40) · d · (max(p_i, q_i) − 1) ≤ n − i + 1 ≤ (1/2) · d · min(p_i, q_i).

Property (a) implies that the grid is sparsely populated. Property (b) implies that rectangles are never much wider or taller than the average column width or row height. Ideally, we would like to also avoid columns/rows that are significantly wider/taller. This is not always possible, but property (c) at least ensures that very wide/tall columns/rows contain only few points. Finally, property (d) implies that p_i, q_i ∈ Θ(n − i + 1) = Θ(|R_i|), so the number of rows and columns is linear in the number of rectangles.
Theorem 5.1. Let π be a permutation with Füredi–Hajnal limit c_π and let P ⊂ [0, 1]^2 be a π-avoiding set of n points in general position. Then P has a distance-balanced 10c_π-wide merge sequence.
Proof. Let t = 5c_π and C = 20. We say that a column x of an r × s-gridding G is wide if |x| > C/r, and a row y is tall if |y| > C/s, where |·| denotes the size of the interval, i.e., the width of the column x (resp. the height of the row y). Observe that in any gridding G of the unit square, at most a constant fraction of columns can be wide, and the same holds for tall rows. We call a cell of G wide (tall) if it is in a wide column (tall row).
First, we describe an algorithm that constructs the desired merge sequence along with a sequence of griddings, and then prove that properties (a) to (d) hold. Throughout the execution, the following invariants are maintained:

(i) Each rectangle of R_i is contained in a single cell of G_i.

(ii) Each row (or column) of G_i contains at most t rectangles of R_i.
(iii) The union of any two consecutive non-tall rows (or non-wide columns) contains strictly more than t rectangles of R_i.
Let s = ⌈n/t⌉. Initially, set R_1 = P and let the gridding G_1 consist of columns c_1, ..., c_s and rows r_1, ..., r_s such that for every i ∈ [s − 1], the row r_i and the column c_i both contain exactly t points of P. Observe that invariants (i) to (iii) hold trivially for R_1 and G_1.
The algorithm repeats the following main step as long as there are at least two rectangles. If the rectangle family R_i contains two rectangles sharing the same cell of the gridding G_i that is neither wide nor tall, merge these two rectangles into a single new one, obtaining the rectangle family R_{i+1}. Later, we show that R_i must always contain such a pair of rectangles, as otherwise an occurrence of π in P is guaranteed. Afterwards, the algorithm repeatedly merges all pairs of neighboring non-wide columns and non-tall rows in the gridding that violate invariant (iii). Note that one such merge might trigger a cascade of additional merges, since the dimensions of the grid are decreasing and thus other columns (resp. rows) might cease to be wide (resp. tall). Once this process stops, invariant (iii) is satisfied and we set G_{i+1} to be the obtained coarsening of G_i.
Correctness. First, we verify that invariants (i) to (iii) are not violated in the i-th step. Invariant (i) holds since the only new rectangle in R_{i+1} is obtained by merging two rectangles inside a single cell of G_i, and G_{i+1} is a coarsening of G_i. Invariant (ii) is not violated since we only merge consecutive rows and columns that violated condition (iii), i.e., rows and columns that together contained at most t rectangles. Finally, (iii) is satisfied by the algorithm explicitly.
We now show that there must be two rectangles sharing a non-wide and non-tall cell in the i-th step of the algorithm. Suppose not. Construct a point set M by picking one representative point from each rectangle of R_i. Observe that if M contains the permutation π, then so does P, since each rectangle contains at least one point of P. Let W and T denote the number of wide columns and tall rows in the gridding G_i, respectively. We claim that G_i contains at least ⌊p_i/2⌋ − W ≥ (p_i − 1)/2 − W pairs of neighboring non-wide columns. This can be seen by partitioning the columns into ⌊p_i/2⌋ neighboring pairs, since each wide column blocks at most one of the pairs. Thus, condition (iii) guarantees that there are strictly more than ((p_i − 1)/2 − W) · t rectangles in non-wide columns. At most T · t of these rectangles lie in tall rows by (ii), and each remaining rectangle must occupy a cell on its own (by assumption). Since each wide column has width strictly larger than C/p_i, and C is a positive integer, we have W ≤ (p_i − 1)/C; similarly, T ≤ (q_i − 1)/C. Assume w.l.o.g. that p_i ≥ q_i and p_i ≥ 2. Putting these together, the number of non-empty cells exceeds t · (p_i − 1) · (1/2 − 2/C) = 2c_π · (p_i − 1) ≥ c_π · p_i. Now Lemma 2.1 implies that M contains π, and thus P contains π, a contradiction.
Width of the merge sequence. Let Q ∈ R_i be a rectangle in step i of the construction. Each other rectangle in R_i that is not homogeneous with Q must share a row or column with Q. By (i) and (ii), there are at most 2t − 2 such rectangles. Hence, R_i is in particular 2t = 10c_π-wide.
Distance-balanced. Finally, let us show that the obtained merge sequence is distance-balanced. Let d = 2t = 10c_π. First, observe that property (a) follows by a simple combination of (i) and (ii). Furthermore, any rectangle S ∈ R_i either corresponds to an original point of P, in which case its width and height are both zero, or it was created by merging two rectangles in the j-th step for some j < i. In the latter case, the two merged rectangles occupied the same cell of G_j, in a non-wide column and a non-tall row, and thus the width (resp. height) of S is at most 20/p_j ≤ 20/p_i (resp. 20/q_j ≤ 20/q_i). Together with (i), this implies property (b).
Towards proving property (d), recall that |R_i| = n − i + 1. The second inequality of property (d) then follows from (ii), since |R_i| ≤ p_i · t and |R_i| ≤ q_i · t. For the first inequality, recall that G_i contains at most (p_i − 1)/C wide columns. Thus, there are at least (p_i − 1)/2 − (p_i − 1)/C pairs of neighboring non-wide columns, each containing more than t rectangles by (iii), hence |R_i| ≥ (1/2 − 1/C) · t · (p_i − 1) = (9/40) · d · (p_i − 1). The same holds for q_i, and property (d) follows.
Lastly, consider a column x of G_i. Either x is already present in G_1, and thus x contains at most t points of P, or x was created by merging two non-wide columns in the j-th step for some j < i. In the latter case, the width of x is at most 2C/p_j = 40/p_j ≤ 40/p_i. The same argument on the rows of G_i shows property (c) and wraps up the proof.
Note that the width of the resulting merge sequence can be improved at the cost of its balancedness. Namely, for any integer k > 0, the proof above goes through when we set t = (4 + 1/k) · c_π and C = 4 + 16k.
Furthermore, observe that the proof is constructive and describes a polynomial-time procedure that outputs the distance-balanced merge sequence. In fact, we claim that the algorithm of Guillemot and Marx [GM14] can be adapted to compute the distance-balanced merge sequence in O(n) time, given access to the points ordered by both x- and y-coordinates. We choose to omit the discussion of the necessary implementation details, since we only use the existence of distance-balanced merge sequences to prove the existence of solutions of small cost.
Our construction of distance-balanced merge sequences does not directly work with bounded twin-width and instead requires the point set to actually avoid a given pattern. However, we note that since each point set with twin-width d avoids some (d + 1) × (d + 1) grid permutation π, and c_π ∈ 2^{O(d)} [Fox13], Theorem 5.1 extends to arbitrary point sets of bounded twin-width (albeit with an exponential blowup in the twin-width).
Corollary 5.2. Let P be a point set of twin-width d. Then P has a distance-balanced 2^{O(d)}-wide merge sequence.
Since the full generality of distance-balanced merge sequences is not needed for our proofs, we isolate the required properties in two corollaries.First, we show that the sum of the dimensions of created rectangles is logarithmic.
Corollary 5.3. Let R_1, ..., R_n be a distance-balanced d-wide merge sequence. For i ∈ {1, 2, ..., n − 1}, let Q_i be the rectangle created in step i, and let w_i, h_i be its width and height. Then Σ_{i=1}^{n−1} (w_i + h_i) ∈ O(d · log n).

Proof. By property (b), we have w_i ≤ 20/p_{i+1}. By property (d), we have p_{i+1} ≥ 2(n − i)/d, hence w_i ≤ 10d/(n − i) and Σ_{i=1}^{n−1} w_i ≤ 10d · H_{n−1} ∈ O(d · log n), where H_k denotes the k-th Harmonic number. The sum of heights is bounded symmetrically.
Second, we use distance-balanced merge sequences to construct a "balanced" gridding.
Corollary 5.4. For each real m ∈ [2, n/(5c_π)], there exists a gridding G of [0, 1]^2 such that:

(I) The number of rows (columns) in G is at least m and at most 3m.
(II) Each row or column contains at most 5c π non-empty cells.
(III) Each row or column of width more than 40/m contains at most 5c_π points.
Proof. Let R_1, ..., R_n be a distance-balanced 10c_π-wide merge sequence of P with griddings G_1, ..., G_n, and let G = G_i for a suitable index i. Let p be the number of columns in G and let q be the number of rows in G. Using property (d), m ≥ 2, and c_π ≥ 1 (since π is non-trivial), we get m ≤ p, q ≤ 3m; hence, part (I) holds. Part (II) follows directly from property (a). Property (c) implies that each column of width more than 40/p contains at most 5c_π points. Since p, q ≥ m (as shown above), part (III) follows.
6 k-server on the line

Consider a request sequence X = (x_1, x_2, ..., x_n) ∈ [0, 1]^n. We associate to X a point set P_X in which the i-th request corresponds to a point whose x-coordinate is proportional to i and whose y-coordinate is x_i: the value of the request is the y-coordinate of the corresponding point, and serving the requests means moving through the point set from left to right. We generally assume X (and therefore P_X) to be in general position, and treat X and P_X interchangeably; this allows us to apply the definitions of distance-balanced merge sequences from § 5 to request sequences.
We start with the proof of our main upper bound.
Proof. Let d = 5c_π and let f_t(n) denote the largest possible cost for a request sequence of length n served by exactly d^t servers. If k ≠ d^t for every positive integer t, we simply let the first d^{⌊log_d k⌋} servers serve all requests at cost f_{⌊log_d k⌋}(n). We show by induction on t that f_t(n) ≤ b_t · d^t · n^{1/(t+1)}, where b_t ≥ 1 is a constant dependent on t, specified later. For t = 0, linear cost clearly suffices to handle all requests with one server, and we can set b_0 = 1. Assume t ≥ 1. If n < d^t · n^{1/(t+1)}, the bound holds trivially. Otherwise, we apply Corollary 5.4 with m = n^{1/(t+1)} ≥ d^{t/(t+1)} ≥ 2 to get a gridding G.
Our strategy is as follows. Each column of G contains at most d non-empty cells. We split the servers into d groups of d^{t−1} servers and let each group serve a different non-empty cell in the first column of the gridding. After the requests in the first column are handled, we move each group to a different non-empty cell in the second column of G. We continue in the same fashion until all requests are handled. In this way, each non-empty cell of G is handled locally by a group of d^{t−1} servers, and these groups move around the gridding only in between the columns of G.
First, let us bound the cost of moving servers around in between the columns of G. Denote by q the number of columns in G. Each transition to the next column moves d^t servers, each by a distance of at most 1, so the total cost of these moves is at most

q · d^t ≤ 3m · d^t = 3 · d^t · n^{1/(t+1)}, (1)

where the first inequality follows from part (I) of Corollary 5.4. In order to bound the cost of serving the individual non-empty cells, we count their contributions separately depending on their height. We say that a row or a cell is extra-tall if its height is larger than 40/m. Let h_1, ..., h_r be the heights of the extra-tall rows in G. By part (III) of Corollary 5.4, each extra-tall row contains at most d points. Therefore, the cost inside the cells of the j-th extra-tall row is at most h_j · d, even with one server. This makes the total cost occurring in all the extra-tall cells at most

Σ_{j=1}^{r} h_j · d ≤ d ≤ d^t · n^{1/(t+1)}. (2)

Consider now the remaining cells. Let N be the number of non-empty cells in G that are not extra-tall (i.e., have height ≤ 40/m), and let n_1, ..., n_N be the number of points in each of these cells. The contribution of the non-extra-tall cells is, by induction, at most (40/m) · Σ_{i=1}^{N} b_{t−1} · d^{t−1} · n_i^{1/t}. The function g(x) = x^{1/t} is concave on (0, ∞) and thus, by Jensen's inequality, Σ_{i=1}^{N} n_i^{1/t} ≤ N · (n/N)^{1/t}. Finally, parts (I) and (II) of Corollary 5.4 imply that N ≤ 3dm. Plugging in this inequality and m = n^{1/(t+1)} yields

(40/m) · b_{t−1} · d^{t−1} · (3dm)^{1−1/t} · n^{1/t} ≤ 120 · b_{t−1} · d^t · n^{1/(t+1)}. (3)

Summing the costs (1), (2) and (3), we obtain f_t(n) ≤ (120 · b_{t−1} + 4) · d^t · n^{1/(t+1)}. Thus, the desired inequality holds when we set b_t = 120 · b_{t−1} + 4. This recurrence solves to b_t ∈ Θ(120^t) = Θ(d^{t·log_d 120}), which implies the claimed upper bound on f_t(n).
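As a quick sanity check on the last step (our own verification, not part of the proof), the recurrence b_t = 120·b_{t−1} + 4 with b_0 = 1 has the closed form b_t = (123·120^t − 4)/119, which is indeed Θ(120^t):

```python
def b(t):
    """Recurrence from the proof: b_0 = 1, b_t = 120 * b_{t-1} + 4."""
    return 1 if t == 0 else 120 * b(t - 1) + 4

def b_closed(t):
    """Closed form obtained by unrolling the geometric series:
    b_t = 120^t + 4 * (120^t - 1) / 119 = (123 * 120^t - 4) / 119."""
    return (123 * 120**t - 4) // 119

# the two definitions agree on small indices
for t in range(10):
    assert b(t) == b_closed(t)
```

The integer division is exact since 120 ≡ 1 (mod 119), so 119 divides 120^t − 1.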
6.1 Special upper bounds

6.1.1 231-avoiding inputs

Theorem 3.8. For every input X ∈ [0, 1]^n that avoids 231 (or its symmetries), we have OPT_k(X) ∈ O(k^2 + k · n^{1/k}).

We use the specific structure of 231-avoiding inputs, as shown in Fig. 10. Let X be a 231-avoiding sequence and let x_0 be the value of its first entry. Then there exist sequences X_1, X_2 such that X = (x_0) · X_1 · X_2, where · denotes concatenation, all entries in X_1 are at most x_0, and all entries in X_2 are at least x_0. We say X decomposes into (x_0, X_1, X_2).

An algorithm for serving a 231-avoiding sequence is given in Fig. 11. The parameter p is an arbitrary value that determines which sequences are small enough to be served with fewer than k servers. The root call for a sequence of requests in [0, 1] is Serve(k, X, ⌊|X|^{(k−1)/k}⌋, 0, 1), but note that p is (re-)computed whenever we reduce the number of servers, i.e., call Serve(k − 1, ...).
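The top-level decomposition is straightforward to compute; a sketch (function name ours), which also detects an occurrence of 231 when the decomposition fails:

```python
def decompose_231(X):
    """Decompose X = (x0) + X1 + X2 with X1 entrywise <= x0 and
    X2 entrywise >= x0; returns None if no such split exists,
    in which case X contains the pattern 231."""
    if not X:
        return None
    x0, rest = X[0], X[1:]
    i = 0
    while i < len(rest) and rest[i] <= x0:   # longest prefix below x0
        i += 1
    X1, X2 = rest[:i], rest[i:]
    if any(v < x0 for v in X2):
        # an entry above x0 is followed by one below it: (x0, high, low)
        # is an occurrence of 231
        return None
    return x0, X1, X2
```

Applying the function recursively to X1 and X2 recovers the full decomposition tree of a 231-avoiding sequence.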
We now bound the total cost of a call to Serve. Let $f_k(n, p, a, c)$ be the maximum cost of the algorithm when $X$ is a 231-avoiding sequence of length $n$ and the other parameters are as given. Observe that $f_k(n, p, a, c) = (c - a) f_k(n, p, 0, 1)$, hence we write $f_k(n, p) = f_k(n, p, 0, 1)$ and ignore the parameters $a$ and $c$ from now on. We further write $g_k(n) = f_k(n, \lfloor n^{(k-1)/k} \rfloor)$. The pseudocode directly yields the following recurrences; moreover, $f_k(n, p)$ is bounded by the maximum of the following two terms over all $w_1, w_2, n_1, n_2$ with $w_1, w_2 \ge 0$. Below, we show that $g_k(n) \in O(k^2 + k n^{1/k})$, thereby proving Theorem 3.8. We first sketch the proof idea, ignoring constants and lower-order factors and treating $k$ as a constant.
Let $k \ge 2$. Compute an upper bound on $g_k(n) = f_k(n, \lfloor n^{(k-1)/k} \rfloor)$ by repeatedly applying eqs. (6) and (7) until all occurrences of $f_k$ are gone. The occurrences of the first two terms $w_1 + w_1 g_{k-1}(n_1)$ from eq. (6) add up to $\sum_i x_i + \sum_i x_i\, g_{k-1}(n_i)$, where $\sum_i x_i = 1$ and $\sum_i n_i = n$. Using induction, the fact that $n_i \le p$, and Jensen's inequality, we can bound the result by $O(p^{1/(k-1)}) = O(n^{1/k})$.
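The Jensen step in this sketch can be spelled out. Assuming the induction hypothesis $g_{k-1}(m) \in O(m^{1/(k-1)})$ and using that the weights $x_i$ sum to 1 while each $n_i \le p$ (so $\sum_i x_i n_i \le p$), concavity of $x \mapsto x^{1/(k-1)}$ gives (hiding constants in $\lesssim$):

```latex
\sum_i x_i\, g_{k-1}(n_i)
  \;\lesssim\; \sum_i x_i\, n_i^{1/(k-1)}
  \;\le\; \Bigl(\sum_i x_i\, n_i\Bigr)^{1/(k-1)}
  \;\le\; p^{1/(k-1)}
  \;=\; \bigl\lfloor n^{(k-1)/k} \bigr\rfloor^{1/(k-1)}
  \;\le\; n^{1/k}.
```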
It remains to compute the total contribution of the first term $w_2$ of eq. (7). We simply bound $w_2 \le 1$ and argue that the term occurs at most $n/p \approx n^{1/k}$ times. For this, consider the evaluation tree of our calculation. This is a binary tree where each inner node is labeled $f_k(n', p)$ for some $n'$ and each leaf is labeled $g_{k-1}(n')$ for some $n'$. A node $f_k(n', p)$ corresponding to an application of eq. (7) must have two inner nodes as children, and must satisfy $n' \ge p$. It is easy to see that there can be no more than $n/p$ such nodes.
We now proceed with the formal proof. For $n, p \in \mathbb{N}_+$, define a weighted $p$-bounded partition of $n$ to be two sequences $(w_i)_{i \in [t]}$ and $(n_i)_{i \in [t]}$ such that $w_i \in [0, 1]$ and $n_i \in [p]$ for all $i \in [t]$, as well as $\sum_{i=1}^t w_i = 1$ and $\sum_{i=1}^t n_i = n$. Let $\phi_k(n, p)$ be the maximum value of $\sum_{i=1}^t w_i\, g_k(n_i)$ over all weighted $p$-bounded partitions of $n$.

Lemma 6.2. For each $k \ge 2$ and $n, p \in \mathbb{N}$, we have

Proof. Fix $k$ and $p$. We prove the claim by induction on $n$. For $n = 0$, we have $f_k(0, p) = k - 1$ by eq. (5) and are done.
An analogous argument proves the case when $X_2$ avoids $\beta$, and the claim follows.

t-separable permutations
Theorem 3.10. Let $t \ge 2$. Then, for every $t$-separable input $X \in [0, 1]^n$, we have

Apart from recursive calls, the algorithm of Fig. 13 moves each server at most $t + 1$ times (once for each block, and back to $a$ and $b$ at the end), for a total cost of at most $(t + 1) \cdot 2^\ell = \alpha_\ell$. If there is exactly one large block $X_{i^*}$, then no server ever crosses $X_{i^*}$ (outside of recursive calls), so the total cost is at most $(1 - w_{i^*}) \alpha_\ell$.
We now prove our claim by induction on $n$. If $n \le p$, then there are no large blocks, so we have $f_\ell(n, p) = (t + 1) \cdot 2^\ell + \phi_{\ell-1}(n, p)$ and are done. Now suppose $n > p$, so there may be large blocks. Let $h_\ell(i)$ be the cost of serving $X_i$ (after moving servers into position). If there is precisely one large block $X_{i^*}$, we have

Figure 14: Lower bound constructions.
Otherwise, let $W_J = \sum_{i \in J} w_i$. The last inequality uses $2 W_J \le |J|$, which follows from $|J| \ne 1$.
The third inequality uses n i ≤ p and Jensen's inequality.

Bounded twin-width
Theorem 3.11. For each $n, k, d$, there is an input

We first illustrate the idea for $k \in \{1, 2\}$ and $d = 1$. Figure 14a sketches a sequence $X$ with $n$ values (properly defined below). The sequence is separable, i.e., has twin-width 1. The cost for one server is clearly $\Theta(n)$, since the server has to switch between the left and the right part in every step. On the other hand, two servers can serve the sequence with cost $O(1)$, by positioning one server at each part.
To construct a sequence that is hard for two servers, we take $X$ and replace each point with a copy of $X$ itself, scaled down by a factor of roughly $\frac{1}{n}$. The resulting sequence, shown in Fig. 14b, is still separable (it is order-isomorphic to the inflation of $X$ with $2n$ copies of itself). Now consider serving it with two servers. For each small copy of $X$, we essentially have two choices. Either use both servers, which means that afterwards at least one server has to be moved over to the other side for the next small copy of $X$, for a cost of $\Omega(1)$; or use only one server, which costs $\Omega(\frac{1}{n})$ for each point, for a total cost of again $\Omega(1)$. Overall, the cost is $\Omega(n)$ for a point set of size $n^2$. For larger $k$, we recursively inflate the construction.
If $d > 1$, we can make the construction a little more "efficient" by using multiple interleaved copies of $X$ (Fig. 14c) as the base construction, which is again recursively inflated. We proceed with the formal definition and proof of Theorem 3.11.
Let $X$ be a sequence of reals in $[0, 1]$. For $k, d \in \mathbb{N}$, we define the sequence $S_d(X, k)$ as follows. Let $\alpha = \frac{1}{4dk}$, let $Y^j_i = \alpha(4kj + i + X)$, and let

We call each $W_i$ an epoch and each $Y^j_i, Z^j_i$ a block of $S_d(X, k)$. Each block is a copy of $X$ that is scaled down by $\alpha$ and shifted, so $S_d(X, k)$ contains $2dk|X|$ values in total.

Lemma 6.6. We have $\mathrm{tww}(S_d(X, k)) = \max(\mathrm{tww}(X), d)$.
Proof. It suffices to prove that $\mathrm{tww}(S_d(X, k)) = d$ when $|X| = 1$. The claim then follows by the behavior of twin-width with respect to inflations (see Observation 2.2).
Without loss of generality, let $X = (\frac{1}{2})$, so each block consists of a single value. Observe that $S_d(X, k)$ is order-isomorphic to a $k \times 2d$ grid permutation, where each epoch corresponds to a column, and the rows alternate between being increasing $(Y^j_0, Y^j_1, \dots, Y^j_{k-1})$ and decreasing $(Z^j_0, Z^j_1, \dots, Z^j_{k-1})$. See Fig. 14c for an example. We first merge each pair of neighboring rows into a single rectangle, gradually from the end of the sequence. More precisely, we start by merging $Y^j_{k-1}$ and $Z^j_{k-1}$ into a rectangle $R_j$, for each $j \in \{0, 1, \dots, d-1\}$. Then, for $i = k-2, k-3, \dots, 0$, we merge each $Z^j_i$ into $R_j$ and then each $Y^j_i$ into $R_j$. At any time, each rectangle $R_j$ is homogeneous to every non-rectangle value (but may be non-homogeneous with each of the other rectangles $R_{j'}$), so the rectangle family is always $d$-wide.
In the end, we simply merge the d rectangles (each corresponding to a double row) in any order.
Lemma 6.8. Serving $X^d_t(n)$ with strictly fewer than $(2d)^t$ servers has cost at least $n/(8d)^t$.

Proof. Serving $X^d_1(n)$ with $(2d)^1 - 1 = 2d - 1$ servers costs at least $\frac{1}{2d}$ per epoch, for a total of $\frac{1}{2d} n \ge n/(8d)$. Now let $t \ge 2$ and consider a solution serving $X^d_t(n)$ with at most $(2d)^t - 1$ servers. We say that an epoch is saturated if all of its blocks are touched by at least $(2d)^{t-1}$ servers. Let $u$ be the number of saturated epochs.
To prove Theorem 3.13, let $m = \frac{1}{2} \lfloor n^{1/k} \rfloor$. The sequence $X'_k(m)$ has length at most $(2m)^k \le n$, and serving it with $k$ servers costs (by Lemma 6.8):

Euclidean TSP
In this section, we show upper and lower bounds on the optimum euclidean TSP tour of a point set.
As mentioned in § 1, several characteristics of point sets are known to be within a constant factor of the TSP optimum, and it will be helpful to use them in our proofs when showing asymptotic bounds.
For a point set $P$, let $\mathrm{MST}(P)$ be the cost of the euclidean minimum spanning tree on $P$, i.e., of the minimum spanning tree of $G_P$, where $G_P$ is the complete graph on $P$ in which each edge is weighted by the distance between its two endpoints. Further, let $\mathrm{NN}(P) = \sum_{x \in P} d_x$, where $d_x$ is the minimum distance between $x$ and a different point of $P$ (i.e., its nearest neighbor). It is easy to see that $\frac{1}{2}\mathrm{NN}(P) \le \mathrm{MST}(P) \le \mathrm{TSP}(P) \le 2\,\mathrm{MST}(P)$. Let $\mathrm{MStT}(P)$ denote the minimum of $\mathrm{MST}(P')$ over all supersets $P' \supseteq P$, i.e., the cost of a minimum euclidean Steiner tree. We have $\mathrm{MStT}(P) \le \mathrm{MST}(P) \le 1.22\,\mathrm{MStT}(P)$, where the first inequality is trivial and the second is due to Chung and Graham [CG85].
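These inequalities are easy to check numerically. The sketch below (our own illustration, not from the paper) computes MST(P) with Prim's algorithm, upper-bounds TSP(P) by the classical MST-doubling (preorder shortcut) tour, and verifies the chain ½NN(P) ≤ MST(P) ≤ TSP(P) ≤ 2 MST(P) on random points.

```python
import math, random

def dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

def mst(points):
    """Prim's algorithm on the complete euclidean graph; returns (weight, edges)."""
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n      # cheapest connection of each vertex to the tree
    parent = [-1] * n
    best[0] = 0.0
    weight, edges = 0.0, []
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        weight += best[u]
        if parent[u] >= 0:
            edges.append((parent[u], u))
        for v in range(n):
            d = dist(points[u], points[v])
            if not in_tree[v] and d < best[v]:
                best[v], parent[v] = d, u
    return weight, edges

def doubling_tour(points, edges):
    """Tour from a DFS preorder of the MST; its length is at most 2 * MST."""
    adj = {i: [] for i in range(len(points))}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    order, seen, stack = [], set(), [0]
    while stack:
        u = stack.pop()
        if u in seen:
            continue
        seen.add(u)
        order.append(u)
        stack.extend(adj[u])
    return sum(dist(points[order[i]], points[order[(i + 1) % len(order)]])
               for i in range(len(order)))

def nn_sum(points):
    return sum(min(dist(p, q) for q in points if q is not p) for p in points)

random.seed(1)
P = [(random.random(), random.random()) for _ in range(60)]
w, E = mst(P)
tour = doubling_tour(P, E)
assert 0.5 * nn_sum(P) <= w + 1e-9   # NN/2 <= MST
assert w <= tour + 1e-9              # MST <= any tour, in particular this one
assert tour <= 2 * w + 1e-9          # preorder tour <= 2 MST, hence TSP <= 2 MST
```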
Proof. Let $n$ be the size of $P$. We show $\mathrm{MST}(P) \in O(c_\pi \log n)$. Given a merge sequence $R_1, \dots, R_n$ of $P$, we construct a spanning tree as follows. Replay the merge sequence, and whenever two rectangles $Q_1$ and $Q_2$ are merged, connect an arbitrary point in $Q_1$ with an arbitrary point in $Q_2$. Observe that after step $i$, the points in every rectangle $Q \in R_i$ are connected via a spanning tree. Thus, we obtain a spanning tree $T$ of $P$ at the end.
To control the total length of $T$, we use a distance-balanced $10 c_\pi$-wide merge sequence (Theorem 5.1). The edge $e_i$ added in step $i$ of the construction is contained in the rectangle $S_i$ that is newly created in step $i$. Thus, the length of $e_i$ is bounded by the dimensions of $S_i$. Corollary 5.3 implies that the sum of the dimensions of $S_1, S_2, \dots, S_n$ is $O(c_\pi \log n)$.
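The replay step above can be sketched with a union-find structure, adding one connecting edge per merge. This is a generic sketch of our own; the representation of the merge sequence as pairs of point indices is our assumption for illustration.

```python
class DSU:
    """Union-find tracking, for each current rectangle, one representative point."""
    def __init__(self, n):
        self.parent = list(range(n))
        self.member = list(range(n))   # a representative point of each rectangle

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

def tree_from_merges(n, merges):
    """Replay a merge sequence over n points.

    `merges` lists pairs (i, j) of point indices whose current rectangles
    are merged; each merge contributes one edge between arbitrary points
    (here: the stored representatives) of the two rectangles.
    """
    dsu = DSU(n)
    edges = []
    for i, j in merges:
        a, b = dsu.find(i), dsu.find(j)
        assert a != b, "merged rectangles must be distinct"
        edges.append((dsu.member[a], dsu.member[b]))
        dsu.parent[a] = b              # union; keep b's representative
    return edges

# merging 4 points pairwise, then the two resulting pairs
E = tree_from_merges(4, [(0, 1), (2, 3), (0, 2)])
assert len(E) == 3                     # a spanning tree has n - 1 edges
```

Since every merge joins two previously disjoint groups, after $n - 1$ merges the collected edges form a spanning tree, matching the observation in the proof.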
Proof. Let $k \in \mathbb{N}_+$. We recursively define a point set $P_k$ of size $n = 2^k - 1$ on the integer grid $[n] \times [n]$ and an associated rooted tree $T_k$ with node set $P_k$. We show that $\mathrm{NN}(P_k) \ge k 2^{k-2} > \frac{1}{4} n \log n$, even in the $L_\infty$ metric. This implies the theorem (after scaling).
$P_k$ consists of a single point $p_k = (1, 2^{k-1})$ and two shifted copies of $P_{k-1}$, one directly to the right of and below $p_k$, and the other in the top right corner. More formally, let $P_1 = \{p_1\} = \{(1, 1)\}$, with $T_1$ being the tree on the single node $p_1$. For $k \ge 2$, let $P_k = \{p_k\} \cup A \cup B$, where $A$ and $B$ are the two shifted copies.

(a) Distance between $p$ and its proper descendants.
Figure 15: Sketches for the proof of Theorem 3.16.
Observe that $P_k$ is in general position, avoids 231, and contains precisely $2^k - 1$ points in $[2^k - 1]^2$. Define $T_k$ as the tree with $p_k$ at its root, and the trees associated with $A$ and $B$ as its subtrees.
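Since the precise offsets of the two copies A and B are elided above, the following Python sketch commits to one plausible instantiation (the offsets are our assumption, chosen to match the verbal description: A directly to the right of and below p_k, B in the top right corner) and checks the claimed bound NN(P_k) ≥ k·2^(k−2) in the L∞ metric numerically.

```python
def P(k):
    """One plausible instantiation of the recursive 231-avoiding point set.

    ASSUMPTION (ours): A = P_{k-1} shifted by (1, 0), i.e., right of and
    below p_k; B = P_{k-1} shifted by (2^{k-1}, 2^{k-1}), the top right
    corner of the (2^k - 1) x (2^k - 1) grid.
    """
    if k == 1:
        return [(1, 1)]
    sub = P(k - 1)
    half = 2 ** (k - 1)
    pk = (1, half)
    A = [(x + 1, y) for (x, y) in sub]             # right of and below p_k
    B = [(x + half, y + half) for (x, y) in sub]   # top right corner
    return [pk] + A + B

def nn_linf(points):
    """Sum of L_infinity nearest-neighbor distances."""
    return sum(min(max(abs(px - qx), abs(py - qy))
                   for (qx, qy) in points if (qx, qy) != (px, py))
               for (px, py) in points)

for k in range(2, 9):
    pts = P(k)
    n = 2 ** k - 1
    assert len(pts) == n                            # |P_k| = 2^k - 1
    assert all(1 <= x <= n and 1 <= y <= n for x, y in pts)
    assert nn_linf(pts) >= k * 2 ** (k - 2)         # NN(P_k) >= k * 2^(k-2)
```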
For a point $p \in P_k$, let the level $\ell_p$ of $p$ in $T_k$ be the depth of the subtree of $T_k$ rooted at $p$. Observe that $p$ is the leftmost point in a shifted copy of $P_{\ell_p}$; call that copy $Q_p$.
We now give lower bounds on all distances between points. First, let $p, q \in P_k$ be such that $q$ is a proper descendant of $p$, and let $\ell$ be the level of $p$. Figure 15a shows $Q_p$. Observe that $q \in Q_p$, and further, there is a child $r$ of $p$ such that $q \in Q_r$. Let $A_1, A_2$ be the point sets corresponding to the subtrees of $r$. If $q = r$, then $d(p, q) \ge 2^{\ell-2}$. The same is true if $q \in A_1 \cup A_2$. If $q \in B$, we even have $d(p, q) \ge 2^{\ell-1}$. In any case, $d(p, q) \ge 2^{\ell-2}$. Now suppose $p, q \in P_k$ are unrelated in $T_k$, and let $\ell$ be the level of $p$. Then $q$ is below and to the left of all points in $Q_p$, or $q$ is above and to the right of all points in $Q_p$. Figure 15b shows that in this case $d(p, q) \ge 2^{\ell-1}$.
Our two observations imply that for each point $p$ of level $\ell$, the distance to the nearest neighbor in $P_k$ is at least $2^{\ell-2}$. The total sum of nearest-neighbor distances is therefore at least $\sum_{\ell=1}^{k} 2^{k-\ell} \cdot 2^{\ell-2} = k \cdot 2^{k-2}$.

Theorem 3.17. Let $\pi \in \mathrm{Av}(231)$. Then, for every point set $P \subset [0, 1]^2$ that avoids 231 and $\pi$, we have $\mathrm{TSP}(P) \le 12|\pi|$.
• $B$ avoids $\beta$. Then we connect $(w, 0)$ to $(w_B, 0)$ and $(0, h)$ to $(0, h_A)$. We obtain the desired bound similarly to the previous case.

Twin-width lower bound
Next, we show that there are point sets $P$ for which the constant factor hidden in the $O(\log n)$ bound is at least $\Omega(d / \log d)$, where $d$ is the twin-width of $P$.

Open questions
In this paper we initiated the study of optimization problems with pattern-avoiding input, using three central problems as case studies. Our work raises several open questions and directions; we list those we find most interesting.

BST and sorting
• Our upper bound $O(n c_\pi^2)$ and lower bound $\Omega(n \log c_\pi)$ for OPT hold for every avoided pattern $\pi$. While we know special families of patterns for which the upper bound is not tight [CGK+18], we have no examples where the lower bound is not tight.
• Our upper bound holds for the offline optimum of serving BST access sequences. Can similar bounds be shown for online algorithms, such as splay trees or greedy trees?
• Such results would imply an $O_\pi(n)$-time algorithm for sorting permutations that avoid $\pi$ (by insertion sort). Finding such a sorting algorithm (perhaps by methods other than BSTs) remains open, even when the avoided pattern $\pi$ is known a priori, apart from very special cases [Art07, CGK+15b, CPY23].
• Sorting $\pi$-avoiding inputs in $O_\pi(n)$ time would also be implied by our BST result, if one could construct the twin-width decomposition in linear time. Whether this is possible is an interesting question in itself.

k-server
• We give tight bounds on the $k$-server cost of pattern-avoiding access sequences. Yet, a full characterization in terms of the concrete avoided pattern $\pi$ is still missing (there are gaps between $n^{\Omega(1/k)}$ and $n^{O(1/\log k)}$). We do establish that the cost can have at least three different growth rates depending on $\pi$: $O(1)$ if $\pi$ is monotone; $n^{\Theta(1/k)}$ if $\pi$ is non-monotone and has length three; and $n^{\Theta(1/\log k)}$ if $\pi$ is non-separable.
• Our results concern the offline cost. Does the online competitive ratio of $k$ improve under pattern-avoidance? Note that a lower bound of $k$ holds even for the simplest $k$-point metrics [MMS90]. Yet, when viewed on the line, the lower bound constructions are emphatically not pattern-avoiding, suggesting a possible improvement.
• Does the effect of pattern-avoidance extend to more general metric spaces, e.g., trees (the extension of double-coverage (DC) is $k$-competitive on trees)? This would require defining more general pattern concepts, which may be interesting in itself.
• By the known competitiveness results, our bounds immediately transfer to the DC algorithm (with a factor-$k$ increase in cost). DC is simple and has an intuitive potential-based analysis [CL91, BE98]. This analysis does not hint at why DC is adaptive to avoided patterns. Can this adaptivity be seen directly through the analysis of DC or of other competitive $k$-server algorithms?

TSP

• Our $O(\log n)$ bound for TSP raises the question for which special cases this cost may be $O(1)$. We have a complete characterization for principal classes (families defined by the avoidance of a single pattern). A broader characterization, e.g., in terms of path-width, merging [JOV18], and other parameters, may be possible.

Further questions
• Our result for BSTs holds for bounded twin-width, regardless of avoided patterns. For $k$-server and TSP, our main bounds depend on $c_\pi$, for a pattern $\pi$ avoided by the input. Recall that distance-balanced merge sequences exist for individual permutations of bounded twin-width, but with an exponential blowup in width. We leave open whether this dependence can be improved.
• Which other online or geometric problems benefit from pattern-avoidance in the input? Online problems with sequential structure that may be good candidates include list labeling, scheduling, online matching, and interval coloring [BE98].
• Pattern-avoidance is sensitive to perturbation, whereas the optimal cost typically changes smoothly. (Note, however, that twin-width affords some robustness; e.g., a constant number of pattern occurrences can increase the twin-width only by a constant.) Do other, more robust concepts of patterns (e.g., [NRRS19]) relate to complexity?
• The relation between $\mathrm{tww}(\mathrm{Av}(\pi))$ and $c_\pi$, i.e., between the largest twin-width attainable by a $\pi$-avoiding permutation and the Füredi–Hajnal limit of $\pi$, is not yet understood. For instance, it is possible that they are polynomially related (which would imply that $\mathrm{tww}(\mathrm{Av}(\pi))$ is exponential in $|\pi|$ for almost all $\pi$), but it is also possible that $\mathrm{tww}(\mathrm{Av}(\pi))$ is polynomially bounded in $|\pi|$.

Figure 1: The permutation 31254 as a matrix (a) and as a point set (b); the highlighted points/ones correspond to an occurrence of the pattern 213 (c). Pattern 321 (d) is avoided by 31254.

Figure 3: A 3-wide merge sequence of 23514, along with its red graphs. Black points are also degenerate rectangles.

Figure 4: A hierarchy of important permutation classes.
Figure 6: Manhattan networks on input X (black dots) and newly added points (grey).
Figure 7: Two point sets with partial, but already long, Manhattan networks.

Figure 8: Sketch of points added in a single step in Theorem 3.2. Grey lines are the grid induced by rectangle sides, grey points are known to be present already, and blue points are new.

Figure 9: Illustrations for the proof of Lemma 4.1. The red lines indicate connections between x and y. Note that in (b), Q may be much larger and overlap or even contain Q′.
and $x$ is homogeneous with $Q'$. W.l.o.g., $Q'$ is below and to the right of $x$ (see Fig. 9b). If $Q'$ and $Q$ are also homogeneous, there is nothing to do. Otherwise, let $c$ be the top left corner of $Q'$. Since $c \sim_{A_i} y$ by invariant (ii), it suffices to show that $x \sim_{A_{i+1}} c$. $x \sim y$ by induction. If $Q'$ and $Q$ are homogeneous, then there is nothing to prove. Now suppose $Q'$ and $Q_1$ are homogeneous, but $Q'$ and $Q$ are not. W.l.o.g., $Q'$ is below and to the right of $Q_1$ (see Fig. 9c). Let $c$ be the bottom right corner of $Q_1$. We have $x \sim_{A_i} c$ by invariant (ii). Since $c \in B$, we have $c \sim_{A_{i+1}}$

Figure 10: Structure of a 231-avoiding sequence.

procedure Serve(k, X, p, a, c)
    ▷ Serve X with k servers. Assume that all requests are in [a, c]. One server starts at c, the others at a. All servers end at c. ◁
    if |X| = 0 then
        Move k − 1 servers from a to c
    else if k = 1 then
        Serve X, then move the single server to c
    else
        b ← first request in X
        Decompose X into (b) · X_1 · X_2
        if |X_1| ≤ p then
            Move server from a to b
            Serve(k − 1, X_1, ⌊|X_1|^{(k−2)/(k−1)}⌋, a, b)
        else
            Move server from c to b
            Serve(k, X_1, p, a, b)
            Move server from b to c
        ▷ Now k − 1 servers are stationed at b, and one server at c. ◁
        Serve(k, X_2, p, b, c)

Figure 11: Algorithm for serving a 231-avoiding sequence with k servers.

procedure Serve(ℓ, X, p, a, b)
    ▷ Serve a t-separable request sequence X ∈ [a, b]^n with 2^ℓ ≥ 1 servers. Half of the servers start at a and half at b; half of the servers end at a and half at b. ◁

Figure 13: An algorithm for serving a t-separable request sequence.
A 231-avoiding point set.
Recursively constructing a spanning tree if A avoids α. Newly added edges and points are colored red.

Theorem 3.18. For each $n$ and $d \ge 2$, there is a set $P \subset [0, 1]^2$ of $n$ points with twin-width $d$ such that $\mathrm{TSP}(P) \in \Omega(\frac{d}{\log d} \log n)$.

For $d \ge 2$, let $G_d$ be the point set with coordinates $\bigl((i-1)\frac{1}{d} + (d-j+1)\frac{1}{d^2},\; (j-1)\frac{1}{d} + i\frac{1}{d^2}\bigr)$ for $i, j \in [d]$, i.e., the $d \times d$ canonical grid scaled to fit $[0, 1]^2$ (see Fig. 2e). By Lemma 2.3 we have $\mathrm{tww}(G_d) = d$. First, we show that any Steiner tree of $G_d$ must have weight at least linear in $d$. Indeed, since every pair $p_1, p_2 \in G_d$ satisfies $d(p_1, p_2) \ge \frac{1}{d}$, it follows that $\mathrm{NN}(G_d) \ge d^2 \cdot \frac{1}{d} = d$. Thus, $\mathrm{MStT}(G_d) \ge \frac{0.5}{1.22} \cdot \mathrm{NN}(G_d) \ge 0.4d$.

Now we inductively define, for each $d \ge 2$ and $t \ge 1$, a point set $P^d_t$. Let $P^d_1 = G_d$ and, for $t \ge 2$, let $P^d_t$ be the point set obtained by inflating each point of $G_d$ with a properly scaled-down copy of $P^d_{t-1}$. Formally, we set $P^d_t = \bigcup_{p \in G_d} \bigl(p + \frac{1}{d^2} P^d_{t-1}\bigr)$.

Lemma 7.1. For every $d \ge 40$ and $t \ge 1$, we have $\mathrm{MStT}(P^d_t) \ge \frac{d}{5} t$.

Proof. We proceed by induction on $t$. When $t = 1$, we have $\mathrm{MStT}(P^d_1) \ge 0.4d > \frac{d}{5}$. Now let $t > 1$ and let $T$ be a minimum Steiner tree of $P^d_t$. For every $p \in G_d$, let $A_p$ be the box $[0, \frac{1}{d^2}]^2$ translated such that its bottom left corner coincides with $p$, i.e., $A_p = p + [0, \frac{1}{d^2}]^2$. Observe that each $A_p$ contains precisely one small copy of $P^d_{t-1}$. We split all the edges of $T$ into parts that lie inside and outside of $\bigcup_{p \in G_d} A_p$. Let $T_I$ be the part inside $\bigcup_{p \in G_d} A_p$ and let $W_I$ be its total weight. Similarly, let $T_O$ be the part outside $\bigcup_{p \in G_d} A_p$ and let $W_O$ be its total weight. First, let us bound $W_O$. If we add to $T_O$ the boundary of each $A_p$ for $p \in G_d$, we obtain a Steiner tree of $G_d$.
The weight of the added edges is exactly $d^2 \cdot \frac{4}{d^2} = 4$, so by the above observation we get $W_O \ge 0.4d - 4$. We proceed to bound $W_I$. Inside each $A_p$, we get a Steiner tree $T_p$ of the point set $\frac{1}{d^2} P^d_{t-1}$ by adding the boundary of $A_p$ to the part of $T_I$ that lies inside $A_p$. By induction, the weight of $T_p$ must be at least $\frac{1}{d^2} \cdot \frac{d}{5}(t-1) = \frac{t-1}{5d}$. Summing over $A_p$ for every $p \in G_d$ and subtracting the total length of their boundaries, we get $W_I \ge d^2 \cdot \frac{t-1}{5d} - 4 = \frac{d(t-1)}{5} - 4$. Together, $\mathrm{MStT}(P^d_t) = W_I + W_O \ge \frac{d(t-1)}{5} + 0.4d - 8 \ge \frac{d}{5} t$, where the second inequality holds since $d \ge 40$. To prove Theorem 3.18, consider two separate cases. If $d < 40$, apply Theorem 3.16 to obtain a separable point set $P$ such that $\mathrm{TSP}(P) \in \Omega(\log n)$. Otherwise $d \ge 40$, and we apply Lemma 7.1 with $t = \lfloor \frac{1}{2} \log_d n \rfloor = \frac{\log n}{2 \log d}$. Then $P^d_t$ is a point set of size $d^{2t} \le n$ and twin-width $d$ such that $\mathrm{TSP}(P^d_t) \in \Omega(\frac{d}{\log d} \log n)$.
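As a numerical sanity check (our own sketch, not from the paper), the coordinate formula for $G_d$ can be implemented directly; the assertions verify the pairwise-distance bound $\frac{1}{d}$ and the resulting bound $\mathrm{NN}(G_d) \ge d$ used above.

```python
import math

def grid(d):
    """The d x d canonical grid G_d scaled to fit [0,1]^2, per the formula above."""
    return [((i - 1) / d + (d - j + 1) / d**2,
             (j - 1) / d + i / d**2)
            for i in range(1, d + 1) for j in range(1, d + 1)]

def dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

for d in range(2, 8):
    G = grid(d)
    assert len(G) == d * d
    assert all(0.0 <= x <= 1.0 and 0.0 <= y <= 1.0 for x, y in G)
    # every pair is at distance >= 1/d, hence NN(G_d) >= d^2 * (1/d) = d
    md = min(dist(G[a], G[b]) for a in range(len(G)) for b in range(a + 1, len(G)))
    assert md >= 1 / d - 1e-12
    nn = sum(min(dist(p, q) for q in G if q != p) for p in G)
    assert nn >= d - 1e-9
```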