Irredundant intervals

This expository note presents simplifications of a theorem due to Győri and an algorithm due to Franzblau and Kleitman: Given a family F of m intervals on a linearly ordered set of n elements, we can construct in O(m+n)^2 steps an irredundant subfamily having maximum cardinality, as well as a generating family having minimum cardinality. The algorithm is of special interest because it solves a problem analogous to finding a maximum independent set, but on a class of objects that is more general than a matroid. This note is also a complete, runnable computer program, which can be used for experiments in conjunction with the public-domain software of The Stanford GraphBase.


1. Introduction.
Let's say that a family of sets is irredundant if its members can be arranged in a sequence with the following property: Each set contains a point that isn't in any of the preceding sets.
If F is a family of sets, we write F^∪ for the family of all nonempty unions of elements of F. When F and G are families with F ⊆ G^∪, we say that G generates F. If F is irredundant and G generates F, we obviously have |F| ≤ |G|, because each set in the sequence requires a new generator.
In the special case that the members of F are intervals of the real line, András Frank conjectured that the largest irredundant subfamily of F has the same cardinality as F 's smallest generating family. This conjecture was proved by Ervin Győri [4], who noted that such a result was a minimax theorem of a new type, apparently unrelated to any of the other well-known minimax theorems of graph theory and combinatorics. A constructive proof was found shortly afterwards by Franzblau and Kleitman [3], who sketched an algorithm to find a generating family and irredundant subfamily of equal cardinality. (Győri, Franzblau, and Kleitman were led to these results while studying the more general problem of finding a minimum number of subrectangles that cover a given polygon. Further information about polygon covers appears in [3] and [1].) The purpose of this note is to describe the beautiful algorithm of Franzblau and Kleitman in full detail. Indeed, the CWEB source file that generated this document is a computer program that can be used in connection with the Stanford GraphBase [8] to find maximum irredundant subfamilies and minimum generating families of any given collection of intervals. Perhaps this new exposition will shed new light on the class of optimization problems for which an efficient algorithm exists.
According to the conventions of CWEB [9], the sections of this document are sequentially numbered 1, 2, 3, etc. In this respect we are returning to a style of exposition used by Euler and Gauss and their contemporaries. A CWEB program is also essentially a hypertext; therefore this document may also be regarded as experimental in another sense, as an attempt to find new forms of exposition appropriate to modern technology.
Note: Győri used the term "U-increasing" for an irredundant family; Franzblau and Kleitman called such intervals "independent." Since a family of sets is a hypergraph, it seems unwise to deviate from the standard meaning of independent edges, yet "U-increasing" is not an especially appealing alternative. We will see momentarily that the term "irredundant" is quite natural in theory and practice.

2.
A far-reaching generalization of Győri's theorem was proved recently by Frank and Jordán [2], who introduced a large new family of minimax theorems related to linking systems. In particular, Frank and Jordán extended Győri's results to intervals on a circle instead of a line. But no combinatorial algorithm is known as yet for the circular case.
Can any or all of the Franzblau/Kleitman methods be "lifted" to such more general problems? We will return to this tantalizing question after becoming familiar with Franzblau and Kleitman's remarkable algorithm.

3. Theory.
It is wise to study the theory underlying the Franzblau/Kleitman algorithm before getting into the program itself.

4.
A family of sets is called redundant if it is not irredundant. Any family that contains a redundant subfamily is redundant, since any family contained in an irredundant family is irredundant.

5.
If F is a family of sets and s is an arbitrary set, let F |s denote the sets of F that are contained in s. This operation is left-associative by convention: F |s|t = (F |s)|t = F |(s ∩ t).
We also write ⋃F for ⋃{f | f ∈ F}; thus F|⋃F = F. (An index to all the main notations and definitions that we will use appears at the end of this note.)

It follows that every point of s = ⋃F 0 is contained in at least two members of F 0, hence in at least two members of F|s (since F 0 ⊆ F|s).

Corollary.
A finite family F of intervals on a line is redundant if and only if there is an interval s such that every point of s belongs to at least two intervals of F|s. (The set s need not belong to F.) Proof: Intervals are nonempty. By the proof of the preceding lemma, it suffices to consider sets s that can be written ⋃F 0 for some minimal redundant subfamily F 0. In the special case of intervals, ⋃F 0 must be a single interval; otherwise F 0 would not be minimal.

8.
Henceforth we will restrict consideration to finite families F of intervals on a linearly ordered set. It suffices, in fact, to deal with integer elements; we will consider subintervals of the n-element set [0 . . n). (The notation [a . . b) stands here for the set of all integers x such that a ≤ x < b.) If F is a family of sets and x is a point, we will write N_x F for the number of sets of F that contain x. The corollary just proved can therefore be stated as follows: "F is irredundant if and only if every interval s ⊆ ⋃F contains a point x with N_x F|s ≤ 1." This characterization provides a polynomial-time test for irredundancy.
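The characterization can be turned directly into a brute-force test. The following C sketch is only an illustration and is not part of the program presented later in this note; the type interval, the bound MAX_N, and both function names are assumptions made for the example. It tries every interval s = [a . . b) inside [0 . . n) and looks for a point of s that lies in at most one member of F|s. Intervals s that are not contained in ⋃F pass automatically, because they contain a point covered zero times.

```c
#include <assert.h>
#include <string.h>

#define MAX_N 64   /* hypothetical upper bound on n, chosen for this sketch */

typedef struct { int lo, hi; } interval;   /* the half-open interval [lo..hi) */

/* Return 1 if every point of s = [a..b) lies in at least two members of F|s. */
static int every_point_covered_twice(const interval *F, int m, int a, int b)
{
    int count[MAX_N];
    memset(count, 0, sizeof count);
    for (int i = 0; i < m; i++)
        if (a <= F[i].lo && F[i].hi <= b)        /* F[i] belongs to F|s */
            for (int x = F[i].lo; x < F[i].hi; x++)
                count[x]++;
    for (int x = a; x < b; x++)
        if (count[x] <= 1) return 0;             /* found x with N_x F|s <= 1 */
    return 1;
}

/* Return 1 if the m intervals of F, all inside [0..n), are irredundant. */
int is_irredundant(const interval *F, int m, int n)
{
    assert(0 <= n && n <= MAX_N);
    for (int a = 0; a < n; a++)
        for (int b = a + 1; b <= n; b++)
            if (every_point_covered_twice(F, m, a, b)) return 0;
    return 1;
}
```

For instance, the family {[0 . . 2), [1 . . 3)} passes the test, while adding [0 . . 3) makes it fail, because every point of [0 . . 3) is then covered at least twice. The sketch is polynomial but deliberately naive; it makes no attempt to match the running time claimed for the full algorithm.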

9.
Irredundant intervals have an interesting connection to the familiar computer-science concept of binary search trees (see, for example, [7, §6.2.2]): A family of intervals is irredundant if and only if we can associate its intervals with a binary tree whose nodes are each labeled with an integer x and an interval containing x. All nodes in the left subtree of such a node correspond to intervals that are strictly less than x, in the sense that all elements of those intervals are < x; all nodes in the right subtree correspond to intervals that are strictly greater than x. The root of the binary tree corresponds to the interval that is last in the assumed irredundant ordering. Its distinguished integer x is an element that appears in no other interval.
Given such a tree, we obtain a suitable irredundant ordering by traversing it recursively from the leaves to the root, in postorder [6, §2.3.1]. Conversely, given an irredundant ordering, we can construct a binary tree recursively, proceeding from the root to the leaves.
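The first direction can be sketched in C as follows. This is a minimal illustration under assumed names (the struct node and the functions postorder and ordering_is_irredundant are hypothetical and are not taken from the program later in this note). The traversal lists the labeled intervals in postorder, and the checker confirms that the distinguished point x of each interval lies in none of the intervals listed before it.

```c
#include <assert.h>
#include <stddef.h>

typedef struct node {
    int x;                        /* distinguished point of this node */
    int lo, hi;                   /* interval [lo..hi), with lo <= x < hi */
    struct node *llink, *rlink;   /* subtrees: intervals entirely < x and entirely > x */
} node;

/* Append the nodes of the tree rooted at p to out[] in postorder,
   starting at position k; return the new number of entries. */
int postorder(const node *p, const node **out, int k)
{
    if (p == NULL) return k;
    k = postorder(p->llink, out, k);
    k = postorder(p->rlink, out, k);
    out[k] = p;
    return k + 1;
}

/* Return 1 if, for every i, the point out[i]->x belongs to out[i]
   but to none of the earlier intervals out[0..i-1]. */
int ordering_is_irredundant(const node **out, int k)
{
    for (int i = 0; i < k; i++) {
        assert(out[i]->lo <= out[i]->x && out[i]->x < out[i]->hi);
        for (int j = 0; j < i; j++)
            if (out[j]->lo <= out[i]->x && out[i]->x < out[j]->hi)
                return 0;
    }
    return 1;
}
```

A node's point x is new because the intervals in its own subtrees lie strictly to one side of x, and any interval visited earlier outside its subtrees is separated from it by the label of a common ancestor.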
10. An example might be helpful at this point. Suppose n = 9 and Then {f 1 , f 2 , f 3 , f 4 , f 5 } and {f 1 , f 3 , f 5 , f 6 } are irredundant. (Indeed, a family of intervals is irredundant whenever its members have no repeated left endpoints or no repeated right endpoints.) These subfamilies are in fact maximally irredundant; they become redundant when any other interval of the family is added. Therefore maximal irredundant subfamilies need not have the same cardinality; irredundant subfamilies do not form the independent sets of a matroid.
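The parenthetical remark about distinct left endpoints has a one-line justification: arrange the intervals by decreasing left endpoint, and each interval [a . . b) then contains the point a, which lies to the left of every interval that precedes it. The C sketch below illustrates this ordering; the type interval and the function names are assumptions made for the example, not part of the program in this note.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct { int lo, hi; } interval;   /* the half-open interval [lo..hi) */

/* qsort comparator: larger left endpoints come first. */
static int by_decreasing_lo(const void *p, const void *q)
{
    const interval *f = p, *g = q;
    return (g->lo > f->lo) - (g->lo < f->lo);
}

/* Sort F by decreasing left endpoint.  Return 1 if all left endpoints
   are distinct; in that case the sorted order is an irredundant ordering,
   since F[i].lo belongs to F[i] but is smaller than the left endpoint of
   every earlier interval.  Return 0 if some left endpoint is repeated. */
int order_by_left_endpoint(interval *F, int m)
{
    qsort(F, (size_t)m, sizeof *F, by_decreasing_lo);
    for (int i = 1; i < m; i++)
        if (F[i].lo == F[i - 1].lo) return 0;
    return 1;
}
```

The case of distinct right endpoints is symmetric: sort by increasing right endpoint and use the point b − 1 of each interval [a . . b).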
On the other hand, irredundant sets of intervals do have matroid-like properties. For example, if F is where F l and F r correspond to the left and right subtrees of the root in the binary tree representation. If x ∈ g, the family ) Such near-matroid behavior makes families of intervals especially instructive.
11. Let's say that an interval s is good for F if N x F |s ≤ 1 for some x ∈ s; otherwise s is bad. Franzblau and Kleitman introduced a basic reduction procedure for any family F of intervals that possesses a bad interval s. Their procedure is analogous to modification along an augmenting path in other combinatorial algorithms. Let .) For example, we might have the following picture: In the simplest case we have k = 1 and the reduced family is simply , since s is bad. We can assume that c < a j+1 ; otherwise all intervals of F |s would be contained in [a 1 . . a j+1 ) or [a j+1 . . a k ), and both of these subintervals would be bad, contradicting the minimality of s.
The notation F ↓s is defined to be left-associative, like F |s; that is, F ↓s↓t = (F ↓s)↓t and F ↓s |t = (F ↓s)|t.

Lemma. If s is a minimal bad interval for
13. Lemma. Suppose s is a minimal bad interval for F , while t is a good interval. Then t is good also for F ↓s.
; it follows that j is uniquely determined, and the only interval containing

The proof of the preceding lemma shows in particular that none of the intervals [a_{j+1} . . b_j) are already present in F before the reduction. And if

15. The Franzblau/Kleitman algorithm has a very simple outline: We let G_0 = F and repeatedly set G_{k+1} = G_k↓s_k, where s_k is the leftmost minimal bad interval for G_k, until we finally reach a family G_r in which no bad intervals remain. This must happen sooner or later, because k for all k by the lemma of §12. Franzblau and Kleitman proved the nontrivial fact that |G| is the size of the maximum irredundant subfamily of F; hence G is a minimum generating family.
16. It is tempting to try to prove the optimality of G by a simpler, inductive approach in which we "grow" F one interval at a time, updating its maximum irredundant set and minimum generating set appropriately. But experiments show that the maximum irredundant set can change drastically when F receives a single new interval, so this direct primal-dual approach seems doomed to failure. The indirect approach is more difficult to prove, but no more difficult to program. So we will proceed to develop further properties of Franzblau and Kleitman's reduction procedure [3]. The key fact is a remarkable theorem that we turn to next.
17. Theorem. The same final family G = G r is obtained when s k is chosen to be an arbitrary (not necessarily leftmost) minimal bad interval of G k in the reduction algorithm. Moreover, the same multiset {s 0 , . . . , s r−1 } of minimal bad intervals arises, in some order, regardless of the choices made at each step. Proof: We use induction on r, the maximum number of steps to convergence among all reduction procedures that begin with a family F . If r = 0, the result is trivial, and if F has only one minimal bad interval the result is immediate by induction. Suppose therefore that s and t are distinct minimal bad intervals of F . We will prove later that t is a minimal bad interval for F ↓s, and that F ↓s↓t = F ↓t↓s. Let r ′ be the maximum distance to convergence from F ↓s, and r ′′ the maximum from F ↓t; then r ′ and r ′′ are less than r, and induction proves that the final result from F ↓s is the final result from F ↓s↓t = F ↓t↓s, which is the final result from F ↓t. (Readers familiar with other reduction algorithms, like that of [5], will recognize this as a familiar "diamond lemma" argument. We construct a diamond-shaped diagram with four vertices: F , F ↓s, F ↓t, and a common outcome of F ↓s and F ↓t.) This completes the proof, except for two lemmas that will be demonstrated below; their proofs have been deferred so that we could motivate them first.
18. This theorem and the lemma of §13 have an important corollary: Let S = {s 0 , . . . , s r−1 } be the multiset of minimal bad intervals determined by the algorithm from F , and let t be any interval. Then S|t is the multiset of minimal bad intervals determined by the algorithm from F |t. This holds because an interval s ⊆ t is bad for F if and only if it is bad for F |t. Minimal bad intervals within t never appear again once they are removed, and we can remove them first.
Reducing a minimal bad interval s when s is contained in a bad interval t may make t good, or leave it bad, or make it minimally bad. If s is minimally bad for F , it might also be minimally bad for F ↓s.
19. Now we are ready for the coup de grâce and the pièce de résistance: After the reduction algorithm has computed the irredundant generating family G = G_r and the multiset S of minimal bad intervals, we can construct an irredundant subfamily F′ of F with |F′| = |G| by constructing a binary search tree as described in §9. The procedure is recursive, starting with an initial interval t = [0 . . n) that contains ⋃F: The tree defined for F|t is empty if F|t is empty. Otherwise it has a root node labeled with x and with any interval of F|t containing x, where x is an integer such that N_x G^(t) = 1; here G^(t) is the final generating set that is obtained when the reduction procedure is applied to F|t. A suitable interval containing x exists, because every element of G^(t) is an intersection of intervals in F|t. The left subtree of the root node is the binary search tree for F|(t ∩ [0 . . x)); the right subtree is the binary search tree for F|(t ∩ [x + 1 . . n)).
The number of nodes in this tree is |G|. For if x is the integer in the label of the root, G has one interval containing x, and its other intervals are G|[0 . . x) and G|[x + 1 . . n). The family G^(t) is not the same as G|t; but we do have |G^(t)| = |G|t| when t has the special form [0 . . x) or [x + 1 . . n), because in such cases F↓s|t has the same cardinality as F|t when s is a minimal bad interval and

20. It is not necessary to compute each G^(t) from scratch by starting with F|t and applying the reduction algorithm until it converges, because the binary tree construction algorithm requires only a knowledge of the incidence function N_x G^(t). This function is easy to compute, because N

21.
All the basic ideas of Franzblau and Kleitman's algorithm have now been explained. But we must still carry out a careful analysis of some fine points of reduction that were claimed in the proof of the main theorem. If s and t are distinct minimal bad intervals, the lemma of §13 implies that no bad subintervals of t appear in F ↓s; we also need to verify that t itself remains bad.
Lemma. If s is a minimal bad interval for F and t is a bad interval such that s ⊈ t, then t is bad for F↓s. Proof: Let s = [a . . b) and t = [c . . d). We can assume by left-right symmetry that a < c. Then b < d, by minimality of s. Assume that t isn't bad for F↓s. The subfamily F|t must contain at least one of the maximal intervals [a_j . . b_j) of F|s that are deleted during the reduction; hence c ≤ a_j < b_{j−1} ≤ b. Let j be minimal with a_j ≥ c. Then so the elements of t that are covered once less often are the elements of d l ), respectively, where a_1 < c_1. The lemma is obvious unless F|(s ∩ t) is nonempty, so we assume that , and let f be the interval of F|(s ∩ t) that contains x. Let p be maximal with a_p < c_1, and let q be minimal with , any newly added intervals [a_{j+1} . . b_j) for p ≤ j < k in F↓s are properly contained in [c_q . . d_q), so they remain in F↓s↓t. Thus we can easily describe the compound operation F↓s↓t in detail: No two of these intervals are identical, so F↓t↓s gives the same result.
The remaining case f = [a p+1 . . b p+1 ) = [c q−1 . . d q−1 ) needs to be considered specially, since we can't delete this interval twice. The following picture might help clarify the situation: These intervals are replaced in F ↓s↓t by Thus F ↓s↓t is formed almost as in the previous case, but with [a p+1 . . b p ) and [c q . . d q−1 ) replaced by [c q . . b p ). And we get precisely the same intervals in F ↓t↓s.
(Is there a simpler proof?)

23. Practice.
The computer program in the remainder of this note operates on a family of intervals defined by a graph on n + 1 vertices {0, 1, . . . , n}. We regard an edge between u and v as the half-open interval [u . . v), when u < v.
Graphs are represented as in the algorithms of the Stanford GraphBase [8], and the reader of this program is supposed to be familiar with the elementary conventions of that system.
The program reads two command-line parameters, m and n, and an optional third parameter representing a random-number seed. (The seed value is zero by default.) The Franzblau/Kleitman algorithm is then applied to the graph random_graph(n + 1, m, 0, 0, 0, 0, 0, 0, 0, seed), a random graph with vertices {0, 1, . . . , n} and m edges. Alternatively, the user can specify an arbitrary graph as input by typing the single command-line parameter -g filename; in this case the named file should describe the graph in save_graph format (as in the MILES_SPAN program of [8]).
When the computation is finished, a minimum generating family and a maximum irredundant subfamily will be printed on the standard output file.
If a negative value is given for n, the random graph is reversed from left to right; each interval [a . . b) is essentially replaced by [−b . . −a) (but minus signs are suppressed in the output). This feature lends credibility to the correctness of our highly asymmetric algorithm and program, because we can verify the fact that the minimum generating family of the mirror image of F is indeed the mirror image of F 's minimum generating family.
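The mirror-image operation itself is simple to state in code. The following C sketch is an illustration only, and it differs from the description above in one labeled respect: instead of using negated endpoints, it shifts the reflected interval by n, so that [a . . b) becomes [n − b . . n − a) and stays inside [0 . . n). The type interval and the function name mirror_family are assumptions made for the example.

```c
#include <assert.h>

typedef struct { int lo, hi; } interval;   /* the half-open interval [lo..hi) */

/* Replace each [a..b) of F by its mirror image [n-b..n-a) within [0..n).
   (Assumption of this sketch: reflection is shifted by n, so that no
   negative endpoints arise.) */
void mirror_family(interval *F, int m, int n)
{
    for (int i = 0; i < m; i++) {
        int a = F[i].lo, b = F[i].hi;
        assert(0 <= a && a < b && b <= n);   /* nonempty subinterval of [0..n) */
        F[i].lo = n - b;
        F[i].hi = n - a;
    }
}
```

Applying the function twice restores the original family, which is the property that makes the mirror-image check a useful sanity test.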
In practice, the algorithm tends to be interesting only when m and n are roughly equal. If n is large compared to m, we can remove any vertices of degree zero; such vertices aren't the endpoint of any interval. If m is large compared to n, we can almost always find n irredundant intervals by inspection. The running time in general is readily seen to be O(mn + n^2).

29.
⟨Update the counts for all intervals ending at u 29⟩ ≡
  for (a = u->arcs; a; a = a->next)
This code is used in section 28.

30.
⟨Clean up all count fields 30⟩ ≡
  for (w = cleanup; w; w = w->link) w->count = 0;
This code is used in section 28.

31.
The reduction process is kind of cute too. Replace /* the remaining job is to shorten the other maximal arcs in [u . .
is the longest interval from t */ }

32. The dénouement.
Now we build a binary tree in the original graph F, by filling in some of the utility fields of F's vertices. If a node in the tree is labeled with x and with the interval [u, v), we represent it by x->left = u and x->right = v; the subtrees of this node are x->llink and x->rlink. The root of the whole tree is F->root.
The rlink field happens to be the same as the count field, but this is no problem because the rlink is never changed or examined until after the count has been reset to zero for the last time.
#define left x.V /* left endpoint of interval labeling this node */
#define right w.V /* right endpoint of interval labeling this node */
#define llink v.V /* left subtree of this node */
#define rlink z.V /* right subtree of this node */
#define root uu.V /* root node of the binary tree for this graph */

⟨Construct an irredundant subfamily of F with the cardinality of G 32⟩ ≡
  F->root = make_tree(F->vertices, F->vertices + F->n - 1);
This code is used in section 23.

33.
With a little care we could maintain a stack within F itself, but it's easier to use recursion in C. Let's just hope the system programmers have given us a large enough runtime stack to work with. This subroutine is based on the trick explained in §20.
This code is used in section 23.

34.
A subtle bug is avoided here when we realize that a vertex might already be in the cleanup list when its count is zero.
This code is used in section 33.
This code is used in section 33.