Genome Assembly, from Practice to Theory: Safe, Complete and Linear-Time

Genome assembly asks to reconstruct an unknown string from many shorter substrings of it. Even though it is one of the key problems in Bioinformatics, it is generally lacking major theoretical advances. Its hardness stems both from practical issues (size and errors of real data), and from the fact that problem formulations inherently admit multiple solutions. Given these, at their core, most state-of-the-art assemblers are based on finding non-branching paths (unitigs) in an assembly graph. While such paths constitute only partial assemblies, they are likely to be correct. More precisely, if one defines a genome assembly solution as a closed arc-covering walk of the graph, then unitigs appear in all solutions, being thus safe partial solutions. Until recently, it was open what are all the safe walks of an assembly graph. Tomescu and Medvedev (RECOMB 2016) characterized all such safe walks (omnitigs), thus giving the first safe and complete genome assembly algorithm. Even though maximal omnitig finding was later improved to quadratic time by Cairo et al. (ACM Trans. Algorithms 2019), it remained open whether the crucial linear-time feature of finding unitigs can be attained with omnitigs. We answer this question affirmatively, by describing a surprising O(m)-time algorithm to identify all maximal omnitigs of a graph with n nodes and m arcs, notwithstanding the existence of families of graphs with Θ (mn) total maximal omnitig size. This is based on the discovery of a family of walks (macrotigs) with the property that all the non-trivial omnitigs are univocal extensions of subwalks of a macrotig. This has two consequences: (1) A linear-time output-sensitive algorithm enumerating all maximal omnitigs. (2) A compact O(m) representation of all maximal omnitigs, which allows, e.g., for O(m)-time computation of various statistics on them. 
Our results close a long-standing theoretical question inspired by practical genome assemblers, originating with the use of unitigs in 1995. We envision our results to be at the core of a reverse transfer from theory to practical and complete genome assembly programs, as has been the case for other key Bioinformatics problems.


INTRODUCTION

Theoretical and Practical Background of Genome Assembly
Genome assembly is one of the flagship problems in Bioinformatics, along with other problems originating in (or highly motivated by) this field, such as edit distance computation, reconstructing and comparing phylogenetic trees, and text indexing and compression. In genome assembly, we are given a collection of strings (or reads) and we need to reconstruct the unknown string (the genome) from which they originate. This is motivated by sequencing technologies that are able to read either "short" strings (length 100-250, Illumina technology) or "long" strings (length 10,000-50,000, Pacific Biosciences or Oxford Nanopore technologies) in huge amounts from the genomic sequence(s) in a sample. For example, the SARS-CoV-2 genome was obtained in [60] from short reads using the MEGAHIT assembler [40].
Despite its importance, genome assembly generally lacks major theoretical advances. One reason for this stems from practice: the huge amount of data (e.g., the 3.1 billion characters long human genome is read 50 times over), which impedes slower than linear-time algorithms, errors of the sequencing technologies (up to 15% for long reads), and various biases when reading certain genomic regions [48]. Another reason stems from theory: historically, finding an optimal genome assembly solution is considered NP-hard under several formulations [29,32,33,44,47,49,50], but, more fundamentally, even if one outputs a 3.1 billion characters long string, this is likely incorrect, since problem formulations inherently admit a large number of solutions of such length [36].
Given all these setbacks, most state-of-the-art assemblers, including, e.g., MEGAHIT [40] (for short reads) or wtdbg2 [54] (for long reads), generally employ a very simple and linear-time strategy, dating back to 1995 [33]. They start by building an assembly graph encoding the overlaps of the reads, such as a de Bruijn graph [51] or an overlap graph [46] (graphs are directed in this paper). After some simplifications to this graph to remove practical artifacts such as errors, at their core they find strings labeling paths whose internal nodes have in-degree and out-degree equal to 1 (called unitigs). That is, they do not output entire genome assemblies, but only shorter strings that are likely to be present in the sequenced genome, since unitigs do not branch at internal nodes.
The issue of multiple solutions to a problem has deep roots in Bioinformatics, but is in fact common to many real-world problems from other fields. In such problems, we seek ultimate knowledge of an unknown object (e.g., a genome) but have access only to partial observations from it (e.g., reads). A standard paradigm is to apply Occam's razor, which favors the simplest model explaining the data. As such, the reconstruction problem is cast in terms of an optimization problem, to be addressed by various mathematical, computational and technological paradigms.
While this approach has been extremely successful in Bioinformatics, it is not always robust. First, the optimization problem might admit several optimal solutions, and thus several interpretations of the observed data. A standard way to tackle this is to enumerate all solutions [25,37]. Second, the problem formulation might be inaccurate, or the data might be incomplete, and the true solution might be a sub-optimal one. One could then enumerate the first k-best solutions to it [18,19], hoping that the true solution is among them. The motivation of such enumeration algorithms is that, e.g., later "one can apply more sophisticated quality criteria, wait for data to become available to choose among them, or present them all to human decision-makers" [18]. However, neither approach scales when the number of solutions is large, and both are thus infeasible in genome assembly.

Safe and Complete Algorithms: A Theoretical Framing of Practical Genome Assembly
With the aim of enhancing the widely-used practical approach of assembling just unitigs (as those walks considered to be present in any possible assembly solution), a result in a major Bioinformatics venue [57] asked what is the "limit" of the correctly reconstructible information from an assembly graph. Moreover, is all such reconstructible information still obtainable in linear time, as in the case of the popular unitigs? Variants of this question also appeared in [8,9,27,38,47,55], while other works already considered simple linear-time generalizations of unitigs [30,36,45,52], without knowing if the "assembly limit" is reached.
To make this question precise, [57] introduced the following safe and complete framework. Given a notion of solution to a problem (e.g., a type of walk in a graph), a partial solution (e.g., some shorter walk in the graph) is called safe if it appears (e.g., is a subwalk) in all solutions. An algorithm reporting only safe partial solutions is called a safe algorithm. A safe algorithm reporting all safe partial solutions is called safe and complete. A safe and complete algorithm outputs all and only what is likely part of the unknown object to be reconstructed, synthesizing all solutions from the point of view of correctness. Safety generalizes the existing notion of persistency: a single node or edge was called persistent if it appears in all solutions [12,15,28], for example persistent edges for maximum bipartite matchings [15]. It also has roots in other Bioinformatics works [13,23,58,61] considering the aligned symbols appearing in all optimal (and sub-optimal) alignments of two strings.
There are many theoretical formulations of genome assembly as an optimization problem, e.g., a shortest common superstring of all the reads [32,33,50], or some type of shortest walk covering all nodes or arcs of the assembly graph [29,31,44,45,47,49,52]. However, it is widely acknowledged [36,42,43,47-49] that, apart from some being NP-hard, these formulations are lacking in several aspects; for example, they collapse repeated regions of a genome. At present, given the complexity of the problem, there is no definitive notion of a "good" genome assembly solution. Therefore, [57] considered as genome assembly solution any closed arc-covering walk of a graph, where arc-covering means that it passes through each arc at least once. The main benefit of considering any arc-covering walk is that safe walks for them are safe also for any possible restriction of such covering walks (e.g., by some additional optimality criterion). Put otherwise, safe walks for all arc-covering walks are more likely to be correct than safe walks for some peculiar type of arc-covering walks.

Fig. 1. Walk $e_0 \dots e_\ell$ is not an omnitig because there is a forbidden path $P$.
Note that considering a genome assembly solution as a circular arc-covering walk makes several assumptions about the data [57]: (i) the sequenced genome is circular, (ii) there are no coverage gaps (i.e., every genomic position is sequenced), (iii) reads are error-free. We discuss these assumptions further in Section 1.5. If some of these assumptions fail, then even unitigs can be incorrect, as has recently been shown on real data [53].

Prior Results on Safety in Closed Arc-Covering Walks
It is immediate to see that unitigs are safe walks for closed arc-covering walks. A first safe generalization of unitigs consisted of those paths whose internal nodes have only out-degree equal to 1 (with no restriction on their in-degree) [52]. Further, these safe paths have been generalized in [30,36,45] to those partitionable into a prefix whose nodes have in-degree equal to 1, and a suffix whose nodes have out-degree equal to 1. All safe walks for closed arc-covering walks were characterized by [56,57] as being exactly those that are omnitigs, see Definition 1.1, Figure 1, and Theorem 3.1. This leads to the first safe and complete genome assembly algorithm (obtained thus 20 years after unitigs were first considered), outputting all maximal omnitigs in polynomial time (maximal omnitigs are those which are not sub-walks of other omnitigs).

Definition 1.1 (Omnitig).
Let $W = e_0 \dots e_\ell$ be a walk. We say that a non-empty path $P$ is a $j$-$i$ forbidden path for $W$, for some $1 \le i \le j \le \ell$, if the first arc of $P$ has the same tail as $e_j$ and is different from $e_j$, and the last arc of $P$ has the same head as $e_{i-1}$ and is different from $e_{i-1}$. We say that $W$ is an omnitig if for no $1 \le i \le j \le \ell$ there exists a $j$-$i$ forbidden path for $W$.

Furthermore, through experiments on "perfect" human read datasets, [57] also showed that strings labeling omnitigs are about 60% longer on average than unitigs, and contain about 60% more biological content on average. Thus, once other issues of real data (e.g., errors) are added to the problem formulation, omnitigs (and the safe walks for such extended models) have the potential to significantly improve the quality of genome assembly results. Nevertheless, for this to be possible, one first needs the best possible results for omnitigs (given, e.g., the sheer size of the read datasets), and a full comprehension of them; otherwise, such extensions are hard to solve efficiently.
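To make Definition 1.1 concrete, the following is a minimal brute-force sketch that tests the omnitig property by searching for forbidden paths with a depth-first enumeration of paths. It is exponential in the worst case and only meant for toy inputs; it is not the algorithm of this paper. The function names, the arc-index encoding of walks, and the four-arc example graph are assumptions made for illustration.

```python
from collections import defaultdict

def has_forbidden_path(arcs, walk, i, j):
    """Return True if some j-i forbidden path exists for the walk.

    arcs: list of (tail, head) pairs; walk: list of arc indices e_0 .. e_l.
    """
    e_j, e_prev = walk[j], walk[i - 1]
    source, target = arcs[e_j][0], arcs[e_prev][1]
    out = defaultdict(list)
    for idx, (tail, _) in enumerate(arcs):
        out[tail].append(idx)

    def dfs(node, visited, first):
        for a in out[node]:
            if first and a == e_j:
                continue  # the first arc of P must differ from e_j
            head = arcs[a][1]
            # P may stop at the target if its last arc differs from e_{i-1};
            # the target must be a new node, or the start node (closed path).
            if head == target and a != e_prev and (
                    head not in visited or head == source):
                return True
            if head not in visited and dfs(head, visited | {head}, False):
                return True
        return False

    return dfs(source, {source}, True)

def is_omnitig(arcs, walk):
    """Check all pairs 1 <= i <= j <= l, as in Definition 1.1."""
    l = len(walk) - 1
    return not any(has_forbidden_path(arcs, walk, i, j)
                   for j in range(1, l + 1) for i in range(1, j + 1))

# Hypothetical toy graph: a = 0->1, b = 1->2, c = 2->0, d = 1->0.
ARCS = [(0, 1), (1, 2), (2, 0), (1, 0)]
a, b, c, d = 0, 1, 2, 3
print(is_omnitig(ARCS, [a, b, c, a]))     # True
print(is_omnitig(ARCS, [a, b, c, a, b]))  # False: the single arc d is forbidden
```

In the toy graph, every closed arc-covering walk must use b, which is always preceded by a and followed by c and then a, so the walk a b c a is safe; extending it with b fails because d leaves the tail of b and re-enters the walk at the head of c.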
Cairo et al. [11] recently proved that the length of all maximal omnitigs of any graph with n nodes and m arcs is O(nm), and proposed an O(nm)-time algorithm enumerating all maximal omnitigs. This was also proven to be optimal, in the sense that they constructed families of graphs where the total length of all maximal omnitigs is Θ(nm). However, it was left open whether it is necessary to pay O(nm) even when the total length of the output is smaller. Moreover, that algorithm cannot break this barrier, because, e.g., O(m)-time traversals have to be done for O(n) cases.

Our Results
Our main result is an O(m)-size representation of all maximal omnitigs, based on a careful structural decomposition of the omnitigs of a graph. This is surprising, given that there are families of graphs with Θ(nm) total length of maximal omnitigs [11].

Theorem 1.2. Given a strongly connected graph G with n nodes and m arcs, there exists an O(m)-size representation of all maximal omnitigs of G, consisting of a set M of walks (maximal macrotigs) of total length O(n) and a set F of arcs, such that every maximal omnitig is the univocal extension of either a subwalk of a walk in M, or of an arc in F. Moreover, M, F, and the endpoints of macrotig subwalks univocally extending to maximal omnitigs can be computed in time O(m).

Since the univocal extension U(W) of a walk W can be trivially computed in time linear in the length of U(W), we immediately get the linear-time output-sensitive algorithm:

Corollary 1.3. Given a strongly connected graph G, it is possible to enumerate all maximal omnitigs of G in time linear in their total length.
We obtain Theorem 1.2 using two interesting ingredients. The first is a novel graph structure (macronodes), obtained after a compression operation of "easy" nodes and arcs (Section 4). The second is a connection to a recent result by Georgiadis et al. [24] showing that it is possible to answer in O(1) time strong connectivity queries under a single arc removal, after linear-time preprocessing (notice that a forbidden path is defined w.r.t. two arcs to avoid).
Theorem 1.2 has additional practical implications. First, omnitigs are also representable in the same (linear) size as the commonly used unitigs. Second, maximal macrotigs enable various O(m)-time operations on maximal omnitigs (without listing them explicitly), by pre-computing the univocal extensions from any node, needed in Theorem 1.2. For example, given that the number of maximal omnitigs is O(m) [11], this implies the following result:

Corollary 1.4. Given a strongly connected graph G with m arcs, it is possible to compute the lengths of all maximal omnitigs in total time O(m).
Corollary 1.4 leads to a linear-time computation of various statistics about maximal omnitigs, such as minimum, maximum, and average length (useful, e.g., in [14]). One can also use this to filter out subfamilies of them (e.g., those of length smaller and/or larger than a given value) before enumerating them explicitly.

Significance of our Results
This paper shows that all the strings that can be correctly assembled from a graph can be obtained in output-sensitive linear time, a time feasible for being implemented in practical genome assemblers. It closes the issue of finding safe walks for a fundamental model of genome assembly (any closed arc-covering walk), a long-standing theoretical question originating with the use of unitigs in 1995 [33].
This theoretical question is crucial also from the practical point of view: assembly graphs have a number of nodes and arcs in the order of millions, and yet the total length of the maximal omnitigs is almost linear in the size of the graph. For example, the compressed (see Section 4) de Bruijn graph of human chromosome 10 (length 135 million) has 467 thousand arcs [11, Table 1], and the length of all maximal omnitigs (i.e., their total number of arcs, not their total string length) is 893 thousand. Moreover, even though this chromosome is only about 4% of the full human genome, the running time of the quadratic algorithm of [11] on its compressed de Bruijn graph is about 30 minutes.
As mentioned at the end of Section 1.2, genome assembly solutions defined as closed arc-covering walks make several assumptions on the data, and if some of these fail, then even unitigs are incorrect [53]. However, we do envision a reverse transfer from theory to practical and complete genome assembly programs, as in other Bioinformatics problems. For example, trivially, safe walks for all closed arc-covering walks are also safe for more specific types of arc-covering walks. Moreover, while a genome solution defined as a single closed arc-covering walk does not incorporate several practical issues of real data, in a follow-up work [10] we show that omnitigs are the basis of more advanced models handling many practical aspects. For example, to allow more types of genomes to be assembled, one can define an assembly solution as a set of closed walks that together cover all arcs [2], which is the case in metagenomic sequencing of bacteria. For linear chromosomes (as in eukaryotes such as human), or when modeling missing sequencing coverage, one can analogously consider one, or many, such open walks [56,57]. Safe walks for all these models are subsets of omnitigs [2,10]. Moreover, when modeling sequencing errors, or mutations present, e.g., only in the mother copy of a chromosome (and not in the father's copy), one can require some arcs not to be covered by a solution walk, or even to be "invisible" from the point of view of safety. Finding safe walks for such models is also based on first finding omnitig-like walks [10].
Notice that such separation between theoretical formulations and their practical embodiments is common for many classical problems in Bioinformatics. For example, computing edit distance is often replaced with computing edit distance under affine gap costs [17], or enhanced with various heuristics as in the well-known BLAST aligner [3]. Also, text indexes such as the FM-index [21] are extended in popular read mapping tools (e.g., [39,41]) with many heuristics handling errors and mutations in the reads.
Finally, our results show that safe partial solutions enjoy interesting combinatorial properties, further promoting the persistency and safety frameworks. For real-world problems admitting multiple solutions, safe and complete algorithms are more pragmatic than the classical approach of outputting an arbitrary optimal solution. They are also more efficient than enumerating all, or only the first k-best, solutions [18,19], because they already synthesize all that can be correctly reconstructed from the input data.

OVERVIEW OF THE PROOFS
We highlight here our key structural and algorithmic contributions, and give more formal details in Sections 4 and 5. We start with the minimum terminology needed to understand this section, and defer the rest of the notation to Section 3.
Functions $t(\cdot)$ and $h(\cdot)$ denote the tail node and the head node, respectively, of an arc or walk. We classify the nodes and arcs of a strongly connected graph as follows (see Figure 2(a)):
(i) A node is called a join node if its in-degree is greater than 1, and a join-free node otherwise. An arc $f$ is called a join arc if $h(f)$ is a join node, and a join-free arc otherwise.
(ii) A node is called a split node if its out-degree is greater than 1, and a split-free node otherwise. An arc $g$ is called a split arc if $t(g)$ is a split node, and a split-free arc otherwise.
(iii) A node or arc is called bivalent if it is both join and split, and it is called biunivocal if it is both split-free and join-free.
A walk $W$ is split-free (resp., join-free) if all its arcs are split-free (resp., join-free). Given a walk $W$, its univocal extension $U(W)$ is defined as $W^- W W^+$, where $W^-$ is the longest join-free path to $t(W)$ and $W^+$ is the longest split-free path from $h(W)$ (observe that they are uniquely defined).
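The classification above and the univocal extension can be sketched in a few lines. The sketch below assumes a strongly connected graph that is not a closed path, given as a list of (tail, head) pairs, with walks encoded as lists of arc indices; the function names and the toy graph are illustrative assumptions, not notation from this paper. The running time is linear in the length of the returned walk, plus one pass over the arcs to build adjacency lists.

```python
from collections import Counter, defaultdict

def classify_nodes(arcs):
    """Return (join_nodes, split_nodes): in-degree > 1, resp. out-degree > 1."""
    indeg = Counter(head for _, head in arcs)
    outdeg = Counter(tail for tail, _ in arcs)
    join_nodes = {v for v, deg in indeg.items() if deg > 1}
    split_nodes = {v for v, deg in outdeg.items() if deg > 1}
    return join_nodes, split_nodes

def univocal_extension(arcs, walk):
    """Return U(W) = W^- W W^+ as a list of arc indices (walk is non-empty)."""
    join_nodes, split_nodes = classify_nodes(arcs)
    in_arcs, out_arcs = defaultdict(list), defaultdict(list)
    for idx, (tail, head) in enumerate(arcs):
        out_arcs[tail].append(idx)
        in_arcs[head].append(idx)

    # W^-: walk backwards from t(W) while the current node is join-free,
    # i.e., while its incoming arc is unique.
    left, node, seen = [], arcs[walk[0]][0], set()
    while node not in join_nodes and node not in seen:
        seen.add(node)  # guard against cycling; unused on valid inputs
        e = in_arcs[node][0]
        left.append(e)
        node = arcs[e][0]
    left.reverse()

    # W^+: walk forwards from h(W) while the current node is split-free.
    right, node, seen = [], arcs[walk[-1]][1], set()
    while node not in split_nodes and node not in seen:
        seen.add(node)
        e = out_arcs[node][0]
        right.append(e)
        node = arcs[e][1]
    return left + list(walk) + right

# Hypothetical toy graph: a = 0->1, b = 1->2, c = 2->0, d = 1->0.
ARCS = [(0, 1), (1, 2), (2, 0), (1, 0)]
print(classify_nodes(ARCS))           # ({0}, {1})
print(univocal_extension(ARCS, [1]))  # [0, 1, 2, 0], i.e., U(b) = a b c a
```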

Structure
The main structural insight of this paper is that omnitigs enjoy surprisingly limited freedom, in the sense that any omnitig can be seen as a concatenation of walks in a very specific set. In order to give the simplest exposition, we first simplify the graph by contracting biunivocal nodes and arcs. The nodes of the resulting graph can now be partitioned into macronodes (see Figure 2(a) and Definition 4.4), where each macronode $M_v$ is uniquely identified by a bivalent node $v$ (its center). We can now split the problem by first finding omnitigs inside each macronode, and then characterizing the ways in which omnitigs from different macronodes can combine.

Fig. 2. (a) Given a bivalent node $v$, the macronode $M_v$ is the subgraph of $G$ induced by the nodes reaching $v$ with a split-free path (in red), and the nodes reachable from $v$ with a join-free path (in blue). These two types of nodes induce the two trees of the macronode. By definition, every arc with endpoints in different macronodes is bivalent (in green, denoted cross-bivalent arcs). The remaining bivalent arcs have endpoints in the same macronode (in purple, denoted self-bivalent arcs). (b) Omnitigs traversing the bivalent node $v$, illustrating the X-Intersection and Y-Intersection Properties and the maximal right-micro and left-micro omnitigs.
We discover a key combinatorial property of how omnitigs can be extended: there are at most two ways that any omnitig can traverse a macronode center (Theorem 2.1, the X-Intersection Property; see also Figure 2). In order to prove the X-Intersection Property, we prove an even more fundamental property: once an omnitig traverses a macronode center, for any node it meets after the center node, there is at most one way of continuing from that node (Y-Intersection Property), see Figure 2(b). The basic intuition is that if there is more than one possibility, then strong connectivity creates forbidden paths.
Given an omnitig $fg$ traversing the bivalent node $v$, we define the maximal right-micro omnitig as the longest extension $fgW$ in the macronode $M_v$ (see Figure 2(b) and Definition 4.6). The maximal left-micro omnitig is the symmetrical omnitig $Wfg$. By Theorem 2.1, there are at most two maximal right-micro omnitigs and two maximal left-micro omnitigs. The merging of a maximal left- and right-micro omnitig on $fg$ is called a maximal microtig (see Figure 2(b) and Definition 4.6; notice that a microtig is not necessarily an omnitig). These at most two maximal microtigs represent "forced tracks" to be followed by omnitigs crossing $v$.
We now describe how omnitigs can advance from one macronode to another. We prove that any arc having endpoints in different macronodes is a bivalent arc, and moreover, for every maximal microtig ending with a bivalent arc $b$, there is at most one maximal microtig starting with $b$. As such, when an omnitig track exits a macronode, there is at most one way of connecting it with an omnitig track from another macronode. It is natural to merge all omnitig tracks (i.e., maximal microtigs) on all bivalent arcs between different macronodes, and thus obtain maximal macrotigs (Definition 4.14 and Figure 5). The total size of all maximal macrotigs is O(n) (Theorem 4.18), and they are a representation of all maximal omnitigs, except for those that are univocal extensions of the arcs of F, see below and Lemma 5.1.

Fig. 3. Any maximal omnitig is identified (in solid blue) either by a macrotig interval (from a join arc $f$ to a split arc $g$; left), or by a bivalent arc $b$ not appearing in any macrotig (right). The full maximal omnitig is obtained by univocal extension (dotted blue), an extension which may go outside of the maximal macrotig.

Algorithms
Our algorithms first build the set M of maximal macrotigs, and then identify maximal omnitigs inside them.The set F of arcs univocally extending to the remaining maximal omnitigs will be the set of bivalent arcs not appearing in M (Lemma 5.1).
Crucial to the algorithms is an extension primitive deciding what new arc (if any) to choose when extending an omnitig (recall that the X- and Y-Intersection Properties limit the number of such arcs to one). Suppose we have an omnitig $fW$, with $f$ a join arc, and we need to decide if it can be extended with an arc $g$ out-going from $h(W)$. Naturally, this extension can be found by checking that there is no forbidden path from $t(g) = h(W)$. However, this forbidden path can potentially end in any node of $W$. Up to this point, [11,56,57] need to do an entire $O(m)$ graph traversal to check if any node of $W$ is reachable by a forbidden path. We prove here a new key property:

Theorem 2.2 (Extension Property). Let $fW$ be an omnitig in $G$, where $f$ is a join arc. Then $fWg$ is an omnitig if and only if $g$ is the only arc with $t(g) = h(W)$ such that there exists a path from $h(g)$ to $h(f)$ in $G \setminus f$.

Thus, for each arc $g$ with $t(g) = h(W)$, we can do a single reachability query under one arc removal: "does $h(g)$ reach $h(f)$ in $G \setminus f$?" Since the target of the reachability query is also the head of the excluded arc $f$, we can apply an immediate consequence of [24]:

Theorem 2.3 ([24]). Let $G$ be a strongly connected graph with $n$ nodes and $m$ arcs. After $O(m)$-time preprocessing, one can build an $O(n)$-space data structure that, given a node $w$ and an arc $f$, tests in $O(1)$ time whether there is a path from $w$ to $h(f)$ in $G \setminus f$.

Using the Extension Property and Theorem 2.3, we can thus pay $O(1)$ time to check each out-going arc $g$, before discovering the one (if any) with which to extend $fW$. In Appendix A we describe how to transform the graph to have constant degree, so that we pay $O(1)$ per node. This transformation also requires slight changes to the maximal omnitig enumeration algorithm to maintain the linear-time output-sensitive complexity. We use the Extension Property when building the left- and right-maximal micro omnitigs, and when identifying maximal omnitigs inside macrotigs.
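The reachability queries behind the Extension Property can be illustrated with a naive stand-in. The sketch below answers each query "does h(g) reach h(f) once f is removed?" with a plain breadth-first search, which costs O(m) per query; it is not the constant-time data structure of [24]. It transcribes the condition of the Extension Property literally. The function names and the toy graph are assumptions made for illustration.

```python
from collections import deque

def reaches_without_arc(arcs, removed, source, target):
    """BFS test: is there a path from source to target avoiding arc `removed`?"""
    if source == target:
        return True  # the empty path
    out = {}
    for idx, (tail, head) in enumerate(arcs):
        if idx != removed:
            out.setdefault(tail, []).append(head)
    seen, queue = {source}, deque([source])
    while queue:
        node = queue.popleft()
        for nxt in out.get(node, []):
            if nxt == target:
                return True
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False

def qualifying_arcs(arcs, f, walk):
    """Arcs g leaving h(W) whose head reaches h(f) after removing f.

    walk holds the arc indices of W (non-empty), and f W is assumed to be
    an omnitig with f a join arc. Under the Extension Property, f W g is an
    omnitig exactly when the returned list is [g].
    """
    end = arcs[walk[-1]][1]
    target = arcs[f][1]
    return [g for g, (tail, head) in enumerate(arcs)
            if tail == end and reaches_without_arc(arcs, f, head, target)]

# Hypothetical toy graph: a = 0->1, b = 1->2, c = 2->0, d = 1->0.
ARCS = [(0, 1), (1, 2), (2, 0), (1, 0)]
a, b, c, d = 0, 1, 2, 3
print(qualifying_arcs(ARCS, c, [a]))  # [3]: only d qualifies, so c a d is an omnitig
```

In this example the head of b can return to h(c) only through c itself, so b does not qualify, whereas d enters h(c) directly.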
Once we have the set M of maximal macrotigs, we scan each macrotig with two pointers, a left one always on a join arc $f$, and a right one always on a split arc $g$ (see Figure 3 and Algorithm 5). Both pointers move from left to right in such a way that the subwalk between them is always an omnitig. The subwalk is grown to the right by moving the right pointer as long as it remains an omnitig (checked with the Extension Property). When growing to the right is no longer possible, the omnitig is shrunk from the left by moving the left pointer. This technique runs in time linear in the total length of the maximal macrotigs, namely O(n). In Figure 4 we work out all these notions on a concrete example.
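The two-pointer scan described above follows a generic sliding-window pattern, sketched below. This is a simplification under stated assumptions: the window is tested with an arbitrary predicate `ok(lo, hi)` that holds for single positions and is closed under taking sub-windows (as the omnitig property is closed under subwalks), whereas the algorithm of this paper uses the Extension Property and keeps the pointers on join and split arcs. The function name and the example predicate are hypothetical.

```python
def maximal_windows(n, ok):
    """Return all maximal windows (lo, hi), 0 <= lo <= hi < n, with ok(lo, hi).

    Assumes ok(i, i) holds for every i, and that ok(lo, hi) implies ok on
    every sub-window. Each predicate call advances one of the two pointers,
    so O(n) calls are made in total.
    """
    windows = []
    lo = hi = 0
    while n > 0:
        # Grow to the right while the window keeps the property.
        while hi + 1 < n and ok(lo, hi + 1):
            hi += 1
        windows.append((lo, hi))
        if hi == n - 1:
            break
        # Shrink from the left until the window can grow by one position.
        while not ok(lo, hi + 1):
            lo += 1
        hi += 1
    return windows

# Stand-in predicate: the window is strictly increasing.
seq = [1, 2, 3, 1, 2, 0]
increasing = lambda lo, hi: all(seq[k] < seq[k + 1] for k in range(lo, hi))
print(maximal_windows(len(seq), increasing))  # [(0, 2), (3, 4), (5, 5)]
```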

Comparison with Previous Techniques
The algorithm of [57] exhaustively extends an omnitig with every arc out-going from its head, as long as the resulting walk remains an omnitig, and does not use any insights on the structure of omnitigs. The O(nm)-time algorithm of [11] was obtained using two structural results: there can be only one left-maximal omnitig ending with a split arc (which we do not use here, since we prove deeper insights on the structure of omnitigs, e.g., the X- and Y-Intersection Properties) and the existence of an acyclic order between split arcs connected by "simple" omnitigs (which we use as Lemma 4.16). In [11], these allow computation to be memoized when recursively computing the left-maximal omnitig ending with a given split arc. The two-pointer technique was used also in [2] for a related problem, to test the safety of intervals of an entire solution. Our surprising discovery of macrotigs allows for a "small search space" of total size O(n), and eliminates the need for recursion, while the Extension Property enables the use of [24], thus paying O(1) per omnitig extension, instead of O(m) as in [11,56,57].

BASICS
In this paper, a graph is a tuple $G = (V, E)$, where $V$ is a finite set of nodes and $E$ is a finite multi-set of ordered pairs of nodes called arcs. Parallel arcs and self-loops are allowed. For an arc $e \in E(G)$, we denote $G \setminus e = (V, E \setminus \{e\})$. The reverse graph $G^R$ of $G$ is obtained by reversing the direction of every arc. In the rest of this paper, we assume a fixed strongly connected graph $G = (V, E)$, with $|V| = n$ and $|E| = m$.

A walk in $G$ is a sequence $W = (v_0, e_1, v_1, e_2, \dots, v_{\ell-1}, e_\ell, v_\ell)$, $\ell \ge 0$, where $v_0, v_1, \dots, v_\ell \in V$, and each $e_i$ is an arc from $v_{i-1}$ to $v_i$. Sometimes we drop the nodes $v_0, \dots, v_\ell$ of $W$, and write $W$ more compactly as $e_1 \dots e_\ell$. If an arc $e$ appears in $W$, we write $e \in W$. We say that $W$ goes from $t(W) = v_0$ to $h(W) = v_\ell$, has length $\ell$, contains $v_1, \dots, v_{\ell-1}$ as internal nodes, starts with $e_1$, ends with $e_\ell$, and contains $e_2, \dots, e_{\ell-1}$ as internal arcs. A walk $W$ is called empty if it has length zero, and non-empty otherwise. There exists exactly one empty walk $\epsilon_v = (v)$ for every node $v \in V$, and $t(\epsilon_v) = h(\epsilon_v) = v$. A walk $W$ is called closed if it is non-empty and $t(W) = h(W)$; otherwise it is open. The concatenation of walks $W$ and $W'$ (with $h(W) = t(W')$) is denoted $WW'$. A walk $W = (v_0, e_1, v_1, \dots, e_\ell, v_\ell)$ is called a path when the nodes $v_0, v_1, \dots, v_\ell$ are all distinct, with the exception that $v_\ell = v_0$ is allowed (in which case we have either a closed or an empty path).

Subwalks of open walks are defined in the standard manner. For a closed walk $W = e_0 \dots e_{\ell-1}$, we say that $W' = e'_0 \dots e'_j$ is a subwalk of $W$ if there exists $i \in \{0, \dots, \ell-1\}$ such that for every $k \in \{0, \dots, j\}$ it holds that $e'_k = e_{(i+k) \bmod \ell}$.
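The definition of a subwalk of a closed walk wraps around the end of the walk, which is easy to get wrong in an implementation. A minimal sketch, assuming walks are encoded as Python lists of arc identifiers (the function name and the example labels are illustrative assumptions):

```python
def is_subwalk_of_closed(sub, closed):
    """Check whether `sub` is a subwalk of the closed walk `closed`.

    closed = [e_0, ..., e_{l-1}] with l >= 1; sub = [e'_0, ..., e'_j].
    Following the definition, sub matches if for some start i every
    position k satisfies sub[k] == closed[(i + k) % l], so a subwalk
    may wrap around the closed walk, possibly more than once.
    """
    l = len(closed)
    return any(all(sub[k] == closed[(i + k) % l] for k in range(len(sub)))
               for i in range(l))

# A closed arc-covering walk of a toy graph with arcs a, b, c, d.
closed_walk = ["a", "b", "c", "a", "d"]
print(is_subwalk_of_closed(["d", "a", "b"], closed_walk))            # True (wraps around)
print(is_subwalk_of_closed(["a", "b", "c", "a", "b"], closed_walk))  # False
```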
A closed arc-covering walk (i.e., passing through every arc at least once) exists if and only if the graph is strongly connected. We are interested in the (safe) walks that are subwalks of all closed arc-covering walks:

Theorem 3.1 ([57]). Let $G$ be a strongly connected graph different from a closed path. Then a walk $W$ is a subwalk of all closed arc-covering walks of $G$ if and only if $W$ is an omnitig.
Observe that $W$ is an omnitig in $G$ if and only if $W^R$ is an omnitig in $G^R$. Moreover, any subwalk of an omnitig is an omnitig. For every arc $e$, its univocal extension $U(e)$ is an omnitig. A walk $W$ satisfying a property $\mathcal{P}$ is right-maximal (resp., left-maximal) if there is no walk $We$ (resp., $eW$) satisfying $\mathcal{P}$. A walk satisfying $\mathcal{P}$ is maximal if it is left- and right-maximal w.r.t. $\mathcal{P}$. Notice that if $G$ is a closed path, then every walk of $G$ is an omnitig. As such, it is relevant to find the maximal omnitigs of $G$ only when $G$ is different from a closed path. Thus, in the rest of this paper our strongly connected graph $G$ is considered to be different from a closed path, even when we do not mention it explicitly.

MACRONODES AND MACROTIGS
In this section, unless otherwise stated, we assume that the input graph is compressed, in the sense that it has no biunivocal nodes and arcs. In some algorithms we will also require that the graph has constant in- and out-degree. In Appendix A we show how these properties can be guaranteed, by transforming any strongly connected graph G with m arcs, in time O(m), into a compressed graph of constant degree and with O(m) nodes and arcs.
In a compressed graph all arcs are split, join or bivalent. Moreover, in compressed graphs, the following observation holds.

Observation 1. Let $G$ be a compressed graph. Let $f$ and $g$ be a join and a split arc, respectively, in $G$. The following holds:

The following three technical lemmas will be used in various places throughout the proofs.

Lemma 4.1. Every maximal omnitig of a compressed graph contains both a join arc and a split arc. Moreover, it has a bivalent arc or an internal bivalent node.
Proof. Consider an omnitig $W$ composed only of split-free arcs. Notice first that $W$ is a path. Consider any arc $e$ with $h(e) = t(W)$ and observe that $eW$ is an omnitig, since the only out-going arcs of internal nodes of $eW$ are arcs of $eW$; thus there is no forbidden path between any two internal nodes of $eW$. Therefore, $W$ is not a maximal omnitig. Symmetrically, no maximal omnitig is composed only of join-free arcs. This already implies the first claim in the statement: any maximal omnitig $W$ contains at least one join arc $f$ and at least one split arc $g$. If $f = g$ then $W$ contains the bivalent arc $f$. Otherwise, either $W$ contains a subwalk of the form $fW'g$ or it contains a subwalk of the form $gW'f$, where $W'$ might be an empty walk. In the first case $W$ has an internal node which is bivalent, by Observation 1(i). In the second case $W$ contains a bivalent arc, by Observation 1(ii).

Proof. By symmetry, we only consider the case of two sibling split arcs $g$ and $g'$. Since prefixes and suffixes of omnitigs are omnitigs, a minimal violating omnitig would be of the form $gZg'$, with $g' \notin Z$. Since $G$ is strongly connected, there exists a simple cycle $C$ of $G$ with $g' \in C$ and with $g'$ as its first arc. Notice that $g \notin C$, since $C$ is simple. Consider then the first node $u$ shared by both $C$ and $Z$, and let $e$ be the arc of $C$ with $h(e) = u$. Clearly, $e \notin Z$; in addition, $e \ne g$, since $C$ is a path. Let $C_u$ represent the prefix of $C$ ending in $u$. Therefore, $C_u$ is a forbidden path for the omnitig $gZg'$, since it starts from $t(g) = t(g')$, with $g' \ne g$, and it ends in $u$ with $e \notin Z$.

Lemma 4.3. Let $u$ be a bivalent node. No omnitig contains $u$ twice as an internal node.
Proof. Suppose for a contradiction that there exists an omnitig W that contains u twice as an internal node. Since u is an internal node of W, we can distinguish the case in which an omnitig contains twice a central-micro omnitig that traverses u, and the case in which an omnitig contains both the central-micro omnitigs that traverse u. In the first case, let f д be the central-micro omnitig of an omnitig W that traverses u. Notice that f is a join arc contained twice in W, contradicting Lemma 4.2. In the latter case, let f_1 д_1 and f_2 д_2 be the two central-micro omnitigs that traverse u, with f_1 ≠ f_2 and д_1 ≠ д_2. Consider W to be a minimal violating omnitig of the form f_1 д_1 W′ f_2 д_2. Notice that u ∉ W′, by minimality; hence д_1 W′ f_2 is a forbidden path, contradicting W being an omnitig.

Macronodes
We now introduce a natural partition of the nodes of a compressed graph; each class of such a partition (i.e., a macronode) contains precisely one bivalent node. We identify each class with the unique bivalent node it contains. All other nodes belonging to the same class are those that either reach the bivalent node with a split-free path or are reached from the bivalent node with a join-free path (recall Figure 2(a)).

Definition 4.4 (Macronode).
Let v be a bivalent node of G. Consider the following sets:
Lemma 4.5. In a compressed graph G, the following properties hold:
Proof. For i), by definition every node belongs to at least one macronode. Let u and v be distinct bivalent nodes and suppose for a contradiction that there exists x ∈ V(M_u) ∩ V(M_v). W.l.o.g., assume x is a join node (the case where x is a split node is symmetric). By definition, x ∈ R⁻(u) ∩ R⁻(v) holds. Let P_u and P_v be split-free paths from x to u and to v, respectively. Notice that x cannot be a bivalent node, since otherwise no split-free path can start from x. Since the out-degree of x is one, P_u and P_v share a prefix of length at least one, but since u and v are distinct bivalent nodes, P_u and P_v differ by at least one arc. Let e be the first arc such that e ∈ P_u but e ∉ P_v, and let e′ be its sibling arc, with e′ ∉ P_u but e′ ∈ P_v. Notice that t(e) = w is a join node, since it belongs to split-free paths, but it also has out-degree two, since w = t(e) = t(e′); hence w is an internal bivalent node of split-free paths, a contradiction.
Properties ii) and iii) trivially follow from the definition of macronode.
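As a rough illustration of the partition in Lemma 4.5, the following Python sketch computes, for each bivalent node v, the nodes reached from v along join-free arcs together with the nodes that reach v along split-free arcs, as used in the proof above. The function name `macronodes`, the arc-list encoding, and the toy graph `ARCS` are assumptions made for this example; they are not taken from the paper's algorithms, and the input is assumed to be already compressed.

```python
from collections import defaultdict


def macronodes(arcs):
    """Map each bivalent node v to the node set of its macronode.

    `arcs` is a list of (tail, head) pairs of a compressed graph (assumed).
    """
    out_nbrs, in_nbrs = defaultdict(list), defaultdict(list)
    for tail, head in arcs:
        out_nbrs[tail].append(head)
        in_nbrs[head].append(tail)
    nodes = set(out_nbrs) | set(in_nbrs)
    indeg = {u: len(in_nbrs[u]) for u in nodes}
    outdeg = {u: len(out_nbrs[u]) for u in nodes}

    def sweep(start, neighbours, degree):
        # Follow only arcs whose far endpoint has degree 1 in the given direction.
        seen, stack = {start}, [start]
        while stack:
            u = stack.pop()
            for w in neighbours[u]:
                if degree[w] == 1 and w not in seen:
                    seen.add(w)
                    stack.append(w)
        return seen

    result = {}
    for v in nodes:
        if indeg[v] >= 2 and outdeg[v] >= 2:      # v is a bivalent node
            r_plus = sweep(v, out_nbrs, indeg)    # join-free paths leaving v
            r_minus = sweep(v, in_nbrs, outdeg)   # split-free paths entering v
            result[v] = r_plus | r_minus
    return result


# Toy compressed graph (hypothetical): v and w are its two bivalent nodes.
ARCS = [("x", "v"), ("v", "v"), ("v", "y"), ("y", "x"),
        ("y", "w"), ("w", "x"), ("w", "w")]
MACRONODES = macronodes(ARCS)
```

On this toy graph the two classes are disjoint and together cover all four nodes, in line with property i).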
To analyze how omnitigs can traverse a macronode and the degrees of freedom they have in choosing their directions within the macronode, we introduce the following definitions. Central-micro omnitigs are the smallest omnitigs that cross the center of a macronode. Left- and right-micro omnitigs start from a central-micro omnitig and proceed to the periphery of a macronode. Finally, we combine left- and right-micro omnitigs into microtigs (which are not necessarily omnitigs themselves); recall Figure 2(b).

Definition 4.6 (Micro Omnitigs, Microtigs).
Let f be a join arc and д be a split arc, such that f д is an omnitig.
• The omnitig f д is called a central-micro omnitig.
• An omnitig f дW (W f д, resp.) that does not contain a bivalent arc as an internal arc is called a right-micro omnitig (respectively, left-micro omnitig).
• A walk W_1 f д W_2, where W_1 f д and f д W_2 are, respectively, a left-micro omnitig and a right-micro omnitig, is called a microtig.
Given a join arc f, we first find central-micro omnitigs (of the type f д) with the generic function RightExtension(G, f, W) from Algorithm 1, where W is a join-free path (possibly empty). This extension uses the following weak version (since W is join-free) of the Extension Property. To build up the intuition, we also give a self-contained proof of this weaker result.
Lemma 4.7 (Weak form of the Extension Property (Theorem 2.2)). Let f W be an omnitig in G, where f is a join arc and W is a join-free path. Then f W д is an omnitig if and only if д is the only arc with t(д) = h(W) such that there exists a path from h(д) to h(f) in G ∖ f.
Proof. To prove the existence of an arc д which satisfies the condition, consider any closed path P_{f′} in G, where f′ is an arbitrary sibling join arc of f. Notice that W is a prefix of P_{f′}, since f W is an omnitig: otherwise one can easily find a forbidden path for the omnitig f W as a subpath of P_{f′}, from the head of the very first arc of P_{f′} that is not in W to h(f). Therefore, let д be the first arc of P_{f′} after the prefix W, in such a way that the suffix of P_{f′} starting from
For the direct implication, assume that there is a path P in G ∖ f from h(д′), where д′ is a sibling of д and д′ ≠ д, to h(f). Then, this forbidden path P contradicts the fact that f W д is an omnitig.
For the reverse implication, assume that f W д is not an omnitig. Then take any forbidden path P for f W д. Since f W is an omnitig, P must start with some sibling arc д′ of д, with д′ ≠ д. Since W is join-free, P must end in h(f) with the last arc different from f. Therefore, P is a path from h(д′) to h(f) in G ∖ f.
Not only does Lemma 4.7 give us an efficient extension mechanism, but it also immediately implies the Y-Intersection Property (for clarity of reusability, we state both its symmetric variants).
Corollary 4.8 (Y-Intersection Property). Let f W д be an omnitig, where f is a join arc, and д is a split arc.
Proof. For RightExtension(G, f, W), recall Lemma 4.7 and Theorem 2.3, and that the input graph is a compressed graph, so every node has constant degree.
For MaximalRightMicroOmnitig(G, f, д), notice that every iteration of the while loop increases the output by one arc and takes constant time, since RightExtension(G, f, W) runs in O(1) time.
Algorithm 2 is the procedure to obtain all maximal microtigs of a compressed graph. It first finds all central-micro omnitigs f д (with RightExtension(G, f, ∅)), and it extends each to the right (i.e., forward in G) and to the left (i.e., forward in G^R) with MaximalRightMicroOmnitig. To prove its correctness, we need to show some structural properties of micro-omnitigs and microtigs, as follows.
(i) There exists at most one maximal right-micro omnitig f д W, and at most one maximal left-micro omnitig W f д.
(ii) There exists a unique maximal microtig containing f д.
Proof. We prove only the first of the two symmetric statements in i). If д is a bivalent arc, the claim trivially holds by definition of maximal right-micro omnitig. Otherwise, a minimal counterexample consists of two right-micro omnitigs f д P д_1 and f д P д_2 (with P a join-free path, possibly empty), with д_1 and д_2 distinct sibling split arcs. Since дP is a join-free path, the fact that both f д P д_1 and f д P д_2 are omnitigs contradicts the Y-Intersection Property (Corollary 4.8).
For ii), given f д, by i) there exists at most one maximal left-micro omnitig W_1 f д and at most one maximal right-micro omnitig f д W_2; as such, there is at most one maximal microtig W_1 f д W_2.
Lemma 4.11. Let e be an arc. The following hold:
(i) If e is not a bivalent arc, then there exists at most one maximal microtig containing e.
(ii) If e is a bivalent arc, there exist at most two maximal microtigs containing e, of which at most one is of the form eW_1, and at most one is of the form W_2 e.
Proof. By symmetry, in i) we only prove the case in which e is a split-free arc. Notice that by Lemma 4.5, h(e) belongs to a uniquely determined macronode M_u of G; let P be the split-free path in G from h(e) to u. Let f be the last arc of eP (f = e if P is empty). By the X-Intersection Property (Theorem 2.1), there exists at most one split arc д with t(д) = u = h(f) such that f д is an omnitig; if it exists, f д is a central-micro omnitig, hence by Lemma 4.10 there is at most one maximal left-micro omnitig W f д. Finally, if such a maximal left-micro omnitig exists, ePд is a subwalk of W f д, by the Y-Intersection Property (Corollary 4.8). Otherwise, a minimal counterexample consists of paths f_1 R д (subpath of ePд) and f_2 R д (subpath of W f д), where f_1 ≠ f_2 and R is a split-free path, since it is a subpath of the split-free path eP; since both f_1 R д and f_2 R д are omnitigs, this contradicts the Y-Intersection Property.
For ii), we again prove only one of the symmetric cases. The proof is identical to the above, since by Lemma 4.5, h(e) belongs to a unique macronode M_{v_1} of G. As such, e belongs to at most one maximal microtig eW_1 in M_{v_1}. Symmetrically, t(e) belongs to a uniquely determined macronode M_{v_2} of G. Thus, e belongs to at most one maximal microtig W_2 e within M_{v_2}.
Theorem 4.12 (Maximal Microtigs). The maximal microtigs of any strongly connected graph G with n nodes, m arcs, and arbitrary degree have total length O(n), and can be computed in time O(m).
Proof. First we prove the O(n) bound on the total length. As we explain in Appendix A, we can transform G into a compressed graph G′ such that G′ has n′ ≤ n nodes and m′ ≤ m arcs.
Since G′ has at most n′ macronodes (recall that macronodes partition the vertex set, Lemma 4.5), and every macronode has at most two maximal microtigs, the number of maximal microtigs is at most 2n′. The total length of all maximal microtigs is bounded as follows. Every internal arc of a maximal microtig is not a bivalent arc, by definition. Since every non-bivalent arc appears in at most one maximal microtig (Lemma 4.11), and there are at most n′ non-bivalent arcs in any compressed graph with n′ nodes, the number of internal arcs in all maximal microtigs is at most n′. Adding, for each of the at most 2n′ maximal microtigs, its two non-internal arcs (i.e., its first and last arc), we obtain that the total length of all maximal microtigs is at most 2 · 2n′ + n′ = 5n′, thus O(n).
As mentioned, in Appendix A we show how to transform G (by applying Transformation 1 and its symmetric) into a compressed graph G′ with O(m) arcs, O(m) nodes, and constant degree. On this graph we can apply Algorithm 2. Since every node of the graph has constant degree, the if check in Line 6 runs a number of times linear in the size O(m) of the graph. Checking the condition in Line 6 takes constant time, by Lemma 4.9; in addition, the condition is true for every central-micro omnitig f д of the graph. The then block computes a maximal microtig and takes linear time in its size (Lemma 4.9). By Lemma 4.11 we find every microtig in linear total time.

Macrotigs
In this section we analyze how omnitigs go from one macronode to another. Macronodes are connected with each other by bivalent arcs (Lemma 4.5), but merging microtigs on all possible bivalent arcs may create overly complicated structures. This can be avoided by a simple classification of bivalent arcs: those that connect a macronode with itself (self-bivalent) and those that connect two different macronodes (cross-bivalent); recall Figure 4.

Definition 4.13 (Self-bivalent and Cross-bivalent Arcs).
A bivalent arc b is called a self-bivalent arc if U (b) goes from a bivalent node to itself.Otherwise it is called a cross-bivalent arc.
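To make Definition 4.13 concrete, the sketch below classifies the bivalent arcs of a small graph. It assumes that the univocal extension U(b) extends b to the left along join-free arcs and to the right along split-free arcs for as long as the extension is unique; this is our reading of U(·), whose definition is not restated in this section. The function name `classify_bivalent_arcs` and the toy graph `ARCS` are hypothetical, and the input is assumed to be compressed.

```python
from collections import defaultdict


def classify_bivalent_arcs(arcs):
    """Return {arc index: "self" | "cross"} for every bivalent arc.

    `arcs` is a list of (tail, head) pairs; arcs are identified by position.
    """
    out_ids, in_ids = defaultdict(list), defaultdict(list)
    for i, (tail, head) in enumerate(arcs):
        out_ids[tail].append(i)
        in_ids[head].append(i)

    def univocal_extension(b):
        walk = [b]
        # Extend left while the current start node has exactly one in-arc.
        while len(in_ids[arcs[walk[0]][0]]) == 1:
            prev = in_ids[arcs[walk[0]][0]][0]
            if prev in walk:          # safety guard, only hit on a plain cycle
                break
            walk.insert(0, prev)
        # Extend right while the current end node has exactly one out-arc.
        while len(out_ids[arcs[walk[-1]][1]]) == 1:
            nxt = out_ids[arcs[walk[-1]][1]][0]
            if nxt in walk:
                break
            walk.append(nxt)
        return walk

    kinds = {}
    for i, (tail, head) in enumerate(arcs):
        if len(out_ids[tail]) >= 2 and len(in_ids[head]) >= 2:   # bivalent arc
            ext = univocal_extension(i)
            start, end = arcs[ext[0]][0], arcs[ext[-1]][1]
            kinds[i] = "self" if start == end else "cross"
    return kinds


# Toy compressed graph (hypothetical); arcs are numbered 0..6 in list order.
ARCS = [("x", "v"), ("v", "v"), ("v", "y"), ("y", "x"),
        ("y", "w"), ("w", "x"), ("w", "w")]
KINDS = classify_bivalent_arcs(ARCS)
```

In this example arc 3 (from y to x) is self-bivalent even though it is not a self-loop, because its univocal extension starts and ends at the bivalent node v.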
A macrotig is now obtained by merging those microtigs from different macronodes which overlap only on a cross-bivalent arc; see also Figure 5.
Notice that no arc is contained twice in M, with the exception of the self-bivalent arc b_1, appearing as the first and last arc of M. Bivalent nodes (e.g., u, v) can appear (at most) twice in M, by the X-Intersection Property and Lemma 4.17.

Definition 4.14 (Macrotig).
Let W be any walk. W is called a macrotig if: (1) W is a microtig, or (2) By writing
Notice that the above definition does not explicitly forbid two different macrotigs of the form W_0 b W_1 and W_0 b W_2. However, Lemma 4.11 shows that there cannot be two different microtigs bW_1 and bW_2, thus we immediately obtain the following result.
Lemma 4.15. For any macrotig W there exists a unique maximal macrotig containing W.
Proof. W.l.o.g., a minimal counterexample consists of a non-right-maximal macrotig W b, such that there exist two distinct microtigs bW_1 and bW_2 (notice that b is a cross-bivalent arc). By Lemma 4.11 applied to b, we obtain bW_1 = bW_2, a contradiction.
The macrotig definition also does not forbid a cross-bivalent arc from being used twice inside a macrotig. In Lemma 4.17 below we prove that this is also not possible, using the following result.
Lemma 4.16 ([11]). For any two distinct non-sibling split arcs д, д′, write д ≺ д′ if there exists an omnitig дPд′ where P is split-free. Then, the relation ≺ is acyclic.
Lemma 4.17. Let W be a macrotig and let e be an arc of W. If e is self-bivalent, then e appears at most twice in W (as first or as last arc of W). Otherwise, e appears only once.
Proof.If e is self-bivalent, then Definition 4.14 implies that e is either the first arc of W , the last arc of W , or both.Thus, e appears at most twice.
Suppose now that e is not self-bivalent. We first consider the case when e is a split arc. We are going to prove that between any two consecutive non-self-bivalent split arcs the relation ≺ from Lemma 4.16 holds. Indeed, let д and д′ be two consecutive (i.e., closest distinct) non-self-bivalent split arcs along W: that is, дPд′ is a subwalk of W, with P a split-free path. Notice that д and д′ are not sibling arcs, since otherwise д is a self-bivalent arc, by Observation 1. If t(д′) is not a bivalent node, then P is empty. In this case, д is a join-free arc, so дд′ is an omnitig; as such, д ≺ д′. Otherwise, if t(д′) is a bivalent node, then дPд′ is a left-micro omnitig and so it is an omnitig; as such, again, д ≺ д′.
Proof. To prove i), let u be an internal bivalent node of W, and let f_u and д_u be, respectively, the join arc and the split arc of W with h(f_u) = u = t(д_u); both f_u and д_u exist, since u is an internal node of W. Therefore, since W contains at least f_u and д_u, let f and д be, respectively, the first join arc and the last split arc of W. Observe that f is either f_u or it appears before f_u in W; likewise, д is either д_u or it appears after д_u in W. Thus, f comes before д, and we can write W = W⁻ f W′ д W⁺, where W′ is the subwalk of W, possibly empty, from h(f) to t(д). Therefore, by the maximality of W, we have W = U(f W′ д). To prove that the subwalk f W′ д of W is a macrotig, we prove by induction that any walk of the form f W′ д, where f is a join arc and д is a split arc, is a macrotig. The induction is on the length of W′.
Case 1: W′ contains no internal bivalent arcs. Since f W′ д contains a bivalent node (Observation 1), it is a microtig and thus it is a macrotig, by definition.
Case 2: f W′ д contains an internal bivalent arc b, i.e., f W′ д = W_1 b W_2, with W_1, W_2 non-empty.
By induction, W_1 b and bW_2 are macrotigs and both contain a bivalent node as an internal node. Suppose b is a self-bivalent arc; then both W_1 b and bW_2 would contain the same bivalent node u as an internal node, contradicting Lemma 4.3. Thus, b is a cross-bivalent arc and W_1 b W_2 is also a macrotig, by definition.
For ii), notice that if W contains no internal bivalent node then it contains a unique bivalent arc b, by Lemma 4.1 and Observation 1. Thus, by the maximality of W, it holds that W = U(b). It remains to prove that there is no macrotig containing b.
Suppose for a contradiction that there is a maximal left-micro omnitig M containing b. By definition, M is of the form b W_M f_M д_M. Notice that W д_M is an omnitig, because M is an omnitig and the arcs of W before b are join-free, so W д_M can have no forbidden path. This contradicts the fact that W is maximal.
Symmetrically, we have that there is no maximal right-micro omnitig containing b.Thus, by definition, b appears in no microtig, and thus in no macrotig.
Remark 1.The number of maximal omnitigs containing an internal bivalent node (i.e., univocal extensions of a maximal macrotig subwalk) is O (n), by maximality and by the fact that the total length of maximal macrotigs is O (n) (Theorem 4.18).
Next, we are going to prove the second, algorithmic, part of Theorem 1.2.By Theorem 4.18 we can compute the maximal macrotigs of G in time O (m).We can trivially obtain in O (m) time the set F of arcs not appearing in the maximal macrotigs.It remains to show how to obtain the subwalks of the maximal macrotigs univocally extending to maximal omnitigs.
We first prove an auxiliary lemma needed for the proof of the Extension Property (Theorem 2.2).
Lemma 5.2.Let f W be an omnitig, where f is a join arc.Let P be a path from t (P ) = h(W ) to a node in W , such that the last arc of P is not an arc of f W . Then no internal node of P is a node of W .
Proof. Consider P_W, the longest suffix of P such that no internal node of P_W is a node of W. If P_W = P, the lemma trivially holds. Let now W = (u_0, e_1, u_1, e_2, ..., e_k, u_k). Let u_i = t(P_W) and u_j = h(P_W). If i ≥ j, then P_W is a forbidden path for f W, a contradiction. Hence, assume i < j < k. Let f W Q be a closed path. Consider the walk Z = P_W e_{j+1} ... e_k Q. Notice that e_{i+1} ∉ Z and f ∉ Z. Thus Z can be transformed into a forbidden path for f W, from u_i to h(f).
Theorem 2.2 (Extension Property). Let f W be an omnitig in G, where f is a join arc. Then f W д is an omnitig if and only if д is the only arc with t(д) = h(W) such that there exists a path from h(д) to h(f) in G ∖ f.
Proof. As seen in Lemma 4.7, at least one д exists which satisfies the condition. Assume д is a split arc, otherwise the statement trivially holds.
First, assume that there is a sibling split arc д′ of д and a path P from h(д′) to h(f) in G ∖ f. We prove that there exists a forbidden path for f W д. Let P_W be the prefix of P ending in the first occurrence of a node in W (i.e., no node of P_W belongs to W, except for h(P_W)). Notice that д′ P_W is a forbidden path for the omnitig f W д (it is possible, but not necessary, that h(P_W) = h(f)).
Second, take any forbidden path P for the omnitig f W д. We prove that there exists a sibling split arc д′ of д and a path from h(д′) to h(f) in G ∖ f. Notice that t(P) = h(W) = t(д), otherwise P would be a forbidden path for f W. As such, P starts with a split arc д′ ≠ д and, by Lemma 5.2, P does not contain f. Thus, the suffix of P from h(д′) is a path in G ∖ f from h(д′) to h(f).
To describe the algorithm that identifies all maximal omnitigs (Algorithm 5), we first introduce an auxiliary procedure (Algorithm 4), which uses the Extension Property (Theorem 2.2) and Theorem 2.3 to find the unique possible extension of an omnitig. Maximal omnitigs are identified with a two-pointer scan of maximal macrotigs (Algorithm 5): a left pointer always on a join arc f and a right pointer always on a split arc д; recall Figure 3. For the sake of completeness, we write Algorithm 5 so that it also outputs the maximal omnitigs. In Appendix A.3 we explain what changes are needed when the graph does not have constant degree.
Lemma 5.4 (Maximal Omnitig Enumeration). Algorithm 5 is correct and, if the compressed graph has constant degree, it runs in time linear in the total size of the graph and of its output.
Proof. Algorithm 3 returns every maximal macrotig in O(m) time, by Theorem 4.18. By Lemma 5.1, any maximal omnitig W is either of the form U(f W′ д) (where f W′ д is a macrotig, and thus also a subwalk of a maximal macrotig, by Lemma 4.15), or of the form W = U(b), where b is a bivalent arc not appearing in any macrotig.
In the latter case, such omnitigs are output in Line 2. In the former case, it remains to prove that the external while loop, in Line 5, outputs all the maximal omnitigs of the form U(f W′ д) where f W′ д is contained in a maximal macrotig f* X д*.
At the beginning of the first iteration, W = U(X[f..д′]) is left-maximal since f = f*. The first internal while loop, in Line 6, ensures that W = U(X[f..д]) is also right-maximal, at which point it

is O (m) and the number of arcs of G is still O (m), where m is the number of arcs of the original graph.
The trivial strategy to obtain all maximal omnitigs of G is to enumerate all maximal omnitigs of G′, and from these contract all the new arcs introduced by the transformation (while also removing duplicate maximal omnitigs, if necessary). However, this may invalidate the linear-time complexity of the enumeration step, since the total length of the maximal omnitigs of G′ may be super-linear in the total maximal omnitig length of G; see Figure 7. In Appendix A.3 we explain how we can easily modify the maximal omnitig enumeration step to maintain the O(m) output-sensitive complexity.
To prove the correctness of Transformation 1, we proceed as follows. Let c_e(G) be the graph obtained from G by contracting an arc e (contracting e means that we remove e and identify its endpoints). For every walk W of G, we denote by c_e(W) the walk of c_e(G) obtained from W by removing every occurrence of e (here we regard walks as sequences of arcs). In the following, we regard c_e as a surjective function from the family of walks of G to the family of walks of c_e(G).
Proof. Let W be a maximal omnitig of G. Then c_e(W) is an omnitig of c_e(G) by Lemma A.1. Moreover, if W′ were an omnitig of c_e(G) strictly containing c_e(W), then there would exist an omnitig W″ of G such that W′ = c_e(W″), by Lemma A.1. Clearly, W″ would contain W and contradict its maximality. Therefore, c_e(W) is a maximal omnitig of c_e(G).
For the converse, let W′ be a maximal omnitig of c_e(G). Let W be the unique shortest walk of G such that W′ = c_e(W). By Lemma A.1, W is an omnitig of G. Let W″ be any maximal omnitig of G containing W. We claim that c_e(W″) = W′ = c_e(W), which concludes the proof. If not, then c_e(W″) would strictly contain W′ and contradict its maximality, since c_e(W″) would also be an omnitig of c_e(G) by Lemma A.1.
Lemma A.3. Let G be a graph and let G′ be the graph obtained by applying Transformation 1 to G. Then a walk W of G is a maximal omnitig of G if and only if there exists a maximal omnitig W′ of G′ such that W is the string obtained from W′ by suppressing all the arcs introduced with the transformation.
Proof. Notice that G is obtained by applying c_e to each arc e introduced by Transformation 1, that is, to each arc of G′ that is not an arc of G. Notice that W is the string obtained from W′ by suppressing all the arcs introduced with the transformation if and only if W is obtained from W′ by contracting each arc e introduced by Transformation 1. Apply Corollary A.2.

A.2 Compression
We start by recalling the definition of compressed graph.

Definition A.4 (Compressed Graph). A graph G is compressed if it contains no biunivocal nodes and no biunivocal arcs.
To obtain a compressed graph, we introduce two transformations. The first one removes biunivocal nodes, by replacing those paths whose internal nodes are biunivocal with a single arc from the tail of the path to its head (see Figure 8 for an example).
The second transformation contracts the biunivocal arcs of the graph (see Figure 8 for an example).
Transformation 3. Given G, we contract every biunivocal arc e, namely we set t(e′) = t(e) for every outgoing arc e′ from h(e) and remove the node h(e). This transformation also preserves the maximal omnitigs of G, because every maximal omnitig which contains an endpoint of e also contains e. Notice that after Transformations 2 and 3, the maximum in-degree and the maximum out-degree are the same as in the original graph.
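The contraction step of Transformation 3 can be sketched as follows. This is a deliberately naive Python version that recomputes degrees after every contraction, so it is quadratic and not the linear-time procedure the paper relies on; the function name `contract_biunivocal_arcs`, the dictionary encoding of arcs, and the example graph are assumptions made for illustration. An arc is treated as biunivocal when its tail has out-degree one and its head has in-degree one.

```python
from collections import Counter


def contract_biunivocal_arcs(arcs):
    """Contract every biunivocal arc.

    `arcs` maps an arc name to its (tail, head) pair; a new dict is returned.
    """
    arcs = dict(arcs)
    changed = True
    while changed:
        changed = False
        outdeg, indeg = Counter(), Counter()
        for tail, head in arcs.values():
            outdeg[tail] += 1
            indeg[head] += 1
        for e, (tail, head) in list(arcs.items()):
            # Skip self-loops: contracting one would delete its own tail.
            if tail != head and outdeg[tail] == 1 and indeg[head] == 1:
                del arcs[e]
                # Re-attach every arc leaving h(e) so that it leaves t(e).
                for e2, (tail2, head2) in list(arcs.items()):
                    if tail2 == head:
                        arcs[e2] = (tail, head2)
                changed = True
                break   # degrees are stale now, recompute them
    return arcs


# Hypothetical input: arc "a" is biunivocal (out-degree of u and in-degree of v are 1).
GRAPH = {"a": ("u", "v"), "b": ("v", "u"), "c": ("v", "u")}
COMPRESSED = contract_biunivocal_arcs(GRAPH)
```

In the example, contracting "a" merges v into u and leaves a single node with two self-loops, so the maximum in-degree and out-degree stay at two.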

A.3 Maximal Omnitig Enumeration for Non-Constant Degree
Given the input strongly connected graph G with m arcs and non-constant degree, denote by G′ the graph with constant in-degree and out-degree obtained by applying Transformation 1 and its symmetric. The trivial strategy to obtain the set of maximal omnitigs of G, given the set of maximal omnitigs of G′, is to:
(1) Contract in the maximal omnitigs all the arcs which were introduced by Transformation 1.
(2) Remove any duplicate omnitig which may occur due to this contraction (i.e., two different maximal omnitigs in G′ which result in the same walk in G after the contraction).
In general, the above procedure may require more than linear time in the final output size, recall Figure 7.
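Steps (1) and (2) of this trivial strategy can be sketched in a few lines of Python, regarding walks as sequences of arc names. The function names, the walk encoding, and the sample data below are hypothetical and only serve to show where the cost comes from.

```python
def contract_walk(walk, introduced):
    """Step (1): drop every arc introduced by the transformation from one walk."""
    return tuple(a for a in walk if a not in introduced)


def contract_and_dedupe(walks, introduced):
    """Steps (1) and (2): contract every walk, then keep one copy of each result."""
    seen, result = set(), []
    for walk in walks:
        contracted = contract_walk(walk, introduced)
        if contracted not in seen:
            seen.add(contracted)
            result.append(contracted)
    return result


# Hypothetical data: t1 and t2 stand for arcs introduced by Transformation 1.
WALKS = [("t1", "a", "t2"), ("a", "t2"), ("t1", "b", "a")]
ORIGINAL = contract_and_dedupe(WALKS, {"t1", "t2"})
```

The running time of this sketch is proportional to the total length of the input walks, that is, of the maximal omnitigs of the transformed graph, which is exactly the quantity that can exceed the final output size.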
We avoid this as follows. Let M and M′ denote the set of maximal macrotigs of G and G′, respectively, and let F and F′ denote the set of bivalent arcs not appearing in any macrotig, of G and G′, respectively (recall Theorem 1.2).
First, since G′ has O(m) nodes and arcs, by Transformation 1, the maximal macrotigs M′ also have total length O(m), and both M′ and F′ can be obtained in O(m) time (Theorem 4.18). From M′, one can obtain M in time O(m), by contracting the arcs introduced by the transformation. However, while contracting such arcs, we must keep track of the pairs of arcs (f, д) corresponding to maximal omnitigs, as follows.
We modify Algorithm 5 to also report, for each macrotig X of G′ and for each maximal omnitig of the form U(X[f..д]) (in the order they were generated by the algorithm), the indexes of the arcs f and д in X. We now contract the arcs of X by removing from X every occurrence of the arcs introduced by the transformation, and updating the indexes of f and д so that they still point at the first and last arc of the walk obtained from X[f..д] after the contraction. Then, to avoid duplicates, we scan the pairs of indexes of f and д along each macrotig, and remove any duplicated pair (if duplicates are present, they must occur consecutively, and thus they can be removed in linear time).
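The index bookkeeping described above can be sketched as follows, assuming that a macrotig is given as a list of arc names and that the reported pairs are 0-based inclusive positions in that list. The function name `contract_with_pairs` and the sample data are hypothetical; the sketch only illustrates the remapping and the removal of consecutive duplicates, not Algorithm 5 itself.

```python
def contract_with_pairs(macrotig, pairs, introduced):
    """Contract a macrotig and remap the (f, g) index pairs reported on it.

    Returns the contracted macrotig and the remapped pairs, with consecutive
    duplicates removed.
    """
    keep = [a not in introduced for a in macrotig]
    # kept_before[i] = number of surviving arcs among macrotig[0..i-1]
    kept_before = [0]
    for k in keep:
        kept_before.append(kept_before[-1] + int(k))
    contracted = [a for a, k in zip(macrotig, keep) if k]

    new_pairs = []
    for i, j in pairs:
        lo = kept_before[i]            # first surviving arc at position >= i
        hi = kept_before[j + 1] - 1    # last surviving arc at position <= j
        if lo > hi:
            continue                   # nothing of macrotig[i..j] survives
        if not new_pairs or new_pairs[-1] != (lo, hi):
            new_pairs.append((lo, hi)) # duplicates can only be consecutive
    return contracted, new_pairs


# Hypothetical macrotig of the transformed graph; t1, t2 are introduced arcs.
X = ["a", "t1", "b", "t2", "c"]
PAIRS = [(0, 2), (0, 3), (1, 4)]
CONTRACTED, NEW_PAIRS = contract_with_pairs(X, PAIRS, {"t1", "t2"})
```

The whole pass is a single scan over the macrotig plus a single scan over the pairs, so its cost is linear in their combined size.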
Second, the transformations do not introduce bivalent arcs, thus F = F′. This also implies that the arcs introduced by the transformation appear either inside macrotigs, or inside univocal extensions U(·). Having the set of maximal macrotigs M and the new arc pairs (f, д) inside the maximal macrotigs in M, it now suffices to perform the univocal extensions U(·) inside the original graph G.

Fig. 4. A concrete example of the main notions of this paper. In Figures 4(b) to 4(d) walks have different colors for visual distinguishability.

Lemma 4.2. Let e be a join or a split arc. No omnitig can traverse e twice.
where b_1, ..., b_k are all the internal bivalent arcs of W, the following conditions hold: (a) the arcs b_1, ..., b_k are all cross-bivalent arcs, and (b)

(ii) Otherwise, W is of the form U(b), where b is a bivalent arc, and b does not belong to any macrotig.

ALGORITHM 4: Function IsOmnitigRightExtension
Function IsOmnitigRightExtension(G, f, д)
    Input: The compressed graph G. A join arc f and a split arc д such that there exists a walk f W д where f W is an omnitig.
    Output: Whether f W д is also an omnitig.
    S ← {д′ ∈ E(G) | t(д′) = t(д) and there is a path from h(д′) to h(f) in G ∖ f}
    return True if S = {д} and False otherwise

Corollary 5.3. Algorithm 4 is correct. Moreover, assuming that the graph has constant degree, we can preprocess it in O(m) time, so that Algorithm 4 runs in constant time.
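For intuition, the test performed by Algorithm 4 can be sketched in Python with a plain breadth-first search in place of the constant-time preprocessing of Corollary 5.3, so each call costs O(m) here. The function name, the arc-list encoding, and the toy graph are assumptions for this example. The sketch also skips f itself when collecting candidate arcs, on the reading that paths are taken in G ∖ f; this is an interpretation, not something the pseudocode states explicitly.

```python
from collections import defaultdict, deque


def is_omnitig_right_extension(arcs, f, g):
    """Naive version of the IsOmnitigRightExtension test.

    `arcs` is a list of (tail, head) pairs; f and g are arc indices.
    """
    succ = defaultdict(list)
    for i, (tail, head) in enumerate(arcs):
        if i != f:                       # adjacency lists of G minus the arc f
            succ[tail].append(head)
    target = arcs[f][1]                  # h(f)

    def reaches_target(src):
        # Breadth-first search; the empty path counts when src == target.
        seen, queue = {src}, deque([src])
        while queue:
            u = queue.popleft()
            if u == target:
                return True
            for w in succ[u]:
                if w not in seen:
                    seen.add(w)
                    queue.append(w)
        return False

    # Candidate arcs leave t(g); f is skipped (assumption, see lead-in).
    S = {i for i, (tail, head) in enumerate(arcs)
         if i != f and tail == arcs[g][0] and reaches_target(head)}
    return S == {g}


# Toy compressed graph (hypothetical); arcs are numbered 0..6 in list order.
ARCS = [("x", "v"), ("v", "v"), ("v", "y"), ("y", "x"),
        ("y", "w"), ("w", "x"), ("w", "w")]
EXTEND_WITH_LOOP = is_omnitig_right_extension(ARCS, 0, 1)   # arc 1 is v -> v
EXTEND_WITH_VY = is_omnitig_right_extension(ARCS, 0, 2)     # arc 2 is v -> y
```

With f the arc from x to v, only the self-loop at v can return to h(f) without using f, so the self-loop is the unique valid extension and the arc from v to y is rejected.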

Fig. 7. Left: A graph G made up of a single node and m ≥ 3 self-loops e_1, ..., e_m. Its m maximal omnitigs are e_1, ..., e_m. Right: The graph G′ obtained from G by applying Transformation 1 and its symmetric transformation; the nodes of G′ have in-degree and out-degree at most 2. Notice that the number of arcs of G′ is O(m). The m maximal omnitigs of G′ are of the form U(e_i) = e_1 ··· e_{i−1} e_i e_{i−1} ··· e_1 (for i ∈ {1, ..., m}). Notice that their total length is Θ(m²), thus one cannot enumerate all maximal omnitigs of G′ and convert these to maximal omnitigs of G. However, one can stop all univocal extensions of the arcs e_i when reaching arcs introduced by the transformations in G′; see Appendix A.3.

Observation 2. When e is a split-free or join-free arc, then c_e is a bijection when restricted to the closed (arc-covering) walks, or to the open walks of G whose first and last arc are different from e.
Lemma A.1. Let e be a join-free arc of G. A walk W′ of c_e(G) is an omnitig of c_e(G) if and only if there exists an omnitig W of G such that W′ = c_e(W).
Proof. Consider the shortest walk W of G such that W′ = c_e(W). Notice that the first and last arc of W are different from e. Moreover, W′ is an omnitig of c_e(G) if and only if W is an omnitig of G. Indeed, for every circular covering C of G it holds that C avoids W if and only if c_e(C) avoids W′.
Corollary A.2. Let e be a join-free arc of G. A walk W′ of c_e(G) is a maximal omnitig of c_e(G) if and only if there exists a maximal omnitig W of G such that W′ = c_e(W).
induce two trees with common root v, but oriented in opposite directions. Except for the common root, the two trees are node-disjoint, all nodes in R⁻(v) being join nodes and all nodes in R⁺(v) being split nodes.
(iii) The only arcs with endpoints in two different macronodes are bivalent arcs.