Generalized Core Spanner Inexpressibility via Ehrenfeucht-Fraïssé Games for FC

Despite considerable research on document spanners, little is known about the expressive power of generalized core spanners. In this paper, we use Ehrenfeucht-Fraïssé games to obtain general inexpressibility lemmas for the logic FC (a finite-model variant of the theory of concatenation). Applying these lemmas gives inexpressibility results for FC that we lift to generalized core spanners. In particular, we give several relations that cannot be selected by generalized core spanners, thus demonstrating the effectiveness of the inexpressibility lemmas. As an immediate consequence, we also gain new insights into the expressive power of core spanners.


INTRODUCTION
Ehrenfeucht-Fraïssé games are a fundamental tool for establishing inexpressibility results in finite-model theory. In this paper, we develop techniques to use Ehrenfeucht-Fraïssé games for the logic FC and use these to gain insights into the expressive power of generalized core spanners.
Document Spanners. SystemT is a rule-based information extraction system, developed by IBM, which includes a declarative query language called Annotation Query Language (AQL); see [2,18,19]. Fagin, Kimelfeld, Reiss, and Vansummeren [5] developed a formal framework for information extraction called document spanners (or just spanners), which captures the core functionality of AQL.
One can define the process of querying a word with a spanner as a two-step process. First, so-called extractors obtain relations of intervals (or spans) from the text. For the purposes of this introduction, we assume these extractors to be regex formulas, which are regular expressions with capture variables. For example, if someone wanted to find all occurrences of common misspellings in a text document, they could consider a regex formula whose capture variable matches any one of the misspellings. Then, this regex formula extracts a unary relation of spans within the given text document where one of the misspellings occurs.
Secondly, the extracted relations are combined using a relational algebra. This paper is concerned with the class of generalized core spanners, which allow for ∪ (union), π (projection), ⊲⊳ (natural join), \ (difference), and ζ^= (equality selection). While union, projection, natural join, and difference are defined as one might expect (assuming prior knowledge of relational algebra), equality selection is more text specific. The equality selection ζ^=_{x,y} only chooses those tuples for which x and y are mapped to spans that represent the same factor of the input word (at potentially different locations).
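The two-step process can be illustrated with a minimal Python sketch. It is only an approximation under stated assumptions: Python's `re` module stands in for regex formulas, spans are 0-based half-open index pairs (not the span notation of [5]), and the helper names `extract_spans` and `equality_select` are hypothetical.

```python
import re
from itertools import product

def extract_spans(pattern, doc):
    """Step 1 (extraction): spans (start, end) of non-overlapping matches."""
    return [m.span() for m in re.finditer(pattern, doc)]

def equality_select(doc, pairs):
    """Step 2 (equality selection): keep pairs of spans whose contents
    are the same factor of doc, possibly at different positions."""
    return [(s, t) for (s, t) in pairs if doc[s[0]:s[1]] == doc[t[0]:t[1]]]

doc = "abbcab"
spans = extract_spans(r"[a-c]{2}", doc)          # spans of "ab", "bc", "ab"
pairs = list(product(spans, repeat=2))            # Cartesian product of the relation with itself
selected = equality_select(doc, pairs)
```

Here the spans (0, 2) and (4, 6) are distinct but both represent the factor ab, so the pair survives the selection, whereas (0, 2) and (2, 4) do not represent the same factor.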
A word relation R ⊆ (Σ*)^k, for some k ∈ N, is selectable by generalized core spanners if we can add an R-selection operator (e.g., ζ^R) to our relational algebra without increasing the expressive power. One fundamental question regarding generalized core spanners is: what relations are selectable? In other words, what is the expressive power of generalized core spanners? For example, from Theorem 5.14 of [9], we know that length equality ζ^len_{x,y} (i.e., only keep those tuples for which the span that x is mapped to has the same length as the span that y is mapped to) cannot be expressed by generalized core spanners.
Ehrenfeucht-Fraïssé Games and FC. For relational first-order logic (and therefore relational algebra), one of the fundamental tools for proving inexpressibility is Ehrenfeucht-Fraïssé games; for example, see [3,15,20]. To use Ehrenfeucht-Fraïssé games in order to yield inexpressibility results for generalized core spanners, we need a suitable logic. For this task, we use FC, a logic that was introduced by Freydenberger and Peterfreund [9] as a finite-model variant of the theory of concatenation. For this paper, we define this logic slightly differently from [9]. First, we treat every word w ∈ Σ* as a relational structure consisting of a universe that contains all factors of w, a ternary concatenation relation • where (x, y, z) ∈ • if and only if x, y, and z are all factors of w where x = y • z, and constant symbols for every a ∈ Σ and for the empty word ε. Then, FC is simply a first-order logic over these structures. As syntactic sugar, we use (x = y • z) rather than •(x, y, z) for atomic FC-formulas. As an example, consider the formula := ∀ : ¬( = ) → ¬∃ , : This formula states, that for all factors where ≠ does not hold (all non-empty factors), we have that there does not exist a factor where = . In other words, defines those words ∈ Σ * that do not contain where is a non-empty word. If we consider Ehrenfeucht-Fraïssé games over two structures that each represent a word, we can gain inexpressibility results for FC. Then, using techniques from combinatorics on words (namely, commutation), we are able to lift these inexpressibility results to generalized core spanners.
Related Work. Fagin et al. [5] introduce three main classes of document spanners. The regular spanners (regex formulas extended with projection, union, and natural join) are perhaps the most intensively studied class, with a sizeable amount of research on enumeration algorithms [1,6,27]. When it comes to expressive power, Fagin et al. [5] showed that regular spanners cannot go beyond recognizable relations. Core spanners extend regular spanners with equality selection, allowing for more expressive power at the cost of tractability, see e.g. [7][8][9]. With regard to their expressive power, [5] gave the core-simplification lemma, allowing one to simplify core spanners into a normal form in order to study their expressive power. Freydenberger [7] uses inexpressibility techniques from word equations (namely, results in Karhumäki, Mignosi, and Plandowski [17]) to obtain an inexpressibility lemma for core spanners, which yields relations that cannot be selected by core spanners.
When it comes to generalized core spanners, the techniques from core spanners do not work. As far as the authors are aware, only two inexpressibility results for generalized core spanners exist in the literature: First, Peterfreund, ten Cate, Fagin, and Kimelfeld [24] showed that over a unary alphabet, only semi-linear languages can be expressed by generalized core spanners. Then, Freydenberger and Peterfreund [9] used the Feferman-Vaught theorem to show that the language containing those words := a b for ∈ N is not expressible by generalized core spanners. However, the use of the Feferman-Vaught theorem in [9] relies on the limited structure of the language a b , and it is unclear how to generalize this proof technique. Note that from the data complexity of model checking FC-formulas, we know that it cannot express languages outside of LOGSPACE, see [9]. While this observation might not be considered particularly enlightening from an inexpressibility point of view, we state it for the sake of completeness.
The expressive power of other classes of spanners (that is, apart from regular spanners, core spanners, and generalized core spanners) has also been considered. For example, see [23,24,26,29].
Issues with Standard Techniques. Beyond toy examples, Ehrenfeucht-Fraïssé games are difficult to play, and usually require rather involved combinatorial arguments. Therefore, many techniques have been developed in order to obtain sufficient criteria that give Duplicator a winning strategy (see [4] for a survey). For example, locality is often used to show that Duplicator has a winning strategy. Unfortunately, these locality results often fail for non-sparse structures. The structures we look at in the present paper are very non-sparse, and therefore we cannot simply apply locality results in order to gain inexpressibility tools.
Another useful tool is the Feferman-Vaught theorem. Freydenberger and Peterfreund [9] used this result to prove that a b is not an FC-language, which then implies that length selection is unattainable for generalized core spanners. To this end, [9] defines FO[EQ], which extends first-order logic over a linear order with symbol predicates with a built-in equality relation. Then, we can decompose these structures that represent words of the form a b into two substructures (one for a and one for b ). Finally, the Feferman-Vaught theorem can be invoked to prove that a b is not an FC language. However, this proof idea does not generalize beyond words which can be decomposed into disjoint sections (at least not without a sizeable amount of extra machinery).
Structure of the Paper. Section 2 gives notational conventions and definitions that are used throughout this article. In Section 3, we define Ehrenfeucht-Fraïssé games for FC, and give some basic results. The main technical contributions are given in Section 4, where we work towards the Fooling Lemma for FC, a general tool for showing that a language cannot be expressed in FC. On the way to the Fooling Lemma, we give two important lemmas called the Pseudo-Congruence Lemma and the Primitive Power Lemma. We then use the Fooling Lemma in Section 5 to obtain various relations that are not selectable by generalized core spanners. Due to space constraints, most of the proofs can be found in the appendix.

PRELIMINARIES
Let N := {0, 1, 2, ...} and let N+ := N \ {0}, where \ denotes set difference. For n ≥ 1, we use [n] for {1, 2, ..., n}. The cardinality of a set S is denoted by |S|. For a vector v ∈ A^k for some set A and k ∈ N, we write a ∈ v to denote that a is a component of v. We use Σ for a fixed and finite alphabet of terminal symbols and we use Ξ for a countably infinite set of variables.
If w = w_1 • w_2 • w_3 where w, w_1, w_2, w_3 ∈ Σ*, then we call w_1 a prefix of w, w_2 a factor of w, and w_3 a suffix of w. If w_1 ≠ w, then w_1 is a strict prefix of w; and likewise, if w_3 ≠ w, then w_3 is a strict suffix of w. We denote that w_2 is a factor of w by w_2 ⊑ w and if w_2 ≠ w also holds, then w_2 ⊏ w. For w ∈ Σ*, we write the set of all factors of w as Facs(w) := {u ∈ Σ* | u ⊑ w}. We use |w| to denote the length of a word w ∈ Σ*, and for some a ∈ Σ, we use |w|_a to denote the number of occurrences of a within w. For w ∈ Σ*, let w^n ∈ Σ* denote the word that consists of n repetitions of w. We always assume w^0 = ε, where ε is the empty word.
A word w ∈ Σ+ is called imprimitive if w = u^n for some u ∈ Σ* and n > 1. We always assume ε to be imprimitive. If w ∈ Σ+ is not imprimitive, then w is primitive. That is, w ∈ Σ+ is primitive if for all u ∈ Σ+, we have that w = u^n implies w = u.
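As a small illustration of these notions, the following Python sketch computes the set of factors of a word and tests primitivity by brute force. The function names are hypothetical, and the sketch assumes the reading that a word is imprimitive exactly when it is a proper power of a shorter word, with the empty word treated as imprimitive.

```python
def factors(w):
    """Facs(w): all contiguous subwords of w, including the empty word."""
    return {w[i:j] for i in range(len(w) + 1) for j in range(i, len(w) + 1)}

def is_primitive(w):
    """True if w is non-empty and not of the form u^n with n > 1."""
    if not w:
        return False  # assumption: the empty word counts as imprimitive
    n = len(w)
    # w is imprimitive iff some proper prefix of length d | n generates w
    return not any(n % d == 0 and w[:d] * (n // d) == w for d in range(1, n))
```

For instance, `factors("ab")` contains the four words "", "a", "b", and "ab", while "abab" is imprimitive because it equals (ab)^2.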
The logic FC. We introduce a definition of FC that is slightly more technical than the original one from [9], and more suitable for defining games. The differences, and why they do not affect the expressive power, are discussed at the end of this section.
For the purposes of this paper, we consider one fixed signature Σ := { • , a 1 , . . ., a , } for every terminal alphabet Σ := {a 1 , . . ., a }, where • is a ternary relation symbol and where and a for every ∈ [ ] are constant symbols. Given a word ∈ Σ * , let := ( , • , a 1 , . . ., a , ) be the Σ -structure that represents ∈ Σ * as follows: , and a =⊥ otherwise, and One can think of ⊥ as a null value; however, we usually deal with those words where | | a ≥ 1 for all a ∈ Σ. Therefore, due to the nature of the constant symbols used in Σ , we often do not distinguish between a constant symbol a ∈ Σ , and the terminal symbol a ∈ Σ. In other words, when the structure is clear from context, we use a ∈ Σ rather than a ∈ . An FC-formula is a first-order formula, where the atomic formulas are of the form • ( , , ) for variables or constants , , and . As syntactic sugar, we use ( = • ) as FC atomic formulas, as we always interpret • as concatenation. More formally:
Definition 2.1. We define FC, the set of all FC-formulas, recursively as: We use the usual first-order logic definition of bound variables and free variables. If is a sentence (that is, there are no free variables in ), then we simply write |= . Furthermore, for any ∈ Σ * , there is a unique Σ -structure that represents . We can therefore define the language of an FC-sentence.

We write ( ) to denote the set of all mappings such that ( , ) |= , where is the Σ -structure that represents ∈ Σ * . Note that for simplicity, we assume that for any ∈ ( ), the domain of is exactly the free variables of .
For a word relation ⊆ (Σ * ) , we say that is definable in FC if there exists ∈ FC with free variables 1 , 2 , . . ., such that for any ∈ Σ * , we have ∈ ( ) if and only if We say that such a formula defines .
Example 2.4. Consider the following FC-formula: This formula states that there is no factor that is a concatenation of ( ) and a non-empty word ( 2 ). Thus, for ∈ Σ * , it must be that ( ) = { } where ( ) = . Note that we use We can use ( ) to define the following sentence: It follows that |= if and only if is a Σ -structure that represents for some ∈ Σ * . Now consider ( , ) := ( = ). It follows that defines the relation copy := {( , ) ∈ (Σ * ) 2 | = }. Since we know that copy can be defined in FC, we can use it as an additional atomic formula without changing the expressive power of FC. Furthermore, it is rather straightforward to generalize this relation to the FC-definable relation -copies := {( , ) ∈ (Σ * ) 2 | = }.
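The idea of the example, namely that the whole input word is the only factor that cannot be extended by a non-empty word on either side, can be checked by brute force. The following Python sketch is an illustration only; it assumes that the formula is read as "no factor z equals x • y or y • x for a non-empty factor y", and the function names are hypothetical.

```python
def factors(w):
    """Facs(w): the universe of the structure that represents w."""
    return {w[i:j] for i in range(len(w) + 1) for j in range(i, len(w) + 1)}

def whole_word_candidates(w):
    """Factors x of w such that no factor of w has the form x.y or y.x
    for a non-empty factor y (quantifiers range over Facs(w) only)."""
    facs = factors(w)
    return [x for x in facs
            if not any(z in facs
                       for y in facs if y          # y must be non-empty
                       for z in (x + y, y + x))]   # extend x on the right or left
```

Under this reading, `whole_word_candidates(w)` returns a list containing only w itself, which mirrors how a subformula can simulate a variable for the input word.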
Before moving on, we first make a brief note on the difference between the definition of FC in this paper, and the definition given in [9]. First, the definition of FC given in [9] allows for an arbitrarily large right-hand side. That is, it allows atomic formulas of the form = where ∈ Ξ and ∈ (Σ ∪ Ξ) * , whereas we use ( = • ) as atomic formulas, where , , ∈ Ξ ∪ Σ ∪ { }. It is clear that an arbitrarily large right-hand side is shorthand for a binary concatenation term (for example, see Freydenberger and Thompson [10]). The reason behind our choice is that we have a finite signature Σ which more closely aligns with "traditional" finite-model theory. The second difference is that we do not use a universe variable to denote the "input word" (see [9] for more details). However, referring back to Example 2.4, we can simply use a subformula to simulate the behaviour of . While there is also a small difference with the semantic definitions (such as the use of ⊥), these differences are negligible and do not change the expressive power.

EHRENFEUCHT-FRAÏSSÉ GAMES FOR FC.
In this section, we discuss the use of Ehrenfeucht-Fraïssé games for FC. Although broader definitions of the concepts defined here exist (see e.g. [3,15,20]), we tailor our definitions to the logic FC.
(2) If Duplicator responds in round one with 1 ≠ a 2 −1 , then Spoiler chooses the structure and 2 = 1 • a. Duplicator must respond with some factor 2 , however a factor of the form 1 • a does not exist. For either of the cases given in round two, we have that ( 1 , 2 , a, ) and ( 1 , 2 , a, ) do not form a partial isomorphism and thus, Spoiler has a winning strategy for the two-round game. Consequently, a 2 2 a 2 −1 for any ∈ N + .
Note that the set of all Σ -structures that represent ∈ Σ * is simply a subset of all structures over a signature that contains a ternary relational symbol, and |Σ| + 1 constant symbols. Therefore, we can state an FC-version of the Ehrenfeucht-Fraïssé theorem (see e.g. [3,15,20] for the general version).
Theorem 3.3. Let , be Σ -structures that represent ∈ Σ * and ∈ Σ * respectively. Then, for any ∈ N, the following are equivalent: (1) |= if and only if |= for all ∈ FC( ).
Thus, Theorem 3.3 is useful for proving inexpressibility for first-order logic over finite models. A slight reformulation of Theorem 3.3 yields the following observation:
Lemma 3.4. Let ⊆ Σ * . If for every ∈ N, there exist ∈ and ∉ where ≡ , then is not an FC-language.
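For very short words, the game can be decided by exhaustive search, which may help build intuition for the equivalence used above. The following Python sketch is illustrative and rests on several assumptions: the universe is Facs(w), the chosen tuples are extended by the constants (one per letter, then the empty word), a letter that does not occur in the word is interpreted as a single null value (modelled as `None`), and a partial isomorphism must preserve equality and the ternary concatenation relation. All function names are hypothetical.

```python
from itertools import product

def factors(w):
    return sorted({w[i:j] for i in range(len(w) + 1) for j in range(i, len(w) + 1)})

def constants(w, alphabet):
    """Interpretation of the constant symbols: letters, then the empty word."""
    return tuple(c if c in w else None for c in alphabet) + ("",)

def concat(x, y, z):
    """The ternary relation: x = y.z, never satisfied by the null value."""
    return None not in (x, y, z) and x == y + z

def is_partial_iso(a, b):
    n = len(a)
    if any((a[i] == a[j]) != (b[i] == b[j]) for i, j in product(range(n), repeat=2)):
        return False
    return all(concat(a[i], a[j], a[l]) == concat(b[i], b[j], b[l])
               for i, j, l in product(range(n), repeat=3))

def duplicator_wins(w, v, k, alphabet="ab", a=None, b=None):
    """True if Duplicator has a winning strategy in the k-round game over w and v."""
    if a is None:
        a, b = constants(w, alphabet), constants(v, alphabet)
    if not is_partial_iso(a, b):
        return False
    if k == 0:
        return True
    fw, fv = factors(w), factors(v)
    # Spoiler may move in either structure; Duplicator answers in the other.
    in_w = all(any(duplicator_wins(w, v, k - 1, alphabet, a + (x,), b + (y,)) for y in fv)
               for x in fw)
    in_v = all(any(duplicator_wins(w, v, k - 1, alphabet, a + (x,), b + (y,)) for x in fw)
               for y in fv)
    return in_w and in_v
```

Over the unary alphabet {a}, this search reports that Duplicator wins the one-round game over aaa and aaaa but loses the two-round game: Spoiler first picks aaaa and then aa, and no factor y of aaa with y • y equal to Duplicator's first response can avoid clashing with the constants. The search is exponential in the number of rounds, so it is only usable on toy inputs.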
For our next step, we rely on an inexpressibility technique for languages over unary alphabets. Over a unary alphabet {a}, we can identify every unary word a with the natural number , which allows us to treat languages ⊆ {a} * as sets of natural numbers.
A set S ⊆ N is linear if there exist an r ≥ 0 and m_0, ..., m_r ∈ N such that S = {m_0 + m_1 i_1 + ... + m_r i_r | i_1, ..., i_r ∈ N}. A set is semi-linear if it is a finite union of linear sets, and we call a unary language semi-linear if the corresponding set is semi-linear.
As 2^n grows faster than any linear function, no finite union of linear sets can define {2^n | n ∈ N}. Hence, pow := {a^(2^n) | n ∈ N} is not semi-linear and, therefore, is not expressible in FC. From this, we almost directly infer the following:
Lemma 3.5. For every k ∈ N, there exist p, q ∈ N such that a^p ≡_k a^q where p ≠ q.
Lemma 3.5 is one of our main building blocks for FC inexpressibility. That is, we shall utilize Duplicator's winning strategy for the k-round game over structures that represent a^p and a^q in order to construct winning strategies for more general structures.
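The growth argument can be made concrete with a short Python sketch. It assumes the standard definition of a linear set as {m_0 + m_1 i_1 + ... + m_r i_r} with natural-number coefficients, and it compares the largest gap between consecutive members: for an infinite linear set the gap never exceeds any positive period, whereas for the powers of two it grows without bound. The function names and sample parameters are hypothetical.

```python
def linear_set(m0, periods, bound):
    """Members <= bound of {m0 + sum(m_i * k_i)} for natural numbers k_i."""
    reachable = {m0} if m0 <= bound else set()
    frontier = list(reachable)
    while frontier:
        n = frontier.pop()
        for m in periods:
            if m > 0 and n + m <= bound and n + m not in reachable:
                reachable.add(n + m)
                frontier.append(n + m)
    return reachable

def max_gap(values):
    """Largest difference between consecutive members (0 if fewer than two)."""
    ordered = sorted(values)
    return max((b - a for a, b in zip(ordered, ordered[1:])), default=0)

linear_gap = max_gap(linear_set(3, [4, 6], 100))       # gaps stay bounded
powers_gap = max_gap({2 ** n for n in range(11)})      # gaps double each step
```

With these sample parameters the linear set has maximal gap 4, while the powers of two up to 1024 already exhibit a gap of 512.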
Proposition 3.6 rules out what would be a convenient technique for strategy composition.Despite this, in Section 4.1 we consider a special case where ≡ can be used as if it were a congruence relation.

FC INEXPRESSIBILITY
Our next goal is using Ehrenfeucht-Fraïssé games in order to yield inexpressibility results for FC. To do so, we heavily utilize strategy composition. That is, we "bootstrap" known winning strategies for Duplicator (gathered from inexpressibility results such as Lemma 3.5) and construct more general winning strategies for Duplicator. The main technical contribution of this section is the Fooling Lemma for FC that is derived from the Pseudo-Congruence Lemma and the Primitive Power Lemma. Section 4.1 contains the Pseudo-Congruence Lemma (Lemma 4.4), and Section 4.2 contains the Primitive Power Lemma (Lemma 4.9). Both these sections also contain many useful lemmas on the way to proving the respective result. Before looking at these lemmas, we first demonstrate the capabilities of FC by giving a language that (somewhat surprisingly) is expressible in FC.
For every n ∈ N, we define F_n ∈ {a, b}* recursively as follows: F_0 := a, F_1 := ab, and F_n := F_{n−1} • F_{n−2} for all n ≥ 2. The Fibonacci word F_ω is the limit (see [22] for details on F_ω and its properties). We show that the language fib containing those words c F_0 c F_1 c • • • c F_n c for c ∈ Σ and for all n ∈ N is expressible in FC. While the definition suggests that expressing fib requires a logic that allows recursion, we can use the universal quantifier to simulate recursion in certain cases. As a curious aside, this also shows that FC does not have a pumping lemma in the sense that for sufficiently long words, some non-empty factor can be repeated an arbitrary number of times without falling out of the language. This is due to the fact that F_ω does not contain any factor of the form u^4 with u ≠ ε, see Karhumäki [16].
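The recursion and the absence of fourth powers can be checked experimentally. The following Python sketch assumes the reading F_0 = a, F_1 = ab, F_n = F_{n-1} F_{n-2} of the definition above and uses a separator letter c that is distinct from a and b; the function names are hypothetical, and the brute-force power test is only meant for short words.

```python
def fib_word(n):
    """The n-th finite Fibonacci word over {a, b}."""
    prev, cur = "a", "ab"          # F_0 and F_1
    if n == 0:
        return prev
    for _ in range(n - 1):
        prev, cur = cur, cur + prev  # F_i = F_{i-1} . F_{i-2}
    return cur

def fib_language_word(n, sep="c"):
    """The word c F_0 c F_1 c ... c F_n c."""
    return sep + sep.join(fib_word(i) for i in range(n + 1)) + sep

def has_power(w, e):
    """True if w contains a factor u^e for some non-empty u."""
    n = len(w)
    for i in range(n):
        for p in range(1, (n - i) // e + 1):
            if w[i:i + p] * e == w[i:i + p * e]:
                return True
    return False
```

For example, `fib_word(4)` is abaababa, and `has_power(fib_word(12), 4)` is false, in line with the cited result on fourth powers; cubes such as (aba)^3 do occur, so the exponent cannot be lowered to 3.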

The Pseudo-Congruence Lemma
In this section, we first give some necessary lemmas, and then give the Pseudo-Congruence Lemma along with a proof idea. To conclude this section, we consider some consequences.
First, we consider the case where Spoiler has chosen a factor that is so short (with respect to the number of remaining rounds) that Duplicator must respond with the identical factor or lose.and ì = ( 1 , 2 , . . ., +|Σ|+1 ) be the tuple resulting from a -round game over and where Duplicator plays their winning strategy.
Next, we show that for rounds with ≤ − 2, if Spoiler picks a prefix (or a suffix), then Duplicator must respond with a prefix (or suffix, respectively); assuming they play a winning strategy.and ì = ( 1 , 2 , . . ., +|Σ|+1 ) be the tuple resulting from a -round game over and where Duplicator plays their winning strategy.For any ≤ − 2, we have that is a suffix (or prefix) of if and only if is a suffix (or prefix respectively) of .
While we mainly use Lemma 4.2 and Lemma 4.3 to prove the Pseudo-Congruence Lemma, they also provide some insights into necessary conditions of Duplicator's strategy.
In some of the subsequent proofs, we use strategy compositions that require quite technical proofs of correctness. To avoid handwaving, we shall sometimes define Duplicator's strategy using what we call look-up games. If Spoiler and Duplicator are playing a -round game G over and , a look-up game is a ′-round auxiliary game G ′ over two (potentially different) structures ℭ and , where ℭ ≡ ′ . The idea is that in the -th round in G, each move by Spoiler corresponds to a move by Spoiler in G ′ . Then, Duplicator "looks up" what their response would be in G ′ to form their response to Spoiler in G. Generalizing this idea, Duplicator can use multiple look-up games G 1 , G 2 , . . ., G to form their response in G. In other words, if Duplicator has winning strategies in G 1 , G 2 , . . ., G , then Duplicator has a winning strategy in G.
While ≡ is not a congruence relation in general (see Proposition 3.6), we identify a special case where we can use ≡ as if it were a congruence relation.

Proof Sketch. We consider two look-up games G 1 and G 2 . For ∈ {1, 2}, the game G is a + + 2 round game over and . We assume Duplicator plays G 1 and G 2 using their winning strategy. We use these look-up games to give Duplicator's strategy for the -round game G over := 1 • 2 and := 1 • 2 .
The proof that this composition of strategies is a winning strategy for Duplicator is somewhat tedious. However, in this sketch, we informally explain how this strategy is indeed a strategy for Duplicator over and . That is, Duplicator's strategy is well-defined and Duplicator always chooses a factor of or .
The extra + 2 rounds for G 1 and G 2 are to ensure that Duplicator makes certain choices. If in G Spoiler chooses some ∈ Facs( 1 ) ∩ Facs( 2 ) for some round ≤ , then | | ≤ holds (from the lemma statement). Therefore, in both G 1 and G 2 , Duplicator responds with , see Lemma 4.2. The same holds when Spoiler chooses some ∈ Facs( 1 ) ∩ Facs( 2 ). Therefore, Duplicator's choice is well-defined for the case where Spoiler chooses some ∈ Facs( 1 ) ∩ Facs( 2 ) due to the fact that Duplicator's responses in G 1 and G 2 coincide. The same holds when Spoiler chooses some factor in Facs( 1 ) ∩ Facs( 2 ).
Let us consider the case where for round ≤ in G, Spoiler chooses some ∈ Facs( ) \ (Facs( 1 ) ∪ Facs( 2 )). Observing Fig. 1, we can split into the suffix 1 of 1 and the prefix 2 of 2 . Thus, due to Lemma 4. We have given a description of Duplicator's strategy, and informally sketched a proof that this strategy is well-defined.
Freydenberger and Peterfreund [9] showed that the language {a • b | ∈ N} is not expressible in FC. Their proof uses an alternative logic FO[EQ] with equivalent expressive power, for which the Feferman-Vaught theorem can be invoked. As this approach heavily relies on the restricted nature of words a • b , it seems to be very difficult to generalize. Furthermore, even for FO[EQ], the Feferman-Vaught theorem needs some additional reasoning. Compare this to the following alternative proof.
Example 4.5. From Lemma 3.5, we know that for every ∈ N, there exist , ∈ N where a ≡ +2 a and, without loss of generality, < . Trivially, we have that b ≡ +2 b for all ∈ N. Furthermore, Facs(a ) ∩ Facs(b ) = { } for any , ∈ N. We can now invoke Lemma 4.4, with = 0, to state that for any ∈ N, there exists In addition to giving an alternative proof of an existing result, we can also use Lemma 4.4 to prove new inexpressibility results.

Proof. To prove this result, we show that for every ∈ N, there exist , ∈ N where < and a (ba) ≡ a (ba) .
Observing Proposition 4.6, we see that the Pseudo-Congruence Lemma by itself is a useful tool for inexpressibility.

The Primitive Power Lemma
In this section, we look at the Primitive Power Lemma. Before giving this result, we first consider a lemma for words that are primitive (that is, not a power of a shorter word).
For any ∈ Σ + , let exp : Σ + → N be a function that maps ∈ Σ + to ∈ N if ⊑ and there does not exist ′ > such that ′ ⊑ . Less formally, exp ( ) is the maximum value ∈ N such that is a factor of . The following is due to some basic results on primitive words.
where exp ( ) > 0, then there is a unique proper suffix 1 ∈ Σ * of and a unique proper prefix 2 ∈ Σ * of , such that
To prove this lemma, we give a winning strategy for Duplicator for the -round game G over and . This strategy is defined based on the + 3-round look-up game G over a and a , where a ≡ +3 a .
For each round ≤ in G, if Spoiler chooses where exp ( ) = , then we let Spoiler choose a in round of G in the corresponding word. Duplicator's response in G is some a . Then, we form Duplicator's response to Spoiler in G as follows:
• If in G , Duplicator responds with , then let Duplicator's response in G be the same factor that Spoiler chose. This is because it can be shown that Spoiler chose some where exp ( ) = 0.
• If in G , Duplicator responds with a where ∈ N + , then we can show that in G, Spoiler chose = 1 • • 2 where 1 and 2 are a unique suffix and prefix of (see Lemma 4.8). We define Duplicator's response in G as 1 • • 2 .
See Fig. 2 for an informal illustration of Duplicator's strategy.
In the full proof, we show that this strategy is indeed a winning strategy for Duplicator by utilizing several lemmas regarding primitive words (such as Lemma 4.8).
For any ∈ Σ + , the primitive root of is the unique primitive word ∈ Σ * such that = for some ∈ N. Therefore, almost immediately from Lemma 4.9, we are able to show:
Proposition 4.10. For any ∈ Σ + and any ∈ N, there exist ∈ N + and ∈ Σ * such that ≡ where ≠ .
As an immediate consequence of Lemma 4.9, it follows that for any primitive word ∈ Σ + , the language 2 is not expressible in FC.While this may seem somewhat restrictive, due to the fact that this result holds for primitive words, any non-primitive word is a repetition of primitive words, hence Proposition 4.10.
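Both notions used in this section are easy to compute on concrete words. The following Python sketch is an illustration under stated assumptions: `primitive_root` returns the shortest word u with w = u^n, and `exp` is read as the largest n such that w^n is a factor of u, as in the informal description above. The function names and argument order are hypothetical.

```python
def primitive_root(w):
    """Shortest u such that w = u^n for some n >= 1 (w must be non-empty)."""
    if not w:
        raise ValueError("the primitive root is only defined for non-empty words")
    n = len(w)
    for d in range(1, n + 1):
        if n % d == 0 and w[:d] * (n // d) == w:
            return w[:d]

def exp(w, u):
    """Largest n such that w^n is a factor of u (0 if w does not occur in u)."""
    if not w:
        raise ValueError("w must be non-empty")
    n = 0
    while w * (n + 1) in u:
        n += 1
    return n
```

For example, the primitive root of abab is ab, and the largest power of ab occurring in aababab is (ab)^3.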

The Fooling Lemma for FC
The Fooling Lemma for FC combines the Pseudo-Congruence Lemma with the Primitive Power Lemma. In order to give a convenient formulation of this result, we first look at co-primitive words: For two words , ∈ Σ + , we say that and are conjugate if there exist , ∈ Σ * such that = • and = • . If and are not conjugate, and both and are primitive words, then we say that and are co-primitive. We now state the periodicity lemma. For our purposes, we use the formulation of this result given in [14]. Further information regarding the periodicity lemma can be found in [25]. The periodicity lemma and some further reasoning gives us:
Lemma 4.13. For any primitive words , ∈ Σ * , the following are equivalent:
(1) and are co-primitive,
(2) there exists 0 , 0 ∈ N such that Facs( 0 ) ∩ Facs( 0 ) = Facs( ) ∩ Facs( ) for all > 0 and all > 0 , and
(3) there exists ∈ N such that ≥ max{| | ∈ N | ∈ Facs( ) ∩ Facs( )} for all , ∈ N.
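The following Python sketch illustrates conjugacy, co-primitivity, and the bounded-overlap behaviour described by the lemma on a small instance. It relies on the standard fact that two words of equal length are conjugate exactly when one occurs as a factor of the square of the other; the function names and the sample words ab and abb are hypothetical choices.

```python
def is_primitive(w):
    n = len(w)
    return n > 0 and not any(n % d == 0 and w[:d] * (n // d) == w for d in range(1, n))

def are_conjugate(u, v):
    """u = x.y and v = y.x for some x, y, i.e. v is a rotation of u."""
    return len(u) == len(v) and v in u + u

def are_coprimitive(u, v):
    return is_primitive(u) and is_primitive(v) and not are_conjugate(u, v)

def longest_common_factor_len(x, y):
    """Length of the longest word that is a factor of both x and y."""
    best = 0
    for i in range(len(x)):
        # Only try to beat the current best; stop at the first failure,
        # since every extension of a non-factor is also a non-factor.
        for j in range(i + best + 1, len(x) + 1):
            if x[i:j] in y:
                best = j - i
            else:
                break
    return best
```

For the co-primitive words ab and abb, the longest common factor of (ab)^5 and (abb)^5 is bab, of length 3, and the same bound holds for the tenth powers, which is the kind of stabilisation that the lemma guarantees.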
We are almost ready to give the Fooling Lemma for FC. Before doing so, we note that the Fooling Lemma is simply one possible way of combining the results we have given so far, and there could be a more general inexpressibility lemma for FC that uses only the results from this paper. However, for our purposes, Lemma 4.14 is enough.
Lemma 4.14 (Fooling Lemma). Let 1 , 2 , 3 ∈ Σ * , let , ∈ Σ + be two co-primitive words, let ∈ FC, and let : N → N be an injective function.
We use the Primitive Power Lemma along with the Pseudo-Congruence Lemma to determine that for every ∈ N there exists • 2 where ≠ . Then, we again invoke the Pseudo-Congruence Lemma (using Lemma 4.13 to ensure that the required prerequisites hold) to conclude that for every ∈ N, there exist , ∈ N where ≠ and 1 Immediately from Lemma 4.14, we infer the following:
Proposition 4.15. Let 1 , 2 , 3 ∈ Σ * , and let , ∈ Σ + be two co-primitive words. Then, for any injective function :
So far, we have developed several inexpressibility techniques. Next, we look at using these tools to show that the following languages are not FC languages.
Lemma 4.16. The following languages are not expressible in FC:
The authors note that it seems at least possible to show that 3 and 4 in Lemma 4.16 are not FC languages by using the Feferman-Vaught Theorem and the logic FO[EQ] (see [9] for more details). However, in order to do so, a sizeable amount of work would be needed. In comparison, using the techniques developed in this work, it is rather straightforward to show 3 , 4 ∉ L(FC).
While Lemma 4.16 seems somewhat "language theoretic", in Theorem 5.5 we shall use Lemma 4.16 to show that several relations are not definable in FC[REG].

GENERALIZED CORE SPANNER INEXPRESSIBILITY
Freydenberger and Peterfreund [9] proved that FC, extended with so-called regular constraints, captures the expressive power of generalized core spanners. That is, a word relation ⊆ (Σ * ) is definable in FC[REG] if and only if is selectable by generalized core spanners. We do not give the definition of generalized core spanners here, since we are able to simply use FC[REG] instead. See [5] for the formal definitions of generalized core spanners.
Definition 5.1. A regular constraint is an atomic formula ( ∈ ) where ∈ Ξ and is a regular expression. Let FC[REG] be the logic that extends FC with regular constraints. The semantics are as follows: For a regular constraint ( ∈ ), and an FC interpretation I := ( , ), where is a Σ -structure that represents ∈ Σ * , we write I |= ( ∈ ) if ( ) ⊑ , and ( ) ∈ L( ).
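The semantics of a regular constraint can be illustrated with a short Python sketch. It assumes that Python's `re` syntax is used only for its classical regular-expression fragment, and the function name and parameters are hypothetical: `w` is the represented word, `value` is the word assigned to the variable, and `regex` is the regular expression of the constraint.

```python
import re

def satisfies_regular_constraint(w, value, regex):
    """The constraint holds if the assigned word is a factor of w
    and belongs to the language of the regular expression."""
    is_factor = value in w
    in_language = re.fullmatch(regex, value) is not None
    return is_factor and in_language
```

For instance, over the word abba, the assignment bb satisfies the constraint with expression b+, whereas bbb fails because it is not a factor of abba, and ab fails because it is not in the language.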
Unfortunately, the introduction of regular constraints also introduces some difficulties when dealing with the quantifier rank of a formula. Whilst one could easily adapt the definition of quantifier rank to FC[REG], we run into the issue that there are infinitely many FC[REG] formulas of quantifier rank one. Consider ∃ : ( ∈ ), along with the infinite number of regular languages. This would mean that we cannot use Theorem 3.3 for FC[REG], which requires (up to logical equivalence) a finite number of formulas of quantifier rank for any ∈ N (see, e.g., [20] for more details).
We now consider a way to handle the regular constraints in order to derive inexpressibility results for generalized core spanners.
A bounded regular language is simply a language that is both bounded and regular. From previous literature (see [7,9] for example), we know that every bounded regular language can be expressed in FC. Thus, using a very similar proof idea to Theorem 6.2 from [7], we are able to show the following: From Lemma 5.5 in [9], we know that if every regular constraint in an FC[REG]-formula is a so-called simple regular expression, then we can rewrite the FC[REG]-formula into an FC-formula. We can therefore replace every constraint that is a Boolean combination of simple regular expressions with an FC-formula. We do not go further into this, as Lemma 5.3 is all that we require.
Before looking at relations that are not definable by FC[REG], we first define some concepts.
We are now ready to use Lemma 4.16 to show that several relations are not selectable by generalized core spanners.
Theorem 5.5. The following are not definable in FC[REG]: Assuming some relation from the theorem statement is definable in FC[REG], we can easily construct ∈ FC[REG] such that L( ) is a language given in Lemma 4.16. This results in a contradiction due to Lemma 5.3. A similar proof idea was used in Proposition 6.7 of [7] to show that several relations are not definable by core spanners.
Immediately from [9], all the relations given in Theorem 5.5 are not selectable by generalized core spanners.Consequently, if one needed to express a query from Theorem 5.5, they would have to add a specialized operator to obtain this functionality.
For any relation ⊆ (Σ * ) , where ∈ N, we have that is definable in FC[REG] if and only if (Σ * ) \ is definable in FC[REG]. This is simply due to the fact that FC is closed under complementation. Thus, the complements of all relations given in Theorem 5.5 are also not selectable by generalized core spanners.
Core Spanners. Core spanners are a subclass of generalized core spanners (see [5]). While there are known inexpressibility results for core spanners (for example, Num a , Sca , Perm, Rev, and < := {( , ) | | | < | |} were shown to be not selectable by core spanners in [7], and [5] proved that length equality is not selectable by core spanners), our results extend what is known about the expressive power of core spanners. For example, since the complements of the relations given in Theorem 5.5 are not selectable by generalized core spanners, they are also not selectable by core spanners. Furthermore, we also show that Add, Mult, Shuff and Morph ℎ are not selectable by generalized core spanners, and thus are not selectable by core spanners.

CONCLUSIONS
In Section 3, we established Ehrenfeucht-Fraïssé games for FC. Then, in Section 4, we gave two technical lemmas (the Pseudo-Congruence Lemma and the Primitive Power Lemma), which serve as the basis of the Fooling Lemma. These lemmas are the main contributions of this work; however, in Section 5, we lifted our inexpressibility results to FC[REG], which, immediately from [9], yields inexpressibility results for generalized core spanners.
The main limitation of our results is that our generalized core spanner inexpressibility results require Boolean combinations of bounded languages.It is possible to use closure properties in order to partially overcome this issue.Such an approach can only get us so far, and therefore going beyond Boolean combinations of bounded languages is an important area for future research.We note that it is still open as to whether FC and FC[REG] have the same expressive power.The authors believe that this is unlikely, although proving that FC and FC[REG] do indeed have equivalent expressive power would remove the restriction of Boolean combinations of bounded languages.
As an intermediate step, one could look at more general regular languages that can be expressed by FC.This would close the gap between what we know can be expressed by FC and what can be expressed by FC [REG].Then, a similar result to Lemma 5.3 could gain further inexpressibility results for generalized core spanners.A starting point for this direction would be to see whether regular constraints beyond the Boolean closure of simple regular expressions are achievable in FC (see Lemma 5.5 in [9]).
Another promising area of future work, is looking at related two player games.For example, pebble games could be used to derive insights on FC-formulas with a finite number of variables (for example, see Chapter 11 of [20]).Alternatively, the restriction to existential Ehrenfreucht-Fraïssé games could yield further results regarding the inexpressibility of core spanners.
We note that there are many further open problems regarding FC.We refer to [9] for a more comprehensive list of future research directions.
A APPENDIX FOR SECTION 3
A.1 Proof of Lemma 3.4

Proof. We prove this observation by contradiction. To that end, assume there exists ∈ FC such that L( ) = , and let qr( ) = for some ∈ N. Further assume that we have ≡ , where ∈ and ∉ . Since ∈ and L( ) = , we know that |= where is the Σ -structure that represents . Furthermore, as ≡ , we know from Theorem 3.3 that |= where is the Σ -structure that represents . Therefore, ∈ L( ) which is a contradiction.
A.2 Proof of Lemma 3.5

Proof. Working towards a contradiction, assume that there exists some ∈ N such that for all , ∈ N where is not a power of two we have that a 2 a . Observing Lemma 3.4, there exists ∈ FC( ) such that |= if and only if = a 2 for some ∈ N. As pow := {a 2 | ∈ N} is not semi-linear, it is not definable in FC. Thus, we have reached a contradiction.
B PROOF OF PROPOSITION 4.1
Proof. Recall that for every ∈ N, we define ∈ {a, b} * recursively as follows: 0 := a, 1 := ab, and := −1 • −2 for all ≥ 2. In this proof, we show that fib := {c 0 c 1 c • • • c c | ∈ N} over Σ := {a, b, c} is an FC language. For this proof, we use arbitrary concatenation as a shorthand for binary concatenation.
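To make the recursion above concrete, the following minimal Python sketch generates the recursively defined words and the corresponding c-separated words. The function names `fib_word` and `fib_language_member` are hypothetical, and the sketch assumes that the language consists of the words c·w_0·c·w_1·c···c·w_n·c, where w_i denotes the i-th word of the recursion (the symbol names are lost in this copy of the text).

```python
def fib_word(i):
    """Return the i-th word of the recursion w_0 = a, w_1 = ab, w_i = w_{i-1} w_{i-2}."""
    prev, cur = "a", "ab"
    if i == 0:
        return prev
    for _ in range(i - 1):
        # Shift the window: the new word is the current word followed by the previous one.
        prev, cur = cur, cur + prev
    return cur


def fib_language_member(n):
    """Return c w_0 c w_1 c ... c w_n c (assumed shape of the words in the language)."""
    return "c" + "".join(fib_word(i) + "c" for i in range(n + 1))


if __name__ == "__main__":
    for n in range(4):
        print(n, fib_language_member(n))
```

The lengths of the generated words follow the Fibonacci numbers, which is what makes this language a natural example of a non-regular language.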

C APPENDIX FOR SECTION 4.1
Recall that Section 4.1 is dedicated to the Pseudo-Congruence Lemma.Before giving the proof of this result, we first consider some lemmas and definitions that are required for the proof.
C.1 Proof of Lemma 4.2
Working towards a contradiction, assume that is a suffix of and is not a suffix of for some ≤ − 2. We now show that Spoiler can use rounds − 1 and to win the -round game over and . For round − 1, let Spoiler choose and . Duplicator responds with some ⊑ . We look at two separate cases based on whether = , or not.
Case 1, ≠ : For round , let Spoiler choose and := • a such that a ∈ Σ and a ⊑ . It follows that = −1 • a yet there does not exist ′ ⊑ such that ′ = • a. Consequently, Spoiler has won and we have thus reached a contradiction.
Case 2, = : For round , let Spoiler choose and such that −1 = • . Duplicator must choose such that −1 = • . Thus it follows that is a suffix of and consequently, we have reached a contradiction.
Concluding the proof. Note that we have not considered the case where is a prefix of ; however, this follows by analogous reasoning. Thus, so far we have shown that if ≤ − 2 and is a prefix (or suffix) of , then is a prefix (or suffix respectively) of . However, and are arbitrarily named, and hence it immediately follows that if ≤ − 2 and is a prefix (or suffix) of , then is a prefix (or suffix respectively) of .

C.3 Proof of Lemma 4.4
We first give a necessary definition.

In other words, | ′ is the restriction of the structure to the sub-universe ′ ∪ {⊥}. We are now ready for the proof of Lemma 4.4.
Let be a Σ -structure for := 1 • 2 and let be a Σ -structure for := 1 • 2 .Let and be the universes for and respectively.
Consider the following subsets of and : Notice that for any ∈ other , we have that = 1 • 2 , where 1 is a suffix of 1 and 2 is a prefix of 2 .Analogously, for any ∈ other , we have that = 1 • 2 , where 1 is a suffix of 1 and 2 is a prefix of 2 .For an illustration, see Fig.     1) and 2 ∈ Facs( 2).The precise definition of these functions is not important, and these functions are only used to get a unique pair from Facs( 1 ) and Facs( 2 ) (or Facs( 1 ) and Facs( 2)) from a given element of other (or other ).
Note that since .To help streamline the proof of correctness, we shall assume that Spoiler can skip certain rounds in G 1 and G 2 .That is, Spoiler can decide not to make a move for some round, and Duplicator therefore does not need to respond.In reality, this can be easily done by Spoiler choosing ⊥ for example; however, we assume Spoiler can skip as it is simpler for us to deal with.
Spoiler's Choice in G 1 and G 2 : We now give Spoiler's choice in G for each ∈ {1, 2}.For every ∈ [ ], Spoiler's choice for round in G is uniquely determined from Spoiler's choice in G. Recall that G is a + + 2 round game over | Facs( ) and | Facs( ) , for ∈ {1, 2}.Spoiler's choice in round ∈ [ ] of G is defined as follows: First, we consider the case where in round of G, Spoiler chooses .Then, in round of G , Spoiler chooses the structure | Facs( ) and the factor Spoiler chooses is defined as follows: • If in G Spoiler chooses some ∈ Facs( ), then Spoiler chooses in round of G , • if in G Spoiler chooses some ∈ other where split ( ) • if in G Spoiler chooses some ∈ Facs( ) \ Facs( ) where ∈ [2] \ { }, then Spoiler chooses to skip round in G .Next, we consider the case where Spoiler chooses in round of G, where ≤ .For this case, we have that Spoiler chooses | Facs( ) in round of G.
• If in G Spoiler chooses some ∈ Facs( ), then Spoiler chooses in round of G , • if in G Spoiler chooses some ∈ other where split ( ) A Short Discussion on G 1 and G 2 : For rounds + 1 to + + 2, we assume Spoiler chooses any structure and any factor in both G 1 and G 2 .This ensures that in any round ∈ [ ] of G 1 and G 2 , if Spoiler chooses some ∈ Facs( 1 ) ∩ Facs( 2 ) (or ∈ Facs( 1 ) ∩ Facs( 2)), we have that Duplicator responds with .This is because | | ≤ , and Duplicator has a winning strategy for the + + 2-round games G 1 and G 2 , see Lemma 4.2.After Spoiler has made their choice in G 1 and G 2 , Duplicator responds in both games using their winning strategy. Let ) be the tuples generated from the first -rounds of G 1 , along with the interpreted constants.Let ì 2 = ( 2,1 , 2,2 , . . ., 2, +|Σ|+1 ) and let ì 2 = ( 2,1 , 2,2 , . . ., 2, +|Σ|+1 ) be the tuples generated from the first -rounds of G 2 , along with the interpreted constants.If Spoiler chose to skip round ∈ [ ] in G 1 (or G 2 ), then we say that 1, = 1, =⊥ (or 2, = 2, =⊥).Note that skipping a round has no bearing on whether the generated tuples are a partial isomorphism.Therefore, since , and , it follows that ( ì 1 , ì 1 ) and ( ì 2 , ì 2 ) both form a partial isomorphism.Before moving on to define Duplicator's strategy in G, we make some remarks about the tuples generated from G 1 and G 2 .For each ∈ [ ], let 1, be the factor that Duplicator chose in round of G 1 , and let 2, be the factor that Duplicator chose in round of G 2 .From the definition of Spoiler's choice in G 1 and G 2 we have that: • If Spoiler skipped round in G 1 , then for round in G, Spoiler chose some ∈ Facs( 2 ) \ Facs( 1 ), or some ∈ Facs( 2 ) \ Facs( 1 ).
• If in round of G, Spoiler chose some ∈ Facs( 1 ) ∩ Facs( 2 ), or some ∈ Facs( 1 ) ∩ Facs( 2 ), then it follows that 1, = 2, = .This is because we know that | | ≤ , and Duplicator must be able to win after + + 2-rounds.Therefore, we can use Lemma 4.2 to determine that 1, = 2, = must hold.• The final case is when Spoiler chose some ∈ other , or ∈ other .Note that this is the only case where Spoiler does not skip round in both G 1 and G 2 , and where 1, ≠ 2, .
Duplicator's strategy. Duplicator derives their response in G using their responses in G 1 and G 2 . For each round ∈ [ ], the structure Duplicator chooses is always the opposite structure to what Spoiler has chosen; therefore, we only look at which factor Duplicator chooses. We use , for ∈ {1, 2} to denote Duplicator's response to Spoiler in round of G . Duplicator's response to round ∈ [ ] in G is as follows: , or some ∈ Facs( 1 ) ∩ Facs( 2 ) in G, then Duplicator responds with 1, . Recall that 1, = 2, = for this case; and • if Spoiler chose some ∈ other , or ∈ other , then Duplicator responds with 1, • 2, . Note that due to Lemma 4.3, it must hold that 1, is a suffix of 1 (or 1 ), and 2, is a prefix of 2 (or 2 respectively). Refer back to Fig. 3 for an illustration of elements in other and other . Therefore 1, • 2, is always an element of the structure in which Duplicator plays round . This gives a complete strategy for Duplicator in G.
To sum up Duplicator's strategy for G informally: If Spoiler chooses an element from either Facs( 1 ) or Facs( 1 ), then Duplicator responds with their winning strategy for G 1 .If Spoiler chooses from either Facs( 2 ) or Facs( 2 ), then Duplicator responds with their winning strategy for G 2 .Finally, if Spoiler chooses from other or other , then we split the factor Spoiler chose into two factors from Facs( 1 ) × Facs( 2 ), or Facs( 1 ) × Facs( 2 ), and Duplicator responds with the concatenation of their winning strategy from G 1 and G 2 .
Recall that ì 1 = ( . . ., 1, +|Σ|+1 ) are the tuples generated from the first rounds of G 1 , and . . ., 2, +|Σ|+1 ) are the tuples generated from the first rounds of G 2 .Also recall, that the last |Σ| + 1 components of these tuples are the interpretations of the constant symbols.We know that ( ì 1 , ì 1 ) and ( ì 2 , ì 2 ) both form a partial isomorphism.While Duplicator's strategy for G is based upon their strategy for the first -rounds of G 1 and G 2 , Duplicator can survive + + 2-rounds of G 1 and G 2 .This ensures that Duplicator makes certain decisions in G 1 and G 2 which ensures that Duplicator's strategy for G is indeed a winning strategy.Thus, to prove that Duplicator's strategy for G is a winning strategy, we consider Spoiler's choices in G 1 and G 2 for rounds + 1 up to + + 2. In fact, we only look at rounds + 1, + 2, and + 3, since we leave the last − 1 rounds to ensure that if Spoiler chooses some such that | | ≤ , then Duplicator responds with .
Let := + |Σ| + 1.Then, we use 1, + for ∈ [ + 2] to denote Spoiler's choice in round + (assuming Spoiler chooses the structure | Facs( 1 ) ).Thus, after all + + 2 rounds, we have that the first -rounds of G 1 and G 2 are in the first -components of the corresponding tuples, the next |Σ| + 1 components are then the interpreted constants, and finally, the last + 2 components are the chosen elements for the last + 2-rounds.This is simply a permutation of the resulting tuples, and therefore ( ì 1 , ì 1 ) and ( ì 2 , ì 2 ) still form a partial isomorphism.
As in Case 3.1.1, we shall prove that if Duplicator plays their winning strategy for G 1 and G 2 , then = • . To show this, we look at possible choices for Spoiler in rounds + 1 up to round + + 2 for G 1 and G 2 .
We now look at round + 1 to round + + 2 of G 1 and G 2 .

D APPENDIX FOR SECTION 4.2
Section 4.2 is dedicated to the Primitive Power Lemma. Before giving the proof of this result, we first prove some lemmas regarding primitive words. Most of these subsequent lemmas follow from the fact that a primitive word ∈ Σ + cannot be a non-trivial factor of . That is, if = , then either = or = ; for more details, see Chapter 6 of [25]. For our purposes, we look at a slight generalization of this idea:
Lemma D.1. A word ∈ Σ + is primitive if and only if for all ∈ N, we have that = • • for any , ∈ Σ + , implies that = and = ′ for some , ′ < .
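The characterization in Lemma D.1 can be checked by brute force on small words. The sketch below is illustrative only: the helper names are hypothetical, and it reads the lemma as saying that a word w is primitive exactly when every occurrence of w inside a power of w starts at a multiple of |w| (so that the surrounding parts are themselves powers of w).

```python
from itertools import product


def is_primitive(w):
    """True if w is non-empty and not of the form z^k for any k >= 2."""
    n = len(w)
    return n > 0 and all(w != w[:d] * (n // d) for d in range(1, n) if n % d == 0)


def occurrence_positions(w, n):
    """Start positions of all occurrences of w inside the power w^n."""
    power = w * n
    return [i for i in range(len(power) - len(w) + 1) if power.startswith(w, i)]


def only_aligned_occurrences(w, n):
    """True if every occurrence of w in w^n starts at a multiple of |w|."""
    return all(i % len(w) == 0 for i in occurrence_positions(w, n))


if __name__ == "__main__":
    # Compare both notions on all words over {a, b} of length 1 to 5.
    for length in range(1, 6):
        for letters in product("ab", repeat=length):
            w = "".join(letters)
            assert is_primitive(w) == only_aligned_occurrences(w, 3)
    print("primitive words are exactly those with aligned occurrences")
```

For instance, abab is imprimitive, and it occurs inside (abab)^2 at position 2, which is not a multiple of 4.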

Proof. For this proof, we only show that ∈ Σ + is primitive if and only if for any ∈ N, we have that = • • for , ∈ Σ + implies that = for some < . The fact that = ′ for some ′ < also holds follows immediately from a length argument. Furthermore, since , ∈ Σ + , it follows that ≥ 2.
If direction. We show that if = • • implies that = for some < , then ∈ Σ + is primitive by proving the contrapositive. That is, ∈ Σ + is imprimitive implies that there exists , ∈ Σ + such that = • • and ≠ for some < . Let ∈ Σ + and let ∈ Σ + where = for some > 1. Thus, = , and therefore we can write = • • − −1 . Since ≠ , we have proven the contrapositive.
Only if direction. For this direction, we show that if ∈ Σ * is primitive, then for all ∈ N, we have that = • • for any , ∈ Σ + implies that = for some < . Working towards a contradiction, let ∈ N and assume that ∈ Σ * is primitive, but = • • where ≠ for any ∈ N. Trivially, ≠ must hold since = 0 . Furthermore, since , ∈ Σ + , it must hold that ≥ 2 for = • • to hold.
First, we show that we can assume is a strict prefix of .Note that | | ≠ for any ∈ N.
• ¯ for some ∈ N + and a proper prefix ¯ ∈ Σ + of .Thus, we can consider the new equality For the rest of the proof, we simply use to denote a strict prefix of such that the equality = • • holds.Since is a strict prefix of , we can write = • ′ for some ′ ∈ Σ + .Furthermore, since • • = and = ( • ′ ) , we have that We know that | | = | | + | ′ |, and therefore = • ′ and = ′ • .Proposition 1.3.2 in Lothaire [21] states that if = for , ∈ Σ * , then there is some ∈ Σ * and 1 , 2 ∈ N such that = 1 and = 2 .It follows that = for some ∈ N, and since ⊏ ⊏ and ⊏ ′ ⊏ , we know that > 1. Therefore is imprimitive, which is a contradiction.
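The commutation property quoted from Lothaire (two words commute exactly when they are powers of a common word) can be illustrated with a small exhaustive check. This is a sketch under stated assumptions: the names `primitive_root` and `commute` are hypothetical, and the check is restricted to non-empty words over a two-letter alphabet.

```python
from itertools import product


def primitive_root(w):
    """Return the shortest word z such that w = z^k for some k >= 1."""
    n = len(w)
    for d in range(1, n + 1):
        if n % d == 0 and w[:d] * (n // d) == w:
            return w[:d]
    return w  # only reached for the empty word


def commute(x, y):
    """True if x . y = y . x."""
    return x + y == y + x


if __name__ == "__main__":
    words = ["".join(p) for n in range(1, 5) for p in product("ab", repeat=n)]
    for x in words:
        for y in words:
            # Non-empty words commute iff they are powers of the same primitive word.
            assert commute(x, y) == (primitive_root(x) == primitive_root(y))
    print("checked", len(words) ** 2, "pairs")
```

In the proof above, this property is what turns the two equalities between the prefix and the remainder of the word into a common root, contradicting primitivity.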
Clearly, if ( , ) is infinite, then there does not exist 0 , 0 ∈ N such that Facs( 0 ) ∩ Facs( 0 ) = Facs( ) ∩ Facs( ) for all > 0 and all > 0 . We can therefore reformulate the contrapositive of the if direction as: If ( , ) is infinite, then and are conjugate. To that end, assume ( , ) is infinite. Therefore, for every ∈ N, there exists ∈ ( , ) such that | | ≥ . Invoking the periodicity lemma, we conclude that and are conjugate.
(2) if and only if (3). Let us first consider the if direction. Note that for any 1 , 2 ∈ N where 1 < 2 , and any word ∈ Σ * , we have that 1 ⊑ 2 . Therefore, Facs( 1 ) ⊆ Facs( 2 ). Let , ∈ Σ + be two primitive words such that there exists ∈ N where for all , ∈ N. We assume that ∈ N is the smallest such value. Thus, for some ′ , ′ ∈ N, we have that Furthermore, due to the fact that Σ is a fixed and finite alphabet, the number of words in Σ * with a length that is less than or equal to is finite. We now define the language intersect := { ∈ Σ * | ∈ Facs( ) ∩ Facs( ) for some , ∈ N}. Since the number of words in Σ * with a length that is less than or equal to is finite, it follows that intersect is also finite. Combining this with the fact that for any 1 , 2 ∈ N where 1 < 2 , and any word ∈ Σ * , we have that Facs( 1 ) ⊆ Facs( 2 ), there must exist some 0 , 0 ∈ N such that for all ≥ 0 and all ≥ 0 , we have that Facs( 0 ) ∩ Facs( 0 ) = Facs( ) ∩ Facs( ).
Now let us consider the only if direction. Assume that there exists 0 , 0 ∈ N such that Facs( 0 ) ∩ Facs( 0 ) = Facs( ) ∩ Facs( ) for all > 0 and > 0 . Let := max{Facs( 0 ) ∩ Facs( 0 )}. Since for any > 0 and > 0 , there does not exist such that | | > and ∈ Facs( ) ∩ Facs( ), this concludes the proof.
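The stabilization of the common factors can be observed on small instances. The following sketch uses hypothetical helper names and takes the two pairs of words from Example 4.11 as test data: for the non-conjugate pair aba and bba the set of common factors of the powers stops growing, whereas for the conjugate pair aabba and aaabb the longest common factor keeps growing with the exponents.

```python
def facs(w):
    """Return the set of all factors (contiguous subwords) of w, including the empty word."""
    return {w[i:j] for i in range(len(w) + 1) for j in range(i, len(w) + 1)}


def common_facs(u, m, v, n):
    """Return Facs(u^m) ∩ Facs(v^n)."""
    return facs(u * m) & facs(v * n)


def longest_common_factor_length(u, m, v, n):
    """Length of a longest word that is a factor of both u^m and v^n."""
    return max(len(w) for w in common_facs(u, m, v, n))


if __name__ == "__main__":
    # Non-conjugate primitive words: the intersection stabilizes.
    print(sorted(common_facs("aba", 2, "bba", 2)))
    print(common_facs("aba", 2, "bba", 2) == common_facs("aba", 6, "bba", 6))
    # Conjugate words: the longest common factor grows with the exponents.
    print(longest_common_factor_length("aabba", 3, "aaabb", 3))
    print(longest_common_factor_length("aabba", 5, "aaabb", 5))
```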
E.2 Proof of Lemma 4.14

Figure 2: Duplicator's strategy when Spoiler chooses some where exp ( ) ≥ 1.The top oval illustrates Spoiler's choices in G and G , and the bottom oval illustrates Duplicator's responses in G and G .

Example 4.11. It is clear that := aabba and := aaabb are both primitive words. However, and are not co-primitive. This can easily be observed by considering := aabb and := a. Then, = and = . Now consider ′ := aba and ′ := bba. Again, both ′ and ′ are primitive. Furthermore, from the fact that | ′ | a ≠ | ′ | a , we can also see that ′ and ′ are co-primitive. For any two words , ∈ Σ * , let = • • • • and := • • • • be the one-sided infinite words consisting of continuous repetitions of and respectively.
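The claims of Example 4.11 can be verified mechanically. The sketch below assumes, as the example suggests, that two words are co-primitive when both are primitive and they are not conjugate (that is, not of the form x·y and y·x); the function names are hypothetical.

```python
def is_primitive(w):
    """True if w is non-empty and not of the form z^k for any k >= 2."""
    n = len(w)
    return n > 0 and all(w != w[:d] * (n // d) for d in range(1, n) if n % d == 0)


def are_conjugate(u, v):
    """True if u = x.y and v = y.x for some words x, y (v is a rotation of u)."""
    return len(u) == len(v) and v in u + u


def are_co_primitive(u, v):
    """Assumed definition: both words are primitive and they are not conjugate."""
    return is_primitive(u) and is_primitive(v) and not are_conjugate(u, v)


if __name__ == "__main__":
    # First pair of the example: primitive but conjugate via x = aabb, y = a.
    print(are_conjugate("aabba", "aaabb"), are_co_primitive("aabba", "aaabb"))
    # Second pair: different numbers of a's, so no rotation can map one to the other.
    print(are_conjugate("aba", "bba"), are_co_primitive("aba", "bba"))
```

The letter-count argument used in the example works because conjugate words always contain each letter the same number of times.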

Lemma 4.12 (Periodicity Lemma). Let and be primitive words. If and have a common factor of length at least | | + | | − 1, then and are conjugate.
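The length bound in Lemma 4.12 can be tested exhaustively on short words. The sketch below assumes that the common factor in the lemma refers to the one-sided infinite repetitions of the two words; since any factor of a given length already occurs in a sufficiently long finite power, finite powers are used instead. All function names are hypothetical.

```python
from itertools import product


def is_primitive(w):
    """True if w is non-empty and not of the form z^k for any k >= 2."""
    n = len(w)
    return n > 0 and all(w != w[:d] * (n // d) for d in range(1, n) if n % d == 0)


def are_conjugate(u, v):
    """True if v is a rotation of u."""
    return len(u) == len(v) and v in u + u


def windows(w, length):
    """All factors of the given length of the infinite repetition of w."""
    # One window per start position modulo |w|; the power is long enough for each.
    power = w * (length // len(w) + 2)
    return {power[i:i + length] for i in range(len(w))}


def has_common_factor(u, v, length):
    """True if the infinite repetitions of u and v share a factor of this length."""
    return bool(windows(u, length) & windows(v, length))


if __name__ == "__main__":
    words = ["".join(p) for n in range(1, 5) for p in product("ab", repeat=n)]
    primitive = [w for w in words if is_primitive(w)]
    for u in primitive:
        for v in primitive:
            if has_common_factor(u, v, len(u) + len(v) - 1):
                assert are_conjugate(u, v)
    print("bound holds for all primitive words of length at most 4 over {a, b}")
```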

Proof. For every , there exists , ∈ N where ≠ and a ≡ a , see Lemma 3.5. Trivially, b • a ≡ b • a . But a • b • a a • b • a . To show that this does indeed hold, we give an FC-formula that accepts words of the form • b • for ∈ Σ * , and thus can distinguish between a • b • a and a • b • a where ≠ . := ∃ , , :
Working towards a contradiction, assume that + | | − 1 < and ≠ for some ∈ [ ]. Then, we can define a winning strategy for Spoiler as follows: Let = a 1 • a 2 • • • a , where a ∈ Σ for ∈ [ ].
• For round + 1, Spoiler chooses , and a 1 • a 2 ,
• for round + 2, Spoiler chooses , and a 1 • a 2 • a 3 ,
• . . .
• for round + − 1, Spoiler chooses , and a 1 • a 2 • • • a .

Figure 3: Illustration of factors in other .The factors of other are analogously illustrated by replacing 1 and 2 with 1 and 2 respectively.
∪ Facs( 2 ) ∪ {⊥} and That is, other contains those factors of 1 • 2 that are not factors of either 1 or 2 .Likewise, other contains those factors of 1 • 2 that are not factors of either 1 or 2 .