Validation of Modern JSON Schema: Formalization and Complexity

JSON Schema is the de-facto standard schema language for JSON data. The language went through many minor revisions, but the most recent versions of the language added two novel features, dynamic references and annotation-dependent validation, that change the evaluation model. Modern JSON Schema is the name used to indicate all versions from Draft 2019-09, which are characterized by these new features, while Classical JSON Schema is used to indicate the previous versions. These new "modern" features make the schema language quite difficult to understand, and have generated many discussions about the correct interpretation of their official specifications; for this reason we undertook the task of their formalization. During this process, we also analyzed the complexity of data validation in Modern JSON Schema, with the idea of confirming the PTIME complexity of Classical JSON Schema validation, and we were surprised to discover a completely different truth: data validation, which is expected to be an extremely efficient process, acquires, with Modern JSON Schema features, a PSPACE complexity. In this paper, we give the first formal description of Modern JSON Schema, which we consider a central contribution of the work that we present here. We then prove that its data validation problem is PSPACE-complete. We prove that the origin of the problem lies in dynamic references, and not in annotation-dependent validation. We study the schema and data complexities, showing that the problem is PSPACE-complete with respect to the schema size even with a fixed instance, but is in PTIME when the schema is fixed and only the instance size is allowed to vary. Finally, we run experiments that show that there are families of schemas where the difference in asymptotic complexity between dynamic and static references is extremely visible, even with small schemas.


INTRODUCTION
JSON Schema [Org 2022] is the de-facto standard schema language for JSON data. It is based on the combination of structural operators, describing base values, objects, and arrays, through logical operators such as disjunction, conjunction, negation, and recursive references.
The evaluation model of Draft-04 and Draft-06 is quite easy to understand and to formalize, and it has been studied in [Pezoa et al. 2016], [Bourhis et al. 2017, 2020], [Attouche et al. 2022], yielding many interesting complexity results. However, Draft 2019-09 introduces two important novelties to the evaluation model: annotation-dependent validation, and dynamic references. According to the terminology introduced by Henry Andrews in [Andrews 2023], because of these modifications to the evaluation model, Draft 2019-09 is the first Draft that defines Modern JSON Schema, while the previous Drafts define variations of Classical JSON Schema. These novelties are motivated by application needs, but neither of them is faithfully represented by the abstract models that had been developed and studied for Classical JSON Schema. Further, both novelties need formal clarification and specification, as documented by many online discussions, such as [Neal 2022] and [Jacobson 2021]. These discussions involve the main actors behind JSON Schema design and show that the informal JSON Schema specification admits several different yet reasonable interpretations of the semantics of the new operators in Modern JSON Schema.
The contextual redefinition mechanism is illustrated in the schema shown in Figure 3. The schema "https://mjs.ex/strict-tree" redefines the dynamic anchor "tree" (line 3), so that it now indicates a conjunction between "$ref" : "https://mjs.ex/simple-tree#tree" (line 4) and the keyword "unevaluatedProperties" : false (line 5), which forbids the presence of any property that does not match the properties listed in "https://mjs.ex/simple-tree#tree". If one applies this schema, it will invoke "$ref" : "https://mjs.ex/simple-tree#tree" (Figure 3, line 4), which will execute the schema of Figure 2 in a "dynamic scope" where "https://mjs.ex/strict-tree" has redefined the meaning of "$dynamicRef" : "https://mjs.ex/simple-tree#tree". More precisely, the specifications say that
the "outermost" (or "first") schema that contains "$dynamicAnchor" : "fragmentName" is the one that fixes the meaning of that anchor for any other schema S′ that will invoke "$dynamicRef" : "absURI" • "#" • "fragmentName" later, independently of the absolute URI "absURI" used by S′; hence, in this case, the meaning of "tree" is fixed by the schema of Figure 3, which is invoked before the one in Figure 2.
Figure 3 (first five lines of the listing):
1 { "$schema": "https://json-schema.org/draft/2020-12/schema",
2   "$id": "https://mjs.ex/strict-tree",
3   "$dynamicAnchor": "tree",
4   "$ref": "https://mjs.ex/simple-tree#tree",
5   "unevaluatedProperties": false,
This semantics is quite surprising, and we believe it needs a formal definition. We are going to provide that definition, and this was actually the original motivation of this work.

Annotation dependency
In Modern JSON Schema, the application of a keyword to an instance produces annotations, and the validation result of a keyword may depend on the annotations produced by adjacent keywords. These annotations carry a lot of information, but the information that is relevant for validation is which children of the current instance (that is, which properties, if it is an object, or which items, if it is an array) have already been evaluated. This information is then used by the operators "unevaluatedProperties" and "unevaluatedItems", since they are only applied to children that have not been evaluated.
For instance, the assertion "unevaluatedProperties" : false in the schema of Figure 3 depends on the annotations returned by the adjacent keyword "$ref" : "https://mjs.ex/simple-tree#tree" (which refers to the schema in Figure 2). In this case, "$ref" evaluates all and only the fields whose name is either "data" or "children". Hence, "unevaluatedProperties" : false is applied to any other field, and it fails if, and only if, fields with a different name exist.
The order in which the keywords appear in the schema is irrelevant for this mechanism; as formalized later, the result is the same as if the "unevaluatedProperties" keyword were always evaluated last among the keywords inside the same schema.
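As an informal illustration of this mechanism, the following Python sketch models "unevaluatedProperties" : false as a check over the set of property names that adjacent keywords have already evaluated. The function name and the hard-coded evaluated set (mirroring what "$ref" is said to evaluate in the example above) are assumptions made for illustration only; they are not part of the formalization.

```python
def unevaluated_properties_false(instance, evaluated):
    """Model of "unevaluatedProperties": false.

    `evaluated` is the set of property names already evaluated by the
    adjacent keywords; the keyword fails iff some property is left over.
    """
    if not isinstance(instance, dict):
        return True  # non-object instances trivially satisfy the keyword
    return all(name in evaluated for name in instance)


# Assumption: the adjacent "$ref" evaluates exactly "data" and "children".
evaluated_by_ref = {"data", "children"}

accepted = unevaluated_properties_false({"data": 1, "children": []}, evaluated_by_ref)
rejected = unevaluated_properties_false({"data": 1, "extra": True}, evaluated_by_ref)
```

Under these assumptions the first instance is accepted and the second is rejected because of the field "extra", independently of where the keyword appears in the schema.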
The definition of evaluated in the specifications of Draft 2020-12 ([Wright et al. 2022]) presents many ambiguities, as testified by online discussions such as [Neal 2022] and [Jacobson 2021]. These ambiguities do not affect the final result of evaluation, but only the error messages that are generated; these aspects are further discussed in the Supplementary Material, and we will formalize here the interpretation that is more widely accepted. We believe that an important contribution of this work is that it provides a precise and succinct language where these ambiguities can be discussed and settled.

FORMALIZING JSON SCHEMA SYNTAX
A key contribution of this work is a formalization of the entire Modern JSON Schema language, but, for reasons of space, we only report here a crucial subset, that illustrates the approach and is sufficient to carry out the complexity analysis; the remaining part is in the Supplementary Material.
Schemas are structured as resources, which are collected into documents and refer to each other using URIs. We deal with these aspects through a "normalization process" (Section 3.1), where we eliminate the issues related to URI resolution and to the organization of many resources in just one document; we then formalize the syntax of normalized JSON Schema in Section 3.2.

URI resolution, resource flattening, schema closing
The keywords "$id", "$ref", and "$dynamicRef" accept any URI reference as value; they apply the resolution process defined in [Berners-Lee et al. 2005], and they then interpret the resulting resolved URI according to JSON Schema rules. Since resolution is already specified in [Berners-Lee et al. 2005], we will not formalize it here, and we will assume that, in every schema that is interpreted through our rules, the values of these three keywords have been already resolved, and that the result of that resolution has the shape absURI•"#"•fragmentId for "$ref" and "$dynamicRef", where fragmentId may be empty, and has the shape absURI, with no fragment, for "$id".
A "$id" : absURI keyword at the top-level of a schema object S′ that is nested inside a JSON Schema document S indicates that the schema object S′ is a separate resource, identified by absURI, that is embedded inside the document S but is otherwise independent.⁴ Embedded resources are an important feature, since they allow the distribution of different resources with just one file, but they present some problems when they are nested inside arbitrary keywords, and when a reference crosses the boundaries between resources, as does "$ref" : "#/properties/foo/items" in Figure 4, line 3 (see also Section 9.2.1 of the specifications [Wright et al. 2022]).
To avoid this kind of problem, we assume that every JSON Schema document is resource-flattened before validation. Resource flattening consists of moving every embedded resource identified by "$id" : absURI into the value of a field named absURI of a "$defs" keyword at the top level of the document, and replacing the moved resource with an equivalent schema {"$ref" : absURI•"#"} that invokes that resource; "$defs" is a placeholder keyword that is not evaluated, but which provides a place to collect schemas that can be invoked using "$ref" or "$dynamicRef". During this phase, we also replace any reference that crosses resource boundaries with a canonical reference that directly refers to the target (as suggested in Section 8.2 of the specifications [Wright et al. 2022]), so the schema in the top half of Figure 4 is rewritten into the equivalent one in the bottom half.
4 When the "$id" : absURI keyword is found in an object that is not interpreted as a schema, for example inside an unknown keyword or a non-validation keyword such as "default", then it has no special meaning. The set of positions that are interpreted as schemas is defined by the grammar presented in Section 3.2.
Closed schemas. The input of a validation problem includes a schema S and all schemas that are recursively reachable from S by following the URIs used in the "$ref" and "$dynamicRef" operators. For complexity evaluation we will only consider closed schemas, that is, schemas that include all the different resources that can be recursively reached from the top-level schema. There is no loss of generality, since external schemas can be embedded in a top-level one by copying them in the "$defs" section, using the "$id" operator to preserve their base URI.

JSON Schema Normalized Grammar
JSON Schema syntax is a subset of JSON syntax. We present in Figure 5 the grammar for a subset of the keywords, which is rich enough to present our results. In this grammar, the meta-symbols are (X)*, which is the Kleene star of X, and (X)?, which is an optional X. Non-terminals are italic words, and everything else, including { [ , : ] }, is a terminal symbol.
JSON Schema allows the keywords to appear in any order, and evaluates them in an order that respects the dependencies among keywords. We formalize this behavior by assuming that, before validation, each schema is reordered to respect the grammar in Figure 5. The grammar specifies that a schema S is either a boolean schema, that matches any value (true) or no value at all (false), or it begins with a possibly empty sequence of Independent Keywords or triples (IK), followed by a possibly empty sequence of First-Level Dependent keywords (FLD), followed by a possibly empty sequence of Second-Level Dependent keywords (SLD). Specifically, the two keywords in FLD, "additionalProperties" and "items", depend on some keywords in IK (such as "properties" and "patternProperties"), and the two keywords in SLD depend on the keywords in FLD, and on many keywords in IK, such as "properties", "patternProperties", "anyOf", "allOf", "$ref", and others.
This grammar specifies the predefined keywords, the type of the associated value (here JVal is the set of all JSON values, and plain-name denotes any alphanumeric string starting with a letter), and their order. We do not formalize here further restrictions on patterns p, absolute URIs absURI, and fragment identifiers f. A valid schema must also satisfy two more constraints: (1) every URI

Introduction to the proof system
We are going to define a judgment that describes the result and the annotations that are returned when a keyword K = k : J′ is applied to an instance J in a context C, where C provides the information needed to interpret dynamic references. Hence, we read the judgment C ⊢^K J : K → (r, σ) as: the application of the keyword K to the instance J, in the context C, returns the boolean r and the annotations σ. The annotations, as defined in [Wright et al. 2022], are a complex data structure, but we only represent here the small subset that is relevant for validation, that is, the set of evaluated children of the instance J. The evaluated children of an object are represented by their names, and the evaluated children of an array by their position. Hence, the set of annotations σ can contain member names (strings) or array positions (integers). We define a similar schema judgment C ⊢^S J : S → (r, σ) in order to describe the result of applying a schema S to an instance J, and we define a list evaluation judgment C ⊢^L J : [| K₁, . . ., Kₙ |] → (r, σ) in order to apply a list of keywords to J, passing the annotations produced by a sublist [| K₁, . . ., Kᵢ |] to the following keyword Kᵢ₊₁. Observe that the letters K, S, and L that appear on top of ⊢ are not metavariables but just symbols used to differentiate the three judgments.
In the next sections we define the rules for keywords and for schemas. Keywords are called assertions when they assert properties of the analyzed instance, so that "$id" is not an assertion, while "type" is. Assertions are called applicators when they have schema parameters, such as S₁ and S₂ in "anyOf" : [S₁, S₂], that they apply either to the instance, in which case they are in-place applicators (e.g., "anyOf" : [S₁, S₂]), or to elements or items of the instance, in which case they are object applicators or array applicators (e.g., "properties" : {k₁ : S₁, k₂ : S₂}).
In the following, we present the rules for "terminal" assertions, Boolean in-place applicators, and for object and array applicators. We also illustrate the rules for sequential evaluation and for the schema judgments, and finally, for static and dynamic references. Rules are shown in Figure 6.

Terminal assertions
Terminal assertions are those that do not contain any subschema to reapply. The great majority of them are conditional on a type Tp: they are trivially satisfied when the instance J does not belong to Tp, and they otherwise verify a specific condition on J. Hence, these keywords are defined by a pair of rules, as exemplified here for the keyword "minimum" : m. Rule (minimumTriv) (Figure 6) always returns T (true) when J is not a number, while rule (minimum), applied to numbers, returns the same boolean r ∈ {| T, F |} as checking whether J ≥ m. The set of evaluated children is ∅.
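For concreteness, the two rules described above can be written roughly as follows. This is a reconstruction from the prose only, since Figure 6 is not reproduced here, so the exact premises and layout of the original rules may differ; C denotes the context, J the instance, m the bound, and r the resulting boolean.

```latex
\[
\frac{\mathit{TypeOf}(J) \neq \mathsf{number}}
     {C \vdash^{K} J : \texttt{"minimum"} : m \to (\mathcal{T}, \emptyset)}
\;(\text{minimumTriv})
\qquad
\frac{\mathit{TypeOf}(J) = \mathsf{number} \qquad r = (J \geq m)}
     {C \vdash^{K} J : \texttt{"minimum"} : m \to (r, \emptyset)}
\;(\text{minimum})
\]
```

In both rules the annotation component is the empty set, since a terminal assertion evaluates no child of the instance.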
Typed terminal assertions are completely defined by a type and a condition; a complete list of these keywords, with the associated type and condition, is in the Supplementary Material.
We also have four type-uniform terminal assertions, which do not single out any specific type for a special treatment. They are "enum", "const", "type" : [Tp₁, . . ., Tpₙ], and "type" : Tp. We only define here the rule for "type" : Tp, where TypeOf(J) extracts the type of the instance J.
Hence, the rule for a type-uniform terminal assertion is completely defined by a boolean condition, as reported in Table 1.

Boolean applicators
JSON Schema boolean applicators, such as "anyOf" : [S₁, ..., Sₙ], apply a list of schemas to the instance, obtain a list of intermediate boolean results, and combine the intermediate results using a boolean operator. For the annotations, all assertions always return a union of the annotations produced by their subschemas, even when the assertion fails; this should be contrasted with the behavior of schemas, where a failing schema never returns any annotation (Section 4.5).⁵
The rule for the disjunctive applicator "anyOf" combines the intermediate results using the ∨ operator, and a child of J is evaluated if, and only if, it has been evaluated by any subschema Sᵢ.
The rules for "allOf" and for "not" are analogous: "allOf" is successful if all premises are successful, and negation is successful if its premise fails.
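The behavior of the three boolean applicators can be sketched in Python as follows. This is a minimal sketch, assuming that the result of each subschema is already available as a pair (boolean, set of evaluated children); the function names are ours, and a failing subschema is assumed to contribute an empty set, as required by the schema rules.

```python
def any_of(results):
    """Disjunction: `results` is a list of (bool, set) pairs from the subschemas."""
    ok = any(r for r, _ in results)
    evaluated = set().union(*(s for _, s in results))
    return ok, evaluated


def all_of(results):
    """Conjunction: same annotations as any_of, different boolean combination."""
    ok = all(r for r, _ in results)
    evaluated = set().union(*(s for _, s in results))
    return ok, evaluated


def not_(result):
    """Negation: flips the boolean and passes the annotations through."""
    r, evaluated = result
    return (not r), set(evaluated)


# One subschema succeeds and evaluates "a"; the other fails (no annotations).
sub_results = [(True, {"a"}), (False, set())]
any_result = any_of(sub_results)
all_result = all_of(sub_results)
```

Note that all_of fails here and still returns the annotation "a": it is the enclosing schema, not the keyword, that discards annotations on failure.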
[Figure 6 (excerpt), conclusion of rule (unevaluatedProperties): C ⊢^L J : (K⃗ + "unevaluatedProperties" : S) → (r ∧ r′, {| k₁, . . ., kₙ |})]

Independent object and array applicators (independent structural applicators)
Independent structural applicators are those that reapply a subschema to some children of the instance (structural) and whose behavior does not depend on adjacent keywords (independent). We start with the rules for the "patternProperties" applicator, which asserts that, if J is an object, then every property of J whose name matches a pattern pᵢ has a value that satisfies Sᵢ. This rule constrains all instance fields whose name matches any pattern pᵢ in the applicator, but it does not force any of the pᵢ's to be matched by any property name, nor any property name to match any pᵢ; if there is no match, the keyword is satisfied. Patterns pᵢ may have a non-empty intersection of their languages.
We first have the trivial rule (patternPropertiesTriv) for the case when J is not an object: non-object instances trivially satisfy the operator.
The evaluated properties are all the properties k′ᵢ for which a matching pattern pⱼ exists, independently of the result of the corresponding validation, and independently of the overall result r of the keyword. Observe that the sets of children that are evaluated in the subproofs are discarded; this happens because the elements of these sets are children of a child Jᵢ of J; we collect information about the evaluation of the children of J, and are not interested in children of children.
The rule (properties) for "properties" : {k₁ : S₁, . . ., kₙ : Sₙ} is essentially the same, with equality k′ᵢ = kⱼ taking the place of matching k′ᵢ ∈ L(pⱼ); likewise, no name match is required, but in case of a match, the corresponding child of J must satisfy the subschema with the same name.
Of course, we also have the trivial rule (propertiesTriv), analogous to rule (patternPropertiesTriv): when J is not an object, "properties" is trivially satisfied. The rules for the other independent object and array applicators can be found in the Supplementary Material.
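The following Python sketch mirrors the behavior of "patternProperties" described above. It rests on two stated assumptions: subschema validation is delegated to a caller-supplied function validate(value, schema) (a hypothetical helper, not part of the formal system), and Python's re.search stands in for the unanchored regular-expression matching of JSON Schema, whose dialect differs in some details.

```python
import re


def pattern_properties(instance, patterns, validate):
    """Return (result, evaluated names) for "patternProperties": patterns.

    `patterns` maps each pattern to its subschema; `validate(value, schema)`
    is an assumed callback that returns the boolean result of the subschema.
    """
    if not isinstance(instance, dict):
        return True, set()  # trivial rule for non-object instances
    ok, evaluated = True, set()
    for name, value in instance.items():
        for pattern, schema in patterns.items():
            if re.search(pattern, name):
                evaluated.add(name)  # evaluated even if validation fails
                ok = validate(value, schema) and ok
    return ok, evaluated


# Toy subschemas: Python types checked with isinstance.
check_type = lambda value, schema: isinstance(value, schema)
result = pattern_properties({"a1": 1, "b": "x"}, {"^a": int, "1$": str}, check_type)
```

Here "a1" matches both patterns and fails the second one, so the keyword fails while "a1" is still reported as evaluated; "b" matches no pattern and is neither checked nor evaluated.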
The independent keywords presented in this section (and in the previous one) produce (respectively, collect and transmit) annotations that influence the behavior of the dependent keywords, which are "additionalProperties", "items", "unevaluatedProperties", and "unevaluatedItems". All these dependencies are formalized in the next sections.

The semantics of schemas: sequential evaluation of keywords
We have defined the semantics of the independent keywords. We now introduce the rules for schemas and for the sequential execution of keywords.
The rules (true) and (false) for the true and false schemas are trivial. The rules (schema-true) and (schema-false) for an object schema { K⃗ } are based on the keyword-list judgment C ⊢^L J : K⃗ → (r, σ), which applies the keywords in the ordered list K⃗, passing the annotations from left to right.
Rule (schema-true) just reuses the result of the keyword-list judgment, but (schema-false) specifies that, as dictated by [Wright et al. 2022], when schema validation fails, all annotations are removed, hence no instance child is regarded as evaluated. This is a crucial difference with keyword-lists, since the C ⊢^K J : K → (r, σ) judgment may return non-empty annotations even when r = F.⁶
We now describe the rules for the sequential evaluation judgment C ⊢^L J : K⃗ → (r, σ). The rules are specified for each list K⃗ + K by induction on K⃗ and by cases on K. We start with the crucial rule (unevaluatedProperties), for K⃗ + "unevaluatedProperties" : S. To evaluate K⃗ + "unevaluatedProperties" : S we first evaluate K⃗, which yields a set of evaluated children σ, we then evaluate S on the other children, and we combine the results by conjunction. We return every property as evaluated.
The rule for K⃗ + "additionalProperties" : S (additionalProperties) is identical, apart from the fact that we only eliminate the properties that have been evaluated by adjacent keywords. The specifications [Wright et al. 2022] indicate that this information should be passed as annotation, but that a static analysis is acceptable if it gives the same result. We formalize this second approach since it is slightly simpler. We define a function propsOf(K⃗) that extracts all the patterns and all the names that appear in any "properties" and "patternProperties" keywords that appear in K⃗ and combines them into a pattern; a property is directly evaluated by a keyword in K⃗ if, and only if, it belongs to L(propsOf(K⃗)). The notation k̲ᵢ used in the first line indicates a pattern whose language is {| kᵢ |}; ∅ in the third line is a pattern such that L(∅) = ∅.
propsOf("properties" : {k₁ : S₁, . . ., kₙ : Sₙ}) = k̲₁ | . . . | k̲ₙ
propsOf("patternProperties" : {p₁ : S₁, . . ., pₙ : Sₙ}) = p₁ | . . . | pₙ
propsOf(K) = ∅ otherwise
propsOf([| K₁, . . ., Kₙ |]) = propsOf(K₁) | . . . | propsOf(Kₙ)
The keyword "additionalProperties" was already present in Classical JSON Schema, and, as shown by our formalization, it does not need to access the annotations passed by the previous keywords, but can be implemented on the basis of information that can be statically extracted from "properties" and "patternProperties"; critically, it is not influenced by what is evaluated by an adjacent "$ref", as happens to "unevaluatedProperties" in the example of Figure 3. Modern JSON Schema introduced the new keyword "unevaluatedProperties" in order to overcome this limitation.
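The difference between the two keywords can be made concrete with a small Python sketch. It is only an illustration under stated assumptions: keywords are given as a Python dict, the helper names are ours, re.search approximates JSON Schema pattern matching, and the set of names evaluated by "$ref" is supplied by hand rather than computed.

```python
import re


def directly_evaluated(name, keywords):
    """Static check in the spirit of propsOf: does an adjacent "properties"
    or "patternProperties" keyword mention `name`?"""
    if name in keywords.get("properties", {}):
        return True
    return any(re.search(p, name) for p in keywords.get("patternProperties", {}))


def additional_names(instance, keywords):
    """Properties to which "additionalProperties" would be applied."""
    return {k for k in instance if not directly_evaluated(k, keywords)}


def unevaluated_names(instance, evaluated):
    """Properties to which "unevaluatedProperties" would be applied."""
    return {k for k in instance if k not in evaluated}


# A schema whose only adjacent keyword is a "$ref", as in the example of Figure 3.
keywords = {"$ref": "https://mjs.ex/simple-tree#tree"}
instance = {"data": 1, "children": []}
evaluated_by_ref = {"data", "children"}  # assumption: what "$ref" evaluates

seen_by_additional = additional_names(instance, keywords)
seen_by_unevaluated = unevaluated_names(instance, evaluated_by_ref)
```

With these inputs, "additionalProperties" would be applied to both fields, because nothing is statically visible next to it, while "unevaluatedProperties" would be applied to none.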
The rules for "unevaluatedItems" : S and "items" : S are similar, and they can be found in the Supplementary Material.
Having exhausted the rules for dependent keywords, we have a catch-all rule (klist-(n+1)) for all other keywords, that says that, when K is an independent keyword, we combine the results of C ⊢^L J : K⃗ → (rₗ, σₗ) and C ⊢^K J : K → (r, σ), but no information is passed between the two judgments. Rule (klist-0) is just the base case for induction.

Static and dynamic references
Annotation-dependent validation and dynamic references are the two additions that characterize Modern JSON Schema. Dynamic references are those that had the greatest need for formalization.
The reference operators "$ref" : absURI•"#"•fragmentId and "$dynamicRef" : absURI•"#"•fragmentId are in-place applicators that allow a URI-identified subschema to be applied to the current instance, but the two applicators interpret the URI in a very different way, as reflected in their rules.
"$ref" : absURI•"#"•fragmentId retrieves the resource S identified by absURI, which may be the current schema or a different one, retrieves the subschema S′ of S identified by fragmentId, and applies S′ to the current instance J (rule ($ref)). The "$dynamicRef" keyword, instead, interprets the reference in a way that depends on the dynamic scope, which is, informally, the ordered list of all resources that have been visited in the current branch of the proof tree, and which we represent in the rules by listing their URIs in the context C.
Specifically, as shown in rule ($ref), the evaluation of "$ref" : absURI•"#"•fragmentId changes the dynamic scope, by extending the context C in the premise with absURI; in the rule, C + URI denotes the operation of adding an element URI at the end of a list C.
In rule ($ref), load(absURI) returns the schema S identified by absURI, an operation that we cannot formalize since the standards leave it undefined [Berners-Lee et al. 2005; Wright et al. 2022]. get(S, f) returns the subschema identified by f inside S; the fragment f may either be empty, hence identifying the entire S, or a plain-name, which is matched by a corresponding "$anchor" operator inside S,⁷ or a JSON Pointer, that begins with "/" and is interpreted by navigation. The get function is formally defined in the Supplementary Material.
For simplicity, we assume that the schema has already been analyzed to ensure the following properties; it would not be difficult to formalize these conditions in the rules: (1) the load function will not fail; (2) the get function will not fail; (3) every "$id" operator assigns to its schema a URI that is different from the URI of any other resource recursively reachable from its schema.
The applicator "$dynamicRef" is very different, and is defined as follows (see [Wright et al. 2022], Section 8.2.3.2):
    If the initially resolved starting point URI includes a fragment that was created by the "$dynamicAnchor" keyword, the initial URI MUST be replaced by the URI (including the fragment) for the outermost schema resource in the dynamic scope (Section 7.1) that defines an identically named fragment with "$dynamicAnchor". Otherwise, its behavior is identical to "$ref", and no runtime resolution is needed.
This sentence is not easy to decode, but it means that, given an assertion "$dynamicRef" : absURI•"#"•f, one first verifies whether the resource referenced by the "starting point URI" absURI contains a dynamic anchor "$dynamicAnchor" : f′ with f′ = f. If this is the case, "$dynamicRef" : absURI•"#"•f will be interpreted according to the dynamic interpretation specified in the second part of the sentence, otherwise it will be interpreted as if it were a static reference "$ref"; this verification is formalized by the premises dget(load(absURI), f) ≠ ⊥ and dget(load(absURI), f) = ⊥ of the two rules that we present below for "$dynamicRef". The function dget(S, f) looks inside S for a subschema that contains "$dynamicAnchor" : f, but it returns ⊥ if there is no such subschema.⁸ After this check is passed, the dynamic interpretation focuses on the fragment f, and it looks for the first (the "outermost") resource in C⁺ that contains a subschema identified by "$dynamicAnchor" : f, where C⁺ is the dynamic context C extended with the initial URI absURI.
We formalize this specification using two functions: dget(S, f) and fstURI(C, f). The function dget(S, f) returns the subschema S′ that is identified in S by a plain-name f that has been defined by "$dynamicAnchor" : "f", and returns ⊥ when no such subschema is found in S; its definition is given in the Supplementary Material. The function fstURI(C, f) returns the first URI absURI in the list C that defines f, that is, such that dget(load(absURI), f) ≠ ⊥.
We can finally formalize the dynamic reference rule ($dynamicRef). It first checks that the initial URI refers to a dynamic anchor, but, after this check, the result of load(absURI) is forgotten. Instead, we look for the first URI fURI in C + absURI where the dynamic anchor f is defined, and we extract the corresponding subschema S′ by executing dget(load(fURI), f).
Remark 1. Observe that fstURI(C + absURI, f) searches for fURI in a list that contains the dynamic context extended with the URI absURI. We have the impression that the specifications (as copied above) would rather require fURI = fstURI(C, f), but we contacted the authors, and we checked some online verifiers that are widely adopted. There seems to be a general agreement that fURI = fstURI(C + absURI, f) is the correct formula (see the Supplementary Material for a concrete example). This is a typical example of the problems generated by natural language specifications, where different readers interpret the same document in different ways, and one needs to discover the current consensus by social interaction and experiments. Formal specifications would be extremely useful to address this kind of problem.
The second rule for "$dynamicRef" (rule ($dynamicRefAsRef)) applies when the initially resolved starting point URI does not include a fragment that was created by the "$dynamicAnchor" keyword, in which case "$dynamicRef" behaves as "$ref".
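The resolution step shared by the two "$dynamicRef" rules can be sketched in Python as follows. This is a simplified illustration, not the formal definition: dget here naively walks every nested value (the actual definition, given in the Supplementary Material, is restricted to schema positions of one resource), load is modelled as a dictionary lookup, and the two resources are hypothetical stand-ins reduced to their "$id" and "$dynamicAnchor" keywords.

```python
def dget(schema, f):
    """Return a subschema carrying "$dynamicAnchor": f, or None (for ⊥)."""
    if isinstance(schema, dict):
        if schema.get("$dynamicAnchor") == f:
            return schema
        children = list(schema.values())
    elif isinstance(schema, list):
        children = schema
    else:
        return None
    for child in children:
        found = dget(child, f)
        if found is not None:
            return found
    return None


def fst_uri(context, f, load):
    """First URI in `context` whose resource defines the dynamic anchor f."""
    return next((uri for uri in context if dget(load(uri), f) is not None), None)


def resolve_dynamic_ref(context, abs_uri, f, load):
    """URI of the resource that "$dynamicRef": abs_uri#f ends up using."""
    if dget(load(abs_uri), f) is None:
        return abs_uri  # no such dynamic anchor: behaves as "$ref"
    return fst_uri(context + [abs_uri], f, load)


SIMPLE = "https://mjs.ex/simple-tree"
STRICT = "https://mjs.ex/strict-tree"
resources = {  # hypothetical, heavily simplified resources
    SIMPLE: {"$id": SIMPLE, "$dynamicAnchor": "tree"},
    STRICT: {"$id": STRICT, "$dynamicAnchor": "tree", "$ref": SIMPLE + "#tree"},
}
load = resources.__getitem__

from_strict = resolve_dynamic_ref([STRICT, SIMPLE], SIMPLE, "tree", load)
from_simple = resolve_dynamic_ref([], SIMPLE, "tree", load)
```

When validation starts from the strict-tree resource, the reference written as simple-tree#tree resolves to strict-tree, because that resource comes first in the dynamic scope; when validation starts from simple-tree itself, the same reference resolves to simple-tree.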

Compressing the context by saturation
The only rule that depends on the context C is rule ($dynamicRef), which uses fstURI(C + absURI, f) to retrieve, in C + absURI, the first URI that identifies a schema that contains f as a dynamic anchor. When a URI is already present in C, its addition at the end of C does not affect the result of fstURI, hence, for each URI, we could just retain its first occurrence in C. Let us define C +? URI, that we read as C saturated with URI, as C +? URI = C when URI ∈ C and C +? URI = C + URI when URI ∉ C. By the observation above, we can substitute C + URI with C +? URI in the premises of rules ($ref), ($dynamicRef), and ($dynamicRefAsRef), obtaining a saturated version of each of these rules.
This observation will be crucial in our complexity evaluations.From now on, we adopt this version of the reference rules, and we assume that contexts are URI lists with no repetition.
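The saturation operator and the invariance property that justifies it can be sketched in Python as follows; contexts are modelled as lists of URI strings, and the set defining (the resources that define a given dynamic anchor) is a hypothetical example.

```python
def saturate(context, uri):
    """C +? URI: append `uri` only if it does not already occur in `context`."""
    return context if uri in context else context + [uri]


def first_defining(context, defining):
    """First URI of `context` that belongs to `defining`, or None."""
    return next((uri for uri in context if uri in defining), None)


context = ["urn:a", "urn:b"]
defining = {"urn:b", "urn:c"}  # hypothetical: resources defining the anchor

# Appending and saturating always select the same first defining URI.
same_choice = all(
    first_defining(context + [uri], defining)
    == first_defining(saturate(context, uri), defining)
    for uri in ["urn:a", "urn:b", "urn:c"]
)
```

Since a saturated context never repeats a URI, its length is bounded by the number of distinct resources of the closed schema.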

Ruling out infinite proof trees
A proof tree is a tree whose nodes are judgments and such that, for every node N of the tree, there is a deduction rule that allows N to be deduced from its children. A judgment is proved when there is a finite proof tree whose root is that judgment.
The naive application algorithm, given a triple C, J, S, builds the proof tree rooted in C ⊢^S J : S → (r, σ) by finding the deduction rule whose conclusion matches C, J, S, and by recursing on all the judgments in its premises.
The naive algorithm would produce an infinite loop when applied to a triple (C, J, S_loop), for any C and J, which reflects the fact that any proof tree whose root is C ⊢^S J : S_loop → (r, σ) is infinite.
The JSON Schema specifications forbid any schema which may generate infinite proof trees. Pezoa et al. [Pezoa et al. 2016] formalized this constraint for Classical JSON Schema as follows (we use the terminology of [Attouche et al. 2022]).
Definition 1 (Unguardedly references in Classical JSON Schema; well-formed schema). Given a closed Classical JSON Schema schema S, a subschema Sᵢ of S "unguardedly references" a subschema Sⱼ of S if the following three conditions hold: (1) "$ref" : absURI•"#"•f is a keyword of a subschema S′ᵢ of Sᵢ (that is, "$ref" : absURI•"#"•f is one of the fields of the object S′ᵢ); (2) every keyword (if any) in the path from Sᵢ to S′ᵢ is a boolean applicator; (3) "$ref" : absURI•"#"•f refers to Sⱼ, that is: get(load(absURI), f) = Sⱼ. A closed schema S is well-formed if the graph of the "unguardedly references" relation is acyclic.
For example, the schema S_loop above unguardedly references itself, hence it is not well-formed; instead, the only reference in the following schema is guarded by a "properties" keyword, hence the schema is well-formed.
Definition 2 (Unguardedly references for Modern JSON Schema, well-formed). Given a closed Modern JSON Schema schema S, a subschema Sᵢ of S "unguardedly references" a subschema Sⱼ of S if either Sᵢ "unguardedly references" subschema Sⱼ according to Definition 1, or if: (1) "$dynamicRef" : absURI ... A closed schema S is well-formed if the graph of the "unguardedly references" relation is acyclic.
For example, consider a closed schema embedding the two schemas in Figure 2 and Figure 3. Let us use Sᵢ to indicate the subschema that immediately encloses the dynamic reference "$dynamicRef" : "https://mjs.ex/simple-tree#tree" in Figure 2. The subschema Sᵢ unguardedly references the entire schemas of Figure 2 and of Figure 3, since both contain a dynamic anchor tree. However, the schema of Figure 2 does not unguardedly reference itself, nor the schema of Figure 3, since its subschema Sᵢ is guarded by the intermediate "properties" keyword. Moreover, no other subschema references Sᵢ, hence the graph of the unguardedly references relation is acyclic.
When a closed schema S is well formed, then every proof about any subschema of S is finite.
Theorem 3 (Termination). If a closed schema  is well formed, then, for any  that is formed using URIs of , for any  ,  , and , there exists one and only one proof tree whose root is  ⊢ S  :  → (, ), and that proof tree is finite.
From now on, we will assume that our rules are only applied to well-formed schemas, so that every proof tree is guaranteed to be finite. Of course, all schemas that we will use in our examples will be well-formed.
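The well-formedness condition amounts to a cycle check on a directed graph. The following is a minimal sketch, assuming that the "unguardedly references" relation has already been extracted into an adjacency map; the names `is_well_formed` and `unguarded`, and the string identifiers used for subschemas, are hypothetical and do not come from the formalization.

```python
from typing import Dict, List

def is_well_formed(unguarded: Dict[str, List[str]]) -> bool:
    """Return True iff the 'unguardedly references' graph is acyclic.

    `unguarded` maps a (hypothetical) subschema identifier to the identifiers
    of the subschemas that it unguardedly references.
    """
    WHITE, GREY, BLACK = 0, 1, 2   # unvisited / on the current path / finished
    colour = {node: WHITE for node in unguarded}
    for targets in unguarded.values():
        for target in targets:
            colour.setdefault(target, WHITE)

    def visit(node: str) -> bool:
        colour[node] = GREY
        for nxt in unguarded.get(node, []):
            if colour[nxt] == GREY:            # back edge: a cycle exists
                return False
            if colour[nxt] == WHITE and not visit(nxt):
                return False
        colour[node] = BLACK
        return True

    # Start a depth-first visit from every node that has not been reached yet.
    return all(colour[n] != WHITE or visit(n) for n in list(colour))
```

A schema that unguardedly references itself corresponds to a self-loop and is rejected, while a reference that sits below "properties" simply contributes no edge to the graph.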

PSPACE HARDNESS: USING DYNAMIC REFERENCES TO ENCODE A QBF SENTENCE
Dynamic references add a seemingly minor twist to the validation rules, but this twist has a dramatic effect on validation complexity. We prove here that dynamic references make validation PSPACE-hard, by a reduction from the validity problem for quantified Boolean formulas (QBF), a well-known PSPACE-complete problem [Stockmeyer 1976; Stockmeyer and Meyer 1973].
Observe that the actual value of the instance is irrelevant: the schema produced by the reduction is either satisfied by any instance, or by none at all, which means that it is a trivial schema, where trivial indicates a schema that returns the same result when applied to any instance, as happens for the schemas true and false. Hence, we actually prove that the validation problem is PSPACE-hard even when restricted to trivial schemas only.
We start with an example. Consider the following QBF formula: ∀x1.∃x2.(x1 ∧ x2) ∨ (¬x1 ∧ ¬x2); Figure 7 shows how it can be encoded as a JSON Schema schema.
For each variable x_i we define a resource "urn:truex"•i (lines 5-13 and 23-29 of Figure 7), which defines two dynamic schemas, one with plain-name "x"•i and value true, and the other one with plain-name "not.x"•i and value false (see footnote 9) (lines 8-9 and 26-27). For each variable x_i we also define a resource "urn:falsex"•i (lines 14-22 and 30-36), where, on the contrary, "x"•i has value false, and "not.x"•i has value true.
Footnote 9: More precisely, it is "anyOf" : [false], since we cannot add an anchor to a schema that is just false; "anyOf" : [true] in the body of "x"•i is clearly redundant, and is there only for readability.
Now we describe how we encode the quantifiers. The first quantifier is encoded in the root schema; if the quantifier is ∀, as in this case, then we apply "allOf" to two references (line 3): the first one checks whether the rest of the formula holds when x1 is true, by invoking "urn:truex1#afterq1", which sets x1 to true by bringing "urn:truex1" in scope, and the second one checks whether the rest of the formula holds when x1 is false, by invoking "urn:falsex1#afterq1", which sets x1 to false by bringing "urn:falsex1" in scope.
The formulas "urn:truex1#afterq1" and "urn:falsex1#afterq1", which are identical, encode the evaluation, in two different contexts, of the rest of the formula ∃x2.(x1 ∧ x2) ∨ (¬x1 ∧ ¬x2). They encode the existential quantifier (lines 12 and 21) in the same way as the universal one in line 3, with the only difference that "anyOf" replaces "allOf", so that "urn:truex1#afterq1" holds if the rest of the formula holds for at least one boolean value of x2, when x1 is true, and similarly for "urn:falsex1#afterq1" when x1 is false.
This technique allows one to encode any QBF formula with a schema whose size is linear in the size of the formula Q1 x1 . . . Qn xn . φ: the size of "urn:phi#phi" is linear in |φ|, and the rest of the schema is linear in |Q1 x1 . . . Qn xn|.
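The branching structure of the encoding can be illustrated without JSON Schema at all. The sketch below is not the encoding of Figure 7; it is a plain recursive QBF evaluator, under the assumption that "allOf" over the two truth values plays the role of ∀ and "anyOf" plays the role of ∃, with the growing assignment standing in for the dynamic scope. The names `eval_qbf`, `prefix`, and `matrix` are hypothetical.

```python
from typing import Callable, Dict, List, Optional, Tuple

Assignment = Dict[str, bool]

def eval_qbf(prefix: List[Tuple[str, str]],
             matrix: Callable[[Assignment], bool],
             assignment: Optional[Assignment] = None) -> bool:
    """Evaluate a prenex QBF given as a quantifier prefix and a matrix.

    Each prefix entry is ("forall" | "exists", variable name). Extending the
    assignment with a variable mirrors bringing either the "true" or the
    "false" resource of that variable into scope.
    """
    assignment = dict(assignment or {})
    if not prefix:
        return matrix(assignment)
    (quantifier, var), rest = prefix[0], prefix[1:]
    branches = (eval_qbf(rest, matrix, {**assignment, var: value})
                for value in (True, False))
    # "forall" behaves like allOf over both branches, "exists" like anyOf.
    return all(branches) if quantifier == "forall" else any(branches)

# The example formula: forall x1. exists x2. (x1 and x2) or (not x1 and not x2)
example = eval_qbf(
    [("forall", "x1"), ("exists", "x2")],
    lambda a: (a["x1"] and a["x2"]) or (not a["x1"] and not a["x2"]),
)
```

The recursion visits up to two branches per quantifier, which is where the exponential running time of a naive validator on the encoded schema comes from, while the recursion depth stays linear in the number of variables.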
We now formalize this encoding.

VALIDATION IS IN PSPACE
We present here a polynomial-space validation algorithm, hence proving that the PSPACE bound is tight.To this aim, we consider the algorithm that applies the typing rules through recursive calls, using a list of already-met subproblems in order to cut infinite loops.This list could be replaced by a static check of well-formedness, but we prefer to employ this dynamic approach since the list is useful for the complexity evaluation.
For each schema, Algorithm 1 evaluates its keywords, passing the current value of the boolean result and of the evaluated children from one keyword to the next one. Independent keywords (such as "anyOf" and "patternProperties") execute their own rule and update the current result and the current evaluated items using conjunction and union, as dictated by rule (klist-(n+1)), while each dependent keyword (such as "unevaluatedProperties") updates these two values as specified by its own rule.
Function SchemaValidate (Cont, Inst, Schema, StopList) applies Schema in the context Cont, that is a list of absolute URIs without repetitions, to Inst, and uses StopList in order to avoid infinite recursion.The Cont list is extended by evaluation of dynamic and static references using the function Saturate (Cont, URI ) (line 30), which adds URI to Cont only if it is not already there.The StopList records the (Cont, Inst, Schema) triples that have been met in the current call stack.It stops the algorithm when the same triple is met twice in the same evaluation branch, which prevents infinite loops, since any infinite branch must find the same triple infinitely many times, because every instance and schema that is met is a subterm of the input, and only finitely many different contexts can be generated.
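The two bookkeeping steps described above can be sketched in isolation. This is only an illustration under simplifying assumptions: instances and schemas are represented by hypothetical string identifiers rather than by subterms of the input, and `enter` is a made-up helper name for the stop-list test, which Algorithm 1 performs inside SchemaValidate.

```python
from typing import List, Tuple

Triple = Tuple[Tuple[str, ...], str, str]   # (Cont, Inst id, Schema id)

def saturate(cont: List[str], uri: str) -> List[str]:
    """Add `uri` at the end of the context only if it is not already there."""
    return cont if uri in cont else cont + [uri]

def enter(stop_list: List[Triple], cont: List[str],
          inst_id: str, schema_id: str) -> List[Triple]:
    """Record a (Cont, Inst, Schema) triple, failing if it was already met
    on the current evaluation branch."""
    triple = (tuple(cont), inst_id, schema_id)
    if triple in stop_list:
        raise RecursionError("same (Cont, Inst, Schema) triple met twice")
    # A new list is returned, so sibling branches do not see this triple.
    return stop_list + [triple]
```

Returning a fresh list rather than mutating the argument keeps the stop list tied to the current call stack, which is the property used in the space analysis.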
We prove now that this algorithm runs in polynomial space. To this aim, the key observation is that there is a polynomial bound on the length of the call stack. The call stack is a sequence of alternating tuples SchemaValidate (Cont, Inst, Schema, StopList) - KeywordValidate (. . . ) - k(. . . ) - SchemaValidate (Cont′, Inst′, Schema′, StopList′), where k(. . . ) is the keyword-specific function invoked by KeywordValidate. We focus on the sequence of SchemaValidate (Cont, Inst, Schema, StopList) tuples, ignoring the intermediate calls. This sequence can be divided into at most n subsequences, where n is the input size: the first one with contexts that contain only one URI, the second one with contexts that contain two URIs, and so on, up to the last one, whose contexts contain a number of URIs that is bounded by the input size, since no URI is repeated in a context. In each subsequence all the (Inst, Schema) pairs are different, since the stoplist test would otherwise raise a failure. Since every instance in a call stack tuple is a subinstance of the initial one, and every schema is a subschema of the initial one, we have at most n^2 elements in each subsequence, hence the length of the entire call stack never exceeds n^3. We finally observe that every single function invocation can be executed in polynomial space plus the space used by the functions that it invokes, directly and indirectly; the result follows, since there are never more than O(n^3) such invocations active at the same time.
This is the basic idea behind the following Theorem, whose full proof can be found in the Supplementary Material. Since our algorithm runs in polynomial space, the problem of validation for Modern JSON Schema is PSPACE-complete.
Theorem 7. For any closed schema S and instance J whose total size is less than n, Algorithm 1 applied to S and J requires an amount of space that is polynomial in n.

POLYNOMIAL TIME VALIDATION FOR STATIC REFERENCES
While dynamic references make validation PSPACE-hard, annotation-dependent validation alone does not change the PTIME complexity of Classical JSON Schema validation.We prove this fact here by defining an optimized variant of Algorithm 1 that runs in polynomial time in situations where there is a fixed bound on the maximum number of dynamic references, hence, a fortiori, for schemas where no dynamic reference is present.
Our optimized algorithm exploits a memoization technique: when, during the computation of [| b |] ⊢^S J : S → (r, σ), we complete the evaluation of an intermediate judgment C′ ⊢^S J′ : S′ → (r′, σ′), we store this intermediate result. However, while there is only a polynomial number of J′ and S′ that may be generated while proving [| b |] ⊢^S J : S → (r, σ), there is an exponential number of different C′, corresponding to different subsets of the URIs that appear in S and to different reorderings of these subsets; this phenomenon happens, for example, in our leading example (Figure 7). We solve this problem in the case of a fixed bound on the number of dynamic references by observing that two different contexts C_1 and C_2 are equivalent, with respect to a specific validation problem C′ ⊢^S J′ : S′ → (r′, σ′), when the two resolve in the same way any dynamic reference that is actually expanded during the analysis of that specific problem. In the bounded case, this equivalence relation on contexts has a polynomial number of equivalence classes, which allows us to recompute the result of C′ ⊢^S J′ : S′, for a fixed pair J′, S′, only for a polynomial number of different contexts C′.
In greater detail, our algorithm returns, for each evaluation of S over J in a context C, not only the boolean result and the evaluated children, but also a DFragSet, that is, the set of fragment ids f such that fstURI(_, f) has been computed during that evaluation. For each evaluated judgment C ⊢^S J : S → (r, σ), we add the tuple (C, J, S, r, σ, DFragSet) to an updatable store. When, during the same validation, we evaluate again J and S in an arbitrary context C′, we retrieve any previous evaluation with the same pair (J, S), and we verify whether the new C′ is equivalent to the context C used for that evaluation, with respect to the set of fragments that have been actually evaluated, reported in the DFragSet; here equivalent means that, for each fragment f in DFragSet, fstURI(C, f) and fstURI(C′, f) coincide. If the two contexts are equivalent, then we do not recompute the result, but we just return the previous (r, σ, DFragSet) triple. It is easy to prove that, when the number of different dynamic references is bounded, this equivalence relation has a number of equivalence classes that is polynomial in the size of S, hence that memoization limits the total number of function calls below a polynomial bound.
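The equivalence test on contexts can be sketched as follows. This is an illustration only, under explicit assumptions: fstURI(C, f) is read here as "the first URI in C whose resource declares a dynamic anchor f", the map `anchors` from URIs to declared dynamic anchors is a hypothetical precomputed index, and instances and schemas are represented by hypothetical string identifiers. It is not the code of Algorithm 2.

```python
from typing import Dict, Iterable, List, Optional, Set, Tuple

Context = List[str]

def fst_uri(cont: Context, f: str, anchors: Dict[str, Set[str]]) -> Optional[str]:
    """First URI in `cont` whose resource declares dynamic anchor `f`
    (assumed reading of fstURI); None if no resource in scope declares it."""
    for uri in cont:
        if f in anchors.get(uri, set()):
            return uri
    return None

def equivalent(c1: Context, c2: Context, dfragset: Iterable[str],
               anchors: Dict[str, Set[str]]) -> bool:
    """Two contexts are equivalent w.r.t. `dfragset` if they resolve every
    fragment in it to the same URI."""
    return all(fst_uri(c1, f, anchors) == fst_uri(c2, f, anchors) for f in dfragset)

class UpdatableStore:
    """Memo table indexed by (instance id, schema id)."""

    def __init__(self, anchors: Dict[str, Set[str]]):
        self.anchors = anchors
        self.entries: Dict[Tuple[str, str], list] = {}

    def record(self, cont, inst_id, schema_id, result, evaluated, dfragset):
        self.entries.setdefault((inst_id, schema_id), []).append(
            (list(cont), result, frozenset(evaluated), frozenset(dfragset)))

    def lookup(self, cont, inst_id, schema_id):
        """Return a stored (result, evaluated, dfragset) triple computed in an
        equivalent context, or None if the judgment must be recomputed."""
        for old_cont, result, evaluated, dfragset in self.entries.get((inst_id, schema_id), []):
            if equivalent(old_cont, cont, dfragset, self.anchors):
                return result, evaluated, dfragset
        return None
```

The point of the design is that a stored result is keyed by how the context resolved the fragments that were actually used, not by the whole context, so that contexts differing only on unused URIs share one entry.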
For simplicity, in our algorithm we keep the UpdatableStore and the StopList separate; it would not be difficult to merge them into a single data structure that can be used for the purposes of both. We show here how SchemaValidate changes from Algorithm 1. In the Supplementary Material we also report how KeywordValidate is modified.
This optimized algorithm returns the same result as the base algorithm, and runs in polynomial time if the number of different dynamic fragments is limited by a fixed bound.
Theorem 9. Consider a family of closed schemas S and instances J such that (|S| + |J|) ≤ n, and let D be the set of different fragments f that appear in the argument of a "$dynamicRef" : initURI•"#"•f in S. Then, Algorithm 2 runs on S and J in time O(n^(k+|D|)) for some constant k.
Corollary 10. Validation is in PTIME for every family of schemas where the maximum number of different fragments that are arguments of "$dynamicRef" is bounded by a constant.
references; this allows us to reuse results and algorithms that have been defined for Classical JSON Schema. Specifically, we will prove here that dynamic references can be eliminated from a schema, and substituted with static references, at the price of a potentially exponential increase in the size of the schema. This entails that the data complexity of the problem is polynomial (Corollary 12).
A dynamic reference "$dynamicRef" : initURI • "#" • f is resolved, during validation, to a URI reference fstURI(C +? initURI, f) • "#" • f that depends on the context C of the validation (Section 4.6), so that the same schema S behaves in different ways when applied in different contexts. This context-dependency extends to static references: a static reference "$ref" : absURI •"#"•f is always resolved to the same subschema; however, when this subschema invokes some dynamic reference, directly or through a chain of static references, the validation behavior of this subschema depends on the context. This happens with "$ref" : "urn:phi#phi" in our example: it is a static reference, but the behavior of the schema it refers to depends on the context.
To obtain the same effect without dynamic references, we observe that, if the context C is fixed, then every dynamic reference has a fixed behavior, and it can be encoded using a static reference "$ref" : fstURI(C +? initURI, f) •"#"•f. Every dynamic reference can be eliminated if we iterate this process by defining, for each subschema S′ and for each context C, a context-injected version CI(C, S′), which describes how S′ behaves when the context is C. The context-injected CI(C, S′) is obtained by (1) substituting in S′ every dynamic reference "$dynamicRef" : initURI•"#"•f with a static reference to the context-injected version of the schema identified by fstURI(C +? initURI, f) •"#"•f, and (2) substituting every static reference "$ref" : absURI • "#" • f with a static reference to the context-injected version of the schema identified by absURI •"#"•f. Step (2) is crucial, since a static reference may recursively invoke a dynamic one, hence the context must be propagated through the static references.
The complete unfolding is in the Supplementary Material.
We can now give a formal definition of the translation process. For simplicity, we assume that all fragment identifiers referred to by "$ref" are plain-names defined using "$anchor", without loss of generality, since JSON Pointers can be easily translated using the anchor mechanism.
Given a schema S_0 with base URI b, we first define a "local" translation function CI that maps every pair (C, S), where C is a list of URIs from S_0 without repetitions and S is a subschema of S_0, into a schema without dynamic references, and that maps every pair (C, K) to a keyword without dynamic references. This function maps references as specified below, and acts as a homomorphism on all the other operators, as exemplified here with "anyOf".
Consider now a schema  0 and the set C of all possible contexts, that is, of all lists with no repetitions of absolute URIs of resources inside  0 ; a fragment of  0 is any subschema that is identified by a static or a dynamic anchor (e.g., the subschema identified by "urn:phi#phi" is a fragment).The static translation of  0 , Static( 0 ), is obtained by substituting, in  0 , each fragment   identified by absURI•"#"•f with many fragments •f , one for any context  ∈ C, where the schema identified by each absURI • "#" • • f is CI(,   ), as exemplified in the Supplementary Material.If we have   absolute URIs in  0 , we have Σ  ∈ {| 0...  |} (!) lists of URIs without repetitions, hence, if we have   fragments, the possible (,   ) pairs are (Σ  ∈ {| 0...  |} (!)) ×   , which is included between   !×  and (  +1)!×  .This exponential expansion was to be expected, since this transformation can be used to reduce the validation problem of  using  0 , that is PSPACE-complete with respect to | | + | 0 |, to validation using Static( 0 ), which is PTIME with respect to | | + |Static( 0 )|.
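The size of the set of contexts can be checked numerically. The sketch below relies on one elementary counting fact, stated here as the assumption behind the code: over n URIs there are n!/(n−i)! repetition-free lists of length i, so the total number of contexts is the sum of these values for i from 0 to n. The function names are hypothetical.

```python
from itertools import permutations
from math import factorial
from typing import Iterator, Sequence, Tuple

def count_contexts(n_uris: int) -> int:
    """Number of repetition-free lists, of any length 0..n_uris, over n_uris URIs."""
    return sum(factorial(n_uris) // factorial(n_uris - i) for i in range(n_uris + 1))

def enumerate_contexts(uris: Sequence[str]) -> Iterator[Tuple[str, ...]]:
    """Enumerate the same lists explicitly; only usable for very small inputs."""
    for length in range(len(uris) + 1):
        yield from permutations(uris, length)

def count_injected_fragments(n_uris: int, n_fragments: int) -> int:
    """Number of (context, fragment) pairs produced by the static translation."""
    return count_contexts(n_uris) * n_fragments
```

For every n, `count_contexts(n)` lies between n! and (n+1)!, which agrees with the bounds stated in the text once multiplied by the number of fragments.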
We can now prove that this process preserves schema behavior.
Theorem 11 (Encoding correctness). Let S be a closed schema with base URI b. Then:
PTIME data complexity. There are situations where the schema is fixed and has a very small size by comparison to the instance size, hence it is important to understand how the cost of evaluating [| b |] ⊢^S J : S depends on the size of J, when S is fixed; this is analogous to the notion of data complexity that is standard in the database field [Vardi 1982].
The construction that we presented allows us to reduce the problem of validating arbitrary instances using a fixed schema S to the same problem using a fixed schema that contains no dynamic references. By Corollary 10, this problem is in PTIME, hence the same holds for the problem of validating arbitrary instances using any fixed schema.
Corollary 12 (Instance complexity). When S is fixed, the validation problem
Fixed-schema complexity is similar to data complexity in query evaluation, but the parallelism is not precise: while queries are, in practical cases, almost invariably much smaller than data, there are many situations where JSON Schema documents are bigger than the checked instances, for example when complex schemas are used in order to validate function parameters.

EXPERIMENTS
We implemented Algorithm 2 for the entire JSON Schema language in Scala, applying the rules described here and in the Supplementary Material.

Correctness of formalization
We applied our algorithm to the official JSON Schema test suite [Bergman 2023b] (see footnote 11). We pass all tests apart from 25, which pertain to schemas using the following features: special characters in patterns, references with an empty label or a label with special characters, unknown keywords, a vocabulary different from JSON Schema, a decimal with a high precision. This experiment shows that the rules that we presented, and which are faithfully reflected by our algorithm, are correct and complete with respect to the standard test suite.

Complexity
We have proved that validation in Modern JSON Schema is PSPACE-complete in the presence of dynamic references, while it is in PTIME when dynamic references are not present.In the upcoming experiment, we test (1) whether there exist families of schemas where this difference is reflected by considerable validation times, and (2) whether this difference already manifests with small schemas.
Schemas.We designed three families of schemas that are universally satisfiable, and we validate the JSON instance null against them.
The corresponding stat schema family comprises Draft-04 schemas "stat1.js" up to "stat100.js", where each keyword "$dynamicRef" is just substituted with the keyword "$ref", without applying the expansion that we described in Section 8.
Validators. For third-party validators, we employ the meta-validator Bowtie [Bergman 2023a], which invokes validators encapsulated in Docker containers. We tested all 16 different open-source validators currently provided by Bowtie that support Draft 4 or Draft 2020. They are written in 11 different programming languages, as detailed in Table 3 of the Supplementary Material. We also integrated within Bowtie the validator from [Pezoa et al. 2016], as well as our own.
Execution environment. Our execution environment is a 40-core Debian server with 384GB of RAM. Each core runs at 3.1 GHz, with the CPU frequency set to performance mode. We are running Docker version 20.10.12, Bowtie version 0.67.0, and Scala version 2.12.
All runtimes were measured with GNU time, averaged over five runs, and include the overhead of invoking Bowtie and Docker. We overrode the default timeout setting in Bowtie, to allow for longer-running experiments.
Footnote 11: We only focus on main schemas and do not consider the optional ones.
Plotted lines terminate when the validator produces a logical validation error or a runtime exception (most commonly, a stack overflow), or when Bowtie reports no response by the validator.
Results. In Figures 8a, 8b, and 8c we provide the results of our evaluation. In all figures, the x-axis indicates the index of the schema within its family, while the y-axis reports the runtime. The results are fully consistent with the theoretical results in the paper, as we elaborate next.
Results on the "dyn" and "stat" schemas show that, on this specific example, the difference in the asymptotic complexity of the static and the dynamic version is extremely visible: if we focus on our validator (red line), we see that validation with dynamic references can become impractical even with reasonably-sized files (e.g., schema "stat5.js" counts fewer than 250 lines when pretty-printed), while the runtime remains very modest when dynamic references are substituted with static references. The runtime on the "dyn_bounded" family reflects Corollary 10, showing the effectiveness of the proposed optimization on this specific example.
The results on the other implementations strongly suggest that many validators have chosen to implement an algorithm that is exponential even when no dynamic reference is present.This is not surprising for the validators designed for Draft 2020-12, since we have been the first to describe an algorithm (Algorithm 2) that runs in polynomial time over the static fragment of Draft 2020-12.It is a bit more surprising for the algorithm published as additional material for [Pezoa et al. 2016]: This validator implements Draft-04, which belongs to Classical JSON Schema.For Classical JSON Schema, the validation problem is in PTIME, as proved for the first time in that same paper.
Discussion.This experiment shows that there are families of schemas where the PSPACE-hardness of the problem is visible (Figure 8a), and that the algorithm we describe in Section 7 is extremely effective when dynamic references are replaced with static references (Figure 8b), or limited in number (Figure 8c).In this paper we focus on worst-case asymptotic complexity, and we do not make claims about real-world relevance of our algorithm, which is an important issue, but is not in the scope of this paper.

RELATED WORK
To the best of our knowledge, Modern JSON Schema has not been formalized before, nor has validation in the presence of dynamic references been studied.
Overviews of schema languages for JSON can be found in [Baazizi et al. 2019a,b; Bourhis et al. 2017; Pezoa et al. 2016]. In [Pezoa et al. 2016], Pezoa et al. proposed the first formalization of Classical JSON Schema Draft-04 and studied the complexity of validation. They proved that the expressive power of JSON Schema Draft-04 goes beyond MSO and tree automata, and showed that validation is PTIME-complete. They also described and experimentally analyzed a Python validator that exhibits good performance and scalability. Their formalization of semantics and validation, however, cannot be extended to Modern JSON Schema due to the presence of dynamic references and annotation-dependent validation.
In [Bourhis et al. 2017], Bourhis et al. refined the analysis of Pezoa et al. They mapped Classical JSON Schema onto an equivalent modal logic, called recursive JSL, and studied the complexity of validation and satisfiability. In particular, they proved that validation for recursive JSL and Classical JSON Schema is PTIME-complete and that it can be solved in  (| | 2 | |) time; then they showed that satisfiability for Classical JSON Schema is EXPTIME-complete for schemas without uniqueItems and is in 2EXPTIME otherwise. Again, their approach does not seem very easy to extend to Modern JSON Schema, as it relies on modal logic and a very special kind of alternating tree automata.
While we are not aware of any other formal study about JSON Schema validation, dozens of validators have been designed and implemented in the past (please, see [val 2023] for a rather complete list of about 50 implementations).Only some of them (about 21), like ajv [ajv 2023] and Hyperjump [hyp 2023], support modern JSON Schema and dynamic references.These validators usually compile schemas to an efficient internal representation, that is later used for validation purposes.ajv, for instance, uses modern code generation techniques and compiles a schema into a specialized validator, designed to support advanced v8 optimization.
Validation has been widely studied in the context of XML data (see [Martens et al. 2006, 2009], for instance). However, schema languages for XML are based on regular expressions, while JSON Schema exploits record types, recursion, and full boolean logic, and this makes it very difficult to import techniques from one field to the other.
Schema languages such as JSON Schema and type systems for functional languages are clearly related, and a lot of work has been invested in the analysis of the computational complexity of type checking and type inference for programming languages and for module systems (we will only cite [Henglein and Mairson 1994], as an example).We are well aware of this research field, but we do not think that it is related to this specific work, since in that case the focus is on the analysis of code while JSON Schema validation analyzes instances of data structures.

CONCLUSIONS AND OPEN PROBLEMS
Modern JSON Schema introduced annotation-dependent validation and dynamic references, whose exact interpretation is regarded as difficult to understand [Neal 2022], [Jacobson 2021].The changes to the evaluation model invalidate the theory developed for Classical JSON Schema.
Here we provide the first published formalization for Modern JSON Schema.This formalization provides a language to unambiguously describe and discuss the standard, and a tool to understand its subtleties, and it has been discussed with the community of JSON Schema tools developers.The formalization has been expressed as a Scala program, which passes the tests of the standard JSON Schema validation test suite, and is available in the Supplementary Material.
We use our formalization to study the complexity of validation of Modern JSON Schema. We proved that the problem is PSPACE-complete, and that a very small fragment of the language is already PSPACE-hard. We proved that this increase in asymptotic complexity is caused by dynamic references, while annotation-dependent validation without dynamic references can be decided in PTIME. We have defined, implemented, and experimented with an explicit algorithm to this aim.
We defined a technique to eliminate dynamic references, at the price of a potential exponential increase in the schema size, and we used it to prove that data-complexity of validation is in PTIME.
Many interesting problems remain open, such as the definition of a new notion of schema equivalence and inclusion that is compatible with annotation-dependent validation, the study of its properties, and the study of the computational complexity of the problems of satisfiability, validity, inclusion, and example generation.

Theorem 3 (Termination). If a closed schema  is well formed, then, for any  that is formed using URIs of , for any  ,  , and , there exists one and only one proof tree whose root is  ⊢ S  :  → (, ), and that proof tree is finite.
Proof. We prove a more general property: assume that a closed schema  0 is well-formed, and consider any subschema  of  0 , consider any keyword  that appears in such a subschema , and consider any context  that is formed using URIs of the schema . Then:
• there exists one unique finite proof tree whose root is  ⊢ S  :  → (, ) (1)
• and there exists one unique finite proof tree whose root is  ⊢ K  :  → (, ) (2).
In order to prove (1) and ( 2), we first consider the graph  of the "unguardedly references" relation for  0 , and we define the degree  () of any subschema  of  0 as the length of the longest path in  that starts from ;  () is well-defined since  is acyclic by hypothesis; the degree of any keyword  () is defined as the degree of the schema that immediately encloses the keyword.
We analyze all operators and we observe that, for every judgment  ⊢ S  :  → (, ) or  ⊢ K  :  → (, ), there is always exactly one rule that can be applied.
For the structural applicators, we conclude by induction on  .
For the referencing applicators "$ref" and "$dynamicRef", the premises are applied to the same  , and we conclude by induction of  ().
For a boolean applicator, the premises are applied to the same instance, and all the schemas that are arguments of the applicator are unguarded, hence their degree is either smaller than the degree of the enclosing schema, or equal. When it is strictly smaller, we conclude by induction on the degree.

Proof. Consider a QBF formula ψ = Q1 x1 . . . Qn xn . φ and the corresponding schema S_ψ. We say that a context C is well-formed for (S_ψ, i) with i ≤ n if (1) all of its URIs denote a resource of S_ψ, (2) for every j ≤ i either "urn:truex"•j ∈ C or "urn:falsex"•j ∈ C holds, but not both of them, and (3) for every j > i neither "urn:truex"•j ∈ C nor "urn:falsex"•j ∈ C. We say that a context C is well-formed for S_ψ if there exists i such that C is well-formed for (S_ψ, i). The index of a well-formed context C, Index(C), is the only i such that C is well-formed for (S_ψ, i). We associate every well-formed context C to an assignment A_C defined over x1, . . ., x_Index(C) as follows: A_C(x_j) = T ⇔ "urn:truex"
The evaluation of every subschema of S_ψ either returns (T, ∅) or (F, ∅), since no annotation is generated. We will prove, by induction, the following properties that describe how the booleans returned by some crucial subschemas are related to the validity of the subformulas of ψ, where we
context it receives or a context that contains one more URI, which happens when the keyword analyzed is either "$ref" or "$dynamicRef" and the target URI was not yet in the context.
Property (3) holds since no two elements of the stoplist can be equal, because a failure is raised when the input is already in the StopList parameter. All elements in a sublist have the same context, hence they must differ either in the instance or in the schema. The instance must be a sub-instance of the input instance, hence we have at most n choices. The schema must be a sub-schema of the input schema, hence we have at most n choices. Hence, the size of each sublist is at most n × n.
The combination of these three properties implies that the StopList parameter contains at most (n × n) × (n + 1) elements, since it can be decomposed into at most n + 1 sublists, which implies that this parameter has a polynomial size, and that the call stack, by property (1), has a polynomial depth.
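Spelled out as a single chain of inequalities, and writing sl_i (a name introduced here only for illustration) for the sublist of StopList elements whose context contains exactly i URIs beyond the first, the bound reads:

```latex
\[
|\mathit{StopList}|
  \;=\; \sum_{i=0}^{n} |\mathit{sl}_i|
  \;\le\; (n+1)\cdot(n \times n)
  \;=\; n^{3} + n^{2}
  \;\in\; O(n^{3})
\]
```

The first step uses the decomposition into at most n + 1 sublists, and the second uses the bound n × n on the size of each sublist given by property (3).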
We now provide the rest of the proof, which is quite straightforward. We observe that SchemaValidate (Cont, Inst, Schema, StopList) scans all keywords inside Schema in sequence, and it only needs enough space to keep the pair (Result, Eval) between one call and the next one, and the size of this pair is in O(n), since the list Eval of evaluated properties or items cannot be bigger than the instance. Since we reuse the same space for all keywords, we only need to prove that each single keyword can be recursively evaluated in polynomial space, including the space needed for the call stack and for the parameters.
Every keyword has its own specific algorithm, some of which are exemplified in Algorithm 1.
We first analyze "patternProperties". All the four parameters can be stored in polynomial space: Cont since it is a list of URIs that belong to the schema and contains no repetition, and the other parameters have already been discussed. Each matching pair can be analyzed in polynomial space, apart from the recursive call. For the recursive call, the call stack has a polynomial length, and every called function employs polynomial space.
We need to repeat the same analysis for all rules, but none of them is more complex than "patternProperties".For example, the dependent keywords such as "unevaluatedProperties" have one extra parameter that contains the already evaluated properties and items, but it only takes polynomial space, as already discussed.
This completes the proof of the fact that the algorithm runs in polynomial space, hence the problem of validation for JSON Schema 2020-12 is PSPACE-complete. □

D PROOFS FOR SECTION 7 (POLYNOMIAL TIME VALIDATION FOR STATIC REFERENCES)
Theorem 8. Algorithm 2 applied to (C, J, S, ∅, ∅) returns (r, σ, d), for some d, if, and only if, C ⊢^S J : S → (r, σ).
This means that when we say "for any . . ." we actually mean "for any list  of tuples (, , ). . .".We first define an enriched version of the typing rules, that defines a judgment  ⊢ Sd  :  → (, , ) that returns in  the fragment names of the dynamic references that are resolved.The most important rule is ($dynamicRef  ), that adds the resolved f to the list .
, and we conclude by induction.The case for ($ref) is similar but simpler.□

F COMPLETE FORMALIZATION OF MODERN JSON SCHEMA
We report here a formalization that describes the entire grammar of Modern JSON Schema, Draft 2020-12, and all of its typing rules.

F.1 Complete grammar
The complete grammar is reported in Figure 9. This grammar groups the keywords "if"-"then"-"else", and specifies that the presence of any keyword among "if"-"then"-"else" implies the presence of "then" and "else", which is enforced by adding a trivial "then" : {}, or "else" : {}, when one or both are missing; this presentation reduces the number of rules needed to formalize "if"-"then"-"else". In the same way, the grammar groups "contains"-"minContains"-"maxContains" and imposes the presence of "minContains" when any of the three is present, which is enforced by adding the default "minContains" : 1 when "minContains" is missing.
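The two normalizations described above can be sketched on a schema represented as a Python dictionary. This is a minimal illustration, assuming that schemas are already parsed into dicts and booleans; the function name `normalize_keywords` is hypothetical, and the sketch only rewrites the top-level object without recursing into subschemas.

```python
from typing import Any

def normalize_keywords(schema: Any) -> Any:
    """Add the trivial/default keywords assumed by the grouped grammar.

    Boolean schemas (true/false) are returned unchanged. The input dict is
    not mutated: a shallow copy is returned.
    """
    if not isinstance(schema, dict):
        return schema
    out = dict(schema)
    # Any of "if"/"then"/"else" implies the presence of "then" and "else".
    if any(k in out for k in ("if", "then", "else")):
        out.setdefault("then", {})
        out.setdefault("else", {})
    # Any of "contains"/"minContains"/"maxContains" implies "minContains".
    if any(k in out for k in ("contains", "minContains", "maxContains")):
        out.setdefault("minContains", 1)
    return out
```

After this step, a rule for "if"-"then"-"else" can assume that both branches are present, which is the simplification that the grouped grammar is meant to provide.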

F.2 Terminal keywords
The types and the conditions of the terminal keywords are specified in Table 2. There, the length operator | | counts the number of characters of a string, the number of fields of an object, and the number of elements of an array. names(J) extracts the names of an object. When  is a pattern or a format, we use () to indicate the corresponding set of strings.
When TypeOf (kw) is no type, the assertion does not have the (kwTriv) rule and does not have the condition TypeOf ( ) = TypeOf (kw) in the (kw) rule.
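As a small illustration of the two operators just described, the following Python sketch assumes that JSON values are represented by the usual Python types (str, dict, list); the function names length and names are chosen here for readability and are not taken from Table 2.

```python
def length(j):
    """Length operator: characters of a string, fields of an object, elements of an array."""
    if isinstance(j, (str, dict, list)):
        # For str, Python's len counts Unicode code points.
        return len(j)
    raise TypeError("length is only defined for strings, objects, and arrays")


def names(j: dict) -> set:
    """Return the set of member names of a JSON object."""
    return set(j.keys())
```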

F.4 References
G FUNCTIONS get(,  ) AND dget(,  )
dget(,  ) searches inside  for a subschema { "$dynamicAnchor" :  ,  1 , . . .,   } and returns it. However, it only searches inside known keywords that contain a schema as a parameter, and the search is stopped by the presence of an internal "$id" keyword, because "$id" indicates that the subschema is a separate resource, with a different URI.
The function dget(,  ) is defined as dgets(StripId(),  ), where StripId() removes any outermost "$id" keyword from . This is necessary since dgets interrupts its search when it meets an "$id" keyword, but the presence of an "$id" in the outermost schema should not interrupt the search:

=  otherwise

The functions dgets(,  ) and dgetk(,  ) are defined as follows, where max returns the maximum element of a set that contains either schemas or ⊥, according to the trivial order defined by ⊥ ≤ : that is, max selects the only element in the set that is different from ⊥, if it exists and is unique. If we assume that no plain-name is used by two different anchors in the same schema, then, in line 4, there exists at most one value of  such that dget(  ,  ) ≠ ⊥, hence the maximum is well defined, and the same holds for lines 6 and 7.

where
kwSimPar = {| "not", "contains", "propertyNames", "items", "additionalProperties", "unevaluatedProperties", "unevaluatedItems" | }
kwObjPar = {| "$defs", "patternProperties", "properties", "dependentSchemas" | }
kwArrPar = {| "anyOf", "allOf", "oneOf", "prefixItems" | }
kwOther = Str \ kwSimPar \ kwObjPar \ kwArrPar

The four lines of the dgetk definition specify that the search for a "$dynamicAnchor" keyword is performed only inside keywords that are known and whose parameter contains a schema object in a known position. For example, the search is not performed inside a user-defined keyword or inside "const" or "default".
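The search described in this section can be sketched in Python as follows. This is a reconstruction from the prose and not the formal definition: the names strip_id, dgets, dgetk, and dget mirror the functions above, schemas are assumed to be Python dictionaries (or booleans), and only the three keyword sets listed above are traversed. Instead of computing max over the results, the sketch returns the first match, which coincides with max under the assumption that no plain-name is used by two different anchors in the same schema.

```python
from typing import Any, Optional

KW_SIM_PAR = {"not", "contains", "propertyNames", "items",
              "additionalProperties", "unevaluatedProperties", "unevaluatedItems"}
KW_OBJ_PAR = {"$defs", "patternProperties", "properties", "dependentSchemas"}
KW_ARR_PAR = {"anyOf", "allOf", "oneOf", "prefixItems"}


def strip_id(schema: Any) -> Any:
    """Remove an outermost "$id" so that it does not stop the search."""
    if isinstance(schema, dict):
        return {k: v for k, v in schema.items() if k != "$id"}
    return schema


def dgets(schema: Any, f: str) -> Optional[dict]:
    """Search `schema` for a subschema whose "$dynamicAnchor" is `f`."""
    # Boolean schemas contain no anchors; an "$id" marks a separate resource.
    if not isinstance(schema, dict) or "$id" in schema:
        return None
    if schema.get("$dynamicAnchor") == f:
        return schema
    for kw, param in schema.items():
        found = dgetk(kw, param, f)
        if found is not None:
            return found
    return None


def dgetk(kw: str, param: Any, f: str) -> Optional[dict]:
    """Search inside one keyword, only if it is known to carry schemas."""
    if kw in KW_SIM_PAR:
        subschemas = [param]                 # parameter is a single schema
    elif kw in KW_OBJ_PAR and isinstance(param, dict):
        subschemas = list(param.values())    # parameter maps names to schemas
    elif kw in KW_ARR_PAR and isinstance(param, list):
        subschemas = param                   # parameter is an array of schemas
    else:
        subschemas = []                      # kwOther: never searched
    for sub in subschemas:
        found = dgets(sub, f)
        if found is not None:
            return found
    return None


def dget(schema: Any, f: str) -> Optional[dict]:
    """Entry point: ignore the outermost "$id", then search."""
    return dgets(strip_id(schema), f)
```

In this sketch, an anchor nested under "$defs" is found, while an anchor that sits inside an embedded resource with its own "$id", or inside "const", is not.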
When  is a plain-name, then the function get is identical to dget, but it extends case 3, since it matches both "$anchor" and "$dynamicAnchor":  [Bryan et al. 2013]. However, Draft 2020-12 specs [Wright et al. 2022] (Section 9.2.1) specify that the behavior is undefined when the pointer crosses resource boundaries.

H THE ROLE OF absURI IN "$dynamicRef" : absURI •"#"•f
In Remark 1 we discuss the difference between looking for a dynamic reference in  or in +?absURI . We report here a concrete test.
Here is the second part; since "urn:phi#phi" can be reached from four different contexts, we need four different definitions, as follows. In the four cases, the way the dynamic variables "x1" and "x2" are resolved varies depending on the context , encoded in the anchor  •  .

Fig. 4. A schema with embedded resources and its flattened version.

Fig. 8. Runtimes of validators on the three schema families, as schemas increase in size. Third-party validators support Draft4 or Draft2020 (green/blue lines). "mjs" (red line) refers to our implementation.
Fig. 10. Different behavior of several JSON Schema validators.
• ∈    (  ) = F ⇔ "urn:falsex"• ∈ 
Given a QBF formula  ′ that may contain open variables and an assignment  that is defined for every open variable of  ′ , we use Valid ( ′ , ) to indicate the fact that  ′ is valid when every open variable is substituted with its value in .
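To make the notion of validity under an assignment concrete, here is a minimal recursive evaluator in Python. It is a sketch under assumptions that are not part of the paper: formulas are encoded as nested tuples such as ("forall", "x", body), ("and", f, g), ("not", f), and ("var", "x"), and the assignment is a dictionary from variable names to booleans that covers every open variable.

```python
def valid(phi, assignment):
    """Evaluate the QBF `phi` under `assignment` (dict: variable name -> bool)."""
    op = phi[0]
    if op == "var":
        return assignment[phi[1]]
    if op == "not":
        return not valid(phi[1], assignment)
    if op == "and":
        return valid(phi[1], assignment) and valid(phi[2], assignment)
    if op == "or":
        return valid(phi[1], assignment) or valid(phi[2], assignment)
    if op in ("forall", "exists"):
        _, x, body = phi
        # Try both truth values for the bound variable x.
        results = [valid(body, {**assignment, x: value}) for value in (True, False)]
        return all(results) if op == "forall" else any(results)
    raise ValueError(f"unknown connective: {op!r}")


# forall x. (x or y): valid exactly when the open variable y is true.
phi = ("forall", "x", ("or", ("var", "x"), ("var", "y")))
print(valid(phi, {"y": True}), valid(phi, {"y": False}))
```

The recursion depth is bounded by the nesting depth of the formula, and each call extends the assignment by at most one variable.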