A Proxy for Assessing the Automatic Encodability of Regulation

Artificial Intelligence (AI) is already changing the way law is applied, thereby affecting the very core of society. While there is increasing interest in the possibility of creating machines that automatically process, adapt, and enforce regulation, cross-disciplinary research between AI and law has not yet determined to what extent such legally intelligent machines can and should be built. This article addresses this gap by providing a first attempt to quantify the automatic encodability of regulation. To do so, we propose an algorithm that first gauges sentence complexity for machines by leveraging natural language processing (NLP) techniques for sentence simplification developed for open relation extraction systems; in addition, the algorithm assesses word complexity for machines by attempting to link terms to their supposed functional requirements, a task that involves finding matching concepts in public ontologies and controlled vocabularies. We apply our methodology to several pieces of legislation, a few of which have already been successfully transformed into machine-processable form by hand, and others that we assume to be less encodable because of the many open-textured terms they contain. The analysis is consistent with our expectations: legislation deemed highly ambiguous is less amenable to being automatically turned into automatically processable regulation. This research is relevant because it gives the AI community, the legal community, and interdisciplinary teams directions for the nuanced discussion the field needs on the normative challenges that the automatic processing, adaptation, and enforcement of regulation is already creating, and will trigger in the future.


INTRODUCTION
Over the past decade, several projects from New Zealand to Canada have been conducted in which regulation was turned manually into automatically processable regulation (APR) [Guitton et al., 2022]. Automatically processable regulations are, as the name indicates, executable by a machine. We can arrive at APR in multiple ways: either by manually coding each clause of a specific regulation into a machine-understandable (and executable) format (e.g., in the scope of an individual project [Alauzen 2021]), or with the help of intermediary steps that for instance make use of controlled natural languages (e.g., Attempto Controlled English [Fuchs and Schwitter 1996]), which facilitate or even automate the process of turning language into a (logic) program. So far, APR has been created predominantly by bringing together legal professionals and software engineers to "manually" encode the statutes, a cumbersome process [McNaughton 2020]. These projects have nonetheless given rise to increasing debates (especially in the legal community) about the possibility, benefits, and perils of turning regulations into APR automatically [Guitton et al., 2022; Villata et al., 2022], in other words, with much higher human disintermediation. Issues with this process are numerous [Diver et al., 2022; Guitton, Tamò-Larrieux and Mayer 2022], and with respect to the scope of this article, developing an estimate of how readily encodable a certain regulation is, three challenges are of particular interest: the challenges arising because law is often formulated in an inherently vague manner [Endicott 2011; Robinson 2022], and vagueness impedes translation into APR; the balancing of interests, an issue fundamental to legal studies and the application of the law [Cobbe 2020; Moses 2020], which involves definitions that need context and are hence not immutable; and the necessity for the law to evolve when societies change [Hildebrandt 2020], which also implies that definitions of terms need to be updated accordingly.
These three challenges are linked with one another, and while mitigation strategies to redress some of them exist (e.g., having an interdisciplinary team of researchers work on the encoding of a regulation so as to ensure that such problems receive sufficient attention), these strategies are unlikely to form a perfect antidote either. For instance, open texture (vagueness, ambiguity, or abstract concepts) should be specifically acknowledged when considering turning regulations into an automatically processable form, and the extent of open texture in a regulation is likely to be an important determinant of whether it is feasible to turn the regulation into APR. However, what is missing today is a technically grounded study of the feasibility of APR that takes these challenges into account.
The "AI and Law" community has made notable advances in many different directions (natural language processing (NLP) models, specification of legal ontologies, legal rule languages, etc. [Bench-Capon 2022]), but it is currently not known to what extent automatically turning legislation into APR is possible with current technologies. We contend that being able to quantify this would have at least four benefits. First, it would allow decision-makers to better gauge the viability of projects in terms of costs and benefits. This could spur further motivation to implement such projects, in the private as much as in the public sector. Second, for much criticism of APR, it is unclear what proportion of regulation it applies to. Quantifying feasibility would hence bring much-needed data to the current debate. Third, framing the question in terms of what is feasible and what is not forces us to be clear on where the current technical limits lie, thereby guiding research in APR enablers such as NLP. And fourth, mirroring the third point, it would guide the legal field regarding which regulations to focus on to improve the automation-readiness of legal texts. In this paper, we address this gap by proposing an algorithm to estimate the feasibility of turning individual (human-readable) regulations, and bodies of regulation, into APR.

AN EXAMPLE FROM GDPR
Before investigating the establishment of a proxy that can indicate to what extent we can turn legislation into APR, we start by giving a very concrete example and illustration of what we mean by "turning regulation into APR"; this should also highlight a few of the challenges that arise both in bringing resources together to express a statute's intended meaning, and in making choices as to this meaning.
Bringing together three vocabularies and standards, the following demonstrates the formalisation of a specific article of the European General Data Protection Regulation (GDPR), namely Art. 7(1). While it is hard to claim novelty for this aspect, since it merely combines work published by others, it fills an increasingly obvious gap: the computer science community, specifically the communities on the Semantic Web and on Autonomous Agents and Multiagent Systems, has been working on approaches and vocabularies that permit the integration of legal primitives into automated systems. However, there is today a lack of work that shows the practical use of these systems, i.e., their application in the scope of examples that derive directly from regulation rather than being artificial scenarios.
Several manual efforts have been made to create machine-readable representations of different extracts of law [Bonatti et al., 2020; De Vos et al., 2019]. To turn regulation into such APR, typically at least a controlled vocabulary, which often exists in conjunction with an ontology, is required. Several aspects from different vocabularies and ontologies may need to be brought together, however, in order to turn a piece of legislation into APR. This is not trivial, and integrating these different resources already constitutes a contribution, as they may have been designed with different aims in mind.
One such vocabulary is the Data Privacy Vocabulary (DPV), whose objective is to capture the usage and processing of personal data in light of different legislative requirements. DPV is being developed by a W3C Community Group of interdisciplinary scholars and interested industry stakeholders. DPV can be used to specify common rules (namely representing permissions, prohibitions, and obligations) associated with personal data handling, but it does not define additional semantics for rules. Its authors consequently recommend considering other proposals to express the rules in a richer way, such as the Open Digital Rights Language (ODRL), a W3C-recommended policy expression language; the Shapes Constraint Language (SHACL), which permits expressing constraints on the content, structure, and meaning of Resource Description Framework (RDF) graphs; and RuleML [Boley et al., 2010], whose goal is to represent rules in a machine-understandable and executable form. In the formalisation of our example, we bring these together, specifically by using DPV to express that there exists an obligation and then integrating this with ODRL to specify the details of this obligation. We furthermore make use of the Friend-of-a-Friend Ontology (FOAF), a widely used ontology to express information about social agents and their networks, which is used and supported by online services such as WordPress.
In our example, this combination, along with several assumptions and interpretations that we discuss in the paper, permits us to express GDPR Art. 7(1). This article expresses requirements on data controllers regarding the demonstration of consent, and we express it in a machine-readable and understandable way. In addition, we describe concrete run-time instances that may be validated with respect to GDPR Art. 7(1). To keep this example illustrative, we do not provide a deep integration with ODRL but merely demonstrate how DPV and ODRL may be used in conjunction to express a legal circumstance.
GDPR Art. 7(1) reads as follows: "Where processing is based on consent, the controller shall be able to demonstrate that the data subject has consented to processing of his or her personal data".
A basic assumption we made when turning this statement into APR is that any processing of personal data that is not explicitly permitted within the scope of GDPR Art. 7(1) is prohibited; this is similar to the closed-world assumption, a fundamental principle used in logic, rule-based systems, and databases [Reiter 1978]. With this assumption, we propose to formalise GDPR Art. 7(1) as an obligation that the data controller must be able to demonstrate that it has a valid permission to process certain personal data owned by a data subject. According to GDPR Art. 7(1), such an obligation becomes active when (a) the action of processing those personal data with a specific process is executed and when (b) the processing is based on consent. We base our example on a formalisation of the relevant entities (i.e., the data subject, the data controller, etc.) in such a processing. The full formalisation is given in Appendix A; we discuss the decisions in our design process in the following, referring to individual Lines of this formalisation as appropriate.
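Under this closed-world assumption, the compliance question reduces to a membership test: processing is treated as prohibited unless a matching consent record can be produced. The following is a minimal sketch of this reading, not our actual formalisation; the ConsentRecord fields are hypothetical simplifications of the DPV terms discussed below.

```python
# Minimal sketch of the closed-world reading of GDPR Art. 7(1):
# any processing without a demonstrable consent record is prohibited.
# The record fields are hypothetical simplifications of DPV terms.
from dataclasses import dataclass

@dataclass(frozen=True)
class ConsentRecord:
    data_subject: str   # e.g. "ex:Alice"
    controller: str     # e.g. "ex:ACME"
    data: str           # e.g. "ex:AliceVoiceData"

def processing_permitted(records, subject, controller, data):
    """Closed world: permitted only if a matching consent record exists."""
    return ConsentRecord(subject, controller, data) in set(records)

records = [ConsentRecord("ex:Alice", "ex:ACME", "ex:AliceVoiceData")]
assert processing_permitted(records, "ex:Alice", "ex:ACME", "ex:AliceVoiceData")
# Absent an explicit record, processing is prohibited by default:
assert not processing_permitted(records, "ex:Bob", "ex:ACME", "ex:BobVoiceData")
```

The default answer of "prohibited" is what distinguishes this from an open-world reading, under which a missing record would merely mean "unknown".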
We consider the case where a data controller processes voice data of a data subject. In our example, the respective voice recordings of the data subject, Alice, are held in a personal data store (e.g., a Pod in the Solid system, https://solidproject.org/), where they can be accessed by the software of the data controller, a company called ACME. This software then analyses the data, for instance to manage the data subject's personal calendar or to actuate devices in their smart home. This scenario is described in a machine-understandable form in Appendix A. After specifying the required linked vocabularies (such as DPV, FOAF, and ODRL) through @prefix directives (see Lines 1-8), we define Alice (as a data subject, Lines 10-11) along with her voice data (Lines 13-14), and ACME (as a data controller, Lines 16-17). Note how we make use of published shared vocabularies to declare that Alice is a person (according to FOAF) who is a data subject (according to the DPV vocabulary; both Line 10), that ACME is an organization (according to FOAF) that is a data controller (according to the DPV vocabulary; both Line 16), and that the voice data is personal data (according to DPV, Line 13) of Alice (according to DPV, Line 14). We furthermore specify that Alice's personal data is being handled by ACME: Lines 19-23 explain (to a machine) that there is a PersonalDataHandling of type Analyse by ACME (all three according to DPV, Lines 19-21), that this concerns the voice data (Line 22), and that it has consent as its legal basis (according to DPV, Line 23).
These conditions are sufficient to fulfil the requirements set in GDPR Art. 7(1) in the specific case of Alice's voice data processing, because they describe in the respective vocabularies that the processing (ex:AnalyzeSpeech) of personal data about a Data Subject (ex:Alice) by a Data Controller (ex:ACME) is based on consent (dpv:hasLegalBasis). This is hence sufficient to satisfy the activation clause of GDPR Art. 7(1) for this instance: "Where processing [of personal data] is based on consent […]". It should next be expressed in a machine-understandable way that, given this circumstance, GDPR Art. 7(1) stipulates that there exists the above-mentioned obligation to demonstrate consent and that this obligation is related to a permission to process the personal data. This permission to process is commonly represented as a record of the given consent itself, which the data controller needs to be able to demonstrate. To enable this, we next express this record using DPV and link it to the data controller, data subject, and data processing instances from above. In Appendix A, this is shown in Lines 25-32: we specify a specific ConsentRecord (according to DPV, Line 25) that is linked through DPV to Alice (Line 26) and to the processing of Alice's personal data (and hence to ACME, Line 27). We furthermore use DPV to state that this record expresses that Alice has consented (Line 28) and that the scope of the consent is the European Union (Line 29). Finally, we again use DPV in Lines 30-32 to specify practically relevant information: the identity of Alice's specific consent record, the time when this consent was given, and a link to ACME's data processing policy.
Finally, we express that the data controller (ex:ACME) needs to be able to demonstrate this consent, where we make the assumption that the presence of the information about the processing of Alice's data (i.e., the aforementioned Lines 19-23 in Appendix A) implies that this processing is actually happening. This can be accomplished by integrating ODRL and DPV, where we express the requirement to be able to demonstrate consent as an obligation on the data controller. This is shown in Lines 34-46 in Appendix A: we re-use the vocabularies introduced in ODRL and DPV along with terms proposed within the DPV-Legal vocabulary to express the aforementioned obligation on the data controller. That is, the data controller is assigned the obligation to demonstrate consent, which we formalise as the requirement to keep a consent record (odrl:archive, Line 38) that is active in the European Union (expressed as a spatial constraint in Lines 40-45). We note that our way of representing GDPR Art. 7(1) does not make full use of the semantics of ODRL but merely creates bridges on a level that is sufficient to express what we intend to show with this example. For instance, we do not show the representation of the underlying permission to process, which would be activated by the data subject giving the authorisation to process (in terms of ODRL). We furthermore decided to model the action that is connected to the obligation of the data controller to be able to demonstrate that the data subject has consented as an odrl:archive action. Strictly, this only means that the data controller has the consent record "stored (in a non-transient form)". However, we argue that for practical purposes this is closest to the necessary precondition for being able to demonstrate consent. And lastly, to highlight yet another limitation of this approach: it would currently not be possible using ODRL to express that the permission to process data, given via demonstrated consent, has to be active only when the actual processing of the data is performed, as ODRL has no formal operational semantics allowing it to represent a permission as active or inactive.
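The shape of such an obligation can be pictured with a JSON-LD-like fragment, rendered here as a Python dict: an odrl:archive action on the consent record, assigned to the data controller and constrained spatially to the European Union. This is an illustrative sketch rather than the Appendix A listing; the "ex:" identifiers and the EU region identifier are placeholders.

```python
# Hypothetical ODRL-style obligation, mirroring a JSON-LD serialisation.
# All "ex:" identifiers and the EU region identifier are placeholders;
# the actual formalisation is given in Appendix A.
obligation_policy = {
    "@type": "Policy",
    "obligation": [{
        "assignee": "ex:ACME",                   # the data controller
        "action": "odrl:archive",                # keep the consent record stored
        "target": "ex:AliceConsentRecord",
        "constraint": [{
            "leftOperand": "odrl:spatial",
            "operator": "odrl:eq",
            "rightOperand": "ex:EuropeanUnion",  # placeholder region identifier
        }],
    }],
}
```

Note how the constraint merely restricts where the record must be active; as discussed above, ODRL offers no operational semantics to switch the related permission between active and inactive at processing time.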
Expressing a specific circumstance in the proposed way and then automatically monitoring it with respect to GDPR Art. 7(1) is possible because both the DPV and ODRL vocabularies are available and agreed upon by the involved stakeholders, and because the statements above express GDPR Art. 7(1) in terms of DPV and ODRL. What we are interested in, in this article, is to investigate under which circumstances it is possible to express human-readable regulation in this way. Evaluating this is relevant because the output of a (fictitious) system that would automatically create APR from human-readable inputs could be used in many ways to increase both the efficiency and the accessibility of regulation; a few examples are given here:
• It would permit mapping legal obligations to the program code of a device that records personal data and would thereby permit enforcing the processing of this data in a compliant way [García et al., 2021].
• The generated APR could be subjected to automatic checks to find inconsistencies or loopholes in the regulation [Morris 2020].
• The system could check whether implementations of processes or systems are compliant, and could deliver a certification [Bonatti, Kirrane, Petrova and Sauro 2020].
• In the case of APR used in the context of specific benefits and taxes, the system could check the eligibility of a person against the ever-evolving and complex requirements that an update to the legislation brought about [Alauzen 2021].

METHODOLOGY
As demonstrated through the above discussion and example, the (manual) creation of APR often requires interpretation and making implicit or explicit assumptions, even for seemingly small and simple clauses such as GDPR Art. 7(1). It is hence unlikely that it is, in general, possible to turn regulation into an automatically processable form. On this basis, in this paper, we ask whether it is possible to create a proxy that would permit us to determine the extent to which a specific regulation may be turned into APR. To this end, we set out to derive an algorithm that would automatically quantify to what extent the automatic derivation of APR from human-readable regulatory text such as the above example is possible with software that is available today. There is certainly a gap between advances in research and software implementing these advances, as the most recent progress does not necessarily come bundled with software. With this project, we seek to bridge the research frontier with currently possible applications; that in and of itself means that we seek to establish a trade-off that tilts more towards possible applications than towards research (be it theoretical, or only applicable to a limited subset of text types).
To this end, we surveyed recent scholarly publications in the field of automatic semantic analysis of (legal) texts and automatic extraction of logic from text. As we are interested in measuring the feasibility of each clause in a legal document, we put aside studies that did not focus on clauses but instead tried to automatically extract the meaning of a full text [Wu et al., 2021], as is common in summarizers. We filtered for publications whose code base was public, and had to discard publications whose findings would have been relevant but for which there was no code [Dragoni et al., 2016; Joshi and Saha 2020; Lame 2004]. We similarly discarded articles that did not focus on English [Lenci et al., 2007; Takano et al., 2010]. We briefly considered one parser that would help turn simple sentences into code, notably with controlled language (Attempto Controlled English [Fuchs and Schwitter 1996]), but discarded it in the face of insufficient results: the parser could not process even simple sentences. We then tested the experimental code from four studies that matched these criteria [Cetto et al., 2018; Dong and Lapata 2016; Dong and Lapata 2018; Pertierra et al., 2017]. We note that they mostly use different approaches to the problem, from re-using different semantic parsers [Pertierra, Lawsky, Hemberg and O'Reilly 2017] to building their own [Cetto, Niklaus, Freitas and Handschuh 2018; Dong and Lapata 2016]. Of these four, three were dependent on external libraries, which led to a cycle of finding the exact right configurations to run them; only one [Cetto, Niklaus, Freitas and Handschuh 2018] offered an ease of use that would allow anyone to replicate this algorithm with minimal resource investment, and would hence guarantee that other (legal) professionals could obtain statistics on other pieces of legislation of interest; it relied only on Stanford CoreNLP, a resource currently widely available.
Note, however, that the concept of feasibility metrics we present in this article, with its two axes, one for word complexity and one for sentence complexity, is independent of the actual underlying technology. This means that other researchers can re-use the feasibility metrics and adapt them to the evolution of available, meaningful technology. Furthermore, the word complexity [Cetto, Niklaus, Freitas and Handschuh 2018]

AUTOMATIC FEASIBILITY ESTIMATION
Our main assumption regards how legal professionals read and process legal texts. As an example, in the context of data protection, if a person is dealing with understanding notification duties for a security breach, this person will look at those provisions only. Within the same law, they may need to look at other terms or clauses which are linked to other legislation or definitions, but the clause would then explicitly or implicitly reference these [Bonatti, Kirrane, Petrova and Sauro 2020]. Similarly, in the case of a question on access rights, other articles would be at the centre of attention, and so on. Hence, we argue that, depending on the context at hand, norms in law (articles/clauses) can be looked at individually, taking other provisions into account only if a specific reference is made or necessary in a given context. We acknowledge that there is a long debate in legal scholarship on how to interpret norms, ranging from textualism vs. purposive interpretation to applying a mixture of interpretation elements [Greenberg 2021]. The theory of textualism does simplify the translation of natural language into code, and it is thus unsurprising that research on APR has based itself on this school of interpretation (although critiques of this stance have emerged [Boella et al., 2016]). While more inclusive (pluralistic) approaches exist, the mapping of the network of statutes and case law that a piece of legislation relies on for interpretation is still in its infancy [Ashley 2017].
When taking clauses individually, different institutions could interpret them differently. The executive initiating a legislative proposal may assign a different meaning than the parliamentarians voting on the law, and words or clauses left vague on purpose for future interpretation by a judicial authority [Endicott 2011] could also diverge from initial intentions. The existence of these different interpretations is an inherent feature of the legal system in democracies [Charnock 2006; Shane 2003]. What we posit is that we should be able to query different institutions for the meaning of words and integrate the different responses. In this early work, this translates into counting a word as encodable as soon as at least one institution can return its meaning. The impact of implicit overarching rules influencing the meaning of words should, ideally, be reflected in the definitions returned when querying a database, regardless of the executive/legislative/judicial institution behind it. While this may not be the case today, we also posit that the algorithm presented here supports the decision to embark on an APR project, and that the definitive implemented interpretation should offer the opportunity for review by legal experts, who could modulate the approach by adding nuances with any overarching legal references that might not have been captured so far. Based on the available approaches to text understanding, we combine two metrics to determine the feasibility of turning a norm into APR:
1. Sentence Complexity: On the one hand, we estimate whether the sentence can be automatically processed to yield a representation of the relationships between tokens, where all relationships should be semantically qualified in a well-known or even standard vocabulary. If the relationships in a sentence cannot be (syntactically) detected or cannot be (semantically) interpreted by matching them to a standard vocabulary, the sentence will rank low on this metric. We refer to this metric as "sentence complexity" since it captures whether the structure of a sentence can be automatically broken down into simpler, shorter sub-sentences that can then be turned into semantic triples, i.e., the overall complexity of finding relationships between tokens in a sentence.
2. Word Complexity: On the other hand, we estimate whether it is possible to match individual tokens in the input sentence with terms that are well defined in a publicly available, well-known, or even standardised vocabulary. A sentence that contains many words that cannot be matched, or can be matched only with high ambiguity, will rank low on this metric. We refer to this metric as "word complexity", the overall complexity of automatically interpreting individual words in the sentence, since it captures whether individual tokens in a sentence can be found in existing vocabularies, and whether the matches are of sufficiently high confidence to indicate that the meaning of the words in the vocabularies sufficiently matches the meaning of the words in the given context. To support this, we use vocabularies that cater to the legal domain and we set high acceptance thresholds for words.
In our approach, the word "complexity" is hence used as a portmanteau for either how difficult it is to extract triples or to what extent there is ambiguity around individual tokens. The better a sentence can be broken down into qualified triples, the lower its sentence complexity score. Similarly, the more words within a clause are grounded in existing vocabularies or even ontologies, the lower the word complexity for a sentence. As these are two very different metrics and we want to retain their individual value, we keep them separate, mapping feasibility onto a two-dimensional space. No matter what technology is used to automatically turn a regulation into APR in the future, we claim that the core two-pronged question remains: how complex a sentence is, on the one hand, and how complex it is to map its words, on the other. In the following, we describe how we evaluate these two metrics for a given input.

An Algorithm to Automatically Determine Feasibility
In the following, we detail the steps of a process that we propose to automatically determine the feasibility of turning a piece of regulation into APR; this follows a host of prior research [Ferraro et al., 2020] and our considerations on sentence and word complexity from the previous section. Figure 1 gives an overview of our proposed algorithm: Given a legal text, we separate its individual clauses and then feed these clauses to three different modules. Relation Extraction is based on Graphene [Cetto, Niklaus, Freitas and Handschuh 2018] (see Section 4.1.1); it extracts relevant relationships from sentences, simplifies them, and allows the computation of the sentence complexity metric. Entity Linking considers widely used public vocabularies (see Section 4.1.2), and we employ several manually created heuristics for mapping frequently occurring terms in legal texts (see Section 4.1.3); both of these modules contribute to calculating the word complexity metric. Finally, word and sentence complexity are combined to classify a clause as difficult to encode, encodable, or uncertain. In the following, we detail each module of our proposed algorithm.
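The final combination step can be sketched as follows. The cut-off values below are illustrative placeholders, not the thresholds of our actual implementation; they only show how two per-clause scores map onto the three labels.

```python
# Sketch of the final classification step: combine the two per-clause
# metrics into one of three labels. The thresholds (low, high) are
# illustrative placeholders, not the values used in our implementation.
def classify_clause(sentence_complexity: float, word_complexity: float,
                    low: float = 0.2, high: float = 0.5) -> str:
    if sentence_complexity <= low and word_complexity <= low:
        return "encodable"
    if sentence_complexity >= high or word_complexity >= high:
        return "difficult to encode"
    return "uncertain"

assert classify_clause(0.1, 0.15) == "encodable"
assert classify_clause(0.6, 0.1) == "difficult to encode"
assert classify_clause(0.3, 0.3) == "uncertain"
```

Keeping the two dimensions separate until this last step preserves the information of which axis (structure or vocabulary) makes a clause hard to encode.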

4.1.1 Relation extraction and sentence complexity.
The goal of this step is to break down the sentence into sub-sentences, so that the sub-sentences can fit within <subject, predicate, object> triples as illustrated in Section 2 (e.g., ex:AnalyzeSpeech dpv:hasProcessing dpv:Analyse). To accomplish this, we use the Graphene tool for relation extraction with coreference resolution (e.g., identifying what a pronoun refers to) and discourse simplification. Graphene relies on Stanford CoreNLP to identify the function of words in a sentence, a tool we will reuse separately in entity linking (see below). Concretely, we apply Graphene to each clause in a regulation to obtain information about sentence complexity. Consider, as an example, the following sentence from GDPR Art. 1(1): "This Regulation lays down rules relating to the protection of natural persons with regard to the processing of personal data and rules relating to the free movement of personal data". Graphene turns this into three sub-sentences of the type <arg1, relation, arg2> (see Table 1). These sub-sentences bring us one step closer to a machine-readable representation of each sentence, since ontologies aim at creating a representation of a domain by utilising the triple pattern <arg1, relation, arg2>. The remaining complexity depends on arg2: arg1 is always short, as is the relation, but arg2 can remain long. Ideally, the triple would be constituted of single words, all easily identifiable as concepts in an ontology. This is why we take as the sentence complexity the average length of arg2 in relation to the length of the full original sentence. We then obtain the metric per clause for sentence complexity as:

sentence complexity = (average number of words in arg2) / (total number of words in the original input sentence)

Table 1:
Arg1            | Relation        | Arg2
This            | is relating to  | the free movement of personal data
This Regulation | lays down       | rules
Rules           | are relating to | the protection of natural persons

In the example above, this would translate to a sentence complexity score of 0.16. In this way, our algorithm obtains a sentence complexity score for each sentence in a given regulation, permitting a comparison of the complexity of these sentences within a piece of regulation, across regulations, and across regulatory fields (e.g., tax law vs. human rights law). To compute the next score, word complexity, we first need to look into entity linking and heuristics.
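The metric can be computed directly from the extractions. The sketch below uses the three triples of Table 1; whitespace tokenisation is a simplification of the Stanford CoreNLP tokeniser used in the actual pipeline, so the exact value may differ slightly from the 0.16 reported above.

```python
# Sketch of the sentence complexity metric: average arg2 length divided
# by the length of the original sentence. Whitespace tokenisation is a
# simplification; the pipeline tokenises via Stanford CoreNLP.
def sentence_complexity(sentence: str, triples: list[tuple[str, str, str]]) -> float:
    arg2_lengths = [len(arg2.split()) for (_, _, arg2) in triples]
    avg_arg2 = sum(arg2_lengths) / len(arg2_lengths)
    return avg_arg2 / len(sentence.split())

sentence = ("This Regulation lays down rules relating to the protection of "
            "natural persons with regard to the processing of personal data "
            "and rules relating to the free movement of personal data")
triples = [
    ("This", "is relating to", "the free movement of personal data"),
    ("This Regulation", "lays down", "rules"),
    ("Rules", "are relating to", "the protection of natural persons"),
]
score = sentence_complexity(sentence, triples)
assert 0.1 < score < 0.2  # in the region of the 0.16 reported above
```

A sentence whose extractions all have short, single-concept arg2 values scores close to zero, while a sentence whose arg2 swallows most of the original text scores close to one.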
4.1.2 Entity linking. Next, our algorithm estimates the complexity of the individual words in a clause. This is done through entity linking together with several heuristics (see next section). In entity linking, we take into consideration the semantics of the terms. To accomplish this, we first determine the function of the words in a clause by leveraging Stanford CoreNLP. Then, we query existing ontologies and vocabularies appropriately to obtain the word complexity metric. We acknowledge that by doing so, the metric is dependent on the dictionaries used, but we argue that this is acceptable: legal definitions are domain dependent and as such rely on a particular contextualised definition anyway (thus implying that using just any dictionary definition would be insufficient, as it would not account for the legal context, jurisdiction, interpretation of the law, and additional factors).
We query seven knowledge bases, five of which are specific to EU laws, in a specific sequence: from the more organised and easier to turn into a processable form (specialised legal ontologies), to less so (general vocabularies), to, as a last resort, a well-structured, extremely large database: WikiData [Hachey, Radford, Nothman, Honnibal and Curran 2013]. The specialised ontologies we use are the same as the ones that others have identified as the most relevant [Liepina, Nanda, Haegen and Moodley Forthcoming]: LKIF and the European Legislation Identifier. The specialised vocabularies are IATE and FrameNet. On top of this, we also query the general ontology Common Data Model and the general vocabulary schema.org. If no match is identified in the configured ontologies and vocabularies, we query WikiData, a knowledge base that contains over 100 million concepts. In this case, since our query may return several potentially matching results, we set a threshold (at cosine = 0.4) to only consider good matches (in line with earlier research testing disambiguation with WikiData [Bunescu and Pasca 2006; Hachey, Radford, Nothman, Honnibal and Curran 2013]).
Not all words are queried: we only do so for compound nouns and for verbs, with compound nouns tested as a whole rather than as separate words (e.g., "data localisation requirement"), as querying them separately would distort the meaning of what is being found. A few adverbs are tested via heuristics. Using the Stanford Core NLP terminology, this means that we query words of the following types: NN, NNS, NP, VB, VBG, VBN. We navigate within the parse tree returned by Stanford Core NLP to test for these types; as an illustration, consider the tree returned for GDPR Art. 1(1). Before we compute word complexity, we note that certain words may not have been recognised in any queried ontologies or vocabularies even though there is sufficient knowledge about them to state with high certainty that the terms could be encoded. Conversely, certain words should be counted as non-encodable due to their plurality of meanings. For both cases, we use heuristics.
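The selection of queryable terms can be sketched as a single pass over POS-tagged tokens, grouping adjacent nouns into one compound. The function below is a minimal sketch assuming Penn Treebank tags on (word, tag) pairs; it stands in for, but is not, the authors' tree-walking implementation.

```python
# Tag types queried, per the text (Penn Treebank tags as used by Stanford Core NLP).
QUERYABLE_TAGS = {"NN", "NNS", "NP", "VB", "VBG", "VBN"}
NOUN_TAGS = {"NN", "NNS", "NP"}


def terms_to_query(tagged_tokens):
    """Group adjacent nouns into compound nouns; keep verbs as single terms.

    `tagged_tokens` is a list of (word, tag) pairs, e.g. as produced by a
    POS tagger using the Penn Treebank tag set.
    """
    terms, compound = [], []
    for word, tag in tagged_tokens:
        if tag in NOUN_TAGS:
            compound.append(word)           # extend the running compound noun
            continue
        if compound:                        # flush e.g. "data localisation requirement"
            terms.append(" ".join(compound))
            compound = []
        if tag in QUERYABLE_TAGS:           # verbs are queried individually
            terms.append(word)
    if compound:                            # flush a compound ending the clause
        terms.append(" ".join(compound))
    return terms
```

This way the compound noun reaches the ontologies as one query string, preserving its meaning.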

4.1.3 Heuristics.
To classify commonly occurring tokens in legal documents, we created our own legal vocabulary and mapping heuristics, building upon the literature [Humphreys 2016; Joshi and Saha 2020]. To this end, in one experiment, we tasked two legal scholars with scanning and annotating 8 pieces of legislation according to categories we had established. One of the categories is "open-texture", covering vagueness, ambiguity, under-definition, and concepts [GUITTON et al., 2024]. For open-texture, we devised a further experiment involving 26 people (other than the 2 involved in the first experiment) and two rounds of review and comparison, combined with a specific framework to test agreement around open-texture [Guitton, Tamò-Larrieux, Mayer and Djick 2024]. We argue that the number of annotators was sufficient: our 28 reviewers used two distinct methodologies, and by comparing the results from two annotators with those of the 26 annotators working in pairs, we noticed very little added value from using even more annotators. This result gives credence to other similar studies in the legal domain that used only two annotators to identify ambiguity [MASSEY et al., 2014; Waltl and Matthes 2014].
For the goal of evaluating the feasibility of automatically creating APR, we distinguish between four types of relevant terms that are handled by our heuristics. Where applicable, we took relevant concepts in the LegalRuleML ontology [PALMIRANI et al., 2011] as the range of our heuristic mapping functions:
• "If/Then" Terms: terms (see Table 1) such as those indicating a definitional scope [Humphreys 2016], a proof, an exception, a condition, or a threshold;
• "Legal Effect" Terms: terms within different legislation that have been defined as "legal effect" terms;
• "Reference" Terms: internal and external references [Boella, Caro, Humphreys, Robaldo, Rossi and Torre 2016];
• "Open Textured" Terms: terms that lack a legally established, acknowledged, qualified, or contextualised delimitation [Vecht 2020]. For open-textured terms, we devised and applied a lengthy methodology involving conceptually clarifying and delineating open-texture [Guitton, Tamò-Larrieux, Mayer and Djick 2024]. Open-textured terms do not count towards the sum of recognised terms, as we consider that they, in essence, have a high word complexity.
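The four heuristic categories can be sketched as a mapping from trigger tokens to categories. The token lists below are illustrative stand-ins, not the authors' actual vocabulary (which is in Table 2 and the project archive), and plain substring matching is a simplification of proper tokenisation.

```python
# Illustrative (hypothetical) trigger tokens per heuristic category.
HEURISTICS = {
    "if_then": {"if", "unless", "provided that", "where", "except"},
    "legal_effect": {"shall", "must", "may", "is prohibited"},
    "reference": {"pursuant to", "referred to in", "in accordance with"},
    "open_textured": {"appropriate", "reasonable", "undue delay"},
}


def classify_clause(clause):
    """Return the set of heuristic categories whose tokens occur in the clause.

    A clause can fall into several categories at once, as in the paper.
    """
    text = clause.lower()
    return {cat for cat, tokens in HEURISTICS.items()
            if any(tok in text for tok in tokens)}
```

Matches in the first three categories increment the recognised-word count, whereas open-textured matches feed into the numerator of the word complexity metric.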
The categories which we extracted from the literature, and examples of words per category which we built with annotators, are presented in Table 2; the full list of heuristics can be found in the project's GitHub archive. Furthermore, we kept track of which category of heuristics was found in each clause in order to classify each clause (see Table 5); a clause can fall into several categories.
Our heuristics support the distinction between recognised and non-recognised words, as the recognised-word variable is incremented according to the matches.
4.1.4 Word complexity. Lastly, we distinguish between recognised and non-recognised words. Non-recognised words are those for which we queried the different ontologies and vocabularies (during entity linking) but did not find any matching result.
We finally compute the word complexity for a clause by taking the number of difficult-to-encode words and dividing it by the total number of tested words (i.e., the number of words we considered in total). The numerator is hence a combination of the number of open-textured words (from Heuristics) and the number of non-recognised words (from Entity Linking), while ensuring no double counting of words overlapping between the two. This gives the following formula for word complexity:

word complexity = (# open-textured words + # non-recognised words − # overlapping words) / # tested words
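As a minimal sketch, this computation can be written with the flagged words represented as sets, so that taking the union removes the overlap automatically. The set-based interface is an assumption; the ratio itself follows the description above.

```python
def word_complexity(open_textured, non_recognised, tested):
    """Share of difficult-to-encode words in a clause.

    `open_textured` and `non_recognised` are sets of words flagged by the
    heuristics and by entity linking, respectively; their union avoids
    double counting words flagged by both. `tested` is the total number
    of words queried in the clause.
    """
    if tested == 0:
        return 0.0  # nothing was queried, so nothing counts as difficult
    difficult = open_textured | non_recognised
    return len(difficult) / tested
```

For instance, a clause with four tested words, one of which is open-textured and also unrecognised plus one further unrecognised word, scores 2/4 = 0.5.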

From Sentence and Word Complexity to Encodability
Once we obtain a vector (sentence complexity score, word complexity score) per clause, we seek to derive an overall encodability score for the whole regulation. This is no easy task, as neither the distribution of sentence complexity scores nor that of word complexity scores maps well onto a known distribution described by one parameter (e.g., Student's t or Poisson) or even by two (e.g., normal or log-normal). Furthermore, empirically (see Table 3 and Figure 2 below), we have noticed that the distributions (and their parameters) can be quite close when comparing several legal texts, even if the texts differ widely. At the overall text level, the encodability score is hence less useful than at the clause level. With distributions overlapping and not following known distributions, extracting parametric information (average, standard deviation, alpha, etc.) would not meaningfully help distinguish texts from one another. We therefore turn to non-parametric solutions, and for this, we go back to the vectors <sentence complexity, word complexity>.
We delineate the classifier into three categories (see Figure 3): encodable (or likely to be so); uncertain as to its encodability; and difficult to encode (or unlikely to be doable). We justify the delineation as follows: at a low level of sentence complexity, a sentence would have 1 or 2 words for <arg1>, the same for <relation>, and the same for <arg2>; hence the maximum for very simple sentences is at 0.33. Beyond this threshold, it becomes more difficult to give a definitive answer.
Similarly, with a word complexity above 0.5, more than half of the words in a sentence are not recognised, and hence encodability is far from guaranteed. To account for the indecision around the 0.5 threshold, we consider one level above it for difficult to encode (and, correspondingly, one level below it for encodable). Such a classification also allows future steps such as using it as an input for an economic model to roughly quantify the costs and gains associated with being able to (automatically) encode a piece of regulation.
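The two thresholds can be combined into a small classifier. The 0.33 sentence-complexity threshold and the 0.5 word-complexity threshold come from the text; the width of the band around 0.5 (`band=0.1`) and the exact shape of the decision rule are assumptions, since the actual delineation is the surface shown in Figure 3.

```python
def classify(sentence_complexity, word_complexity,
             s_threshold=0.33, w_threshold=0.5, band=0.1):
    """Map a <sentence, word> complexity vector to one of three classes.

    Clauses that are simple on both axes are encodable; clauses that are
    complex on both axes (beyond the uncertainty band) are difficult;
    everything else stays uncertain.
    """
    if sentence_complexity <= s_threshold and word_complexity <= w_threshold - band:
        return "encodable"
    if sentence_complexity > s_threshold and word_complexity > w_threshold + band:
        return "difficult"
    return "uncertain"
```

Applied per clause, this yields the per-category clause counts reported in the results, and the per-class shares could feed the economic model mentioned above.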

DISCUSSION OF RESULTS
We apply the algorithm on the GDPR and on two regulations we know were manually encoded (see Figure 4). For the GDPR, a lengthy piece of regulation with 894 clauses, we obtain a low correlation between the two axes at around 11%, which we interpret as orthogonality (that is, it is meaningful to keep the two axes separate).
We also apply it to three further regulations: two which have been manually encoded, and one which should contain many open-textured terms. Unlike in Table 3, we can here clearly see differences between encodable and less encodable regulations (see Table 4).
We explain the lower scores for the two regulations which were encoded manually (rather than automatically) mostly by the fact that those encoding them also reported encountering significant barriers, notably the need for clarifications of a syntactic and semantic nature. To implement these, they needed workshops and lengthy discussions, proving the point that the statutes were, despite their shortness, not easy to translate into APR without further inputs.
We also obtained the following general statistics with the algorithm, following Step 1 (see Table 5). We note that the Charter has the lowest encodability score of the four statutes, which matches its having the highest percentage of open-textured terms. Lastly, regarding the relation between open-textured clauses and encodability, we looked at whether the encodability scores would be higher when focusing only on clauses without open-textured terms. Looking only at the GDPR, the statistics are indeed much better: encodability increases to 41%, uncertainty decreases to 53%, and the share of difficult clauses decreases to 6%.
Overall, we notice that the percentage of clauses within the encodable surface is relatively low. And yet, the algorithm is likely to still over-estimate these percentages: for the metric "sentence complexity", we do not check whether the breaking-down of sentences into triples is correct, and for the metric "word complexity", we may still be counting words as recognised that should not be, even with the threshold for accepting matching terms set quite high.

CONCLUSION
The work has shown that it is possible to obtain a high-level estimate of the feasibility of automatically turning statutes into APR. This should support further steps to evaluate the costs and benefits of turning regulations into APR, a crucial data point for any manager considering investing time and money into such an enterprise. So far, the inability to assess both feasibility and associated costs is likely to have hampered implementation decisions. Besides developing an economic model on this basis, further improvements of the model itself could run along three specific lines. First is a question of degree: when querying ontologies, we accepted a term if it was present, but we did not recursively test the word complexity of the defining terms. We merely accepted that their presence in an ontology was sufficient; recursive testing could reveal further fine-grained differences (although our assumption is that these would only be minor). Second, the area of uncertainty remains large, both in terms of how it is defined and in terms of the results obtained. Additional methods could reduce this uncertainty, in turn bringing further certainty as to the automatic encodability of clauses. And lastly, this article has focused on a very specific goal for APR. While we focused on the currently most widespread and accepted way of approaching APR, the field's evolution could soon warrant a re-evaluation of what automatically encoding regulation means, and with it, the need to extend this method to assess the encodability of statutes beyond a clause-by-clause approach.

ACKNOWLEDGMENTS
This work has received support from the Hasler Foundation (project grant #21089).

Figure 2: Distribution of sentence and word complexity for four different regulations, with important similarities even for very different statutes.

Figure 3: A three-category classifier between encodable, uncertain, and difficult to encode.

Figure 4: An example of all <sentence, word> complexity vectors for the GDPR (the largest bubble represents a frequency of 20 occurrences for the vector).

Table 1: An example of how Graphene splits a sentence.

Table 2: The different categories for heuristics.

Table 3: Distribution parameters of the encodability score for four different regulations, showing that the parameters are rather close to one another.

Table 4: Comparison of encodability classification across four categories.

Table 5: Comparison between different types of regulation in terms of general statistics.