Secure RDTs: Enforcing Access Control Policies for Offline Available JSON Data

Replicated Data Types (RDTs) are a type of data structure that can be replicated over a network, where each replica can be kept (eventually) consistent with the other replicas. They are used in applications with intermittent network connectivity, since local (offline) edits can later be merged with the other replicas. Applications that want to use RDTs often have an inherent security component that restricts data access for certain clients. However, access control for RDTs is difficult to enforce for clients that are not running within a secure environment, e.g., web applications where the client-side software can be freely tampered with. In essence, an application cannot prevent a client from reading data which they are not supposed to read, and any malicious changes will also affect well-behaved clients. This paper proposes Secure RDTs (SRDTs), a data type that specifies role-based access control for offline-available JSON data. In brief, a trusted application server specifies a security policy based on roles with read and write privileges for certain fields of an SRDT. The server enforces read privileges by projecting the data and security policy to omit any non-readable fields for the user's given role, and it acts as an intermediary to enforce write privileges. The approach is presented as an operational semantics engineered in PLT Redex, which is validated by formal proofs and randomised testing in Redex to ensure that the formal specification is secure.


INTRODUCTION
Modern distributed applications often replicate data over the network. By storing a copy of the data locally, an application can guarantee low latency data access, and it becomes resilient to temporary network failures. Notable data abstractions called Replicated Data Types (RDTs) manage the complexity of data replication to ensure that each replica (i.e., local copy) can be brought back to a consistent state with the other replicas after a network failure. A popular variant of RDTs are Conflict-free Replicated Data Types (CRDTs) [Shapiro et al. 2011a,b], which are used in distributed databases and are offered by many libraries. Of particular importance for this paper is the recent work on a CRDT for the JSON data type, which has practical implementations for JavaScript and Rust [Automerge Contributors 2023; Kleppmann and Beresford 2017, 2018].
Consider a modern web or mobile application built with popular Graphical User Interface (GUI) frameworks such as React or React Native [Madsen et al. 2020; Meta Platforms 2023a,b]. To reduce initial page load times and to encourage interactivity, users are initially served a page with limited static content and skeletons for the dynamic content. The skeletons are replaced dynamically, which in practice means that the client fetches JSON objects from an API and renders them locally via React components. Later on, the server will push more JSON objects to update the page dynamically (e.g., comments, chat messages, likes, shares, a data feed, . . .).
Instead of manually passing large JSON objects between a client and server, RDTs such as the aforementioned JSON CRDT are used as a programmer abstraction to more easily implement the application, and to make it usable offline "for free". In this case the client would receive a replica of the JSON data which, via the properties of RDTs, is automatically kept (eventually) consistent with other replicas of the same data whenever updates occur (e.g., by the server or other clients). This scenario is appealing for ease and correctness of the application's functionality, but raises security concerns because current approaches such as the JSON CRDT assume that all clients are trusted. For web applications this has the following security drawbacks:
(1) All clients receive a full copy of the data even if they are not allowed to read all parts of it. While those parts can be hidden behind the GUI, the clients do receive the data and can still access it (e.g., via a web browser's developer tools).
(2) It is a fundamental property of RDTs that any local changes to a replica will be merged with the replicas of other clients, such that when no other updates occur, all replicas will (eventually) converge to the same state [Shapiro et al. 2011a]. In other words, any changes made to any part of a local replica will be merged with the other clients. While the GUI can hide the ability to modify data, malicious clients can bypass it easily (e.g., via web browser extensions).
Hence, applications will have to enforce additional access control. For RDTs such as the JSON CRDT this currently requires ad-hoc coding which is: (1) very difficult to combine with an RDT's offline availability, (2) impossible to enforce on malicious clients, and (3) very likely to be insecure: "Broken Access Control" is the number 1 security problem identified in the 2021 OWASP Top 10 [OWASP Foundation 2021].
The main contribution of this paper is the design of Secure RDTs (SRDTs), a data type which specifies Role-based Access Control (RBAC) [Sandhu et al. 1996] for offline-available JSON data. While the proposed approach applies to offline-available JSON data in general, it was designed with libraries such as Automerge [Automerge Contributors 2023; Kleppmann and Beresford 2018] and Yjs [Jahns and Yjs Contributors 2023; Nicolaescu et al. 2015] in mind, since they are popular, real-world implementations of offline-available JSON data. We assume a single leader per SRDT that specifies a security policy for said SRDT, and which acts as a central authority that is responsible for authentication and authorisation. The leader enforces read privileges by creating different projections of an SRDT depending on the security role of a client, and it enforces write privileges when clients modify a replica and try to push changes to the leader. As far as we know this is the first proposal to add RBAC to offline-available JSON data for the purpose of easier application development with code which is secure by design.
The paper is structured as follows. In Section 2 we more precisely state the problem with combining RBAC and offline available replicated data, and Section 3 explains our proposed approach. The formal semantics in PLT Redex is explained in Section 4. Sections 5 and 6 prove (formally and
(1) Users may add and modify sightings for their own team, which consist of a location, the species, and a photo.
(2) Users can see (but not modify) the points and feedback given on their sightings.
(3) Users can see the photo and points awarded to sightings of other teams.
(4) Biologists can see the sightings of all teams, can award points, and provide feedback.
(5) Anyone can see the name of a team.
Note that web-based applications are just one example of an application domain where offline available RDTs are used, and which have an inherent security component.

Problems When Using RBAC for Offline Available Replicated JSON Data
By using an existing RDT such as Automerge or Yjs, one can already build an interactive application with the data model of Figure 1a. This makes the application easy to implement and usable when the user is temporarily offline. However, under the assumed adversary model, the application is vulnerable to the following problems.
Replicated Data Leaks. Read privileges cannot be enforced because the entire data structure is replicated to every user. In the running example each volunteer would receive the data about their team and all other teams, including the locations of their sightings. While the GUI can implement the security policy by hiding information, it cannot enforce it since the data is stored on the local machine and can be extracted, e.g., via a web browser's JavaScript console. This clearly violates the principle of least privilege [Saltzer and Schroeder 1975], which is one of the most important principles to adhere to when securing an application. The only way to prevent leaking sensitive data is to not send the data.
Data Contagion. Write privileges cannot be enforced locally, and any maliciously written data will be synchronised with well-behaved clients. E.g., volunteers are not allowed to modify the points and feedback of sightings. While the GUI can hide that functionality, it cannot be enforced since the user can bypass it (e.g., via the JavaScript console or web browser extensions), and due to the eventual consistency of RDTs all other replicas will eventually converge to the same, compromised application state. Any change to a replica that does not conform to the security policy must not be merged with replicas of well-behaved clients.
Lack of Offline Policy Enforcement. One of the main benefits of RDTs is that an application remains usable without network connectivity, and when the network is restored the properties of RDTs guarantee that all replicas (eventually) converge to the same state. Without a network connection, a client cannot authorise read or write operations with the leader before executing them locally. Thus, if a replica's security hinges on its ability to reach the leader, then the RDT effectively becomes unusable without network connectivity, which is unacceptable. Any security mechanism that is suitable for RDTs should preserve the offline availability of RDTs.

APPROACH: SECURITY POLICY PROJECTION AND DATA PROJECTION
SRDTs overcome the problems listed in Section 2.3 by enforcing an application's security policy on both (well-behaved) clients and the leader. In brief:
Client-side Enforcement. To support offline policy enforcement, each (well-behaved) client can check whether it is allowed to write to a field of their replicas. This prevents a well-behaved client from erroneously writing to a write-restricted field of a local replica (e.g., during a period of being offline). If write privileges are only checked when changes are (eventually) merged at the leader, then a well-behaved client could be forced to roll back its changes after the first disallowed write. This complicates the notion of eventual consistency in RDTs and invalidates the core purpose of offline available mutable state. Note that due to the adversary model it is impossible to enforce a security policy on the client-side. Hence, true enforcement must happen on the leader.
Leader-side Enforcement. The leader prevents replicated data leaks by excluding the SRDT fields for which a client has no read privileges from the replica which is sent to said client. When a client authenticates with the leader to acquire an initial copy of the data, the leader will sanitise this data depending on the role of the client, as well as any future updates to the data. The leader also prevents data contagion by enforcing the security policy for all writes to fields, and discarding any writes from clients that are not authorised to do so. Hence, unauthorised changes will not be merged with the replicas from well-behaved clients.
To offer both types of enforcement, SRDTs require projections of the security policy and the SRDT data. Before showing a formal semantics in Section 4, we discuss our approach by specifying the concepts of security policies, security policy projections, and data projections in more detail.

Security Policy Specification Language
A leader securely replicates an SRDT based on a policy that consists of roles and privileges, which are machine-readable descriptions of operations that may be performed by a role. The technique to specify privileges is inspired by access control for XML documents [Fundulaki and Marx 2004], which has similar requirements. In essence, both JSON and XML are tree-structured documents where (parts of) the structure can be described as a path from the root of the document. We will call such a path a path selector. Then, a privilege grants read or write access to a set of SRDT fields (identified by a path selector) for a specified role, or a wildcard for all roles. In the S-expression syntax which we use throughout the remainder of this paper, a privilege is denoted as follows:
(ALLOW role READ/WRITE OF path-selector)
We first detail how to specify path selectors, followed by how to implement a full security policy.
3.1.1 Specifying Object Paths With Path Selectors. Security rule (4) in the running example says that biologists can (among other things) see the sightings of all teams. This means that any biologist can access the values of the fields of the JSON object in Figure 1a named by the following paths, expressed as S-expression lists describing a path from the object's root:
(team1 sightings 1674813931967 location lat)
(team1 sightings 1674813931967 location lng)
(team1 sightings 1674813931967 species)
(team1 sightings 1674813931967 photo)
(team1 sightings 1674813931967 points)
(team1 sightings 1674813931967 feedback)
(team2 . . . )
. . .

Table 1. Path expressions adapted from JSONPath.
Child operator. Goes deeper into an object as indicated by the following path expression.
Wildcard (*). Matches all keys in the current object regardless of their name.
Union operator. Matches any of the given keys in the current object (e.g., k1, k2, etc.).
Expression operator. Matches all keys in the current object for which the predicate function f? holds, given each of the current object's keys as first argument, and the provided k as the second argument. An alternative form uses a nested "∼" expression which looks up a value (e.g., given by the plain path (k ...)) in the user's environment. Supported predicates include = (object's key matches the value named by the lookup exactly) and ∈ (object's key is a member of the list named by the lookup).
Specifying privileges using only absolute paths is cumbersome, and in general even impossible given the dynamic nature of replicated JSON objects which can be modified continuously (e.g., when new sightings are made). For XML documents this problem is tackled by using the XPath query language [Clark and DeRose 1999] to specify which parts of an XML document can be accessed [Crampton 2006; Fundulaki and Marx 2004; Murata et al. 2006]. When using JSON's equivalent to XPath, called JSONPath [Friesen 2019; Goessner 2007], the paths above can be referred to via the following 2 path selectors that use a wildcard to abstract over multiple keys, thus capturing multiple teams and sightings.
(* sightings * *)
(* sightings * location *)
A wildcard is just one example of an expression that can capture multiple fields. All types of path expression that we adapted from JSONPath are described in Table 1. We will further explain them when they are used. In essence, we adapted JSONPath to S-expression syntax, and restricted the expressivity of the path expressions in 3 ways, namely:
(1) Path expressions cannot depend on a field's value (e.g., via JSONPath's "@" and "$" expressions). Instead, we introduce a restricted form (cf. Table 1) which depends only on the values of a per-user private object, which we call the user environment. The user environment can be thought of as an immutable dictionary (i.e., another JSON object) which is sent by the leader. Values from a user's environment can be read only by the user themselves and the leader using the syntax "[∼ k ...]", chosen to evoke the intuition of a Unix user's "home directory".
(2) We do not support any expressions to manipulate arrays since SRDTs do not support array operations (e.g., push, pop, etc.). Instead, arrays can be represented as an object where the keys are numeric indices, and the fields' values serve as the array's values.
(3) JSONPath expressions such as ".." (a recursive descent operator) are an engineering effort for practical implementations, and are excluded for simplicity of the formalism.
The limitations (1) and (2) are further discussed in Section 7.5.
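To make the expansion of a path selector into concrete paths more tangible, the following Python sketch walks a nested dictionary and yields every path that a selector matches. It is only an illustration under stated assumptions and not the paper's Redex formalisation: the tuple encoding of union and expression operators ("union", "=", "in"), the function names, the team names and the coordinates are all hypothetical.

```python
from typing import Any, Iterator

WILDCARD = "*"

def lookup(env: dict, keys: list) -> Any:
    """Follow a plain path of keys into the user environment."""
    value = env
    for key in keys:
        value = value[key]
    return value

def matches(p_exp: Any, key: str, env: dict) -> bool:
    """Does a single path expression select `key` in the current object?"""
    if p_exp == WILDCARD:                 # wildcard: any key
        return True
    if isinstance(p_exp, str):            # child operator: a literal key
        return p_exp == key
    op, *args = p_exp                     # hypothetical tuple encoding
    if op == "union":                     # any of the listed keys
        return key in args
    if op == "=":                         # key equals the looked-up value
        return key == lookup(env, args)
    if op == "in":                        # key is a member of the looked-up list
        return key in lookup(env, args)
    raise ValueError(f"unknown path expression: {p_exp!r}")

def expand(selector: list, obj: Any, env: dict, prefix: tuple = ()) -> Iterator[tuple]:
    """Yield every concrete path in `obj` that `selector` selects."""
    if not selector:
        yield prefix
        return
    if not isinstance(obj, dict):
        return                            # selector is deeper than the object
    head, *rest = selector
    for key, child in obj.items():
        if matches(head, key, env):
            yield from expand(rest, child, env, prefix + (key,))

# Hypothetical data in the spirit of the running example.
data = {
    "team1": {"name": "Team One",
              "sightings": {"1674813931967": {
                  "location": {"lat": 50.82, "lng": 4.39},
                  "species": "Fly Agaric"}}},
    "team2": {"name": "Team Two", "sightings": {}},
}
env = {"my-team": "team1"}

coords = list(expand(["*", "sightings", "*", "location", "*"], data, env))
own_name = list(expand([("=", "my-team"), "name"], data, env))
```

Here `coords` holds the two concrete paths to `lat` and `lng` of the single sighting, and `own_name` holds only the path to the name of the team named by `my-team` in the user environment.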

Specifying Security Policies.
A security policy is a set of privileges. For example, the full security policy of the running example is implemented in Listing 1. Compared to the textual description of Section 2.2, the rules are ordered by increasing complexity, starting with rule (5).
Listing 1. Specification of the security policy of the running example (Section 2.2).
Rule (5) is implemented on Line 1. It grants all roles (denoted by the wildcard role) read access to the name field of all teams.
Rule (4) is implemented on Lines 2 to 4 using the path selectors previously shown in Section 3.1.1. The privileges grant the biologist role read access to the described paths. Note in particular that being granted access to a particular object does not mean that the role has access to all values within that object. In this case, Line 2 grants read access to all keys of sighting objects (e.g., the 1674813931967 field in Figure 1a). This includes the location key, but not location's children lat and lng. A biologist can hence effectively traverse the location field, but still needs to be granted read privileges for its children on Line 3. The remainder of rule (4) is implemented on Line 4, which grants biologists write access to every sighting's points and feedback fields via a union path expression which selects both keys (cf. Table 1).
Rule (3) of the running example is implemented on Line 5. The privilege grants every volunteer (i.e., the user role) read access to the photo and points of all teams' sightings.
Rule (2) is implemented on Line 6, and gives users read access to the feedback field of only their own team's sightings. To encode this requirement of "a user's own team", the path selector uses an expression operator (cf. Table 1) which tests via a built-in predicate, in this case =, whether a key from the object should be included. In this case the keys in the object are teams, e.g., team1 (cf. Figure 1a), and the expression with which it should match is specified using the ∼ operator which looks up the current user's my-team field in the user's environment.
Rule (1) is implemented on Lines 7 to 9 and gives users write access to their own team's sightings, excluding the feedback and points fields which only biologists can write to. Because users have wildcard write access to any key of the sightings object, it also grants them the permission to add new sightings (because they are covered by the wildcard). Note that a new sighting object that is added by a user can only include the fields to which users are explicitly granted write access, i.e., species, photo, and location. The points and feedback fields can be added later by biologists.
In general, Murata et al. [2006] state that access control policies should satisfy 3 requirements: succinctness, least privilege, and soundness. Succinctness means that policies should be expressible with a small number of privileges instead of having to specify every single field in the data. We satisfy this requirement by using path selectors modelled on JSONPath. Least privilege means that the security policy should grant the smallest possible access to a role, and soundness means that the security policy should always either allow or deny an access. Both are satisfied because all access to fields is denied unless access is explicitly granted via an ALLOW privilege. While we do not implement explicit DENY privileges for simplicity, they can be added using the "denial takes precedence" principle [Murata et al. 2006] where a DENY privilege removes any access that was previously granted. Note that in our model write permissions for a field imply that the role is allowed to read said field.
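The default-deny rule and the "write implies read" rule can be illustrated with a short Python sketch. This is a simplification under stated assumptions, not the paper's implementation: the `Privilege` tuple, the `is_allowed` name, and the restriction of selectors to literal keys and wildcards are all hypothetical, and the three privileges only approximate rules (5) and (4) of the running example.

```python
from typing import NamedTuple

class Privilege(NamedTuple):
    role: str          # a role name, or "*" for all roles
    access: str        # "READ" or "WRITE"
    selector: tuple    # simplified: literal keys and "*" wildcards only

def selects(selector: tuple, path: tuple) -> bool:
    """A selector selects a path when every position matches."""
    return len(selector) == len(path) and all(
        s == "*" or s == k for s, k in zip(selector, path))

def is_allowed(policy: list, role: str, access: str, path: tuple) -> bool:
    """Access is denied unless some ALLOW privilege grants it."""
    for priv in policy:
        if priv.role not in ("*", role):
            continue
        # A WRITE privilege also grants READ on the same fields.
        grants = priv.access == access or (
            access == "READ" and priv.access == "WRITE")
        if grants and selects(priv.selector, path):
            return True
    return False       # default deny

# Hypothetical, simplified policy.
policy = [
    Privilege("*", "READ", ("*", "name")),
    Privilege("biologist", "READ", ("*", "sightings", "*", "*")),
    Privilege("biologist", "WRITE", ("*", "sightings", "*", "points")),
]
```

With this policy, any role may read a team's name, while a `user` is denied access to a sighting's species because no privilege grants it.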

Projecting a Security Policy
The complete security policy is specified only on the leader of an SRDT. To support offline policy enforcement of SRDTs, clients must be able to check (offline) whether they are allowed to write to a field. Hence the leader compiles different projections of the security policy, namely one for each role. A high-level representation is depicted in Figure 2. The left depicts a security policy which is a set of roles and privileges, and the result depicted on the right is a set of policy excerpts. A policy excerpt contains the subset of privileges that (well-behaved) clients of a particular role are expected to adhere to locally. More specifically, the policy excerpt includes only a client's own privileges. Including other roles' privileges would enable a malicious client to know which fields exist that they do not have read access to, and what roles can access those fields. This poses a security risk in its own right.

Fig. 2. The projection of a security policy based on each role yields a policy excerpt per role.

Data Projection
To enforce that each client accesses only the data from the SRDT for which they have been granted read privileges, we ensure that each client is sent only the data it has read privileges for. We refer to the act of selecting the readable subset of an SRDT for a certain user as data projection. A correct data projection ensures that adversaries can never bypass privileges to extract data. However, a consequence of data projection is that each client has a potentially different subset of the complete SRDT. The leader manages cooperation between those ostensibly incompatible replicas. Figure 3 depicts the interactions of clients and the leader, including the points where data projection occurs.
(1) First, a client authenticates with the leader (e.g., using a password, or public key authentication). The leader assigns a role to the client according to the security policy, which we depict as a red square (i.e., the "red square" role).
(2) Once authenticated, the client can ask the leader for a copy of the data to instantiate a local replica. The leader responds with the policy excerpt that corresponds to the client's role, a projected copy of the data that excludes all fields for which the client has no read privileges, and the user's environment needed to correctly interpret the policy excerpt.
(3) After initialisation, a client can perform operations on its local replica.
During periods without network connectivity a well-behaved client can check whether they are allowed to execute write operations by using the policy excerpt. Eventually, when network connectivity is restored, the local changes are sent to the leader, who verifies that the changes were permitted. Accepted changes are merged according to the semantics of the underlying RDT library. Other clients need to be (eventually) informed of this change in order for their replicas to (eventually) become consistent. In Figure 3 we depict 2 other clients with a "blue circle" and "green triangle" role. The leader cannot simply forward the change to both of them, because their roles may not have read privileges for the changed field. Instead the change is sent only to clients with the correct read or write privileges, which in Figure 3 includes the blue client but not the green client. Withholding the change for the green client prevents leaking sensitive data. It does not negatively impact the green client's normal operation since the changed field is not part of their local replica.
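As an illustration of data projection, the Python sketch below copies a nested dictionary and keeps only what a client may read. It is a sketch under assumptions rather than the paper's readable-projection metafunction: the names are hypothetical, selectors are limited to literal keys and wildcards, and it assumes that a nested object is retained when it is itself readable or when it still contains readable content, so that a client can traverse to the fields it may read.

```python
def selects(selector: tuple, path: tuple) -> bool:
    """Simplified selector match: literal keys and "*" wildcards only."""
    return len(selector) == len(path) and all(
        s == "*" or s == k for s, k in zip(selector, path))

def readable_projection(data: dict, can_read, prefix: tuple = ()) -> dict:
    """Tree-recursive copy of `data` that omits non-readable fields."""
    projected = {}
    for key, value in data.items():
        path = prefix + (key,)
        if isinstance(value, dict):
            child = readable_projection(value, can_read, path)
            # Assumption: keep an object if it is readable itself
            # or if any readable content remains inside it.
            if child or can_read(path):
                projected[key] = child
        elif can_read(path):
            projected[key] = value
    return projected

# Hypothetical read selectors for the volunteer ("user") role.
user_selectors = [
    ("*", "name"),
    ("*", "sightings", "*", "photo"),
    ("*", "sightings", "*", "points"),
]

def user_can_read(path: tuple) -> bool:
    return any(selects(s, path) for s in user_selectors)

data = {"team1": {"name": "Team One", "sightings": {"1674813931967": {
    "location": {"lat": 50.82, "lng": 4.39},
    "species": "Fly Agaric", "photo": "blob:...",
    "points": 3, "feedback": "Do not eat this!"}}}}

replica = readable_projection(data, user_can_read)
```

The resulting `replica` contains the team name and, for the sighting, only `photo` and `points`; the location, species and feedback never leave the leader.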

Fig. 3. Interactions between a client and the leader (participants: Client, Leader; message: Push(change)).

Overview of Assumptions
We briefly state the assumptions made (implicitly or explicitly) throughout Sections 2 and 3.
• All clients are untrusted. A client is supposed to check their security policy excerpt locally for correct operation, but the security model assumes that clients can disable any security feature included in their local software (e.g., in their web browser).
• The leader is completely trusted (trusted hardware and software).
• No defence against clients that leak data which they legitimately have access to.

Security Policy.
• The security policy is static (does not change at run-time).
• The security policy can only depend on a replica's field keys, not on their values (limitation further discussed in Section 7.5.4).
• Write access to a certain field also implies read access.

Data Model.
• A replica's data contains no arrays (limitation further discussed in Section 7.5.2).
• A replica's data contains no cycles (which are also not a part of standard JSON).
• No compound changes (limitation further discussed in Section 7.5.7).
• The underlying RDT is operations-based (further discussed in Section 7.5.8).

Distribution Model.
• There is a single leader per SRDT. Practical implementations may internally divide the work of supporting many clients, e.g., by using ordinary RDTs that synchronise peer-to-peer with each other, running on multiple trusted servers behind a load balancer.
• The leader is trusted by all clients.
• No direct (peer-to-peer) communication or synchronisation between clients (limitation further discussed in Section 7.5.1).

FORMAL SPECIFICATION OF SRDTS
We present a formal specification and implementation of SRDTs in PLT Redex [Felleisen et al. 2009; Klein et al. 2012], a domain-specific language in Racket [Felleisen et al. 2018] to specify operational semantics which are also executable. The formalism comprises 3 different languages as depicted in Figure 4: ReplicaLang and LeaderLang specify the behaviour of the client and leader respectively, and their commonalities are shared via a language called CommonLang. The complete implementation spans 2161 lines of Racket and Redex code (565 for the core formalism), and is available as a publication artifact. All figures in this paper were generated by Redex to avoid the introduction of mistakes, sometimes with slight modifications in Adobe Illustrator to add explanatory notes or to reposition elements to fit within the allowable space. We use the same font convention in the text as in the figures. Non-terminals in the language grammar are typeset in italic in a serif font. Terminals are typeset in an upright monospaced non-serif font.

CommonLang: Objects, Roles and Privileges
The formalisations of the leader and the replicas share a common formal language called CommonLang. CommonLang specifies primitive atoms (numbers, booleans, strings, quoted symbols, and the empty object), and the definition of objects, roles and privileges. Its semantic entities are given in Figure 5, and will be introduced by example throughout this section.
4 (species := Fly Agaric)
5 (photo := blob:...)
6 (points := 3)
7 (feedback := Do not eat this!)))))))
8 (team2 := <omitted for brevity>))
Listing 2. The data schema of Figure 1a in CommonLang syntax.
Figure 1a, but conforming to the definition of a term d. The value of team1 spans Lines 1 to 7, and team2's on Line 8 has been omitted for brevity.
4.1.2 Specifying Security Policies. Each privilege from a security policy is represented by term priv in Figure 5. Note that the role can be a concrete identifier (e.g., biologist) or a wildcard (matching any role). The set of fields to which access is provided is given by a term ps that describes a path selector. Each path selector is a sequence of path expressions p-exp which we previously described in Table 1. To correctly assess whether a role is allowed to access a field, each path selector will expand to a set of concrete paths. Here, each concrete path conforms to semantic entity p, which is a sequence of keys that correspond to the fields of a (nested) object.
4.1.3 SRDT Deltas. Changes made by clients to the SRDT data structure are represented by a semantic entity that describes a single change (a written value) to a field. This can be considered as an operation from the underlying (operation-based) RDT (e.g., a JSON CRDT) that is sent from one replica to another to synchronise and (eventually) converge to the same state. Since the implementation of a full RDT library is outside the scope of this formalism, we represent such a change as a term that indicates that a value (an atom) was written to a path p that identifies the field it was written to.

LeaderLang: Securely Projecting Policies and Data
LeaderLang models the behaviour of the leader which is responsible for the authentication of users, projecting the security policy and SRDT data, and authorising changes to replicas made by clients. It is defined in 2 phases that correspond to the aforementioned projection phase and replica management phase (cf. Section 3). In the remainder of this section we discuss both phases in turn.

Projection Phase. We use Redex metafunctions to define computations on Redex terms. Essentially, a metafunction has a name and transforms the given input terms (its arguments) to an output term. Metafunctions pattern match on the structure of terms, optionally with side-conditions that must hold for a clause to match. Consider the metafunction called excerpt-for-role in Figure 6 which implements the main logic of the projection phase. For each equation, the left-hand side denotes the application of the metafunction on the input terms specified between brackets "〚〛" (separated via commas). The right-hand side of the equation is the output term. When side-conditions apply for an input term to match, those are specified as where clauses.
[Fig. 7 annotations: security policy excerpt; user configuration; ids (user id, session id); user authentication key; user session; client requests to log in; client requests a replica; client requests to push a change; leader accepts a request, will perform actions; leader rejects a request.]
The excerpt-for-role metafunction computes the policy excerpt for one role. It takes 2 arguments, namely a role and security policy (given as a list of privileges). The result of the metafunction is a list of privileges that apply only to the given role. Essentially, excerpt-for-role recurses over the security policy from left to right and keeps only the privileges that apply to the given role. From top to bottom, the 4 clauses of the metafunction are as follows:
(1) When the privilege in the first position has the wildcard role, then the output list contains the same privilege but with the wildcard role replaced by role. The term priv_2 in the output list is the result of applying excerpt-for-role to the tail of the input list.
(2) When the privilege in the first position has the same role as the role argument, then the privilege is copied to the output list.
(3) When the privilege in the first position matches a different role than the one given as argument, then do not include it into the output.
(4) When the security policy is empty, then the policy excerpt is empty too.
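The four clauses can be mirrored in a few lines of Python. This is only a sketch of the described behaviour, not the Redex metafunction itself: privileges are modelled as hypothetical `(role, access, selector)` tuples and the wildcard role is written as the string "*".

```python
def excerpt_for_role(role: str, policy: list) -> list:
    """Keep only the privileges of `policy` that apply to `role`."""
    if not policy:                            # clause (4): empty policy
        return []
    (priv_role, access, selector), *tail = policy
    rest = excerpt_for_role(role, tail)       # excerpt of the tail
    if priv_role == "*":                      # clause (1): wildcard role
        return [(role, access, selector)] + rest
    if priv_role == role:                     # clause (2): same role
        return [(priv_role, access, selector)] + rest
    return rest                               # clause (3): other role

# Hypothetical policy with one wildcard and two role-specific privileges.
policy = [
    ("*", "READ", ("*", "name")),
    ("biologist", "READ", ("*", "sightings", "*", "*")),
    ("user", "READ", ("*", "sightings", "*", "photo")),
]
user_excerpt = excerpt_for_role("user", policy)
```

The excerpt for `user` contains the wildcard privilege rewritten to the `user` role plus the user's own privilege, and reveals nothing about what biologists may access.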

Replica Management Phase.
LeaderLang models the behaviour of a leader by reducing terms that represent a request from a client to a term that represents the actions that a leader undertakes to correctly process said request. The semantic entities defined by LeaderLang are given in Figure 7. Supported requests are LOGIN to authenticate a client, GET-REPLICA to obtain an SRDT replica, and PUSH-Δ to send a local change to the leader. A response is computed by the handle-request metafunction in Figure 8. The arguments of handle-request represent the run-time state of the leader, which is a list of known users (user ...), a list of policy excerpts (excerpt ...), the leader's copy of the SRDT data d, a list of user sessions (s ...) (one per client), and finally a request term to reduce. The result is the updated application state: new SRDT data, a new list of sessions, and a result term that indicates whether the leader accepts or rejects the request.
Handling LOGIN. Clients authenticate via a LOGIN request that contains a correct username and password, as handled by the first case of handle-request (Figure 8). The term matches only when all of the where clauses hold: the user must not have an active session, and the auth-key must be valid. In the returned application state a fresh session (ι_s ι_u) is added to the preexisting sessions, and the LOGIN term is reduced to an ACCEPT-LOGIN term that contains the fresh session id. A real-world system will use existing authentication protocols, but we do not model them as they are well-established and orthogonal to our approach.
handle-request〚(user ...), (excerpt ...), d, (s_old ...), (LOGIN ι_u auth-key)〛 = (d ((ι_s ι_u) s_old ...) (ACCEPT ((ACCEPT-LOGIN ι_s))))
    where no-active-session-for〚ι_u, (s_old ...)〛, key-is-valid〚(user ...), ι_u, auth-key〛, ι_s = fresh-session-id〚ι_u〛

Handling GET-REPLICA. A GET-REPLICA term is reduced by the second case of handle-request. The returned term contains the data that is used to initialise a client. In this case the specified where clauses that must hold are "lookups", typeset via the ∈ notation, to retrieve information from a list of terms. The first clause specifies that the session id ι_s in the request must be an active session (ι_s ι_u), and the second clause uses the matching user id ι_u to retrieve the user's role and user environment. The third clause retrieves the policy excerpt for that role, and the fourth clause calls another metafunction readable-projection that projects the complete SRDT data d to a new object d_projected which only contains the fields that can be read by the user according to their privileges. The implementation of readable-projection will be shown in Lemma 5.4. It is essentially a tree recursive copy which omits the non-readable fields. The information returned to the client is an INIT term with the session id ι_s, the policy excerpt (priv ...), the projected data d_projected, and the user's environment env.
Handling PUSH-Δ. The third case of handle-request handles a change to the SRDT pushed by a client, represented by a PUSH-Δ term. The input PUSH-Δ term denotes that a client identified by session s has written an atom to a particular field access path p. The computed response is a list of PUSH-Δ terms that indicate to which other clients the accepted change should be forwarded in order to reach eventual consistency. When all of the where clauses hold, the result of the metafunction is an updated copy of the data d_new, the unmodified sessions, and an ACCEPT response that contains a list of action terms. The first 3 clauses are the same as for GET-REPLICA to extract the user's role, policy excerpt and user environment. Additionally, the leader uses the is-writable metafunction to verify that the given user is allowed to write to the path p according to their privileges. When this condition holds, the data is actually written to the object on the leader (yielding a new object d_new), and a list of actions is computed via the metafunction actions-per-session. Essentially, this metafunction returns a list of PUSH-Δ terms for all other sessions, but only when those clients have read privileges for the path that was written to.
Any other request which does not match the first, second or third case of handle-request is rejected by the fourth case.
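The four cases of handle-request can be summarised in a small Python sketch. It is an illustration under simplifying assumptions, not a transcription of Figure 8: privileges are sets of exact paths per role (no path selectors, no user environment), every path named in the policy is assumed to exist in the data and to point at an atom, requests are assumed to be well-formed tuples, and the class name Leader, the helper names, and the "PUSH-DELTA" tag are hypothetical.

```python
from typing import Any, Dict, Set, Tuple

Path = Tuple[str, ...]


def get_at(data: Dict[str, Any], path: Path) -> Any:
    for key in path:
        data = data[key]
    return data


def set_at(data: Dict[str, Any], path: Path, atom: Any) -> None:
    for key in path[:-1]:
        data = data.setdefault(key, {})
    data[path[-1]] = atom


class Leader:
    def __init__(self, users, policy, data):
        self.users = users    # user id -> (auth key, role)
        self.policy = policy  # role -> {"READ": {paths}, "WRITE": {paths}}
        self.data = data      # the complete, unprojected data d
        self.sessions: Dict[str, str] = {}  # session id -> user id

    def _role(self, session: str) -> str:
        return self.users[self.sessions[session]][1]

    def _readable(self, role: str) -> Set[Path]:
        # WRITE implies READ
        return self.policy[role]["READ"] | self.policy[role]["WRITE"]

    def handle_request(self, request: tuple) -> tuple:
        kind = request[0]
        if kind == "LOGIN":
            _, user, auth_key = request
            valid = user in self.users and self.users[user][0] == auth_key
            if valid and user not in self.sessions.values():
                session = f"session-{len(self.sessions) + 1}"
                self.sessions[session] = user
                return ("ACCEPT", [("ACCEPT-LOGIN", session)])
        elif kind == "GET-REPLICA":
            _, session = request
            if session in self.sessions:
                role = self._role(session)
                projected: Dict[str, Any] = {}
                for path in sorted(self._readable(role)):
                    set_at(projected, path, get_at(self.data, path))
                excerpt = self.policy[role]
                return ("ACCEPT", [("INIT", session, excerpt, projected)])
        elif kind == "PUSH-DELTA":
            _, session, path, atom = request
            if (session in self.sessions
                    and path in self.policy[self._role(session)]["WRITE"]):
                set_at(self.data, path, atom)
                # Forward only to other sessions whose role may read the path.
                actions = [("PUSH-DELTA", other, path, atom)
                           for other in self.sessions
                           if other != session
                           and path in self._readable(self._role(other))]
                return ("ACCEPT", actions)
        return ("REJECT",)  # fourth case: everything else is rejected
```

A rejected PUSH-DELTA leaves the leader's data untouched, which is what keeps a malicious write from spreading to other replicas in this sketch.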

ReplicaLang: Secure Manipulation of Replicas
ReplicaLang is a formalism of (well-behaved) clients that aims to be simple while remaining representative of client-side manipulation of replicas. Because ReplicaLang and LeaderLang model the behaviour of the system instead of serving as an implementation, there is no direct communication between ReplicaLang and LeaderLang in the formalism, i.e., ReplicaLang does not "send" LOGIN requests to LeaderLang, and LeaderLang does not "send" a response back to ReplicaLang. Instead, ReplicaLang models a client that has authenticated with a leader, has acquired a local replica, and is manipulating replicas through program expressions. The paper's code artifact contains an interactive tool that, given data, a security policy, and users, integrates the requests and responses of ReplicaLang and LeaderLang to model real interactions.
Figure 9 shows the semantic entities of ReplicaLang. A program term contains a list of replica objects (r ...) and an expression e. Each replica object has an identifier r, a list of privileges (priv ...), the replicated data d, a user environment env, and a list of changes (δ ...) which are performed throughout the evaluation of e, but which are not immediately sent to the leader (i.e., they are "offline"). The expressions e are modelled after a variant of the λ-calculus with support for multi-argument lambdas (for convenience), let-bindings, some primitive Racket operators op, and operators for interacting with replicas.

Interacting With Replicas. The expressions that interact with replicas are "•" to read a field of a replica object, and "•!" to write to a field. A reference into a replica object is represented as a "cursor", which is used to establish the full field access path whenever a field is written to. A real-world system might implement a cursor as a proxy that wraps the replicated object. A cursor stores the identifier of the replica object it refers to and a path from the object's root. Cursors are needed both for the normal workings of the underlying replica (e.g., cursors are also used by Kleppmann and Beresford [2017]) and specifically for our security policy enforcement.
Consider the example program in Listing 3, which is a valid program term of ReplicaLang that models an interaction with a replica. The first part of the program on Lines 1 to 10 contains the replica objects known to the program, in this case a replica called teams, which contains the policy excerpt for this user and the replica data. The expression on Lines 11 to 13 represents the program code that performs the interaction, which in this case adds feedback to a sighting. First, it navigates to the correct part of the replica object by obtaining a cursor cr to the root of the replica object (Line 11). Second, it navigates down into the object via the "•" operator to obtain a cursor [...]
[Fig. 10 excerpt: side conditions of the reduction rules. A write requires r_c ∈ (r ...), (r (priv ...) d env (δ ...)) = r_c, is-writable⟦d, (k_1 ... k_2), (priv ...), env⟧, (r_other ...) = (r ...) \ r_c, and d_new = json-write⟦d, (k_1 ... k_2), atom⟧; a write is rejected when ¬is-writable⟦d, (k_1 ... k_2), (priv ...), env⟧ or ¬is-atom⟦v⟧.]
[...] an atom or an extended cursor if the read field is another object. The read expression is reduced to this value v. Note that security checks are not needed, since non-readable fields were removed by the projection on the leader. Whenever a read is not permitted (i.e., the field does not exist), the rule [¬read] discards the entire context E and returns error.
Complementary to [read] is the [write] rule to write an atom to a replica. This rule uses the metafunction json-write to modify the replica's local data. The crucial difference is that [write] only proceeds when is-writable holds, i.e., when the client is authorised to write to the given field according to the privileges (priv ...) of the policy excerpt. Otherwise, [¬write-¬w] rejects the write with an error. Finally, [¬write-¬a] rejects any write of a non-atom value.
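The interplay of cursors, the read and write operators, and the local change log can be sketched in Python as follows. This is a hedged illustration rather than the ReplicaLang semantics: the classes Replica and Cursor, the exception ReplicaError (standing in for the error expression), and the restriction to exact-path WRITE privileges are assumptions made for brevity.

```python
from typing import Any, List, Tuple

Path = Tuple[str, ...]


class ReplicaError(Exception):
    """Stands in for the error expression of the reduction rules."""


class Replica:
    def __init__(self, name: str, privileges: List[Tuple[str, Path]],
                 data: dict):
        self.name = name
        self.privileges = privileges     # policy excerpt (exact paths only)
        self.data = data                 # projected data received at INIT
        self.changes: List[tuple] = []   # local deltas, not yet pushed

    def is_writable(self, path: Path) -> bool:
        return ("WRITE", path) in self.privileges


class Cursor:
    def __init__(self, replica: Replica, path: Path = ()):
        self.replica, self.path = replica, path

    def _target(self) -> Any:
        node = self.replica.data
        for key in self.path:
            node = node[key]
        return node

    def read(self, key: str) -> Any:
        """Read a field: yields an atom, or an extended cursor for objects."""
        target = self._target()
        if not isinstance(target, dict) or key not in target:
            raise ReplicaError(f"cannot read {self.path + (key,)}")
        value = target[key]
        if isinstance(value, dict):
            return Cursor(self.replica, self.path + (key,))
        return value

    def write(self, key: str, atom: Any) -> Any:
        """Write an atom: checked locally against the policy excerpt."""
        path = self.path + (key,)
        if isinstance(atom, (dict, list)):
            raise ReplicaError("only atoms can be written")
        if not self.replica.is_writable(path):
            raise ReplicaError(f"no write privilege for {path}")
        self._target()[key] = atom
        self.replica.changes.append(("!", path, atom))  # logged while offline
        return atom
```

Reads need no privilege check here because the replica only ever holds projected data, whereas every write is checked before it is logged.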

FORMAL VALIDATION OF THE SPECIFICATION
In this section we prove the claim that the formal specification in Section 4 tackles the 3 problems outlined in Section 2.3, namely freedom from Replicated Data Leaks (as Theorem 5.6), freedom from Data Contagion (as Theorem 5.7), and Offline Policy Enforcement (as Theorem 5.8). Before we can prove these 3 theorems, we first have to prove some lemmas.

Correctness of Projections
We first prove the lemmas related to the correctness of the selection of policy excerpts (Lemma 5.1), correctness of the projection of deltas (as Lemmas 5.2 and 5.3), and correctness of the initial projection of data sent to replicas (as Lemma 5.4).

Lemma 5.1. Given a role and list of privileges (priv ...), excerpt-for-role⟦role, (priv ...)⟧ constructs the correct local policy excerpt for that role.

Proof. It follows directly from the definition of metafunction excerpt-for-role (see Figure 6 on Page 11) that exactly those privileges that apply to role are selected for the policy excerpt. Since any occurrence of the role wildcard * in the privileges is replaced with role (see Section 4.2.1), no (side-channel) leaks of information on other roles' privileges exist. A role receives information only on privileges that apply to role itself.

[Fig. 11 excerpt: from matches-in-env⟦(p-exp ...), (k_2 ...), env⟧ derive matches-in-env⟦(k_1 p-exp ...), (k_1 k_2 ...), env⟧.]

Lemma 5.2. The leader accepts and propagates writes to fields only when the role of the writer is permitted to write to the affected fields.

Proof. To handle requests of the form (PUSH-Δ s (! p atom)), LeaderLang's handle-request uses is-writable (see Figure 8). is-writable checks whether a privilege exists that assigns a WRITE permission to a path selector that matches the written path: matches-in-env⟦ps, p, env⟧. The metafunction matches-in-env (see Figure 11) formalises a direct implementation of the notion of paths matching path selectors described in Section 3.1. Jointly, that means that a change (! p atom) is propagated exactly when the write is permitted.
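The check described in this proof can be approximated in a few lines of Python. The sketch below is an assumption-laden illustration and not the definition from Figure 11: it supports only literal keys and the wildcard "*", it ignores the user environment env, and the function names are hypothetical.

```python
from typing import List, Tuple

Path = Tuple[str, ...]


def matches(selector: Path, path: Path) -> bool:
    """Recursive, key-by-key match; "*" matches any single key."""
    if not selector or not path:
        return not selector and not path  # both must end at the same time
    head_ok = selector[0] == "*" or selector[0] == path[0]
    return head_ok and matches(selector[1:], path[1:])


def is_writable(path: Path, privileges: List[Tuple[str, Path]]) -> bool:
    """True when some WRITE privilege has a selector matching the path."""
    return any(perm == "WRITE" and matches(selector, path)
               for perm, selector in privileges)
```

Under these assumptions a READ privilege never authorises a write, so a change is accepted only when a WRITE selector matches the full written path.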
Lemma 5.3. The leader propagates writes to exactly those target replicas that are permitted to read the affected field.

Proof. To handle requests of the form (PUSH-Δ s (! p atom)), the handle-request metafunction constructs (action ...) = actions-per-session⟦(s_other ...), (user ...), (excerpt ...), d, (! p atom)⟧, a list of PUSH-Δ actions to propagate (see Figure 8). The actions-per-session metafunction filters each session based on whether is-readable⟦d, p, (priv ...), env⟧ holds, given the session's privileges and user environment. As shown in Figure 12, is-readable checks whether the tested path (k_1 ...) is a prefix of a path that exists in d (i.e., one in (p_all ...)), and whether at least one READ or WRITE privilege permits access to that path. Since WRITE implies READ, and since the same logic holds for matches-in-env as in the proof for Lemma 5.2, leak safety holds during delta projection.

Proof. The induction hypothesis is that if (priv ...) in env permits reading json which is situated at path p inside a replicated data structure d, then readable-projection⟦json, (priv ...), d, env, p⟧ returns a structure containing exactly the subfields of json that (priv ...) permits reading. This hypothesis holds for the call readable-projection⟦d, (priv ...), d, env, ()⟧ in handle-request since an SRDT's root is part of each replica's projection. Hence, any list of privileges trivially permits reading the initial json, as that json is the replicated object's root d. The 3 clauses of readable-projection (see Figure 13) uphold this induction hypothesis as follows:
(1) If a json_1 exists at key k_1 within json_0, and if for the accumulated path that ends with that k_1 it holds that is-readable⟦d, (k_accum ... k_1), (priv ...), env⟧, then json_1 is readable (since the same logic holds for is-readable as in the proof for Lemma 5.3), and its readable projection must be included in the projection. By the induction hypothesis, json_2 = readable-projection⟦json_1, (priv ...), d, env, (k_accum ... k_1)⟧ contains exactly those fields of json_1 that (priv ...) permits reading. Also by induction, kj_3 = readable-projection⟦(kj_2 ...), (priv ...), d, env, (k_accum ...)⟧ contains exactly the subfields of the rest of (kj_2 ...) that (priv ...) permits reading. The induction hypothesis is upheld by combining both results into one data structure ((k_1 := json_2) kj_3 ...).
(2) In the second case, subobject json_1 need not be included in the projection. By the induction hypothesis, kj_3 = readable-projection⟦(kj_2 ...), (priv ...), d, env, (k_accum ...)⟧ contains exactly the subfields of the rest of (k_accum ...) that (priv ...) permits reading. The induction hypothesis is upheld by returning that data structure (kj_3 ...).

Correctness of Security Policy Enforcement Locally at the Replicas
We now prove the lemma related to the local enforcement of security policies in ReplicaLang.
Lemma 5.5. Let program be a program in ReplicaLang, let role be the role for which the replica evaluating program has authenticated, let (priv_local ...) be the privileges from the policy excerpt for role, and let (k_1 ... k_2) be a non-empty (potentially invalid) path into the replica's projection of the SRDT.
If and only if a replica's user's role is permitted to write to the field at (k_1 ... k_2), then ReplicaLang's reduction relation → reduces a write operation (•! (r (k_1 ...)) k_2 atom) to atom and updates the replica object's inner state by adding one δ to the local list of changes. Otherwise, the write operation reduces the program's entire expression to an (error _) expression, and leaves the list of replica objects unchanged. Any expression other than a locally permitted write operation neither modifies a replica's local state nor logs changes.

Proof. Consider the reduction relations of ReplicaLang, shown in Figure 10. The write operation we are interested in is only performed in [write]. In that clause, a write operation to replica r is reduced. The replica object corresponding to r is bound to r_c. The local privileges of r are retrieved from r_c. By definition of "local privileges", the privileges (priv ...) in r_c are the local privileges (priv_local ...). If at least one privilege in (priv ...) grants a WRITE permission to a path selector that matches (k_1 ... k_2) in environment env, then the judgment is-writable⟦d, (k_1 ... k_2), (priv ...), env⟧ holds (see proof for Lemma 5.3). Hence, by definition, the judgment holds if role is permitted to write to the field at (k_1 ... k_2) by the local privileges. Hence, in this clause the premise of Lemma 5.5 holds, and we must therefore prove that in this clause exactly one corresponding δ is logged.
On the right-hand side of the reduction relation, a new δ = (! (k_1 ... k_2) atom) is logged in the replica object's list of changes. This corresponds to the write: it records the correct atom for the correct path (k_1 ... k_2) of the correct replica object r. The requirement is thus met.
Every other clause has the form ((r ...) (in-hole E e)) → ((r ...) e′). In the absence of a write, no δ should be logged. Since (r ...) is left unchanged, the requirement is trivially met.

Main Theorems
Finally, we prove the three main theorems as explained at the start of Section 5.

Theorem 5.6 (Freedom from Replicated Data Leaks). A replica receives information only on the fields of an SRDT that the replica is permitted to read by the security policy. A replica is not informed of the fields that it is not permitted to read, nor of the privileges of other roles.

Proof. Let (priv_global ...) be the privileges that make up the full security policy for a replica identified by r, and let role be a role specified in the security policy for r. Let (r (priv_local ...) d env (δ ...)) be a replica object that is in scope in a ReplicaLang program that has authenticated for role. To prove Theorem 5.6 we prove that all members of the replica object are free from data leaks:
(1) This holds trivially for r, the identifier of the local replica.
(2) (priv_local ...) is the policy excerpt for role, listing only the name of role itself and the readable and writable (and hence also readable) fields (Lemma 5.1).
(3) The fields of d can come from three origins: (a) if the field was sent during initialisation, it is part of the readable projection, hence role is permitted to read the field according to (priv_global ...) (Lemma 5.4), (b) if the field was pushed by the leader, it is part of the delta projection, i.e., role is permitted to read the field according to (priv_global ...) (Lemma 5.3), (c) if the field was locally written to, no new non-local data is introduced. Freedom from Replicated Data Leaks holds trivially for locally produced data.
(4) This holds trivially for env, the environment of the local replica.
(5) The list (δ ...) contains only changes made locally by program. Freedom from Replicated Data Leaks holds trivially for locally produced data.

Theorem 5.7 (Freedom from Data Contagion). A replica has write access only to those fields of an SRDT that the replica is permitted to write to by the security policy.

Proof. Let (priv_global ...) be the privileges that make up the security policy, and let role be a role specified in the security policy. A ReplicaLang program only writes to the fields of d which are writable according to (priv_local ...) (Lemma 5.5). Since (priv_local ...) is the correct policy excerpt of (priv_global ...) for role (Lemma 5.1), it holds that ReplicaLang programs only write to fields of d which are writable according to (priv_global ...). Finally, delta projection in LeaderLang only accepts and propagates writes that are permitted (Lemma 5.2).
Since no disallowed writes are performed by well-behaved clients, and since malicious clients' writes are rejected by the leader, disallowed writes to an SRDT affect neither the leader nor other replicas. Effectively, replicas only have write access to those fields of an SRDT that they are permitted to write to according to the security policy.

Theorem 5.8 (Offline Policy Enforcement). A replica retains offline availability, including eventual data consistency with the leader and the other replicas of the SRDT, even when enforcing the SRDT's security policy.

Proof. Each locally permitted write in a ReplicaLang program is locally logged as a δ (Lemma 5.5). When those deltas are pushed to the leader, the leader propagates the deltas to all replicas permitted to read the field that was written to (Lemma 5.3). SRDTs hence offer the same form of eventual data consistency as the underlying replication mechanism as long as all locally performed writes are accepted by the leader during delta projection. Since the correct policy excerpt is contained in a ReplicaLang program (Lemma 5.1), all writes permitted by well-behaved clients are accepted by the leader. Since Theorems 5.6 and 5.7 also hold, the leader will eventually converge on a consistent state which accounts for all permitted writes on all replicas, and all replicas eventually see their role's data projection of that consistent state.

RANDOMISED TESTING USING PLT REDEX
One of the main benefits of Redex is that the formal semantics becomes executable, and thus testable. Redex's randomised testing has been used successfully by Klein et al. [2012] to find errors in formal specifications and proofs (including mechanised proofs) in all of the 9 considered ICFP papers. In the same spirit, in Section 6.1 we briefly explain our suite of randomised tests, which we use to gain additional confidence in our claims from Section 5, and in Section 6.2 we discuss the issues uncovered during development that would likely have slipped into our formalism. While most of them are minor implementation bugs, some were unlikely to be found manually, and 1 bug had concrete security implications.

Randomised Testing of Read and Write Privileges
We designed 2 automated tests to verify the correct enforcement of read privileges (cf. Theorem 5.6) and write privileges (cf. Theorem 5.7). We briefly discuss the experimental set-up for both.
6.1.1 Randomised Verification of Read Privileges. For all possible security policies and objects, the goal is to verify that a client with a particular security role: (1) can read the fields allowed by the security policy, i.e., they are not accidentally omitted by LeaderLang, and (2) cannot read any other fields, i.e., they are not accidentally included by LeaderLang, which is a security violation. Redex has features to generate random terms that adhere to the semantic entities of a Redex language, and to verify certain properties about those terms [Klein and Findler 2009]. However, completely random generation of program terms in ReplicaLang is extremely unlikely to yield meaningful objects, policy excerpts, and read expressions which adequately test a security policy.
Hence, we guided the generation of test cases by starting from a completely randomly generated (nested) object, and then programmatically extracting a random (but correct) security policy (including all types of path expressions) which grants read access to a random subset of the object. The object is then projected according to the security policy, and the test verifies for all possible paths in said object that: (1) every readable path according to the security policy can actually be read by a ReplicaLang program, and (2) every other path is not present in the projected object (because it was correctly removed by LeaderLang's projection).
6.1.2 Randomised Verification of Write Privileges. The test setup to verify write privileges is similar to that for read privileges, but more involved because it also includes randomly generated roles and clients. Essentially, starting from a randomly generated (nested) object and multiple roles (including the wildcard role), we extract a random security policy from the object that contains both READ and WRITE privileges for a random subset of (some valid and some invalid) paths in the object. The security policy is used to project the generated object, and to verify that all roles are correctly able to read or write the fields that they should be able to access according to the policy. This means verifying that paths which are read-only or non-readable cannot be written to, and that all writable paths can actually be written to by a ReplicaLang program. Additionally, we verify that the list of PUSH-Δ terms returned by LeaderLang is correct, i.e., that if one client writes to a field, then the clients who are informed of the written value must have read privileges for the field.
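The guided generation strategy for the read test can be mimicked in plain Python. The sketch below only conveys the shape of such a test and is far simpler than the Redex suite: policies are sets of exact leaf paths (no path expressions, roles, or writes), the projection is a hypothetical stand-in for LeaderLang's, and all function names are invented for this example.

```python
import random

MISSING = object()  # sentinel for "path not present"


def random_object(rng, depth=3):
    """Generate a random nested object with integer atoms at the leaves."""
    obj = {}
    for i in range(rng.randint(1, 3)):
        if depth > 1 and rng.random() < 0.5:
            obj[f"k{i}"] = random_object(rng, depth - 1)
        else:
            obj[f"k{i}"] = rng.randint(0, 99)
    return obj


def leaf_paths(obj, prefix=()):
    paths = []
    for key, value in obj.items():
        if isinstance(value, dict):
            paths.extend(leaf_paths(value, prefix + (key,)))
        else:
            paths.append(prefix + (key,))
    return paths


def project(obj, readable, prefix=()):
    """Stand-in projection: keep readable leaves and their ancestors."""
    out = {}
    for key, value in obj.items():
        path = prefix + (key,)
        if isinstance(value, dict):
            sub = project(value, readable, path)
            if sub:
                out[key] = sub
        elif path in readable:
            out[key] = value
    return out


def lookup(obj, path):
    for key in path:
        if not isinstance(obj, dict) or key not in obj:
            return MISSING
        obj = obj[key]
    return obj


def run_read_test(seed):
    rng = random.Random(seed)
    obj = random_object(rng)
    leaves = leaf_paths(obj)
    # Extract a random (but valid) policy from the generated object.
    readable = set(rng.sample(leaves, rng.randint(0, len(leaves))))
    projected = project(obj, readable)
    for path in leaves:
        if path in readable:   # (1) readable paths can actually be read
            assert lookup(projected, path) == lookup(obj, path)
        else:                  # (2) every other path was removed
            assert lookup(projected, path) is MISSING
    return True
```

Deriving the policy from the generated object, rather than generating both independently, is what makes most test cases meaningful.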

Issues Detected Through Randomised Testing
We repeated the read and write tests 1,000,064 times (7813 tests per program instance, run 128 times on a 64 core, 128 thread CPU), which we consider more than sufficient to uncover any issues. Randomised tests found 10 problems in total, which were either found immediately, or in the worst case after a couple of thousand iterations of a single program instance. We categorise 7 of those problems as minor issues that constitute small implementation bugs in the Redex formalism, but which did not endanger the security of the model. For example, in one case, when trying to read a non-existing field from an object, a reduction of a term in LeaderLang would get stuck instead of rejecting the program. More interestingly, randomised testing also revealed 2 implementation bugs which we were unlikely to find manually, and 1 bug which had consequences for data security.
The most important of the identified bugs is the one which impacted data security. It relates to the erroneous handling of wildcards within the readable-projection metafunction. In essence, consider the following object, which is a slightly reduced variant (for brevity) of the actual counterexample found by a random test, and read privileges for the path selector (* JIvt).
The expected projection is ((r := ((JIvt := #t)))), such that a client can read the value of the path (r JIvt). However, the projection erroneously included the top-level field 7 as well, despite it not having a JIvt subfield. The formalism was meant to specify (* JIvt) to mean "traverse any field to find subfield JIvt", but the behaviour of our implementation was "traverse any field to find subfield JIvt, or admit access if the field contains an atom", thus erroneously revealing the data stored in the 7 field. If another client were to write to the object such that the 7 field becomes (7 := ((JIvt := 0))), then reading the content of that subfield of 7 would be permitted by the path selector (* JIvt).
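The difference between the intended and the erroneous wildcard behaviour can be reproduced with a small Python sketch. It is an illustrative reconstruction, not the Redex code: the function project and its admit_atoms flag are hypothetical, and the atom 42 stored in field 7 is an assumed placeholder value.

```python
def project(obj, selector, admit_atoms=False):
    """Project obj onto one READ selector; "*" matches any key.

    admit_atoms=True mimics the faulty behaviour: an atom met before the
    selector is fully consumed is admitted instead of being dropped.
    """
    if not selector:
        return obj
    head, rest = selector[0], selector[1:]
    out = {}
    for key, value in obj.items():
        if head != "*" and head != key:
            continue
        if not rest:
            out[key] = value            # selector fully matched
        elif isinstance(value, dict):
            sub = project(value, rest, admit_atoms)
            if sub:
                out[key] = sub          # keep only non-empty subtrees
        elif admit_atoms:
            out[key] = value            # the leak: atom under a partial match
    return out


obj = {"r": {"JIvt": True}, "7": 42}    # 42 is a placeholder atom
selector = ("*", "JIvt")

intended = project(obj, selector)
buggy = project(obj, selector, admit_atoms=True)

# After another client turns field 7 into an object with a JIvt subfield,
# reading that subfield is legitimately permitted.
obj["7"] = {"JIvt": 0}
after_write = project(obj, selector)
```

Here intended contains only the r field, whereas buggy additionally exposes the atom stored in field 7.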

DISCUSSION AND RELATED WORK

Access Control in Replicated Databases
RDTs are used in the implementation of (geo-)replicated databases [Nadal 2023; Redis 2020; Riak 2013; Shukla 2018]. In such databases it is possible to encrypt data before passing it on to users [Barbosa et al. 2021; GUN 2022]. Encrypted database fields can be locally read and modified by clients with the decryption key, or by applying operations on homomorphically encrypted data (i.e., applying operations on encrypted data without having to decrypt it first). The advantage is that no synchronisation with a central authority is required. However, the approach is difficult to use for fine-grained access control due to difficulties in distributed key management. For example, when multiple clients have read access for one field, then they require the encryption/decryption key only for said field. When roles have access to multiple fields, then it is up to the developer to figure out which (potentially overlapping) fields must be encrypted with different keys, how those keys are distributed to each client, and how to track which keys must be used for which fields. This approach does not scale beyond a couple of fields with little or no overlap between roles.
Rather than replicating a database to users, many distributed database management systems (DBMS) have the option of using replication internally to increase availability and reduce latency. In this case replication is an implementation concern, and not part of the programming model. Data is not locally available at the clients, which instead must query the DBMS. Clients could use RDTs to make the data available offline to them, but then they again face the original problems of insecure RDTs.

Access Control for XML Documents
In the early to mid 2000s XML was thought to be the future format for data interchange between systems. There is a body of work to enforce access control for XML documents such that only parts of documents are exposed to different users [Crampton 2006; Damiani et al. 2002; Fundulaki and Marx 2004; Murata et al. 2006]. We based the specification of our security policy language on this work, resulting in a policy language that uses JSONPath where theirs uses XPath, with similar security policy semantics. The run-time enforcement of security for XML documents is not directly applicable to offline available replicated JSON data because security constraints are imposed only when an XML document is fetched, whereas replicated data is continuously (locally) modified.

Multitier Programming
Several tools and languages exist to develop (web) applications as a single code base which is automatically split into the multiple tiers of a distributed application (client, server, ...), e.g., Hop.js [Serrano and Prunet 2016], ScalaLoci [Weisenburger et al. 2018], and Stip.js [Philips et al. 2018]. Some of this work explicitly targets security, such as Swift [Chong et al. 2009] and Fabric [Liu et al. 2017], which use source code annotations to specify security constraints on code and information flow. These tools are an alternative programming model for developing secure-by-design distributed (web) applications. This paper starts from the premise that using replication to offer offline availability is a given, as it has been widely motivated and is being used in practice. Multitier programming does not tackle the same concerns as SRDTs, namely offline availability, eventual consistency, and secure access control for replicated data.

Byzantine Fault Tolerance for RDTs
Malicious entities can try to circumvent an RDT's security mechanisms by attacking the algorithm that the RDT uses to establish consensus. In the CRDT literature, a CRDT that can retain a mutually agreed ordering of data updates among replicas (needed for eventual consistency), even in the presence of malicious clients, is said to be Byzantine fault tolerant [Kleppmann 2022; van der Linde et al. 2020]. In principle, the challenges posed by Byzantine faults are orthogonal to the approach in this paper. A solution that combines our approach (to enforce access control on the data at the application level) with a technique such as the one by Yactine et al. [2021] at the implementation level would yield a Byzantine fault tolerant SRDT. However, since SRDTs require a central authority, in their current form there is no need for Byzantine fault tolerance to maintain eventual consistency.

Limitations and Future Work
The approach outlined in this paper can serve as a foundation for building advanced security features for RDTs. We briefly discuss the main limitations and possible avenues for future work.
7.5.1 Central Authority. One of the main limitations compared to other work on RDTs is the assumption of a single leader. Whereas RDTs are frequently used because they support decentralisation, there is also a need for RDTs in centralised designs. For example, in academia there is AutoCouch, a JSON CRDT framework which combines Automerge with CouchDB to support offline availability in client-server web applications [Grosch et al. 2020]. In industry, a central authority is already available and often desired. A prime example is Figma, a collaborative web application for user interface design (which Adobe announced in 2022 it would acquire for around $20 billion [Adobe 2022]), which uses CRDTs for offline availability and conflict resolution, while explicitly omitting decentralisation. They note that: "Even if you have a client-server setup, CRDTs are still worth researching because they provide a well-studied, solid foundation to start with." [Wallace 2019] A full peer-to-peer implementation of SRDTs is interesting, but requires additional research (e.g., possibly to relax the security guarantees), and is outside the scope of this paper.

7.5.2 Arrays. There are unsolved semantic issues when applying access control to arrays, e.g., to access only a part of an array. The problem is that the contents of the array can change at any moment in time, including when a client is offline. Questions arise such as how the accessible parts of an array are represented on a client (e.g., are non-accessible entries removed and indices remapped?), and how to deal with retractions of access, e.g., when an object in an array is moved from an accessible part to a non-accessible part. How the programming model should be adapted to solve or avoid these semantic issues is an open problem. Rather than ignoring these issues, for now SRDTs cannot contain arrays. Note that SRDTs can still be used to build collections of items, but using an object's fields as opposed to an automatically indexed array.

7.5.3 Extended Policy Language. We deliberately kept the policy language small by offering only simple permissions (read or write). Real-world security policies specify privileges that currently cannot be expressed in the policy language, such as role hierarchies [Sandhu 1998], DENY privileges, and inter-field constraints [Oostvogels et al. 2017].

7.5.4 Expanded Set of Policy Attributes. The current security policy language disallows privileges to depend on data in the RDT because this causes unresolved semantic difficulties. More specifically, an update to an RDT field whose value is used in a security policy could cause a client (including the one who updated the field) to lose access to fields. How to deal with retractions of access is an open issue. Additionally, malicious users may try to widen the scope of their privileges by writing to fields that are used in the security policy. A solution can be a part of a larger effort towards Attribute-Based Access Control, where a security policy may depend on run-time data, resources, system environment, connection, and administrative decisions [Servos and Osborn 2017].

7.5.5 Static Enforcement. Clients are expected to dynamically check whether they are allowed to write to fields of an SRDT. Including such a security check (e.g., as a library call) is undesirable because it mixes the enforcement of security policies and application logic. Practical solutions should statically enforce the security policy, e.g., via a type system that rejects programs that write to read-only fields. Similar to the original work on JSON CRDTs [Kleppmann and Beresford 2017], we leave the specification of a data schema as future work.
7.5.6 Schema Migration. Support for schema migration and updates to the security policy of an already deployed SRDT are open problems.

7.5.7 Compound Changes. The language features supported by ReplicaLang are hampered by the lack of support for writing compound data structures to fields, or moving subtrees in an RDT. Recent work by Kleppmann et al. [2022] adds these features for JSON CRDTs, but our formalism cannot yet verify these features' security.

7.5.8 State-based RDTs. SRDTs assume an underlying operations-based RDT like Automerge [Automerge Contributors 2023] or Yjs [Jahns and Yjs Contributors 2023]. Future work can apply the results to state-based or delta-based [Rinberg et al. 2022] implementations as well, though 2 challenges need to be addressed. First, for state-based RDTs, the leader must compute "state deltas" to verify that all changed fields were permitted to be written to. Second, a consistency mechanism must be designed which merges a partial state (namely, the part of the state that is in the projected data of some client) with the complete state at the leader.

CONCLUSION
This paper proposes SRDTs, a data type that specifies role-based access control for RDTs. This is an important step towards practical implementations of RDTs for applications with extra security constraints such as business applications, especially when parts of the application are run in unsecured environments such as ordinary web browsers. Concretely we identified 3 problems, namely (1) Replicated Data Leaks, where sensitive data is inadvertently replicated to clients which should not have that data, (2) Data Contagion, where a client's modifications to a local replica are merged with the replicas of other clients regardless of whether those changes were permitted, and finally, (3) the Lack of Offline Policy Enforcement, where any enforcement mechanism must be available offline.
To overcome the identified problems, SRDTs demonstrate a combination of Role-Based Access Control and offline-available JSON data to securely replicate said data over a network. To prevent Replicated Data Leaks, a leader defines multiple projections to exclude any data for which a client with a particular role does not have read privileges. To prevent Data Contagion, a leader acts as an intermediary between all clients to prevent malicious writes (that do not conform to the security policy) from reaching other clients. Finally, to enable Offline Policy Enforcement, each client receives an excerpt of the global security policy which contains the privileges that apply to their role, such that it can be enforced locally.
An operational semantics of SRDTs was implemented in PLT Redex. We validated this specification via formal proofs that verify that SRDTs do not suffer from the identified problems, and that the underlying properties of RDTs (such as eventual consistency) are unaffected. Furthermore, we used randomised testing to experimentally check the absence of the identified problems, which uncovered multiple bugs and 1 security problem that existed in earlier versions of the formal specification.

DATA-AVAILABILITY STATEMENT
The full executable implementation of the formal specification in Redex (Section 4) is available as a software artifact [Renaux et al. 2023]. This specification was used for the randomised testing of Section 6. Furthermore, we provide an easy-to-use command-line interface (not discussed in this paper) to interact with SRDTs (e.g., via the running example of Section 3). The artifact is available on Zenodo via the following link: https://doi.org/10.5281/zenodo.8310917.

Mockup of a sighting, biologist's view.
Fig. 1. Example data model and GUI mockup for a citizen science application.
Fig. 9. Terms of ReplicaLang. Terms from CommonLang such as atom, priv, d and env are inherited.
Fig. 10. The complete set of reduction rules for expressions in ReplicaLang.
Fig. 11. Verifying whether a path matches a path selector in a user environment in CommonLang.
Fig. 13. LeaderLang: Selecting the readable projection of a data structure for a certain role.

Table 1. Overview of the supported types of path expressions adapted from JSONPath.