Maybenot: A Framework for Traffic Analysis Defenses

End-to-end encryption is a powerful tool for protecting the privacy of Internet users. Together with the increasing use of technologies such as Tor, VPNs, and encrypted messaging, it is becoming increasingly difficult for network adversaries to monitor and censor Internet traffic. One remaining avenue for adversaries is traffic analysis: the analysis of patterns in encrypted traffic to infer information about the users and their activities. Recent improvements using deep learning have made traffic analysis attacks more effective than ever before. We present Maybenot, a framework for traffic analysis defenses. Maybenot is designed to be easy to use and integrate into existing end-to-end encrypted protocols. It is implemented in the Rust programming language as a crate (library), together with a simulator to further the development of defenses. Defenses in Maybenot are expressed as probabilistic state machines that schedule actions to inject padding or block outgoing traffic. Maybenot is an evolution from the Tor Circuit Padding Framework by Perry and Kadianakis, designed to support a wide range of protocols and use cases.


Introduction
After decades of effort, end-to-end encryption is becoming ubiquitous, with modern protocols such as QUIC [33], HTTP/3 [7], TLS 1.3 [61], DoH [28], DoQ [32], MLS [4], and more encrypted by default. In parallel, technologies that provide unlinkability between end users and their IP addresses under various threat models are being increasingly adopted, such as Tor [16], VPNs [17,18,60], and Apple's iCloud Private Relay [3,49]. Together, these two trends are making it harder for network operators to detect and block malicious traffic and for attackers to identify and target specific users. The final frontier is traffic analysis: inferences based on the metadata of encrypted traffic.
While traffic analysis attacks have been studied for a long time [13,39,43,67,76], we see relatively few deployed real-world defenses. Most defenses can be found in technologies aiming for user-IP address unlinkability, such as Tor [16]; however, they remain relatively modest in induced overhead [52], limiting their efficacy against many attacks. Recent protocol standards do support essential building blocks and note that they may be used to defend against traffic analysis (cf. support for the PADDING frame in QUIC [33, §21.14] and reflections on traffic analysis in TLS 1.3 [61, §3]), yet they do not mandate any specific use.
There are several reasons for the lack of deployed traffic analysis defenses. For one, the negative performance impact of padding and added latency is a significant concern: with high costs, the benefits must be very clear. This is compounded by the fact that the landscape of traffic analysis attacks and defenses is rapidly evolving, driven by advances in deep learning and artificial intelligence [42]. Additionally, one must recognize that it has taken decades to achieve widespread adoption of end-to-end encryption. Without robust encryption, data/payload leakage has typically been perceived as a more significant threat than metadata.
This paper presents version 2 of Maybenot, a framework for traffic analysis defenses. Maybenot is designed to be easy to integrate into existing protocols and flexible enough to support a wide range of traffic analysis defenses. Version 1 is also documented on arXiv [54] and partially covered by a publication at WPES 2023 [56]. Maybenot has been integrated with WireGuard by Mullvad VPN into their Defense against AI-guided Traffic Analysis (DAITA)¹ VPN feature.
Section 2 provides background on traffic analysis with a focus on website fingerprinting defenses. Section 3 presents the Maybenot framework, followed by Section 4, which describes machines that run in the framework. Section 5 covers the Maybenot simulator for simulating network traces with Maybenot running at the client and server side. Section 6 discusses limitations and the merits of a framework and motivates the changes made in version 2 of Maybenot. Section 7 explains related work, and Section 8 concludes the paper.

Background
Maybenot grew mainly out of work on website fingerprinting defenses, which is closely related to end-to-end flow correlation/confirmation [14,15,30,44,47,57]. We therefore focus on website fingerprinting in Section 2.1 and the broader impact of traffic analysis defenses in Section 2.2.

Website Fingerprinting
In the website fingerprinting (WF) setting, a local, passive attacker observes encrypted network traffic generated by a client visiting a website using some proxy or relay that hides the destination IP address (typically Tor [16], but it could just as well be a VPN [9,10,26,27,41,71] or some developing standard like MASQUE [3,49]). The goal of the attacker is to infer the website the client is visiting (or the subpage of a website rather than the typical frontpage, in the case of webpage fingerprinting). Section 2.1.1 gives an overview of WF attacks and Section 2.1.2 of WF defenses.

Attacks
WF attacks can be grouped based on how they deal with feature engineering: manually or automatically. Both types of attacks make use of the sequence of packets in a network trace as well as their directions, timestamps, and sizes (if applicable: most attacks are tuned for Tor, where traffic is packed into constant-size cells of 514 bytes²).
In short, the community spent about a decade discovering the most valuable features for WF, culminating in the k-fingerprinting attack by Hayes and Danezis [24] and CUMUL by Panchenko et al. [48]. They both have on the order of 100-200 features, where the top 10-20 features provide the overwhelming majority of utility.
Unfortunately, progress in the area of deep learning from 2016 and onward has made automatic feature engineering practical and superior to manual feature engineering [1,5,6,59,62,65,69]. An attacker trains on raw³ network traffic and uses the learned features for classification. This approach requires no domain-specific knowledge, yet it outperforms manual feature engineering in terms of attack effectiveness. State-of-the-art attacks include Deep Fingerprinting by Sirinam et al. [69] and Robust Fingerprinting by Shen et al. [65], which both made notable gains compared to k-fingerprinting and CUMUL on mature WF defenses.
² https://spec.torproject.org/tor-spec/preliminaries.html?highlight=514#msg-len
³ Albeit the trace representation used as input differs (such as a sequence of directional timestamps [59] or per-direction counts of packets within time windows [65]), which is a (minimal) form of feature engineering.

Defenses
The community has been working on defenses in parallel to the development of WF attacks. They can be categorized into imitation, regulation, alteration, traffic splitting, and adversarial techniques [30,31]. We briefly describe each category to showcase the diversity of defenses and highlight some notable examples.
Imitation defenses aim to make traffic from visiting one website (nearly) identical to that of another website. Some canonical examples are Walkie-Talkie [75], which groups "sensitive" pages with "non-sensitive" pages, ensuring each pair has identical burst sequences; and Glove [46] and Supersequence [73], which group pages into explicit sets and shape their traces to appear identical. A recent defense, Palette [66], uses explicit sets optimized to reduce overhead. Such defenses require knowledge of traces (or their key features) to shape traffic towards, which is a practical concern.
Regulation defenses aim to regulate traffic, making all traces look the same as some target trace or pattern. Tamaraw [8] is a highly effective constant-rate defense but incurs high overheads. WTF-PAD [37] is based on the concept of Adaptive Padding [67] and attempts to regulate bursts to a target distribution using padding. While WTF-PAD is effective against attacks with manual feature engineering, deep learning-based attacks significantly reduce its effectiveness [69]. RegulaTor [31] primarily uses padding with minimal latency (blocking) to withstand deep learning-based attacks with modest overheads. Surakav [22] uses Generative Adversarial Networks (GANs) to generate realistic traffic patterns dynamically and regulates traces to match a generated pattern, similarly with modest overheads.
Alteration defenses aim to alter traffic to such an extent that traces become useless for classification. FRONT [20] and DeTorrent [30] are the most effective alteration defenses. FRONT obfuscates the front of a trace with padding, while DeTorrent uses a Long Short-Term Memory (LSTM) network to determine how to add download padding and keeps a fixed 1:5 ratio between upload and download traffic. Interestingly, both are highly randomized padding-only defenses.
Traffic splitting defenses build upon the assumption that the client has multiple paths available and that only one of the paths is observable to an attacker. Which path is observable by the attacker is unknown, and the attacker cannot adaptively choose a path. Defenses unpredictably split traffic between the paths according to some splitting strategy. Three such defenses are TrafficSliver [40], HyWF [25], and CoMPS [72]. These defenses are less effective against tailored deep learning attacks, even with the niche and perhaps unrealistic assumption that the attacker can only observe a single path [5].
Adversarial techniques are used by Mockingbird [58] and BLANKET [45] to defend against deep learning-based attacks in particular. While promising, this family of defenses typically struggles against an adaptive attacker [30].

Real-World Impact of Traffic Analysis
The real-world impact of traffic analysis attacks has been widely discussed, not only in the website fingerprinting community [12,36,48,50,74] but also in the broader traffic analysis community concerning end-to-end flow correlation/confirmation attacks [13-16, 39, 43, 44, 47, 63].
In summary, when evaluating defenses, we consider empowered adversaries that may be given unrealistic capabilities or operate under simplified assumptions. Examples include being able to determine the start and end of network traces [74], a limited number of possible websites to evaluate [36,48,55], and the ability to train under the same network conditions as victims [12]. While such empowered attacker models are useful for evaluating defenses, they may not accurately convey the threat posed by traffic analysis [35].
Regardless, a key takeaway is that there is an inherent tradeoff between effectiveness and efficiency in the space [14,15], i.e., stronger protection comes at the cost of higher overhead. Being able to tune defenses to assessed needs for different use cases is a key feature of any defense (framework). This is where Maybenot gets its name: the goal is to enable integrators to cast doubt on the attacker's ability to draw conclusions from traffic analysis of a protocol's traffic.

Maybenot Framework
Maybenot is a framework for traffic analysis defenses. Defenses are implemented as Maybenot machines running in the framework; this approach is adopted from Tor's Circuit Padding Framework [51,52], further described in Section 7. In this section, we look at the Maybenot framework from a top-down perspective, showing how Maybenot can be integrated into a protocol such as TLS or QUIC. Maybenot is written in the Rust programming language, so we use Rust code snippets to complement our explanations.
Maybenot is designed to be used as a library (Rust crate), integrated into existing protocols. There should be one instance of the framework per connection and party/participant. For example, in the case of Tor, this could be per circuit between a client and relay or between relays in the network. Similarly, it could be per connection for TLS, per stream for QUIC, or even per participant in an MLS group chat. To fit such a wide range of protocols, we emphasize that Maybenot is a piece of the puzzle, not a complete solution.

Creating an Instance
Figure 1 shows how to create an instance of the Maybenot framework. The framework takes zero or more machines (see Section 4) as input that are run in parallel. Creating the framework is a lightweight operation: simply recreate the framework if a machine should be added or removed.
The next two arguments are fractions, i.e., they can have values in the [0, 1] range. They are upper limits on padding traffic and blocking duration, respectively, enforced by the framework for all machines (0 means no limit). max_padding_frac is expressed in relation to the total number of packets queued on the defended connection so far, and max_blocking_frac is relative to the time elapsed since the framework instance was created. Having framework-wide limits is important when multiple machines are used. It also provides a simple way to control the overhead of using Maybenot, potentially as an easy way to tune defenses.
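To make the semantics of the two fractions concrete, the following is a minimal sketch of how such limits could be checked. It is not the framework's implementation: the function names, the strict comparison, and the choice to allow actions while nothing has been sent or no time has elapsed are all assumptions made for illustration.

```rust
use std::time::Duration;

/// Hypothetical helper (not the Maybenot API): is the padding sent so far
/// still below the configured fraction of all packets sent so far?
fn padding_within_limit(padding_sent: u64, total_sent: u64, max_padding_frac: f64) -> bool {
    // 0 means "no limit"; with no packets sent there is no fraction to check.
    if max_padding_frac == 0.0 || total_sent == 0 {
        return true;
    }
    (padding_sent as f64) / (total_sent as f64) < max_padding_frac
}

/// Hypothetical helper: is the time spent blocking still below the configured
/// fraction of the time elapsed since the instance was created?
fn blocking_within_limit(blocked: Duration, elapsed: Duration, max_blocking_frac: f64) -> bool {
    // 0 means "no limit"; with no elapsed time there is no fraction to check.
    if max_blocking_frac == 0.0 || elapsed.is_zero() {
        return true;
    }
    blocked.as_secs_f64() / elapsed.as_secs_f64() < max_blocking_frac
}
```

With a limit of 0.5, for example, a connection that has sent 10 packets of which 5 are padding would be at the limit in this sketch, so further padding would be withheld until more normal traffic is sent.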
The current_time is passed to the framework to control the perceived time of the machines. This makes it easier to implement some types of defenses (varying notions of time, e.g. real-world or epoch/step-based) and support the simulation of Maybenot (see Section 5). It also serves to ease integration by minimizing the number of dependencies in the framework. Note that instants in time and durations are represented as Rust traits (similar to interfaces) in Maybenot. This allows using Maybenot with custom time sources, such as coarsetime⁴, significantly improving performance on some architectures and for some use cases (e.g. frequent sampling of time from user space requiring system calls).
Finally, rng specifies the PRNG the framework will use for sampling. Exposing the PRNG provides additional flexibility to the integrator and simplifies testing. One particularly noteworthy use case is shared randomness between communicating endpoints using the framework (perhaps based on some protocol-dependent secret), which may be helpful for randomized defenses that require synchronization. It can also contribute to enabling endpoints to reason about the expected behavior of the other endpoint, i.e., whether the observed padding or blocking behavior is expected or not.
With an instance of the framework created (or a suitable error returned, e.g. if a fraction is not within the range [0, 1]), there are only two more steps to integrate Maybenot into a protocol. Note that the created instance is lightweight and separates machine definitions from their runtime state, so machines can be efficiently shared between many instances.

Triggering Events
Events related to the connection (tunnel) the instance of the framework is protecting need to be reported. The events to trigger are defined in Figure 2. In many cases, these events are caused directly or indirectly by the actions of machines in the framework, as explained in Section 3.3.
When normal (non-padding) or padding data is queued for sending (before encryption, NormalSent/PaddingSent) or has been processed from a packet (after decryption, NormalRecv/PaddingRecv), the framework needs to know. The same applies when an encrypted packet is sent or received via the tunnel (TunnelSent/TunnelRecv). Note that when padding is queued the framework requires the identifier MachineId of the machine that caused the padding.
When a machine starts blocking outgoing traffic, the framework needs the MachineId of the machine (BlockingBegin) and a notification when blocking ends (BlockingEnd).
Finally, the integrator must keep track of two timers for each machine, one of which (the internal timer) can be set explicitly by the machine. The TimerBegin and TimerEnd events must be triggered when this timer is set and expires, respectively. We explain this in more detail shortly.
For each event in Figure 2, the integrator needs to detect when they occur and trigger them in the framework. This is done by calling the trigger_events function, as shown in Figure 3, with one or more events. Note that in Figure 3, the current time is an argument: Instant::now(). As described above, the integrator is responsible for providing a notion of time to the framework.

Scheduling and Performing Actions
The trigger_events function in Figure 3 returns an iterator of actions. This is the heart of the integration with the framework: given that one or more events happened at a particular point in time on the connection, which actions should be taken, according to the running machines?
The possible actions are defined in Figure 4. Each machine running in the framework has at most one scheduled action (padding or blocking), and other actions are carried out immediately. As an integrator, you need a per-machine timer for scheduling actions and an internal timer that the machine can set explicitly (two timers per machine). The integrator is responsible for scheduling and performing the actions returned by the framework as specified. This division of responsibilities is intentional. It allows the framework to remain simple and lightweight while integration can be tailored to the specific protocol and implementation: for example, in Rust, the choice of asynchronous runtime (e.g. Tokio⁵ or async-std⁶) is a significant implementation detail.

Timers
An integrator must maintain two timers for each MachineId: an action timer and an internal timer, as shown in Figure 5. The action timer is set according to the timeout field of padding and blocking actions (further described shortly). Upon its expiration, the scheduled action for the machine is to be taken. Thus, a Cancel action that specifies the Action timer MUST have the effect of canceling any scheduled action for the machine with the given identifier machine.
Each machine also has an internal timer: though it is not actually internal to the framework, it is updated based on the machine's internal logic. The UpdateTimer action sets the internal timer for the machine with identifier machine. If the replace flag is true, the timer MUST be set with the given duration; otherwise, it MUST be set to the longest of the remaining timer duration and the provided duration. The internal timer can be silently canceled via a Cancel action, in which case the TimerEnd event MUST NOT be triggered.
If a Cancel action has timer All, both the scheduled action for the machine and its internal timer MUST be canceled.
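The timer rules above can be sketched as integrator-side bookkeeping. The types below are hypothetical and only mirror the prose (they are not the framework's definitions), and timers are modeled as remaining durations rather than real timers tied to an asynchronous runtime.

```rust
use std::time::Duration;

/// Which timer a Cancel action targets (hypothetical mirror of the text).
#[derive(Debug, Clone, Copy, PartialEq)]
enum TimerKind {
    Action,
    Internal,
    All,
}

/// Remaining time of the two per-machine timers; None means "not running".
#[derive(Debug, Default, PartialEq)]
struct MachineTimers {
    action: Option<Duration>,
    internal: Option<Duration>,
}

impl MachineTimers {
    /// UpdateTimer: with replace, always take the new duration; otherwise keep
    /// the longest of the remaining and the provided duration.
    fn update_internal(&mut self, duration: Duration, replace: bool) {
        let new = match self.internal {
            Some(remaining) if !replace => remaining.max(duration),
            _ => duration,
        };
        self.internal = Some(new);
    }

    /// Cancel: silently drop the selected timer(s). No TimerEnd is produced
    /// here, matching the requirement that cancellation is silent.
    fn cancel(&mut self, kind: TimerKind) {
        if matches!(kind, TimerKind::Action | TimerKind::All) {
            self.action = None;
        }
        if matches!(kind, TimerKind::Internal | TimerKind::All) {
            self.internal = None;
        }
    }
}
```

An integrator would keep one such record per MachineId and translate the remaining durations into whatever timer facility its runtime provides.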

Send Padding
Schedule sending a padding packet after a given timeout. The padding size is not specified; this is left to the integrator to decide. Two flags control how the padding is sent: bypass and replace.
The bypass flag indicates if the padding packet MUST be sent despite active blocking of outgoing traffic ("bypass blocking"). This is only allowed if the active blocking was set with the bypass flag ("bypassable blocking"); see Section 3.3.3. If the replace flag is set, it indicates that the padding packet MUST be replaced by any enqueued (and blocked from being sent on the tunnel) normal data, if available.
The combination of bypass and replace flags can be used to build machines that send constant-rate traffic. For example, consider a machine that first triggers bypassable blocking and then, at a constant rate, triggers the action to send padding with the bypass and replace flags set. At each action, either padding or enqueued normal data will be sent.
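The interplay of the two flags can be illustrated with a small integrator-side sketch of what happens when a scheduled padding action becomes due. All names are hypothetical, and the Held outcome (padding that may not be sent right now) is an assumption: how an integrator treats padding that cannot bypass blocking is integration-specific.

```rust
use std::collections::VecDeque;

/// What the integrator does when a scheduled padding action is due.
#[derive(Debug, PartialEq)]
enum Outcome {
    /// A padding packet is sent.
    Padding,
    /// Enqueued normal data is sent in place of the padding.
    Normal(Vec<u8>),
    /// Nothing may be sent now because of active blocking.
    Held,
}

/// `blocking` is None when no blocking is active, otherwise Some(bypassable).
fn on_padding_due(
    blocking: Option<bool>,
    bypass: bool,
    replace: bool,
    queue: &mut VecDeque<Vec<u8>>,
) -> Outcome {
    // Blocking is only bypassed if the padding asks for it AND the active
    // blocking was set up as bypassable.
    if let Some(bypassable) = blocking {
        if !(bypass && bypassable) {
            return Outcome::Held;
        }
    }
    // With replace set, enqueued normal data takes the place of the padding.
    if replace {
        if let Some(packet) = queue.pop_front() {
            return Outcome::Normal(packet);
        }
    }
    Outcome::Padding
}
```

Calling on_padding_due at a fixed interval with bypassable blocking active and both flags set yields exactly one packet per tick, normal data when available and padding otherwise, which is the constant-rate behavior described above.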

Block Outgoing
Schedule the blocking of outgoing traffic for a duration after a given timeout. The blocking is for all outgoing traffic on the connection across all machines running in the framework instance. Two flags control how the blocking is performed: bypass and replace.
If the bypass flag is set, the caused blocking MAY be bypassed by padding packets with the bypass flag set. We call this "bypassable blocking". If the bypass flag is not set, the caused blocking MUST NOT be bypassed by any means. This design is motivated by ensuring that it is possible to create fail-closed blocking defenses.
If the replace flag is set, the blocking duration MUST replace any current active blocking duration. If the replace flag is not set, the new blocking duration MUST be the longest of the remaining blocking duration and the provided duration. This ensures that the only way to reduce the blocking duration is by setting the replace flag. Any bypass flag on active blocking is only updated if the duration is updated (i.e., longer duration or the replace flag is set).
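The rules for updating active blocking can be written as a single state update. This is a sketch with hypothetical names; treating an equally long new duration as "not longer" (so the existing blocking and its bypass flag are kept) is an assumption.

```rust
use std::time::Duration;

/// Active blocking as tracked by the integrator (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq)]
struct Blocking {
    remaining: Duration,
    bypassable: bool,
}

/// Apply a blocking action to the current blocking state. The duration and
/// the bypass flag are updated together: either both change (replace flag set,
/// no active blocking, or the new duration is longer) or neither does.
fn apply_block(active: Option<Blocking>, duration: Duration, bypass: bool, replace: bool) -> Blocking {
    match active {
        Some(current) if !replace && current.remaining >= duration => current,
        _ => Blocking { remaining: duration, bypassable: bypass },
    }
}
```

Note how a shorter, non-replacing action leaves both fields untouched, so a non-bypassable (fail-closed) block cannot be weakened by a later machine unless that machine explicitly sets the replace flag or extends the duration.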

Example Main Loop
Figure 6 shows an example main loop to illustrate the framework's intended usage. In short, select your machines and create an instance of the framework. Then, in a loop, periodically collect events and trigger them in the framework. The framework will return an iterator of actions to schedule. The integrator is responsible for performing the actions as specified. Note that trigger_events takes a slice of events as input, making it possible to trigger multiple events simultaneously.
Regardless of the number of events, the result is a maximum of one action per machine running in the framework. This skips unnecessary updates to actions in case the main loop is run infrequently or the number of events is high. Ideally, there would be a separate loop iteration for each event. If you run out of space for events to trigger (e.g. a full events queue), a rule of thumb is to drop/overwrite the oldest events first.
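The rule of thumb for a full events queue can be implemented with a small bounded buffer that evicts from the front. The sketch below is generic over the event type and is not part of the framework; the type and method names are made up for illustration.

```rust
use std::collections::VecDeque;

/// A bounded queue of pending events that drops the oldest event when full.
struct EventQueue<E> {
    buf: VecDeque<E>,
    capacity: usize,
}

impl<E> EventQueue<E> {
    fn new(capacity: usize) -> Self {
        EventQueue { buf: VecDeque::with_capacity(capacity), capacity }
    }

    /// Enqueue an event, returning the event that was dropped to make room,
    /// if any.
    fn push(&mut self, event: E) -> Option<E> {
        if self.capacity == 0 {
            // Nothing can be stored: the new event itself is dropped.
            return Some(event);
        }
        let dropped = if self.buf.len() == self.capacity {
            self.buf.pop_front()
        } else {
            None
        };
        self.buf.push_back(event);
        dropped
    }

    /// Take all pending events, oldest first, e.g. to pass them as a slice to
    /// the framework in one main loop iteration.
    fn drain(&mut self) -> Vec<E> {
        self.buf.drain(..).collect()
    }
}
```

In a main loop, the network-facing code would call push as events occur, while the loop body periodically calls drain and hands the resulting slice to the framework.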

Maybenot Machines
The Maybenot framework is a runtime for Maybenot machines. A machine has some runtime state in the framework, processes triggered events, and produces actions. Each machine has up to one scheduled action at any point in time.
Maybenot machines are probabilistic finite-state machines. They are based on the notion of "padding machines" (nondeterministic finite-state machines) in the Tor Circuit Padding Framework [51,52], further discussed in Section 7.
To explain Maybenot machines, we take a bottom-up approach, starting with distributions (Section 4.1), then states (Section 4.2), and finally machines (Section 4.3).

Distributions
Central to a machine is the notion of a distribution. Distributions are frequently sampled as machines transition between states and produce actions. Figure 7 shows the types of distributions supported by Maybenot machines, provided by the rand_distr⁷ crate. Likely, the supported distributions are both too many and wrong for many applications. Extending support is straightforward, though.
Figure 8 shows the core Dist struct and the sample method used to sample distributions. The dist field specifies the distribution type and parameters. The start field is the starting value added to the value sampled from the distribution, and max is the maximum value that can be sampled (0 for no limit). Note that, due to what the sampled values are used for (explained in Section 4.2), the resulting value is clamped to the range [0.0, max]. Values are floats, rounded to discrete values if needed based on what they will be used for. When sampling times, the framework operates in microseconds.
Sampling distributions is the most computationally expensive part of triggering events in the Maybenot framework. Efficient PRNGs may significantly improve performance, but, to err on the side of caution, we recommend using a cryptographically secure PRNG. Interesting future work is thoroughly evaluating if using a non-cryptographically secure PRNG would be safe for some use cases (famous last words).
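The post-processing of a sampled value (adding start, then clamping) can be sketched as below. The raw sample is passed in as an argument so that the example needs no PRNG; the function names are hypothetical, and rounding to the nearest microsecond is an assumption about how values are made discrete.

```rust
/// Turn a raw sample from some distribution into the value a machine would
/// use: add the starting value, never go below zero, and cap at `max` unless
/// `max` is 0 (meaning "no limit").
fn clamp_sample(raw: f64, start: f64, max: f64) -> f64 {
    let value = (raw + start).max(0.0);
    if max > 0.0 {
        value.min(max)
    } else {
        value
    }
}

/// Interpret a clamped sample as a discrete number of microseconds.
/// Rounding to nearest is an assumption made for this sketch.
fn as_microseconds(value: f64) -> u64 {
    value.round() as u64
}
```

For instance, a raw sample of 30.0 with start 2.0 and max 10.0 yields 10.0, while a raw sample of -5.0 with start 2.0 yields 0.0 regardless of max.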

States
Figure 9 shows the core State struct of machines. It consists of an action, a counter update, and state transitions.

Actions
A state can either be a no-op state by setting the action field to None, or it can optionally specify one of the four actions in Figure 10.
An action has associated distributions that are sampled. The duration distributions for blocking and timer actions are used to sample the duration that will be returned to the integrator. The bypass and replace flags are used to carry out the actions as described in Section 3.3. An action is scheduled every time a machine transitions. Note that, since only one action is allowed per machine per call to trigger_events, the action of the last state (with an action: note from Figure 9 that the action is optional) transitioned to is scheduled.

Counters
Each machine has two (discrete) counters, updated upon transition to a state if its counter field is set. The Counter struct, shown in Figure 11, specifies the operation to apply to a counter (increment, decrement, or set), an optional distribution to sample a value from (the value defaults to one if no distribution is set), and whether a copy should occur, in which case the previous value of the other counter is used and the distribution is ignored. When either counter is decremented to zero, the framework triggers a CounterZero event internally for the machine. However, to prevent infinite looping and restrict execution time, only one CounterZero event may be triggered per counter and call to trigger_events across all machines. The counters are part of the machine runtime kept by the framework.
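A counter update as described above can be sketched in two steps: choosing the value to apply, then applying the operation. The names are hypothetical. Saturating arithmetic, and signaling zero only when a non-zero counter is decremented to zero, are assumptions rather than documented framework behavior.

```rust
/// The operation applied to a counter (mirrors the prose description).
#[derive(Debug, Clone, Copy, PartialEq)]
enum Operation {
    Increment,
    Decrement,
    Set,
}

/// Choose the value for the update: the other counter's previous value when
/// copying, else the sampled value, else the default of one.
fn update_value(sampled: Option<u64>, copy: bool, other_counter: u64) -> u64 {
    if copy {
        other_counter
    } else {
        sampled.unwrap_or(1)
    }
}

/// Apply the operation, returning the new counter value and whether a
/// CounterZero event should be raised for the machine.
fn apply_update(counter: u64, op: Operation, value: u64) -> (u64, bool) {
    let new = match op {
        Operation::Increment => counter.saturating_add(value),
        Operation::Decrement => counter.saturating_sub(value),
        Operation::Set => value,
    };
    let zeroed = op == Operation::Decrement && counter > 0 && new == 0;
    (new, zeroed)
}
```

A machine could, for example, increment a counter for every normal packet sent and decrement it for every padding packet, transitioning on CounterZero once the padding has "caught up".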

Transitions
Each state has a vector of state transitions for all possible events (Figure 12). State transitions are specified with Trans, a tuple struct consisting of a state index to transition to and the probability. Probabilities must sum to at most one. Figure 13 shows an example with transitions on four events. Observe that the probabilities do not need to sum to one. This is to support machines that only transition with a small probability.
The state index may also be the pseudo-state STATE_END, which does not cancel any scheduled timer but permanently ends the machine, preventing future transitions. Another pseudo-state, STATE_SIGNAL, sends a signal to all other machines running in the framework, i.e., transitioning to it triggers a Signal event. If multiple machines signal during the same call to trigger_events (including when responding to signals), all machines receive exactly one signal.
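Selecting a transition when probabilities may sum to less than one amounts to walking the cumulative probabilities with a uniform random number. The sketch below uses plain (state index, probability) pairs instead of the framework's Trans type and takes the random number as an argument; both functions are illustrative, not the framework's code.

```rust
/// Check that every probability is in (0, 1] and that they sum to at most one.
fn valid_transitions(transitions: &[(usize, f64)]) -> bool {
    let mut sum = 0.0;
    for &(_, p) in transitions {
        if !(p > 0.0 && p <= 1.0) {
            return false;
        }
        sum += p;
    }
    sum <= 1.0
}

/// Pick the next state for a uniform random `r` in [0, 1). Returns None when
/// `r` falls into the leftover probability mass, i.e., no transition occurs.
fn pick_transition(transitions: &[(usize, f64)], r: f64) -> Option<usize> {
    let mut cumulative = 0.0;
    for &(state, p) in transitions {
        cumulative += p;
        if r < cumulative {
            return Some(state);
        }
    }
    None
}
```

With transitions to state 1 and state 2 at probability 0.25 each, half of all triggering events leave the machine in its current state without scheduling anything new.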

Limits
A state can transition to itself, and a (discrete) limit can be sampled on the number of actions allowed to be repeatedly scheduled due to such self-transitions. There is no limit if the limit dist of the state's action is None. Otherwise, the limit is decremented when padding is queued, blocking is started, or the internal timer is set. When the limit reaches zero, the framework triggers a LimitReached event internally for the corresponding machine. A new limit is sampled each time a state is transitioned into from another state.

Machines
Figure 14 shows the Machine struct. In short, a machine consists of a vector of states and fields that limit the machine's behavior. When a machine is added to the Maybenot framework, it is set to state 0 and no action is scheduled. Actions are scheduled when/if the machine transitions. Note that a limit is sampled for state 0 to allow immediate self-transitions.

Limits and Allowed Actions
Per-machine limits are set by the max_padding_frac and max_blocking_frac fields. This is in addition to state limits (Section 4.2.4) and global framework limits (Section 3.1). If any limit is hit, the machine will be prevented from scheduling an action on state transitions until below all the limits.
However, the two fields allowed_padding_packets and allowed_blocked_microsec are not limits but rather budgets of allowed actions. These budgets ignore all limits and should, therefore, be used carefully. This design might seem a bit reckless, but it is motivated by machines that act early on in the connection, when any reasonable fraction of limits would be exceeded. For example, a machine might want to obfuscate handshakes, as is done in Tor for the setup of onion circuits [52]. Machines are given their budgets on framework instance creation, and all actions (regardless of whether the limits would allow them) subtract from the budgets until they are depleted.

Serialization and Deserialization
Machines can be serialized to and deserialized from Base64-encoded strings. We use Serde⁸, a commonly used framework for serialization and deserialization in Rust, together with the bincode⁹ encoder/decoder. Encoded machines are then compressed using zlib¹⁰ before finally being Base64-encoded. By default, Maybenot enforces a maximum decompressed memory size of 1 MiB and performs extensive parameter validation to make it safer to parse machines from untrusted inputs. The use of Serde makes it trivial to create custom serialization and deserialization logic if needed.
Note that a machine is separate from its runtime (memory keeping track of its current state) kept in each instance of the framework using it. This means that many instances of the framework can share the same machine, since each instance will have its own runtime. No runtime information is ever serialized. The size of the runtime is constant, i.e., the number of states in a machine has no impact on runtime memory.

Maybenot Simulator
Developing effective and efficient machines is challenging. In an ideal world, machines would be evaluated by collecting real-world datasets of the machines running in Maybenot integrated with the intended protocol. Unfortunately, this is typically prohibitively expensive. For such cases, we provide a simple bare-bones simulator implemented in Rust as a crate, similar to the framework.
⁸ https://serde.rs/
⁹ https://github.com/bincode-org/bincode
¹⁰ https://www.zlib.net/
The goal of the simulator is not to be perfect (whatever that would entail, given that the framework is designed to be integrated into a wide range of protocols) but to be useful. Hopefully, most development can be done with the simulator and only fine-tuning is needed for later integration.
The core idea of the simulator is to simulate how a base network trace would have looked if one or more machines were running at the client and/or server. Therefore, there are two steps to using the simulator: parsing a base network trace and simulating machines on the trace.

Parsing a Base Network Trace
Figure 15 shows a network trace with ten packets when visiting a website and how to parse it. The trace is a string of lines, where each line is a packet with the format "timestamp,direction\n". The timestamp is the number of nanoseconds since the start of the trace, and the direction is either "s" for sent or "r" for received (from the client's perspective).
To parse the trace, the simulator also takes a network model of the latency between the client and server. The network model is used to simulate event queues for the client and server, so packets are sent and arrive at the client precisely as in the provided trace. This is a crude approximation of the network between the client and server and should be improved to make the simulator more useful in the long term. The resulting input trace is a queue of Maybenot events. Note that the simulator will change the queue ("mut"), so repeated simulations using the same queue must clone it.
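The line format described above is simple enough to parse by hand. The following standalone sketch only reads "timestamp,direction" lines into a vector; it is not the simulator's parser, which additionally applies the network model to build the client and server event queues. The type and function names are hypothetical.

```rust
/// Packet direction from the client's perspective.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Direction {
    Sent,
    Received,
}

/// Parse a base trace where each line is "timestamp,direction", with the
/// timestamp in nanoseconds since the start of the trace and the direction
/// being "s" (sent) or "r" (received). Empty lines are skipped.
fn parse_trace(trace: &str) -> Result<Vec<(u64, Direction)>, String> {
    let mut packets = Vec::new();
    for (i, line) in trace.lines().enumerate() {
        let line = line.trim();
        if line.is_empty() {
            continue;
        }
        let (ts, dir) = line
            .split_once(',')
            .ok_or_else(|| format!("line {}: expected \"timestamp,direction\"", i + 1))?;
        let ts: u64 = ts
            .trim()
            .parse()
            .map_err(|e| format!("line {}: bad timestamp: {}", i + 1, e))?;
        let dir = match dir.trim() {
            "s" => Direction::Sent,
            "r" => Direction::Received,
            other => return Err(format!("line {}: unknown direction {:?}", i + 1, other)),
        };
        packets.push((ts, dir));
    }
    Ok(packets)
}
```

Returning an error with the offending line number, rather than silently skipping malformed lines, makes it easier to spot datasets that use a different timestamp unit or direction encoding.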

Simulating Machines
Figure 16 shows an example of simulating a machine on the trace from Figure 15. The simulator supports zero or more machines running at the client and server. Because machines may run forever (e.g. sending more padding on padding being sent), it is possible to set the maximum number of events (client and server) to simulate and a flag to filter out events only related to network packets. The simulator's output is a vector of events describing the simulated trace.
Parsing the output is straightforward. The output is a vector of Maybenot events, including a simulated timestamp, a flag indicating whether the event is from the client or server, and a flag (for TunnelSent and TunnelRecv events) indicating whether the packet contains padding. Figure 17 shows an example of printing events related to network packets going through the tunnel at the client.
For more advanced use cases, the simulator exposes a sim_advanced() method with more comprehensive settings; see Figure 18. Notably, the simulator supports optional integration delays at the client and server. Integration delays, as the name suggests, can be present in integrations of Maybenot, for example, if Maybenot is running in user space and an integrated protocol runs in kernel space. The simulator supports delays surrounding performing actions and reporting events; see the Rust documentation for more details.

Simulator Internals
One detail of the internals of the simulator to highlight is its core logic in moving simulated time forward. The internal state of the simulator is driven by four possible sources for the next event to simulate:
event queue: the next event in the main event queue given the state of the client and server (active blocking, bypass and replace flags, etc.).
block expiry: active blocking expiring at the client or server.
internal timers: an internal timer for a machine expires.
scheduled actions: any action scheduled by a machine running at the client or server.
The simulator will always simulate the next event from one of these sources. When several sources have events due at the same time, the simulator prioritizes them in the order of event queue, block expiry, internal timers, and, finally, scheduled actions. The logic behind this is that events in the event queue typically happen outside of Maybenot (e.g. network packets), while blocking expiration, internal timers, and scheduled actions are tied to how Maybenot is integrated into a protocol. We assume that such logic will not be integrated inline in the protocol but instead run in a lightweight thread or similar. The effect of this prioritization may be noticeable when base network traces lack timestamp granularity and many resulting events end up at the same time in the parsed queue.
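The tie-breaking rule can be sketched as follows. This is a minimal illustration under stated assumptions, not the simulator's implementation: the Source enum and next_source function are hypothetical names, and each source is reduced to the time of its next pending event (or None if it has none).

```rust
use std::time::Duration;

/// The four sources of the next simulated event, declared in priority order.
/// Deriving `Ord` makes earlier variants compare as smaller.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Source {
    EventQueue,
    BlockExpiry,
    InternalTimer,
    ScheduledAction,
}

/// Pick the source whose next event is earliest; on equal times, the
/// source declared first in `Source` wins. A source with `None` has no
/// pending event and is ignored.
fn next_source(candidates: &[(Source, Option<Duration>)]) -> Option<(Source, Duration)> {
    candidates
        .iter()
        .filter_map(|&(source, at)| at.map(|t| (source, t)))
        // Compare by time first, then by source priority.
        .min_by_key(|&(source, t)| (t, source))
}
```

With an event queue entry and a scheduled action both due at 5 ms, next_source returns the event queue entry; a scheduled action due at 2 ms still wins over an event queue entry due at 9 ms, since time is compared before priority.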

Discussion
The ultimate problem at hand, improved traffic analysis driven by rapid developments in AI/ML, is shared by many protocols. So are the key building blocks of defenses: padding and blocking. Maybenot is a library that aims to capture the main common features of traffic analysis defenses and leave the protocol-specific integration to the integrator, e.g. negotiation and how padding and blocking are carried out. The hope is that this will allow for tailored use of the library in different protocols, with the added benefit of a common framework with many available defenses. What features should or should not be included in the framework is a balancing act.

Padding and Chaff
One important feature missing from Maybenot is inherited from the focus on Tor with constant-size packets (cells) in the defense community. Maybenot does not address the question of padding packets to fixed sizes but instead adopts the simple notion of padding packets (chaff). We believe it is an open question whether it is even possible to make an effective and efficient defense against more advanced traffic analysis, such as website fingerprinting, on normal packets of variable sizes. As part of integrating Maybenot into a new protocol, the integrator will likely need to carefully consider this issue in their analysis.

Sending and Queues
Version 2 of Maybenot provides events for queuing/processing normal or padding data and sending/receiving encrypted packets over the defended tunnel; however, what this means in practice will depend on the details of the protocol with which the framework is integrated. Consider the PaddingSent event (padding data queued to be sent). That the event is to be triggered as part of performing the scheduled padding action is natural. However, how this is implemented in the protocol is not. For example, it could be implemented as custom (distinguishable) data queued to be sent through the protocol, just like other application data. However, it could also be implemented as a particular packet type in the protocol sent immediately. The differences are significant because in the former case, the padding would be queued up with other application data and, in the latter, immediately turned into a packet and egress queued (delivered) in the network stack heading towards the NIC.
From the above, it should be clear that the exact semantics of the PaddingSent event are not defined by the framework; the same is the case for the NormalSent, TunnelSent, and receive events. This may or may not matter. Note that the distinction between queuing (PaddingSent and NormalSent) and delivery (TunnelSent) should, if possible, be made in such a way as to provide benefits during periods of congestion, allowing padding limits to be reached even if packets are dropped or delayed due to network conditions. As a rule of thumb, what "queueing" entails will depend on the specific protocol that the framework is integrated with, but it should capture any attempt to send data via the protocol, and we expect "tunnel" events to occur when data has been passed to some lower layer in the network stack. In general, machines that make sense with the selected semantics should be chosen.

Expressing Defenses
Another consideration is the focus on state machines, inherited from the Tor Circuit Padding Framework [51,52]. While state machines have been used in the community in the past [38,51,67] and Maybenot expands on this by supporting multiple probabilistic machines running in parallel, it is far from a given that this is the best approach. Smith et al. in QCSD [70] use a single regularization algorithm with target traces as input, and Gong et al. in WFDefProxy [21] simply use the programming language Go. Another example is FAN by Rochet and Elahi [64], which uses eBPF [19,29] to make anonymous networks more flexible by allowing dynamic updates to protocols, including the Tor Circuit Padding Framework. Using eBPF or similar technologies, such as WebAssembly [23], may allow for richer defenses than the state machines offered by Maybenot while retaining the flexibility gained from not using a hardcoded defense. This comes at the cost of much added complexity, though, compared to the simple state machines in Maybenot with a "runtime" of 1,555 loc in Rust (and 2,467 loc of tests) at the time of writing.
Consider Maybenot's API for integration, covered in Section 3. Ignoring the machines for a moment, the main loop is basically "report relevant events at a point in time and get actions to schedule in return". This interface fits the vast majority of defenses. Even the recent work of using GANs (e.g. Surakav [22]) and LSTMs (DeTorrent [30]) as defenses can be expressed in terms of such an interface. Surakav [22] samples burst sizes from a GAN (discrete steps mapped to bursts), while DeTorrent operates on log-scale timed bins. Future versions of Maybenot could replace state machines with, say, neural networks or more general-purpose programming languages with few API changes as a result. Alternatively, solutions such as Surakav or DeTorrent could be used to generate state machines for Maybenot. One reason to prefer state machines over, e.g., neural networks is performance. Inline inferences in neural networks are typically much slower than state machines, though the very rapid ongoing developments in AI/ML may change this in the long term.
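To make the "events in, actions out" contract concrete, here is a minimal sketch of such an interface with a toy defense written as plain code rather than as a state machine. All names (Event, Action, Defense, EchoPadding) are hypothetical simplifications chosen for illustration; they are not the Maybenot API, which is covered in Section 3.

```rust
use std::time::Duration;

/// Simplified events an integrator might report (hypothetical subset).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Event {
    NormalSent,
    PaddingSent,
    TunnelRecv,
}

/// Simplified actions a defense might return (hypothetical subset).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Action {
    /// Send one padding packet after the given delay.
    SendPadding { after: Duration },
    /// Block outgoing traffic for the given duration.
    BlockOutgoing { duration: Duration },
}

/// The "events in, actions out" contract, independent of how the defense
/// is expressed internally (state machine, model, or plain code).
trait Defense {
    fn trigger(&mut self, events: &[Event], now: Duration) -> Vec<Action>;
}

/// A toy defense written as plain code: answer every received packet
/// with one padding packet, up to a fixed budget.
struct EchoPadding {
    budget: u32,
    delay: Duration,
}

impl Defense for EchoPadding {
    fn trigger(&mut self, events: &[Event], _now: Duration) -> Vec<Action> {
        let mut actions = Vec::new();
        for event in events {
            if *event == Event::TunnelRecv && self.budget > 0 {
                self.budget -= 1;
                actions.push(Action::SendPadding { after: self.delay });
            }
        }
        actions
    }
}
```

An integrator holding a Box<dyn Defense> would not need to know whether the implementation behind it is a set of state machines or something else, which is the property the paragraph above relies on.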
Finally, we remark that version 2 of the framework includes two counters per machine, a per-machine timer, and a mechanism for explicit signaling between machines. We expect this to improve the framework's expressiveness substantially while maintaining its simplicity. However, Maybenot is the first state machine-based framework to introduce such notions; thus, their practical utility for implementing defenses, and whether they sufficiently support effective defenses, is an open question to be answered through experience. Such experience will aid future iterative development of the framework.

Why a Framework
Why bother with a framework for defenses instead of directly implementing defenses? For one, defenses are often moving targets. The last decade has seen large improvements in traffic analysis attacks, particularly around website fingerprinting, and defenses have been improving in response. A prime example here is from the Tor Project. They started out intending to implement the WTF-PAD defense by Juárez et al. [37], but the Deep Fingerprinting attack by Sirinam et al. [69] greatly reduced its effectiveness compared to earlier evaluations against (among others) the k-fingerprinting attack by Hayes and Danezis [24]. So, instead of implementing WTF-PAD, the Tor Circuit Padding Framework was born [51,52].
A framework also allows for the easy combination of multiple defenses. Combinations have been shown to be effective [22,25,72]. The selection of defenses to combine could also be dynamic and adaptive, e.g. based on current normal traffic to hide moments of inactivity or turned off for bulk downloads. A framework is part of orchestrating defenses.
Another consideration is moving from website fingerprinting to webpage fingerprinting defenses. While most web traffic is encrypted, it goes directly between the client and the server. Any network attacker can, therefore, in the vast majority of cases, simply perform website fingerprinting by observing all relevant IP addresses [68]. In a webpage fingerprinting setting, optimal defenses would be per website and optimized for the pages distributed by that website. It is known that application-layer knowledge can be used to create more effective and efficient defenses [11]. A framework for defenses integrated into, e.g., QUIC or HTTP/3 would allow for tailored per-site defenses. Websites could distribute serialized defenses to clients upon connection establishment, or the server could implement the client side of the defense partly in the application layer (the inverse of QCSD [70]).

Simulator Limitations
The simulator has several limitations. For one, it is a simulator! We are simulating the integration with the application/destination using the framework and the network between the client and server/relay. We have a sim2real problem.
In terms of networking, we use a fixed static delay. This should be improved and evaluated against real-world network experiments. The goal of the simulator is not necessarily to be a perfect simulator but to be a useful simulator for making different kinds of traffic analysis defenses. One possible source of inspiration for improving the networking is Shadow [34].
There are also fundamental issues with simulating the blocking actions of machines. Because the simulator takes as input a base network trace of encrypted network traffic, we do not know any semantics or inter-dependencies between the packets in the encrypted trace. As a result, we cannot correctly simulate blocking actions. For example, if a machine blocks a packet, we cannot know if the blocked packet contains a request for a resource that leads to a response in the following received packets. The simulator will still happily receive the resource in the encrypted network trace.

Lessons Learned from Version 1
Early work on implementing defenses in version 1 of Maybenot (e.g. [56]) yielded several insights into which capabilities are needed to support efficient and effective defenses. In particular, it became clear that it is possible to approximate several hand-crafted defenses using a simple probabilistic state machine model. Still, more expressiveness is required to fully capture their intended behavior and achieve the desired trade-offs between overhead and protection.
It is often necessary to count events (primarily related to packets sent/received), and this was previously accomplished via machines with many states and carefully chosen limits on self-transitions. However, this can complicate machine design and lead to significant runtime overheads, rendering defenses impractical. Thus, a significant addition to Maybenot v2 is counters: two are available to each machine, and they can optionally be updated upon transition to a state. A CounterZero event is triggered when either is decremented to zero. This raises the grammar of Maybenot in the Chomsky hierarchy (in fact, two counters can, in principle, simulate an arbitrary Turing machine [77]) and facilitates simple machine design, richer machines, and efficiency.
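The counter mechanism can be sketched as below. This is an illustration only: the type and method names are hypothetical rather than taken from the Maybenot API, and two behaviors are assumptions of the sketch, namely saturating arithmetic and signaling "counter zero" only when a decrement moves a counter from a non-zero value to zero.

```rust
/// Which of the two per-machine counters an update targets.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Counter {
    A,
    B,
}

/// How a counter is changed on transition to a state (hypothetical names).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Operation {
    Increment,
    Decrement,
    Set,
}

/// A counter update attached to a state.
#[derive(Debug, Clone, Copy)]
struct CounterUpdate {
    counter: Counter,
    operation: Operation,
    value: u64,
}

/// The two counters of one machine.
#[derive(Debug, Default)]
struct Counters {
    a: u64,
    b: u64,
}

impl Counters {
    /// Apply an update and return true if the machine should see a
    /// "counter zero" event, i.e. a decrement took the counter from a
    /// non-zero value to zero (assumption of this sketch).
    fn apply(&mut self, update: &CounterUpdate) -> bool {
        let slot = match update.counter {
            Counter::A => &mut self.a,
            Counter::B => &mut self.b,
        };
        let before = *slot;
        *slot = match update.operation {
            Operation::Increment => before.saturating_add(update.value),
            Operation::Decrement => before.saturating_sub(update.value),
            Operation::Set => update.value,
        };
        update.operation == Operation::Decrement && before > 0 && *slot == 0
    }
}
```

Counting three packets then takes one Set update with value 3 followed by a Decrement of 1 per packet, with the third decrement signaling zero, instead of a chain of states with limits on self-transitions.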
Time-based actions are also common in traffic analysis defenses, but it was not possible for machines to explicitly measure time in Maybenot v1. This has led to introducing a per-machine internal timer, updated based on a machine's internal logic but maintained by the integrator. The UpdateTimer action can be used to set the internal timer, and TimerBegin and TimerEnd events signal that it has started or expired, respectively. One advantage of having the integrator be responsible for the timer is that varying notions of time (e.g. discrete steps or wall time) can be selected depending on the use case. We expect this feature to be handy when implementing hand-crafted defenses.
In the same fashion, it is now possible for the integrator to specify a PRNG for all sampling done by the framework. Though we caution that using a non-cryptographically secure PRNG may have unintended and unexpected consequences, this allows for selecting the trade-off between efficiency and statistical properties. It also paves the way for randomized defenses that require synchronization: the same seed can be used for the PRNG on both ends of a defended tunnel. An example of a defense that may benefit from this capability is Surakav, which could not be implemented in v1 [78].
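The synchronization idea can be illustrated with a small sketch: two endpoints that seed the same deterministic PRNG draw identical sequences of padding delays. The sketch uses SplitMix64 only to stay free of external crates; it is not cryptographically secure, so the caution above applies, and neither SplitMix64 nor the sample_delays helper is part of Maybenot.

```rust
/// SplitMix64: a small, fast, NOT cryptographically secure PRNG, used
/// here only to keep the sketch free of external crates.
struct SplitMix64 {
    state: u64,
}

impl SplitMix64 {
    fn new(seed: u64) -> Self {
        Self { state: seed }
    }

    /// Advance the state and return the next 64-bit output.
    fn next_u64(&mut self) -> u64 {
        self.state = self.state.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.state;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }

    /// Uniform f64 in [0, 1), built from the top 53 bits.
    fn next_f64(&mut self) -> f64 {
        (self.next_u64() >> 11) as f64 / (1u64 << 53) as f64
    }
}

/// Sample `n` padding delays in microseconds, uniform in [min_us, max_us).
/// Hypothetical helper: both tunnel endpoints call it with a shared seed.
fn sample_delays(seed: u64, n: usize, min_us: f64, max_us: f64) -> Vec<f64> {
    let mut rng = SplitMix64::new(seed);
    (0..n)
        .map(|_| min_us + rng.next_f64() * (max_us - min_us))
        .collect()
}
```

If the client and server both call sample_delays with the same seed and parameters, they obtain the same schedule without exchanging it. How the seed is agreed upon is left to the integrator, like negotiation in general.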
Even at a more local scale, maintaining a shared state between or otherwise synchronizing multiple machines running in the same framework instance is often desirable. For this, we have added a STATE_SIGNAL pseudo-state, which, when transitioned to, sends a signal to all other running machines. This supports Maybenot's vision of providing a platform for orchestrating defenses and eliminates the need for, e.g., synchronizing by re-enabling blocking, which is cumbersome, affected by integration delays, and not semantically correct.
In the interest of simplicity, Maybenot v2 no longer supports features that were found to be of little use. As it is unclear whether it is even possible to create effective, efficient defenses in settings with variable packet sizes, support for packet sizes and knowledge of the MTU of the underlying tunnel have been removed in favor of packet counts. As a result, the include_small_packets flags for machines have been removed. Similarly, states no longer contain a limit_includes_nonpadding flag: limits never include normal packets, as this is a niche feature. Defenders interested in such limits may now opt to use counters instead.
Recognizing the diversity of protocols that may integrate the framework and the commonalities among their mechanics, we have reimagined how packet-related events are reported to the framework. We begin with a slight change in terminology: "non-padding" packets (carrying application data) are now referred to as "normal" throughout the framework, including in event names. Importantly, it may be useful for certain defenses to differentiate between sending and receiving normal/padding data. In contrast, others may need to know precisely when a packet has been delivered to or received from a lower layer in the network stack. Thus, the {Normal,Padding}+{Sent,Recv} events now refer to queueing data to be sent or processing it from a packet, while the new TunnelSent and TunnelRecv events will be triggered when data has actually been delivered or received.
We have also endeavored to improve Maybenot's interface. For instance, an explicit Cancel action has been added, which allows a machine to cancel a pending action timer, the internal timer, or both; in tandem, the redundant pseudo-state STATE_CANCEL has been removed, simplifying state transitions. Much of the codebase has also been refactored to improve the interface for creating and modifying machines. Events, actions, and counter updates are now represented by separate data structures linked to states, among other changes.
Finally, some miscellaneous updates include improvements to serialization (Section 4.3), the addition of the SkewNormal distribution, and an optional parsing feature to retain partial support for v1 machines. The described changes make Maybenot more expressive, efficient, and user-friendly. Yet, much remains to learn: we plan to inform future development based on experiences with v2 integrations.

Related Work
Maybenot is based on the Tor Circuit Padding Framework [51,52] by Perry and Kadianakis. The Tor Circuit Padding Framework is, in turn, a generalization of WTF-PAD [37] by Juárez et al., a website fingerprinting defense based on the concept of Adaptive Padding by Shmatikov and Wang [67].

Tor Circuit Padding Framework
As the name suggests, the Tor Circuit Padding Framework is a framework for implementing padding in Tor circuits. It is closely integrated into Tor's C codebase. Clients can negotiate fixed hardcoded "padding machines" with relays in the network. At the time of writing, while the framework is deployed as part of Tor, only two small padding machines are active in the network [52]. Their goal is to hide the setup of client-side onion service circuits, making them produce the same sequence of cells as non-onion circuits.
Comparing Maybenot to the Tor Circuit Padding Framework, the main differences are that Maybenot is designed to be integrated into a wide range of protocols, is written in Rust as a library, and includes a richer set of features. The Tor Circuit Padding Framework is closely tied to Tor and supports negotiation of padding machines, which Maybenot considers out of scope. Maybenot machines support probabilistic transitions, padding, blocking, and associated bypass/replace flags, counters, timers, and signaling. Maybenot also has more distributions but removes support for histograms. Finally, Maybenot does not support RTT-based estimates as offsets for timers. RTT-based estimates and histograms are excluded due to a lack of identified use cases, simplifying Maybenot.

QCSD
QCSD [70] by Smith et al. is a framework for traffic analysis defenses tailored to QUIC [33]. The main focus of QCSD is to be a client-side framework, generating padding and delaying traffic at the server by leveraging protocol features of QUIC and HTTP/3 [7]. This is significant for fostering adoption of defenses, as it removes the need to modify the server or any party other than the client. However, this strength is also QCSD's main weakness: because it depends on QUIC and HTTP/3 at the endpoints, it lacks support for defending only between the client and intermediate relays/proxies. Also, websites are typically made up of resources from multiple domains/endpoints, adding further complexities.
The analog to Maybenot machines in QCSD is a regularization algorithm running at the client, which shapes the connection according to a provided target trace (static or dynamically generated). In principle, it should be possible to implement the regularization algorithm as Maybenot machines generated based on the target trace. Shaping traffic from the server to the client involves control messages, which adds some overhead and makes it challenging to ensure exact shaping at the server. Smith et al. highlight that extensions to QUIC may assist clients in precisely shaping server traffic.

WFDefProxy
WFDefProxy [21] by Gong et al. is a platform for implementing website fingerprinting defenses and empirically evaluating them in real networks. WFDefProxy is based on obfs4 [2], a Pluggable Transport (PT) [53] for Tor. Being based on obfs4 as a PT, WFDefProxy is implemented as a bridge. Clients directly connect to a bridge before traffic is forwarded into the Tor network in the typical way. This limits defenses in WFDefProxy to protecting against network adversaries between client and bridge and not against adversaries in the Tor network or in control of the bridge (i.e., the bridge is trusted). Defenses are implemented at a high level in the Go programming language, providing richer features than Maybenot machines, padding machines in the Tor Circuit Padding Framework, and the regularization algorithm in QCSD.

Conclusion
We presented version 2 of Maybenot, a framework for traffic analysis defenses heavily inspired by the Tor Circuit Padding Framework [51,52]. Defenses are implemented as probabilistic finite-state machines, and the framework provides a common interface for integrating them into protocols such as Tor [16], WireGuard [17], and QUIC [33]. Maybenot is implemented as a Rust library (crate) with the goal of being easy to integrate into new and existing protocols. To assist in the development of defenses, we provide a simulator that can be used to simulate how provided network traces may change if given machines were running at the client and/or server.
Our goal with Maybenot is to contribute towards the widespread real-world use of traffic analysis defenses. We hope that Maybenot will be useful for researchers, protocol developers, and defenders alike. With the monumental progress being made in AI and machine learning, we believe that traffic analysis defenses will become increasingly important. Because we are in the middle of this AI revolution, a framework is likely worthwhile in the short to medium term until the dust settles. It took us decades to get to where we are today with making encrypted end-to-end communication the norm. We will probably need a similar amount of time to get to where we want to be with traffic analysis defenses.

Figure 1: Creating an instance of the Maybenot framework.

Figure 2: Events to trigger in the Maybenot framework.

Figure 4: Actions returned by the Maybenot framework.

Figure 5: Types of timers in the Maybenot framework.

Figure 9: States that make up a machine.

Figure 10: Actions associated with a state.

Figure 11: Counter updates associated with a state.

Figure 13: Example of creating a new State with transitions.

Figure 17: Printing packet-related events at the client.