Historia: Refuting Callback Reachability with Message-History Logics

This paper considers the callback reachability problem --- determining if a callback can be called by an event-driven framework in an unexpected state. Event-driven programming frameworks are pervasive for creating user-interactive applications (apps) on just about every modern platform. Control flow between callbacks is determined by the framework and largely opaque to the programmer. This opacity of the callback control flow not only causes difficulty for the programmer but is also difficult for those developing static analysis. Previous static analysis techniques address this opacity either by assuming an arbitrary framework implementation or attempting to eagerly specify all possible callback control flow, but this is either too coarse to prove properties requiring callback-ordering constraints or too burdensome and tricky to get right. Instead, we present a middle way where the callback control flow can be gradually refined in a targeted manner to prove assertions of interest. The key insight to get this middle way is by reasoning about the history of method invocations at the boundary between app and framework code --- enabling a decoupling of the specification of callback control flow from the analysis of app code. We call the sequence of such boundary-method invocations message histories and develop message-history logics to do this reasoning. In particular, we define the notion of an application-only transition system with boundary transitions, a message-history program logic for programs with such transitions, and a temporal specification logic for capturing callback control flow in a targeted and compositional manner. Then to utilize the logics in a goal-directed verifier, we define a way to combine after-the-fact an assertion about message histories with a specification of callback control flow. 
We implemented a prototype message history-based verifier called Historia and provide evidence that our approach is uniquely capable of distinguishing between buggy and fixed versions on challenging examples drawn from real-world issues and that our targeted specification approach enables proving the absence of multi-callback bug patterns in real-world open-source Android apps.


INTRODUCTION
The standard approach for creating user-interactive applications (apps) is with event-driven frameworks. In this programming model, a developer defines callback methods that the framework invokes at run time in response to asynchronous events (e.g., starting the application, clicking a button, or a background task finishing). Since the callbacks modify the state of the application, an unexpected order of callback invocations may lead to a bad application state and a subsequent crash. To fix such a crash, it is common for the developer to update the application to change or accommodate complex callback interactions. To help the developer verify fixes to the crashing app, we present in this paper a technique to reason about callback order and develop a tool that can automatically prove such fixes correct.
As a specific example of an event-driven framework, we consider Android, a widely-used and complex mobile operating system. Figure 1 shows a stack trace for a reported crash in AntennaPod, a popular open-source podcast player. However, this stack trace is not helpful to the developer because the cause and effect span multiple callback invocations. The stack trace shows that there was a null dereference in the call callback, but not how the reference became null. In particular, the reference must have been set to null in some previous callback invocation that is not visible in this stack trace. To make such reasoning even more difficult, the app itself may affect the order of callbacks through the invocation of methods defined by the framework API; we refer to calls from the app to framework API methods as callins by analogy to callbacks. In summary, the developer of the app needs to reason about the order in which the framework could invoke callbacks and how the app invokes callins to understand and fix this crash.
Understanding the order of callbacks in this event-driven programming model is not only challenging for the developer but is also a central challenge for a program verifier attempting to prove an app safe from crashes. The program verifier has access to the app code, but the framework code is unavailable for all intents and purposes. While this is true for most event-driven frameworks, it is particularly difficult in Android, which consists of thousands of API classes [Android Developers 2022b], evolves quickly, includes lots of native code, and varies by device with manufacturer customizations. One approach to the unavailable-framework problem is to analyze the app code assuming an arbitrary framework implementation. This design corresponds to analyzing each top-level callback (i.e., entry point into the app code) as a separate program with an application-only call graph [Ali and Lhoták 2012]. The advantage of this approach is that it is simple and general (i.e., it can over-approximate any framework implementation by assuming all callbacks can be invoked at any time in the event loop) and thus is the approach generally taken by industrial-scale analyzers for Android apps (e.g., [Distefano et al. 2019; Fuchs et al. 2009; Liang et al. 2013; Mariana Trench 2022]).
However, this most-over-approximate framework model is also wildly unrealistic. Without callback-ordering constraints, a verifier cannot possibly prove correct the accepted fix for Figure 1. Thus, many static analyzers for Android attempt to eagerly encode the callback control flow of core classes of the framework (e.g., the Activity Lifecycle [Android Developers 2022a] modeled by [Arzt et al. 2014; Blackshear et al. 2015b; Yang et al. 2015]). This approach also has some significant limitations. For one, it is not feasible to eagerly specify the callback control flow of thousands of framework classes individually, let alone callback control flow involving relationships between multiple classes. On top of that, even specifying the behavior for a few framework components results in both soundness and precision issues [Cao et al. 2015; Meier et al. 2019; Wang et al. 2016].
Fig. 1. A reported stack trace from a confirmed bug [Fietz 2018b] that crashed the AntennaPod Android app. We have elided multiple lines identifying Android framework methods (using . . . for 14 elided lines in total).
The main observation of this paper is that although the most-over-approximate framework model is unrealistic, the application-only approach for analyzing event-driven apps is not completely hopeless either. In particular, if we decouple the framework modeling from the analysis of the app code, then the approach offers the appealing capability to gradually refine the possible callback control flow as needed and in a targeted manner. To get a sense of this, imagine a call graph with a "framework" node representing the event loop and outgoing edges to each callback entry (as well as edges from callback nodes to the callin nodes they invoke). We can now consider traces through this graph instantiating callback or callin nodes with object instances; we refer to such a callback or a callin instantiation generically as a message. Thus, we are interested in reasoning about message histories, that is, sequences of messages obtained by call-return traces through this graph.
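To make the call-graph picture concrete, the following minimal Python sketch enumerates the message histories that the most-over-approximate framework model admits. The toy graph `CALL_GRAPH` and the helper names are assumptions made for illustration only; object instances are omitted, and each callback is assumed to invoke a fixed list of callins.

```python
from itertools import product

# Hypothetical toy call graph: each callback maps to the callins it invokes.
CALL_GRAPH = {
    "onCreate": ["Single.create", "subscribe"],
    "onDestroy": [],
    "call": [],
}


def history_for(callbacks):
    """Flatten a sequence of callback names into one message history."""
    history = []
    for cb in callbacks:
        history.append(f"cb {cb}")                              # callback entry
        history.extend(f"ci {ci}" for ci in CALL_GRAPH[cb])     # callins it makes
        history.append(f"cbret {cb}")                           # callback return
    return tuple(history)


def all_histories(n):
    """All histories with exactly n callbacks when any callback may run at any time."""
    return [history_for(seq) for seq in product(CALL_GRAPH, repeat=n)]


print(len(all_histories(2)))  # 9
```

Among the nine histories is one that starts with `cb call`, which no real framework would produce; this is the kind of unrealizable history that a targeted specification is meant to rule out.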
For any given framework implementation, not all message histories are realizable at run time. Thus, our key insight is to encode possible framework implementations by abstracting possible message histories. Crucially, this encoding and reasoning about message histories enables decoupling the specification of callback control flow from the abstract interpretation to compute an inductive program invariant. Specifically, in this paper, we make the following contributions:
• We define the notion of an application-only transition system that records messages and a message-history program logic (MHPL) to describe and reason about boundary transitions between the app and the framework independently of any specification of the framework (Section 3). In MHPL, we consider a backwards-from-error formulation that enables goal-directed reasoning from a state assertion in the app, and we observe that deriving infeasible initial message histories refutes callback reachability. To capture consumption and ordering in MHPL, message-history assertions are derived from a fragment of ordered linear logic [Polakow and Pfenning 1999a,b], making use of an ordered linear implication.
• We formalize a callback control-flow temporal logic (CBCFTL) to specify realizable message histories that is fully decoupled from any particular program logic (Section 4). This specification logic enables us to restrict possible callback control flow in a manner that is targeted and compositional. To capture possible traces, CBCFTL is a specialization of past-time linear temporal logic [Lichtenstein et al. 1985].
• We design an automated reasoning approach for the combination of MHPL assertions and CBCFTL specifications (Section 5). To utilize MHPL and CBCFTL together in a static verifier, we define an algorithm instantiating CBCFTL specifications with MHPL assertions into a single formula describing realizable message histories. We then use this encoding to answer queries about message-history entailment or whether a message history excludes the initial state with an off-the-shelf SMT solver.
• We empirically evaluate Historia, a prototype goal-directed verifier with MHPL, and show that it can refute callback reachability assertions with succinct specifications of callback control flow in CBCFTL (Section 6). In particular, we applied it to distinguish between the buggy and fixed versions from 5 real-world multi-callback issues from in-the-wild crashes of Android apps. Furthermore, we codified these 5 issues into bug patterns and evaluated the ability to use Historia to prove the absence of these bug patterns on 47 open-source apps containing over 2 million lines of code: 43% of the potentially buggy locations could be proven safe using Historia with no additional modeling of callback control flow, and then on a sample of the remaining locations, half of them could be proven safe (or witnessed as buggy) with a small amount of additional modeling.

OVERVIEW
In this section, we illustrate how a developer could go deeper to diagnose and fix the bug causing the crash shown in Figure 1 (Section 2.1) and then demonstrate how our approach is able to prove the fixed version correct (Section 2.2).

Using Message Histories to Distinguish Bugs from Fixes
We show a simplified version of an actual pull request submitted to fix the issue in Figure 2a. What distinguishes the buggy and fixed versions (without and with line 8, respectively) are the possible message histories (i.e., the possible sequences of callbacks and callins). In Figure 2b, we show a message history witnessing a crash in the buggy version, while in Figure 2c, we show the analogous message history in the fixed version. The key to distinguishing the buggy and fixed versions is determining if such message histories are realizable at run time.
Figure 2a shows part of the PlayerFragment class of the AntennaPod app that displays a user interface (UI) and loads some media for playing podcasts. Importantly, the app loads the media in a background task, a thread running asynchronously to the UI thread, to not block the user interface. The framework invokes the onCreate callback of a PlayerFragment object when initializing the user interface. At location 2, this callback invokes the Single.create callin (from the RxJava library) to start the background task that loads the media. The subsequent call to task.subscribe(this) at location 3 registers the call callback. At a later time, the framework will invoke the call callback. The call callback uses the Glide class to display the media on the this.act object. At any time, the framework can destroy the PlayerFragment (e.g., when the user navigates to another part of the app) and invoke the onDestroy callback, which at location 5, sets the field this.act to null to prevent memory leaks.
As noted above, Figure 2b shows a crashing message history, that is, a sequence of callback entries (cb), callin calls (ci), and callback returns (cbret). The arguments and return values of each message (e.g., f, t, s, and a) represent run-time addresses of objects. The app crashes if the call callback is invoked after the PlayerFragment is paused. After the f.onCreate() callback invocation (i.e., messages 1-4), the framework disposes the user interface and invokes the f.onDestroy() callback, setting the this.act field to null (i.e., app transitions between messages 5-6). Then, the background process completes, triggering the call callback (i.e., message 7). However, the this.act field is now null, causing the call to Glide.with(this.act) to crash (represented by the assert on line 5 of Figure 2a).
The fixed version adds a call to this.sub.unsubscribe() at line 8, which unsubscribes the call callback, preventing its invocation after onDestroy. In Figure 2c, we show a message history that, while similar looking to the crashing message history of Figure 2b, is not realizable with respect to the framework implementation. There is no execution that can generate the message history of Figure 2c because the callback at message 8 is removed by the added message 6 in the fixed version. Such a minimal difference between the "realizable crashing" message history and the "unrealizable safe" message history highlights the automated reasoning challenge in distinguishing between the buggy and fixed versions of the app.
(a) A patch adding line 8 fixing the crash [Fietz 2018a]. On the left side, we illustrate an application-only transition system with a location fwk representing the framework location inside the event loop with call and return edges ⇀ for callbacks. Callins are captured by self-loops as they affect framework state but are not on the event loop. We further indicate that the framework location fwk is annotated with an inductive invariant to prove the assertion safe (expanded later in Figure 3).
(b) A realizable message history witnessing an exception (exn) in the buggy version:
1 cb f.onCreate();
2 ci t = Single.create(...);
3 ci s = t.subscribe(f);
4 cbret f.onCreate();
5 cb f.onDestroy();
6 cbret f.onDestroy();
7 cb f.call(m); exn a == null;
Fig. 2. The crash from Figure 1 arises in the buggy version of the code (without line 8) when a null value is passed to Glide.with from the this.act field at line 6 in the call callback. To prevent the crash, the developer creates the fixed version by adding unsubscribe at line 8. Message histories shown by sub-figures (b) and (c) show how a crash might be reached in both the buggy and fixed apps (without any reasoning about their realizability). What we prove is that the message history for the fixed version shown in (c) is unrealizable. In the message histories, single-letter identifiers represent run-time instances (e.g., f is an instance of PlayerFragment) and boxes are drawn around callback invocations.

Historia: Refuting Callback Reachability with Message-History Logics
Here, we provide an overview of our static analysis abstraction and our framework specification logic that enables refining paths through the application-only transition system by reasoning about the realizability of message histories. In Figure 2a, we illustrate an application-only transition system, which consists of app transitions from the app code augmented with boundary transitions to a single, distinguished framework location fwk. Boundary transitions represent places where the framework makes non-deterministic choices of callback invocations as well as return values of the callin invocations (formalized in Section 3.1). The application-only transition system has the benefit of being a sound framework model by default (i.e., without further specification, is the most-over-approximate framework model) but clearly admits many unrealizable paths. Our key insight is to internalize the concept of realizable message histories into the static analysis abstraction. This internalization of realizable message histories into the abstract domain enables decoupling the specification of possible callback control flow from the abstract interpretation to compute an inductive program invariant. Such an approach is in contrast with the ones that eagerly augment the interprocedural control flow graph with framework-specific control flow. At a high level, our approach shares some conceptual similarity with context-free language (CFL) reachability-based analysis [Reps 1998] in reasoning about realizability in the analysis but for imposing callback control flow instead of call-return semantics.
Fig. 3. An inductive invariant for the fixed app consisting of abstract message histories and abstract app states at each location in the application-only transition system (Figure 2a) that may reach the assertion failure. For brevity, this figure excludes less interesting transitions such as the case where the failing call invocation is preceded by another invocation of call. However, all transitions are considered by the verifier. The abstract message histories only show messages from the user provided specification for clarity. Note that there is only one framework location and that we use the subscripts (e.g., fwk 1 , fwk 2 ) to indicate disjunctive elements of the abstract state at the framework location. With the specification of realizable message histories (Section 2.2.2) combined with the abstract message histories (Section 2.2.3), we can show that this invariant has reached a fixed point and that it excludes the initial state, proving the assertion safe.
To describe our static analysis, we define Message-History Program Logic (MHPL), a program logic with an ordered linear implication for capturing assumptions about future messages. Then, to enable specifying callback control flow, we introduce Callback Control-Flow Temporal Logic (CBCFTL), a past-time temporal logic for specifying constraints on the past message history given the present message. Finally, to automate reasoning about the realizability of message histories, we define an algorithm for instantiating CBCFTL specifications with MHPL assertions, combining the inferred assumptions from a backwards-from-error static analysis and specified callback-control-flow constraints. In the rest of this section, we demonstrate our technique by walking through the verification of the bug-fix from Figure 2a.

Message-History Program Logic (MHPL).
Program analyses often compute invariants for each location in the program. Our approach relies on computing such an invariant that abstracts the message histories and application states that may reach the assertion failure using a novel message-history program logic (MHPL). If the invariant excludes the initial state of the program (e.g., the empty message history), then there is no way the assertion failure can be reached from the initial state, letting Historia prove that the assertion failure is impossible. Here, we demonstrate such a proof on our running example.
While analyzing the application, Historia maintains an invariant map, mapping each program location to its current invariant (represented as Σ in Section 3.2). Individual locations in the application are labeled with line numbers, and the framework location representing the event loop is labeled with fwk. We show a representation of this invariant map for our example in Figure 3. In a goal-directed, backwards-from-error formulation, the invariant map is initialized with the state { 5 : okhist • f.act ↦ null * this ↦ f } positioned right before the assertion failure. There are two parts of the abstract state (which we separate with a centered dot •). On the right, there is an abstraction of the heap or store of the app for the assertion failure; in particular, it says an object f has a field act that points to null and this points to f. We use intuitionistic separation logic [Ishtiaq and O'Hearn 2001; Reynolds 2002] to describe the relevant heap or store, but our approach is parameterized by essentially whatever logic one wishes to use to reason about the app state. The left component is more interesting: it is our abstraction of the possible message histories to reach this location (defined in Section 3.2). In particular, any execution that witnesses this assertion failure must have done so with a realizable message history, written okhist. Intuitively, okhist denotes the set of all realizable message histories as dictated by the framework. Note that okhist is not "top" or "true", which would concretize to all message histories, realizable or unrealizable.
Next, we compute the abstract message history at the location just before the call entry. This location is the fwk location (i.e., the framework event loop). We write the message from this transition as cb f.call(m) where f and m are symbolic variables corresponding to the values bound to the formal parameters this and media of call, respectively. To capture the effect of invoking call, the abstract message history is updated to cb f.call(m) ։ okhist at the abstract message history labeled by fwk 1 in Figure 3 (this abstract state also removes the this variable due to popping the stack). Intuitively, cb f.call(m) ։ okhist denotes any message history to which f.call(m) can be appended to obtain a realizable message history. Operationally, cb f.call(m) ։ okhist can be thought of as the set obtained by starting with the set of realizable message histories okhist, removing those that do not end with f.call(m), and truncating the remaining ones to remove f.call(m) from the end.
Having computed an abstract state at a framework location (fwk 1 in Fig. 3), we can now check if the abstract state excludes the initial state; if it does not, then we have not yet found a proof that the assertion failure is unreachable. The message history abstraction of the initial state is simply a singleton set containing an empty message history. If the framework could invoke f.call(m) as the first callback, then f.call(m) alone is a realizable message history (i.e., f.call(m) ∈ okhist). Consequently, cb f.call(m) ։ okhist is a set that contains an empty message history; hence, it includes the initial state, prompting an alarm.
However, in reality the framework cannot invoke f.call(m) as the first callback (i.e., f.call(m) is not a realizable message history). Crucially, since the set of realizable message histories okhist of the framework is not available (i.e., it is defined in the framework implementation), Historia can use separately-provided specifications of realizable message histories. The following informal specification requires f.call(m) to be preceded by the invocation of ci s=t.subscribe(f) for the message history to be realizable:
Informal Spec 1: "The framework may only invoke call if subscribe has been invoked in the past, and unsubscribe has not been invoked since subscribe."
With this targeted specification, cb f.call(m) ։ okhist no longer contains the empty message history and excludes the initial state. Historia therefore (correctly) does not raise an alarm at fwk 1 .
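The operational reading of the abstract message history and the excludes-initial check can be tried on small finite stand-ins for okhist. The sketch below is only an illustration under strong assumptions: the real okhist is defined by the framework and is not a finite set, messages are plain strings, and the names `strip_last`, `permissive`, and `restricted` are hypothetical.

```python
def strip_last(message, histories):
    """Keep the histories that end in `message` and drop that final message."""
    return {h[:-1] for h in histories if h and h[-1] == message}


def excludes_initial(histories):
    """The initial state is the empty message history, written () here."""
    return () not in histories


CALL = "cb f.call(m)"
SUBSCRIBE = "ci s=t.subscribe(f)"

# Hypothetical okhist in which the framework may invoke call first.
permissive = {(), (CALL,), (SUBSCRIBE,), (SUBSCRIBE, CALL)}
# Hypothetical okhist in which call must follow a subscribe.
restricted = {(), (SUBSCRIBE,), (SUBSCRIBE, CALL)}

print(excludes_initial(strip_last(CALL, permissive)))  # False: alarm
print(excludes_initial(strip_last(CALL, restricted)))  # True: no alarm
```

With the permissive set, truncating the one-message history leaves the empty history behind, so the initial state is included. With the restricted set, the only surviving history still contains the subscribe message.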
This transformation of the abstract state from location 5 to fwk 1 is done by an abstract pre-transformer, which is applied repeatedly to all predecessor transitions of an updated state until reaching a fixed point. Applying the pre-transformer on the onDestroy callback results in the abstract state {fwk 2 : ci s.unsubscribe() ։ cb f.call(m) ։ okhist • ...} denoting the message histories that become realizable when extended by ci s.unsubscribe() followed by cb f.call(m). Each new abstract state is added to the invariant at a location as a disjunctive clause (i.e., the fwk location has an invariant of the form {fwk 1 ∨ fwk 2 ∨ ...}). Continuing in this way on the buggy version from Figure 2a (and using the necessary specifications) would yield a state whose abstract message history is cb f.onCreate() ։ ci s=t.subscribe(l) ։ cb l.call(m) ։ okhist. Note that this state includes the initial state, raising an alarm, as it satisfies the constraint imposed by Informal Spec 1.
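The backward construction of abstract message histories can be mimicked with a small sketch. It assumes, for illustration only, that an abstract message history is a tuple of pending messages, that Informal Spec 1 is the only constraint on realizability, and that object identity can be ignored; the function names are hypothetical and this is not the pre-transformer of the actual tool.

```python
def satisfies_spec1(history):
    """Check Informal Spec 1 on a concrete history, ignoring object identity."""
    subscribed = False
    for message in history:
        if message.startswith("ci ") and ".subscribe(" in message:
            subscribed = True
        elif message.startswith("ci ") and ".unsubscribe(" in message:
            subscribed = False
        elif message.startswith("cb ") and ".call(" in message and not subscribed:
            return False  # call invoked without an active subscription
    return True


def pre(pending, message):
    """Backward step: `message` must occur before the already pending messages."""
    return (message,) + pending


def includes_initial(pending):
    """The empty history is included when the pending messages alone are realizable."""
    return satisfies_spec1(pending)


fwk1 = pre((), "cb f.call(m)")
fwk2 = pre(fwk1, "ci s.unsubscribe()")
buggy = pre(pre(pre((), "cb l.call(m)"), "ci s=t.subscribe(l)"), "cb f.onCreate()")

print(includes_initial(fwk1), includes_initial(fwk2), includes_initial(buggy))
# False False True
```

Only the state derived for the buggy version includes the initial state, which matches the alarm described above.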
As messages are parametrized by symbolic variables, we consider an unbounded number of possible message instances at each predecessor step. Even if one restricts to one possible message instance for each callback method, considering all possible predecessor callbacks makes the proof search exponential. Fortunately, we are able to join abstract states by merging disjunctions, and merging is required to find a fixed point in most cases. Given a disjunction of abstract states, one disjunct may be merged with another if the first disjunct implies, or entails, the one it is being merged with. By merging new abstract states from backward transitions when possible, we can reach a fixed point on some paths being explored. In Figure 3, for example, fwk 4 is merged with fwk 3 . No further backward transitions need to be explored from fwk 4 because they have been explored from fwk 3 .
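Merging by entailment can be pictured with finite sets standing in for the concretization of each disjunct. This is a deliberately simplified sketch: the subset test replaces the solver-backed entailment check, and the histories assigned to `fwk3` and `fwk4` are invented for illustration rather than taken from Figure 3.

```python
def add_disjunct(invariant, candidate):
    """Add `candidate` unless an existing disjunct already covers it.

    Returns True when the candidate is new and its predecessors still
    need to be explored, False when it was merged into an existing disjunct.
    """
    if any(candidate <= existing for existing in invariant):
        return False
    invariant.append(candidate)
    return True


# Hypothetical concretizations of two disjuncts at the framework location.
fwk3 = frozenset({("a",), ("a", "b")})
fwk4 = frozenset({("a", "b")})

invariant = []
print(add_disjunct(invariant, fwk3))  # True: a new disjunct
print(add_disjunct(invariant, fwk4))  # False: entailed by fwk3, so merged
```

Because every history represented by the second disjunct is already represented by the first, exploring backward from it would add nothing new.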
For presentation, Fig. 3 shows only a few of the transitions and abstract states computed for the running example. In practice, Historia computes the fixed point at the framework location after considering all backward transitions and shows that the resultant inductive invariant excludes the initial state (for the running example with the fix). In other words, it proves that no realizable message history reaches the assertion violation in the fixed version of AntennaPod.

Callback Control-Flow Temporal Logic (CBCFTL).
CBCFTL is a novel language for formally writing specifications of realizable message histories. A specification written in CBCFTL consists of a conjunction of history implications. With no history implications, the CBCFTL specification places no restrictions on the realizable message history. Each additional history implication targets one message that the framework can control, such as the invocation of the callback call or onCreate. A history implication says that whenever a message satisfying the target abstract message occurs, then the preceding message history must satisfy the temporal formula.¹ Temporal formulas are drawn from a syntactically restricted fragment of past-time linear temporal logic over finite traces. Such a structure for CBCFTL is natural for three reasons: (1) it allows the developer of the framework model to target callbacks or callins with history implications as needed, (2) history implications are compositional, and (3) specifications may be cleanly combined with abstract message histories to automatically check excludes-initial and entailment (as described in Section 2.2.3).
For Informal Spec 1, the history implication captures what must be true of the message history when cb l.call(m) is next. First, there must exist a subscription object s that was returned from invoking subscribe. This object s must be returned from the same invocation of subscribe that l was passed to, registering the call callback. The subscription object s is the only parameter to the method unsubscribe, which must not have been invoked since the invocation of subscribe. Invoking unsubscribe on other objects will have no effect on the target call. All of this is captured by History Implication 1.
History Implication 1. For all objects l and m, if the framework invokes l.call(m), then for some (subscription) object s, the message ci s.unsubscribe() has Not happened Since ci s = _.subscribe(l).
cb l.call(m)    ∃s. ci s.unsubscribe() NS ci s = _.subscribe(l)
¹ The history implication can be seen as a first-order, past-time linear temporal logic formula of the form always (target message → Y temporal formula), which combines always (or globally), implication →, and yesterday (or previous) Y from past-time linear temporal logic (ptLTL). Note that since (S) is the past-time dual of until (U) in LTL. The NS operator has a built-in "not" and restricts nesting to maintain decidability (discussed in Section 4). Underscore _ in the above specification is simply a shorthand for a locally existentially-quantified variable (i.e., "don't care").
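To illustrate how History Implication 1 separates the buggy and fixed versions, the sketch below evaluates it on concrete message histories. The `Msg` record, the argument layout (receiver first), and the function names are assumptions made for this example; the fixed history is reconstructed from the prose description of Figure 2c, and the check is a direct trace evaluation rather than the symbolic reasoning of the verifier.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Msg:
    kind: str                      # "cb", "cbret", or "ci"
    method: str
    args: Tuple[str, ...] = ()     # receiver first, then parameters
    ret: Optional[str] = None


def call_allowed(prefix, listener):
    """Some s with ci s = _.subscribe(listener) and no later ci s.unsubscribe()."""
    for j, msg in enumerate(prefix):
        if msg.kind == "ci" and msg.method == "subscribe" and msg.args[1:] == (listener,):
            s = msg.ret
            if not any(m.kind == "ci" and m.method == "unsubscribe" and m.args == (s,)
                       for m in prefix[j + 1:]):
                return True
    return False


def satisfies_implication(history):
    """Every cb l.call(m) must be preceded by a history that allows it."""
    return all(call_allowed(history[:i], msg.args[0])
               for i, msg in enumerate(history)
               if msg.kind == "cb" and msg.method == "call")


buggy = [
    Msg("cb", "onCreate", ("f",)), Msg("ci", "create", (), "t"),
    Msg("ci", "subscribe", ("t", "f"), "s"), Msg("cbret", "onCreate", ("f",)),
    Msg("cb", "onDestroy", ("f",)), Msg("cbret", "onDestroy", ("f",)),
    Msg("cb", "call", ("f", "m")),
]
# Fixed version: unsubscribe is invoked inside onDestroy (reconstructed from the prose).
fixed = buggy[:5] + [Msg("ci", "unsubscribe", ("s",))] + buggy[5:]

print(satisfies_implication(buggy))  # True: realizable, so the crash is reachable
print(satisfies_implication(fixed))  # False: the crashing history is unrealizable
```

Unsubscribing a different subscription object leaves the result unchanged, mirroring the remark that invoking unsubscribe on other objects has no effect on the target call.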
A key feature of CBCFTL is that it handles quantified values such as the listener object l and the subscription object s. Since the history implication applies any time a call is invoked, the listener object l is universally quantified. Reasoning about such quantifier alternation is often undecidable. However, the restrictions we have chosen for CBCFTL allow them to be combined with abstract message histories such that automated reasoning is feasible.

Combining Abstract Message Histories with Callback Control-Flow.
Next, we consider how to interpret and automatically reason about the meaning of abstract message histories. Consider the abstract state just before the call callback with the abstract message history transition, cb f.call(m) ։ okhist, which says that the next message must be cb f.call(m). First, excludes-initial needs to be proven (i.e., this abstract state does not contain the initial state), and then, entailment checks if it should be merged with any equivalent or weaker abstract states. Both of these steps rely on a first-order logic encoding of abstract message histories that we explain here. We prove the resulting first-order logic encoding to be decidable for the excludes-initial judgment and computable in practice for the entailment judgment.
This encoding starts by combining the abstract state cb f.call(m) ։ okhist with History Implication 1. The first step of combining abstract message histories with temporal formulas is instantiation (i.e., the judgment in Section 5). Intuitively, instantiation turns the "next" message from the abstract state into requirements on the message history so far. For convenience, the output of instantiation is represented by the same language of temporal formulas as is used in the history implications. Temporal formula (1) shown below results from instantiating History Implication 1 on the abstract message history cb f.call(m) ։ okhist. Note how fresh variables are introduced for the existentially quantified values, but the values from the call are retained.
∃s′, t′. ci s′.unsubscribe() NS ci s′ = t′.subscribe(f)    (1)
Such a temporal formula may be converted to first-order logic and checked for excludes-initial and entailment via SMT solver as explained in Section 2.2.2. This temporal formula excludes the initial state because there must exist a subscribe call in the message history (i.e., the judgment ⊢ excludesinit from Section 5). Here, we also see why targeted specification is desirable for performance reasons. Each history implication can add constraints that prevent states from being merged via entailment. The result of combining two sound history implications is always sound. However, such combinations may impact performance by increasing the abstract state disjunctions at a location.
The next abstract message history shown by the invariant map at fwk 2 adds the unsubscribe call, ci s.unsubscribe() ։ cb f.call(m) ։ okhist. The previously instantiated formula needs to be updated for the ci s.unsubscribe() message. That is, we must consider two cases: (1) this unsubscribe matches the unsubscribe from the previous step (deriving a contradiction), and (2) this unsubscribe is irrelevant to the previous step. Combining these cases into the temporal formula is referred to as quotienting (i.e., the judgment from Section 5). Additionally, if there was a history implication targeting unsubscribe, it would also need to be instantiated (since there is not, the rule applies instead). Combining these steps results in temporal formula (2).
We attempt to eagerly merge abstract states using entailment. While, as noted above, fwk 4 merges with fwk 3 , the abstract states at fwk 1 and fwk 2 cannot be merged since they restrict the app heap differently. However, for presentation, we illustrate entailment on the abstract message histories from fwk 1 and fwk 2 , ignoring the app heap. To determine entailment, we algorithmically search for a message history represented by temporal formula (2) and not by temporal formula (1). If no such message history exists, then this disjunct has not progressed toward the initial state and can be dropped. This entailment holds: the added constraint (ci s′.unsubscribe() ≠ ci s.unsubscribe()) simplifies to s′ ≠ s and does not add any message histories to the abstraction over those represented by temporal formula (1). Note that with the abstract app heap, a concrete app heap where f.act points to a non-null value is represented by fwk 2 but not fwk 1 , so these states may not be merged overall.

MESSAGE-HISTORY PROGRAM LOGIC (MHPL)
In this section, we explain the process of proving an application safe by showing that no realizable message history can reach the assertion failure. First, we define the notion of an application-only transition system that records a message history during execution (Section 3.1). Executions in this transition system, such as reaching the assertion failure, may be restricted based on whether they are realizable (e.g., with a user-provided specification). The application-only transition system provides a concrete semantics for a message-history program logic (MHPL) to reason about realizable message histories by adding the message history to the concrete state as ghost state. Using MHPL, we can abstract message histories backwards from an assertion, proving that no failing message history is realizable (Section 3.2).

An Application-Only Transition System with Message Histories
Figure 4 defines the syntax and semantics of a program that uses boundary transitions to provide semantics to an app in the absence of the hidden framework implementation. Conceptually, all framework code is merged within a single framework location fwk (as illustrated in Figure 2a from Section 2). Our semantics are non-deterministic when the framework chooses the arguments for a callback invocation or the return value for a callin. Execution simply "gets stuck" on an unrealizable boundary transition from the framework.
Boundary transitions b append a message m to the message history, capturing all interaction between the app and framework. A boundary transition is a crossing of the app-framework boundary via a callback invocation fwk −[cb md(x̄)]→ ℓ from the framework back to the app, a callback return ℓ −[cbret x′ md(x̄)]→ fwk from the app into the framework, or a callin invocation ℓ −[ci x′ md(x̄)]→ ℓ′ from the app into the framework and back. App transitions t represent app code, for example, standard operations like reading and writing to the application heap. A message history ω ::= ε | ω; m is a sequence of messages m, with ε being the empty sequence. The application-only transition system is parametrized by a set of realizable message histories Ω representing actions possible under the real framework.
Method names md are fully qualified and disambiguated names for method procedures. We assume we can identify a method as being an app (i.e., a callback) method or a framework (i.e., a callin) method based on the method identifier md (e.g., app methods that override a framework type are callbacks in the case of Android). The key part of the program state is recording a message history where messages are instances of boundary transitions; that is, a callback invocation with bound values cb md(v̄), a callback return cbret v′ md(v̄), or a callin invocation ci v′ md(v̄). Values v may be compared with equality, created by the app, or created by the framework.
Callbacks and callins use a sequence of parameters as program variables and return a value; we write a sequence with an overline (e.g., x̄ for a sequence of variables). For simplicity, we assume that variable scoping and shadowing is handled by translation to this language (e.g., via alpha-renaming). A callback return cbret x′ md(x̄) says that it returns the value in variable x′ for the corresponding callback cb md(x̄); for simplicity, we assume an A-normal form where program expressions are evaluated in internal app transitions and bound to variable x′ here. For a callin invocation ci x′ md(x̄), variable x′ is the variable to bind the return value of the invocation.
We see transitions as control-flow edges between two program locations loc. A program location can be the framework location fwk or an app location ℓ. The framework location fwk represents all control locations inside the framework. A program p is then a set of boundary transitions b or app transitions t, for some unspecified syntax of app transitions. Conceptually, a program p is the control-flow graph for each app callback augmented with boundary control-flow edges into and back from the framework location fwk.
A program state σ is a memory at a program location loc, where a memory consists of a message history ω with a boundary stack and an app store. A boundary stack is a stack ensuring that the message history consists of matching calls and returns. If we assume that callbacks may not be nested inside of callbacks, this stack may have at most one activation k. Like app transitions, the specific form of app stores is unspecified, except we assume it supports looking up the value of an app variable and initializing variables to values. An app state is then a pair of an app location ℓ and an app store.
Boundary transitions b are particularly interesting as they capture the non-deterministic or unobserved behavior of the framework and record the action in the "ghost state" for the message history. The boundary transition judgment form σ, b ⇓Ω σ′ says, "In program state σ, executing the boundary transition b results in an updated program state σ′ and is realizable under realizable message histories Ω." This judgment form captures the realizable executions of boundary transitions. To execute a callback invocation transition fwk −[cb md(x̄)]→ ℓ via the callback-invocation rule, the program state is at the framework location fwk, values v̄ are chosen for the callback parameters non-deterministically (conceptually by the framework) and initialized in the app store, and the callback activation cb md(v̄) is pushed on the boundary stack. Then to record this boundary transition execution, the callback message cb md(v̄) is appended onto the current message history ω. We want to capture that, depending on the current message history ω, this callback invocation transition may not be realizable. This realizability of message histories is captured by checking that the new message history is realizable, with ω; cb md(v̄) ∈ Ω.
Executing a callback return transition ℓ −[cbret x′ md(x̄)]→ fwk via the callback-return rule is then the expected symmetric operation. The return value is read out of the app store at x′, the callback activation cb md(v̄) is popped off the boundary stack, and control goes into the framework location fwk. A premise on the app store enforces that argument variables are not modified by the callback, which simplifies the formalism. The callback return message cbret v′ md(v̄) is similarly appended onto the current message history to record the execution of the callback return transition, and checked for realizability. Note that to connect the callback invocation with its return, the callback return message includes the method name md and actual arguments from the callback activation.
The callin invocation transition ℓ −[ci x′ md(x̄)]→ ℓ′ via the callin-invocation rule is symmetric to callback invocation and return together; that is, the arguments for the callin are read from the app store, and the callin return value v′ from the framework is chosen non-deterministically (conceptually by the framework) and then bound to variable x′. Then, the callin message ci v′ md(v̄) is appended onto the current message history and checked for realizability. Note that the three boundary-transition rules together capture that the framework cannot modify the app store except through invoking callback methods. This formalizes one aspect of the so-called separate compilation assumption of Ali and Lhoták [2012], which considers the consequences of the assumption that the framework is developed separately and compiled in the absence of the app. As a consequence of these semantics checking realizability at each boundary transition, every prefix of a realizable message history must also be realizable.
The application-only transition system is then given by a single-step transition relation judgment form on program states, indexed by the realizable message histories Ω and the program p. Finally, concrete executions are given by the reflexive-transitive closure of this single-step transition relation from an initial program state σ_init. We write σ →*Ω σ′ for the reflexive-transitive closure of the single-step relation.
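The way boundary transitions record messages and get stuck can be illustrated with a minimal Python sketch. It is not the paper's formalization: the `State` record, the `step_boundary` function, and the toy realizability predicate `toy_realizable` are hypothetical names introduced here, program locations and argument bookkeeping are omitted, and the toy Ω only loosely mimics History Implication 1 from the running example.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional, Tuple

# A message is (kind, method, values); kind is "cb", "cbret", or "ci".
Message = Tuple[str, str, tuple]
History = Tuple[Message, ...]


@dataclass
class State:
    history: History = ()                      # ghost state: messages so far
    stack: tuple = ()                          # boundary stack of callback activations
    store: dict = field(default_factory=dict)  # app store (left abstract here)


def step_boundary(state: State, msg: Message,
                  realizable: Callable[[History], bool]) -> Optional[State]:
    """Execute one boundary transition; None means execution gets stuck."""
    new_history = state.history + (msg,)
    if not realizable(new_history):
        return None                            # unrealizable boundary transition
    kind, method, values = msg
    stack = state.stack
    if kind == "cb":
        stack = stack + ((method, values),)    # push the callback activation
    elif kind == "cbret":
        if not stack or stack[-1][0] != method:
            return None                        # a return must match the open callback
        stack = stack[:-1]                     # pop the callback activation
    return State(new_history, stack, dict(state.store))


def toy_realizable(history: History) -> bool:
    """Hypothetical Omega: 'cb call' needs a 'ci subscribe' with no 'ci unsubscribe' since."""
    subscribed = False
    for kind, method, _ in history:
        if (kind, method) == ("ci", "subscribe"):
            subscribed = True
        elif (kind, method) == ("ci", "unsubscribe"):
            subscribed = False
        elif (kind, method) == ("cb", "call") and not subscribed:
            return False
    return True
```

Because `toy_realizable` scans the history from the start, any prefix of an accepted history is also accepted, which matches the prefix-closure property noted above.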

Refuting Callback Reachability with Message-History Program Logic
The ultimate aim of MHPL is to prove statically that a program assertion cannot fail. We start with the error condition, an abstract state just before the assertion such that the assertion may fail (e.g., f.act ↦ null from the running example in Section 2, representing app memories where there exists a framework object with a null act field). We refute the reachability of the error condition with the judgment form ⊢ p unreach. This judgment form is read as, "No concrete program state satisfying the abstract state is reachable in program p with realizable message history specification S." We use CBCFTL, defined in Section 4, to abstract the set of reachable message histories Ω in the concrete semantics. We use the satisfaction judgment ω ⊨ S to say that a message history ω is captured by a specification S, and write Ω_S for the set of message histories in a specification (i.e., the concretization of a specification S is the set of message histories Ω_S). Since the specification is an input to our algorithm, we assume that S is a sound abstraction of realizable message histories (i.e., Ω ⊆ Ω_S for the set of realizable message histories Ω in the concrete semantics).
Our proof technique works in a goal-directed manner: we over-approximate the set of states that may reach the given error condition with a program-state invariant, Σ. If the initial program state σ_init, which sits at the framework location fwk, is excluded from the program-state invariant Σ, then the location of the error condition cannot be reached with any concrete state satisfying the error condition. For an abstract state, the excludes-initial judgment, ⊢ excludesinit, holds only if the concretization of that abstract state does not contain the initial state σ_init. While an abstract state may be excludes-initial in any of its components (e.g., the abstract app store), the particularly interesting component here is its abstract message history. Thus, in subsequent sections, we focus on the excludes-initial judgment on abstract message histories. Excludes-initial for message histories holds if the empty message history ε is not in the concretization of the abstract message history.

An Abstract Semantics with Message Histories. In Figure 5, we define MHPL, which abstracts the application-only transition system from Section 3.1, to derive refutations with respect to message histories. To abstract messages, we replace concrete values with symbolic variables. Symbolic variables are existentially quantified across each part of the abstract program state with an assignment. That is, a concrete state is in the concretization of an abstract state if an assignment exists such that each part of the concrete state satisfies the concretization relation with the corresponding part of the abstract state.
An abstract message history captures the set of message histories reaching a given program location loc under the realizable message history specification S. An abstract message history can be okhist, which corresponds to all realizable message histories under S. Note that since we only care about realizable message histories, we do not include a ⊤ abstract message history corresponding to all message histories. Since our logic explores backwards, it adds constraints on future boundary transitions to the abstract message history as they are encountered. The key is to see this constraint as an ordered linear implication on the right, written with ։, which informally says, "For all messages satisfying the abstract message on the left, appending that message to the current message history implies that the new message history satisfies the abstract message history on the right." In the middle part of Figure 5, we give a precise concretization relation between a message history with an assignment and an abstract message history.
The rest of an abstract program state is straightforward. We do not care specifically about the form of abstract app stores, except that like concrete app stores, we need a way to look up a (symbolic) value for a variable and to initialize variables. To do that, we use intuitionistic separation logic [Ishtiaq and O'Hearn 2001; Reynolds 2002] to indicate an arbitrary store ⊤, a separating conjunction of two stores (written with *), a singleton points-to or cell for a program variable (written with ↦), an infeasible store ⊥, or a disjunction of stores (written with ∨), which we usually consider in disjunctive normal form. An abstract app state is then an abstract app store and location ℓ.

Fig. 5. Refuting callback reachability with MHPL. We abstract the application-only transition system with Hoare triples over app and boundary transitions and an abstract program-state invariant Σ. The location of an abstract state is written loc(σ̂), and looking up the state at a location in the invariant is written Σ(loc). Executing backwards, an abstract message history becomes conditional on messages observed in the future execution. The specification S of realizable message histories is a parameter and is the abstract analogue of the concrete set of realizable message histories Ω.
An abstract memory is then a product of an abstract message history, an abstract boundary stack, and an abstract store, or a disjunction of such products. An abstract program state σ̂ is simply an abstract memory at a program location loc. For abstract boundary stacks, we consider arbitrary boundary stacks ⊤, or appending a callback activation with symbolic arguments. Finally, we consider abstract program-state invariants Σ to be a set of abstract program states σ̂, which we also treat as a map from locations loc to the abstract state at that location (i.e., Σ(loc) is the abstract memory that Σ pairs with loc).
We describe the abstract semantics of boundary transitions b as Hoare triples ⊢ {σ̂′} b {σ̂}, except that we are interested in backwards over-approximating triples instead of forwards. That is, we read the judgment as, "If there is an execution of the boundary transition b to a post-state satisfying σ̂, the pre-state of that execution satisfies σ̂′" (Lemma 3.1).
The abstract semantics rules for boundary transitions b closely follow their concrete counterparts (assuming a structural rule for disjunction of memories). The callback-invocation rule captures computing the pre-condition of the callback invocation transition fwk −[cb md(x̄)]→ ℓ and shows moving from an assertion on the abstract boundary stack to a hypothetical next message in the abstract message history. In detail, it first asserts that the post-app memory has bindings for the callback parameters and has the corresponding callback activation on top of the boundary stack. Then, we drop the parameter bindings and pop the callback activation. Finally, we update the abstract message history with the abstract message corresponding to the callback invocation. As an example from Section 2, the abstract state { 5 : okhist • f.act ↦ null * this ↦ f } just after the entry of call produces the pre-state { fwk₁ : cb f.call(m) ։ okhist • f.act ↦ null } at the framework location which precedes call.
Continuing to mirror the concrete semantics, the callback-return rule pushes a hypothetical callback message on the boundary stack corresponding to the callback that would have just returned in the concrete execution. Similar to abstract callback invoke, abstract callback return and abstract callin invoke add hypotheticals to the abstract message history. The main difference is how each updates the abstract app store. For callback return, the return value and its relationship to other symbolic variables is unknown; therefore, we ensure that the separation logic domain has materialized points-to cells for the return value and the arguments of the callback. For example, the post state from the running example is transferred over the return of the onDestroy callback to produce the pre-state { 9 : cbret f2.onDestroy() ։ cb f.call(m) ։ okhist • ... }. Note that we elide the return value here, as onDestroy is void. The value f2 may or may not alias f, as there is a case split from the separation logic materialization (Figure 3 from Section 2 shows just the aliased case).
Finally, the callin-invocation rule removes a program variable from the post app store corresponding to the return value and introduces fresh symbolic variables to the pre-store bound to the arguments of the callin invoke. For example, the post state { 3 : ... ։ okhist • task ↦ t * this ↦ f } transferred over the create call creates the pre-state { 2 : ci t = create(...) ։ ... ։ okhist • this ↦ f }.
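As a rough illustration of the backwards callback-invocation step described above, the following Python sketch computes a pre-state from a post-state. All names (`AbsState`, `pre_callback_invoke`) and the string-based encoding of messages are assumptions made for this example; a real implementation would use a separation-logic store and handle symbolic-variable freshness and aliasing properly. Here an unconstrained parameter simply gets a symbolic value named after itself.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class AbsState:
    loc: str
    history: Tuple[str, ...]  # hypothesized future messages, followed by an implicit okhist
    stack: Tuple[str, ...]    # known activations (method names); () stands for an arbitrary stack
    store: frozenset          # points-to cells as (variable_or_field, symbolic_value) pairs


def pre_callback_invoke(post: AbsState, method: str,
                        params: Tuple[str, ...]) -> Optional[AbsState]:
    """Backwards step over a callback invocation from fwk into the app."""
    # The post-state must be compatible with this callback being the open activation.
    if post.stack and post.stack[-1] != method:
        return None  # infeasible: a different callback is on top of the stack
    store = dict(post.store)
    # Read off (or materialize) a symbolic value per parameter and drop its binding.
    args = tuple(store.pop(p, p) for p in params)
    msg = f"cb {method}({', '.join(args)})"
    # Pop the activation and record the invocation as the hypothetical next message.
    return AbsState("fwk", (msg,) + post.history, post.stack[:-1],
                    frozenset(store.items()))
```

Applied to a post-state resembling the running example (okhist with f.act ↦ null and this ↦ f), the sketch drops the `this` binding and yields a history that starts with the hypothesized `cb call(f, m)` message at the framework location.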
We write app(σ̂) for the projection of an abstract program state σ̂ to an abstract app state, which drops the abstract message history and abstract boundary stack components. The app transitions are checked for being inductive in a way analogous to boundary transitions, with a rule defining the judgment form Σ ⊢ t, which similarly depends on an abstract semantics for app transitions (as Hoare triples over abstract app states) and an entailment judgment for abstract app states.
We then check that the abstract program-state invariants Σ are inductive for executing backwards a given boundary transition b with the judgment form Σ ⊢ b. This judgment says, "In abstract program-state invariants Σ, executing boundary transition b backwards is inductive when constrained by realizable message histories defined by specification S." A single rule defines this judgment and captures the backwards over-approximation. Specifically, it depends on an entailment judgment σ̂ ⊢ σ̂′ that is parametrized by the realizable message histories specification S (i.e., it should satisfy the following soundness condition: if σ̂ ⊢ σ̂′ and σ ⊨ σ̂, then σ ⊨ σ̂′). To get the post- and pre-locations of a boundary transition b, we write post(b) and pre(b), respectively. Then, the rule chooses some σ̂ that over-approximates Σ(post(b)) (i.e., Σ(post(b)) ⊢ σ̂), applies the abstract semantics for b (i.e., ⊢ {σ̂′} b {σ̂}), and checks that Σ(pre(b)) over-approximates σ̂′ (i.e., σ̂′ ⊢ Σ(pre(b))). This rule is the backwards over-approximating version of the usual Hoare rule of consequence. A similar judgment may be written for an app transition, Σ ⊢ t (e.g., as in Blackshear et al. [2013]). For clarity, the semantics is written with explicitly materialized points-to cells for message arguments and return values. If such values do not otherwise constrain the abstract state, they may be summarized into the top store ⊤ without precision loss.
To describe an inductive program invariant, we define the may-witness judgment form Σ ⊢ p that says, "Abstract program-state invariants Σ are inductive executing backwards from abstract program state σ̂ (at location loc(σ̂)) in program p." The rule that defines this judgment form simply checks that each boundary transition b and each app transition t in program p is inductive.
Finally, to derive a refutation of reachability ⊢ p unreach, the rule says that we derive an inductive program invariant Σ from the error condition (i.e., Σ ⊢ p), and we derive that the program invariant at the entry location fwk excludes the initial (concrete) program state (i.e., ⊢ Σ(fwk) excludesinit).

Abstract Interpretation with Message Histories. While we have described a checking system with the may-witness judgment form Σ ⊢ p, we can consider a direct approach to computing an inductive program invariant Σ from an error condition via a backwards abstract interpretation. The invariant map Σ is initialized with the error condition just before the assertion (okhist, an arbitrary boundary stack ⊤, and an abstract app store that negates the assertion condition) and ⊥ at other locations. We then proceed with a standard worklist algorithm. When the invariant map is updated at a location, all transitions to that location are added to the worklist. Each transition in the worklist and abstract state at the post-location are processed by a transfer function (based on the Hoare triples defined above), producing a pre-condition that is joined into the invariant map. Pre-conditions at a location are eagerly merged with existing disjuncts, both to avoid registering an update at that location and for efficiency. Merging is done automatically via the entailment check on abstract memories. If a new pre-condition cannot be merged with an existing disjunct, it is added to the existing disjunction. At the framework location fwk, all callback return boundary transitions, ℓ −[cbret x′ md(x̄)]→ fwk, are added to the worklist. We alarm if we cannot prove that the invariant at fwk excludes the initial state. If a fixed point is reached that excludes the initial state, then we have refuted the reachability of the assertion failure. Intuitively, we have now captured the abstract state at all locations that may step to the assertion failure, and excludes-initial proves that no message history can go from the initial state of the program to the assertion and fail.
Excludes-initial (⊢ excludesinit) and entailment of abstract message histories are automated via SMT (and described in Section 5). Existing techniques can combine the message history SMT encoding with other parts of the abstract state (e.g., using Piskac et al. [2013] for separation logic).

CALLBACK CONTROL-FLOW TEMPORAL LOGIC (CBCFTL)
In this section, we describe the Callback Control-Flow Temporal Logic (CBCFTL) that we use to express the specification of the realizable message histories. We design CBCFTL as a compromise between the expressiveness required to specify callback control flow and the need of the abstract interpretation to automate judging excludes-initial (⊢ excludesinit) and entailment, both of which are parametric in the specification language used to express the specification S.
As we observe in Section 2, a specification of realizable message histories must be able to express: (1) quantification over message values (e.g., the subscription object s from History Implication 1); and (2) constraints on what messages must have or have not happened in the past (e.g., subscribe or unsubscribe). These requirements suggest CBCFTL should be a linear temporal logic (LTL) [Manna and Pnueli 1992] interpreted over finite sequences (i.e., message histories) [Giacomo and Vardi 2013], with first-order quantification of message arguments, and past-time temporal operators [Lichtenstein et al. 1985]. In principle, the excludes-initial and entailment judgments could be reduced to checking the satisfiability of first-order LTL (FO-LTL) formulas, but FO-LTL satisfiability is undecidable [Song and Wu 2016] and has few ready-to-use implementations.
Instead, we restrict the CBCFTL syntax such that reasoning about message histories leading to a target message is decidable (i.e., with history implications consisting of a target message and temporal formula). In Section 5, we show that such a problem can in turn be reduced to the satisfiability of the fragment of temporal formulas of CBCFTL. We show how to use such a subproblem to decide the excludes-initial (⊢ excludesinit) judgment and to obtain a semi-algorithm for judging entailment. The syntactic restriction of CBCFTL carefully controls the use of features of the logic, such as negation and quantifier alternation, that complicate automated reasoning. In particular, the restrictions are such that we can encode a temporal formula of CBCFTL as an equisatisfiable formula in the Extended Effectively Propositional (Extended EPR) logic [Korovin 2013; Padon et al. 2017]. This section first gives the syntax and semantics of CBCFTL and then explains the encoding of the temporal formula fragment in Extended EPR.

A Temporal Logic for Expressing Realizable Message Histories
Figure 6 describes the CBCFTL syntax and semantics. A CBCFTL specification S is a conjunction of history implications. Each history implication consists of a target abstract message controlled by the framework (e.g., invocation of the call callback) and a temporal formula that must hold before the framework outputs that message. While not captured in the syntax, history implications are closed formulas where the variables of the abstract message in the antecedent are implicitly universally quantified, and we assume that the temporal formula in the consequent is quantified so that its free variables are a subset of the variables of the abstract message (where fv(•) yields the set of free variables of a formula). Section 5 will explain how abstract message histories combine with history implications, leaving only temporal formulas, which motivates this design.
Temporal formulas include restricted versions of standard past-time temporal operators (O m̃ for Once, HN m̃ for Historically Not, and m̃₂ NS m̃₁ for Not Since), equality and disequality between variables, and positive Boolean combinations. In particular, the temporal operators apply only to individual symbolic messages m̃ and do not allow for explicit negations (although some negations are implicit in the history implication and in the temporal operators HN and NS). Symbolic messages m̃ are essentially abstract messages, except we allow for a local existential quantification of variables, which is convenient for "don't care" arguments (e.g., the _ in History Implication 1). The structure of CBCFTL specifications also limits the nesting of the temporal operators: past temporal operators are always nested in a future temporal operator, and the only future temporal operator is history implication. While not explicitly shown in the syntax, we also restrict quantifiers such that a universally quantified subformula may only contain conjunctions and disjunctions of HN and equality/disequality of symbolic variables.
The most interesting part of temporal formulas are the temporal operators: HN m̃ states that m̃ has historically not (i.e., has never) occurred in the past, O m̃ that m̃ occurred at least once in the past, and m̃₂ NS m̃₁ that m̃₂ has not occurred since m̃₁ occurred. The operators restrict standard past-time temporal operators so that for a given message m̃, we either positively look back for the time m̃ occurs or negatively rule out m̃ at each time in the past. Thus, the O operator is directly the Once operator from past-time LTL, while HN m̃ and m̃₂ NS m̃₁ are syntactic restrictions for the appropriate negations within Historically and Since (i.e., HN m̃ ≝ H ¬m̃ and m̃₂ NS m̃₁ ≝ ¬m̃₂ S m̃₁ in past-time LTL). Additionally, we allow standard Boolean combinations of operators as well as equality of variables.
Fig. 6. Syntax and semantics of callback control-flow temporal logic (CBCFTL). CBCFTL is a subset of first-order linear temporal logic (FO-LTL) that describes a set of realizable message histories. A CBCFTL specification is a conjunction of history implications. Temporal formulas include restricted versions of standard past-time temporal operators that apply only to individual symbolic messages with limited negation: O (Once), HN (Historically Not), and NS (Not Since). While not shown in the syntax here, the subformula of a universal quantification is further restricted to HN or the propositional forms to limit quantifier alternation.
A model of a specification is a (concrete) message history ω, which is a finite sequence of messages m. Message histories are zero-indexed by positions i ∈ [0, len(ω)), and we write ω[i] for the message at position i in ω and len(ω) for the length of ω. A message history ω satisfies a history implication iff for all positions i in the message history (i.e., i ∈ [0, len(ω))), if a concrete message with an assignment for its variables models the target abstract message and is ω[i], then the prefix of ω up to i − 1 must satisfy the temporal formula (under the assignment). A model for a temporal formula is a tuple of a message history, an assignment, and a position i in the message history. The past-time temporal operators apply to the prefix of the message history up to (and including) position i, and the relation is undefined for any position outside the valid range of indices of the message history (e.g., −1).
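To make the finite-trace semantics of the three operators concrete, here is a small Python sketch over ground messages. It is a simplification made for illustration: symbolic messages are modeled as plain predicates, variable assignments and quantifiers are ignored, and all function names are hypothetical. The text leaves the relation undefined at position −1; the sketch picks the vacuous reading there (O and NS fail, HN holds), which is an assumption of this example and not something the definition above states.

```python
def once(history, i, p):
    """O p: some message at a position j <= i satisfies p."""
    return any(p(history[j]) for j in range(i + 1))


def historically_not(history, i, p):
    """HN p: no message at any position j <= i satisfies p."""
    return not any(p(history[j]) for j in range(i + 1))


def not_since(history, i, p2, p1):
    """p2 NS p1: p1 occurred at some j <= i and p2 does not occur in (j, i]."""
    return any(
        p1(history[j]) and not any(p2(history[k]) for k in range(j + 1, i + 1))
        for j in range(i + 1)
    )


def satisfies_implication(history, target, formula):
    """History implication: wherever target occurs, formula holds on the prefix before it."""
    return all(
        formula(history, i - 1)
        for i in range(len(history))
        if target(history[i])
    )
```

For instance, with a target matching the call callback and the formula "unsubscribe NS subscribe", the history subscribe; call is accepted, whereas subscribe; unsubscribe; call is rejected.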

Encoding Temporal Formula Into Extended EPR
Here, we describe how we encode temporal formulas into Extended EPR [Korovin 2013; Padon et al. 2017], a decidable fragment of first-order logic. In brief, effectively propositional logic (EPR) is a first-order logic fragment where closed formulas converted into prenex normal form have the quantifier prefix ∃*∀* without any function symbols. Extended EPR adds function symbols as long as the quantifier alternation graph does not contain cycles. The quantifier alternation graph is a directed graph where the nodes are sorts and the edges are defined by functions (or ∀x.∃y. ... quantifier alternations) from the sort of the argument to the sort of the value.
To encode a temporal formula, we model message histories with uninterpreted functions over uninterpreted sorts. We use an uninterpreted function hist : HistIdx → Msg from history indices HistIdx to message instances Msg. To capture message instances, we use a function msgname : Msg → MsgName from message instances to message names MsgName (i.e., representing the message kind, like cb, and the method name) and a function msgargs : Msg → ArgIdx → Val from message instances to argument indices ArgIdx to values Val (i.e., representing the arguments of the message instance).
Then to describe ordering constraints on messages in a message history, we use a set of ordering axioms (referred to as ax). We use an uninterpreted function ≤ : HistIdx → HistIdx → Bool and axiomatize a total ordering on HistIdx (like Padon et al. [2017]), as well as an axiom for zero (i.e., ∀idx ∈ HistIdx. 0 ≤ idx, where 0 is a variable). Argument indices ArgIdx are finite and bounded by the largest arity found in the framework methods. Message names MsgName are also finite and bounded by the framework interface definition. As such, we precisely represent the needed ordering constraints on messages in message histories.
Given the above, the encoding of temporal formulas is now direct. We can encode an abstract message (i.e., an unquantified symbolic message) at an index idx ∈ HistIdx using the hist, msgname, and msgargs functions. To be able to encode the length of a message history, we introduce a distinguished variable len. Then, we can encode the past-time temporal operators (O, HN, and NS) using the encoding of an abstract message at an index, 0, ≤, and len. Here, we leverage the restriction that the temporal operators apply only to individual symbolic messages. With respect to Extended EPR, it is clear that the quantifier alternation graph from the function symbols is acyclic. The encoding of temporal formulas described above then stays in Extended EPR because of the careful control of negation, which prevents introducing any ∀∃ edges.
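One plausible shape for this encoding is written out below as a sketch. It is a reconstruction from the prose and not taken from the paper's figures: the helper predicate at, the function name(·), the argument symbols, and the treatment of len as an exclusive upper bound are all assumptions of this example.

```latex
% Hypothetical notation: at(i, m~) says the message at history index i
% is an instance of the symbolic message m~ with arguments x_k.
\begin{align*}
\mathit{at}(i,\tilde m) \;&\triangleq\; \mathit{msgname}(\mathit{hist}(i)) = \mathit{name}(\tilde m)
   \;\wedge\; \bigwedge_{k} \mathit{msgargs}(\mathit{hist}(i), k) = \hat{x}_k \\
i < j \;&\triangleq\; i \le j \;\wedge\; i \ne j \\
\mathrm{O}\,\tilde m \;&\leadsto\; \exists i.\; i < \mathit{len} \;\wedge\; \mathit{at}(i,\tilde m) \\
\mathrm{HN}\,\tilde m \;&\leadsto\; \forall i.\; i < \mathit{len} \rightarrow \neg\,\mathit{at}(i,\tilde m) \\
\tilde m_2\;\mathrm{NS}\;\tilde m_1 \;&\leadsto\; \exists i.\; i < \mathit{len} \;\wedge\; \mathit{at}(i,\tilde m_1)
   \;\wedge\; \forall j.\; (i < j \wedge j < \mathit{len}) \rightarrow \neg\,\mathit{at}(j,\tilde m_2)
\end{align*}
```

Under this reading, every quantifier pattern is an existential followed by universals, so Skolemization introduces only constants, which is consistent with the claim that no ∀∃ edges arise.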

COMBINING ABSTRACT MESSAGE HISTORIES WITH CALLBACK CONTROL-FLOW
MHPL from Section 3 depends on two judgments, excludes-initial (⊢ excludesinit) and entailment. Excludes-initial says that an abstract message history excludes the initial, empty message history. An abstract message history entailing a second abstract message history says that all concrete traces abstracted by the first are also abstracted by the second. Both of these judgments depend on the CBCFTL specification S from Section 4 for a definition of realizable message histories. For each abstract message history, we combine it with the CBCFTL specification in order to avoid reasoning about the specification separately. In this section, we first show how to combine an abstract message history with a specification, resulting in a single temporal formula (as we describe in Section 2.2.3). Second, we show how to compute excludes-initial and entailment for temporal formulas. Finally, we prove that defining these judgments in this way is sound.
The high-level intuition is that given an abstract message history, we instantiate the specification of realizable message histories S with it into a single temporal formula. Then, with this temporal formula, we can implement these judgments on abstract message histories via queries to an off-the-shelf SMT solver (using the encoding described at the end of Section 4). In Figure 7, we describe the judgment form (written with ≡) that captures this combining of an abstract message history and a specification into a single temporal formula.
As we see in Figure 7, the combining (or equivalent-to-a-temporal-formula) judgment form is syntax-directed on the abstract message history. Under the assumption of the specification S, the abstract message history okhist is equivalent to the temporal formula true. For the ordered-implication abstract message history m̂₁ ։ ω̂₂, intuitively, we want to hypothesize m̂₁ to derive any constraints from instantiating S with m̂₁, and to derive any constraints from "quotienting" the constraints from ω̂₂ to "remove m̂₁ from the end". Instantiating and quotienting are captured by two helper judgments. The instantiate judgment form says, "In specification S, hypothesizing abstract message m̂, a temporal formula ω̃ describes the realizable message histories." And the quotient judgment form says, "Temporal formula ω̃ is equivalent to temporal formula ω̃′ with abstract message m̂ appended." We can now read the key rule: if hypothesizing m̂₁ in S yields temporal formula ω̃′₁, abstract message history ω̂₂ is equivalent to temporal formula ω̃₂, and ω̃₂ is equivalent to temporal formula ω̃′₂ with m̂₁ appended, then the ordered-implication abstract message history m̂₁ ։ ω̂₂ is equivalent to ω̃′₁ ∧ ω̃′₂.

Instantiation is the process of combining the hypothetical next message of an abstract message history with a history implication (e.g., Equation 1 in the running example from Section 2). For the instantiate judgment, we show only the cases for single history implications with target m̂₂ where the hypothesized message m̂₁ either matches or doesn't match. The other cases, for true and for a conjunction of two specifications, just yield true and the conjunction of the instantiations in the two conjuncts, respectively. As a history implication implicitly binds the variables of its target m̂₂, we write m̂₁ ≃ m̂₂ for a matching up to a substitution from the variables of m̂₂ to the variables of m̂₁, and we apply that substitution to the temporal formula in a capture-avoiding manner. And we write m̂₁ ≄ m̂₂ for the case where m̂₁ cannot match m̂₂.
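The instantiation step can be sketched in Python as a substitution over a small formula tree. The encoding is hypothetical: messages are `(kind, method, args)` triples of variable names, formulas are nested tuples, and the sample specification only approximates History Implication 1. The sketch also skips renaming of bound variables, so it is capture-avoiding only if the caller keeps bound names distinct from the hypothesized message's variables.

```python
def rename(msg, sub):
    """Apply a variable substitution to a message (kind, method, args)."""
    kind, method, args = msg
    return (kind, method, tuple(sub.get(a, a) for a in args))


def substitute(formula, sub):
    """Apply a variable substitution throughout a temporal formula."""
    op = formula[0]
    if op in ("O", "HN"):
        return (op, rename(formula[1], sub))
    if op == "NS":
        return (op, rename(formula[1], sub), rename(formula[2], sub))
    if op in ("and", "or"):
        return (op, substitute(formula[1], sub), substitute(formula[2], sub))
    return formula  # ("true",) or ("false",)


def instantiate(spec, hyp):
    """Instantiate a specification with a hypothesized next message.

    spec: list of history implications (target_message, temporal_formula).
    hyp:  hypothesized abstract message (kind, method, args).
    Non-matching implications contribute true; matching ones are conjoined.
    """
    result = ("true",)
    for target, formula in spec:
        same_shape = target[:2] == hyp[:2] and len(target[2]) == len(hyp[2])
        if not same_shape:
            continue  # the doesn't-match case yields true
        sub = dict(zip(target[2], hyp[2]))  # target variables -> hypothesized variables
        inst = substitute(formula, sub)
        result = inst if result == ("true",) else ("and", result, inst)
    return result
```

Instantiating the sample implication with the hypothesized message cb call(f, m) replaces the target's receiver variable by f inside the NS formula while leaving the bound variables s and t untouched, mirroring how temporal formula (1) retains the values from the call.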
Fig. 7. Instantiating CBCFTL specifications with abstract message histories from MHPL. The judgment form ⊢ ≡ says, "Under CBCFTL specification , an abstract message history is equivalent to a temporal formula ." We can view this judgment as giving us an encoding into a temporal formula , the instantiation of a specification of realizable message histories with a particular abstract message history to derive a description of the realizable message histories up to a program location.

The quotient judgment form, shown in Figure 7, ⊢ ≡ ′ is syntax-directed on to yield ′ . We show the quotienting judgments for the three temporal operators O, HN, and NS. Quotienting the other temporal formula productions is straightforward. Quotienting once, O , with the abstract message has two possibilities: (1) Match( , ): the abstract message is equivalent to the message in the "once" making the temporal formula equivalent to "true" and (2) NotMatch( , ): the abstract message is not equivalent to the message in the "once" leaving the temporal formula unchanged. Mirroring the quotienting of once, quotienting historically not, HN , with the abstract message has two possibilities: (1) Match( , ): the abstract message is equivalent to the message in the "has never" making the temporal formula equivalent to "false" and (2) NotMatch( , ): the abstract message is not equivalent to the message in the "has never" leaving the temporal formula unchanged. The operator 2 NS 1 is a combination of the previous two: either the quotiented message matches the right-hand side becoming true, or it must not match the left-hand side.
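The quotienting cases for O, HN, and NS can be illustrated on ground (fully concrete) messages. The sketch below is a simplification and not the encoding used by Historia: messages are plain strings, so Match and NotMatch collapse to Python equality rather than producing equalities over logic variables, and the tuple encoding and function names are hypothetical. The `holds` function gives a direct semantics over a finite history, so one can check that a formula holds on a history with `m` appended exactly when its quotient by `m` holds on the history without it.

```python
def holds(f, hist):
    """Evaluate formula f on a finite message history (list, oldest first)."""
    if f is True or f is False:
        return f
    tag = f[0]
    if tag == "O":                      # once: the message occurred
        return f[1] in hist
    if tag == "HN":                     # has never: the message did not occur
        return f[1] not in hist
    if tag == "NS":                     # lhs NS rhs: rhs occurred, no lhs since
        lhs, rhs = f[1], f[2]
        for m in reversed(hist):
            if m == rhs:
                return True
            if m == lhs:
                return False
        return False
    if tag == "and":
        return holds(f[1], hist) and holds(f[2], hist)
    raise ValueError(f"unknown formula: {f!r}")

def quotient(f, m):
    """Return f' such that f holds on hist + [m] iff f' holds on hist."""
    if f is True or f is False:
        return f
    tag = f[0]
    if tag == "O":
        return True if f[1] == m else f      # match: true; otherwise unchanged
    if tag == "HN":
        return False if f[1] == m else f     # match: false; otherwise unchanged
    if tag == "NS":
        lhs, rhs = f[1], f[2]
        if m == rhs:
            return True                      # right-hand side matched
        if m == lhs:
            return False                     # left-hand side matched
        return f
    if tag == "and":
        return ("and", quotient(f[1], m), quotient(f[2], m))
    raise ValueError(f"unknown formula: {f!r}")

# "Resumed": onResume has happened and onPause has not happened since.
resumed = ("NS", "cb onPause", "cb onResume")
after_resume = quotient(resumed, "cb onResume")
after_pause = quotient(resumed, "cb onPause")
after_click = quotient(resumed, "cb onClick")
```

A message that matches neither side leaves the formula unchanged, which is the situation the evaluation later describes as the specification "ignoring" a message.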
We see that for quotienting with the temporal operators, we need an analogous encoding of match or doesn't match: the meta-level functions Match( , ) and NotMatch( , ) encode into propositional formula of a symbolic message matching or not matching an abstract message , respectively. Since the message names, md, and kinds are known, Match( , ) and NotMatch( , ) always result in equalities and disequalities of logic variables (e.g., Equation 2 in the running example from Section 2) or "false".
With the ability to combine an abstract message history from MHPL with a CBCFTL specification of realizable message histories via the ⊢ ≡ judgment, algorithms for judging excludes-initial and entailment via SMT queries become clear. Let us write ⊢ ≡ for the encoding of a temporal formula into a closed first-order formula and use ax for the axioms encoding message histories from Section 4.
We define procedures for judging excludes-initial ⊢ excludesinit as checking for the unsatisfiability of ax ∧ ∧ len = 0 (where ⊢ ≡ and ⊢ ≡ ), and entailment ⊢ ′ as checking for the unsatisfiability of ax ∧ ∧ ¬ ′ (where ⊢ ≡ , ⊢ ≡ , ⊢ ′ ≡ ′ , and ⊢ ′ ≡ ′ ). The soundness of checking these judgments relies on the correctness of the combining judgment. Note that we assume a well-formedness condition that no abstract message is vacuous (i.e., for any abstract message and any assignment , there exists a (concrete) message such that • | = ). The correctness of combining relies on correct instantiation and quotienting. Proofs for these statements may be found in Appendix C.
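The two reductions to unsatisfiability can be mimicked without an SMT solver by enumerating concrete histories up to a bound. The sketch below is only a bounded approximation for illustration: it swaps the SMT query for brute-force enumeration, the role of the axioms ax is played implicitly by representing histories as Python tuples, and all names (`is_unsat`, `excludes_init`, `entails`, `once`, `has_never`) are hypothetical.

```python
from itertools import product

def histories(alphabet, max_len):
    """Enumerate every message history over alphabet of length <= max_len."""
    for n in range(max_len + 1):
        yield from product(alphabet, repeat=n)

def is_unsat(formula, alphabet, max_len):
    """Bounded stand-in for an SMT unsat check: no enumerated history satisfies formula."""
    return not any(formula(h) for h in histories(alphabet, max_len))

def excludes_init(psi, alphabet, max_len):
    """psi excludes the initial state iff (psi and len = 0) is unsatisfiable."""
    return is_unsat(lambda h: psi(h) and len(h) == 0, alphabet, max_len)

def entails(psi, psi_prime, alphabet, max_len):
    """psi entails psi_prime iff (psi and not psi_prime) is unsatisfiable."""
    return is_unsat(lambda h: psi(h) and not psi_prime(h), alphabet, max_len)

# Temporal formulas as predicates over a history (a tuple of messages).
def once(m):
    return lambda h: m in h

def has_never(m):
    return lambda h: m not in h

ALPHABET = ["cb onCreate", "cb onDestroy", "cb onClick"]
both = lambda h: once("cb onCreate")(h) and once("cb onClick")(h)

print(excludes_init(once("cb onCreate"), ALPHABET, 3))        # True
print(excludes_init(has_never("cb onDestroy"), ALPHABET, 3))  # False
print(entails(both, once("cb onCreate"), ALPHABET, 3))        # True
print(entails(once("cb onCreate"), both, ALPHABET, 3))        # False
```

Unlike the SMT-based procedure, this check says nothing about histories longer than the bound, so it should be read as a model of the query shape rather than as a sound decision procedure.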

EMPIRICAL EVALUATION
As we discuss in Section 2, the challenge for a program verifier is to prove the safety of assertions that depend on the callback order, while avoiding unsound models of the framework. We hypothesize that (1) thanks to the targeted callback control flow specification, Historia can prove safe assertions while avoiding unsound results when an assertion does not hold, and (2) Historia can be applied to real event-driven programs. We validate our hypotheses with the following research questions:
RQ1: Proving Assertions: Is it possible to write a targeted CBCFTL specification for Historia and prove safe assertions, while avoiding unsound framework models?
RQ2: Generalizability to Real-World Applications: Can Historia prove assertions on real-sized, complex, and widely used Android applications?
Bug Patterns. Checking for arbitrary assertions, such as safe null dereference, is not interesting as most safe assertions can be proven with an intra-callback analysis. So, to find assertion locations in Android apps that require callback control flow reasoning, we identified a set of problematic API usage patterns. We first searched bug reports of runtime crashes for popular open source Android apps satisfying all the following criteria:
(a) The issue had a stack trace similar to the one shown in Figure 1.
(b) The issue accepted a fix that relies on the callback order.
(c) The crash involved callbacks or callins from a set of commonly used Android objects (Activity, Fragment, Dialog, View objects such as buttons and menus, AsyncTask, and Single/Maybe from RxJava).
Then, we classified the crashes according to 5 patterns of interaction between callbacks and callins.
(1) getAct [3] [Fietz 2018a]: the Android method getActivity returns null if called on a Fragment that is not in the "created" state, and the app dereferences this null pointer (Activity and Fragment objects are in the "created" state if the onCreate callback has been invoked, but the onDestroy has not).
(2) execute [5] [Fietz 2015]: the app calls execute twice on the same AsyncTask object, ending in an exception.
(3) dismiss [7] [Fietz 2016]: the app calls dismiss on a Dialog constructed with an Activity that is currently in the "created" state, ending in an exception.
(4) finish null [Meier 2021]: the app dereferences a field in an onClick callback, the same field can be set to null in the onPause callback, and the app calls finish on the enclosing Activity (we call "nullable" the fields that can be set to null in a callback).
(5) subs null [Hamster 2020]: the app dereferences a nullable field in a callback executed concurrently, such as Runnable run.
We name the patterns with the main message involved in the crash, followed in subscript by an "exception property", specifying when the bug would manifest by throwing an exception. Such exception properties may be a CBCFTL history implication (referenced by number and listed in Appendix D) specifying when the framework returns an exception, a null value, or a nullable field dereference (indicated by null).

[Table 1: columns Pattern, Historia, no-order, eager; rows Bug and Fix for each pattern]
Table 1. The rows of this table are split into Bug and Fix benchmarks for each pattern. We first list the number of callbacks, callback returns, or callins that could be captured by a history implication in the framework model. Next, we list the number of history implications (specs) written for the benchmark (listed in Appendix D), then, we show how many of the messages are in the specification. The depth captures how many times Historia needed to step backwards through a callback (e.g., Figure 3 shows 4 steps back). Historia alarms (!) on all the bug versions, and refutes reachability of the bug assertion for 4 out of the 5 bug-fixes. In the last case, Historia explored up to 5 callbacks before timing out at 30 min. For the comparison with ideal tools using the "no-order" and "eager" modeling approaches, some results are labeled as false refutations or false alarms (false-!).
Implementation. Historia implements the backward abstract interpretation with message histories of Section 3 for refuting callback reachability assertions in Android apps. Historia uses Soot [Vallée-Rai et al. 1999] for loading the compiled app and to implement the application-only control flow graph construction (similar to [Ali and Lhoták 2013] but augmented with boundary transitions as discussed in Section 3.1). Historia implements the encoding of Section 5, and uses the Z3 SMT solver [de Moura and Bjørner 2008] to check the satisfiability of temporal formulas (Section 4). Historia further processes callbacks in parallel and pre-empts calls to Z3 when possible for performance. We ran our experiments using Chameleon Cloud [Keahey et al. 2020] using an AMD EPYC 7763 and 256 GB of RAM.

6.1 RQ1: Proving Event-Driven Patterns
In Table 1, we evaluate the ability of Historia and the representative state-of-the-art framework modeling approaches (no-order, eager) to prove safe fixes of the bug patterns, while correctly alarming on instances containing the bug. For each one of the 5 bug patterns, we distilled a Bug and a Fix benchmark application from the real app code mentioned in the representative bug reports (slicing the app code to remove all the components and code not necessary to reproduce the bug). The Bug version demonstrates the usage of the framework callbacks and callins causing the crash in the original application, while the Fix version applies the fix from the bug report. A sound analysis should always alarm on the Bug version.
We manually wrote a CBCFTL specification sufficient to prove the assertion safe for each fix, and then we ran Historia with this specification (specs column) on both the Bug and Fix version. We compare Historia with the main framework modeling approaches, which either do not assume any callback ordering (no-order), or provide an eager modeling of the framework. Infer [Calcagno and Distefano 2011] and Flowdroid [Arzt et al. 2014] are used as representatives for the first and second approach, respectively. Of the 5 bug patterns, only the 4th and 5th patterns are supported by Infer and none are supported by Flowdroid. We note that Flowdroid is the only open source tool we could run in the eager category but does not natively support these properties. Therefore, in order to compare with the no-order model, the first three exception properties were reduced to a nullable field and checked with Infer (i.e., we manually wrote code that would throw a null pointer exception just before the actual exception was thrown). For the remaining two, we added nullability annotations on the affected fields (because Infer will not alarm on a null value from a field without this annotation). For the eager model, we manually examined the artificial main method generated by Flowdroid. This is a main method that should behave as the original app composed with the framework. We evaluate whether any sound and precise whole-program static analysis could prove the fix while alarming on the bug with this main method.
Discussion of the Results. Historia always (and correctly) alarms (!) on all the Bug versions, while on the Fix versions it either refutes the assertion or does not terminate before exhausting a run-time budget of 30 minutes (the result for the finish null benchmark). For the Fix version of the finish null benchmark, Historia still does not alarm, but provides the partial result proving that no program execution containing fewer than 5 callback invocations can reach the assertion. Interestingly, such a partial proof rules out the (abstract) execution Historia found when failing to refute the assertion in the Bug version for finish null , which visits 4 callbacks (see the depth column). Targeted refinement of the framework-specific control-flow specifications was required in each case for Historia to avoid false alarms. Unsurprisingly, the no-order model results in false alarms on each fixed benchmark. Additionally, for the eager model, we found that in all but one case the artificial main method generated by Flowdroid rules out the sequence of callbacks reaching the real bug. In three of these cases, a callback that has to be executed to reach the bug was missing from the call graph. In one case, the main method over-constrained the callback order. For the remaining case, the eager model did not generate code that changed state when setEnabled(false) was invoked, disabling a button. Therefore, no program analysis could distinguish the state where onClick could not occur on that button.
6.2 RQ2: Generalizability to Real-World Applications
Next, we evaluate the generalizability of Historia by analyzing a set of 47 widely used applications containing over 2 million lines of code. These apps were found and retrieved from the F-Droid repository [F-Droid 2023] by filtering for apps updated in the last 2 years and that are more than 8 years old (rejecting obfuscated or otherwise difficult to inspect apps). We answer this question by searching for the five bug patterns and attempting to verify the 1090 locations found. First, we run Historia on each location with only the exception property, and then, we sample 8 locations that could not be proven for targeted specification refinement including timeouts and alarms.
We searched for the five patterns described in RQ1 using the application-only control flow graph. For the execute and dismiss patterns, we searched for the callins in the call graph. The remaining three patterns use an intraprocedural data flow analysis to find nullable values that were dereferenced. This value comes from either getActivity for the first pattern or a nullable field for the remaining patterns. The finish pattern looks for such dereference commands in the onClick callback when the finish method is used, and the subs pattern looks for dereferences in common concurrency callbacks.
Results are reported in Table 2. For scale, the table lists the size, in thousands of lines of code (KLOC), of the apps in which each pattern was found. We then report the number and percentage of the locations that Historia alarms on, times out on, and is able to prove safe. As the most common unsoundness in RQ1 was missing methods from the call graph, we compare the number of application methods found in the call graph of Historia and Flowdroid. A higher number of methods indicates more code is being analyzed. There were 9 apps that timed out with Flowdroid. Among the 38 apps that Flowdroid could finish on, it found 28k application methods as compared to 70k application methods found by Historia.
Table 3 lists 8 randomly sampled locations from distinct apps that Historia could not prove without refinement. For each sample, we recorded the time required to write the CBCFTL specification in the "spec time" column. The time to understand the callbacks being specified is not included in the recorded time, as this would be required for any modeling approach. If it took more than an hour to run Historia or if we took more than an hour to write the specification, we record a timeout ⊗ in the result (res) column. For perspective on the modeling difficulty, we list the number of messages that could be captured by the specification (the cb, ret, and ci columns under Sample), as well as the total number of specifications as history implications we wrote (under the specs column) and the number and percentage of app messages that could be matched (the cb, cbret, and ci columns under Historia).
Discussion. Before manual refinement of the specification of the framework model, our tool was able to prove 43% of the locations safe, raised alarms on 28%, and timed out on 29%. We note that for execute [5] , finish null , and subs null , we get few alarms as developers appear to use these patterns defensively. Of the 8 samples, we found that we were able to correctly classify 4 locations within the hour budget of specification writing time (and always within 5 minutes) and the hour budget of Historia run time (from a few seconds to a few minutes). Of the 4 timeouts, one was from the specification writing time, and the rest were Historia taking more than an hour. In the 3 cases where we were able to prove locations, the specifications ignored 95% or more of the boundary transitions in the app (i.e., no abstract message can match the majority of transitions in the applications). This highlights a performance benefit of targeted refinement: although our analysis is unbounded in the worst case, in practice the majority of the messages that boundary transitions in the app can produce do not affect the encoded meaning of the message history, and most abstract states are immediately merged via entailment. Calling back to Section 5, the specification ignoring most messages means that most of the time the quotient judgments result in an equivalent message history. Ignoring most messages allows most abstract states to be merged. With benchmarks containing hundreds to thousands of callbacks among thousands to tens of thousands of app methods and SMT calls for each new abstract pre-state, this means that we are avoiding the exponential explosion in the typical case.

Table 3. We sampled 8 locations that could not be proven from Table 2 and attempted to add CBCFTL specifications to prove them safe. These are listed by app name as all samples were chosen so the apps are unique; specific locations, app versions, and links may be found in Appendix D. * The Connectbot benchmark here was a timeout in Table 2; we manually removed callbacks to help find the alarm and understand the bug, while the other benchmarks were unmodified.
It is also noteworthy that the application-only control flow graph used by Historia captures significantly more application methods than Flowdroid. Among the applications that we could use Flowdroid to build call graphs for, it found 28K app methods. In these same apps, Historia found 70K app methods. This seems to reflect our observation from RQ1 that it is very challenging to capture all possible callbacks while eagerly modeling the framework and is thus an argument for the targeted modeling approach of Historia.

Threats to Validity
The main threat to validity of our experiments is the shifting behavior of the Android framework, which we used as a case study, and the authors' understanding of it. As noted in the Introduction, manual modeling of the Android framework is extremely difficult. Even though the application-only control flow graph appears more sound from these experiments, we found that it is possible to miss callbacks without a complete list of objects the framework can instantiate with reflection. To reduce the risk of an unsound call graph, we ensured that each location in RQ1 and a sampling of locations from RQ2 cannot be proven unreachable for any state (i.e., is reachable under some app state) unless the location appears to actually be unreachable through manual inspection.

RELATED WORK
Depending on the analysis domain, precise models are often included for some components but elided for others. As described in Section 1, a common approach, particularly in industrial Android analysis tools, is to use no model at all (i.e., the most over-approximate model). Verifiers with no callback order modeling have the advantage of performance and are the easiest to maintain but have a high false alarm rate requiring heuristic filtering [Calcagno et al. 2015]. Precision is added to the callback control flow models for a range of different domains of static analysis. Awareness of the Activity lifecycle and other user interface callbacks can improve taint analysis for security [Arzt et al. 2014; Calzavara et al. 2016; Gordon et al. 2015]. Other program analysis tools will use the Activity lifecycle in addition to precise models of other user interface components for verifying user interface properties [Perez and Le 2021; Yang et al. 2018, 2015]. Framework precision with respect to objects used for concurrency such as AsyncTask and thread pools is often captured for race detection [Hu and Neamtiu 2018; Wu et al. 2019; Yang et al. 2018, 2015] and other tools that detect concurrency issues [Pan et al. 2020]. For our experiments, we added precision for some callbacks from the Activity lifecycle, other user interface components, and objects for concurrency such as AsyncTask. The benefit of our compositional modeling approach is that components may be added on an as-needed basis as opposed to eagerly modeling a large portion of the framework.
Building the model of the framework directly into the program semantics used for the analysis has the advantage that the subsequent abstraction may be precisely chosen based on the modeled behavior. Many of the tools that build the model into the analysis capture UI elements, inter-component communication, and the stack-like behavior of windows [Calzavara et al. 2016; Payet and Spoto 2014; Rountev and Yan 2014; Yang et al. 2018]. The drawback to building the model directly into the analysis is that adding or updating behaviors [Huang et al. 2018] requires modifying the analysis itself. The most common approach to model callback orders is by generating an artificial main method [Arzt and Bodden 2016; Arzt et al. 2014; Gordon et al. 2015; Hu and Neamtiu 2018; Pan et al. 2019]. An artificial main method has the advantage that modeling can be decoupled from the program analysis by generating code that enforces a callback order to link with the application. When analyzing with such a main method, the normal abstraction used by the program analysis captures the callback control flow (e.g., through context sensitivity). The generation of main methods that can be abstracted precisely is a challenge. Capturing behavior such as arbitrary interleaving between callbacks (e.g., multiple simultaneous activities) can be difficult while avoiding language features that cause imprecision in analysis such as dynamic dispatch. We note that race detectors often combine some aspects of hard coding the callback control flow into the program semantics with utilizing an artificial main method (often for call graph construction). Automata and graph based approaches to modeling and abstraction [Blackshear et al. 2015a; Perez and Le 2021] are compositional and only rely on knowledge of relative order between callbacks. A difficulty with any modeling approach that eagerly models components is that the more components are modeled, the more likely the model is unsound. Unsoundness is common among any approach we listed that captures some callback order [Cao et al. 2015; Meier et al. 2019; Wang et al. 2016]. Our approach can lessen the risk here by enabling a targeted approach to callback control-flow modeling to avoid modeling more than necessary.

CONCLUSION
We have described a novel middle way for refuting callback reachability that enables a decoupling of the specification of callback control flow from the abstract interpretation to compute program invariants over an application-only transition system. This decoupling offers the appealing capability to gradually refine the possible callback control flow as needed and in a targeted manner to prove an assertion of interest, and it thus moves us past the false dichotomy of either using no modeling or eagerly modeling all callback control-flow constraints. The key innovation of our approach is an internalization of message histories into the analysis abstraction as a hypothetical (i.e., an ordered linear implication) to capture message histories up to a program location constrained by future messages and parametrized by a separate specification of realizable message histories. We then define a specification logic for callback control flow (CBCFTL) that carefully specializes past-time linear temporal logic so that we can utilize message-history program logic (MHPL) assertions together with CBCFTL specifications. Our evaluation provides evidence with a proof-of-concept implementation that our approach can refute callback reachability in challenging examples drawn from real-world issues among open-source apps.

A ASSUMPTIONS OF THE APPLICATION-ONLY TRANSITION SYSTEM
In practice, the translation from a compiled application to the application-only transition system makes some assumptions about the execution environment that we list in this section. These assumptions simplify reasoning about concurrency, reflection, and library behavior. First, we assume that all callbacks occur on the same thread (i.e., individual statements inside two separate callbacks cannot be interleaved during execution). In practice, some callbacks do occur concurrently. For example, the callback passed to Single.create(...) in the place of the ... is typically executed on a background thread. Specific kinds of program analysis exist to address the interleaving caused by threaded concurrency, but that is out of scope for our paper. Additionally, we do not consider exceptional control flow involving try/catch blocks.
For the application-only transition system, we assume that the framework cannot directly modify the app store except through the invocation of a callback. In theory, the framework may directly modify fields such as this.act, generating an execution not represented by the application-only transition system. However, such direct access is rare. Since the framework must be compiled without the application, any method or field that the framework accesses must typically extend a type declared by the framework (e.g., Fragment which declares an abstract method onCreate). The only exception is reflection, which may be used to access the app store such as the this.act field directly, but we assume such uses of reflection are rare, as it breaks encapsulation. More commonly, the framework uses reflection to create instances of objects such as PlayerFragment. Ali and Lhoták [2012] coined the phrase separate-compilation assumption to describe how a reasonable set of assumptions on framework behavior may be used to generate an application-only call graph. Our application-only transition system follows the same principle and is based on the same assumptions.
As mentioned in the Overview (Section 2), we assume callin invocations do not synchronously call back to the app (i.e., a callin stays in framework code until it returns). This assumption is frequently mirrored by other static analyses, such as Arzt and Bodden [2016], through framework stub implementations that do not invoke callbacks. In general, we find that not many Android methods have synchronous callbacks. In fact, none of our examples use a method exhibiting this behavior.
This assumption is reflected by the single ℓ −[ci ′ md ( )] ℓ ′ boundary transition. Our boundary stacks are degenerate here in that they will only ever have at most one activation k (for the active callback). The boundary stack is conceptually the subsequence of the run-time call stack corresponding to boundary transitions (i.e., the pending calls alternating between callbacks and callins), which ensures that the message history consists of matching calls and returns.
One can extend the language to support synchronous callbacks by splitting the callin invocation boundary transition in two: one for the callin invocation into the framework (i.e., from ℓ to fwk) and one for a call return back from the framework (i.e., from fwk to ℓ ′ ), analogous to callback invocations and callback returns. Note that if synchronous callbacks can be soundly modeled as asynchronous ones, then it is also unnecessary to add the complexity of synchronous callbacks (which is generally the case for Android).

B PROOFS FOR MESSAGE HISTORY PROGRAM LOGIC
In this section, we prove the theorems for the message history program logic (MHPL) that we describe in Section 3. We make the following assumptions: Providing a sound specification is the responsibility of the user of Historia.
The sound excludes init Property 2 is a consequence of the sound encoding of message history program logics (Section 5), a sound first-order logic encoding, and a sound SMT solver.
Subcase 2 or fv( ) ≠ fv( 2 ): In this case, we have NotMatch( 1 , ) so the in Not Since must be in . By the definition of message not equals, we know that 1 or fv( ) ≠ fv( 1 ) so must not be equal to under assignment and therefore ; From the lemma and the quotient-and judgment, we have that
The cases for ∨, ∀ ˆ ., and ∃ ˆ . are similar to ∧. For all the following productions, the message history is not affected: ˆ 1 = ˆ 2 , ˆ 1 ≠ ˆ 2 , and "true". Therefore, they model any history ; • .
For the case "false", it cannot model ; • therefore holds vacuously.
D EXTENDED EXPLANATIONS OF THE BENCHMARK APPLICATIONS AND SPECIFICATIONS
Here, we give a listing of all the specifications we wrote for Section 6 RQ1, details on the crashes, and reasons the Flowdroid model would not be able to classify the benchmarks correctly.
The first benchmark, getAct [3] , is a slightly more complex version of the motivating example in Section 2. The main difference is that instead of an act field, the app invokes the callin getActivity. Calling getActivity before onCreate or after onDestroy results in a null value as captured by History Implication 3. As a minor difference, getAct [3] uses the onActivityCreated callback instead of onDestroy which has slightly more complicated behavior (History Implication 4). The behavior of the call callback was explained in Section 2 with History Implication 1.
History Implication 2. For all f, if the framework invokes f.onCreate(), the same message cb f.onCreate() is Historically Not possible (or, Has Never been invoked in the past).
History Implication 3. The method getActivity returns null when invoked on an Activity in the paused state.
cb f.onActivityCreated() HN cbret f.onDestroy() ∧ HN cb f.onActivityCreated() ∧ HN cb f.onActivityCreated()
Flowdroid would not alarm on the buggy version of getAct [3] because call was not in the call graph, making the assertion unreachable. If call was added to the framework model generation (e.g., by Cao et al. [2015]), then the Flowdroid model would not be able to capture the effect of unsubscribe (i.e., History Implication 1), resulting in a false positive for the fix.
The second benchmark, execute [5] , has a button that starts an AsyncTask to perform an action in the background using the execute callin. AsyncTask is an abstract class that can be overridden by the app to encapsulate long running tasks (similar to Single). In order to prevent concurrency issues in the state of the overridden class, the framework enforces that execute crashes if called twice on the same instance of the overridden class. We capture this exceptional return with History Implication 5. If the button was clicked twice quickly, the task could be executed twice, crashing the application. The fix was to disable the button using b.setEnabled(false) (History Implication 6). Additionally, if the listener could be registered to two different buttons via two calls to onCreate which calls setOnClickListener, then the second button could be pressed, crashing the app. To rule out this case, we needed History Implication 2 (also needed by getAct [3] ).
History Implication 5. For any given instance of AsyncTask, the execute method throws an exception if execute has been invoked in the past.
exn ci t.execute() O ciret t.execute()
History Implication 6. Every time the onClick callback occurs, the associated button has not been disabled.
The fix for execute [5] would be misclassified by Flowdroid's model because the effect of the callin b.setEnabled(false) on the button is not captured.
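To make History Implication 5 concrete, the following sketch checks it against a fully concrete message history. It is an illustration only and not part of Historia: the `(kind, receiver, method)` tuple encoding, the kind labels `"ciret"` and `"ci-exn"`, and the function name are hypothetical, and the real analysis reasons about abstract rather than concrete histories.

```python
def respects_implication_5(history):
    """Check that every exceptional return of execute on a receiver is
    preceded by a normal return of execute on that same receiver.

    history is a list of (kind, receiver, method) tuples, oldest first.
    """
    returned_normally = set()          # receivers with a past "ciret execute"
    for kind, receiver, method in history:
        if method != "execute":
            continue                   # other messages are ignored by this spec
        if kind == "ci-exn" and receiver not in returned_normally:
            return False               # exception without a prior execute
        if kind == "ciret":
            returned_normally.add(receiver)
    return True

# A double click: the second execute on the same task may throw.
double_click = [
    ("cb", "listener", "onClick"),
    ("ciret", "task1", "execute"),
    ("cb", "listener", "onClick"),
    ("ci-exn", "task1", "execute"),
]
# An exception on a task that was never executed is not realizable.
unrealizable = [("ci-exn", "task2", "execute")]

print(respects_implication_5(double_click))   # True
print(respects_implication_5(unrealizable))   # False
```

The receiver is tracked per instance because the implication is stated for any given AsyncTask instance: a prior execute on a different task does not justify an exception.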
The third benchmark, dismiss [7] , uses a background task (via AsyncTask) and when the task finishes, the onPostExecute callback dismisses a ProgressDialog. If the ProgressDialog is dismissed after the parent has been paused, it throws an exception (History Implication 7). The fix was to check if the parent UI was visible before dismissing the dialog using the application state. Additionally, we needed to restrict the method used to create the dialog, show, to return a fresh dialog instance each time (History Implication 8).
History Implication 7. The dismiss callin throws an exception if invoked while the attached Activity is paused.
The bug for dismiss [7] would not be detected by Flowdroid because the onPostExecute callback was not in the call graph. Adding onPostExecute to the Flowdroid model would result in a false alarm on the fix because the interaction between show, dismiss, onPause, and onResume captured by History Implication 7 could not be captured due to the callins.
The fourth benchmark, finish null , has a button that dereferences a field in an onClick callback. This field is set to null when the Activity is paused (via onPause) to save memory. However, via what may be considered a bug in some versions of the Android framework itself, calling finish on an Activity can result in the onClick callback occurring after onPause (History Implication 9). Similar to other benchmarks, there is an onCreate that registers the button and can only happen once (History Implication 2). Since an Activity may have multiple simultaneous instances in an Android app, we also need a spec that says a button may only come from one Activity instance (History Implication 10).
History Implication 9. Every time the onClick callback happens on the listener object l, the listener was registered on a Button object v, and the Activity object a was either in the "resumed state" (i.e., after an onResume but before an onPause) or the message finish happened.
The bug for finish null would be missed by Flowdroid's model because this model assumes an onClick may only occur after an onCreate and before an onPause. If onClick were allowed to occur after onPause, then the Flowdroid model would still need to reason about the callin v.setOnClickListener(null) which prevents onClick.
The fifth benchmark, subs null , is caused by a dereference of a nullable field in a callback that is used by RxJava for synchronizing the results of a background task with the UI thread. Similar to the first benchmark, this bug would not be detected by a whole-program analysis using the Flowdroid main method because the synchronization callback is missing from the call graph.

Fig. 4. An application-only transition system with boundary transitions and message histories. The message history component of the program state records the execution of boundary transitions b between the app and framework. We use the judgment → Ω p ′ to represent a single step over either an app or boundary transition in the application. The transition system is parametrized by a set of realizable message histories ∈ Ω.
says, "Program state steps to ′ in program p by either a boundary transition b or app transition t under realizable message histories Ω." Straightforwardly, the and rules simply state that we can either take a step with an app transition t or a boundary transition b in the program p (depending on the program location). The transition semantics of app transitions are left unspecified , t ⇓ ′ . We assume that app transitions themselves do not read or write framework state directly, as the app store is separate from the framework state.

Table 2. Verifying usages of the multi-callback patterns among 47 open-source Android apps with only the exception property. The first column lists the individual patterns while the second column lists the locations found for each. We list the results by alarms where Historia finished but could not prove the property, timeouts where Historia took over half an hour, and safe where no further specification was needed. We list the number of app methods that are contained in the call graphs of both Historia and Flowdroid.