Interaction Proxy Manager: Semantic Model Generation and Run-time Support for Reconstructing Ubiquitous User Interfaces of Mobile Services



INTRODUCTION
Smartphones, which host the richest application resources, are currently accessed mostly through touch on the phone screen alone. With the rise of AIoT, people tend to choose different devices and interactions depending on the use scenario [13], expecting applications to offer services in various forms [4,30,31,46]. Imagine a user who has a new unread message. They may prefer reading it on a wearable device such as a smartwatch or glasses while walking, on a PC at work for efficiency, or through a car screen or voice interaction while driving for convenience and safety. While these alternative devices can further enrich user experiences, they raise the challenge of developing differentiated user interfaces for each of them.
There are two mainstream approaches to presenting existing mobile applications on other terminals. The first is screen sharing based on video streaming, such as Google Cast1 and Miracast2. However, this solution is limited to visual content and does not adapt to other modalities such as voice or gestures. The second approach leverages system-level frameworks, such as CarPlay 3 and Android Auto 4, which deploy applications to car screens. However, such frameworks do not help third-party developers build cross-device interfaces, so application developers must adapt their code.
Inspired by previous work on enhancing accessibility [61], we consider Interaction Proxy as a solution to the aforementioned issues. This technique provides an interaction remapping mechanism that intercepts and forwards page content and events on the phone, enabling new user interfaces to be constructed flexibly without altering the code of the original application. However, despite its demonstrated potential for accessibility improvements, Interaction Proxy remains at the proof-of-concept stage, with limited practical reliability [61]. Figure 1 highlights several challenges associated with Interaction Proxy, including unstable or missing mobile page data, difficulty locating GUI widgets, and problems synchronizing the state of each interface [32,43]. Consequently, the technique may cause failed function execution or incomplete content in the new user interface [6], leading to high development and debugging costs.
The challenges above stem primarily from the dynamic nature of user interfaces [29]. To eliminate the instability introduced by the interface, this paper proposes Interaction Proxy Manager (IPManager), a software module that operates between interfaces and decouples the application's interface implementation from the service semantics it contains. In IPManager, we define the UI-Independent Application Description (UIAD), a reverse-engineered semantic model, to synchronize with and manage the original application's interface. The model organizes the application's information and methods in a hierarchical structure, which makes it widely applicable because it aligns with human cognition and with the object-oriented principles underlying the application's implementation.
As shown in Figure 2, IPManager operates in two phases. In the offline phase, the registration system helps developers design the model and establishes its relationship with the original GUI through low-effort interactive annotation. This process formulates recognition strategies, maintains precise mapping relationships, and provides model APIs for new interfaces. In the run-time phase, the model responds to API calls from new interfaces, delivering the requested data or controlling the phone to execute the necessary actions. In this way, new interfaces access the original GUI data indirectly, while the model directly handles the GUI's dynamicity.
Moreover, we implement run-time techniques for stable and efficient communication, including maintenance of the run-time relationship between the model and the original GUI, as well as cache and route optimizations.
Interaction Proxy Manager overcomes the limitations of existing solutions and aims to achieve the following goals:
1) Robustness. Decoupling the interface from the service allows for a more accurate extraction of application semantics and shields new interface systems from the original interface's instability, leading to more reliable operation.
2) Generality. The model can be both generated from a wide range of applications and invoked by various devices to construct novel user interfaces.
3) Economy. Model generation can be a one-shot process, enabling swift deployment to different devices and scenarios without burdensome modifications.
4) Flexibility. IPManager transcends the physical limitations of a single GUI, allowing for alternative modalities beyond visual feedback.
5) Incrementability. The model can be continuously updated and adapted to new semantic concepts based on user requirements.
The specific contributions of our work are as follows:
• We propose Interaction Proxy Manager to eliminate the impact of the original interface; it performs semantics-based reverse engineering and provides reliable interaction mapping.
• We introduce the Interactive Annotation Mechanism for registering with the original interface, which supports dynamic learning of strategies for page classification and widget recognition, streamlines the annotation process, and ensures robust model generation.
• We provide a series of run-time support mechanisms, such as cache and routing optimization, to properly handle the dynamicity of the interface and efficiently respond to new interfaces' requests.

RELATED WORK
This section reviews the literature relevant to our work, including prior Interaction Proxy systems, mobile GUI semantic analysis, model-based UI development, and annotation systems with interactive machine learning.

Prior Interaction Proxy Systems
The concept of Interaction Proxy has previously been used to design interactions based on existing user interfaces for various purposes. It has been applied to run-time repair and enhancement of mobile application accessibility [47,61-63] and to adapting user interfaces to wearable devices [58,64]. SUGILITE [33], KITE [37], and PUMICE [36] enable programming by demonstration or natural language input on smartphones. SOVITE [34] helps users discover conversational breakdowns using the existing mobile GUIs. Rataplan [53] is a pixel-based approach for linking multi-modal proxies to automated sequences of actions in GUIs. These systems rely on direct remapping of interface elements or events, resulting in case-by-case connections. More importantly, they lack robust mobile page recognition and usually rely on pre-defined rules [61], which leads to unreliable results, especially for content-rich UIs such as large-screen interfaces.

Semantic Analysis for Mobile GUI
The mobile GUI layout obtained through Android's Accessibility Service API 5 presents only the rendered interface, without in-depth semantics. Semantic analysis addresses GUI irregularities [32,43] and assists in reverse engineering application logic. We define the design space of mobile GUI semantic analysis as four progressively deeper levels: Item, Label, Concept Hierarchy, and Concept Graph.
Level 1: Item. Identify all meaningful items on the GUI page. REMAUI [44] and GUI skeleton [14] recognize GUI elements using computer vision or OCR and extract visual features from screenshots.
Level 3: Concept Hierarchy. The logical relationships between components are normally described as a tree in which the components are semantic concepts rather than GUI widgets. Screen Recognition [57,60] comes close to this level, describing logical relationships via hierarchical segmentation. However, its semantic information is shallow: the components remain at the widget level, and the edges of the tree are not differentiated by deeper semantics.
Level 4: Concept Graph. This deepest level of analysis captures the relationships among all semantic concepts and requires integrating semantics across pages to restore the application's workflow and data management. Prior studies map natural language instructions [38] or voice commands [45] to GUI action sequences, define concepts for GUI widgets and operation paths [36], characterize inter-screen semantics [35], identify interaction trace events [21], and address various UI tasks [55]. However, current research captures only shallow cross-page semantics, and further effort is needed to build practical concept graphs.
Our work is at the intersection between Level 3 and Level 4. We build the semantic hierarchy, integrate the involved content of the multiple pages, and describe the relationships between them.

Model-Based UI Development
Model-Based UI Development (MBUID) simplifies UI development by generating code from models that define data structure, behavior, and relationships [42,49,51]. The Cameleon Reference Framework [23] outlines the MBUID process, detailing layers of abstraction and their relationships: the Task and Concepts level, Abstract UI (AUI) [52], Concrete UI (CUI), and Final UI (FUI). Various MBUID software tools have emerged [40], supporting cross-toolkit development and element reuse. MBUID also enables the creation of reusable models [51] and offers design assistance features [25].
While MBUID expedites code generation and eases front-end development, it neglects the complex development requirements associated with data and functionality sources in the UI.Our approach seeks to decouple information services from legacy smartphone applications through reverse engineering, subsequently offering APIs for new UI development and facilitating the integration of data and functionality in new UIs.

Annotation Systems with Interactive Machine Learning
Interactive Machine Learning (IML) enables users to iteratively build and refine mathematical models through input and review cycles, without extensive background knowledge [8,18]. This makes IML suitable for annotation systems, reducing modeling costs and expertise requirements for annotators.
In our work, we use IML to help the system learn classification strategies for pages and widgets. This approach suits our task, whose classification granularity changes dynamically, and significantly reduces the expertise and workload required of annotators.

UIAD MODEL SPECIFICATION
The semantics of an application are far more stable and reusable than its interface, since countless combinations of GUI elements can represent the same functionality. Therefore, we propose the UIAD Model, which organizes the semantic knowledge that remains once the UI implementation is removed. The model aims to provide a consistent structure for representing application semantics while adapting to multi-modal interfaces.
In this section, we introduce the UIAD Model Specification, using WeChat, a popular instant messaging application, as an example. The specification forms the model's foundation by defining its structure and APIs; the specific content becomes available only after registration with the original interface.

Model Structure
Drawing inspiration from epistemology, which divides knowledge into descriptive "knowing-that" and procedural "knowing-how" [9], the model organizes application content into information and methods within a hierarchical structure. This representation aligns with human perception of application semantics, reducing learning costs for developers. Moreover, it is consistent with the object-oriented programming principles of application source code, enhancing the feasibility of semantic description.
The UIAD Model can describe an entire application, multiple sub-functions, or even a single GUI page by representing the semantic structure as a tree. Tree nodes, defined as semantic elements, express the hierarchical order of the semantics through parent-child relationships: child nodes embody components of their parent nodes' semantics.
For example, Figure 3 illustrates part of the model structure for WeChat. Once the model establishes a correspondence with the application content, the semantic elements can derive values from multiple pages of the original GUI, as shown in Figure 5, enabling the model to integrate semantic information from diverse sources.
To guide the design of the model structure for organizing information and methods, we provide detailed definitions for each type of semantic element. First, information is described through the following semantic elements:
• Root Object: The root represents the entire application.
• Object: A subtree rooted at such an element describes all the information of an Object. Different types of Objects are distinguished by their names, such as Contact and Message.
• Object List: An element of this class consists of several Object children sharing the same name, indicating that these Objects exist simultaneously in the same list.
• Property: A property of an Object or Object List, consisting of the property's name, type, and value. The value, represented as a string, must be parsed according to the type. It may be set initially, or left empty during the design phase and later populated with data from the original GUI.
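To make the information elements concrete, the following is a minimal sketch of how they might be represented as Python dataclasses. The class names, the `parse_value` helper, and the `unread` property are hypothetical illustrations, not the authors' implementation; the sketch only encodes the rules stated above (Object Lists hold same-named Objects, and Property values are strings parsed by type).

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Property:
    name: str
    type: str                      # declared type, used to parse `value` (e.g. "string", "int")
    value: Optional[str] = None    # empty at design time, filled after registration

@dataclass
class Object:
    name: str                      # e.g. "Contact", "Message"
    properties: List[Property] = field(default_factory=list)
    children: list = field(default_factory=list)   # nested Objects / Object Lists

@dataclass
class ObjectList:
    item_name: str                 # every child must be an Object with this name
    items: List[Object] = field(default_factory=list)

    def append(self, obj: Object) -> None:
        # Enforce the rule that all children of an Object List share one name.
        if obj.name != self.item_name:
            raise ValueError(f"expected {self.item_name}, got {obj.name}")
        self.items.append(obj)

@dataclass
class RootObject:
    app: str                       # the root represents the entire application
    children: list = field(default_factory=list)

def parse_value(prop: Property):
    """Parse a Property's string value according to its declared type."""
    if prop.value is None:
        return None
    if prop.type == "int":
        return int(prop.value)
    return prop.value

# Design-time skeleton for part of a WeChat-like model.
contacts = ObjectList("Contact")
root = RootObject("WeChat", children=[contacts])
# After registration, values would be populated from the original GUI (hypothetical data).
contacts.append(Object("Contact", [Property("username", "string", "Lee"),
                                   Property("unread", "int", "2")]))
```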
Relationships between semantic elements occasionally extend beyond the conventional parent-child hierarchy found in tree structures, thereby requiring representations for lateral or cross-branch linkages.To accommodate these unconventional relationships within the overall tree structure, we introduce a distinct semantic element, the Relationship.As a child of the Root Object, a Relationship embodies a "link" between two previously defined semantic elements (Object or Object List). Figure 3 illustrates a Relationship where a Contact is the "source" of a Message.
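A Relationship can be sketched as a small record stored under the Root Object that links two element identifiers, which lets a query follow lateral links without changing the tree. The string identifiers such as `contact:Lee` and the `related` helper are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Relationship:
    name: str      # e.g. "source"
    source: str    # id of an Object / Object List, e.g. a Contact
    target: str    # id of another Object / Object List, e.g. a Message

# Relationships kept as children of the Root Object (hypothetical ids).
relationships: List[Relationship] = [
    Relationship("source", source="contact:Lee", target="message:42"),
    Relationship("source", source="contact:Lee", target="message:43"),
    Relationship("source", source="contact:Kim", target="message:44"),
]

def related(rel_name: str, source_id: str, rels: List[Relationship]) -> List[str]:
    """Follow a cross-branch link from one element to the elements it points to."""
    return [r.target for r in rels if r.name == rel_name and r.source == source_id]
```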
Second, methods are flattened in the model, unlike the multi-page jumps in the mobile GUI. They are organized by the following classes of semantic elements:
• Method: Different Methods are distinguished by their names, such as "VoiceCall" and "SendMessage". As a child of the Root Object, a Method comprises three properties: Semantic Parameters, Operation Parameters, and Execution Sequence.
• Semantic Parameters: The list of the method's semantic parameters. Each parameter refers to a previously defined element (Object List or Object), such as the Contact involved in the "SendMessage" Method.
• Operation Parameters: The list of the method's parameters other than the semantic ones above. They are typically required to complete the operation, such as the message content for the "SendMessage" Method.
• Execution Sequence: The list of operations in the original interface, i.e., operable GUI widgets that can accept Operation Parameters or initiate the Method. The sequence is initially empty, as shown in Figure 3. Only after registration with the original interface are the edit box and the "Send" button added to the Execution Sequence of the "SendMessage" Method, as shown in Figure 5.
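A Method element can be sketched as a record holding its two parameter lists and an Execution Sequence that stays empty until registration. The tuple layout `(widget_id, action, parameter)`, the widget ids, and the `missing_args` helper are hypothetical; the sketch only mirrors the three properties described above.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

@dataclass
class Method:
    name: str                        # e.g. "SendMessage"
    semantic_params: List[str]       # referenced model elements, e.g. ["Contact"]
    operation_params: List[str]      # other inputs, e.g. ["content"]
    # (widget_id, action, parameter name or None); empty until registration.
    execution_sequence: List[Tuple[str, str, Optional[str]]] = field(default_factory=list)

    def missing_args(self, args: Dict[str, str]) -> List[str]:
        """Return the required parameters that the caller did not supply."""
        return [p for p in self.semantic_params + self.operation_params if p not in args]

send = Method("SendMessage", semantic_params=["Contact"], operation_params=["content"])
# After registration, the edit box and the "Send" button are appended (ids are hypothetical).
send.execution_sequence += [("edit_box", "set_text", "content"), ("send_button", "click", None)]
```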
The initial model structure must be built manually according to the functional requirements, ensuring that it includes all relevant semantic elements. For example, to deploy the message-sending function, the contact and message lists and the parameters of "SendMessage" are necessary. The definitions of these semantic elements, including their names, types, and parameters, must be specified manually before the registration process. The specific values (e.g., the username "Lee") and the Execution Sequence, however, are derived from the original GUI widgets after registration.

Model API
Two types of model APIs are supported for accessing the model: getting information and using methods. As shown in Figure 4, new interface systems can simply call these APIs to access the content of the original GUI and fill it into alternative interfaces. In particular, Figure 10 demonstrates how a car display calls these APIs at run time. Each API call takes three parameters, defined as follows.
• API Name: Denotes the API to call.
• Condition: Defines constraints for selecting the model's subtree (or forest).
• Target: Identifies the desired information or method.
Table 5 in the Appendix lists the main supported APIs and their usage in detail. The following subsections introduce the two categories of APIs.
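The three-part request described above can be sketched as a small record plus a dispatcher keyed by API name. The `APIRequest` and `dispatch` names and the handler signature are assumptions for illustration; the actual API set is the one listed in Table 5.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict

@dataclass
class APIRequest:
    api_name: str                                             # which API to call
    condition: Dict[str, Any] = field(default_factory=dict)   # constraints selecting a subtree/forest
    target: str = ""                                          # desired information or method

Handler = Callable[[Dict[str, Any], str], Any]

def dispatch(req: APIRequest, handlers: Dict[str, Handler]) -> Any:
    """Route a request to the handler registered under its API name."""
    if req.api_name not in handlers:
        raise KeyError(f"unsupported API: {req.api_name}")
    return handlers[req.api_name](req.condition, req.target)

# Usage with a stub handler that just echoes what it was asked for.
handlers: Dict[str, Handler] = {
    "get_property": lambda cond, tgt: (tgt, cond.get("username")),
}
result = dispatch(APIRequest("get_property", {"username": "Lee"}, "remark"), handlers)
```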

To Get Information.
We can access the content of any node or subtree within the model.Given the Condition and Target, the model locates the relevant information in the tree and returns the result.The model supports get_property and get_related_info for object information queries, and associated APIs for list queries.
• get_property: The properties can belong to any subtree or forest, as determined by the Condition.The Target specifies which properties to return.In particular, if the target is an Object name, all properties of the object are returned.
• get_related_info: Get the information of Object B that is related to Object A. The description of Object A and their relationship are set in the Condition, and the Target specifies which properties of Object B are required.
• List-related APIs: Support specific operations such as retrieving the list items within a given interval (get_list) and finding an object's position (index_of). Given the versatility of lists, additional APIs such as filter and sort can be implemented.
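The following is a minimal sketch of how the information APIs above could be answered over a toy model instance. The dictionary layout, the positional signatures, and the sample data are hypothetical; only the API names (get_property, get_related_info, get_list, index_of) come from the text.

```python
from typing import Any, Dict, List

# Toy model instance: Object Lists keyed by Object name; each Object is a dict of Properties.
MODEL: Dict[str, Any] = {
    "Contact": [{"id": "c1", "username": "Lee"}, {"id": "c2", "username": "Kim"}],
    "Message": [{"id": "m1", "text": "hi"}, {"id": "m2", "text": "lunch?"},
                {"id": "m3", "text": "ok"}],
    # Relationship "source": (name, Contact id, Message id)
    "Relationship": [("source", "c1", "m1"), ("source", "c1", "m2"), ("source", "c2", "m3")],
}

def _select(obj_name: str, condition: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Objects of the given name whose properties match every constraint."""
    return [o for o in MODEL.get(obj_name, [])
            if all(o.get(k) == v for k, v in condition.items())]

def get_property(obj_name: str, condition: Dict[str, Any], target: str) -> List[Any]:
    # A target equal to the Object name returns all of the Object's properties.
    return [dict(o) if target == obj_name else o.get(target)
            for o in _select(obj_name, condition)]

def get_related_info(obj_a: str, condition_a: Dict[str, Any], relation: str,
                     obj_b: str, target: str) -> List[Any]:
    ids = {o["id"] for o in _select(obj_a, condition_a)}
    linked = {t for (r, s, t) in MODEL["Relationship"] if r == relation and s in ids}
    return [o.get(target) for o in MODEL[obj_b] if o["id"] in linked]

def get_list(obj_name: str, start: int, end: int) -> List[Dict[str, Any]]:
    return MODEL[obj_name][start:end]

def index_of(obj_name: str, condition: Dict[str, Any]) -> int:
    for i, o in enumerate(MODEL[obj_name]):
        if all(o.get(k) == v for k, v in condition.items()):
            return i
    return -1
```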

To Use Methods.
We can use any method defined in the model. By assigning values to the required parameters in the Condition and specifying the method name in the Target, the model can trigger the method by performing the operations in its Execution Sequence.
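A method call can be sketched as replaying the Execution Sequence and filling each operation parameter from the Condition. The `use_method` function, the step tuple layout, and the fake phone driver are hypothetical; in this sketch the semantic parameter (the Contact) is assumed to have already been handled by routing to the right chat page, so only operation parameters are consumed here.

```python
from typing import Callable, Dict, List, Optional, Tuple

Step = Tuple[str, str, Optional[str]]   # (widget_id, action, parameter name or None)

def use_method(sequence: List[Step], condition: Dict[str, str],
               perform: Callable[[str, str, Optional[str]], None]) -> None:
    """Replay a Method's Execution Sequence, filling parameters from the Condition."""
    for widget_id, action, param in sequence:
        if param is not None and param not in condition:
            raise ValueError(f"missing parameter: {param}")
        value = condition[param] if param is not None else None
        perform(widget_id, action, value)

# A fake phone driver that only records the simulated actions.
log: List[Tuple[str, str, Optional[str]]] = []
def fake_perform(widget_id: str, action: str, value: Optional[str]) -> None:
    log.append((widget_id, action, value))

send_sequence: List[Step] = [("edit_box", "set_text", "content"), ("send_button", "click", None)]
use_method(send_sequence, {"Contact": "Lee", "content": "See you at 6"}, fake_perform)
```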

INTERACTION PROXY MANAGER
In this section, we present the IPManager, responsible for generating a UIAD Model from the original mobile GUI and managing requests from the new interface system.
To achieve this, we first establish the mapping relationship between the original pages and the model, enabling run-time model generation based on the current page. Accordingly, IPManager consists of an offline registration system and a run-time system, depicted in Figure 4. During the offline phase, the system registers the model with the original pages and stores the results in the Model Registration File. In the run-time phase, the system uses this file to generate the corresponding model instance from the current GUI.
To enhance the reliability and efficiency of the run-time system, we introduce the model manager, overseeing data exchange with new interface systems.The model manager dynamically maintains mapping relationships between the model and the original GUI, executes simulated actions on the original GUI, manages previously generated model instances, and preserves the latest version of the UIAD model.

Registration System
The registration system, depicted in the dashed box of Figure 4, establishes the mapping relationships between the model and the original GUI.These relationships are integrated into the Model Registration File, which lays the foundation for subsequent run-time semantic analysis.
Based on the functional requirements of a given application, developers organize all semantic elements and establish the corresponding Model Specification, as described in Section 3. The system then identifies the original GUI widgets linked to each semantic element.To achieve this, the system records three components: (1) page classification, (2) widget recognition, and (3) page jumping graph.The first two components support widget recognition within a single page, while the third enables cross-page searches.The process requires human annotation, for which we employ an interactive annotation mechanism to reduce costs, further detailed in Section 5.

Widget Recognition.
As shown in Figure 5, a mapping rule specifies the property of specific GUI widgets (e.g., image, text, action) that corresponds to a semantic element in the model tree, indicating that the GUI widgets either assign a value to the semantic element (Figure 5a) or define its triggering mechanism (Figure 5b). Different mapping rules apply to different GUI widgets; hence, applying a rule requires recognizing all applicable widgets on the run-time GUI. The process involves classifying widgets according to the semantic elements they map to. As with page classification, the criteria for widget recognition can change dynamically, depending on the model structure.

Page Jumping Graph.
While the previous sections cover the semantics within a single mobile page, the page jumping graph represents the semantic relationships between pages. It helps the run-time system find the correct route when the required widget is not present on the currently displayed page. The graph encapsulates both the route and the context cursor.
Route: A basic jumping graph is a directed graph generated automatically, in which each node represents a class of pages. Edges specify the widget and the action (typically "click") that triggers the jump. With this graph, a route between any two pages can be found, and at least one route is guaranteed: from the start page to the homepage and then to the end page.
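Finding a route over the page jumping graph can be sketched as a breadth-first search, which returns a route with the fewest jumps. The graph encoding, the page class names, and the widget names are hypothetical, and BFS is one reasonable choice for "shortest route" here rather than the authors' stated algorithm.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple

# Page class -> list of (widget, action, next page class). All names are hypothetical.
Graph = Dict[str, List[Tuple[str, str, str]]]

def shortest_route(graph: Graph, start: str, end: str) -> Optional[List[Tuple[str, str]]]:
    """Breadth-first search returning the (widget, action) steps of a fewest-jump route."""
    if start == end:
        return []
    prev: Dict[str, Tuple[str, str, str]] = {}   # page -> (previous page, widget, action)
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for widget, action, nxt in graph.get(page, []):
            if nxt in seen:
                continue
            seen.add(nxt)
            prev[nxt] = (page, widget, action)
            if nxt == end:
                # Walk predecessors back to the start page, then reverse.
                steps, cur = [], end
                while cur != start:
                    p, w, a = prev[cur]
                    steps.append((w, a))
                    cur = p
                return steps[::-1]
            queue.append(nxt)
    return None   # no route: handled by Error Recovery

graph: Graph = {
    "Home": [("chats_tab", "click", "ChatList"), ("contacts_tab", "click", "ContactList")],
    "ChatList": [("chat_item", "click", "Chat"), ("back", "click", "Home")],
    "ContactList": [("contact_item", "click", "Profile"), ("back", "click", "Home")],
    "Profile": [("message_btn", "click", "Chat"), ("back", "click", "ContactList")],
    "Chat": [("back", "click", "ChatList")],
}
```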
Context cursor: Page jumping is required not only when page classes differ, but also when pages within the same class differ semantically. For instance, if the current mobile page is for chatting with Contact A and the target is Contact B, page jumping is needed even though both are chat pages. Therefore, we define the contextual semantics of a single page as a cursor in the model, where the subtree below the cursor represents the content covered by the page. In the example above, the run-time generated model contains two subtrees, Contact A and Contact B. Since the cursor is on Contact A, page jumping is required to reach Contact B.
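The jump decision described above can be sketched as follows, assuming (hypothetically) that a cursor is represented as a mapping from Object name to the Object the page currently covers. A jump is needed when the page classes differ, or when they match but the cursor points at a different element.

```python
from typing import Dict, Optional

def needs_jump(current_page: str, cursor: Optional[Dict[str, str]],
               target_page: str, target_context: Optional[Dict[str, str]]) -> bool:
    """Decide whether page jumping is required before serving a request."""
    if current_page != target_page:
        return True                      # different page classes
    if target_context is None:
        return False                     # same class, no semantic constraint
    if cursor is None:
        return True                      # page context unknown, so move to a known one
    # Same class but the cursor covers a different semantic element.
    return any(cursor.get(k) != v for k, v in target_context.items())
```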

Run-time Semantic Analysis
After registering with the original GUI, the run-time system analyzes the semantics of the displayed page on the phone using the Model Registration File (derived in Section 4.1).As depicted in the solid box of Figure 4, each parsed UIAD Model at runtime contains information and methods for a single page, known as a model instance (e.g., Figure 6).The system re-analyzes semantics whenever the mobile GUI updates.
Run-time semantic analysis follows the same two-step process as the offline phase: page classification and widget recognition.Necessary widgets are identified and assigned to corresponding semantic elements based on pre-designed mapping rules, adhering to a top-down layout tree order.Algorithm 2, detailed in the Appendix, is utilized for this purpose.
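As a rough illustration of the widget-recognition step (not the paper's Algorithm 2), the sketch below walks the layout tree top-down in pre-order and assigns each widget that matches a mapping rule to that rule's semantic element. The `Widget` and `MappingRule` types are hypothetical, and the rule predicates stand in for the learned recognition strategies.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterator, List

@dataclass
class Widget:
    cls: str                             # Android class name, e.g. "TextView"
    res_id: str = ""
    text: str = ""
    children: List["Widget"] = field(default_factory=list)

@dataclass
class MappingRule:
    element: str                         # semantic element, e.g. "Message.text"
    matches: Callable[[Widget], bool]    # stand-in for a learned recognition strategy
    attribute: str = "text"              # widget attribute that supplies the value

def preorder(w: Widget) -> Iterator[Widget]:
    """Top-down, depth-first traversal of the layout tree."""
    yield w
    for c in w.children:
        yield from preorder(c)

def analyze(root: Widget, rules: List[MappingRule]) -> Dict[str, List[str]]:
    """Build a (flattened) model instance by assigning widgets in layout order."""
    instance: Dict[str, List[str]] = {r.element: [] for r in rules}
    for w in preorder(root):
        for r in rules:
            if r.matches(w):
                instance[r.element].append(getattr(w, r.attribute))
    return instance

page = Widget("FrameLayout", children=[
    Widget("TextView", "title", text="Lee"),
    Widget("ListView", children=[Widget("TextView", "msg", text="hi"),
                                 Widget("TextView", "msg", text="ok")]),
])
rules = [MappingRule("Contact.username", lambda w: w.res_id == "title"),
         MappingRule("Message.text", lambda w: w.res_id == "msg")]
```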

Model Manager
The model manager is responsible for ensuring the UIAD Model's availability on demand.As depicted in Figure 7, the manager continuously listens to requests from new user interfaces and generates responses through the Processor.Since the model instances generated in Section 4.2 can only describe the current single page, the manager relies on the Route and Cache to record the required UIAD Model instances for data integration.If an error occurs at any stage, the Error Recovery is triggered to resolve the issue.
The following paragraphs detail the functionality of each module.

Processor.
The Processor identifies the target subtrees in the model based on the received API request. For information requests, the corresponding part of the model tree is returned directly, whereas method requests trigger automatic phone operations according to the Execution Sequence. If data is missing, the manager invokes the Route module to perform page jumping and retries the process. The Route module handles two cases:
• If the current context cursor conflicts with the target (e.g., the target is Contact B but the cursor is on Contact A), the manager first performs global backs until the cursor leaves its original location and the conflict is resolved. It then proceeds as in the second case.
• Without context conflicts, the manager identifies all destination page classes containing the requested information or methods, based on the Model Registration File. Given the currently displayed page class and the destination page classes, the manager determines the shortest route and operates the phone accordingly. If no route is found or the operation fails, the Error Recovery (described below) is triggered.
Additionally, a cache is implemented for lists. Synchronizing the mobile GUI and the new interface can be challenging when their list lengths differ. The manager merges list items from multiple model instances into a larger list as the phone scrolls and stores the expanded list in the cache. Cached data is used first when the API is called; if it is insufficient, the list is scrolled to obtain more data.
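The list cache can be sketched as merging each newly scrolled-in window into the cached list by detecting the overlap between consecutive screens, and scrolling only when the cache cannot satisfy a request. The function names and the `scroll` callback are hypothetical, and the overlap heuristic assumes list items have distinguishable keys (identical neighbouring items could be merged incorrectly).

```python
from typing import Callable, List, Sequence

def merge_scrolled(cache: List[str], window: Sequence[str]) -> List[str]:
    """Append a newly visible window, skipping the part that overlaps the cache tail."""
    for k in range(min(len(cache), len(window)), 0, -1):
        if cache[-k:] == list(window[:k]):      # longest cache suffix == window prefix
            return cache + list(window[k:])
    return cache + list(window)

def get_items(cache: List[str], start: int, end: int,
              scroll: Callable[[], Sequence[str]]) -> List[str]:
    """Serve from the cache first; scroll the phone list only if it is too short."""
    while len(cache) < end:
        window = scroll()
        if not window:
            break
        merged = merge_scrolled(cache, window)
        if len(merged) == len(cache):           # nothing new: bottom of list reached
            break
        cache[:] = merged
    return cache[start:end]

# Usage with a simulated sequence of scrolled screens.
windows = iter([["c", "d", "e"], ["e", "f"], []])
cache = ["a", "b", "c"]
items = get_items(cache, 1, 5, lambda: next(windows, []))
```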

Error Recovery.
Run-time interruptions due to errors fall into the following two cases:
• Route module failure: On the first failure, the manager performs global backs to return to the homepage and calls the Route module again. If the second attempt also fails, the destination page may be missing from the preset dataset, and users are prompted to add page data using the registration system.
• Requested data not found even after reaching the destination page: This can stem either from the data's absence, which requires manually adjusting the model to find an alternative solution, or from widget recognition errors, which can be addressed by annotating the unrecognized widgets and retraining the recognition strategy.
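The route-failure policy above can be sketched as one retry from the homepage followed by a request for new registration data. The function names, the `max_backs` bound, and the returned status strings are hypothetical; the phone is abstracted as callbacks.

```python
from typing import Callable

def go_home(current_page: Callable[[], str], press_back: Callable[[], object],
            home: str = "Home", max_backs: int = 10) -> bool:
    """Press global back until the homepage class is detected (bounded)."""
    for _ in range(max_backs):
        if current_page() == home:
            return True
        press_back()
    return current_page() == home

def route_with_recovery(try_route: Callable[[], bool], return_home: Callable[[], bool]) -> str:
    """First attempt, then one retry from the homepage, then ask for registration."""
    if try_route():
        return "ok"
    return_home()
    if try_route():
        return "ok"
    return "register_missing_page"   # destination likely absent from registered pages
```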

INTERACTIVE ANNOTATION MECHANISM
Manual annotation is required during model registration with the original UI (described in Section 4.1). Among the three components mentioned, the page jumping graph can be largely automated [17], and the context cursor is well defined and easy to annotate. However, page classification and widget recognition pose greater challenges. Since the classification criteria update dynamically with model modifications, pre-defined heuristic rules and pre-trained data-driven models are not applicable. In this section, we propose the Interactive Annotation Mechanism (IAM) to carry out the annotation task efficiently.

Workflow of the IAM
Since GUIs originate from human design logic and follow established design principles, the variations in GUIs are enumerable. By extracting sufficient features from page layout data and images, it is possible to generate a comprehensive set of rules that covers all cases. The challenge lies in refining a classification strategy for each class. Directly proposing strategies involving complex feature combinations is impractical for annotators. Furthermore, without a comprehensive understanding of mobile GUI design, annotators cannot confidently establish a robust strategy from sample interfaces. Additionally, pre-defined strategies cannot be adjusted dynamically when misclassifications occur.
In contrast to abstract classification strategies, humans clearly perceive the class to which each instance belongs. We therefore propose a human-computer collaborative mechanism that constructs the classification strategy iteratively, allowing continuous improvement from simple human input. Our interactive annotation mechanism, shown in Figure 8, continuously refines the classification strategy through iterative annotation and inspection. Steps (4), (5), and (6) iterate until annotators make no new annotations.

Heuristic Rules.
In the first round, we construct a fixed similarity function over a predefined set of features to determine whether two instances belong to the same class; a single annotation is enough to start the classification. The feature set and similarity function are given in Appendix B.
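Because the actual feature set and similarity function are in Appendix B, the sketch below substitutes a hypothetical stand-in: each page is reduced to a set of widget class names and resource ids, and two pages are judged to be of the same class when their Jaccard similarity reaches a threshold. The features, the 0.6 threshold, and all names are assumptions used only to show the shape of a first-round heuristic.

```python
from typing import Iterable, Set

def page_features(widgets: Iterable[dict]) -> Set[str]:
    """Hypothetical features: class names and resource ids of a page's widgets."""
    feats: Set[str] = set()
    for w in widgets:
        feats.add("cls:" + w.get("cls", ""))
        if w.get("res_id"):
            feats.add("id:" + w["res_id"])
    return feats

def jaccard(a: Set[str], b: Set[str]) -> float:
    """|A ∩ B| / |A ∪ B|, treating two empty sets as identical."""
    return len(a & b) / len(a | b) if (a or b) else 1.0

def same_class(page_a: Iterable[dict], page_b: Iterable[dict], threshold: float = 0.6) -> bool:
    return jaccard(page_features(page_a), page_features(page_b)) >= threshold

# One annotated exemplar (a chat page) and two unlabeled pages.
chat = [{"cls": "TextView", "res_id": "title"}, {"cls": "EditText", "res_id": "input"},
        {"cls": "Button", "res_id": "send"}]
chat_with_avatar = chat + [{"cls": "ImageView", "res_id": "avatar"}]
contact_list = [{"cls": "ListView", "res_id": "contacts"}]
```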

Iterative Update.
Heuristic rules, used as the initial classification strategy, cannot guarantee correctness (Table 1). These rules are tested on the dataset, and the results are visualized so that annotators can correct errors. We then train decision trees, an interpretable algorithm that is conducive to manual modification, on the new labels or confirmations, and refine them until the results are satisfactory. Although pages and widgets both follow this workflow, their implementations differ significantly, as detailed in Table 6.
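The iterative loop can be sketched with a tiny decision tree over binary "feature present" tests, split by Gini impurity, and a simulated annotator (an oracle function) who corrects one misclassified instance per round. This is a self-contained stand-in, not the authors' implementation: the features, labels, and one-correction-per-round policy are assumptions for illustration.

```python
from collections import Counter
from typing import FrozenSet, List, Optional, Tuple

Instance = FrozenSet[str]   # hypothetical presence/absence features of a page or widget

def _gini(labels: List[str]) -> float:
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def train(data: List[Tuple[Instance, str]]):
    """Grow a binary decision tree that splits on feature presence (Gini impurity)."""
    labels = [y for _, y in data]
    if len(set(labels)) == 1:
        return labels[0]                            # pure leaf
    best: Optional[Tuple[float, str]] = None
    for f in sorted(set().union(*(x for x, _ in data))):
        yes = [y for x, y in data if f in x]
        no = [y for x, y in data if f not in x]
        if not yes or not no:
            continue
        score = (len(yes) * _gini(yes) + len(no) * _gini(no)) / len(data)
        if best is None or score < best[0]:
            best = (score, f)
    if best is None:                                # identical features, mixed labels
        return Counter(labels).most_common(1)[0][0]
    f = best[1]
    return (f, train([d for d in data if f in d[0]]),
               train([d for d in data if f not in d[0]]))

def predict(tree, x: Instance) -> str:
    while isinstance(tree, tuple):
        f, yes, no = tree
        tree = yes if f in x else no
    return tree

def interactive_annotation(dataset: List[Instance], oracle):
    """Start from one annotation; the annotator fixes one error per round until none remain."""
    train_set = [(dataset[0], oracle(dataset[0]))]
    while True:
        tree = train(train_set)
        errors = [x for x in dataset if predict(tree, x) != oracle(x)]
        if not errors:
            return tree, len(train_set)
        train_set.append((errors[0], oracle(errors[0])))

labels = {
    frozenset({"EditText", "send"}): "Chat",
    frozenset({"ListView", "index_bar"}): "ContactList",
    frozenset({"ListView", "search"}): "ChatList",
    frozenset({"EditText", "send", "voice"}): "Chat",
}
dataset = list(labels)
tree, n_annotations = interactive_annotation(dataset, labels.__getitem__)
```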

Offline Dataset Validation
We established an offline dataset to evaluate the performance of two existing methodologies and the IAM. We developed UIAD Models for 12 applications across 9 categories, as shown in Table 7. Our selection criteria aimed to include applications with rich data, diverse functionality, high user engagement, and strong cross-device interaction potential. To meet these models' needs, we collected 6,413 pages and 116,516 widgets, to be classified into 339 and 7,981 classes, respectively.

Performance of Existing Methodologies.
We evaluated the following two existing methodologies.
• Pre-defined heuristic rules: A detailed implementation is available in Appendix B. This method has been used by prior interaction proxies [61]. We applied five-fold cross-validation on our dataset.
• Pre-trained data-driven models: We used Screen2Vec [35] with Euclidean distance for page classification and the semantic-icon-classifier [41] for widget recognition. Both models were trained on the Rico dataset [17] and tested on our dataset.
As the results in Table 1 demonstrate, both pre-defined heuristic rules and pre-trained data-driven models exhibited limitations: the uniform rules of the former could not cover every case, while the classification criteria of the latter differed from the models' specifications.

Performance of the IAM.
We evaluated the performance of annotators using the IAM. Unlike the two methodologies above, the IAM is a human-machine hybrid system for data annotation rather than a stand-alone model. Its effectiveness is assessed through the following metrics: It is important to note that these metrics depend only on the order of the cases in the dataset and are independent of the individual annotators (assuming no errors are made during annotation). Accordingly, we invited 10 annotators to each handle all cases in the dataset, shuffling the data randomly for each annotator to ensure an unbiased evaluation.
The experimental results show that, with the IAM, multiple rounds of verification and annotation achieved 100% precision and recall in recognizing all classes. Furthermore, the IAM offers the following advantages.
• Simplicity in generating strategies: The learned decision trees have a complex structure, averaging 11.27 nodes (sd=5.06) and a depth of 5.18 (sd=2.26). Deriving such a rule set manually is challenging, but with the IAM it can be extracted automatically by adding a few instances.
• High single-annotation gain: The average number of extra annotations was 1.58 (sd=1.31) for pages and 1.95 (sd=1.60) for widgets. Interactive annotation leverages human decision-making to provide more representative examples, effectively avoiding the repeated annotation of similar cases that may occur in traditional machine learning. For instance, as shown in Figure 9(c), adding a single image is sufficient to learn a more refined classification strategy.
• Dynamic adjustability: Subdivision and merging occur frequently during model editing. Figure 9 illustrates an example of merging images and text into one class, which can be further subdivided by message sender. This level of flexibility is not supported by pre-defined rules or pre-trained models.

ENHANCED DESIGN FLEXIBILITY AND DIVERSE APPLICATIONS WITH THE IPMANAGER
In this section, we highlight the enhanced design flexibility and diverse applications provided by the IPManager.We demonstrate the capabilities of IPManager by creating three distinct user interfaces for WeChat.

Innovative Design Pattern
The innovation encompasses the following three main aspects.

Flexible Mapping Options.
The reorganization of UI elements and the various mapping options support flexible and versatile design strategies for new UIs. Many-to-one page mappings enable data delivery from small to large screens, one-to-many mappings suit projection from large to small screens, and many-to-many mappings are commonly used to construct multi-modal distributed interfaces in AIoT scenarios.

Widget Customization.
Widgets can be customized with flexible trigger patterns. Many-to-one widgets bind a new UI widget to multiple original widgets with similar semantics across different mobile applications, enabling a single action to trigger multiple application functions. One-to-many widgets allow the same original widget to be triggered in multiple ways or modalities, such as initiating phone calls via touch, voice, or physical shortcut keys. Many-to-many widgets combine the features of the previous two, offering the potential for service combinations.
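The many-to-one and one-to-many widget bindings above can be pictured as a small dispatch table. The sketch below is a hypothetical illustration, not IPManager's actual interface: the `WidgetBindings` class, the new-widget ids, the app names, and the `fake_invoke` callback (standing in for a `use_method` request) are all assumptions; only the method names `SendMessage` and `VoiceCall` come from Fig. 3.

```python
from typing import Callable, Dict, List, Tuple

# (application, model method) pair that a new-UI widget can trigger.
Target = Tuple[str, str]


class WidgetBindings:
    """Hypothetical binding table between new-UI widgets and original-app methods."""

    def __init__(self) -> None:
        self._bindings: Dict[str, List[Target]] = {}

    def bind(self, new_widget: str, *targets: Target) -> None:
        # Many-to-one: several original targets attached to one new widget.
        # One-to-many: the same original target attached to several new widgets.
        self._bindings.setdefault(new_widget, []).extend(targets)

    def trigger(self, new_widget: str, invoke: Callable[[str, str], str]) -> List[str]:
        # Fire every bound method in binding order; unbound widgets do nothing.
        return [invoke(app, method) for app, method in self._bindings.get(new_widget, [])]


def fake_invoke(app: str, method: str) -> str:
    # Stand-in for sending a use_method request to the IPManager.
    return f"{app}.{method}"


bindings = WidgetBindings()
# One new "broadcast" button sends a message through two apps (many-to-one).
bindings.bind("broadcast_button", ("app_a", "SendMessage"), ("app_b", "SendMessage"))
# A touch button and a voice intent both start the same call (one-to-many).
bindings.bind("call_button", ("app_a", "VoiceCall"))
bindings.bind("voice_call_intent", ("app_a", "VoiceCall"))

print(bindings.trigger("broadcast_button", fake_invoke))
print(bindings.trigger("voice_call_intent", fake_invoke))
```

Keeping bindings as plain data means a many-to-many binding needs no extra machinery: it is simply several widgets each bound to several targets.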

Tailoring Applications for Specific User Groups.
The flat management of methods in the UIAD Model enables developers to create simplified application versions tailored to specific user groups, such as accessibility services and services for the elderly. This approach addresses the unique needs and preferences of these user groups, improving usability and user experience.

Distributed Interfaces and Data Integration.
The UIAD Model's support for interfaces distributed across devices lets users leverage mobile phones as data sources and integrate personal data into different UIs for various scenarios. This enables seamless data flow and a consistent user experience across multiple devices.

Application Examples
To demonstrate the three aspects mentioned above, we collaborated with the WeChat mobile application and invited three development teams to create three distinct user interfaces based on the UIAD Model: a tablet-sized car display GUI, a smartwatch GUI, and a voice user interface. GUI reconstruction on the car display helps drivers interact with minimal attention cost. The car GUI's layout is designed specifically for car displays, and the model APIs render the GUI directly. Casting from the phone to the car display exemplifies a "small to big" transformation, in which content from multiple phone pages is displayed on a single car display page, as shown in Figure 10.
GUI reconstruction on the watch improves the user experience in high-mobility scenarios, such as checking messages while running. Casting to watches demonstrates a "big to small" transformation: the parameters required for a single API request may be collected across multiple pages on the watch, as shown in Figure 11. The voice user interface (Figure 12) is built on a modality in which the machine completes tasks based on users' natural language commands. The developers employed Large Language Models to parse the commands into UIAD Model APIs and their corresponding parameters, creating a simple voice user interface.
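Because an LLM may return a method or parameter that the model does not define, it seems prudent to check its output before issuing a request. The sketch below assumes the LLM is prompted to answer with JSON of the form `{"method": ..., "params": {...}}`; that format, the parameter names, and `to_use_method_request` are hypothetical. The method names follow Fig. 3 and the `use_method` request name follows Fig. 10, but the paper does not describe how the voice team validated LLM output.

```python
import json
from typing import Any, Dict, Set

# Method names follow Fig. 3; the parameter names are assumptions.
METHOD_SCHEMAS: Dict[str, Set[str]] = {
    "VoiceCall": {"contact"},
    "SendMessage": {"contact", "content"},
}


def to_use_method_request(llm_output: str) -> Dict[str, Any]:
    """Check an LLM's JSON answer against known methods and wrap it as a request."""
    parsed = json.loads(llm_output)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(parsed, dict):
        raise ValueError("expected a JSON object")
    method = parsed.get("method")
    params = parsed.get("params", {})
    if not isinstance(method, str) or method not in METHOD_SCHEMAS:
        raise ValueError(f"unknown method: {method!r}")
    if not isinstance(params, dict):
        raise ValueError("params must be a JSON object")
    expected, provided = METHOD_SCHEMAS[method], set(params)
    if provided != expected:
        raise ValueError(
            f"{method}: missing {sorted(expected - provided)}, "
            f"unexpected {sorted(provided - expected)}"
        )
    return {"api": "use_method", "method": method, "params": params}


request = to_use_method_request(
    '{"method": "SendMessage", "params": {"contact": "Alice", "content": "On my way"}}'
)
print(request)

try:
    to_use_method_request('{"method": "DeleteAll", "params": {}}')
    rejected = False
except ValueError:
    rejected = True
print("unknown method rejected:", rejected)
```

Rejecting malformed output before it reaches the phone avoids triggering page jumps for a request that cannot succeed.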
These examples show that our approach enables developers to create user interfaces that cater to different screen sizes, modalities, and user needs while supporting seamless data integration across devices.

WORKFLOW PERFORMANCE EVALUATION
Based on the system design above, we implemented the system and designed a user study to demonstrate the effectiveness of our approach. This study was reviewed and approved by an appropriate Institutional Review Board (IRB).

Implementation
We implemented the IPManager, comprising the offline registration system and the run-time system, and developed a UI deployment tool to validate model usability.

Registration System.
We developed an interactive annotation platform using Vue for model design and registration with the original GUI (Figure 13). A phone application obtains GUI layouts and captures screenshots using Android's Accessibility Service API and MediaProjection API, while a Flask server manages data transfer and decision tree training for classification.

Run-time System.
A Flask server provides real-time semantic analysis and the model manager. It generates the UIAD Model and sends page-jumping instructions when necessary. The phone application sends page data to the server and receives instructions, which guide it to simulate on-screen operations and complete the jump via Android's Accessibility Service API.

Target-device UI Deployment.
Given the HTML file of a designed target-device UI, the deployment tool specifies the content source of each UI element by binding it to a model API (Figure 14). This generates executable front-end code, allowing the GUI to run as web pages and to support voice command input.

Procedure
The user study involved 4 distinct study groups corresponding to the following 4 steps.

Identifying Use Cases.
We interviewed 3 experienced AIoT product managers (P1-P3) to identify functional requirements for various use cases. Based on their opinions, we designed 4 use cases and their corresponding applications.

Developing UIAD Models.
We engaged 8 participants, including 4 junior programmers (P4-P7) with less than 2 years of programming experience and 4 non-programming students (P8-P11), to create UIAD models for the 4 use cases. Each participant was presented with the use cases in a Latin square order for counterbalancing, resulting in a total of 32 models. Participants used our IPManager to design Model Specifications, register them with the mobile interface, and test Model APIs.

Developing Target-Device UIs.
We hired 8 professional designers (P12-P19) to create the required UI for each use case, tailored to the target devices. Following a Latin square order, each designer worked with 4 UIAD models developed in the previous step. They then bound each UI element to the corresponding model API and deployed the new interfaces automatically.

Experiencing Target-Device UIs.
We invited 8 end-users (P20-P27), aged 18 to 26, to use the 4 developed UIs in different use cases on their own devices. They provided feedback to assess the usability and effectiveness of the implemented applications on the target devices. The experiment lasted two weeks, with each participant spending over 5 hours using each UI.

Results
We will analyze the results of each step of the user study.

Identifying Use Cases.
In our discussions, the product managers endorsed our approach: P2 and P3 noted the model's adaptability to personalized needs, and P1 stated that deployment on various devices can be achieved through our approach. They also expressed further expectations. P1 said "I hope to support multi-device input and output when playing games and watching movies. However, the current solution seems to have limited ability for video streams." P3 suggested incorporating data from other devices, such as health data from a smartwatch.
Additionally, they noted that our new approach would bring significant changes to product design ideas. P2 stated "[This approach] clearly excels in multi-device adaptation, prompting us to focus on designing products with this aspect in mind, instead of targeting specific devices as in the case of existing smart homes and smart cockpits." P1 added, "We can easily consider separating information presentation and input, reducing user operation complexity, especially when user interaction is limited, such as while driving." P3 mentioned that in a home setting, a smartphone could be used for control while a larger screen displays a 3D representation of the home's status.
In the end, we selected 4 representative use cases from our interviews for the next steps of the user study, as shown in Table 2. The details of these use cases are provided in Appendix F.1.

Statement | Median
1. You can understand how the UIAD Model works. | 6
2. You think the annotation platform is easy to use. | 5
3. You can design a good UIAD Model. | 6.5
4. You can register the model with the original GUI. | 6
5. You find the operation efficient. | 6.5
6. You think the annotation platform is smart. | 6.5
7. You think the UIAD model is not complicated. | 5
8. You are willing to use our system. | 6
Fig. 15. Distribution of participant ratings for each statement in Table 3. Ratings range from 1 (strongly disagree) to 7 (strongly agree).

Developing UIAD Models.
We present the evaluation results of the participants' performance, as well as their subjective feedback using a 7-point Likert scale in Table 3 and Figure 15.
Participants reported that the learning cost was within tolerable limits (Statements 1, 2, 3, 4, and 7), with an average training time per participant of 35.97 minutes (sd = 2.79), using WeChat as the example. The success rate of independently completing model development was 90.63%, with the remaining 3 models completed with minimal guidance. The programmers had a 100% independent completion rate, while the non-programmers had an 81.25% rate.
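The reported completion rates follow directly from the study design (8 participants, 4 use cases each). The check below assumes that all 3 guided models came from non-programmers, which we infer from the 100% independent completion rate reported for programmers.

```python
# Counts implied by the study design: 8 participants x 4 use cases = 32 models.
# Assigning the 3 guided models to non-programmers is an inference from the
# reported 100% independent completion rate for programmers.
programmer_models = 4 * 4       # P4-P7, four use cases each
non_programmer_models = 4 * 4   # P8-P11, four use cases each
guided_models = 3


def completion_rate(independent: int, total: int) -> float:
    """Percentage of models completed without guidance."""
    return 100 * independent / total


total_models = programmer_models + non_programmer_models
overall = completion_rate(total_models - guided_models, total_models)
programmers = completion_rate(programmer_models, programmer_models)
non_programmers = completion_rate(non_programmer_models - guided_models, non_programmer_models)
print(overall, programmers, non_programmers)
```

The exact overall rate is 29/32 = 90.625%, which appears as 90.63% after rounding.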
Most participants acknowledged that the model effectively reduced development costs (Statement 5) and were willing to use the system for multi-device interface deployment (Statement 8). As shown in Figure 16a, participants' proficiency improved as they developed more models, reducing completion time (p-value = 0.006). Figure 16b shows a difference in completion time between programmers and non-programmers (p-value = 0.017), which we attribute to the model aligning more closely with programmers' thinking. P5 stated, "The model is easy to understand from an Object-Oriented Programming (OOP) perspective." Additionally, as seen in Figure 16c, the time spent developing a model correlated positively with its complexity, which depends on the requirements of the use case. Notably, the slopes of the fitted lines for programmers and non-programmers differed significantly (p-value = 0.010), indicating that programmers adapted in less time as model complexity increased.
Notably, whereas senior programmers estimated that existing workflows would require at least dozens of hours of development time, our approach significantly reduces development time and even enables non-programmers to develop models.
Participants appreciated our approach. P6 believed that the model is decoupled from the specific settings of different scenarios and configured as a more universal framework. P9 viewed it as a more concise, abstract, and well-designed outer layer of applications. Participants also considered the annotation platform very smart (Statement 6). P10 found the overall experience smooth and the configuration steps interconnected. However, P8 hoped for further improvements: "Currently, the program strictly follows a step-by-step configuration, and each attribute match must be exact… More user-friendly interaction methods can be considered." Similarly, P10 expressed a desire for more intelligent fuzzy matching. Participants also identified further potential in our approach. P4 said "Constructing applications for various devices with the model as the kernel will become a new paradigm." P5 suggested that it could be extended to gesture-oriented user interfaces, saving a lot of code. P6 expected that "This would enable parallel operation across different devices." P7 expressed a strong demand for porting more applications (such as their frequently used running app) to the watch and for interconnecting multiple AIoT devices at home through one terminal.
However, participants sometimes encountered ambiguities. P7 said "I'm not sure whether a concept should be further split out." P6 expressed concerns about "whether the pairing and annotation are comprehensive and complete." At the current stage, participants can make these decisions freely; when new requirements or issues arise later, they can return and modify the model accordingly.

Developing Target-Device UIs.
We evaluated the designers' development costs in the study. The average training time per participant was 12.13 minutes (sd = 1.83), using WeChat as the example. The success rate of completing tasks independently was 90.63%, and the remaining 3 failures were resolved after the requirements were clarified.
The total time (Figure 17a) and the time for binding the model APIs (Figure 17c) did not change significantly with the number of tasks the participants had completed (p-value = 0.204 and 0.749, respectively). Instead, both were strongly associated with the use case (Figure 17b, p-value = 0.006; Figure 17d, p-value = 0.039). The average time spent on API binding was 4.91 minutes (sd = 2.37), accounting for only 2.06% of the total time, indicating that most of the time was spent on UI design.
Participants reported reduced time compared to existing workflows. P12 said "It reduced the delivery cost with the front-end development department and there was no need for the original work on organizing the information hierarchy of the page." P14 said "The tree is better understood than the product manager documents." P17 and P19 noted significant time savings when the information structure and function were well-defined. P13 said "The design work is almost the same as before. The main learning cost is in learning the principle of the model but completing the binding is very efficient." P15 appreciated the run-time access to data via APIs for UI validation, which saved them from entering data manually.
Some participants occasionally encountered confusion but resolved it easily. P15 said "Some nodes [in the model] appear somewhat ambiguous and need further explanation." P18 said "Node names [in the model] can be unclear, but understanding improves when related to the original interface." P14 stated that designers still needed to build a design hierarchy upon understanding the model.
Participants found that our approach met their design requirements, but it did affect designers' thinking and behavior to some extent. Unlike the traditional workflow, our approach encourages independent work, reducing interaction between designers and product managers, which can lead to occasional uncertainty in design decisions. For instance, P17 was uncertain about the display of ratings, "whether they should be stars or specific scores". P16 was unsure whether the term "find" should be rendered as a search box or a search icon. P18 said "Sometimes I am unsure if my confusion is because I have not fully understood the model." While our approach permits design freedom, participants still sought confirmation from product managers because they were unfamiliar with the new workflow.

Experiencing Target-Device UIs.
The target-device UI examples for each use case are illustrated in Figures 21, 22, 23, and 24 in the Appendix. We evaluated the performance of the 32 developed UIs and the participants' subjective feedback using a 7-point Likert scale, as shown in Table 4 and Figure 18.

Statement | Median
1. You are satisfied with the target-device UIs. | 6
2. The target-device UIs are easy to navigate and use. | 7
3. The design of the target-device UIs is clear and understandable. | 6
4. The target-device UIs are responsive and fast. | 7
5. There are no errors or issues while using the UIs. | 6
6. You are willing to use these target-device UIs. | 6

From the usage logs of participants (P20-P27), we collected 16,704 API call logs with a success rate of 99.78%; failures were primarily due to network issues. The average response time for each API call was 413.73 milliseconds (sd = 353.05), varying across use cases (Figure 19a, p-value < 0.001).
As shown in Figure 19b, the total response time comprises five segments, each differing significantly from the others (p-value < 0.001). Page loading accounted for the largest proportion (56.48%), followed by network transfer (24.63%). Semantic analysis averaged only 36.36 milliseconds, with relatively small fluctuations (sd = 26.59). Waiting time, the time required for a new page to stabilize with a ready layout and images, depends on factors such as phone performance, network conditions, page content, and application optimization. In our experiment, we used a fixed empirical waiting time after each operation. With a 50-millisecond waiting time, task success rates reached 100%; without it, they dropped to 67.64%. In practice, owing to multi-threading, caching, and background updates, a considerable portion of this time is not perceived by users. This is particularly evident in Use Cases 2 and 4, which feature many automatic background updates that require no user intervention.
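To put the reported proportions in absolute terms, the sketch below converts them to approximate milliseconds. It assumes that each percentage applies to the mean response time of 413.73 ms; if the paper averaged per-call ratios instead, the true figures may differ slightly. The remainder covers the two segments not broken out here, including waiting time.

```python
# Reported figures from the usage logs.
MEAN_RESPONSE_MS = 413.73
SEMANTIC_ANALYSIS_MS = 36.36
reported_shares = {"page_loading": 0.5648, "network_transfer": 0.2463}

# Assumption: each share applies to the mean response time.
approx_ms = {name: share * MEAN_RESPONSE_MS for name, share in reported_shares.items()}
semantic_share = SEMANTIC_ANALYSIS_MS / MEAN_RESPONSE_MS
# Whatever is left covers the remaining two segments, including waiting time.
remaining_share = 1 - sum(reported_shares.values()) - semantic_share

for name, ms in approx_ms.items():
    print(f"{name}: ~{ms:.1f} ms")
print(f"semantic_analysis: {semantic_share:.1%}, remaining: {remaining_share:.1%}")
```

Under this assumption, page loading takes roughly 234 ms per call, which is why the design guidelines emphasize reducing page jumps.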
Table 4 shows that most participants found the developed target-device UIs satisfactory in terms of ease of use (Statements 2 and 3), robustness (Statement 5), and perceived delay (Statement 4). Participants were also enthusiastic about using these UIs (Statements 1 and 6). For example, P22 stated "The ordering app wasn't previously compatible with watches or voice, but now I can order while I am riding bikes. It is amazing." P25 said "[Use Case 4] The combination of the functions of two apps is fantastic." P21 and P24 liked being able to use their familiar navigation software on the HUD. Interestingly, P27 said "Removing ads is great." Despite these positive remarks, participants also raised suggestions and concerns. P27 wanted faster startup times. Privacy and security issues were also noted: P26 stated "I'm worried about having my phone content collected." P23 said "If I accidentally display private information on the big screen, it wouldn't be good." Additionally, new requirements surfaced that had not been considered during the development phase. They fell into two areas. The first concerned overlooked original phone data, as in P20's observation that certain food combination options were missing from the ordering interface. The second concerned user-generated requirements, such as P24's expectation of a one-click feature to copy a song from a chat, paste it into a music app, and play it directly. In response, we informed participants that our approach allows incremental updates and low-cost modifications to accommodate new requirements and extend functionality.

DISCUSSION
In this section, we discuss our approach based on the findings from the user study conducted above.

Design Guidelines
To enhance speed and robustness, we propose the following design guidelines for using the model:
• Minimize the number of page jumps on the phone, as they significantly increase latency and the risk of unstable page loading, and try to avoid cache misses when calling the API. For instance, the progression of the new UI should align as closely as possible with the natural flow of context switching, avoiding cross-level or frequent jumps of the context cursor on the model tree.
• Be aware of spontaneous updates to the original GUI, that is, changes that are not user-driven, such as receiving a new message. The system may not track such updates, especially if the phone is not displaying the "new message found" page at that time. To counter this, the new interface system should proactively call the API to monitor updates.
• Keep in mind that mapping new UI elements to APIs primarily supports basic cases. If the new UI requires more complex display logic, such as a multi-step API with nested parameters, developers will need to manage the new interface state and the relevant parameter variables in their own code.
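The guideline on spontaneous updates can be followed with simple polling on the new-interface side. In this minimal sketch, `fetch_list` stands in for a `get_list` request (the request name appears in Fig. 10), and the `"id"` key used to tell messages apart is a hypothetical field; the real model may identify items differently.

```python
import time
from typing import Callable, Iterable, List, Set


def poll_new_items(fetch_list: Callable[[], Iterable[dict]], seen_ids: Set[str],
                   key: str = "id") -> List[dict]:
    """Return items not reported before and remember their ids."""
    fresh = []
    for item in fetch_list():
        if item[key] not in seen_ids:
            seen_ids.add(item[key])
            fresh.append(item)
    return fresh


def watch(fetch_list: Callable[[], Iterable[dict]], on_new: Callable[[dict], None],
          rounds: int, interval_s: float = 1.0) -> None:
    """Poll a fixed number of times; the first round reports the initial snapshot."""
    seen: Set[str] = set()
    for _ in range(rounds):
        for item in poll_new_items(fetch_list, seen):
            on_new(item)
        time.sleep(interval_s)


# Fake snapshots standing in for successive get_list responses.
snapshots = iter([
    [{"id": "m1", "text": "Hello"}],
    [{"id": "m1", "text": "Hello"}],
    [{"id": "m1", "text": "Hello"}, {"id": "m2", "text": "Lunch?"}],
])
received: List[dict] = []
watch(lambda: next(snapshots), received.append, rounds=3, interval_s=0)
print([m["text"] for m in received])
```

The polling interval is a trade-off: each poll may force page jumps on the phone, so polling too often works against the first guideline.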

Capabilities of the UIAD Model
Compared with the preliminary Interaction Proxy [61], which is still at the proof-of-concept stage and not fully implemented, our proposed UIAD model offers several capabilities by efficiently decoupling a mobile application's UI from its service:
Understandable Description and Efficient APIs: Unlike existing proxies [58,62] that directly map the original UI to the target one, and therefore repeatedly handle the original UI's complexity and dynamicity, the UIAD model offers a more holistic solution. It generates a structured description of an application's service semantics that is comprehensible to both programmers and non-programmers. Applicable across different target devices, the model offers efficient APIs that let developers focus on application functionality, unencumbered by the original UI's intricacies, which simplifies the development of new interface systems. Further, with the corresponding run-time support, these APIs ensure the high performance of new interface systems.
Robust UI Reconstruction: Existing interface mapping algorithms, which use heuristic rules to compute layout signatures from developers' specifications, are rudimentary. As Zhang et al. pointed out [61], "It can therefore be challenging to reason about the state of an app." To overcome this challenge, we use the IAM to build the model, keeping the annotation burden low while maintaining high recognition accuracy for pages and widgets. This approach underpins the robustness of new interface systems.
Complex Functions Management: In contrast to single-task-oriented programming by demonstration [33], the UIAD Model covers the full range of an application's functionality. It lets developers freely organize and invoke diverse APIs, for example by chaining and nesting them, to support complex tasks. Because it handles intricate semantic relationships well, the model is particularly suited to the heavily engineered task of UI reconstruction.
Crowd Engagement: As our user study shows, the UIAD Model is easy to understand for both programmers and non-programmers. This accessibility broadens engagement, lowers the development threshold and cost, and thereby supports its adoption and effectiveness.

Feasibility of the UIAD Model in the Future
We discuss the future feasibility of the proposed model from three aspects:
Technical Feasibility: The maturity of Robotic Process Automation (RPA) on PC platforms validates the potential of automating user interface interactions. Coupled with AI research into web page understanding, extending these technologies to smartphones supports the prospect of automated model construction.
Implementation Feasibility: Our user study results confirm that our model meets the needs of various departments and effectively reduces development costs. Although it is primarily aimed at functional migration, it can support currently incompatible applications, such as video streaming or gaming, with the addition of video transmission technology.
Market Feasibility: The limited number of applications for AIoT devices like smartwatches presents a challenge to ecosystem development. Our model can expedite the migration of existing applications to these platforms, thereby fostering long-term growth and diversity.

LIMITATIONS & FUTURE WORK
This section highlights our work's limitations and proposes potential future directions.
Enhancing Data Loading: Our current reliance on Android's Accessibility Service API results in slower data retrieval and limited data availability. Future work could explore acquiring data directly from the phone system, which could reduce page loading time and enrich the semantics of the model.
Automating Model Generation: Our goal is to automate the generation of UIAD Models from existing mobile GUIs through reverse engineering. The main challenge is the subjective nature of semantic analysis, which makes it difficult to control the model structure without human intervention, especially when handling complex semantics across applications. Accurately representing intricate nested structures and relationships is crucial for optimal UI generation. Privacy rights, data importance, and additional functions must also be taken into account when dealing with advanced semantics. Integrating these factors may help automated model generation cope with real-world complexity, leading to more efficient and user-friendly interfaces.
Automating User Interface Construction: Our current approach requires human input to determine the new UI layout and which APIs to call. We aim to generate UIAD Models from existing device interfaces, learn the transformation from models to different interfaces, and use context modeling [28] along with existing model-based UI deployment tools to generate optimal user interfaces automatically, potentially using large models to lower development costs.

CONCLUSION
This paper presents the Interaction Proxy Manager (IPManager), which decouples interface and semantics for reliable UI reconstruction. Its UIAD Model is defined based on the semantic analysis of existing mobile applications and offers APIs for alternative device systems. IPManager operates in two distinct phases: an offline phase and a run-time phase. In the offline phase, the Interactive Annotation Mechanism assists annotators in registering the model with mobile pages and learns classification strategies for pages and widgets. It effectively reduces annotation difficulty and cost while ensuring recognition accuracy, as validated on an offline dataset. In the run-time phase, the system generates the UIAD Model using the learned strategies, providing information or invoking methods in response to requests from new interface systems. We showcased IPManager's design flexibility and application diversity by developing three applications that extend phone services to car displays, smartwatches, and voice-oriented interfaces. We also conducted a user study to evaluate the entire workflow, involving product managers, developers, designers, and end-users. The results validate the usability, efficiency, comprehensibility, and robustness of our approach. We anticipate that our work will contribute to the broader field of UI reconstruction, facilitating the deployment of more services across diverse devices and ultimately enriching the user experience.

B.2 Similarity Function
Using our own dataset, we determined whether two items belonged to the same class through a similarity function (Algorithm 1). We established thresholds by leveraging the Receiver Operating Characteristic (ROC) curve.
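The text does not say which point on the ROC curve was chosen. One common criterion is Youden's J statistic (TPR minus FPR); the sketch below uses it as an assumed stand-in. Algorithm 1 itself is not reproduced, so the similarity scores are toy values in [0, 1] and `youden_threshold` is a hypothetical helper.

```python
from typing import List, Tuple


def youden_threshold(scores: List[float], same_class: List[bool]) -> Tuple[float, float]:
    """Return (threshold, J) maximizing Youden's J = TPR - FPR.

    A pair is predicted to be in the same class when its score >= threshold.
    Each distinct score is tried as a candidate threshold (one point on the ROC curve).
    """
    positives = sum(same_class)
    negatives = len(same_class) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need both same-class and different-class pairs")
    best_threshold, best_j = float("inf"), float("-inf")
    for threshold in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, same_class) if s >= threshold and y)
        fp = sum(1 for s, y in zip(scores, same_class) if s >= threshold and not y)
        j = tp / positives - fp / negatives
        if j > best_j:
            best_threshold, best_j = threshold, j
    return best_threshold, best_j


# Toy similarity scores for labelled item pairs (True = same class).
scores = [0.95, 0.90, 0.80, 0.55, 0.40, 0.30]
labels = [True, True, True, False, False, False]
threshold, j = youden_threshold(scores, labels)
print(threshold, j)
```

On real data the classes overlap, so J stays below 1 and the chosen threshold trades missed matches against false merges.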

D THE IMPLEMENTATION OF THE IAM
While both pages and widgets follow the IAM workflow, their implementations differ significantly, as detailed in Table 6.
Table 6. Differences between the page classification and the widget recognition.

Aspect | Page Classification | Widget Recognition
Purpose | Each page corresponds to a unique class. | A widget can be mapped to multiple semantic elements in the model.
Model Implementation | A multi-class decision tree. | A bi-class decision tree for each semantic element.
Annotation | |

E OFFLINE DATASET VALIDATION
Aiming to validate the IAM, we constructed an offline dataset by developing UIAD Models for 12 applications across 9 categories, as shown in Table 7. We listed the following use cases based on the requirements from AIoT product management to evaluate the performance of our workflow:
Table 13. Full results of the subjective feedback from UIAD Model developers on a Likert scale from 1 (strongly disagree) to 7 (strongly agree). Statement numbers refer to Table 3.

Fig. 1. Challenges posed by the dynamicity of the interface. In this example of WeChat, green boxes indicate problematic widgets. (a) The avatar widget is not available on the layout. (b) A wide variety of widgets is involved in the chat. (c) The layout hierarchy is confusing. (d) An unrecognizable page.

Fig. 2. How the Interaction Proxy Manager works. It extracts a UI-Independent Application Description (UIAD) Model from an existing mobile application and provides information and method APIs for alternative interface systems to construct GUIs on the smartwatch, the car display, and a VUI.

Fig. 3. Part of the UIAD Model structure of WeChat. As the starting point, the Root Object has 2 fixed-value properties. It has two Object List children: Contact List and Message List. Contact List contains Contact objects, describing all properties of each contact. Similarly, Message List contains Message objects, organizing all data related to each message. Additionally, two methods, "VoiceCall" and "SendMessage", are defined with detailed parameters. At this stage, most of the Properties' values and all the Execution Sequences are still empty, awaiting data from the original interface.
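The skeleton in Fig. 3 can be sketched as plain data classes. This is an illustrative representation under stated assumptions, not the paper's schema: the class names, the two root property names and their values, and the method parameter names are hypothetical. Methods are kept in a flat dictionary, matching the flat method management described for the UIAD Model, and lists and execution sequences start empty as in the figure.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class ModelObject:
    name: str
    properties: Dict[str, Optional[Any]] = field(default_factory=dict)  # None until filled from the GUI
    children: Dict[str, "ObjectList"] = field(default_factory=dict)


@dataclass
class ObjectList:
    item_type: str  # e.g. "Contact" or "Message"
    items: List[ModelObject] = field(default_factory=list)


@dataclass
class Method:
    parameters: List[str]
    execution_sequence: List[str] = field(default_factory=list)  # empty until registration


@dataclass
class UIADModel:
    root: ModelObject
    methods: Dict[str, Method] = field(default_factory=dict)  # flat, not nested in objects


wechat = UIADModel(
    root=ModelObject(
        name="Root",
        # Two fixed-value properties; these names and values are placeholders.
        properties={"app_name": "WeChat", "platform": "Android"},
        children={
            "Contact List": ObjectList(item_type="Contact"),
            "Message List": ObjectList(item_type="Message"),
        },
    ),
    methods={
        "VoiceCall": Method(parameters=["contact"]),
        "SendMessage": Method(parameters=["contact", "content"]),
    },
)
print(list(wechat.root.children), list(wechat.methods))
```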

Fig. 4. IPManager is in charge of the generation and run-time support of the UIAD Model. The dashed box area is the registration system in the offline phase, and the solid box area is the run-time system.

4.1.1 Page Classification. Mobile pages are categorized by topic. Page classification serves as the first feature for GUI widget recognition and forms the foundation of the page jumping graph, helping the run-time system identify the current state and locate the correct routes. The classification criteria can change dynamically: when pages within the same class create ambiguity in widget recognition or require a page jump between them, the class can be divided into subclasses.

Fig. 6. (a) The current page displayed on the phone and (b) its corresponding UIAD Model instance.

Fig. 7. Overview of how the model manager works.

4.3.2 Route. Data missing occurs when the state of the new UI and the original mobile GUI are not synchronized.
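The Route module is only outlined in this excerpt. Assuming the page jumping graph built from page classification is an unweighted directed graph whose edges are labelled with on-screen operations, a breadth-first search is one straightforward way to find the fewest jumps to the page holding the missing data. The page classes, operation names, and `find_route` below are hypothetical, and the paper's own routing may differ.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple

# Hypothetical page jumping graph: page class -> [(operation, next page class)].
Graph = Dict[str, List[Tuple[str, str]]]


def find_route(graph: Graph, start: str, goal: str) -> Optional[List[str]]:
    """Breadth-first search for the fewest page jumps from start to goal."""
    if start == goal:
        return []
    queue = deque([start])
    came_from: Dict[str, Tuple[str, str]] = {}  # page -> (previous page, operation)
    visited = {start}
    while queue:
        page = queue.popleft()
        for op, nxt in graph.get(page, []):
            if nxt in visited:
                continue
            visited.add(nxt)
            came_from[nxt] = (page, op)
            if nxt == goal:
                # Walk back from the goal to rebuild the operation sequence.
                ops = []
                while nxt != start:
                    prev, step = came_from[nxt]
                    ops.append(step)
                    nxt = prev
                return ops[::-1]
            queue.append(nxt)
    return None  # goal unreachable from start


PAGES: Graph = {
    "ChatList": [("tap_chat", "Chat"), ("tap_contacts_tab", "ContactList")],
    "Chat": [("tap_back", "ChatList"), ("tap_avatar", "ContactProfile")],
    "ContactList": [("tap_contact", "ContactProfile"), ("tap_chats_tab", "ChatList")],
    "ContactProfile": [("tap_send_message", "Chat")],
}
print(find_route(PAGES, "ChatList", "ContactProfile"))
print(find_route(PAGES, "Chat", "ContactList"))
```

Counting jumps rather than time is a simplification; since page loading dominates response time, weighting edges by measured load time and using Dijkstra's algorithm would be a natural refinement.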

Fig. 8. Workflow of the interactive annotation mechanism. (1) The UIAD Model Specification is designed. (2) Annotators label the class to which each page or widget belongs. (3) The system applies pre-defined heuristic rules as the original classification strategy. (4) The strategy is tested with the dataset. (5) Annotators modify or confirm some of the results. (6) The system uses the new annotations for decision tree training. Steps (4), (5), and (6) iterate until annotators no longer make new annotations.
Training Set. To further reduce the annotation workload, a series of heuristic rules are employed to automatically generate more negative examples based on the existing annotated data, resulting in an expanded training set.
5.1.4 Summary. Figure 9 demonstrates an example of training a widget recognition strategy through interactive annotation. Ideally, only one annotation per class is needed. To prevent redundancy, only unconfirmed results are sent to annotators. The strategy is tested on new pages and adjusted with additional examples when required. For similar cases, a single example suffices to adapt the strategy, avoiding exhaustive enumeration, as demonstrated in Figures 9(b) and 9(c), where the incorrectly recognized message time widgets are removed.
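The heuristic rules for generating negative examples are not listed in this excerpt. The sketch below shows one plausible rule, purely as an assumption: on pages that already carry an annotation for a semantic element, unlabelled widgets become negatives, except those sharing an Android resource id with a positive, since those may be unlabelled positives (such as other message bubbles). The `Widget` fields and `expand_negatives` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass(frozen=True)
class Widget:
    page: str          # screenshot / layout dump the widget comes from
    index: int         # position in that layout dump
    resource_id: str   # Android view resource id (assumed to be available)
    text: str = ""


def expand_negatives(widgets: List[Widget], positives: Set[Widget]) -> Set[Widget]:
    """Heuristically derive negative examples for one semantic element.

    Only pages that already carry an annotation are used, and widgets sharing a
    resource id with a positive are skipped, since they may be unlabelled positives.
    """
    annotated_pages = {w.page for w in positives}
    positive_ids = {w.resource_id for w in positives}
    return {
        w for w in widgets
        if w.page in annotated_pages
        and w not in positives
        and w.resource_id not in positive_ids
    }


page = [
    Widget("chat_1", 0, "msg_text", "Hello"),
    Widget("chat_1", 1, "msg_text", "See you"),
    Widget("chat_1", 2, "msg_time", "10/24/22 5:12 PM"),
    Widget("chat_1", 3, "avatar"),
    Widget("chat_2", 0, "msg_time", "9:03 AM"),
]
negatives = expand_negatives(page, positives={page[0]})
print(sorted(w.text or w.resource_id for w in negatives))
```

The rule is deliberately conservative: ambiguous widgets are left unlabelled rather than guessed, and the interactive loop can still correct them later.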

Fig. 9. The annotation process for recognizing all widgets of message content, using the feature set available in Appendix B.1.2. (a) The "Hello" widget was initially annotated. (b) The system recognized several widgets based on the annotated example. The annotator removed the "10/24/22 5:12 PM" widget, as it represents the message's time, not content. (c) The system corrected the misrecognition of message time and added recognition of the "Duration: 00:04" widget. The puppy image widget was not recognized and was manually added. (d) The system added recognition for the "1.pptx" widget, successfully recognizing all message content.

Fig. 10. Interaction Proxy system on the car display. The car display sends get_list and use_method API requests to the IPManager during runtime.

Fig. 11. Interaction Proxy system on the smartwatch. The smartwatch sends a use_method API request to the IPManager during runtime.

Fig. 12. Interaction Proxy system on the voice user interface.

Fig. 16. Average time spent by participants on developing UIAD models. (a) Completion time of each task attempt. (b) Completion time of each use case. (c) The relationship between completion time and the number of nodes in the model. Error bars in (a) and (b) indicate standard deviation. The details of the statistical evaluation are provided in Appendix F.3.1.

Fig. 17. Average time designers spent developing target-device UIs. (a) Total time for each task attempt. (b) Total time for each use case. (c) Time spent binding the API for each task attempt. (d) Time spent binding the API for each use case. Error bars indicate standard deviation. Details of the statistical evaluation are provided in Appendix F.3.2.
Fig. 19. The average response time for each API call. (a) Average response time for each API call in different use cases; error bars indicate standard deviation. (b) Distribution of the response time. Details of the statistical evaluation are provided in Appendix F.3.3.

Fig. 21. A target-device UI example for Use Case 1. (a) The original mobile phone interface of NetEase Music. (b) The newly deployed UI.

Fig. 22. A target-device UI example for Use Case 2. (a) The original mobile phone interface of Amap. (b) The newly deployed UI.

Fig. 23. A target-device UI example for Use Case 3. (a) The original mobile phone interface of Meituan. (b) The newly deployed UI. The images have been post-processed to translate the text into English.

Fig. 24. A target-device UI example for Use Case 4. (a) The original mobile phone interface of the Schedule application. (b) The original mobile phone interface of the Express Delivery application. (c) The newly deployed UI. The images have been post-processed to translate the text into English.
4.3.3 Cache. To improve efficiency, a cache merges all historical model instances into a complete UIAD model, reducing the overhead of page jumps triggered by the Route module. If a new instance conflicts with an old one, the cache is directly overwritten. A conflict arises only when the same semantic element has different values or is matched to different GUI widgets. Model instances generated from pages of different classes are all preserved. Because all model instances share a similar structure, merging and overwriting are straightforward.
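The merge rule can be sketched as follows, under one reading of it. The cache keeps one instance per page class. When none of its elements conflict with the cached instance of the same class, a new instance is unioned with it. When any shared semantic element differs in value or matched widget, the new instance replaces the cached one. `Binding`, `ModelCache`, and the dictionary layout are hypothetical and stand in for the actual UIAD model structures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Binding:
    value: str      # current value of the semantic element
    widget_id: str  # GUI widget the element is matched to


# Hypothetical instance shape: semantic element name -> Binding
Instance = dict[str, Binding]


class ModelCache:
    def __init__(self) -> None:
        # One cached instance per page class; instances of different classes coexist.
        self._by_class: dict[str, Instance] = {}

    @staticmethod
    def _conflicts(old: Instance, new: Instance) -> bool:
        # Conflict: the same semantic element with a different value or widget.
        return any(name in old and old[name] != b for name, b in new.items())

    def merge(self, page_class: str, instance: Instance) -> None:
        old = self._by_class.get(page_class)
        if old is None or self._conflicts(old, instance):
            # New page class, or conflicting data: overwrite with the new instance.
            self._by_class[page_class] = dict(instance)
        else:
            # Compatible data: keep the union of all elements seen so far.
            old.update(instance)

    def complete_model(self) -> dict[str, Instance]:
        # Copy so callers cannot mutate the cache through the returned model.
        return {c: dict(inst) for c, inst in self._by_class.items()}
```

For example, merging `{"title": Binding("Alice", "w1")}` and then `{"input": Binding("", "w2")}` for the same page class yields both elements. A later instance with `title` bound to a different value replaces the cached instance for that class, while instances of other page classes are left untouched.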

Table 1. Classification Results.
Annotators are required to traverse all data and annotate the necessary cases until 100% precision and recall are achieved, meaning that iterative human-machine interaction has reached complete annotation accuracy.
• Annotation Efficiency: measured by the number of annotations needed to reach complete accuracy.
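The iterative protocol can be simulated to show how annotation efficiency is counted. The Python sketch below assumes a toy recognizer that generalizes from positive examples by a single resource-id feature and honors explicit rejections. Each loop iteration stands for one annotation, either removing a false positive or adding a false negative, and the loop stops once precision and recall both reach 1. `Widget`, `recognize`, and `annotate_until_perfect` are hypothetical names, and the toy widgets are only loosely modeled on Fig. 9.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Widget:
    wid: str     # widget identifier on the page
    res_id: str  # toy feature the strategy generalizes over (an assumption)


def recognize(widgets, positives, negatives):
    """Toy strategy: accept every widget sharing a resource id with a positive
    example, except widgets the annotator explicitly rejected."""
    pos_ids = {w.res_id for w in positives}
    rejected = {w.wid for w in negatives}
    return {w.wid for w in widgets if w.res_id in pos_ids and w.wid not in rejected}


def precision_recall(predicted, truth):
    tp = len(predicted & truth)
    precision = tp / len(predicted) if predicted else 1.0
    recall = tp / len(truth) if truth else 1.0
    return precision, recall


def annotate_until_perfect(widgets, truth):
    """Simulate interactive annotation; return how many annotations were needed
    before precision and recall both reach 1.0."""
    assert truth <= {w.wid for w in widgets}, "ground truth must name real widgets"
    positives, negatives = [], []
    annotations = 0
    while True:
        found = recognize(widgets, positives, negatives)
        if precision_recall(found, truth) == (1.0, 1.0):
            return annotations
        false_pos = [w for w in widgets if w.wid in found and w.wid not in truth]
        false_neg = [w for w in widgets if w.wid in truth and w.wid not in found]
        if false_pos:
            negatives.append(false_pos[0])  # annotator removes a false positive
        else:
            positives.append(false_neg[0])  # annotator adds a missed widget
        annotations += 1


widgets = [
    Widget("hello", "text"), Widget("time", "text"), Widget("duration", "text"),
    Widget("puppy", "image"), Widget("pptx", "file"),
]
print(annotate_until_perfect(widgets, {"hello", "duration", "puppy", "pptx"}))  # prints 4
```

Because every annotation labels a widget that has not been labeled before, the loop always terminates within as many iterations as there are widgets. A stronger recognizer, like the one in Fig. 9 that picked up "1.pptx" without a separate example, would lower the count.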

Table 2. An overview of use cases.

Table 3. Subjective feedback from UIAD model developers. Full results are available in Table 13 in the Appendix.

Table 4. Subjective feedback from end-users. Full results are available in Table 14 in the Appendix.
for Each Iteration: Modify or confirm the classes of pages. Remove false-positive widgets and add false-negative ones.

Table 7. An overview of the evaluated applications with their page and widget counts.

Table 14. Full results of the subjective feedback from end-users on a Likert scale from 1 (strongly disagree) to 7 (strongly agree). Statement numbers refer to Table 4.