D2S2: Drag ’n’ Drop Mobile App Screen Search

The lack of diverse UI element representations in publicly available datasets hinders the scalability of sketch-based interactive mobile search. This paper introduces D2S2, a novel approach that addresses this limitation via drag-and-drop mobile screen search, accommodating both visual and text-based queries. D2S2 searches 58k Rico screens for relevant UI examples based on UI element attributes, including type, position, shape, and text. In an evaluation with 10 novice software developers, D2S2 retrieved the target screen within its top-20 search results in 15 of 19 attempts in under a minute. The tool offers interactive and iterative search, updating its results each time the user modifies the query. Interested users can freely access D2S2 (http://pixeltoapp.com/D2S2), build on D2S2 or replicate our results via its open-source implementation (https://github.com/toni-tang/D2S2), or watch D2S2's video demonstration (https://youtu.be/fdoYiw8lAn0).


INTRODUCTION
Iterative app screen search, while an exciting area of recent work [16,17,18], still faces many challenges. First, Google image search is fast, searches many images from the open web, and supports text-based search queries. But searching for an app screen via text queries remains clumsy, especially when looking for screens that contain certain UI elements in specific locations. Such searches result in long text queries and produce few relevant results. Recent work has searched the widely-used Rico dataset via a combination of text search [18] and sketched element doodles [16,17]. These approaches support a few UI element types via deep learning. Expanding their scope would require additional specialized training data, which must be collected and curated.

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the owner/author(s). ESEC/FSE '23, December 3-9, 2023, San Francisco, CA, USA. © 2023 Copyright held by the owner/author(s). ACM ISBN 979-8-4007-0327-0/23/12. https://doi.org/10.1145/3611643.3613100
When designing mobile applications, studying real-world examples aids in gathering requirements, analyzing current trends, and cultivating motivation to develop a compelling mobile app [9,10]. Given the broad and rapidly expanding market for mobile apps, an efficient mobile app screen search tool becomes valuable.
Designers commonly use drag-and-drop tools (e.g., Figma [2]) to create wireframes. Similarly, software developers utilize drag-and-drop-based visual kits (e.g., the Android Layout Editor [1] or Prototypr [3]) for UI development. These techniques are growing in popularity because they are user-friendly, offer intuitive interfaces, and do not require specialized technical expertise [13]. D2S2 offers an interactive drag-and-drop solution for mobile screen search.
D2S2 targets novice users who want help creating a complete UI design during the early software development stages. Users search for mobile screens by dragging and dropping UI elements onto a canvas. The tool's search interface includes basic features such as undo and redo. Users can also add plain text and put text in a text button. As a user adds, removes, resizes, and moves UI elements, D2S2 searches through 58k Rico [7] screens to fetch UI examples based on UI element type, position, shape, and text, as shown in Figure 1. D2S2 fetches the top-20 screens and displays them in its website's top-pick screen search results section.
We recruited 10 software developers without prior UI/UX design training to assess D2S2's effectiveness. The participants searched for a given target Rico screen with D2S2 until the screen appeared in D2S2's top-20 search results. In our experiment, D2S2 successfully obtained 15/19 target screens within a minute and 19/19 within four minutes. D2S2 also retrieved more relevant mobile screens than its closest widely available competitor, Google image search. In summary, this paper makes the following major contributions.
• D2S2 is the first interactive drag-and-drop app screen search tool. After each query change it updates its search results.
• D2S2 searches 58k Android screens and is freely available (http://pixeltoapp.com/D2S2).
• In a preliminary user study, D2S2 performed similarly to the deep-learning-based TpD (but without requiring training) and better than Google image search.
• D2S2's implementation (https://github.com/toni-tang/D2S2) is available under a permissive open-source license.

BACKGROUND
D2S2 searches 58k mobile Android app screenshots from the Rico dataset by Deka et al. [7]. Each screenshot has a corresponding DOM-tree container hierarchy, where each UI element is described by its Android class name, x/y coordinates, textual information, and on-screen visibility. Liu et al. expanded on this dataset by collecting 73k screen elements, categorizing them into 25 types of UI components, and further dividing text buttons into 197 and icons into 135 sub-classes [15]. D2S2 incorporates several common Android UI elements identified by Liu et al.

Previous studies have explored using sketches and wireframe images to search for relevant mobile screens. However, wireframe-based approaches such as Swire rely on complete wireframe images to identify screens with similar visual characteristics, often not considering UI element type and text within the screen [4,5]. Dependence on an entire wireframe image does not support the iterative nature of the design process.
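The per-element description in the Rico hierarchy above (class name, coordinates, text, visibility) can be flattened from the nested JSON roughly as follows. This is a minimal sketch; the field names ("class", "bounds", "text", "children") and the [left, top, right, bottom] bounds layout are assumptions about the dataset's JSON format, not D2S2's actual loader.

```python
def flatten(node, out=None):
    """Recursively collect (class, bounds, text) records for each UI element
    in a Rico-style view hierarchy. Field names are illustrative assumptions."""
    if out is None:
        out = []
    out.append({
        "class": node.get("class"),
        "bounds": node.get("bounds"),   # assumed [left, top, right, bottom]
        "text": node.get("text"),
    })
    for child in node.get("children") or []:
        flatten(child, out)
    return out

# Tiny hand-made example hierarchy, not taken from the dataset.
root = {
    "class": "android.widget.FrameLayout",
    "bounds": [0, 0, 1440, 2560],
    "children": [
        {"class": "android.widget.TextView",
         "bounds": [40, 80, 400, 160],
         "text": "Login"},
    ],
}
elements = flatten(root)
```

A flat list like `elements` is what a per-element index can then be built from.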
Besides Google Image Search, our closest competitors are PSDoodle [16,17] and TpD [18], which offer an interactive and iterative approach to searching mobile screens. PSDoodle employs a deep neural network to identify sketched UI elements and then computes a ranking score for Rico's screens based on various factors, including UI element type, position, and shape. TpD extends PSDoodle by adding a text-based search that matches a text query with visible text on the mobile screen and UI element descriptions. Notably, TpD allows queries to contain text, UI element sketches, or both.

OVERVIEW AND DESIGN
To create a search system that is easy to use, we followed a user-centric approach. Via the Figma [2] graphical design tool, we thus first created a UI prototype (Figure 2), showed the prototype to 11 computer science undergraduate students, and collected their feedback. By incorporating their feedback, we then iteratively enhanced D2S2's user experience, mostly by refining D2S2's UI. All user feedback is in D2S2's repository.

User Interface & Query Language
In D2S2, a search query consists of a set of UI elements arranged on a canvas that models the screenshot of a mobile app. Starting with an empty canvas, the user interactively refines this canvas, adding and adjusting UI elements as they should appear on the desired app screens. Each time the user modifies this search query, D2S2 retrieves matching app screens that have the query's UI elements at about the location the user placed them on the canvas. As part of the search, D2S2 matches any text the user added to the query against screens' text contents and the descriptions of their UI elements. Figure 3 shows D2S2's current UI. Besides moving the app bar to the bottom, the biggest change is allowing users to search D2S2's library of 52 built-in UI elements by the UI element's name and various synonyms. Figure 4 lists these 52 UI elements in the order D2S2's UI presents them: TpD's UI elements first, then the UI elements identified by Liu et al., ordered by how commonly they appear in Rico screens [15]. The user searches (or scrolls) the UI element list, selects a UI element, and drags and drops it onto the canvas. The user can then interact with the element, e.g., to move or resize it. The app bar at the bottom of the canvas allows undoing and redoing the last element modifications and clearing the screen. The user can add text either via a text button from the UI element collection or via the app bar's "TEXT" feature. The latter adds a text element to the canvas that the user can manipulate like any other UI element; clicking on such a text element enables modifying its text content.
As in the earlier PSDoodle and TpD, UI elements may be nested, i.e., to support UI elements grouped in a container element. D2S2 encodes the canvas's current state as a set of 6-tuples of the form (x, y, w, h, c, t), one tuple per UI element on the canvas. The tuple lists an element's top-left corner location in pixel space (x, y), the element's width w, height h, category c, and text content t (for text elements and text buttons). The D2S2 webpage is written in React, as it provides client-side rendering [14] and efficiently manages various events such as drag-start, drag-end, and the undo/redo functionality. To rank its 58k screens, D2S2 uses TpD's infrastructure, which in turn builds on PSDoodle's. For non-text UI elements, D2S2 uses PSDoodle's screen scoring scheme (which TpD similarly reused). Specifically, D2S2 divides a mobile app screen into 24 equally sized tiles (6 along the width and 4 along the height) and maintains TpD's tile configuration. The main change is in more than doubling TpD's 23 UI element classes to D2S2's 52. In the back-end, this is straightforward: adding one screen ID index for each of the additional UI element classes allows fast screen lookup.
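The (x, y, w, h, c, t) encoding and the per-class screen-ID index described above can be sketched as follows. The tile geometry follows the paper (24 equally sized tiles); the screen resolution, function names, and the exact keying of the index are illustrative assumptions, not D2S2's actual implementation.

```python
from collections import defaultdict

SCREEN_W, SCREEN_H = 1440, 2560   # assumed canvas resolution
COLS, ROWS = 6, 4                 # 24 equally sized tiles, per the paper

def tile_of(x, y, w, h):
    """Map a UI element's center point to one of the 24 tile indices."""
    cx, cy = x + w / 2, y + h / 2
    col = min(int(cx * COLS / SCREEN_W), COLS - 1)
    row = min(int(cy * ROWS / SCREEN_H), ROWS - 1)
    return row * COLS + col

# Inverted index: (element category, tile) -> set of screen IDs.
index = defaultdict(set)

def add_screen(screen_id, elements):
    """Index every (x, y, w, h, c, t) tuple of a screen for fast lookup."""
    for (x, y, w, h, c, t) in elements:
        index[(c, tile_of(x, y, w, h))].add(screen_id)

def candidate_screens(query):
    """Screens matching at least one query element's category and tile."""
    hits = set()
    for (x, y, w, h, c, t) in query:
        hits |= index[(c, tile_of(x, y, w, h))]
    return hits

# Usage: index one screen, then look it up with a slightly moved element.
add_screen("s1", [(100, 100, 200, 100, "text_button", "OK")])
```

Ranking would then score these candidates, e.g., by how many query elements each screen matches; the sketch only shows the lookup step.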

D2S2's Back-end
For text elements, D2S2 reuses TpD's pipeline [18], which preprocesses the Rico screens' text contents and UI element descriptions (removing stop words, identifying names, lemmatization, adding synonyms via contextual analysis, and tagging text content with its on-screen location). As TpD only supports four different screen areas for text contents (top-left, top-right, bottom-left, and bottom-right), D2S2 first maps the location of a text element to one of these four TpD screen areas. Like TpD, D2S2 then uses ElasticSearch with a Levenshtein edit distance of one, to heuristically also match slightly mistyped user-provided text to screen contents.
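The two steps above — mapping a text element to one of TpD's four coarse screen areas, then issuing a fuzzy text match — can be sketched as follows. The quadrant split, screen resolution, index field names (`text_top_left`, etc.), and query shape are illustrative assumptions; only the ElasticSearch `match` query's `fuzziness` parameter is standard.

```python
def text_area(x, y, screen_w=1440, screen_h=2560):
    """Map a text element's top-left corner to one of TpD's four areas."""
    vert = "top" if y < screen_h / 2 else "bottom"
    horiz = "left" if x < screen_w / 2 else "right"
    return f"{vert}-{horiz}"

def fuzzy_query(area, text):
    """Build an ElasticSearch 'match' query allowing Levenshtein distance 1,
    against a hypothetical per-area text field."""
    field = "text_" + area.replace("-", "_")
    return {"query": {"match": {field: {"query": text, "fuzziness": 1}}}}
```

With `fuzziness: 1`, a mistyped query like "serch" would still match screens containing "search" in the corresponding area.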

D2S2 USAGE
To compare with its closest competitors, TpD and Google Image Search, we enlisted 10 computer science students who did not have formal UI/UX design training. While the participants differ, we recruited them using the same criteria as the TpD study. To ensure diversity, we selected five individuals with and five without previous mobile app development experience. All participants were early-stage undergraduates aged 20-25. As a token of appreciation, each participant received USD 10 in compensation. Specifically, we are interested in the following research questions.
RQ1 How does D2S2 compare with TpD, in terms of total time of the interactive search, final queries' UI element counts, and final queries' top-k screen retrieval accuracy?
RQ2 How does D2S2 compare with Google Image Search on a free user query, for producing relevant top-20 search results?

For each participant, we had one video conference of about 30 minutes that started with us explaining D2S2's objectives. We then demonstrated the search process for an icon: dragging the icon to the canvas, resizing and adjusting the icon's position on the canvas, the functionality of the undo/redo/clear-screen buttons, and how to add text using the text and text-button features. Each participant accessed D2S2 over the internet via a web browser on their personal machine. We used D2S2's standard setup as a website hosted on an Amazon AWS EC2 general-purpose instance (t2.large), featuring two virtual CPUs and 8GB RAM. D2S2's repository contains all experimental results.

Similar Screen Search Performance as TpD
For this second part of a participant meeting, we used the 26 randomly selected Rico target screens used in TpD's evaluation. For each participant, we randomly selected from this pool one target screen per search session. We instructed the participant to create a query that would retrieve the target screen and refine the query until the target screen appeared in D2S2's top-20 results. For each search session we recorded the total time, the number of UI elements and texts in the participant's final query, and the target screen's rank in D2S2's results for that final query. D2S2's top-k retrieval accuracy is the number of search sessions in which D2S2 ranks the target screen in the top-k of its answer to the participant's final query. We use top-k retrieval accuracy, as the metric is widely used to evaluate related work [6,8,11] and correlates with user satisfaction [12]. Table 1 summarizes the results. Comparing with TpD's results is a little tricky, as TpD's participants were instructed to search until the target screen appeared in TpD's top-10 search results or the search exceeded 3 minutes. So TpD participants were encouraged to spend a bit of additional time to refine a query. With this caveat, the overall results for D2S2 and TpD are similar. D2S2's top-20 screen retrieval accuracy is 100% (19/19) vs. TpD's 97% (29/30).
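The top-k retrieval accuracy metric used above can be made precise with a few lines; here it is expressed as a fraction of sessions (the paper reports the equivalent counts, e.g., 19/19). The function name is illustrative.

```python
def top_k_accuracy(ranks, k):
    """Fraction of search sessions whose final query ranked the target
    screen within the top k results. ranks are 1-based; a rank of None
    (target not retrieved at all) counts as a miss."""
    hits = sum(1 for r in ranks if r is not None and r <= k)
    return hits / len(ranks)
```

For example, session ranks of [1, 5, 21] yield a top-20 accuracy of 2/3, since the third session's target fell just outside the top 20.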
D2S2's total search session time was 23s minimum, 240s maximum, 68s average, and 50s median. This compares to a 5s minimum, 156s maximum, 45s average, and 35s median for TpD. Contributing to TpD's shorter search sessions are TpD's experimental setup (which allowed participants to practice using TpD for some 10 minutes before collecting results) and D2S2 having more than twice the number of UI elements to choose from for a search query. We observed participants spending significant time browsing the UI elements available in D2S2 and selecting the correct UI element.

More Targeted Than Google Image Search
In this final meeting part, we instructed each participant to formulate a Google-style search query and perform a corresponding search using both D2S2 and Google image search (an example of a participant's query is "mobile screen menu icon top left and search icon top right"). We then asked the participant to rate each result in both tools' top-20 results as relevant or non-relevant to the query.
Participants judged 20% (77/380) of Google image search's results as relevant and 58% (222/380) of D2S2's result screens. D2S2's 58% relevance here is largely in line with TpD's 52% reported for searches for a given target screen [18]. Given D2S2's and TpD's slightly different experimental setups, it is hard to draw conclusions about their relative performance. For the search scenario over 58k Rico screens, both tools clearly perform better than Google Image Search.

CONCLUSIONS
Current sketch-based iterative mobile screen search has limitations in supporting many UI elements. Drag-and-drop provides a flexible alternative. D2S2 provides an interactive drag-and-drop search that updates its results after every query change. The tool is freely available and has undergone user testing, demonstrating its effectiveness. D2S2 is a promising solution for novice users who require assistance creating a comprehensive UI design in the initial development phases.