Graph4GUI: Graph Neural Networks for Representing Graphical User Interfaces

Present-day graphical user interfaces (GUIs) exhibit diverse arrangements of text, graphics, and interactive elements such as buttons and menus, but representations of GUIs have not kept up. They do not encapsulate both semantic and visuo-spatial relationships among elements. To seize machine learning's potential for GUIs more efficiently, Graph4GUI exploits graph neural networks to capture individual elements' properties and their semantic-visuo-spatial constraints in a layout. The learned representation demonstrated its effectiveness in multiple tasks, especially generating designs in a challenging GUI autocompletion task, which involved predicting the positions of remaining unplaced elements in a partially completed GUI. The new model's suggestions showed alignment and visual appeal superior to the baseline method and received higher subjective ratings for preference. Furthermore, we demonstrate the practical benefits and efficiency advantages designers perceive when utilizing our model as an autocompletion plug-in.

Figure 1: Graph4GUI is a graph-based GUI representation that captures the connections between GUI element properties and constraints. Such a representation can capture the visual-spatial-semantic structure of a GUI so that it can be effectively employed in computational design. a) To represent GUIs, bipartite graphs comprise element nodes (colored purple), which convey the GUI elements' properties, and constraint nodes (colored green); these graphs can be fed into graph neural networks. Such a representation can serve various downstream tasks, such as predicting constraints (dotted orange edge) for an unplaced element (colored orange). b) By iteratively predicting the sizes and locations of yet-unplaced elements, we can support designers by autocompleting partially completed GUI designs.


INTRODUCTION
Modern graphical user interfaces (GUIs) are replete with diverse elements like text, graphics, buttons, checkboxes, sliders, and icons, arranged in various ways. GUIs deploy visual, spatial, and textual cues to guide users in their design. For example, colors convey grouping and attention, while visual cues like proximity or shared visual areas signal element associations [11]. Elements are ordered and grouped based on grid lines; for instance, lists are often left-aligned [53]. In addition, textual elements, such as headers, labels, and annotations, are needed to communicate the "semantics" of the various shapes and images. Despite architectural commonalities, each GUI genre and application has its unique conventions. The question of how to represent a GUI's visual-spatial-semantic structure such that it could be effectively conveyed in computational design remains open [32,34-36].
Prior methods for representing GUIs and their constituent elements fall short of capturing these integrative aspects. Some work has focused exclusively on textual content in GUIs, but neglected the visual aspects of design and the variety of GUI elements [45,46]. In contrast, other approaches emphasize visual appearance and GUI element types but overlook the content of the elements [1,15,50]. This results in similar treatment for GUIs sharing structural and visual similarities but differing in content. Layout constraints represent the layout relationships between GUI elements, such as alignment, same-size, and grouping. Most existing methods, employing Convolutional Neural Networks (CNNs) to learn GUI images, face challenges because they have to learn layout constraints from pixels. This makes training a model to predict constraints a challenge.
To address this gap, we propose a novel graph-based GUI representation, Graph4GUI, that integrates GUI elements with layout constraints (Figure 1a). We formulate a bipartite graph to express GUI elements and their relationships via two kinds of nodes: element and constraint nodes. As shown in Figure 2, each GUI element node represents element properties, including visual appearance, textual content, element type, position, and size. Constraint nodes include four types of constraints: alignment, same-size, element grouping, and multimodal grouping constraints. We then employ graph neural networks (GNNs) to learn a domain-specific representation from the graph-structured data.
Compared to other GUI representations [1,13,15,45-47,50], the design of our GNN aims to balance two representation learning goals. On the one hand, we want to maximally exploit domain knowledge, particularly using stable, universal GUI design characteristics without learning from scratch. On the other, we want to capture contingent design tendencies, such as color palettes and fonts, without manual specification. If successful, this would make it possible to learn a useful representation with fewer samples. To this end, our approach is to represent relatively universal principles of layouts as constraints in a graph, thus reducing the need to learn them from scratch. At the same time, genre-specific tendencies are learned by applying GNNs to capture the design features unique to GUIs. In contrast, traditional structured representations in computational design, such as DOM trees, are not designed for learning but for specifying GUIs. While they represent the view hierarchy of a layout, they do not lend themselves to many machine learning methods. For example, there is no natural way to represent the concept of grid alignment with DOMs, as this information is split into the leaf nodes of the tree. To compute whether two elements are aligned, the whole tree needs to be parsed. In contrast, our method directly embeds connections between GUI elements and their constraints in the graph.
To examine the effectiveness of our graph-based representation, we applied it to different applications: GUI autocompletion, GUI topic classification, and GUI retrieval. Our primary emphasis lies in autocompletion due to its complexity. This autocompletion problem is challenging, not only because exploring potential GUI element combinations is computationally costly but also because good solutions must consider visual, spatial, and semantic constraints among the to-be-placed elements and those already present. We present a method for iteratively recommending the position and size of unplaced GUI elements, as illustrated in Figure 1b. To augment the model's usability for designers, we introduce alternative element suggestion options, including the recommendation of element groups and simultaneous suggestions for all elements. For evaluation, we conducted Study 1, comparing our model with GRIDS [13], an approach for autocompletion using integer programming. Our model produced suggestions with superior alignment and visual appeal compared to the baseline, consistent with participants' preferences.

Table 1: A comparison of existing approaches, marked based on whether they explicitly represent textual content, visual appearance, GUI element type, view hierarchy, and layout constraints (table columns: Approach, Textual Content, Visual Appearance, Element Type, View Hierarchy, Layout Constraints). The view hierarchy, akin to a DOM tree, begins with a root view and organizes all its descendants in a tree structure. Layout constraints denote relationships like alignment and grouping among GUI elements. "✓" indicates that the model can capture the factor, while "✗" indicates that it does not.
In Study 2, we integrated our model into a plug-in for a design tool. It allows GUI designers to utilize autocompletion capabilities in real time while maintaining full control over the design process within the interactive design tool. We interviewed six GUI designers to study our tool's practical benefits and efficiency advantages.
Our work makes the following contributions:
(1) A novel graph representation for GUIs, Graph4GUI, which incorporates GUI element properties such as textual content, visual appearance, and element types, along with their relationships and constraints.
(2) A graph neural network method for learning the graph representation of the GUI to optimize the GUI element dimensions and positions.
(3) An autocompletion framework that serves to demonstrate the graph representation facilitating interactive GUI design. The framework's effectiveness was evaluated through a comparison study and a designer study.
(4) An application of the graph representation to other tasks, including GUI topic classification and GUI retrieval.

RELATED WORK
This section focuses on the limitations of preexisting representations of GUIs, the GUI-related applications of graph neural networks, and constraint-based approaches to layout generation.

Representations of GUIs
Existing GUI representations often prioritize specific properties while neglecting others. Table 1 provides a comparison of existing approaches, marked based on whether they explicitly represent textual content, visual appearance, GUI element type, view hierarchy, and layout constraints. Some representations focus on textual content, ignoring visual appearance and GUI element types [45,46]. In contrast, alternative methods prioritize visual appearance and the types of GUI elements [1,15,50], but often overlook the importance of textual content. This can lead to similar treatment of structurally and visually similar GUIs that differ significantly in textual content. Screen2Vec [47] addresses this by generating GUI representations incorporating textual content, element types, and screen hierarchy. Our method, Graph4GUI, extends this consideration by incorporating constraints and interrelationships between GUI elements. ILuvUI [37] proposed a vision language model to create a language representation of GUIs. A relevant method, GRIDS [13], is an integer programming method optimizing grid layouts using layout constraints. We also consider layout constraints since they are important in developing a well-structured GUI design. With our approach, Graph4GUI, we propose a solution that considers not only textual content, visual appearance, and GUI element types but also the constraints and interrelationships between GUI elements.

Graph Neural Networks on GUIs
Graph neural networks [24,25,62,71] are state-of-the-art models for encoding graph-structured data. Whereas CNNs rely on convolution over spatial neighborhoods and enjoy widespread application to encode GUI images, GNNs aggregate information from neighborhoods defined by an input graph that are not restricted to the spatial domain. This gives them the potential to exploit information about the GUIs beyond pixel level. Li et al. applied GNNs to denoise an existing user-interface dataset [44], and performed GUI autocompletion from the GUI layout hierarchy but failed to generate visually realistic GUI results [48]. Brückner et al. [10] looked into constructing a graph using GUI elements' relative positioning to predict elements; however, it proved challenging to learn the layout structure from only relative positions. HAMP [2] introduced a graph representation with nodes for app descriptions, GUI screens, GUI classes, and element images to perform GUI tasks. Still, such detailed metadata cannot be extracted from GUI screenshots without extensive manual annotations. In contrast, our application of GNNs is geared toward modeling the layout graph of GUI elements, thereby enabling us to capture both the topological intricacies of the GUI layout and the properties of individual GUI elements.

Constraint-based Layout Generation
Constraint-based layout models have been widely used in GUI layouts [3,5,6,8,28,40,51,52,60,61,63,66,68,72,75,76] and document layouts [7,29,30,41]. Early methods like Peridot [55,57] and Lapidary [74] proposed programming by demonstration, automatically generating constraints for user interfaces based on designer interactions. These models offer greater flexibility for layout generation than simple layout models such as group, grid, table, and grid-bag layouts [54,56,58]. Prior work proposed constraint-based layout generation [17,69]. For instance, SUPPLE [20-22] presented constraints for alternative widgets and groupings, and ORCLayout [33,38,39] introduced OR-constraints as a mixture of hard and soft constraints to unify flow-based and constraint-based layouts. Constraints have also served to enable layout personalization [18], maintain consistency [19], give layout-alternative suggestions based on user-defined constraints [4,65], generate layout alternatives from templates or modifiable suggestions [31,64,73], and allow both author and viewer to specify the layout [7]. Finally, recent work has explored applying deep-learning approaches to automatic layout generation, eliminating the need for manually defined constraints [42,77]. However, none of these methods predict constraints for GUI elements. Incorporating GUI element relationships as constraints enables our model to predict them within the network. This enhances the network's ability to establish connections and deepen its comprehension of both element properties and constraints.

GUI LAYOUT PROBLEM
Designing a GUI involves carefully selecting elements and organizing them into a structure that is usable and aesthetically appealing. This brings with it a large number of both element-specific and layout-related decisions and is typically iterative in nature [23].
Our objective is to provide a more comprehensive characterization of GUIs compared to existing GUI representations by factoring in visual, spatial, and semantic features. We formulate the problem by partitioning it into elements and layouts while also defining the GUI design problem as an optimization process.

Element Properties
The first major consideration is visual appearance, a broad notion encompassing such properties as color combinations, geometric shapes, and GUI styles. For example, employing tranquil hues such as blues and greens may encourage a calming interface, requiring a more spacious and streamlined layout. In contrast, energetic shades (reds, yellows, etc.) might necessitate a more condensed and high-energy design. Likewise, the arrangement of the various shapes plays a crucial role in the overall visual appeal of the layout. Equally essential is the textual content, which encompasses all forms of text visible in the user interface. Labels can play a crucial role from the design perspective. Labels that contain long paragraphs require more space for alignment and arrangement, to avoid clutter and overlap. Conversely, elements containing brief text strings or bullet-point-style items may allow for compaction, thereby leaving room for other pertinent features. The format of the content is also critical; condensed or expanded layouts especially require careful consideration of element spacing, alignment, and the overall design arrangement. Finally, each element type, such as button, checkbox, or text field, has inherent functionality and objectives. For example, buttons need to be easily reachable if user interaction is to be effective, whereas checkboxes might not require such prominent positioning on account of their less frequent use. Text fields are often the focal point because of their role in data entry, which necessitates adequate design space.

Layout Constraints
Layout-level properties can be approached as constraints. Imposing constraints can help maintain consistency across GUIs and aid users in understanding them. Among commonplace GUI constraints are keeping similar elements the same size, aligning elements along a shared grid, and grouping related elements together. The alignment constraint, for instance, enforces uniform positioning and visual consistency by arranging elements along a shared axis, thus maintaining a structure within the layout [53]. In addition to establishing relationships among elements, alignment strengthens the synergy between the elements and the broader layout. The same-size constraint is equally essential: it enhances visual harmony by ensuring consistent sizing among related elements.
Element grouping is important for enhancing the layout's organization and logical structure. This is achieved by bringing together related elements with similar functions. The strategy promotes user-friendly navigation. Finally, multimodal grouping constraints lend a structured feel to varied elements, with coherent organization across distinct types. This allows for a harmonious combination of text, images, and other GUI components while respecting the layout's coherence and uniformity principles.
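To make the alignment and same-size constraints concrete, the following minimal sketch groups elements that share an edge coordinate or a dimension. The `Box` type, the function names, and the pixel tolerance `tol` are assumptions made for illustration; the sketch is not the extraction procedure used in the paper.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned element box: top-left corner (x, y), width w, height h."""
    x: int
    y: int
    w: int
    h: int


def group_by(boxes, key, tol=0):
    """Group element indices whose key values agree within `tol` pixels.

    Only groups with at least two members are reported as a constraint.
    """
    groups = defaultdict(list)
    for i, b in enumerate(boxes):
        v = key(b)
        # Reuse an existing bucket if one is close enough (hypothetical rule).
        bucket = next((k for k in groups if abs(k - v) <= tol), v)
        groups[bucket].append(i)
    return {k: idx for k, idx in groups.items() if len(idx) >= 2}


def find_constraints(boxes, tol=0):
    """Return a subset of alignment and same-size constraints for the boxes."""
    return {
        "left-aligned": group_by(boxes, lambda b: b.x, tol),
        "top-aligned": group_by(boxes, lambda b: b.y, tol),
        "same-width": group_by(boxes, lambda b: b.w, tol),
        "same-height": group_by(boxes, lambda b: b.h, tol),
    }


boxes = [Box(16, 10, 200, 40), Box(16, 60, 200, 40), Box(240, 60, 80, 40)]
constraints = find_constraints(boxes)
```

Each detected group maps a shared value (for example, the line x = 16) to the indices of the elements that satisfy it, which is the information a constraint node and its edges would need to carry.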

Formulation of GUI Layout Problem
We can now define the GUI layout problem as an optimization problem. With this formulation, we decide on the positions and sizes of elements in a GUI, denoted as $e_i = (x_i, y_i, w_i, h_i)$, where the coordinates $(x_i, y_i)$ represent the top-left corner of the $i$-th element and $(w_i, h_i)$ represents its width and height. Here, we focus on a setting where all the elements are in rectangular bounding boxes. We define two objective terms, the element loss term ($\mathcal{L}_{ele}$) and the constraint loss term ($\mathcal{L}_{cons}$). The first of these encapsulates the properties of the GUI elements, such as visual appearance, textual content, and element type. The constraint loss term covers the layout constraints that guide the GUI design toward an optimal arrangement. The objective function becomes

$$\mathcal{L}(\hat{e}_1, \hat{e}_2, \dots, \hat{e}_n, F; \alpha, \beta) = \mathcal{L}_{ele}(e_1, e_2, \dots, e_n, \hat{e}_1, \hat{e}_2, \dots, \hat{e}_n; \beta) + \alpha \, \mathcal{L}_{cons}(\hat{e}_1, \hat{e}_2, \dots, \hat{e}_n, F),$$

where $F = \{f_1, f_2, \dots, f_n\}$ is the set of properties for the $n$ GUI elements, including the visual appearance, textual content, and element type. The predicted size and position of each GUI element are denoted as $\hat{e}_i = (\hat{x}_i, \hat{y}_i, \hat{w}_i, \hat{h}_i)$, while the ground-truth sizes and positions are represented by $e_i = (x_i, y_i, w_i, h_i)$. $\alpha > 0$ is the weight of the constraint loss. $\beta > 0$ is the weight of the boundary loss as a part of the element loss term described below.
The element loss term ($\mathcal{L}_{ele}$) refers to the discrepancy between the predicted and actual values for both positions and sizes of the GUI elements, with a penalty imposed if the predicted elements are outside the interface area:

$$\mathcal{L}_{ele} = \mathcal{L}_{MSE} + \beta \, \mathcal{L}_{bound}.$$

We use the Mean Squared Error (MSE) loss function to quantify the discrepancy between the predicted and the actual positions and sizes of GUI elements:

$$\mathcal{L}_{MSE} = \frac{1}{n} \sum_{i=1}^{n} \lVert e_i - \hat{e}_i \rVert_2^2.$$

The boundary term $\mathcal{L}_{bound}$ penalizes a predicted element lying outside the interface's screen space; it depends on the predicted boxes and on $w_{UI}$ and $h_{UI}$, the width and height of the GUI.
We introduce the constraint loss term ($\mathcal{L}_{cons}$) for estimating the discrepancy between the predicted constraints and the constraints present in the GUI design. This is measured by means of Binary Cross Entropy (BCE), which evaluates a binary decision for each constraint, namely, whether it is satisfied or not:

$$\mathcal{L}_{cons} = -c \log(\hat{c}) - (1 - c) \log(1 - \hat{c}),$$

where $c$ represents the constraints, calculated from the elements and the element properties, and $\hat{c}$ is the predicted probability that the constraint is satisfied. The term $-c \log(\hat{c})$ serves to penalize the model when a constraint $c$ that should be satisfied has a predicted probability $\hat{c}$ approaching 0 (where the ground-truth value is 1). This encourages the model to increase the likelihood of satisfying the required constraints. Conversely, the term $-(1 - c) \log(1 - \hat{c})$ penalizes the model when the predicted probability $\hat{c}$ is near 1 for a constraint $c$ that should not be satisfied (since the ground-truth value is 0). This term encourages the model to reduce the likelihood of constraints that ought not to be satisfied.
As a result, the formulation of the GUI optimization problem is

$$\min_{\hat{e}_1, \dots, \hat{e}_n} \; \mathcal{L}_{ele}(e_1, \dots, e_n, \hat{e}_1, \dots, \hat{e}_n; \beta) + \alpha \, \mathcal{L}_{cons}(\hat{e}_1, \dots, \hat{e}_n, F).$$
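The two loss terms can be sketched in a few lines of plain Python. This is an illustrative reading of the text, not the authors' implementation: the hinge-style form of the boundary penalty, the averaging over elements and constraints, and the parameter names `constraint_weight` and `boundary_weight` are all assumptions.

```python
import math


def mse_loss(true_boxes, pred_boxes):
    """Squared error between (x, y, w, h) tuples, averaged over elements."""
    total = sum((t - p) ** 2
                for tb, pb in zip(true_boxes, pred_boxes)
                for t, p in zip(tb, pb))
    return total / len(true_boxes)


def boundary_loss(pred_boxes, w_ui, h_ui):
    """Assumed hinge penalty: how far each predicted box extends off-screen."""
    total = 0.0
    for x, y, w, h in pred_boxes:
        total += max(0.0, -x) + max(0.0, -y)            # left / top overflow
        total += max(0.0, x + w - w_ui) + max(0.0, y + h - h_ui)  # right / bottom
    return total / len(pred_boxes)


def bce_loss(c_true, c_prob, eps=1e-7):
    """Binary cross entropy between constraint labels and probabilities."""
    total = 0.0
    for c, p in zip(c_true, c_prob):
        p = min(max(p, eps), 1 - eps)  # clamp to keep log() finite
        total += -(c * math.log(p) + (1 - c) * math.log(1 - p))
    return total / len(c_true)


def total_loss(true_boxes, pred_boxes, c_true, c_prob, w_ui, h_ui,
               constraint_weight=1.0, boundary_weight=1.0):
    """Element loss (MSE + weighted boundary) plus weighted constraint loss."""
    l_ele = (mse_loss(true_boxes, pred_boxes)
             + boundary_weight * boundary_loss(pred_boxes, w_ui, h_ui))
    return l_ele + constraint_weight * bce_loss(c_true, c_prob)


example = total_loss([(0, 0, 10, 10)], [(0, 0, 10, 10)],
                     c_true=[1, 0], c_prob=[0.9, 0.1], w_ui=100, h_ui=100)
```

In the example, the boxes match exactly and lie on screen, so only the constraint term contributes to the total.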

GUI REPRESENTATION
Our proposed method is designed to enrich GUI representation by developing a heterogeneous bipartite graph that covers both GUI element properties and layout constraints, thereby dealing with the intricate arrangement and interrelationships among GUI elements. This graph comprises nodes of two types: GUI element nodes and constraint nodes. The former expresses element properties specific to individual GUI elements, and the latter defines layout constraints for GUI elements in the interface display. Integrating element properties and layout constraints into a single unified graph facilitates a thorough representation of a GUI's elements and layout. While earlier work has integrated element properties into GUI representation [47,49,59,67], our approach brings further benefits by not only accounting for the properties of individual GUI elements but also capturing their interrelationships and spatial arrangements within the overall layout.
To ascertain the linkages between GUI elements and constraints, we connect the respective GUI element nodes to the constraint nodes with edges. Specifically, GUI element nodes can only connect to GUI constraint nodes, and vice versa. Consequently, the graph constructed represents the GUI structure by establishing the relations between elements and constraints through its edges.
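The bipartite structure described above can be sketched as a small data structure in which every edge joins one element node and one constraint node. The class name `GuiGraph`, its methods, and the node identifiers are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class GuiGraph:
    """Bipartite graph: edges only ever join an element and a constraint."""
    elements: dict = field(default_factory=dict)     # element id -> properties
    constraints: dict = field(default_factory=dict)  # constraint id -> attributes
    edges: set = field(default_factory=set)          # (element id, constraint id)

    def add_element(self, eid, **props):
        self.elements[eid] = props

    def add_constraint(self, cid, kind, members, **attrs):
        """Create a constraint node and link it to its member elements."""
        self.constraints[cid] = {"kind": kind, **attrs}
        for eid in members:
            if eid not in self.elements:
                raise KeyError(f"unknown element {eid!r}")
            self.edges.add((eid, cid))

    def neighbors(self, node_id):
        """Constraint ids for an element node, element ids for a constraint node."""
        if node_id in self.elements:
            return sorted(c for e, c in self.edges if e == node_id)
        return sorted(e for e, c in self.edges if c == node_id)


g = GuiGraph()
g.add_element("title", type="Text", box=(16, 10, 200, 40))
g.add_element("ok", type="Button", box=(16, 60, 200, 40))
g.add_constraint("align_left_16", "alignment", ["title", "ok"], line=16)
g.add_constraint("same_width_200", "same-size", ["title", "ok"], width=200)
```

Because edges are stored as (element, constraint) pairs, two element nodes can never be linked directly; they are related only through a shared constraint node.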

Graph Nodes for GUI Elements
In our graph representation, Graph4GUI, each GUI element is represented by a separate node with properties identifying its position, size, visual appearance, textual content, and type. We encode each of these properties into an embedding vector and concatenate the vectors to form a single attribute vector for the node (see Figure 2a).

Position Embedding.
We define a GUI element's position by the coordinates of its top-left and bottom-right points, represented as $(x_1, y_1)$ and $(x_2, y_2)$, respectively. These coordinates specify the element's position and size in the GUI. To represent the position within the graph node, a trainable parametric matrix of size $(\max(w, h) + 1) \times 16$ is used to encode the position, where $w$ and $h$ are the GUI's width and height, respectively. This matrix maps any coordinate to a 16-dimensional vector. On feeding the four coordinates into this matrix and flattening the resulting embedding, a 64-dimensional vector is output as the position embedding.

Size Embedding.
While an element's size can be derived from its position, it is useful to include a size embedding in our representation, especially for tasks such as GUI autocompletion that require a new element to be placed in the GUI at an unknown position. We embed the element size using a process similar to position embedding. This yields a 256-dimensional size embedding vector.
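The coordinate lookup described for the position embedding can be sketched as follows. The table here is filled with fixed random values, whereas in the model it would be trained; the screen size, the seed, and the function names are assumptions for illustration.

```python
import random


def make_embedding_table(num_rows, dim, seed=0):
    """One `dim`-dimensional row per integer coordinate value."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(num_rows)]


def embed_position(table, x1, y1, x2, y2):
    """Look up the four corner coordinates and flatten the rows into one vector."""
    vec = []
    for coord in (x1, y1, x2, y2):
        vec.extend(table[coord])
    return vec


GUI_W, GUI_H = 1440, 2560  # assumed screen size in pixels
table = make_embedding_table(max(GUI_W, GUI_H) + 1, 16)
pos = embed_position(table, 16, 10, 216, 50)
```

With 16 values per coordinate, the four coordinates yield the 64-dimensional position embedding mentioned in the text.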

Visual Appearance Embedding.
We encode the visual appearance of GUI elements by extracting high-level features and converting them into a feature vector, which serves as the element's visual representation.

Textual Content Embedding.
The textual content of elements is represented by encoding textual information into a vector that captures the semantic meaning and context properties of the text.

Element Type Embedding.
GUI element types are represented as one-hot vectors. For example, in a dataset that contains the three element types Text, List Item, and Button, a button element will be assigned a vector of [0, 0, 1] as its type. In the case of our dataset, which contains 18 distinct element types, the element type embedding is a one-hot vector of length 18, which is processed by a trainable matrix to produce the embedding for the element type.
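The three-type example above can be reproduced with a short sketch. The projection matrix stands in for the trainable matrix mentioned in the text; its values and its output dimension of 2 are made up for illustration.

```python
ELEMENT_TYPES = ["Text", "List Item", "Button"]


def one_hot(element_type, types=ELEMENT_TYPES):
    """Return a vector with a single 1 at the index of `element_type`."""
    vec = [0] * len(types)
    vec[types.index(element_type)] = 1
    return vec


def project(vec, matrix):
    """Multiply a row vector by a (len(vec) x d) matrix."""
    return [sum(v * row[j] for v, row in zip(vec, matrix))
            for j in range(len(matrix[0]))]


# Placeholder for the trainable matrix: one row per element type.
type_matrix = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
button = one_hot("Button")
button_embedding = project(button, type_matrix)
```

Multiplying a one-hot vector by the matrix simply selects the row for that element type, so the learned matrix acts as a per-type lookup table.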

Graph Nodes for Constraints
Our constructed graph is designed to generalize the definition of constraints. We represent constraints of different types as separate nodes in the graph, which enables easy extension of the graph to include additional kinds of constraints in the future. Currently, we represent four types of constraints as nodes in the graph: alignment constraints, same-size constraints, element grouping constraints, and multimodal grouping constraints (see Figure 2b).

Alignment Constraint Nodes.
We incorporate alignment constraint nodes into our graph to express the positional relationships among GUI elements. Each node has attributes describing the kind of alignment and the line along which the elements are aligned. Edges are established between GUI elements and their respective alignment constraint nodes to indicate that the elements share that alignment. Alignment constraints can express any of six distinct alignments, namely "left-aligned," "top-aligned," "right-aligned," "bottom-aligned," "vertical midline-aligned," and "horizontal midline-aligned." We employ a one-hot vector to express the alignment type. For instance, left-alignment is expressed as [1, 0, 0, 0, 0, 0]. We further characterize the alignment line using a two-dimensional vector; e.g., $[a, 0]$ represents the elements being aligned with the line $x = a$. We concatenate the alignment type and line representations to produce an eight-dimensional vector.
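The eight-dimensional alignment attribute can be assembled as in the sketch below. The ordering of the six types follows the list in the text; encoding horizontal lines as [0, b], and treating the vertical midline as a line of the form x = a, are assumptions, since the text only gives the [a, 0] case.

```python
ALIGNMENT_TYPES = ["left", "top", "right", "bottom",
                   "vertical-midline", "horizontal-midline"]

# Assumption: these alignment kinds share a vertical line x = a.
VERTICAL_LINE_KINDS = {"left", "right", "vertical-midline"}


def alignment_node(kind, line_value):
    """Concatenate the one-hot alignment type with the 2-D line vector."""
    one_hot = [1 if t == kind else 0 for t in ALIGNMENT_TYPES]
    if kind in VERTICAL_LINE_KINDS:
        line = [line_value, 0]   # elements aligned on x = line_value
    else:
        line = [0, line_value]   # assumed encoding for y = line_value
    return one_hot + line


left_16 = alignment_node("left", 16)
top_60 = alignment_node("top", 60)
```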

Same-Size Constraint Nodes.
To represent GUI elements of the same width or height in the graph, we define same-size constraint nodes. There are two types of size constraints in our design: identical width and identical height. We consolidate the size type and size value into a single node attribute vector rather than encoding the type as a separate one-hot vector. For example, identical-width constraints are defined by $[w, 0]$, where $w$ is the width value, while constraints dictating identical height are defined by $[0, h]$, where $h$ is the height value.
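The same-size attribute vector folds the constraint type into the position of the non-zero entry, as this small sketch shows. The function name and the string labels for the two kinds are illustrative choices.

```python
def same_size_node(kind, value):
    """Return [w, 0] for an identical-width node, [0, h] for identical height."""
    if kind == "width":
        return [value, 0]
    if kind == "height":
        return [0, value]
    raise ValueError(f"unknown same-size kind: {kind!r}")


same_width_200 = same_size_node("width", 200)
same_height_40 = same_size_node("height", 40)
```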

Element Grouping Constraint Nodes.
Grouping related components enhances the structure of GUIs, particularly for elements with comparable functions or belonging to the same category. To represent element grouping constraints, we define element grouping constraint nodes. We then connect related GUI element nodes to the corresponding grouping constraint node, signifying their inclusion in a particular group.

Multimodal Grouping Constraint Nodes.
Multimodal grouping constraints enable structured organization of elements of differing types: text, pictures, and other GUI components. We create a set of nodes for each multimodal grouping constraint and connect the relevant elements to their respective nodes. By establishing ties between GUI element nodes and multimodal grouping constraint nodes, we signify placing elements that differ in mode within the same group. In cases of additional multimodal grouping constraints, we create new types of multimodal grouping constraint nodes and repeat the process.

Learning GUI Layout Design with Graph Neural Networks
As Figure 1a illustrates, we create a graph representation of a GUI layout by organizing GUI element nodes and constraint nodes. Again, these nodes are connected by edges, representing the relationships between elements and constraints. To facilitate GUI design, we can train a graph neural network to take this graph as input and optimize the layout:

$$\theta^* = \arg\min_{\theta} \mathcal{L}(\hat{e}_1, \hat{e}_2, \dots, \hat{e}_n, F; \alpha, \beta),$$

where $\theta^*$ denotes the optimized parameters of the GNN model. After optimizing the GNN model's parameters, $\theta^*$, we can use this model to design new GUI layouts. For a new design, the trained GNN takes the graph-form representation of GUI elements and constraints as input, and then outputs its predicted dimensions and positions for GUI elements. Our approach makes it easy to incorporate new design constraints by modifying the construction of the graph or the objective function.

GUI AUTOCOMPLETION
To demonstrate the utility of our graph representation, we propose an autocompletion method that uses our representation approach to enable interactive iterative design. GUI autocompletion is challenging due to the computational complexity involved in accurately predicting suitable GUI elements. Given fixed screen dimensions, our method automatically generates suggestions for finishing a partially completed GUI layout by iteratively predicting the positions of remaining unplaced GUI elements. Our method suggests position, size, and confidence level for each unplaced element based on the partial GUI. It enables designers to receive suggestions when they complete the design of each element, without the need to predefine all GUI elements beforehand. Moreover, if designers have additional unplaced GUI elements ready, our method iterates over each, providing suggestions for their positions, sizes, and confidence levels. It can significantly reduce the manual effort required for design. Autocompletion that produces high-quality GUIs is rendered difficult by the high computational complexity of evaluating all possible combinations of GUI elements.

Figure 3: We only illustrate some parts of the graph for simplicity. Element 8 is the target to-be-placed element. In each GNN layer, nodes perform aggregation from their respective neighbors. To illustrate, consider element node 3. As it goes through the GNN layers, it accumulates information from related constraint nodes and other element nodes. This process results in feature embedding vectors for all nodes, including both element nodes and constraint nodes within the graph. We compute the graph embedding as a weighted average of the node embeddings with the weight matrix $W$. We then concatenate the target element's embedding vector, the graph embedding, and a constraint embedding and send it to fully connected layers to predict whether the target to-be-placed element should satisfy the constraint. Simultaneously, we concatenate the target element's embedding and the graph embedding to predict the initial position and size of the target element. Integrating these predictions with the constraints, we subsequently refine the position and size to obtain the final results.
Prior studies have explored the autocompletion task; however, they were only capable of handling wireframes. Li et al. [48] used the GUI layout hierarchy to perform GUI autocompletion. Although the hierarchy does capture the structure of the layout, it accounts for only the grouping and containment relationships between GUI elements. It neglects the alignments and relative sizes, which are important layout constraints. On the other hand, Brückner et al. [10] proposed a method of constructing a graph with reference to differences in position between GUI elements. However, it does not consider the properties of GUI elements. GRIDS [13] is a grid-layout-based optimization approach for autocompletion considering constraints such as alignment and grouping. With integer programming, GRIDS produces results by searching for optimal available placements for unplaced elements. Our method considers both element properties and constraints, filling this gap to generate more desirable predictions.

Target GUI Element Prediction
Given a partial GUI, we set out to predict both the size and the position of a target element (with a fixed aspect ratio) and the associated constraints it should follow. As shown in Figure 3, the process begins with constructing a graph representation of the partial GUI, denoted as $\mathcal{G}$. Exploiting GNNs, we encode this graph to yield feature vectors for all the nodes, including element nodes and constraint nodes within the graph. Within each GNN layer, nodes aggregate information from their respective neighbors. Going through the GNN layers, each node iteratively accumulates information from its associated constraint nodes and other element nodes.
After this, the feature vector $h_{\mathcal{G}}$ for the entire graph $\mathcal{G}$ is computed. This vector is obtained as the weighted sum of the average feature vectors of each node type:

$$h_{\mathcal{G}} = \sum_{t} W_t \, \bar{h}_t,$$

where $t$ ranges over the five node types (element nodes and the four types of constraint nodes) and $\bar{h}_t$ is the average feature vector of the nodes of type $t$. The five weight matrices $W_t$ are trained alongside the GNN parameters, so training is end-to-end and no manual weight selection is needed. The resulting graph feature vector and the embedding of the target element are concatenated and fed into fully connected layers. This yields position and size predictions for the target GUI element, $\hat{e} = (\hat{x}, \hat{y}, \hat{w}, \hat{h})$. Furthermore, our approach extends to predicting the constraints that should be satisfied by the target GUI element. We utilize the objective function outlined in Section 3 to fine-tune the target element's position, dimensions, and adherence to constraints. For each constraint within the partial GUI, we concatenate the graph feature vector, the embedding of the target element, and the specific constraint's feature vector. This concatenated vector is propagated through fully connected layers to predict the probability that the target GUI element should satisfy each constraint. Simultaneously, we concatenate the target element's embedding and the graph embedding to predict the initial position and size of the target element. Integrating these predictions with the constraints, we subsequently refine the position and size to obtain the final results.
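The aggregation and readout steps can be illustrated with a deliberately simplified sketch. Real GNN layers apply learned transformations and nonlinearities; here one round of aggregation is a plain mean over a node and its neighbors, and the per-type weight matrices are fixed placeholders. All names and values are assumptions for illustration.

```python
def mean(vectors):
    """Element-wise mean of equally sized vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def aggregate(features, neighbors):
    """One simplified message-passing round: average a node with its neighbors."""
    return {v: mean([features[v]] + [features[u] for u in neighbors[v]])
            for v in features}


def matvec(matrix, vec):
    """Multiply a matrix (list of rows) by a column vector."""
    return [sum(m * x for m, x in zip(row, vec)) for row in matrix]


def graph_embedding(features, node_type, weights):
    """Weighted sum of the average feature vector of each node type."""
    by_type = {}
    for v, t in node_type.items():
        by_type.setdefault(t, []).append(features[v])
    dim_out = len(next(iter(weights.values())))
    total = [0.0] * dim_out
    for t, vecs in by_type.items():
        contrib = matvec(weights[t], mean(vecs))
        total = [a + b for a, b in zip(total, contrib)]
    return total


features = {"e1": [1.0, 0.0], "e2": [3.0, 0.0], "c1": [0.0, 2.0]}
neighbors = {"e1": ["c1"], "e2": ["c1"], "c1": ["e1", "e2"]}
node_type = {"e1": "element", "e2": "element", "c1": "alignment"}
weights = {"element": [[1.0, 0.0], [0.0, 1.0]],
           "alignment": [[2.0, 0.0], [0.0, 2.0]]}

updated = aggregate(features, neighbors)
h_graph = graph_embedding(features, node_type, weights)
```

In the full method, the graph embedding would be concatenated with the target element's embedding (and, for constraint prediction, a constraint embedding) before the fully connected layers.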

Confidence Levels
To avoid offering potentially questionable predictions, we compute a confidence level for each outcome, as described below. Conveying the confidence level of a prediction (low, medium, or high) enables software tools and designers to take suitably informed actions. An application of this feature is discussed later in the designer study.

High Confidence.
High confidence is validated in terms of alignment and uniform size constraints. When the disparity between the predicted alignment line and the position of the target element falls below the threshold value τ, refinement is performed by aligning the position with the predicted alignment line. For instance, if the predicted constraint entails left-alignment with the line x = a, and if |x̂ − a| ≤ τ, the x̂ value is adjusted to match a. This threshold was set to τ = 20 pixels in our experiments. Likewise, when the difference between the size of the target element and the sizes of other elements, as predicted by constraints promoting uniformity, is below the τ limit, the size is adjusted to match the uniform size value. For example, if the predicted constraint dictates that the target element should have the same width as elements with a width of w_u, and if |ŵ − w_u| ≤ τ, the ŵ value is adjusted to match w_u. When both (x̂, ŷ) and at least one of ŵ and ĥ can be verified, we assign a high confidence level to the outcome, since the fixed aspect ratio of the target element allows deducing the remaining attribute.

Medium Confidence.
Medium confidence is established via element and multimodal grouping constraints. In scenarios where (x̂, ŷ) and at least one of ŵ and ĥ cannot be confirmed through alignment and uniform size constraints, grouping constraints come into play. These constraints, encompassing both element and multimodal grouping, reveal patterns among elements and can be exploited for refinement of positions and sizes. For example, vertical groupings often entail consistent widths and equidistant vertical spacing between elements. Consequently, if |ŵ − avg(w_i)| ≤ τ, where w_i represents the widths of the other elements in the target's vertical group, ŵ is adjusted to match avg(w_i). Similarly, if (ŷ − y_n) − avg(|y_i − y_{i−1}|) ≤ τ, where y_i are the vertical positions of the group's elements and y_n is the last of them, the ŷ value is set to y_n + avg(|y_i − y_{i−1}|). In instances where (x̂, ŷ), along with at least one of ŵ and ĥ, can be verified in this way, the result is assigned a medium confidence rating.

Low Confidence.
In all other cases, the outcomes are assigned a low confidence rating.
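The three-level scheme above can be illustrated with a short sketch. It assumes that the predicted constraints have already been turned into candidate values per attribute (for example, the x-coordinate of an alignment line or a shared width); the `Box` type, the `snap` and `refine` helpers, and the dictionary layout are hypothetical names introduced for illustration, and only the 20-pixel threshold is taken from the text.

```python
from dataclasses import dataclass

TAU = 20.0  # snapping threshold in pixels, the value reported for the experiments


@dataclass
class Box:
    x: float
    y: float
    w: float
    h: float


def snap(value, candidates, tau=TAU):
    """Snap value to the nearest candidate within tau; return (value, verified)."""
    near = [c for c in candidates if abs(value - c) <= tau]
    if not near:
        return value, False
    return min(near, key=lambda c: abs(value - c)), True


def refine(pred, strict, grouping, aspect_ratio, tau=TAU):
    """Refine a predicted box and rate it 'high', 'medium', or 'low'.

    strict:   candidate values from alignment and same-size constraints
    grouping: candidate values from element/multimodal grouping constraints
    Both are dicts with optional keys 'x', 'y', 'w', 'h' (an assumed layout).
    aspect_ratio is width / height of the target element.
    """
    merged = {k: strict.get(k, []) + grouping.get(k, []) for k in "xywh"}
    for level, cands in (("high", strict), ("medium", merged)):
        x, ok_x = snap(pred.x, cands.get("x", []), tau)
        y, ok_y = snap(pred.y, cands.get("y", []), tau)
        w, ok_w = snap(pred.w, cands.get("w", []), tau)
        h, ok_h = snap(pred.h, cands.get("h", []), tau)
        if ok_x and ok_y and (ok_w or ok_h):
            # The fixed aspect ratio determines the attribute that was not verified.
            if ok_w:
                h = w / aspect_ratio
            else:
                w = h * aspect_ratio
            return Box(x, y, w, h), level
    return pred, "low"  # nothing could be verified: keep the raw prediction
```

A tool built on such a function could, for instance, surface only the "high" and "medium" results by default and leave "low" ones for the designer to place manually.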

EXPERIMENTS FOR AUTOCOMPLETION
We conducted experiments for the autocompletion task to show the effectiveness of our representation. We created a dataset with partial GUIs and then evaluated our method's prediction quality through qualitative and quantitative experiments. Additionally, we conducted an ablation study to demonstrate the necessity of each constraint type taken into account.

Dataset and Training Process
For the evaluation, we took the ENRICO dataset [43] (a subset of the RICO dataset [14] with cleaner mobile GUI information) and the VINS GUI dataset [12] as our basis for creating a mobile dataset for GUI autocompletion. We improved the dataset's quality through several steps. Initially, we excluded layouts that contain three or fewer GUI elements. After this, we employed the UIED model [70] to enhance the precision of element types and refine the bounding boxes of the GUI elements. We then made further adjustments manually to correct the bounding boxes of the elements. Our refined dataset contains 5,653 GUIs in total.
To evaluate our model's performance, we followed a fivefold cross-validation approach; this technique involved partitioning the GUI dataset into five equal-sized folds. Four of the folds, with approximately 4,522 GUIs, served for training our model, while we reserved one fold, encompassing around 1,131 GUIs, for testing. In our experiments, each fold was utilized once for testing, with the remaining four folds serving as the training data. Using fivefold cross-validation helps to validate the generalization of the model to unseen data and offers a more comprehensive view of the model's behavior by averaging its performance across multiple test sets, thus reducing the impact of random variations in the data.
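As a rough illustration of the split sizes quoted above, the following sketch partitions 5,653 item indices into five folds and rotates the test fold. The function name, the fixed seed, and the round-robin fold assignment are assumptions for illustration; the actual partitioning procedure is not specified in the text.

```python
import random


def five_fold_splits(items, k=5, seed=0):
    """Yield (train, test) lists, using each of the k folds once for testing."""
    items = list(items)
    random.Random(seed).shuffle(items)          # reproducible shuffle
    folds = [items[i::k] for i in range(k)]     # near-equal fold sizes
    for i in range(k):
        test = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, test


splits = list(five_fold_splits(range(5653)))
for train, test in splits:
    print(len(train), len(test))
```

With 5,653 GUIs, the folds cannot be exactly equal, so test folds contain 1,130 or 1,131 GUIs and the training portions 4,522 or 4,523, matching the approximate figures in the text.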
To create our dataset for GUI autocompletion, for each GUI, we randomly kept a chunk of GUI elements on the display. We removed the other GUI elements to create a partial GUI and stored the potential "next GUI element" to be added for completing the given partial GUI. By this mechanism, we obtained a partially completed GUI with missing elements and a target element that we need to predict, given the partially completed GUI. Note that each partial GUI often had more than one potential target GUI element. With this method, we can generate various partial GUIs and corresponding target elements. We generated partial GUIs for training from the complete GUIs in the training data and did the same for the test data. In total, each fold of complete GUIs yielded approximately 171,212 pairs of partial GUIs and corresponding target elements. Consequently, for each experiment, we used a training dataset containing about 684,849 incomplete-GUI-target pairs and a test dataset comprising approximately 171,212 pairs.

6.2 Implementation Details

6.2.1 Embeddings. We encoded the visual appearance of each element by using a pre-trained ResNet152 model [27]. Through this model, which is able to extract high-level features from images, we generated a feature vector that represents the element's visual appearance. For encoding the textual content of the GUI elements, we used a pre-trained BERT model [16]. A Transformer-based neural-network architecture pre-trained on a large corpus of text data, BERT can generate a 768-dimensional vector representing the text, which we applied to extract features from the interface elements. To improve the efficiency of models utilizing this representation, we introduced an "unknown" token for infrequent words. Infrequent words are often inadequately represented in training data, which can lead to overfitting. Incorporating an unknown token enables the model to generalize its predictions for previously unseen words and simplifies the representation to facilitate the model's processing. To implement this approach, we began by computing the frequency of each text element. If the text occurred fewer than three times, we replaced it with the special token [UNK] in BERT, representing an unknown word. We then used BERT to generate the text content embedding.
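The frequency-based replacement step can be sketched in a few lines. This is a minimal illustration that assumes frequencies are counted over whole element texts (counting per word would be an equally plausible reading); the function name and the `min_count` parameter are hypothetical, while the threshold of three and the `[UNK]` token come from the description above.

```python
from collections import Counter

UNK = "[UNK]"  # BERT's special token for unknown words


def replace_rare_texts(texts, min_count=3):
    """Replace every text that occurs fewer than min_count times with [UNK]."""
    counts = Counter(texts)
    return [t if counts[t] >= min_count else UNK for t in texts]


labels = ["OK", "OK", "OK", "Cancel", "Sign in", "Cancel"]
print(replace_rare_texts(labels))
```

The resulting strings would then be passed to the text encoder in place of the raw element texts.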

Graph Neural Networks.
We applied the SAGEConv model [26], a type of GNN that is suited to training on heterogeneous graphs. The SAGEConv model employs a message-passing technique that propagates information from a node's neighborhood to the node itself, thereby improving its feature representation. SAGEConv can capture the relationships between nodes of different types in the graph. The output feature vectors are 256-dimensional, h ∈ R^256. The five trainable weight matrices W used to compute the graph feature vector each have dimension 256 × 256.
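To make the message-passing step concrete, here is a dependency-free sketch of one GraphSAGE-style layer with a mean aggregator: each node combines its own features with the mean of its neighbors' features. It is a simplification, not the authors' implementation: a heterogeneous setup would typically use separate weights per node or edge type, whereas this sketch shares one pair of matrices (`W_self`, `W_neigh`, both hypothetical names), and the ReLU nonlinearity is an assumed choice.

```python
def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def sage_mean_layer(features, neighbors, W_self, W_neigh):
    """One mean-aggregation layer: h_v' = ReLU(W_self h_v + W_neigh mean(h_u))."""
    out = {}
    for node, h in features.items():
        nbrs = neighbors.get(node, [])
        if nbrs:
            mean = [sum(features[u][d] for u in nbrs) / len(nbrs) for d in range(len(h))]
        else:
            mean = [0.0] * len(h)  # isolated node: no neighbor message
        z = [a + b for a, b in zip(matvec(W_self, h), matvec(W_neigh, mean))]
        out[node] = [max(0.0, v) for v in z]
    return out


# Two element nodes linked through one constraint node (bipartite structure).
features = {"e1": [1.0, 0.0], "e2": [0.0, 1.0], "c1": [0.0, 0.0]}
neighbors = {"e1": ["c1"], "e2": ["c1"], "c1": ["e1", "e2"]}
identity = [[1.0, 0.0], [0.0, 1.0]]
updated = sage_mean_layer(features, neighbors, identity, identity)
print(updated["c1"])
```

After one layer the constraint node holds the average of its two elements; a second layer would pass that summary back to the elements, which is how element nodes come to "see" each other through shared constraints.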

Qualitative Evaluation
To enhance the usability of the model for designers, we introduce three types of element suggestion options and show the results.

As Figure 1b indicates, iteratively predicting the sizes and positions of yet-unplaced elements by means of the updated graph representation helps support designers by autocompleting partially completed designs. By default, we loop over all elements still to be placed and select the one with the highest associated confidence level to add (Figure 4a). Furthermore, our setting allows designer-in-the-loop interaction, wherein designers can adjust the GUI design as their preferences dictate after every iteration. They can move the element, resize it, or select an alternative GUI element for placement. The changes are visible immediately, and the underlying graph representation is updated accordingly, so that subsequent predictions can work from it, as demonstrated in Figure 4b.

6.3.2 Suggesting a Group of Elements. Our model predicts grouping constraints for each element based on the partial GUI. As shown in Figure 4c, if multiple elements share the same grouping constraint, our model suggests these grouped elements together, thereby expediting the prediction process.

6.3.3 Suggesting All Elements. Alternatively, as shown in Figure 4d, the model can predict all elements at once. The final result is obtained by iterating over the elements and placing those with the highest confidence levels first, based on the updated partial GUI. While this option provides a complete view, modifying its results can be more challenging than making adjustments during the iterative prediction process.
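The default loop described in this section (predict every remaining element, place the most confident one, repeat on the updated partial GUI) can be sketched as follows. The `predict` callable stands in for the trained model and is hypothetical, as are the `RANK` table and the tie-breaking rule (first element in list order); a designer-in-the-loop variant would let the user edit `placed` between iterations.

```python
RANK = {"high": 2, "medium": 1, "low": 0}


def autocomplete(placed, unplaced, predict):
    """Iteratively place elements, most confident prediction first.

    placed:   list of (element, box) pairs already on the canvas
    unplaced: list of elements still to be placed
    predict:  hypothetical callable (placed, element) -> (box, confidence)
    """
    placed = list(placed)
    remaining = list(unplaced)
    order = []
    while remaining:
        # Re-predict every remaining element against the updated partial GUI.
        scored = [(predict(placed, e), e) for e in remaining]
        (box, conf), best = max(scored, key=lambda s: RANK[s[0][1]])
        placed.append((best, box))
        remaining.remove(best)
        order.append((best, conf))
    return placed, order


def toy_predict(placed, element):
    """Toy stand-in for the model: confidence grows as the GUI fills up."""
    if element == "title":
        return (0, 0, 100, 20), "high"
    return (0, 40, 100, 20), ("medium" if len(placed) >= 2 else "low")


final, order = autocomplete([("logo", (0, 0, 20, 20))], ["button", "title"], toy_predict)
print(order)
```

In the toy run, the button only reaches medium confidence after the title has been placed, which mirrors why predictions are recomputed on the updated graph at every step.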

Quantitative Evaluation
To evaluate the accuracy of our autocompletion approach, we assessed its single-step prediction by three metrics. For this purpose, we denoted the top-left point of the predicted GUI element as (x̂, ŷ) and its corresponding ground truth as (x, y). Similarly, we denoted the predicted size of the target GUI element as (ŵ, ĥ) and its corresponding ground truth as (w, h). Finally, w_UI and h_UI represent the width and height of the user interface.
6.4.1 Metrics. We established separate metrics for assessing the predictions' accuracy with regard to position, size, and alignment. All three metrics have a range between 0 and 1, where a lower value indicates a better prediction.
• Position Error (PosError): The PosError metric measures the relative distance between the predicted position of the GUI element and the corresponding ground-truth position. Calculating the error entails ascertaining the distance between the predicted and ground-truth positions, then normalizing it by the maximum possible distance that the element can move.
• Area Error (AreaError): The AreaError metric evaluates the difference between the predicted size of the GUI element and the corresponding ground-truth size. The difference between the predicted and ground-truth sizes is normalized in terms of the maximum size between the predicted size and the ground-truth size.
• Alignment Error (AlignError): The AlignError metric judges the proportion of the alignments predicted correctly. This figure is calculated by dividing the number of correctly predicted alignments for the target element by the total number of alignments that the predicted element should satisfy.
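The three metrics can be sketched under explicit assumptions, since the exact normalizers are described only in words. Here the "maximum possible distance" is assumed to be the diagonal of the user interface, size is compared as area (width times height), and AlignError is taken as one minus the fraction of required alignments that were predicted correctly, so that lower is better as stated. All function names are hypothetical.

```python
import math


def pos_error(pred_xy, true_xy, ui_size):
    """Distance between top-left points, normalized by the UI diagonal (assumed)."""
    distance = math.dist(pred_xy, true_xy)
    return min(1.0, distance / math.hypot(*ui_size))


def area_error(pred_wh, true_wh):
    """Absolute area difference, normalized by the larger of the two areas."""
    pred_area = pred_wh[0] * pred_wh[1]
    true_area = true_wh[0] * true_wh[1]
    largest = max(pred_area, true_area)
    return abs(pred_area - true_area) / largest if largest > 0 else 0.0


def align_error(predicted, required):
    """One minus the share of required alignments that were predicted (assumed)."""
    required = set(required)
    if not required:
        return 0.0
    return 1.0 - len(set(predicted) & required) / len(required)


print(pos_error((0, 0), (3, 4), (30, 40)))
print(area_error((10, 10), (10, 20)))
print(align_error({"left"}, {"left", "top"}))
```

Each function returns a value in [0, 1], in line with the stated range of the metrics.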

Comparison.
We compared our model with GRIDS [13], an optimization approach based on grid layout that is designed for autocompletion and considers constraints including alignment, element location, rectangular outline, and preferred element positions. We chose GRIDS for comparison for the following reasons: 1) it also takes constraints into account; 2) the majority of existing GUI representations do not perform autocompletion, and enabling autocompletion is non-trivial; and 3) other relevant prior works [10,48] addressing autocompletion tasks are not open-sourced. Using integer programming, GRIDS produces results by seeking the optimal placements available for unplaced elements. It generates multiple optimized solutions, each accompanied by a confidence value. We chose the solution with the highest confidence value among the first 10 solutions. In addition, we established an upper bound for the model's performance by employing ground-truth constraints to predict element positions and sizes. Because GUI element placement is ambiguous, this upper bound does not reach zero error. The ambiguity arises from the fact that some elements do not satisfy enough constraints, which makes accurate prediction of their placement challenging.
Our comparison by all three metrics was performed relative to the number of preexisting elements in the partial GUIs, as Figure 5 illustrates. We used fivefold cross-validation for the comparisons (our plots present both mean and standard deviation values), with a sample of 10,000 test data from each fold used for model evaluation.
Since the upper bound uses ground-truth alignments for its predictions, we do not report an alignment error for it. The results show that our model predicts more accurate positions and sizes, with more accurate alignments, than GRIDS. Finally, we computed the inference time needed by the models. Our model performs one-step predictions in 0.148 seconds, on average, with our test data on a single RTX4090 GPU, while GRIDS takes much longer, at 67.4 seconds.

6.4.3 The Ablation Study. Our ablation study compared the proposed model to ablated models, each lacking one specific type of constraint. This ablation study, the results of which are depicted in Figure 6, showed that each constraint type contributes to our model's performance.

COMPARISON STUDY
We performed a comparison study to evaluate our model against GRIDS [13] for one-step prediction. We randomly sampled test images from the test dataset described in subsection 6.1. All the images are mobile GUIs. As Graph4GUI optimizes positions and sizes with content and graphics, while GRIDS only handles wireframe layouts, the comparison focuses on wireframes. To ensure fairness, we trained our model on wireframe GUIs without visual appearance, textual content, or element type. Even though this setting does not fully utilize Graph4GUI's capabilities, our model's results were perceived as higher in quality.
All participants had normal or corrected-to-normal vision, and none were colorblind. Local regulations did not mandate formal ethics review.
7.1.2 Experimental Design. From 1,000 randomly sampled partial GUIs with a to-be-placed element, we randomly selected 100 to present to each participant for comparison between our method and the GRIDS method.
7.1.3 Apparatus. Pairs of GUI images were presented side by side on a custom webpage in randomized order.
7.1.4 Procedure. After completing a demographics questionnaire, participants viewed GUI pairs and selected the preferred one based on personal criteria like design, layout, or aesthetics. Preferences were indicated by choosing the left or right image, or "They are equally good". Participants could assess up to 100 pairs, stopping at their discretion.

Findings
We received responses for 3,367 image pairs from 35 participants. Preferences were as follows: 456 for GRIDS (13.54%), 2,368 for our model (70.33%), and 543 expressing no preference (16.13%). The difference between our method and GRIDS was statistically significant (χ² = 3115.8, p < .001). This finding indicates that participants found our model's suggestions more visually appealing and better aligned than the baseline method's output.

DESIGNER STUDY
We conducted a user study to evaluate the effectiveness and usability of our method for assisting with GUI autocompletion. The aim was to assess both the impact on design efficiency and the subjective experience. To guide the design of the study, we established the following objectives:
(1) Determine whether our technique enables designers to enhance the efficiency of the design process.
(2) Evaluate the quality of suggestions provided by our model.
(3) Explore how designers utilize each function of the tool, specifically the element prediction preview, the element prediction, and the confidence rating for the predictions.
(4) Ascertain whether designers perceived our technique as helpful for their GUI design practice.

Method
8.1.1 Plug-in. Our method is implemented as a Figma plug-in (Figure 7), offering GUI element predictions with confidence levels. Given a partially completed GUI and a list of elements to be placed, the plug-in computes confidence levels for each prediction, aiding designers in prioritizing placements. The plug-in includes a preview window that displays a wireframe version of the GUI when hovering over an element, allowing designers to assess results intuitively.
Upon selecting a GUI element, the plug-in automatically positions and sizes it on the canvas.
8.1.3 Materials. Participants used the lab's laptops for various activities, including interacting with the Figma interface using our plug-in and filling out a questionnaire. The study involved a practice task to acquaint participants with Figma and our plug-in, and six GUI design tasks (three with our plug-in and three without). Each task provided a brief outlining the GUI design purpose, a GUI to be completed, and a list of elements to place.

Experiment Design.
The study used a within-subject design, exposing participants to two conditions. In one condition, they freely used our plug-in with Figma's standard features to complete three GUIs (login, shopping, and menu pages). In the baseline condition, participants completed the same GUIs without the plug-in. Task and condition order were fully counterbalanced to reduce order effects.
8.1.5 Procedure. After signing the consent form, participants completed a design background questionnaire and underwent a Figma tutorial. They were then tasked with creating six GUIs in two conditions: with and without our plug-in. After each task, participants rated their designs and assessed the perceived task load. In the with-plug-in condition, they also evaluated the plug-in's specific features. We used the System Usability Scale (SUS) for overall usability assessment. We conducted final interviews in which participants compared their experiences, provided feedback on plug-in features, and identified weaknesses in our tool.

Quantitative Findings
Our tool's usability and helpfulness were quantitatively evaluated using the System Usability Scale (SUS), participant ratings of features and resulting GUIs, and task completion times.
8.2.1 System Usability Score. Following established practices [9], SUS scores were computed, yielding an average of 87.08 (SD = 5.48). This is well above the average SUS score of 68 and indicates excellent usability. Participants found our plug-in easy to use, with design features enhancing their GUI design process.
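For reference, the standard SUS scoring rule can be written as a short function: odd-numbered items contribute (response − 1), even-numbered items contribute (5 − response), and the sum is multiplied by 2.5 to give a score from 0 to 100. The function name and the example responses are illustrative; they are not data from the study.

```python
def sus_score(responses):
    """Compute a SUS score (0-100) from ten responses on a 1-5 scale."""
    if len(responses) != 10:
        raise ValueError("SUS requires exactly ten item responses")
    total = 0
    for item, response in enumerate(responses, start=1):
        if not 1 <= response <= 5:
            raise ValueError("responses must be between 1 and 5")
        # Odd items are positively worded, even items negatively worded.
        total += (response - 1) if item % 2 == 1 else (5 - response)
    return total * 2.5


print(sus_score([5, 1, 5, 1, 5, 1, 5, 1, 5, 1]))  # best possible answers
print(sus_score([3] * 10))                         # all-neutral answers
```

A study-level figure such as the reported average would then be the mean of the per-participant scores.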

Ratings of Plug-in Features.
Participants rated each plug-in feature on a scale of 1 to 7. The preview scored 6.77 (SD = 0.47), element prediction 5.83 (SD = 0.69), and the confidence level 6.17 (SD = 0.69). Participants thus favored the confidence indicator and the ability to preview a prediction before actually placing an element.

Ratings of Result GUIs.
Participants provided an integer score out of 7 for each GUI design, with no significant differences between conditions in ratings for completed GUIs (t = 0.396, p = 0.695). The average rating with the plug-in was 6.28 (SD = 0.65), and without the plug-in was 6.17 (SD = 0.96). Participants reported having terminated their design process upon achieving satisfaction with the GUIs.
8.2.4 Timing. Without our plug-in, the average finishing time was approximately 5.6 minutes (SD = 1.02), while with the plug-in, times mostly ranged from 1 to 3 minutes (Mean = 2.22, SD = 0.78). The difference is significant (t = −11.71, p < .001): with our plug-in, GUI design tasks took roughly 40% of the time they took without it.
8.2.5 Summary. The high SUS score suggests ease of use and benefits for participants in their GUI design process. Our plug-in significantly improved efficiency, reducing task completion time to approximately 40% of the time needed without it. Participants favored features such as element prediction preview, confidence level indicator, and element prediction. However, no significant difference in GUI quality was observed between plug-in usage and non-usage, as indicated by participant ratings of result GUIs.

Qualitative Findings
Alongside quantitative analysis, we performed qualitative analysis. Overall, participants gave positive feedback, especially on the tool providing proper suggestions for GUI element prediction and sparing them many manual design operations, with P2 mentioning that "suggestions for element prediction are reasonable and have saved me some time on manual editing and aligning elements."

Workflow.
Participants appreciated the integration of our method as a Figma plug-in. Since Figma is a popular design software, P3 concluded, "It is very useful to have this kind of integration; I do not need to spend time learning a completely new tool to use these functions, and now I can simply use the software I normally use at work." P5 held the same opinion, stating, "This plug-in doesn't interrupt my design process; it's more like an add-on that helps with my design and provides inspiration. The operations are intuitive, so there is no need for us to learn how to use it specifically." Some participants also praised the ease of element placement and thought the plug-in makes the design process more efficient; e.g., "I often had to place the elements one by one, but now I can just click, and the plug-in directly suggests the proper placement. It is easy and less time-consuming." (P6).

Functionality.
Participants highlighted the usefulness of the preview window, with P5 noting its role in exploration and inspiration: "It is interesting to see element prediction previews for each element in the preview window. I could hover over each element to get intuition as to how the GUI looks after placing it without the need to actually place it and undo it if I do not like the result. This can be used as an exploration process to help me compare different elements intuitively without additional effort." P3 emphasized that the preview window gives designers "a good way to visualize element suggestions." In addition, P2 mentioned that the combination of preview and confidence level makes for better exploration: "I used the preview to compare the predictions among elements with high confidence (or medium if no element with high confidence exists) to decide which one I preferred to place first. I do not need to think much about which element I want to place since the confidence level helped

OTHER APPLICATIONS
In addition to GUI autocompletion, we further explored other applications using our graph-based GUI representation.

GUI Topic Classification
GUI topic classification involves categorizing GUIs based on their topics and usage. For instance, "Gallery" GUIs exhibit a grid-like layout with images, while "Profile" GUIs display information related to user profiles or products. Our approach used our graph-based GUI representation for classification, employing eight GUI topics derived from the Enrico dataset [43]. We sampled 10,000 GUIs, comprising both complete and partial instances from the Enrico GUIs associated with these eight topics, with a maximum of 2,000 instances for each GUI topic. The dataset was split into 85% for training and 15% for testing. The graph representation of each GUI was fed into a Graph Neural Network (GNN) to obtain the graph embedding, following the same process used in the autocompletion task (refer to Section 5). The classifier consists of three fully connected layers and a softmax function and reaches an accuracy of 91.53%, higher than the baselines. A comparison with the ResNet50, Nearest Neighbors, and Random Forest models is presented in Table 2.

GUI Retrieval
GUI retrieval is the process of finding the most similar GUI to a given one. Utilizing the graph embedding from our trained GUI topic classification model, we applied the nearest neighbor approach to identify the closest GUIs. Samples demonstrating the performance of our model and the Screen2Vec model [47] in retrieving both complete and partial GUIs are shown in Figure 8 (more results can be seen in the supplementary materials).
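Nearest-neighbor retrieval over graph embeddings can be sketched as below. The similarity measure is not stated in the text, so cosine similarity is an assumption here, and the three-dimensional toy embeddings, GUI names, and function names are purely illustrative stand-ins for the learned embeddings.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, gallery, k=1):
    """Return the names of the k gallery GUIs most similar to the query embedding."""
    ranked = sorted(gallery, key=lambda name: cosine_similarity(query, gallery[name]), reverse=True)
    return ranked[:k]


gallery = {
    "login": [1.0, 0.0, 0.0],
    "gallery": [0.0, 1.0, 0.0],
    "profile": [0.9, 0.1, 0.0],
}
print(retrieve([1.0, 0.0, 0.0], gallery, k=2))
```

Because partial and complete GUIs are embedded in the same space, the same lookup would apply to a partially completed design used as the query.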

User Study.
A comparison study was conducted to assess our model against Screen2Vec.
Participants. Fifteen participants (9 female, 6 male) were recruited through social media promotion. All participants had normal vision or vision corrected to normal with glasses. None were colorblind. Local regulations do not require formal ethics review.
Experimental Design. From a pool of 1,500 randomly sampled partial and complete GUIs, we randomly selected 100 GUIs for each participant and presented the results retrieved by our method and the Screen2Vec method.
Apparatus. Pairs of GUI images, one retrieved by our method and one by Screen2Vec, were displayed side by side on a custom webpage in randomized order.
Procedure. Participants began with a demographics questionnaire, followed by evaluating GUI images and selecting their preferences based on personal assessment criteria. Each participant could assess up to 100 pairs and could stop comparisons at any point.

DISCUSSION AND CONCLUSION
This paper addressed the challenges of representing GUIs through a graph-based deep learning model. Prior deep learning-based GUI representations failed to consider the constraints for GUI elements and the visual-spatial-semantic structure of a GUI, which are important in computational design. Although many modern GUI tools use constraints to optimize GUIs, training a model to predict constraints remains a challenge. Our proposed novel graph-based GUI representation captures both the properties of GUI elements, such as textual content, visual appearance, and element types, and their relationships in the visual, spatial, and semantic dimensions of a GUI. It can be computed efficiently in computational design. We further trained graph neural networks (GNNs) to take the graph as input to optimize the GUI. We will release our code and data.
Our work has achieved the following results in the GUI autocompletion task.
(1) Our method predicts the position, size, and alignment of GUI elements more accurately. As shown in Figure 5, its error values in these three metrics (position, size, and alignment) are less than half of those of GRIDS [13], an approach for autocompletion using integer programming. When the number of existing elements on the GUI increases, its error rates remain low, whereas GRIDS's errors increase dramatically.
(2) Our model offers superior alignment and visual appeal compared to the baseline method, and is better aligned with participants' preferences. In our comparison study, 70.33% of the responses preferred results from our model compared to 13.54% for GRIDS.
(3) Our method enhances flexibility by integrating as a plug-in within a popular existing design tool, Figma. This integration allows designers to apply workflows they are already familiar with, eliminating the need to learn new tools or switch between different design software tools. Participants in the designer study praised the plug-in for accelerating their design process without disrupting the existing functionalities of their design applications.
In addition to the demonstrated capability of our graph-based GUI representation in the GUI autocompletion task, we show that our GUI representation can be applied to other applications, such as GUI topic classification and GUI retrieval. Our model demonstrated superior accuracy in GUI topic classification compared to baseline methods like ResNet50, Nearest Neighbors, and Random Forest. Furthermore, user feedback highlighted our model's effectiveness in retrieving visually similar GUIs compared to the Screen2Vec model. Compared to other data-driven approaches, our graph-based representation facilitates the understanding of GUI structure, improving the explainability of the model. This capacity enables our representation to potentially extend to diverse downstream tasks. For example, accessibility needs can also be represented as constraints [22]; since our method can learn and predict layout constraints, it could potentially enhance accessibility.

Limitations and Future Work
As pointed out by participants in our designer study, our method has limited ability to generate accurate predictions if the unplaced element does not need to align or group with any existing element on the GUI. We currently assign a low confidence level to such elements to avoid uncertain predictions. Future work can improve the prediction of underconstrained GUI elements by considering more design priors or including more complicated constraints. As shown in Table 1, our representation does not explicitly represent the view hierarchy. The view hierarchy provides structural data, aiding models in understanding the layout and relationships of elements. We do not currently represent view hierarchies since they are not always available and often contain incorrect structure information. However, future work can connect related element nodes in the graph representation to represent the view hierarchy. Moreover, while our method offers suggestions for each element to be placed, it provides only a single suggestion per element, which limits exploration. In addition, we focus on a setting where all the elements are rectangular in shape or enclosed in rectangular bounding boxes. There are no datasets available with non-rectangular bounding boxes. To accommodate various shapes of bounding boxes, the element node can be augmented with additional parameters. These parameters would facilitate the description of common shapes, such as rectangles with rounded corners and circles. Subsequently, the model can be retrained to incorporate this information when it is present in the training dataset. Furthermore, we observe that even predictions with high confidence levels are sometimes not ideal. For example, as illustrated in Figure 9, our method cannot capture the semantic correspondence between different types of GUI elements; e.g., it cannot detect that the "Favorite" text should correspond to the "star" icon. Future research could further explore GUI element correspondence and constraints across UI types.

Figure 1: Graph4GUI is a graph-based GUI representation that captures the connections between GUI element properties and constraints. Such representation can capture the visual-spatial-semantic structure of a GUI such that it could be effectively employed in computational design. a) To represent the GUIs, bipartite graphs comprising element nodes (colored purple) convey the GUI elements' properties and constraint nodes (colored green) that can be integrated into graph neural networks. Such representation can serve various downstream tasks, such as predicting constraints (dotted orange edge) for an unplaced element (colored orange). b) By iteratively predicting the sizes and locations of yet-unplaced elements, we can support designers by autocompleting partially completed GUI designs.

Figure 2: a) Graph4GUI represents each GUI element through a separate node with properties.GUI element nodes convey the element properties, including visual appearance, textual content, element type, position, and size.b) Constraint nodes express four types of constraints: alignment, same-size, element grouping, and multimodal grouping constraints.

Figure 3: Graph4GUI was adapted for the autocompletion task: We first encode the graph representation of the GUI via the GNN. We only illustrate some parts of the graph for simplicity. Element 8 is the target to-be-placed element. In each GNN layer, nodes perform aggregation from their respective neighbors. To illustrate, consider element node 3. As it goes through the GNN layers, it accumulates information from related constraint nodes and other element nodes. This process results in feature embedding vectors for all nodes, including both element nodes and constraint nodes within the graph. We compute the graph embedding as a weighted average of the node embeddings with the weight matrices W. We then concatenate the target element's embedding vector, the graph embedding, and a constraint embedding and send it to fully connected layers to predict whether the target to-be-placed element should satisfy the constraint. Simultaneously, we concatenate the target element's embedding and the graph embedding to predict the initial position and size of the target element. Integrating these predictions with the constraints, we subsequently refine the position and size to obtain the final results.

Figure 4: a) Our model can iteratively predict unplaced GUI elements (shown in blue bounding boxes).b) Designers can make adjustments (orange), including moving, resizing, or re-selecting GUI elements.c) The model's capability to predict groupings allows for the placement of elements together as a group.d) The model can also predict all the elements simultaneously.

Figure 5: Comparison of our model with GRIDS [13], an autocompletion approach using integer programming, and with an upper bound on the model's performance obtained by using ground-truth constraints to predict positions and sizes. The evaluation used three metrics: position error, size error, and alignment error. The comparison uses fivefold cross-validation for reliability, with the mean and standard deviation shown in the corresponding plots.

Figure 6: Results from an ablation study comparing our model's performance to ablated models in which each type of constraint has been removed.

Figure 7: We implemented our method as a Figma plug-in. The plug-in suggests GUI elements to place next, each with a confidence level, helping designers prioritize element selection. It also features a preview window: hovering over a suggested element shows a preview of that element placed on the GUI, providing an intuitive view that supports decision-making. After the designer selects a GUI element, the plug-in can automatically place it on the canvas at the suggested position and size.

Figure 8: GUI retrieval results of our model and Screen2Vec, including both complete GUIs and partial GUIs.

Figure 9: Limitation of our method: It cannot capture the semantic correspondence between different types of GUI elements, such as associating the "Favorite" text with a "star" icon, which could be explored in future research.
4.3.1 Graph Construction. The heterogeneous bipartite graph, $\mathcal{G} = (E \cup C, A)$, is constructed from $n$ GUI element nodes and their $m$ corresponding constraint nodes. The set of element nodes is $E = \{e_1, e_2, \ldots, e_n\}$, and the set of constraint nodes is $C = \{c_1, c_2, \ldots, c_m\}$. Links between element nodes and constraint nodes are encoded in the adjacency matrix $A$: the entry $a_{i,j}$ is set to 1 when the $i$-th GUI element satisfies the $j$-th constraint; otherwise, it is 0.
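As an illustration of the adjacency matrix described above, the following is a minimal sketch and not the authors' implementation. The `Constraint` class, its `kind` and `members` fields, and the `build_adjacency` function are hypothetical names introduced here, and it assumes each constraint is given as the set of element indices that satisfy it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Constraint:
    """Hypothetical container for one constraint node (assumption, not from the paper)."""
    kind: str            # e.g. "alignment", "same-size", "grouping"
    members: frozenset   # indices of the elements that satisfy this constraint


def build_adjacency(num_elements, constraints):
    """Return a num_elements x len(constraints) 0/1 matrix as nested lists.

    Entry [i][j] is 1 when element i satisfies constraint j, else 0.
    """
    adjacency = [[0] * len(constraints) for _ in range(num_elements)]
    for j, constraint in enumerate(constraints):
        for i in constraint.members:
            if not 0 <= i < num_elements:
                raise ValueError(f"element index {i} out of range")
            adjacency[i][j] = 1
    return adjacency


# Toy layout: elements 0 and 1 are aligned; elements 1 and 2 share a size.
constraints = [
    Constraint("alignment", frozenset({0, 1})),
    Constraint("same-size", frozenset({1, 2})),
]
A = build_adjacency(3, constraints)
print(A)  # [[1, 0], [1, 1], [0, 1]]
```

Rows index element nodes and columns index constraint nodes, so the graph stays bipartite by construction: no entry links two elements or two constraints directly.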
4.3.2 Predicting GUI Element Dimensions and Positions. The primary task of our GNN model is to predict the dimensions and positions of the GUI elements. The predicted attributes of a GUI element are denoted by $\hat{e} = (\hat{x}, \hat{y}, \hat{w}, \hat{h})$. These predictions are produced by parameterized functions specific to the GNN model. The model parameters are denoted by $\theta$, and the model that generates the GUI element predictions is denoted by $\mathrm{GNN}_\theta$. The predicted GUI elements $\hat{e}$ are then computed as

$$\hat{e} = \mathrm{GNN}_\theta(\mathcal{G}). \tag{8}$$

4.3.3 Optimization of GNN Parameters. The GNN model's parameters $\theta$ are optimized by minimizing the previously defined objective function $\mathcal{L}$; see Subsection 3.3. The optimization process can be represented as

$$\theta^* = \arg\min_{\theta} \mathcal{L}(\hat{e}_1, \hat{e}_2, \ldots, \hat{e}_n, F; \theta).$$

8.1.2 Participants. Six GUI designers with diverse experience levels were recruited through email lists, local networks, and social media platforms. Participants, aged 21 to 33 (mean age 26.5), were either UI/UX designers or HCI/design students, all experienced in using Figma for GUI design. The group comprised four females and two males. Prior to the study, participants received full information about the conditions and gave informed consent. Local regulations do not require formal ethics review.

Table 2: GUI topic classification results, comparing the accuracy of our graph-based GUI representation method with that of the ResNet50, Nearest Neighbors, and Random Forest models.