Low-code Development Productivity

This article aims to provide new insights on the subject by presenting the results of laboratory experiments carried out with code-based, low-code, and extreme low-code technologies to study differences in productivity. Low-code technologies have clearly shown higher levels of productivity, providing strong arguments for low-code to dominate the software development mainstream in the short/medium term. The article reports the procedure and protocols, results, limitations, and opportunities for future research (expanding the results of Trigo et al.21).

Overall, this seems too good to be true, and it is important to separate what is "advertising" from what is "achievable" for companies that are weighing whether to adopt this technology for their software development.
BACKGROUND

Source code is the set of logical instructions that a programmer writes when developing an application. Once written, these instructions are compiled/interpreted and converted into machine code. High-level programming languages such as Python, Java, JavaScript, PHP, C/C++, and C# are examples of technologies used in code-based application development.
Low-code software development, on the other hand, consists of minimizing the amount of manual coding by using support tools. The objective is to develop software faster and with less effort on the part of development teams, thus accelerating software delivery.
Examples of low-code/no-code software-development technologies are IBM Automation Platform, Zoho Creator, Appian, Mendix, OutSystems, AgilePoint, Google AppSheet, Nintex, TrackVia, Quickbase, ServiceNow, Salesforce App Cloud, Microsoft Power Apps, Oracle Visual Builder, Oracle APEX, and Quidgest Genio, to name just a few. The distinctive feature of these technologies is that they allow the creation of software applications with minimal hand-coding.22,23 Typically, low-code platforms provide a graphical environment that facilitates application development, unlike code-based technology, which requires manual coding (i.e., almost everything is developed graphically in low-code technologies, with little or no programming, allowing people with no programming competencies to create software applications). One disadvantage of these technologies is the licensing costs, which are known to be higher than for code-based technologies.17

METHOD
In this research, laboratory experiments were performed in a controlled environment, following a previously defined procedure and protocols to enable accurate measurements.2 The experiments were designed to be objective so that the results would not be biased (e.g., by the researchers' influence or perspective).9 The underlying research question was: Do low-code technologies result in higher software-development productivity than code-based technologies (as reported in the gray literature)? The variable under study was productivity in the creation and maintenance of software applications.
For each experiment, a software-development technology was selected (code-based, low-code, or extreme low-code (quasi no code)), and one developer with proven proficiency in that technology was invited to participate. In the case of code-based technology, the developer's preferred technology was selected. The productivity calculation was based on the UCPA (use case points analysis) method.12 The artificial and controlled environments of the experiments made it possible to accurately measure execution times; this is impossible in other types of studies, such as field experiments, in which it is not viable to control all external stimuli that condition the performance of tasks.24 The experiments were structured into five stages:

0 - Experiment design
I - Briefing
II - Software application development (creation)
III - Software application development (maintenance)
IV - Results analysis

Stages I, II, and III were repeated for each technology involved in the experiments.

Stage 0 - Experiment design
Stage 0, the preparatory phase for the various experiments to be performed, was carried out only once. During this stage, the procedure to be followed was defined; the protocols that specify the application to be developed and maintained (structured in two stages) were created; and the methods to be used to estimate and measure productivity were specified. The protocols are available for download at https://doi.org/10.5281/zenodo.6407074.
The UCPA method was chosen from the several possible alternatives (e.g., lines of code,14 COCOMO II (Constructive Cost Model),19 function point analysis,8 etc.) because of its focus on the functionalities of the applications to be developed and its independence from the technology to be used (which, in the case of the defined experiments, is fundamental).
The method consists of the following phases:2,3,11

1. Calculation of the UUCP (unadjusted use case points) variable, using the variables UAW (unadjusted actor weight) and UUCW (unadjusted use case weight), respectively related to the perceived complexity of actors and use cases:

UUCP = UAW + UUCW

2. UUCP adjustment, considering a set of factors of a technical and environmental nature, reflected in the variables TCF (technical complexity factor) and EF (environmental factor). The combination of the UUCP variable with the TCF and EF variables results in the assessable UCP (use case points) of the project:

UCP = UUCP × TCF × EF

3. Finally, the UCP variable is multiplied by the PF (productivity factor), which represents the number of hours necessary for the development of each UCP:

Total Effort = UCP × PF

Thus, with the UCPA model as a reference, the PF variable was calculated for each technology: the lower the resulting PF, the higher the productivity of the technology under study.
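To make the three phases concrete, here is a minimal Python sketch of the calculation just described; the function and parameter names are ours, chosen for illustration, and are not part of the UCPA specification. Note that the experiments use the model in reverse: UCP is fixed by the protocol, time is measured, and PF is derived from them.

```python
def ucpa_total_effort(uaw, uucw, tcf, ef, pf):
    """Estimate total effort (hours) with the UCPA method."""
    uucp = uaw + uucw        # phase 1: unadjusted use case points
    ucp = uucp * tcf * ef    # phase 2: adjusted use case points (UCP)
    return ucp * pf          # phase 3: total effort = UCP x PF (hours)

def productivity_factor(time_hours, ucp):
    """Invert the model: a measured time and a known UCP give the PF
    (hours per use case point); a lower PF means higher productivity."""
    return time_hours / ucp
```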
The experiment was structured in two main parts: the first part (stage II) consisted of the creation of a software application, and the second part (stage III) of the maintenance (corrective and evolutionary) of that application.
Appendices A.1, A.2, and A.3 identify the actors and use cases described in the experiment protocols, as well as their respective scores (weight).
For the first part of the experiment (creation of a software application), TCF was given a value of 1, considering the low application complexity. Given that the purpose of the experiment was to determine the EF value for each technology, EF was also set at 1 as the starting point for calculating the UCP variable. Thus, for the first part (stage II) of the experiment:

UUCP = UAW + UUCW = 9 + 125 = 134
UCP = UUCP × TCF × EF = 134 × 1 × 1 = 134
For the second part of the experiment (maintenance), participants were asked to make two changes (corresponding to a weight of 20 points) and to implement new use cases (also corresponding to 20 points), as shown in appendix A.3. Thus, in total, for the second part (stage III) of the experiment:

UUCP = UAW + UUCW = 9 + 40 = 49
UCP = UUCP × TCF × EF = 49 × 1 × 1 = 49
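As an illustrative check, plugging the protocol's weights into the sketch above reproduces both UCP values (with TCF and EF fixed at 1, as described):

```python
# Stage II (creation): UAW = 9, UUCW = 125
ucp_creation = (9 + 125) * 1 * 1      # -> 134

# Stage III (maintenance): UAW = 9, UUCW = 20 + 20 = 40
ucp_maintenance = (9 + 40) * 1 * 1    # -> 49
```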
Throughout each experiment, a researcher was always present. Whenever requested by the developer, additional clarifications were provided on the application to be developed. It should also be noted that the experiments were fully recorded on video for subsequent analysis. Break times (e.g., for meals) were registered but not considered for the productivity calculation. During the experiments, the developers could access all the information they needed; the only restriction was that they could not contact other developers for help.

Stage I - Briefing
Stage I was preparatory and consisted of presenting the protocol and the conditions for conducting the experiment to the developer. The use cases were presented in detail, as well as the mockups and data-model requirements. The degrees of freedom were also defined, for example, regarding the color scheme of the graphical interface.
The importance of the final application being as close as possible to the mockups was duly stressed, as well as the need for strict compliance with the specifications; developers were told to resist the temptation that "it would be better in any other way," since a quality assessment planned for the final stage of the experiment would consider these very aspects. Time measurement started after the completion of this phase.

Stage II - Software application development (creation)
The objective of stage II was to create a new application, following the protocol defined in the first part of the experiment. Each developer's activities were recorded on video, and one of the team's researchers was always present during this stage. Besides the programming corresponding to the defined use cases, the activities performed by the developer included the configuration of the development environments used, the creation of databases, and testing. It should be noted that the complementary activities varied significantly depending on the development technology used.

Stage III - Software application development (maintenance)
Stage III followed the same procedure as stage II, except that the objective was not the creation of a new application but the maintenance (corrective and evolutionary) of an existing one (the application created in stage II). Moreover, the activities were based on a new protocol and requirements (see appendix A.3), which were made available only after stage II was completed (i.e., during stage II, the developers were not aware of the protocol for stage III).

Stage IV - Results analysis
After completing the experiments, the time records (registered manually) and the videos of the activities performed were checked to ensure the accuracy of the time counting. Furthermore, to promote greater accuracy in the calculation of the productivity made possible by each technology, a quality assessment of each resulting application was performed with the participation of at least two researchers, considering four fundamental criteria: compliance with the mockups; fulfillment of the functionalities as described in the use cases; occurrence of errors; and application performance. Note that although the quality assessments of the various applications resulted in minor differences in the final calculated productivity, this did not significantly affect the productivity differences among the various technologies in the experiments or the overall conclusions of the study.

RESULTS
Three experiments were conducted using the most recent versions of the selected technologies: code-based (Django/Python4); low-code (OutSystems13); and extreme low-code (Quidgest Genio15). All the participating developers (one per experiment) were experienced in using the target technology in a professional context. The selection of the low-code technologies was determined by the researchers' contacts for recruiting participants. In the case of the code-based technology, the participant chose Django/Python (he also had experience with several others, including PHP, C#, etc.). All the participants were familiar with the experiment domain (aware of the involved concepts) and the type of application to be developed.
For each experiment, the results (presented in table 1) are based on the following variables:

- QF (quality factor)
- Implemented UC (considering QF)
- Time (hours)
- PF (without considering QF)
- PF (considering QF)

Additionally, for each variable, the results of stage II (software application creation) and stage III (software application maintenance) are presented, as well as the experiment as a whole (total).
The QF variable is related to the quality of the final product and was determined considering four fundamental criteria: (1) compliance with the mockups; (2) fulfillment of the requirements as described in the use cases; (3) occurrence of errors; and (4) application performance.
For example, if an application had a minor deviation in the implementation of a particular use case compared with the respective mockup, without any impact on functionality, the QF variable corresponding to that use case would be penalized by 5 percent.In the case of an error inhibiting the use of the functionality, however, the penalty could go up to 100 percent.
The QF variable's final value (per technology) results from the weighted average of the application's overall quality (considering the weights of the use cases). For example, a QF of 0.9 can be interpreted as the application meeting 90 percent of the specification described in the corresponding protocol. Two researchers reviewed and applied a test script created to reduce bias in the quality assessment. In the end, the application performance criterion was not considered, because no differences were identified among the resulting applications. Had there been such differences, they could have been caused by web-server capacity rather than by the technology involved.
Thus, the implemented UC variable (considering QF) corresponds to the UC effectively implemented and is calculated by multiplying the UUCP variable of the experiment by the QF variable.
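As an illustrative sketch of how QF and the implemented UC variable combine (the per-use-case weights and quality scores below are hypothetical, not the actual assessments from the experiments):

```python
def quality_factor(weights, scores):
    """Weighted average of per-use-case quality scores.

    weights: UCPA weight of each use case
    scores:  assessed quality in [0, 1] (e.g., 0.95 for a minor
             mockup deviation, 0.0 for an error that blocks the
             functionality entirely)
    """
    return sum(w * s for w, s in zip(weights, scores)) / sum(weights)

# Hypothetical example: three use cases, one penalized by 5 percent
qf = quality_factor([10, 20, 10], [1.0, 0.95, 1.0])  # -> 0.975
implemented_uc = 134 * qf   # implemented UC = UUCP x QF (stage II UUCP)
```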
The time variable corresponds to the creation/maintenance time of the application, measured in hours.
The PF variable (without considering QF) consists of the calculated productivity factor, having as reference only the UUCP of the experiment (that is, Time/UUCP); this variable ignores the degree of compliance (QF) with the specification in the protocol.
Finally, the PF variable (considering QF) consists of the calculated productivity factor, having as reference the implemented UC variable (considering QF). Thus, this variable better reflects productivity, since it takes into account the UC effectively implemented (considering the QF) and not simply those specified in the experiment's protocol.
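A minimal sketch of the two PF variants under the definitions above (the time and QF figures are hypothetical placeholders, not the table 1 results):

```python
uucp = 134         # use case points specified in the protocol (stage II)
qf = 0.95          # hypothetical quality factor
time_hours = 40.0  # hypothetical measured development time

pf_without_qf = time_hours / uucp        # hours per specified use case point
pf_with_qf = time_hours / (uucp * qf)    # hours per effectively implemented point
```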
DISCUSSION AND CONCLUSION

By first analyzing the QF variable, it is possible to verify that, in the case of the code-based and low-code technologies, a degradation of the application's quality was noted from the first part (stage II) of the experiment to the second part (stage III). The same did not happen in the case of the extreme low-code technology. Given the nature of the changes in the protocols of the experiment, this should not be attributed to the technologies under study, but mainly to the limited testing carried out by the developers.
For example, in stage III, the PF of the application maintained with the low-code technology was penalized in a use case implementation because of a coding error that caused the application to abort its normal operation. Nevertheless, globally, the low-code and extreme low-code technologies allowed the development of more robust applications in this experiment. It is important to stress, however, that regardless of the technology, rigorous testing cannot be disregarded in the software-development process.
Considering the PF variable, only in the case of the code-based technology was there an improvement from stage II to stage III. This aspect must be put into perspective when comparing it with the low-code technologies, since in stage II of the experiment the code-based technology required a lot of time for setup activities (e.g., database creation), which did not have to be repeated in stage III. Low-code technologies have been shown to be more effective in setup activities. Therefore, the total values (the Totals column in table 1) better reflect the reality of the experiment.
Tables 2 and 3 present a comparison of the productivity verified in the various experiments. Table 2 does not consider the QF variable, whereas table 3 presents the differences considering it. Although considering the QF variable gives more precision to the measurements, the comparison without quality was included to verify whether it influenced the global conclusions. The results show that the findings of the experiments remain the same, regardless of whether QF is considered.
Overall, in these experiments, low-code technologies have shown considerably higher productivity than code-based technology, ranging from about a threefold to a tenfold increase in productivity.
This expands prior work21 and is in accordance with some gray-literature reports, which state that developing applications using low-code technologies accelerates the process,5 resulting in faster delivery and higher productivity.6,16 For example, research by Forrester shows that low-code platforms speed up development about five to ten times.18 According to Gartner,7 low-code will account for more than 70 percent of software-development activity by 2025. This article presents one of the first research-based studies focused on productivity differences among types of development technology.
It is not without limitations, however. First, the selected technologies do not represent "all" extant low-code and code-based technologies. They include some of the most popular technologies, but many more could be part of the experiments. Second, the experiments' protocols specify a "management software" application, and there are many other types, such as multimedia applications. It would be interesting to study the "fit" of the different technologies, considering the type of application to be developed.
Third, the protocols for developing/maintaining the application software were designed to be implemented in a short period of time by a single developer. Since software development is often a collaborative process, this opens space for further research.
Finally, the participants in the experiments were all experienced developers familiar with the specific technologies they used.Their different profiles could be a source of bias.
Overall, these limitations may have a small influence on the recorded times, but they do not put the conclusions into question, since low-code technologies have clearly shown higher levels of productivity.

TABLE 1: Experiment results