An Inconvenient Truth in Software Engineering? The Environmental Impact of Testing Open Source Java Projects

As we have come to rely on software systems in our daily lives, we have a clear expectation about the reliability of these systems. To ensure this reliability, automated software quality assurance processes have become an important part of software development. However, given the climate crisis that we are witnessing, it is important to ask ourselves what the impact of all these automated quality assurance processes is in terms of electricity consumption. This study explores the electricity consumption and potential environmental impact of continuous integration and software testing in 10 open source software projects.


INTRODUCTION
As we have grown accustomed to living in a software-filled world, we increasingly rely on software for everyday tasks. Because of this reliance, the reliability of software is indispensable [20]. For example, it has been estimated that software failures in 2017 cost the economy $1.7 trillion [25]. Additionally, Ko et al. report on software failures that can be directly linked to the loss of 1500 human lives [24]. In this light, the role of software quality assurance becomes ever more important.
To ensure the quality of software systems, software engineers have a variety of quality assurance approaches at their disposal. Some popular approaches are: software testing [1,4,7,21], modern code review [3,5,10,21], automated static analysis [6,16], and build automation [8,11,18,29]. Of these four approaches, software testing, automated static analysis, and build automation are automated and run either on the workstations of software engineers or through continuous integration services [8].
While we acknowledge that reliable and robust software is of the utmost importance, we cannot neglect that Information and Communication Technology (ICT) is a growing concern in the climate change debate. It has been estimated that in 2020 the energy consumption of the ICT sector reached 15% of the world's total energy consumption [15], and it has been predicted that the ICT sector could consume up to 20% of the world's electricity by 2025 [13]. How that use of electricity translates into environmental impact is tightly related to the carbon intensity of the electricity. The carbon intensity expresses the "cleanness" of the produced electricity, i.e., it specifies how many grams of CO2 are released to produce a kilowatt hour (kWh) of electricity [30]. The carbon intensity depends on how the electricity was produced, e.g., from renewable resources or from fossil fuels.
Depending on the study, the overall ICT carbon footprint is broadly estimated to be between 1.8% and 3.9% [14] of the total greenhouse gas emissions as of 2020. While the impact of ICT seems modest when compared to sectors like transportation (27%) and the manufacturing industry (24%) [23], there seems to be consensus that "urgent policy action and investment are needed to limit increases in energy use driven by increasing demand of ICT services" [14].
Our exploratory study fits in this call to arms, as we explore, and hope to create awareness of, how popular software engineering practices contribute to the consumption of electricity. While Pang et al. indicate that software engineers typically have little knowledge of energy consumption [28], Chowdhury et al. [9] and Verdecchia et al. [35] rightfully point out that software engineers need awareness of and feedback on energy consumption before they can adjust their programming practices and behaviour.
Our particular focus in this study is on automations of popular software engineering practices, particularly those that we execute frequently, often without thinking about them. Two prime examples of such automations are software testing and continuous integration. In particular, we aim to investigate how frequently they are executed and what the impact of testing and of performing a complete build is in terms of electricity consumption.
Our guiding research question is the following: RQ: What is the energy impact of automated software testing and continuous integration in open source software development?
Our exploratory results indicate that there is great variety in the energy consumption among projects for these quality assurance practices. A striking example of a project that consumes quite a bit of energy is the Elasticsearch project: it was built 5025 times in 2022, leading to an estimated yearly energy consumption of ∼161.5 kWh for building this project on an AMD Ryzen 7 CPU. This level of energy consumption corresponds to ∼9.7% of the yearly average household energy consumption of a citizen in the European Union.

STUDY SETUP
We are well aware that our investigation into the energy consumption of quality assurance practices in open source software (OSS) cannot be complete. The essential reasons for this incompleteness are: (1) We would need to have access to the precise hardware on which the build and test actions take place and be able to measure the precise power consumption. (2) We would need to build all projects on GitHub to get a complete picture. This is infeasible both from a time perspective and because it is non-trivial to get OSS projects to build out of the box. In particular, Khatami and Zaidman have shown that around 47% of the Java projects they considered for their study run out of the box, i.e., without making major changes to the configuration of their system [22]. Similar numbers have been reported by Hassan et al. (46%) [17] and Sulir et al. (41%) [31]. (3) We would need to take multiple environments into consideration: both the workstation environment, i.e., the hardware on which the software engineer would locally run tests, e.g., in the IDE [7] or on the command line, and the continuous integration environment [8], i.e., a server or cloud environment on which a complete build-test cycle is performed after a commit to version control.
As such, we fully acknowledge that our investigation is (1) exploratory in nature, (2) based on a convenience sample of projects, namely those that we could build locally "out of the box" (we did download specific versions of the development kit or specific compilers, but did not make changes to the source code), and (3) subject to a number of important assumptions regarding the energy measurements (see below).

Two evaluation platforms
We opted to run our energy evaluations on two separate platforms:
• A Raspberry Pi 4 B. This mini computer is equipped with a 1.
Our choice for these two platforms was instigated by the fact that both devices are USB-C powered and contain no battery. As such, we could monitor their power usage with the CT-3 power meter from AVHzY. We used the Shizuku Toolbox to read out electricity measurements from the CT-3 power meter.
This raises the question of how realistic these platforms are in terms of actual computational performance versus electricity consumption. On the one hand, the Raspberry Pi platform features an ARM Cortex-A72 processor design that is in use in many mid-range smartphones (e.g., Samsung's Galaxy A9). The Raspberry Pi's overall electricity footprint is less demanding, with a 7.5 W TDP. On the other hand, the Minisforum Mini PC contains more realistic hardware with the AMD Ryzen 7 6800U chip, which has an adjustable TDP of 15-28 W. The chip is in use in popular notebooks such as the Asus Zenbook S 13. However, we assume build farms to contain more powerful processor designs. For example, Amazon Web Services uses Intel Xeon processors that have a minimal 85 W TDP. We thus start from the assumption that our electricity measurements are likely at the lower end of the spectrum.
Another important factor is the precision of the electricity consumption measurement. We explicitly opted to measure the electricity consumption at the hardware level, and not at the software level. While measuring at the software level would be more convenient (see, for example, the PeTra tool for Android electricity measurements [26]), it is also less precise, as it typically only considers CPU usage [19]. An alternative to a USB-C power meter would be the Monsoon power meter [2]. We avoided using a USB-C laptop, as there could be an energy draw from the battery.
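To illustrate how hardware-level readings turn into the energy figures reported later, the following minimal sketch integrates power samples over time with the trapezoidal rule. It assumes the readings are available as (time in seconds, power in watts) pairs; this sample format and the function name `energy_mwh` are hypothetical and do not describe the actual export format of the Shizuku Toolbox.

```python
from typing import Sequence, Tuple


def energy_mwh(samples: Sequence[Tuple[float, float]]) -> float:
    """Integrate (time_s, power_w) samples with the trapezoidal rule.

    Returns the consumed energy in milliwatt hours (mWh).
    The sample format is an assumption made for this sketch.
    """
    joules = 0.0
    for (t0, p0), (t1, p1) in zip(samples, samples[1:]):
        # Area of one trapezoid: elapsed time times average power.
        joules += (t1 - t0) * (p0 + p1) / 2.0
    # 1 Wh = 3600 J, hence 1 mWh = 3.6 J.
    return joules / 3.6


# Hypothetical readings: a constant 5 W draw for one hour.
samples = [(0.0, 5.0), (1800.0, 5.0), (3600.0, 5.0)]
print(energy_mwh(samples))
```

A constant 5 W draw for one hour corresponds to 5 Wh, i.e., 5000 mWh, which offers a quick sanity check of the unit conversion.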

Energy simulations
Because of the aforementioned assumptions, and because we randomly select a commit from the year 2022 that builds successfully and that we assume to be representative of the energy consumption of all builds of a project, we refer to our study as an energy simulation study. We simulate two particular scenarios without Docker, as explained in our replication package [36]:
Scenario 1: Run a full build + tests. This scenario roughly corresponds to a Continuous Integration build. In this scenario we ensure that the project is clean and that the cache of the build system (Maven or Gradle) is empty. As such, during the build the dependencies are downloaded, the entire project is built, optional analyses are executed (e.g., static analysis or code coverage measurements), and the tests are run. We use ./gradlew build or mvn install, unless otherwise specified by the documentation.
Scenario 2: Run all tests. This scenario roughly corresponds to a developer executing all the tests locally (on the command line, outside of the IDE). The project is built, and we simulate the electricity consumption of running the tests. We use ./gradlew cleanTest test or mvn test, unless otherwise specified by the documentation.
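The default commands of the two scenarios can be captured in a small helper. This is a minimal sketch, not the tooling of the replication package: the function names `scenario_command` and `run_scenario` are hypothetical, and the sketch assumes that a Gradle project ships a `gradlew` wrapper and a Maven project a `pom.xml` in its root directory. Emptying the build cache for Scenario 1 and any project-specific commands from the documentation are left out.

```python
import subprocess
from pathlib import Path
from typing import List


def scenario_command(project_dir: Path, scenario: int) -> List[str]:
    """Return the default command for scenario 1 (full build) or 2 (tests)."""
    if scenario not in (1, 2):
        raise ValueError("scenario must be 1 or 2")
    # Assumption: the build system is detected from files in the project root.
    if (project_dir / "gradlew").exists():
        if scenario == 1:
            return ["./gradlew", "build"]
        return ["./gradlew", "cleanTest", "test"]
    if (project_dir / "pom.xml").exists():
        if scenario == 1:
            return ["mvn", "install"]
        return ["mvn", "test"]
    raise FileNotFoundError("no Gradle wrapper or Maven pom.xml found")


def run_scenario(project_dir: Path, scenario: int) -> None:
    """Run the scenario command inside the project directory."""
    command = scenario_command(project_dir, scenario)
    # check=True raises CalledProcessError if the build or the tests fail.
    subprocess.run(command, cwd=project_dir, check=True)
```

Calling `run_scenario(Path("path/to/project"), 1)` while the power meter is recording would then correspond to one simulated Continuous Integration build.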

RESULTS
The results of our electricity consumption simulations can be observed in Table 1; the table also indicates the exact commit that we have considered, an estimation of the number of tests (a simple search for the occurrence of @Test for JUnit projects), and the number of commits on GitHub in 2022 for the particular project.
Single builds. When we examine the results of individual builds, we observe a range between 442 mWh (Apache Maven) and 8050 mWh (Apache Druid) for the Raspberry Pi platform. Similarly, for the Minisforum EM680 we observe a range between 1573 mWh (Apache Maven) and 32131 mWh (Elasticsearch). The more powerful Minisforum EM680 platform uses roughly 4× as much energy for a build as the low-power Raspberry Pi 4 B.
Single test suite runs. Switching our attention to the energy consumption of running the test suite, we observe that for both platforms the energy consumption of a test run is typically quite a bit less than for a full build. The two extremes are still Apache Maven and Elasticsearch, with an energy consumption of 286 mWh and 23308 mWh, respectively. This corresponds to ∼65% and ∼73% of the energy consumption of a full build. On the faster Minisforum EM680 platform, we also observe that test runs require less energy than full builds, but the share of the build energy that goes to testing varies widely, ranging from 20% for JUnit5 to 88% for Google Guava. This is something that we aim to investigate more deeply in future work.
Turning our attention to the yearly energy consumption, an initial observation is that higher yearly energy consumption mainly stems from a higher number of commits. For example, while a single build of Apache Flink on the Minisforum EM680 platform consumes 7.379 Wh, the yearly energy consumption is 23.746 kWh due to the 3218 builds that we observed on GitHub. In contrast, we see that a project like LinkedIn Cruise-control had fewer contributions in 2022 and only required 68 builds on GitHub, leading to a yearly energy consumption for building it of 0.612 kWh (Minisforum EM680).
Initial insights with regard to the Research Question. Individual runs seem to have a rather small impact in terms of electricity consumption. For example, the most energy-intensive build can be observed in the Elasticsearch project (32131 mWh on the Minisforum EM680 platform). Over a full year, however, the consumption adds up: the yearly energy consumption for building Elasticsearch is ∼161 kWh, which is due to the high number of 5025 commits (and builds) that Elasticsearch underwent in 2022.
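The test-count estimate reported in Table 1 is described as a simple search for occurrences of @Test. A minimal sketch of such a search is given below; the function name `estimate_test_count` and the restriction to `.java` files are assumptions made for illustration, not a description of the exact search that was performed.

```python
from pathlib import Path


def estimate_test_count(project_dir: Path) -> int:
    """Count textual occurrences of '@Test' in all .java files of a project."""
    total = 0
    for java_file in project_dir.rglob("*.java"):
        # Ignore undecodable bytes so that a single odd file does not abort the scan.
        text = java_file.read_text(encoding="utf-8", errors="ignore")
        total += text.count("@Test")
    return total
```

Because this is a plain substring count, it also matches annotations such as @TestFactory and occurrences inside comments, so the result should be read as an estimate only.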
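The percentages and yearly totals above follow from simple arithmetic on the single-run measurements. The sketch below reproduces two of the reported test-to-build ratios and the yearly estimate for Apache Flink; the function names are hypothetical, and the yearly figure rests on the assumption stated earlier that one measured build is representative of all builds in 2022.

```python
def share_of_build(test_mwh: float, build_mwh: float) -> float:
    """Energy of a test run as a percentage of the energy of a full build."""
    return 100.0 * test_mwh / build_mwh


def yearly_kwh(single_build_wh: float, builds_per_year: int) -> float:
    """Yearly energy in kWh, assuming every build costs as much as the measured one."""
    return single_build_wh * builds_per_year / 1000.0


print(round(share_of_build(286, 442), 1))      # Apache Maven: 64.7
print(round(share_of_build(23308, 32131), 1))  # Elasticsearch: 72.5
print(round(yearly_kwh(7.379, 3218), 3))       # Apache Flink: 23.746
```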

DISCUSSION
Electricity consumption in context. To put the energy simulation data of Table 1 into context: recharging your smartphone battery from 0 to 100% daily leads to a yearly energy consumption of ∼2 kWh. At a macro level, Table 2 shows the yearly household energy consumption per citizen of a number of countries. Taking the average energy consumption of a European citizen (EU-27, the 27 countries that are part of the European Union), we see that this equates to 1.67 MWh. Relating this to our energy consumption simulations, we can thus see that the yearly builds of the most electricity-intensive project in our initial dataset, namely the Elasticsearch project, correspond to ∼9.7% of the average household energy consumption in the European Union (based on the number of commits in 2022 and simulated on the Minisforum EM680).
Greenhouse contribution. The carbon intensity is a measure of how "clean" the electricity is. It is determined by the fuel mix used in the generation of the electricity. As Table 2 shows, the carbon intensity, expressed in grams of CO2 per produced kWh, varies greatly per region: 110 g of CO2 in Canada versus 531 g in China. As such, it is difficult to establish how polluting building and testing your software is, as we need to know the carbon intensity of the electricity used in order to calculate the precise CO2 emissions. If we were to assume that Elasticsearch was built in Europe, the yearly emissions would amount to 53.774 kg of CO2.
Two platforms. We initially started our investigation with the Raspberry Pi platform, but considering the number of test runs that timed out, we switched to the Minisforum EM680, which launched in June 2023. We present data from both platforms to indicate the difference in power consumption depending on the platform. More specifically, we measured the energy consumed while the computer was idle for exactly 1 hour: 1.773 Wh on the Raspberry Pi and 12.7 Wh on the Minisforum EM680.
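The comparison with household consumption can be retraced from the numbers given in the text. The following sketch is illustrative only; the constant and function names are hypothetical, and the inputs are the Elasticsearch build measurement (32131 mWh), the 5025 builds in 2022, the EU-27 per-capita figure of 1.67 MWh, and the ∼2 kWh for a year of daily smartphone charging.

```python
EU27_HOUSEHOLD_KWH = 1670.0  # 1.67 MWh per citizen per year, as stated in the text
SMARTPHONE_KWH = 2.0         # approximate yearly energy of daily smartphone charging


def yearly_build_kwh(single_build_mwh: float, builds: int) -> float:
    """Yearly build energy in kWh (1 kWh = 1,000,000 mWh)."""
    return single_build_mwh * builds / 1_000_000


def percent_of(part_kwh: float, whole_kwh: float) -> float:
    """Express part_kwh as a percentage of whole_kwh."""
    return 100.0 * part_kwh / whole_kwh


elasticsearch_kwh = yearly_build_kwh(32131, 5025)
print(round(elasticsearch_kwh, 1))                                  # 161.5
print(round(percent_of(elasticsearch_kwh, EU27_HOUSEHOLD_KWH), 1))  # 9.7
print(round(elasticsearch_kwh / SMARTPHONE_KWH, 1))                 # 80.7
```

Under these figures, a year of Elasticsearch builds on the Minisforum EM680 is on the order of eighty years of daily smartphone charging.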
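Since emissions are the product of energy and carbon intensity, the same yearly energy yields very different CO2 figures depending on the region. As a minimal sketch, the code below applies the two carbon intensities quoted above (110 g/kWh for Canada and 531 g/kWh for China) to the simulated yearly energy of the Elasticsearch builds; the function name `emissions_kg` is hypothetical, and treating all builds as if they ran in a single region is an assumption made purely for illustration.

```python
def emissions_kg(energy_kwh: float, intensity_g_per_kwh: float) -> float:
    """CO2 emissions in kilograms for a given energy use and carbon intensity."""
    return energy_kwh * intensity_g_per_kwh / 1000.0


# Simulated yearly energy of the Elasticsearch builds: 5025 builds of 32.131 Wh.
elasticsearch_kwh = 5025 * 32.131 / 1000.0

# Carbon intensities in g CO2 per kWh, as quoted from Table 2 in the text.
for region, intensity in [("Canada", 110.0), ("China", 531.0)]:
    print(region, round(emissions_kg(elasticsearch_kwh, intensity), 2))
```

With these inputs, the estimate ranges from roughly 18 kg of CO2 (Canada) to roughly 86 kg of CO2 (China) for the same set of builds.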

Threats to validity
Construct validity. The documentation of the CT-3 power meter that we use in our study reports a voltage resolution accuracy of 0.0001V. While we have not further tested this, we consider the deviation small enough to not influence our initial observations. The energy measurements might also be influenced by factors such as the network: if the network is congested, longer waiting times before dependencies are downloaded might occur, and these waiting times might influence the energy consumption.
We did not run all build and test cycles multiple times due to time constraints; some single runs took a full day to complete. For Apache Maven, Apache SeaTunnel, and LinkedIn Cruise-control we did run the full build 3 times and observed a coefficient of variation of 0.009, 0.1, and 0.005, respectively, which can be characterised as low variance. In future work, we will solidify our findings by running the energy simulations multiple times for all projects.
External validity. The exploratory results in this paper might not be representative, as we (1) only consider a small set of software projects, and (2) simulate the energy consumption on two platforms that are known to be energy efficient. Future work needs to measure energy consumption on the actual hardware in use in data centers.
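The coefficient of variation used above is the standard deviation of the repeated measurements divided by their mean. The sketch below shows one way to compute it; it assumes the sample standard deviation (the text does not state whether the sample or population variant was used), and the three measurements are hypothetical values, not data from the study.

```python
from statistics import mean, stdev
from typing import Sequence


def coefficient_of_variation(values: Sequence[float]) -> float:
    """Sample standard deviation divided by the mean (dimensionless)."""
    return stdev(values) / mean(values)


# Hypothetical energy measurements (in mWh) of three repeated full builds.
runs_mwh = [9.0, 10.0, 11.0]
print(coefficient_of_variation(runs_mwh))
```

For the hypothetical values 9, 10, and 11 the mean is 10 and the sample standard deviation is 1, giving a coefficient of variation of 0.1.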

RELATED WORK
Verdecchia et al. take a broad look at how to make digital infrastructures more sustainable: making the software itself more energy-aware, e.g., by automatically killing zombie processes, is one of the proposed solutions [33]. Other investigations have focused on making deployed software more energy conscious, e.g., Hindle has presented the Green Mining approach to study energy consumption differences between commits [19]. Similarly, Di Nucci et al. have focused on reducing the energy consumption of Android applications [27].
In other work by Verdecchia et al., it is claimed that "testing not only consumes most of the time and effort in a software project, but it also heavily contributes to energy waste", without empirical foundation [34]. In contrast to the aforementioned studies, our paper contributes initial empirical insights into the electricity required to execute test suites and run continuous integration builds. In other words, we focus on the energy consumption of (parts of) the software development process, not on that of the deployed software.

CONCLUSION
We have carried out an exploratory study into the energy consumption of two frequently executed software quality assurance mechanisms, namely continuous integration and testing. While we have merely simulated these two mechanisms on low-power hardware, we see indications that individual builds do not consume that much electricity. For example, building Elasticsearch on the Minisforum EM680 platform with an AMD Ryzen 7 6800U CPU consumes 32 Wh. Depending on the project and a myriad of factors like the size of the test suite, the elements composing the build, the number of dependencies, etc., the testing phase of the build consumes between 20% and 88% of the total energy consumption of a full build.
When considering simulated yearly totals, we do observe that somewhat larger projects like Elasticsearch, which was built 5025 times in 2022, do consume considerable amounts of electricity: ∼161 kWh for all CI builds executed in 2022. This level of energy consumption corresponds to ∼9.7% of the average household energy consumption of a citizen of the European Union. Simulating the environmental cost in terms of CO2 emissions for Elasticsearch when building it in the European Union would amount to 53.774 kg of CO2, the equivalent of driving an average petrol car for 222 kilometers.
If we want to steer away from the climate crisis that we are currently experiencing, we will need to (1) further investigate the power consumption of our routine software engineering practices, (2) create awareness among software engineers of their energy impact, and (3) come up with solutions to reduce our energy footprint.

FUTURE PLANS
• Constructing an automated pipeline for energy measurements that would enable us to simulate energy measurements at scale, for a variety of programming languages.
• Performing fine-grained measurements to isolate several steps in the build process, e.g., assembling dependencies, static analysis, compiling, testing.
• Investigating how we can reduce the number of builds by balancing (1) quick feedback to developers, and (2) energy consumption.

Table 1: Results of electricity consumption simulation using Raspberry Pi and Minisforum EM680. Single measurements are in mWh (milliwatt hour), yearly estimations are in kWh (kilowatt hour).

Table 2: Electricity consumption in the household sector per capita in a selection of countries and the associated carbon intensity of electricity production. Unless otherwise indicated, data comes from [12,32].