Studying and Improving Software License Compliance in Practice

As the process of software development has matured, the reuse of open-source software has become commonplace. Open-source software licenses can both provide permissions and impose restrictions regarding software's distribution, modification, and reuse. Modern systems can have many such licensed components, complicating the task of license compliance and compounding the risk associated with reusing open source components. To address these issues, this dissertation seeks to identify weaknesses in current processes and automated tools, such as in handling license conflicts, exceptions, and interpretations, in order to develop new compliance tools and resources grounded in the realities of software compliance as revealed by software developers and legal practitioners.


PROBLEM AND RESEARCH STATEMENT
Open-source software (OSS) has risen to drive the software industry forward, creating an environment in which reusing existing OSS components such as libraries and frameworks has become the norm [13,16,17,25]. Components are often made available through OSS licenses, which define the terms governing their distribution, modification, and reuse. Complying with these documents is critical for the continued use of open-source components, as noncompliance can lead to financial, reputational, and legal penalties [18,19,26,27,35,39]. Thus, the inherent risk in using OSS necessitates the verification of license compliance.
Unfortunately, achieving compliance can be a difficult task. As the scale of software has increased, so too has the scale of compliance tasks, requiring an analysis of potentially hundreds or even thousands of OSS components [16] distributed under some of the hundreds of recognized software licenses [5, 7], which themselves may be incompatible with other licenses and are written in terms which developers may struggle to understand and apply [14,15,21,37,38]. Because dependencies' licenses may also be updated over time [36,38], verifying that a piece of software complies with the licenses of all of its dependencies has become daunting, and at large scales sometimes infeasible, without the assistance of automated tooling.
Considering the challenges and importance of license compliance, prior work has produced several processes and tools to assist with license compliance tasks, such as detecting and fixing license incompatibilities [20-24, 28, 29, 34]. However, these approaches do not capture the entirety of the software licensing landscape. First, while previous research focuses solely on licensing from the perspective of software developers, legal practitioners (e.g., in-house and outside lawyers) are often involved in the compliance process. Second, license compliance methods are changing. New developments and guidance, such as a recent US executive order [8], have encouraged the use of technologies like Software Bills of Materials (SBOMs), documents which inventory software's dependencies. Recently, new tools have been developed which can consume SBOMs to facilitate license compliance [10,32]. These modern license compliance tools have not been sufficiently analyzed in previous research. More research into their exact capabilities is required, but we hypothesize that current tooling does not sufficiently meet practitioners' needs, given its limited ability to capture usage context and consider license interpretations, factors which can influence complex compatibility and requirements decisions.
Given these challenges, this dissertation seeks to streamline modern software license compliance practices by (1) identifying current deficiencies in existing practices and tooling from the perspectives of both software developers and legal practitioners, (2) developing novel automated tooling for license compliance which supports modern software ecosystems and provides analysis of a software project's licenses, and (3) developing educational resources grounded in an understanding of common developer perceptions which serve to close developers' knowledge gap surrounding software licensing. By carefully identifying current deficiencies, we will develop targeted solutions which will streamline this critical software engineering process.

PROPOSED RESEARCH
The goals of this research are to acquire a complete understanding of current software license compliance practices, pinpoint compliance tasks which can benefit from improved tooling and automation, assess current developers' software licensing literacy, and develop new automated solutions for licensing tasks which ameliorate pain points and deficiencies in current practices and tooling.

Understanding the Licensing Landscape
2.1.1 Views and Practices. Despite the significant presence of licensed OSS in modern software development, previous work shows that it can be difficult for developers to understand software licenses and related tasks [14,15]. To that end, we will investigate current views and practices in the software industry regarding software license compliance. Through surveys and interviews with practitioners at software firms and in open-source, we will assess the current landscape to identify current compliance processes. This allows us to locate any deficiencies which new solutions can address.
Similarly, by examining practitioners' views, we can assess currently held conceptions and misconceptions about licensing. Targeting any areas of confusion will facilitate the development of resources to help developers better understand software licenses and licensing tasks. Moreover, developers themselves are not the only stakeholders who engage with licensing tasks. Legal practitioners also frequently play a direct role in software license compliance, but previous work has not studied this area from their perspective. By incorporating their views and practices, this dissertation will present a more complete picture of the software licensing landscape.
2.1.2 Current Tooling. Several automated tools, such as Fossology [4], ScanCode Toolkit [6], and FOSSA [3], have been developed to assist with licensing tasks like identifying dependencies and scanning for their licenses. Solutions such as these can identify components and their licenses, but their limitations prevent them from directly addressing license incompatibilities and considering usage information. While some work has been done to investigate licensing tools in the past [30], the environment has changed in recent years. As SBOMs continue to grow in popularity, several new license compliance tools have emerged which take advantage of their capabilities [29], and current state-of-the-art tools like Black Duck [1] utilize them. As such, an up-to-date analysis of modern licensing tools is required to assess their effectiveness.
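As a minimal sketch of the baseline capability discussed here (inventorying components and their declared licenses from an SBOM), the following Python example reads a simplified CycloneDX-style JSON document. The component names, the `UNDECLARED` placeholder, and the helper functions are hypothetical and for illustration only; real SBOMs carry many more fields, and this is not the method of any tool cited above.

```python
import json
from collections import defaultdict

# Hypothetical, simplified CycloneDX-style SBOM (illustrative data only).
SBOM_JSON = """
{
  "bomFormat": "CycloneDX",
  "components": [
    {"name": "libalpha", "version": "1.2.0",
     "licenses": [{"license": {"id": "MIT"}}]},
    {"name": "libbeta", "version": "0.9.1",
     "licenses": [{"license": {"id": "Apache-2.0"}}]},
    {"name": "libgamma", "version": "3.0.0",
     "licenses": [{"expression": "MIT OR GPL-2.0-only"}]},
    {"name": "libdelta", "version": "2.1.0"}
  ]
}
"""


def licenses_by_component(sbom):
    """Map 'name@version' to the license ids or expressions declared for it."""
    result = {}
    for comp in sbom.get("components", []):
        key = f"{comp['name']}@{comp['version']}"
        declared = []
        for entry in comp.get("licenses", []):
            if "license" in entry:
                lic = entry["license"]
                declared.append(lic.get("id") or lic.get("name") or "UNKNOWN")
            elif "expression" in entry:
                # Multi-licensed components are kept as an unparsed expression.
                declared.append(entry["expression"])
        result[key] = declared
    return result


def components_by_license(inventory):
    """Invert the inventory; components with no declared license are grouped
    under the hypothetical placeholder 'UNDECLARED'."""
    grouped = defaultdict(list)
    for comp, lics in inventory.items():
        for lic in lics or ["UNDECLARED"]:
            grouped[lic].append(comp)
    return dict(grouped)


inventory = licenses_by_component(json.loads(SBOM_JSON))
grouped = components_by_license(inventory)
for lic in sorted(grouped):
    print(lic, grouped[lic])
```

An inventory of this kind surfaces undeclared and multi-licensed components, but it says nothing about whether the licenses conflict or how each component is used, which is the gap this section identifies.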

Improved Licensing Tools
We hypothesize that current tools do not completely meet developers' needs. While current tools are often capable of finding the dependencies or even the licenses used by a piece of software, they are limited in their ability to consider dependencies, usage information, and license interpretation questions together to provide an analysis of these licenses which aids developers in identifying specific compliance issues and licensing tasks which need to be performed [34]. Some level of formalization may assist with interpretation concerns [24,31], but currently, a concrete solution to assist human analysis is needed.
To address this, we will develop new solutions to facilitate this level of automated analysis. Preliminary analysis of current needs leads us to address tasks such as identifying key license terms, identifying requirements based on detected licenses, locating any conflicts due to incompatible licenses or license terms, and accounting for any extenuating or unusual circumstances, such as license exceptions or multi-licensed components. New tools will be directed towards filling perceived gaps in currently available functionality as determined by our analysis of current tooling. Specifically, we plan to make license compliance tools interactive, such as through a chat-bot interface, in order to facilitate (1) automatic analysis of how APIs/libraries are used after they have been identified, (2) resolution of more complex licensing scenarios, such as license incompatibilities with transitive dependencies, and (3) continual communication with humans, improving compliance via guided analysis and assisting with license interpretation questions. New approaches will be evaluated against current tools with user studies to compare their capabilities and usefulness.
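To illustrate the kind of scenario in item (2), the following Python sketch walks a dependency graph and reports transitive dependencies whose license is listed as incompatible with the root project's license. It is only a sketch under stated assumptions: the package names, the graph, and the `INCOMPATIBLE` policy table are hypothetical, the table is an illustrative policy and not a legal determination, and the check deliberately ignores usage context, license exceptions, and interpretation questions.

```python
from collections import deque

# Hypothetical dependency graph: package -> direct dependencies.
DEPENDENCIES = {
    "app": ["web-framework", "json-lib"],
    "web-framework": ["template-engine"],
    "template-engine": ["legacy-parser"],
    "json-lib": [],
    "legacy-parser": [],
}

# Hypothetical declared licenses (SPDX-style identifiers).
LICENSES = {
    "app": "Apache-2.0",
    "web-framework": "MIT",
    "template-engine": "BSD-3-Clause",
    "json-lib": "MIT",
    "legacy-parser": "GPL-2.0-only",
}

# Illustrative policy table of unordered license pairs treated as conflicting.
# An assumption for this sketch, not legal advice.
INCOMPATIBLE = {frozenset({"Apache-2.0", "GPL-2.0-only"})}


def find_conflicts(root):
    """Breadth-first walk from root; return (dependency, license, path) for
    every reachable dependency whose license conflicts with the root's."""
    root_license = LICENSES[root]
    conflicts = []
    seen = {root}
    queue = deque([(root, [root])])
    while queue:
        pkg, path = queue.popleft()
        for dep in DEPENDENCIES.get(pkg, []):
            if dep in seen:
                continue  # each package is reported once, via its shortest path
            seen.add(dep)
            dep_path = path + [dep]
            dep_license = LICENSES.get(dep, "UNKNOWN")
            if frozenset({root_license, dep_license}) in INCOMPATIBLE:
                conflicts.append((dep, dep_license, dep_path))
            queue.append((dep, dep_path))
    return conflicts


conflicts = find_conflicts("app")
for dep, lic, path in conflicts:
    print(f"{dep} ({lic}) reached via {' -> '.join(path)}")
```

Reporting the path along with the conflict matters here because the offending component is three levels below the project, where a developer inspecting only direct dependencies would not see it.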

Educational Resources for Developers
As software developers do not typically have legal training, understanding and applying software license terms can be a difficult task for them [14,15]. One way to address this is to create resources for developers which can fill this knowledge gap. Such resources would discuss and visualize OSS licenses, their terms, and common licensing tasks in ways which developers could more easily understand. Current resources are rare and focus on key terms and selecting licenses for one's software [2, 5], not resolving conflicts or managing compliance. New resources would detail license (in)compatibilities and include interactive, rule-based guides which can be tailored to an organization's specific compliance needs. These will be evaluated with user studies to assess their effectiveness.
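One way a rule-based guide tailored to an organization might be structured is sketched below in Python. Everything in it is a hypothetical assumption made for illustration: the `Usage` and `Rule` types, the license sets, and the guidance strings stand in for an organization's own policy and do not describe any license's actual terms.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Usage:
    """How a component is used; the fields are illustrative assumptions."""
    license_id: str
    distributed: bool
    modified: bool


@dataclass(frozen=True)
class Rule:
    description: str
    applies: Callable[[Usage], bool]
    guidance: str


# Hypothetical organization policy, not a statement of any license's terms.
REVIEWED = {"MIT", "Apache-2.0", "GPL-3.0-only"}
ESCALATE_IF_DISTRIBUTED = {"GPL-3.0-only"}

ORG_RULES: List[Rule] = [
    Rule("License not yet reviewed",
         lambda u: u.license_id not in REVIEWED,
         "Request a license review before adopting this component."),
    Rule("Flagged license in a distributed product",
         lambda u: u.license_id in ESCALATE_IF_DISTRIBUTED and u.distributed,
         "Escalate to the legal team before release."),
    Rule("Modified third-party code",
         lambda u: u.modified,
         "Record the modifications alongside the original license text."),
]


def guidance_for(usage: Usage, rules: List[Rule] = ORG_RULES) -> List[str]:
    """Return the guidance of every rule that applies, in rule order."""
    return [rule.guidance for rule in rules if rule.applies(usage)]


for item in guidance_for(Usage("GPL-3.0-only", distributed=True, modified=True)):
    print("-", item)
```

Keeping the rules as data separate from the evaluation function is what would let an organization swap in its own policy without changing the guide itself.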
Another approach to this problem is to work towards incorporating licensing tasks into current educational programs for software developers, including undergraduate and graduate education. Currently, training on software licensing is often not a specific goal for students [9,11,12]. As such, there is an opportunity to train future software developers to be at the forefront of software licensing by improving licensing and software supply chain education.

ANTICIPATED CONTRIBUTIONS
This dissertation aims to study current software license compliance practices in order to identify and target weaknesses which can be addressed by new tooling and resources for software developers and legal practitioners. By both improving our understanding of the human element in compliance and creating new designs for compliance tools, both software developers and consumers will benefit: risks to developers using OSS will be mitigated, and the rights of users stipulated in OSS licenses will be preserved.
Contributions will include an in-depth analysis of industry views and practices regarding OSS and software licenses, creating a knowledge base of current processes and problems rooted in the perspectives of both software developers and legal practitioners; novel approaches and tools which facilitate more advanced analysis of software's compliance requirements, resulting in a smoother compliance process and mitigated risk; and educational resources devoted to developers' licensing needs.
To fully capture the state of licensing, some of this research will be performed jointly with legal experts: we have already secured such a collaboration. Results from surveys and interviews will be quantitatively and qualitatively analyzed through an open-coding approach [33] and released in aggregate. Both software engineering and legal researchers will contribute to qualitative analysis where applicable, with any differences in classification reconciled by all parties. The effectiveness of new solutions will be assessed via user studies and comparative analysis with existing tools, and source code for tools created will be released to promote verification.
2024 IEEE/ACM 46th International Conference on Software Engineering: Companion Proceedings (ICSE-Companion)