What Do You Mean by Memory? When Engineers Are Lost in the Maze of Complexity

An accepted practice for decreasing applications' memory usage is to reduce the amount and frequency of memory allocations. Factors such as (a) the prevalence of out-of-memory (OOM) killers, (b) memory allocations in modern programming languages being done implicitly, (c) overcommitting being a default strategy in the Linux kernel, and (d) the rise in complexity and terminology related to memory management make the existing guidance ineffective. The industry needs detailed guidelines for optimizing memory usage that target specific operating systems (OSs) and programming language types.


INTRODUCTION
Primary motivation. The accepted practice in software engineering is that every application should minimize its memory usage to optimize performance [6,11]. Optimal memory usage helps an application achieve higher throughput, reduce the amount of paging or swapping for the OS, and decrease the resource requirements for the host system. On modern OSs, a critical reason for applications to measure and control their memory usage is the presence of an OOM killer. The OOM killer is a system component that terminates applications when they "use too much memory." When engineers improve memory usage, they face challenges because the optimization techniques and terminology differ greatly between OSs and types of programming languages. What the term memory means is highly context-dependent and requires a non-trivial amount of deciphering to interpret correctly.
Influx in complexity and terminology. The evolution of hardware and memory management techniques has caused an increase in the ways that OSs account for and classify memory usage. A classical textbook that describes the initial architecture of UNIX spends 38 pages on memory management [5]. A modern overview of Microsoft Windows internals covers the same topic in 182 pages [15]. The author of a contemporary book on the Linux kernel memory management subsystem states, as of this writing, that "I have written 835 pages of a target of roughly 1,200-1,500 pages" [14]. Though this is an anecdotal example, it corresponds with our industry experience while working on problems related to memory management. Thinking only at the scope of malloc() and free() has become obsolete and oversimplified.
We illustrate the increase in complexity by looking at the best-known ecosystems: the commercial OSs that Apple and Microsoft produce and the open-source Linux kernel. In Table 1, we enumerate 20 ways macOS and its derivatives, such as iOS, iPadOS, and watchOS, categorize memory usage [8,13]. Microsoft Windows introduces even more definitions, such as commit size, paged pool, non-paged pool, and reserved memory [15]. If that is insufficient to confuse an average software engineer, then Linux adds terms such as PSS (Proportional Set Size), RSS (Resident Set Size), USS (Unique Set Size), and VSZ (Virtual Memory Size) to the mix [9,10].
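As a hedged illustration of how the Linux terms above relate to one another, the following minimal Python sketch totals the per-mapping fields that Linux reports in /proc/<pid>/smaps. The sample text, its numbers, and the function name summarize_smaps are assumptions made for illustration and do not come from the cited sources. USS is computed here as Private_Clean plus Private_Dirty, which is a common convention rather than a field the kernel reports directly.

```python
import re

# Hypothetical excerpt in the format of /proc/<pid>/smaps (values are made up).
SAMPLE_SMAPS = """\
00400000-00452000 r-xp 00000000 08:02 173521 /usr/bin/example
Size:                328 kB
Rss:                 200 kB
Pss:                 100 kB
Shared_Clean:        200 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:         0 kB
01c3b000-01c5c000 rw-p 00000000 00:00 0 [heap]
Size:                132 kB
Rss:                  64 kB
Pss:                  64 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         4 kB
Private_Dirty:        60 kB
"""

# Matches lines such as "Rss:    200 kB"; mapping headers do not match.
FIELD = re.compile(r"^(\w+):\s+(\d+) kB$")


def summarize_smaps(text):
    """Sum smaps fields over all mappings and return the four metrics in kB."""
    totals = {}
    for line in text.splitlines():
        match = FIELD.match(line.strip())
        if match:
            name, value = match.group(1), int(match.group(2))
            totals[name] = totals.get(name, 0) + value
    return {
        "VSZ": totals.get("Size", 0),  # all mapped virtual address space
        "RSS": totals.get("Rss", 0),   # resident pages, shared ones counted fully
        "PSS": totals.get("Pss", 0),   # shared pages split among their sharers
        "USS": totals.get("Private_Clean", 0) + totals.get("Private_Dirty", 0),
    }


print(summarize_smaps(SAMPLE_SMAPS))
```

In the sample, the shared code mapping contributes its full 200 kB to RSS but only 100 kB to PSS, and nothing to USS. This is one reason the four numbers can diverge widely for the same process. On a Linux host, passing the contents of /proc/self/smaps instead of the sample should produce the corresponding totals for the running interpreter.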

INDUSTRY CHALLENGES
"He who controls the amount of dirty pages in kernel controls the application's lifetime and its commercial success."-A paraphrase of Frank Herbert's "Dune" [7].
Scarcity of in-depth knowledge. Most software engineers specialize in something other than memory management internals for a particular OS. Even fewer engineers have in-depth knowledge that spans multiple OSs. Nevertheless, most popular software systems like browsers, editors, and messengers support multiple mobile and desktop OSs. A talk by Mark Russinovich, the CTO of Microsoft Azure, is appropriately titled "Mysteries of Memory Management Revealed" [12]. An operating systems engineer armed with a kernel debugger has become a modern-day mystic and sorcerer.
Invalidation of existing assumptions. Starting from the initial versions of UNIX in the early 1970s, the succinct guidance for optimizing applications' memory usage has been "call malloc() less." This recommendation has become obsolete with advancements in OS development, the increased complexity of memory-related terminology, and the heterogeneity of popular programming languages. We find two primary paradigm shifts that invalidate the existing assumptions. Firstly, overcommitting in the popular Linux kernel means that the OS only allocates memory when a consumer writes to the allocated pages [9,10]. As a result, the frequency and size of allocations have lost their original meaning. Secondly, high-level programming languages have abstracted memory management away from the programmer. Techniques such as garbage collection have made allocations and deallocations implicit, seamless, and non-deterministic.
Need for mapping between intents and actions. Different platforms share the standard performance engineering goals related to memory. Engineers want to (a) avoid premature termination of their applications, (b) stay under a certain quota or limit of memory usage, and (c) optimize the metrics that matter in a particular context. Unfortunately, the current published guidance is either overly general or limited. We observe repeated rediscovery of the same facts, and critical knowledge has become limited to a tiny group of engineers. Section 3 provides one concrete example showing the ambiguity and complexity of daily engineering tasks for iOS.
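The first shift can be observed directly. The following Python sketch, which assumes a Linux host with /proc mounted, maps an anonymous region and compares the process's resident set size before and after writing to it. The helper names resident_bytes and measure_lazy_allocation and the 32 MiB size are illustrative assumptions, and the exact numbers will vary with the kernel configuration (for example, transparent huge pages).

```python
import mmap
import os
import sys


def resident_bytes():
    """Return this process's resident set size in bytes (Linux only)."""
    with open("/proc/self/statm") as f:
        # The second field of statm is the number of resident pages.
        resident_pages = int(f.read().split()[1])
    return resident_pages * os.sysconf("SC_PAGE_SIZE")


def measure_lazy_allocation(size=32 * 1024 * 1024):
    """Return (before, after_map, after_touch) RSS in bytes, or None off Linux."""
    if not sys.platform.startswith("linux"):
        return None
    before = resident_bytes()
    # Anonymous private mapping: address space is reserved, pages are not yet backed.
    region = mmap.mmap(-1, size, flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS)
    after_map = resident_bytes()
    # Write one byte per page so the kernel has to back each page with memory.
    for offset in range(0, size, mmap.PAGESIZE):
        region[offset] = 1
    after_touch = resident_bytes()
    region.close()
    return before, after_map, after_touch


if __name__ == "__main__":
    result = measure_lazy_allocation()
    if result is None:
        print("This sketch only runs on Linux.")
    else:
        before, after_map, after_touch = result
        print(f"RSS growth after mapping:  {(after_map - before) // 1024} kB")
        print(f"RSS growth after touching: {(after_touch - after_map) // 1024} kB")
```

On a typical Linux system, the mapping step should leave the resident set almost unchanged, while the writes should grow it by roughly the size of the region. The allocation call alone therefore says little about the memory an application actually consumes.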

A CASE OF APPLE AND IOS
We will use iOS as a concrete example due to the popularity of Apple's ecosystem. As practicing engineers, we have witnessed similar challenges with Android, various Linux distributions, and Microsoft Windows. In February 2023, Apple stated, "... we now have more than 2 billion active devices as part of our growing installed base" [4]. A significant portion of these devices are cell phones that use iOS. Most modern OSs support paging (writing memory pages that the OS does not actively use to secondary storage such as a disk) [1]. iOS does not support paging out [2,8].
As a result, memory is one of the most precious system resources on iOS. The OOM killer will terminate the iOS applications that exceed a specific limit. Therefore, each iOS application developer must understand how to prevent a situation where the OOM killer terminates the application.
An engineer who wants to understand what specific criteria the OOM killer uses to terminate applications must read the XNU kernel source code. After the discovery process, the engineer will hopefully reach the definition in Listing 1 that describes the application's physical footprint [3]. Like a Russian Matryoshka doll, the engineer now faces a new set of conundrums to unwrap.
Listing 1: Definition of physical footprint in XNU kernel.
Physical footprint: This is the sum of:
How do Objective-C or Swift developers apply these findings? Both languages, by default, use Automatic Reference Counting (ARC). ARC, by design, abstracts memory management away from engineers. What does reducing the amount of purgeable non-volatile compressed memory they consume mean? What are the "good" and "bad" amounts? Should the engineers create fewer objects, write less code, do less of everything, or more of something else? How can they even apply the typical advice for optimizing memory usage [6] in programming languages like these?

INDUSTRY NEEDS
The complexity of memory management means long-term job security for operating systems and performance engineers. However, the current situation is detrimental to most software engineers who do not specialize in those fields. Determining what subset of metrics matters and how to control them to achieve a desired outcome has become a costly and involved effort. Non-specialists need concrete guidance that helps them map their intents to specific actions depending on the context. With the rise in popularity of high-level languages such as Java, Kotlin, Python, or Swift, where memory management is not explicit, the lack of guidance has become problematic.
In our industry experience, engineers who face memory-related performance challenges typically ask the following questions:
(1) What do any of these metrics mean?
(2) Which metrics are essential in the context of a particular application, programming language, and OS?
(3) What is the desired range for a specific metric for a given application?
(4) How do I modify the application to change the critical metrics in the desired direction?
The research community can help practitioners by gathering experimental data to (a) determine what metrics matter (or not) for a particular intent on a specific OS, (b) establish how to evaluate certain metrics for programming languages that manage memory implicitly, and (c) provide guidance for languages that use garbage collection. Another vital contribution is case studies describing how practitioners have solved specific problems. A small number of software companies develop the majority of popular applications. Sharing the existing in-house knowledge will help advance the software performance engineering field.

Table 1: Different memory usage quantifiers in Apple OSs.