Carbon-Aware Memory Placement

The carbon footprint of software activities is determined by the embodied and operational emissions of hardware resources. This paper presents cMemento, a concept that enables operating systems to make carbon-aware memory placement decisions. Main memory in today's computer systems has become heterogeneous. In addition to traditional (volatile) main memory (e.g. DRAM), novel memory technologies with persistent properties are often available (e.g. PRAM, FRAM, MRAM). Furthermore, a large number of new memory interfaces (e.g. high-bandwidth, graphics, and low-power memory) must also be taken into account by the operating system when allocating memory. The availability of new memory technologies and interfaces enables systems with improved energy efficiency. At the same time, the new memory interfaces have revealed serious flaws in the state-of-the-art memory abstractions of operating systems. Hence, moving away from a homogeneous view of memory resources is a crucial step towards significantly reducing the energy consumption and, ultimately, the carbon footprint of today's computer systems. With cMemento, we propose an approach that combines information on the characteristics of (i) active workloads and (ii) available memory resources with a carbon model. cMemento transforms the combined information into memory placement decisions at operating-system level. The resulting placement decisions improve the operating conditions (i.e. better energy efficiency and a lower carbon footprint) of the available memory resources.


INTRODUCTION
Recent years have brought a rapid expansion of the range and variety of available main-memory technologies and architectures. Whereas successive generations of DDR DRAM were previously the staple of the main-memory tier, system integrators today can draw from a much wider design space, ranging from significant interface variations (GDDR, LPDDR, HBM) of fundamentally the same DRAM technology to completely new memory cell types (PRAM, FRAM, MRAM). Under such conditions, the long-held tenet of abstracting physical memory resources into a homogeneous virtual memory space visible to the programmer is beginning to show severe shortcomings [6,19,20].
Until now, memory subsystems have received much less attention from efforts to improve energy efficiency (and thus the linked operational carbon impact) than active compute resources, even though memory chips and interconnects can easily draw power in the same order of magnitude as a CPU [17]. Furthermore, embodied carbon emissions, arising both from the acquisition of novel memory technologies and from their limited lifetime (most prominently in the case of NVRAM), are not sufficiently captured by existing programming and operating-system abstractions. In light of the societal importance of economic and ecological concerns in computing technology, it is crucial to leverage each memory technology to its fullest potential to meet both performance and carbon-efficiency objectives.
Figure 1: Architecture of our carbon-aware memory placement approach carbon-FooBar.
1 Name changed for blind review.
Thus, this paper presents carbon-FooBar, a concept that enables operating systems to make carbon-aware memory placement decisions. The operating system and system software offer the necessary interfaces to improve and extend existing abstractions and to mediate between the programmer and hardware perspectives. carbon-FooBar refrains from burdening programmers with an unfiltered view of raw memory-resource characteristics. Instead, the flow of information is inverted by giving system software explicit or implicit knowledge of workload characteristics in addition to memory-resource characteristics. This establishes a common point where carbon-efficient memory placement decisions can be made.
This paper makes the following four main contributions: (i) the carbon-FooBar approach to collect and apply the information necessary for carbon-aware memory placement decisions, including analysis methods at development time, profiling at runtime, and the characterisation of memory resources at system-setup time; (ii) the design of a carbon model to determine the embodied and operational carbon emissions of memory allocations; (iii) the design of an operating-system component, the Memory Governor, for carbon-aware memory placement; and (iv) a case study for different types of memory allocations (i.e. with different lifetimes and access patterns) and types of memory resources.
The rest of this paper is structured as follows. Section 2 discusses energy-FooBar, the underlying concept of carbon-FooBar. Section 3 outlines the design of the Memory Governor, and Section 4 discusses placement requirements for carbon-aware memory placement. We discuss the carbon model of carbon-FooBar and a case study of different memory types in Section 5 and Section 6, respectively. Section 7 gives an outlook on future work, Section 8 discusses related work, and Section 9 concludes this paper.

THE ENERGY-FOOBAR APPROACH
The trend to compose machines from heterogeneous memory technologies poses new challenges to system software. On the one hand, overly generic placement policies fail to capture the optimisation potential. On the other hand, system operators cannot be expected to customise placement strategies for each machine under their responsibility. Yet existing placement strategies have only limited means (e.g. NUMA domains) to cope with this new heterogeneity. In particular, they lack empirical analysis mechanisms to adapt to evolving workload and resource conditions.
The scope of energy- and carbon-FooBar is machines with one common physical address space and a heterogeneous set of memory technologies. Scenarios beyond this scope, for example with disaggregated memory or RDMA, are discussed in Section 7. One key design decision is the granularity at which placement decisions are optimised. Optimising at a very fine-grained level (e.g. single memory accesses) may theoretically yield the best results but comes with high overheads. Conversely, optimising only at a coarse granularity (e.g. virtual address spaces) has little overhead but also limits the optimisation potential. We believe that optimising at buffer granularity best fits existing programming paradigms: as developers are experts in the dynamics of their applications, they tend to organise their data into objects or buffers (either by manual allocation or as part of the programming environment).
Our envisioned architecture is illustrated in Figure 1. The left side shows the empirical analyses, which generate characterisations of the workloads (top) and of the memory resources (bottom). These characterisations are used by the tools on the right side (i.e. the carbon model and the placement requirements), which aid the Memory Governor in making placement decisions. The characterisation of memory resources includes their performance and energy behaviour under representative access patterns. Once collected, this information is used by the carbon model to predict energy demand and carbon emissions. Conceptually, the workload characterisation consists of three parts: a) explicit requirements at development time, b) static analysis ahead of runtime, and c) profiling/tracing at runtime.
Explicit Requirements. We believe it is beneficial to allow developers to share their specialised knowledge of workload behaviour. Therefore, we provide the possibility to explicitly state functional (e.g. persistent vs. volatile) and non-functional (e.g. latency bounds) requirements. To retain development efficiency and backwards compatibility, this source of information is optional and intended for particularly critical application data structures.
Static Analysis. Although it can only capture memory behaviour that is known ahead of runtime, static code analysis can yield essential information about memory allocations. The analysis can be conducted once and later reused for every workload start.
Profiling/Tracing. Workload behaviour at runtime is observed by means of performance monitoring counters (PMCs). This allows our system to respond to changing workload conditions.
The three information sources are consolidated into a unified workload characterisation using the energy-FooBar notation, which is designed to describe workload behaviour. This characterisation is used to determine possible placement locations and the carbon footprint of a placement. Eventually, the Memory Governor uses this information for memory placement decisions.
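The consolidation of the three information sources could be represented roughly as follows. This is a minimal C++ sketch under our own assumptions: the type `BufferCharacterisation`, its fields, and the precedence order (explicit requirements over runtime profiling over static analysis) are hypothetical illustrations, not the energy-FooBar notation itself.

```cpp
#include <optional>

// Hypothetical per-buffer characterisation; the fields are illustrative only.
struct BufferCharacterisation {
    std::optional<bool> persistent;         // functional requirement
    std::optional<double> maxLatencyNs;     // non-functional requirement
    std::optional<double> readsPerSecond;   // access behaviour
    std::optional<double> writesPerSecond;  // access behaviour
};

// Fills every field that is still unknown in `primary` with the value from `fallback`.
BufferCharacterisation fillMissing(BufferCharacterisation primary,
                                   const BufferCharacterisation& fallback) {
    if (!primary.persistent) primary.persistent = fallback.persistent;
    if (!primary.maxLatencyNs) primary.maxLatencyNs = fallback.maxLatencyNs;
    if (!primary.readsPerSecond) primary.readsPerSecond = fallback.readsPerSecond;
    if (!primary.writesPerSecond) primary.writesPerSecond = fallback.writesPerSecond;
    return primary;
}

// Assumed precedence: explicit requirements > runtime profiling > static analysis.
BufferCharacterisation consolidate(const BufferCharacterisation& explicitReqs,
                                   const BufferCharacterisation& profiling,
                                   const BufferCharacterisation& staticAnalysis) {
    return fillMissing(fillMissing(explicitReqs, profiling), staticAnalysis);
}
```

Fields that are still empty after consolidation are simply unknown, which leaves the Memory Governor free to choose along that dimension.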

CARBON-AWARE MEMORY GOVERNOR
The Memory Governor is the core component of the carbon-FooBar approach. It is part of the operating system and is responsible for automatically making efficient memory placement decisions. The name Memory Governor follows the example of similar components that manage global strategies for shared resources. For example, Linux implements a CPU frequency governor, which globally balances the CPU-time demands of all running applications and optimises the CPU frequency according to an optimisation goal (e.g. energy efficiency). For the Memory Governor, this means managing the globally available memory resources and serving memory allocations in a carbon-efficient way.
Figure 2 visualises the underlying concept. The black rectangle comprises all available memory resources, where each cross represents a single resource (or part of a resource). Due to placement constraints (blue), only parts of the globally available memory resources can be considered for a memory request. Placement constraints can, for example, stem from functional properties such as non-volatile byte-addressable memory (NVRAM). The decision space can be further restricted by administrative constraints (grey) imposed by system operators.
To fulfil its task, the Memory Governor needs to keep track of the available and free memory resources. For each memory request, it calculates the possible placements, that is, the intersection of the placement and administrative constraints. The remaining possible placements constitute the decision space of the Memory Governor, within which it tries to meet its target (i.e. carbon efficiency) in a best-effort manner. To this end, it can exploit the sensitivity of a workload to specific hardware resources (see Section 4).
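The decision procedure described above can be sketched as a filter followed by a best-effort minimisation. In this hypothetical C++ sketch, `MemoryResource`, `Request`, and `place` are illustrative names, a single persistence flag stands in for arbitrary placement constraints, and the carbon estimate is supplied by the caller (in carbon-FooBar it would come from the carbon model).

```cpp
#include <cstddef>
#include <functional>
#include <optional>
#include <string>
#include <vector>

// Hypothetical descriptor of one memory resource (one cross in Figure 2).
struct MemoryResource {
    std::string name;
    bool persistent;
    std::size_t freeBytes;
    bool allowedByAdmin;  // administrative constraint set by the operator
};

struct Request {
    std::size_t bytes;
    bool needsPersistence;  // placement constraint derived from the workload
};

// Caller-supplied carbon estimate for placing `req` on a resource, in g CO2e.
using CarbonEstimate = std::function<double(const MemoryResource&, const Request&)>;

// Returns the index of the least carbon-intensive admissible resource,
// or std::nullopt if the decision space is empty.
std::optional<std::size_t> place(const std::vector<MemoryResource>& resources,
                                 const Request& req, const CarbonEstimate& estimate) {
    std::optional<std::size_t> best;
    double bestCarbon = 0.0;
    for (std::size_t i = 0; i < resources.size(); ++i) {
        const MemoryResource& r = resources[i];
        // Decision space: placement constraints, administrative constraints, free capacity.
        const bool fitsPlacement = !req.needsPersistence || r.persistent;
        if (!fitsPlacement || !r.allowedByAdmin || r.freeBytes < req.bytes) continue;
        const double carbon = estimate(r, req);
        if (!best || carbon < bestCarbon) {
            best = i;
            bestCarbon = carbon;
        }
    }
    return best;
}
```

Keeping the carbon estimate as a parameter separates the constraint handling from the carbon model, so either can be refined independently.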
Because of this design and its implementation within the operating system, the Memory Governor can also limit the wear of memory resources with limited lifetimes (such as NVRAM). As the carbon emissions from manufacturing memory (embodied carbon) can constitute a significant share of the total carbon emissions, limiting wear is necessary to be truly carbon-aware. As the Memory Governor has a system-wide view of all resources and of the carbon intensity of the current power supply, it can balance all of these constraints. A second reason for implementing the Memory Governor within the operating system is that memory allocations occur frequently; hence, the Memory Governor requires an efficient implementation. Existing implementations of model-based placement strategies [21] show that efficient and practical implementations are possible and provide a starting point.

PLACEMENT REQUIREMENTS
Communicating placement requirements from development time to runtime requires a standardised exchange format. To interface between workloads and the Memory Governor, we need to transform workload behaviour and sensitivity to resource characteristics into placement requirements. In this context, sensitivity describes how a workload reacts to a memory resource with specific functional and non-functional properties. Our proposed interface is twofold: an Application Programming Interface (API) that lets developers explicitly specify requirements at allocation granularity, and a mechanism that deduces implicit sensitivity from observed workload behaviour.
Instead of plain library calls to an explicit API, we propose a more accessible integration with existing programming-language features, such as function decorators or scoped allocators, to offer a low-overhead way of specifying how specific buffers should be served. This way, requirements and changing sensitivity can be denoted for a limited section of the application source code. They can also be inherited by nested source-code sections and lifted automatically when the defined scope is left.
Listing 1 provides a brief example of the envisioned API usage with scoped allocators in C++, where different memory requirements, such as latency and randomAccess, are expressed to reflect their performance impact on the respective buffer. By overriding the standard memory-allocation functions, the proposed API tracks each distinct memory allocation. The allocation is then identified as a logical buffer and associated with the currently active requirement set. The identified buffers and the per-buffer resource requirements are passed to the Memory Governor, using sensitivity weights to specify the priority of independent characteristics. In the early stages of the Memory Governor implementation, the explicit API affords an early validation point, as the applicability of the chosen requirements and sensitivity format can be evaluated without relying on sophisticated automatic mechanisms for implicit deduction.
The second step comprises the automatic transformation of behavioural data about workloads gathered at runtime. The underlying behaviour model is built from benchmarks that are tailored to fill in parameters such as latencies and access granularities, as well as the read and write dynamics of the memory technology and controller. Using profiling, a previously unknown workload can then be classified along these parameters to match sensitivity classes. Alternatively, static code analysis could be used to recognise pre-defined classes, although the expected complexity is much higher.
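The scoped-requirement mechanism could be sketched as follows. This hypothetical C++ sketch assumes an RAII guard (`RequirementScope`) that pushes a requirement set onto a thread-local stack and lifts it when the scope is left; for simplicity, an explicit `trackedAlloc` function stands in for the overridden standard allocation functions. All names, the two weights, and the rule that the innermost scope wins are assumptions.

```cpp
#include <cstddef>
#include <cstdlib>
#include <unordered_map>
#include <vector>

// Hypothetical requirement set: sensitivity weights, higher means more sensitive.
struct Requirements {
    double latency = 0.0;
    double randomAccess = 0.0;
};

// Active requirement scopes of the current thread; the innermost scope applies.
thread_local std::vector<Requirements> activeScopes;

// RAII guard: activates a requirement set and lifts it automatically at scope exit.
class RequirementScope {
public:
    explicit RequirementScope(const Requirements& r) { activeScopes.push_back(r); }
    ~RequirementScope() { activeScopes.pop_back(); }
    RequirementScope(const RequirementScope&) = delete;
    RequirementScope& operator=(const RequirementScope&) = delete;
};

// Logical buffers and their requirements, to be passed on to the Memory Governor.
// Not thread-safe; a real implementation would need synchronisation.
std::unordered_map<void*, Requirements> bufferRegistry;

// Stand-in for an overridden allocation function: allocates and tags the buffer
// with the currently active requirements (defaults if no scope is active).
void* trackedAlloc(std::size_t bytes) {
    void* p = std::malloc(bytes);
    if (p != nullptr) {
        bufferRegistry[p] = activeScopes.empty() ? Requirements{} : activeScopes.back();
    }
    return p;
}

void trackedFree(void* p) {
    bufferRegistry.erase(p);
    std::free(p);
}
```

A developer would open a scope such as `RequirementScope scope(Requirements{0.9, 0.0});` around a latency-critical section; every buffer allocated inside it is tagged accordingly, and the tag no longer applies once the block is left.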

MEMORY CARBON MODEL
The carbon footprint of memory accesses is the basis for carbon-aware and carbon-efficient memory placement. The memory carbon model estimates the carbon emissions of memory placements depending on the memory-access behaviour and the hardware used. We identify two use cases for these estimations relevant to this work: a) as a tool for developers to analyse the memory-related carbon footprint of their software ahead of runtime, and b) as an integrated component within the operating system at runtime whose estimations are passed to the Memory Governor.
With the energy-FooBar notation and memory characteristics as input, the model derives the carbon footprint of placement decisions. Based on the model's operational and embodied carbon estimations for different options, the Memory Governor determines the most carbon-efficient placement. The carbon model consists of two central parts: a) the carbon metric used for calculating the carbon footprint associated with a memory placement, and b) the energy model used for calculating the operational emissions of memory accesses. Section 5.1 introduces the carbon metric used by the carbon model, the Software Carbon Intensity (SCI) specification [12]. The underlying energy model is presented in Section 5.2.

Carbon Metric
As a metric for assessing the carbon footprint of memory placement decisions, the carbon model uses the Software Carbon Intensity (SCI) specification [12]. The SCI was proposed by the Green Software Foundation as a standard metric for the carbon footprint of software and is currently under review at ISO. The SCI takes into account both operational carbon emissions ($O$) and the embodied emissions of the hardware ($M$). The carbon footprint is then derived per application-specific unit of work ($R$):
$$SCI = (O + M) \text{ per } R$$
Operational Carbon. The operational carbon emissions of memory placements are determined by the operational energy demand $E$ of memory accesses and the carbon intensity $I$ of the energy supply. The SCI thus expresses the operational carbon emissions as
$$O = E \cdot I$$
Deriving the operational energy of memory technologies under different access patterns requires a separate energy model, which is discussed in Section 5.2. The carbon intensity of the available energy mix has to be monitored continuously and provided as input to the carbon model. These values can be obtained either from local energy providers or from services such as ElectricityMap [10] that analyse the carbon intensity per country.
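The two SCI relations translate directly into code. This minimal C++ sketch assumes energy in kWh, carbon intensity in g CO₂e/kWh, and embodied emissions in g CO₂e; the function names are illustrative and not part of the SCI specification.

```cpp
#include <stdexcept>

// O = E * I: energy (kWh) times carbon intensity (g CO2e/kWh) gives g CO2e.
double operationalCarbon(double energyKWh, double intensityGPerKWh) {
    return energyKWh * intensityGPerKWh;
}

// SCI = (O + M) per R, with M the embodied emissions (g CO2e) and R the units of work.
double softwareCarbonIntensity(double energyKWh, double intensityGPerKWh,
                               double embodiedG, double unitsOfWork) {
    if (unitsOfWork <= 0.0) {
        throw std::invalid_argument("unitsOfWork must be positive");
    }
    return (operationalCarbon(energyKWh, intensityGPerKWh) + embodiedG) / unitsOfWork;
}
```

For a single memory placement, $R$ can simply be one placement, so the SCI reduces to $O + M$.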
Embodied Carbon. For a holistic view of the carbon footprint of memory placement strategies, the carbon model also considers emissions related to the production and disposal of hardware. The SCI proposes the following equation for embodied carbon emissions:
$$M = TE \cdot TS \cdot RS$$
The embodied carbon emissions attributed to memory placements are thus determined by the memory's total embodied emissions ($TE$), the share of the memory's lifespan taken up by the memory placement ($TS$), and the share of the available resources used by the placement ($RS$). The total embodied emissions of a given memory component can be obtained from hardware vendors. Considering both $TS$ and $RS$, the carbon model differentiates between two classes of memories: a) wear-sensitive memories and b) wear-agnostic memories.
Wear-sensitive memories show limited endurance. Due to their physical characteristics, these memory technologies are expected to fail after a limited number of accesses. Thus, each access can be attributed a proportional share of the total embodied emissions. For example, flash-based and phase-change-based memories such as Intel Optane can be classified as such memories [2,5]. $TS$ is therefore determined by the ratio of the memory accesses related to a placement decision to the overall number of accesses available until expected failure. The number of memory accesses related to the placement of a buffer is obtained from static or dynamic analyses (see Section 4). As the wear affects the entire memory resource (e.g. a DIMM), $RS$ equals one.
Wear-agnostic memories, the second class, are not limited in their expected lifespan by individual memory accesses. Memories like DRAM are typically expected to outlive their system. Therefore, their expected lifespan is set to the system's anticipated lifespan. For this class of memories, $RS$ equals the share of the memory capacity used by the memory placement.
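The embodied-carbon rules for the two memory classes can be summarised in a small C++ sketch. It follows the rules stated above ($TS$ from accesses for wear-sensitive memories and from allocation time for wear-agnostic memories; $RS = 1$ for wear-sensitive memories), while the type and field names are hypothetical.

```cpp
// Hypothetical classification and description of a memory resource.
enum class WearClass { Sensitive, Agnostic };

struct MemoryKind {
    WearClass wear;
    double totalEmbodiedG;     // TE, e.g. from vendor data
    double capacityBytes;      // used for RS of wear-agnostic memories
    double lifespanSeconds;    // expected lifespan (wear-agnostic: system lifespan)
    double enduranceAccesses;  // accesses until expected failure (wear-sensitive)
};

struct Placement {
    double bytes;
    double durationSeconds;
    double accesses;
};

// TS: share of the memory's lifespan consumed by the placement.
double timeShare(const MemoryKind& m, const Placement& p) {
    return m.wear == WearClass::Sensitive ? p.accesses / m.enduranceAccesses
                                          : p.durationSeconds / m.lifespanSeconds;
}

// RS: wear affects the whole resource (e.g. a DIMM), so RS = 1 for wear-sensitive memories.
double resourceShare(const MemoryKind& m, const Placement& p) {
    return m.wear == WearClass::Sensitive ? 1.0 : p.bytes / m.capacityBytes;
}

// M = TE * TS * RS
double embodiedCarbon(const MemoryKind& m, const Placement& p) {
    return m.totalEmbodiedG * timeShare(m, p) * resourceShare(m, p);
}
```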

Energy Model
An integral part of determining the operational carbon emissions is evaluating the energy behaviour of the underlying hardware. This behaviour is incorporated into our memory carbon model in the form of an energy model. The model's input consists of the following:
Memory Access Behaviour. The energy demand of memory usage heavily depends on how the memory is accessed. Therefore, the energy model uses the energy-FooBar notation to represent a workload's memory access behaviour (e.g. as determined by the runtime-behaviour monitoring described in Section 2).
Allocation Granularity. The Memory Governor has to make carbon-efficient memory placement decisions at different granularities. For example, what does it cost to place a huge vs. a small buffer in NVM? How do the costs change when HBM is used? To account for differences in scope and granularity, the energy model receives the granularity as input.
Hardware. The energy demand, and therefore the energy model, is highly hardware-specific, so the model receives hardware characteristics as input. We use a top-down approach to combine general models with precise hardware-specific models. To this end, we create general models for similar hardware, for example, one for NVRAM and one for HBM. These general models can then be refined into specific models (e.g. the NVRAM model into a PC-RAM model).
Resource models in general, and energy models in particular, are often implemented with machine-learning techniques. Simple linear [15] and ensemble models [26], as well as sophisticated techniques based on neural networks [16], have shown good results for resource and energy modelling. These techniques trade off expressiveness and accuracy on the one side against training and execution costs on the other.
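The refinement of a general model into a technology-specific one can be illustrated with a simple linear model that takes all three inputs (access behaviour, granularity, and hardware coefficients). In this hypothetical C++ sketch, the linear form (static power proportional to buffer size and lifetime plus fixed energies per read and per write), the class names, and the extra per-write term of the refined model are placeholders, not calibrated models.

```cpp
// Hypothetical summary of how one buffer is used.
struct AccessBehaviour {
    double reads;    // number of read accesses
    double writes;   // number of write accesses
    double bytes;    // allocation granularity (buffer size in bytes)
    double seconds;  // allocation lifetime
};

// General model for a class of memories (e.g. NVRAM), assumed to be linear.
class EnergyModel {
public:
    EnergyModel(double staticWattsPerByte, double joulesPerRead, double joulesPerWrite)
        : staticWattsPerByte_(staticWattsPerByte),
          joulesPerRead_(joulesPerRead),
          joulesPerWrite_(joulesPerWrite) {}
    virtual ~EnergyModel() = default;

    // Energy in joules: static power over the lifetime plus per-access energies.
    virtual double energyJoules(const AccessBehaviour& a) const {
        return staticWattsPerByte_ * a.bytes * a.seconds +
               joulesPerRead_ * a.reads + joulesPerWrite_ * a.writes;
    }

private:
    double staticWattsPerByte_;
    double joulesPerRead_;
    double joulesPerWrite_;
};

// Refined model for one technology (e.g. PC-RAM): reuses the general model's
// coefficients and adds a hypothetical technology-specific energy per write.
class RefinedEnergyModel : public EnergyModel {
public:
    RefinedEnergyModel(const EnergyModel& general, double extraJoulesPerWrite)
        : EnergyModel(general), extraJoulesPerWrite_(extraJoulesPerWrite) {}

    double energyJoules(const AccessBehaviour& a) const override {
        return EnergyModel::energyJoules(a) + extraJoulesPerWrite_ * a.writes;
    }

private:
    double extraJoulesPerWrite_;
};
```

Because the refined model overrides the same virtual interface, the carbon model can query either one without knowing which level of detail is available for a given device.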

CASE STUDY
To illustrate the decision space of the proposed Memory Governor, we analyse an exemplary system that contains three different memory technologies: DRAM, Viking NVDIMMs, and Intel Optane. The system is equipped with 256 GB of memory of each type. We determine how the carbon intensity of various workloads changes in environments with different energy supplies ($I$). Table 1 and Table 2 list the embodied and operational emissions, respectively.

Carbon Emissions
Embodied Carbon. The embodied carbon of DRAM is well understood: the authors of both [30] and [13] report around 5.0 kg CO₂e for 16 GB of DRAM. In contrast, little information on the embodied carbon of NVM is available. In the absence of any life-cycle assessments of Intel Optane and Viking NVDIMMs, we estimate their properties using related components. For Intel Optane, we use an SSD as a substitute. The reported embodied carbon of a 256 GB SSD varies between 50 kg CO₂e [29], 18.7 kg CO₂e [30], and 7.7 kg CO₂ [13]. For this case study, we use the median of these sources, which is 18.7 kg CO₂e. Viking NVDIMMs consist of DRAM whose contents are written to an SSD in case of a power outage. As such, we estimate their embodied carbon as the sum of the costs of DRAM and an SSD of equal size.
Operational Carbon. For DRAM, we assume a power consumption of 0.4 W/GB. Since DRAM needs to be refreshed constantly, we attribute this power consumption to the allocation for its entire lifetime. SSDs have negligible power consumption while they are not being accessed; we therefore attribute operational carbon to a workload on a per-access basis. The carbon emitted by a single access is given in Table 2. Viking NVDIMMs only access their DRAM during normal operation and only write to the SSD in the event of a power failure. As a result, we treat this kind of memory like DRAM when calculating operational carbon.
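The DRAM attribution rule amounts to a one-line energy function. This sketch assumes decimal units (4 kB = 4 × 10⁻⁶ GB) and reports energy in kWh so that it can be multiplied by a carbon intensity; the constant and function names are illustrative.

```cpp
// Case-study assumption: DRAM draws 0.4 W per GB for the whole allocation lifetime.
constexpr double kDramWattsPerGB = 0.4;
constexpr double kJoulesPerKWh = 3.6e6;

// Operational energy (kWh) attributed to a DRAM (or Viking NVDIMM) allocation.
constexpr double dramAllocationEnergyKWh(double gigabytes, double seconds) {
    return kDramWattsPerGB * gigabytes * seconds / kJoulesPerKWh;
}
```

For a 4 kB allocation held for 5 s, this yields 0.4 × 4 × 10⁻⁶ × 5 J = 8 × 10⁻⁶ J, i.e. about 2.2 × 10⁻¹² kWh.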

Workload Analysis
Using the base emissions determined above, we study the emissions of three different workloads. We show that the ideal placement for an allocation depends both on the type of workload and on the current carbon intensity of the energy supply. Each workload allocates 4 kB, which corresponds to a resource share of $RS = 1.6 \times 10^{-8}$ (4 kB of the 256 GB capacity) in all cases. The workloads differ, however, in how long the memory is allocated and in their access patterns. DRAM is treated as wear-agnostic, while Intel Optane is wear-sensitive. Viking NVDIMMs only write to the SSD on a power failure, so they are treated as wear-agnostic. For wear-agnostic memory, we assume a lifetime of five years. For Intel Optane, we assume an endurance of 150 TB written.
Scenario A: Short-Lived Allocation, Single Access: This workload allocates the memory for 10 µs and accesses it only once. There are no placement requirements for this allocation. This results in a proportionate lifespan of $TS = 6.3 \times 10^{-14}$ for wear-agnostic memory and $TS = 2.7 \times 10^{-11}$ for wear-sensitive devices. For this type of allocation, placing it in DRAM is the optimal strategy for all energy types. With the energy mix, the allocation emits only $1.6 \times 10^{-15}$ g CO₂e when placed in DRAM, while it would emit $5.4 \times 10^{-10}$ g CO₂e when placed in Intel Optane.
Scenario B: Medium-Lived Allocation, Many Accesses: This workload allocates the memory for 50 ms and accesses it 100 000 times. The placement is restricted to persistent memory. This results in $TS = 3.2 \times 10^{-10}$ for wear-agnostic memory and $TS = 2.7 \times 10^{-6}$ for wear-sensitive devices. Since this allocation requires non-volatile memory, placing it in DRAM is not an option. Due to its wear-sensitive nature, Optane is highly inefficient for this kind of memory placement (5.3 g CO₂e with the energy mix), making the Viking NVDIMM ($8.2 \times 10^{-12}$ g CO₂e with the energy mix) the best option across all energy sources.
Scenario C: Long-Lived Allocation, Few Accesses: This workload allocates the memory for 5 s and accesses it five times. There are no placement requirements for this allocation. This results in $TS = 3.2 \times 10^{-8}$ for wear-agnostic memory and $TS = 1.3 \times 10^{-10}$ for wear-sensitive devices. For this type of allocation, the ideal placement depends on the current energy source. With wind energy, DRAM is the ideal placement for this allocation with $6.4 \times 10^{-11}$ g CO₂e ($8.0 \times 10^{-11}$ g CO₂e for Optane). However, with more carbon-intensive energy sources such as the mix, natural gas, or coal, the operational carbon outweighs the embodied carbon, so that Optane becomes more efficient ($8.1 \times 10^{-10}$ g CO₂e for DRAM and $2.7 \times 10^{-9}$ g CO₂e for Optane with the mix).
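The proportionate lifespans and the resource share quoted in the scenarios can be reproduced from the stated assumptions (4 kB allocations, 256 GB per memory type, a five-year lifetime for wear-agnostic memory, and 150 TB written for Intel Optane). The sketch below additionally assumes decimal units, 365-day years, and that every access to Optane counts as one 4 kB write; the constant and function names are illustrative.

```cpp
// Case-study parameters; decimal units and 365-day years are our assumptions.
constexpr double kSecondsPerYear = 365.0 * 24.0 * 3600.0;         // 31,536,000 s
constexpr double kAllocationBytes = 4e3;                         // 4 kB per allocation
constexpr double kCapacityBytes = 256e9;                         // 256 GB per memory type
constexpr double kWearAgnosticLifetimeS = 5.0 * kSecondsPerYear;  // five years
constexpr double kOptaneEnduranceBytes = 150e12;                 // 150 TB written

// RS: share of the memory capacity occupied by one allocation.
constexpr double resourceShare() { return kAllocationBytes / kCapacityBytes; }

// TS for wear-agnostic memory: allocation time relative to the assumed lifetime.
constexpr double timeShareWearAgnostic(double allocationSeconds) {
    return allocationSeconds / kWearAgnosticLifetimeS;
}

// TS for wear-sensitive memory: bytes written relative to the endurance,
// counting every access as one 4 kB write.
constexpr double timeShareWearSensitive(double accesses) {
    return accesses * kAllocationBytes / kOptaneEnduranceBytes;
}
```

Rounded to two significant digits, these functions give the values quoted in Scenarios A to C, e.g. `timeShareWearAgnostic(10e-6)` evaluates to about 6.34 × 10⁻¹⁴ and `timeShareWearSensitive(100000)` to about 2.67 × 10⁻⁶.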

OUTLOOK: PROVISIONING SYSTEMS
The components presented with the carbon-FooBar approach, namely the carbon model, the collected memory characteristics, and the analysis tools, enable the Memory Governor to make decisions for a system at runtime. However, they can also be used to reason about the carbon footprint of systems ahead of runtime, especially when provisioning new installations. As a specific example, recent developments in coherent interconnects such as CXL [27] open the design space beyond the choice of memory technologies to entirely new system topologies. With CXL, physical memory transactions that would have been handled by a local memory controller can be forwarded to and handled by remote machines, providing an efficient path for memory disaggregation schemes. Unused memory on one machine can be donated to another machine that would otherwise exhaust its physical memory capacity, albeit with a latency and bandwidth penalty shown to be only slightly larger than that of remote memory accesses in large NUMA systems [25,28].
Multiple lean machines with low embodied carbon can offset their interconnect overhead compared to a single complex machine. This strongly depends on the workload, as high interconnect traffic between machines may cause a disproportionately large operational carbon footprint. For a different workload, however, allocations may rarely transition between the cache subsystem and physical memory, so both performance losses and communication energy costs will be low. In all of these situations, the carbon-FooBar workload behaviour model combined with the carbon model can help to anticipate the actual trade-off between embodied carbon savings and operational costs. Additionally, it supports provisioning decisions, complementing previous solutions [1] with performance predictions to ensure that operational objectives can be met.

RELATED WORK
Research has shown the importance of incorporating system software for effective system-wide energy management [11,35]. At the same time, advances in operating systems have established support for new hardware properties such as non-volatile, persistent memory [4] in embedded systems [9], data-centre systems [7], and large-scale main-memory database systems [23,24], as well as disaggregated memory systems [36]. The idea of carbon-aware workload placement based on application SLAs has been discussed before [3], but with a stronger focus on compute jobs in a cluster than the proposed carbon-FooBar approach, which targets memory placement within one system. In addition, our position paper considers the impact of endurance and embodied carbon, which allows other resources to be used depending on the excess low-carbon energy supply [31] and the production conditions [18]. However, the combination and joint use of different memory technologies in a single, composable system requires additional support. Special-purpose solutions have been explored in individual cases [33,34], but generic approaches at the operating-system level are still missing. To make efficient use of different types of memory, the operating system needs to adapt applications at runtime (i.e. depending on memory access patterns) to the available hardware resources (i.e. type and size). The idea of scoped allocators to denote requirements for specific buffers has been demonstrated previously in the context of NUMA-aware data placement [14]. Although that work targeted quantitative requirements like latency and throughput, it did so by specifying the desired NUMA node (explicitly or implicitly) rather than by weighted resource characteristics.

CONCLUSION
The carbon impact, both embodied and operational, is crucial in the design and operation of computer systems. Heterogeneous systems, composed of novel memory interfaces and cell types, require system software and programming models to close the widening gap between existing abstractions and the underlying technologies.
In this paper, we proposed our carbon-FooBar approach towards carbon-aware memory placement with novel memory technologies. carbon-FooBar builds on the notion of a Memory Governor, an operating-system component that combines knowledge of workload behaviour and sensitivity with available hardware characteristics to allocate memory resources energy-efficiently. Our envisioned energy-FooBar notation forms a central vehicle for exchanging and persisting this information. We outlined the interactions and information flow between the different components of our architecture that help to tune and balance system energy consumption, carbon intensity, and modelled embodied carbon through efficient memory placement.

Table 1: Embodied carbon cost for different memory classes.

Table 2: Operational carbon cost to access 4 kB of memory.