Decision Making for Self-Adaptation Based on Partially Observable Satisfaction of Non-Functional Requirements

Approaches that support the decision-making of self-adaptive and autonomous systems (SAS) often consider an idealized situation where (i) the system's state is treated as fully observable by the monitoring infrastructure, and (ii) adaptation actions are assumed to have known, deterministic effects on the system. However, in practice, the system's state may not be fully observable, and the adaptation actions may produce unexpected effects due to uncertain factors. This article presents a novel probabilistic approach to quantify the uncertainty associated with the effects of adaptation actions on the state of a SAS. Supported by Bayesian inference and POMDPs (Partially-Observable Markov Decision Processes), these effects are translated into the satisfaction levels of the non-functional requirements (NFRs) to, therefore, drive the decision-making. The approach has been applied to two substantial case studies from the networking and Internet of Things (IoT) domains, using two different POMDP solvers. The results show that the approach delivers statistically significant improvements in supporting decision-making for SAS.


INTRODUCTION
Self-adaptive and autonomous systems (SAS) are expected to apply decision-making techniques under uncertainty. As such, the decision-making technique of a SAS dictates the execution of adaptation actions in response to unanticipated events [5]. Such adaptation actions impact the state of the SAS to maintain the required satisfaction levels of the non-functional requirements (NFRs). Further, trade-off analysis of the NFRs is crucial in establishing the expected balance among them [15,29].
While many decision-making techniques assume that adaptation actions have invariant effects on the state of the SAS, we argue that such effects may be stochastic and can change over time according to environmental and contextual fluctuations. As an illustration, let us focus on the effects of adaptation actions on the system properties of a SAS, such as performance or reliability, which may vary over time due to changes in the environmental context. For example, the action of sending a data package through a network has an expected delivery rate with a specific effect on the reliability of the system. However, the delivery rate may fluctuate due to uncertain factors such as software or hardware failures. Further, existing decision-making techniques frequently assume full observability of the effects of adaptation actions based on the monitoring infrastructure [4,7,8,20,34-36,53]. However, the system's state may not be fully observable in practice [23,61,66].
To account for the possible fluctuation of the effects of adaptation actions and the partial observability of the state of the SAS, we have developed RE-STORM: Requirements Trade-offs for self-adaptation using Partially Observable Markov Decision Processes (POMDPs) [19,42]. We use POMDPs to (a) allow the effects of adaptation actions to be modelled using probability distributions, as opposed to having fixed effects on the system over time, and (b) treat the system's state as partially observable by a monitoring infrastructure. The system's state is represented by the levels of satisfaction associated with the NFRs. The POMDP balances the trade-offs of the conflicting NFRs over time [42,72].
The contributions of the paper are the following: (i) A decision-making technique for self-adaptation based on partially observable NFRs. The system's state of the POMDP, which is not directly observable, is modelled based on the NFRs of a SAS. (ii) Quantification of the uncertainty associated with the effects of executing adaptation actions on the levels of satisfaction of the NFRs in the running system. After the execution of an adaptation action, observations related to the satisfaction of NFRs are collected from the system monitoring infrastructure. Based on those observations, Bayesian inference is used to deduce the current effects on the state of the system. These effects are quantified as probabilities that represent the current satisfaction level of the NFRs of the SAS. We have used two substantial case studies from the networking application domain [59,64] and the Internet of Things (IoT) domain [30]. For each case study, different dynamic contexts have been used to show how RE-STORM supports decision-making under uncertainty while improving the level of satisfaction of the NFRs. The results also show how RE-STORM improves the trade-offs among NFRs. We have applied RE-STORM using two different POMDP solvers, DESPOT [54] and Perseus [50], showing its applicability. Our results highlight how, under dynamic contexts, RE-STORM provides a quantification of the effects of executing adaptation actions on the running system, which leads to the necessary update of the utility values to match the new context. The evaluation performed supports the conclusion that the approach delivers a statistically significant improvement in the decision-making for SAS.
The paper is organized as follows: Section 2 explains the baseline concepts of POMDPs. Section 3 presents our proposed approach, RE-STORM. Section 4 describes the approach using an illustrative case of Remote Data Mirroring. Section 5 describes the decision-making process offered by RE-STORM and how it is driven by partially observable NFRs. Section 6 presents the details of the evaluations. Section 7 discusses the main findings and validation of results. In Section 8, we provide a comparison with related work. Finally, in Section 9, we conclude the paper and outline future research opportunities.

POMDPS AND DECISION-MAKING UNDER UNCERTAINTY
A POMDP provides a principled approach to model sequential decision-making problems in order to make rational decisions in the face of uncertainty within a changing environment [73]. In RE-STORM, a POMDP keeps an up-to-date quantification of the uncertainty about the effects of the adaptation actions on the running system. Fig. 1 shows the POMDP and RE-STORM for decision-making in SAS. A POMDP is represented as a tuple <S, A, R, T, O, Z, γ>. S represents the state space, i.e. the set of distinct states s ∈ S the system could reach. A represents the space of actions. The system seeks to influence its state by executing actions from the set A. The system's goal is to choose actions in such a way that desirable states s ∈ S are frequently visited. Desirable states are determined by the reward function R: S × A → R(s, a); in other words, the system gets a reward R(s, a) for taking action a and arriving at the new state s ∈ S. A POMDP allows uncertain action effects to be modelled. This behaviour is represented by the transition function T: S × A × S → [0, 1], which implies that the system has a certain probability of making a transition to any state s ∈ S as a result of executing an adaptation action. The stochastic nature of the action effects is described by the conditional probability function T(s, a, s′) = P(s′ | s, a) where, at each time slice, the system takes action a ∈ A to move from a state s ∈ S to a new state s′ ∈ S.
Furthermore, in a POMDP, states s ∈ S are not directly observable. Instead, observations z ∈ Z are received. This behaviour is represented by the observation function O: S × A × Z → [0, 1]. The conditional probability function O(s′, a, z) = P(z | s′, a) describes the system's probability of observing z ∈ Z given that action a was performed and the resulting state was s′, which is not directly observable. Observations z ∈ Z correspond to features of the environment directly perceptible by system sensors. We use observations z ∈ Z to infer the state s ∈ S, as depicted below. Finally, γ ∈ [0, 1) is the discount factor, expressing a preference for immediate rewards over future ones.
Bayesian inference and quantification of uncertainty: Given that in POMDPs the system's state s ∈ S is not directly observable, a belief over the possible states of the system is maintained. Let b_{t-1} be the belief at time t-1. If the system takes action a at time t-1 and receives observation z at time t, then Bayesian inference is applied to quantify the uncertainty as the new belief b_t about the state s′ at time t:

b_t(s′) = η O(s′, a, z) Σ_{s ∈ S} T(s, a, s′) b_{t-1}(s)    (1)

where η is a normalizing constant [54]. A belief b is a probability distribution over the current state s ∈ S of the system. Through the Markov property, a belief represents the entire system state history or trajectory, in terms of its past observations and actions [9]. Furthermore, during POMDP-based decision-making, the decision-making agent tries to find a policy π. A policy defines the system's strategy for all possible situations it can find itself in [46]. In terms of a POMDP, a policy a = π(b) represents a mapping that specifies the action a at the current belief b about the system's state s ∈ S. The goal is to maximise the expected value EV, i.e. the possible amount of reward earned under the current belief, as shown below:

EV = E[ Σ_t γ^t R(s_t, a_t) | b, π ]    (2)

Hence, POMDPs provide reasoning and decision-making over time, using partial knowledge (i.e. beliefs) of the states s ∈ S of a running system.
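To make the belief update concrete, the following is a minimal Python sketch of Equation (1). The two-state model tables, the state names and the observation labels are illustrative assumptions, not the paper's RDM model:

```python
# Minimal sketch of the POMDP belief update in Equation (1).
# T[s][a][s2] = P(s2 | s, a) and O[s2][a][z] = P(z | s2, a) below are
# made-up illustrative tables, not the paper's calibrated parameters.

def belief_update(belief, action, observation, T, O):
    """b'(s') = eta * O(s', a, z) * sum_s T(s, a, s') * b(s)."""
    new_belief = {}
    for s2 in belief:
        prior = sum(T[s][action][s2] * belief[s] for s in belief)
        new_belief[s2] = O[s2][action][observation] * prior
    eta = sum(new_belief.values())  # normalizing constant
    return {s: p / eta for s, p in new_belief.items()}

# Two hypothetical states: an NFR is satisfied ("sat") or not ("unsat").
T = {"sat":   {"adapt": {"sat": 0.8, "unsat": 0.2}},
     "unsat": {"adapt": {"sat": 0.3, "unsat": 0.7}}}
O = {"sat":   {"adapt": {"good_reading": 0.9, "bad_reading": 0.1}},
     "unsat": {"adapt": {"good_reading": 0.2, "bad_reading": 0.8}}}

b = belief_update({"sat": 0.5, "unsat": 0.5}, "adapt", "good_reading", T, O)
# A favourable observation shifts belief mass towards the "sat" state.
```

Starting from an uninformative 50/50 belief, the favourable observation raises the belief in the satisfied state to roughly 0.85, which is exactly the kind of quantified uncertainty RE-STORM maintains.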

RE-STORM: REQUIREMENTS TRADE-OFF FOR SELF-ADAPTATION USING POMDPS
This section presents the proposed approach, RE-STORM, depicted in Fig. 1b, which uses a POMDP to support trade-offs among the requirements during decision-making in SAS. To achieve the satisfaction of NFRs, a system performs adaptation actions A. These actions can have different effects (good or bad) on the NFRs' satisfaction levels. The NFRs cannot be labelled as fully satisfied or fully violated: the satisfaction levels of NFRs cannot be represented as an absolute value of True or False because of the lack of crispness in the nature of the satisfaction of NFRs [12,13,21,68]. However, the satisfaction levels can be modelled as probability distributions such as P(NFR = True). As such, an NFR is considered satisfied if it meets an acceptability threshold defined by the experts [58]. For example, in an Internet of Things (IoT) network, the satisfaction level of an NFR such as Maximization of Reliability (MR) can be specified as P(MR = True) = 0.8 or P(MR = True) = 0.3 for a given environmental context. MR can be considered highly satisfied if P(MR = True) >= 0.7, where 0.7 can be regarded as an acceptability threshold.
In the case of RE-STORM, each state in the POMDP represents one combination of the satisfaction values of the NFRs. As states in POMDPs are not directly observable, a belief (i.e. a probability) over each state is maintained by the POMDP. Therefore, the satisfaction levels of the NFRs are specified as marginalized probability distributions P(NFRi = True), where NFRi is a member of the set of NFRs. These probabilities are used to determine whether the satisfaction level of an NFR meets the acceptability threshold.
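As a sketch of how such marginals can be read off a belief, the snippet below encodes each state as a tuple of NFR truth values and marginalizes the belief; the uniform belief and the threshold values are illustrative assumptions, not the paper's calibrated numbers:

```python
from itertools import product

NFRS = ("MC", "MR", "MP")  # the RDM NFRs, used here only as labels

# Illustrative belief: uniform over the 8 combinations of NFR truth values.
belief = {s: 1 / 8 for s in product([True, False], repeat=len(NFRS))}

def p_satisfied(belief, nfr_index):
    """Marginal P(NFRi = True): total belief mass over states where NFRi holds."""
    return sum(p for s, p in belief.items() if s[nfr_index])

def meets_threshold(belief, nfr_index, threshold):
    """Definition 3: satisfied if the belief meets the acceptability threshold."""
    return p_satisfied(belief, nfr_index) >= threshold
```

Under the uniform belief, each P(NFRi = True) is 0.5, so an acceptability threshold of 0.7 would not be met for any of the three NFRs.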
Considering the above, we provide Definitions 1, 2 and 3 as follows:
Definition 1. In RE-STORM, the state s ∈ S of the system represents the combinations of satisfaction values (True or False) of the system's quality goals, i.e. its NFRs, which are not directly observable.
Definition 2. In RE-STORM, the stochastic effects of the execution of an adaptation action a ∈ A on a system, represented by the conditional probability P(s′ | s, a), are quantified as the belief about the state s′ ∈ S of the system.
Definition 3. In RE-STORM, an NFR is considered satisfied if the current belief about the satisfaction of the NFR is higher than the acceptability threshold.

NFRs and the POMDP transition function
In a POMDP, the transition function T(s, a, s′) = P(s′ | s, a) is a conditional probability, which represents the transition of the system from state s to state s′ when action a has been executed under the current state s. Based on the above, we present the next definition:
Definition 5. The transition function T(s, a, s′) = P(s′ | s, a) represents the probability of the system arriving at a satisfaction state s′ of its NFRs in the next time slice, if the system takes an action a under the current satisfaction state s of its NFRs.
According to Definition 1, the states in RE-STORM are represented as combinations of satisfaction levels of NFRs. When an adaptation action is performed, transitions of the NFRs' states occur. Therefore, for each NFR, the transition model represents the transition from the current satisfaction state of that NFR to its new state [66]. Hence, to represent the transitions between the states of the individual NFRs, the conditional independence property [75] and Bayes' theorem [33] are used. The NFRs themselves are not independent; however, the transitions of the states of the NFRs are: given the current satisfaction state of the NFRs and the action, the transition to the next state of each NFR is independent of the transitions in the satisfaction states of the other NFRs. This allows us to factor the transition model into a product of marginal conditional distributions that represent the transitions of the satisfaction states of the individual NFRs.
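A minimal sketch of this factorization for two NFRs is shown below; the per-NFR conditional values, the state encoding and the action name are invented for illustration:

```python
from itertools import product

# per_nfr[i][(s, a)] = P(NFRi' = True | s, a); the numbers are illustrative.
s0 = (True, True)                 # current satisfaction state of two NFRs
per_nfr = [{(s0, "MST"): 0.8},    # transition conditional for NFR 1
           {(s0, "MST"): 0.6}]    # transition conditional for NFR 2

def transition_prob(s, a, s_next, per_nfr):
    """P(s' | s, a) as the product of per-NFR marginal conditionals."""
    p = 1.0
    for i, stays_true in enumerate(s_next):
        p_true = per_nfr[i][(s, a)]
        p *= p_true if stays_true else 1.0 - p_true
    return p

probs = {s2: transition_prob(s0, "MST", s2, per_nfr)
         for s2 in product([True, False], repeat=2)}
# The factored probabilities over all successor states sum to 1.
```

For instance, with the tables above, the probability of both NFRs being satisfied next is 0.8 × 0.6 = 0.48, and the four joint outcomes form a proper distribution.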

Monitoring variables (MON) and the POMDP observation function
In POMDPs, the state of the system s ∈ S is not directly observable. Instead, monitorable variables (i.e. MON variables) are used to collect observations of the system's state. The values of the MON variables are represented as observations z ∈ Z from the environment. These observations are used in Equation (1) to compute a belief about the real system's state. Based on the above, we present the following definition:
Definition 6. The observation function O(s′, a, z) = P(z | s′, a) represents the probability of the observation z collected from the environment, given that an action a was executed and, as a result, the state s′ was reached.
Next, we present the case of a Remote Data Mirroring Network to support the explanation of the details of the approach.

ILLUSTRATIVE CASE: REMOTE DATA MIRRORING
We have applied the RE-STORM approach to two case studies, associated with the Remote Data Mirroring [37] and Internet of Things [30] domains. In this section, we use the case of Remote Data Mirroring (RDM) to illustrate the approach.
The Remote Data Mirroring (RDM) SAS [19] is composed of data servers and network links. It must replicate and distribute data efficiently while providing assurance that the spread data is not lost or corrupted [37]. Each link of the network has an associated operational cost and a measurable throughput, latency and loss rate, which are used to determine the reliability, cost and performance of the RDM system. The goal here is to satisfy the NFRs Minimization of Cost (MC), Maximization of Performance (MP) and Maximization of Reliability (MR) under uncertain environmental conditions of link failures and varying ranges of bandwidth consumption [37]. To satisfy these NFRs, the network is required to continuously take adaptive actions by switching between the topologies of Minimum Spanning Tree (MST) and Redundant Topology (RT). An MST topology uses the minimum possible number of network links to transfer data among different remote servers (i.e. mirrors). In contrast, an RT topology simultaneously uses numerous redundant network link paths for the transmission of information across the servers. The satisfaction levels related to the performance, reliability and operational costs of the RDM are determined according to trade-offs based on the application of the topologies, as follows:
• An RT topology offers higher levels of reliability in comparison to an MST topology. However, maintaining an RT topology may be expensive in some contexts, given the additional cost of the required bandwidth consumption, and as a result the performance of the system will also be affected.
• On the other hand, an MST topology offers lower operational costs and higher levels of performance than the RT topology. However, it negatively affects reliability.

Stochastic effects of adaptation actions in the RDM SAS
The stochastic effects of the topologies RT and MST on the state of the RDM SAS are determined according to the following trade-offs that drive the decision-making: (i) the RT topology offers higher levels of reliability than the MST topology, producing higher costs while reducing performance; conversely, (ii) the MST topology offers higher levels of performance and lower levels of cost than the RT topology, but the reliability of the system can therefore be jeopardised. These stochastic effects have been modelled by the transition function T(s, a, s′) presented in Section 3.1.

Partial observability in the system
In Fig. 1b, the NFRs NFR1, NFR2 and NFR3 are not directly observable. Instead, observations are obtained by using monitoring variables (called MON variables). Three MON variables are specified in the RDM SAS: Ranges of Bandwidth Consumption (RBC) (i.e., RBC < x, RBC in [x, y) and RBC >= y), Active Network Links (ANL) (i.e., ANL < r, ANL in [r, s) and ANL >= s) and Total Time for Writing (TTW) (i.e., TTW < f, TTW in [f, g) and TTW >= g). TTW is a performance measure which considers the time to write each copy of data on each remote site [37]. In the RDM SAS, the pairs of values (x, y), (r, s) and (f, g) represent range boundaries for the MON variables RBC, ANL and TTW, respectively. The relationships between the MON variables and the NFRs MC, MR and MP have been modelled by the observation function O(s′, a, z) presented in Section 3.2. They can be summarized as follows:
• In the case of Ranges of Bandwidth Consumption (RBC), the lower the monitored values are, the greater the satisfaction of Minimization of Cost (MC) is (the same relationship exists between TTW and the belief about the satisfaction of MP).
• In the case of Active Network Links (ANL), the higher the monitored values are, the greater the satisfaction of Maximization of Reliability (MR) is.
Both the transition functions and the observations obtained from the monitoring variables are taken into account to estimate the belief about the satisfaction of the NFRs of the RDM SAS by using Bayesian inference (as presented in Section 2).
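A hedged sketch of such a factored observation model follows. The conditional tables and the three-way "low/mid/high" range labels are invented for illustration; the paper's calibrated values live in its CPTs:

```python
# Illustrative factored observation likelihood for the RDM MON variables.
# Each MON variable is discretized into three ranges ("low", "mid", "high");
# all probabilities below are made up, not the paper's expert-elicited values.

# P(RBC range | MC), P(ANL range | MR), P(TTW range | MP):
P_RBC = {True:  {"low": 0.7, "mid": 0.2, "high": 0.1},
         False: {"low": 0.1, "mid": 0.3, "high": 0.6}}
P_ANL = {True:  {"low": 0.1, "mid": 0.2, "high": 0.7},
         False: {"low": 0.6, "mid": 0.3, "high": 0.1}}
P_TTW = {True:  {"low": 0.7, "mid": 0.2, "high": 0.1},
         False: {"low": 0.1, "mid": 0.3, "high": 0.6}}

def observation_likelihood(state, obs):
    """P(z | s') for s' = (MC, MR, MP) and z = (rbc, anl, ttw) range labels."""
    mc, mr, mp = state
    rbc, anl, ttw = obs
    return P_RBC[mc][rbc] * P_ANL[mr][anl] * P_TTW[mp][ttw]
```

With these tables, an observation of low bandwidth consumption, many active links and a low writing time is far more likely when all three NFRs are satisfied than when none are, which is what drives the belief update towards the right states.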

Service Level Agreements (SLAs)
To understand the Service Level Agreements (SLAs) in the RDM SAS, let us present the definition of NFR satisfaction. NFRs are quality goals to be satisfied in a system. Measuring the satisfaction of NFRs such as NFR1, NFR2 and NFR3 in Fig. 1b is challenging, as it may not be possible to conclude that an NFR is fully satisfied. Instead, they can be labelled as sufficiently satisfied [21]. Probabilistic approaches have been used to model the lack of crispness in the satisfiability nature of NFRs [8,15,24]. We leverage the mathematical framework provided by POMDPs to model the satisfaction of NFRs using probability distributions about the current state of the system (as presented in Section 2).
Based on Definition 3, the SLAs represent the minimum satisfaction level required for each NFR to be met during the system's execution. Any value below the acceptability threshold of an NFR is considered to be in a zone of poor satisfaction. In contrast, any value equal to or greater than the threshold is in a zone of suitable satisfaction. The identified SLAs for the NFRs of the RDM SAS are presented in Table 1.

Table 1. RDM SAS - SLAs
As an example, for the case of Maximization of Reliability (MR), having an SLA P(MR = True) >= 0.9 would mean that the probability of satisfying MR should be at least 0.9. In other words, for MR to be satisfied at a particular point in time, the RDM should have at least 90% of the concurrent active links. Different SLAs have been studied to confirm the suitability of the approach. Further details about the behaviour of the RDM SAS under different SLAs can be accessed in Appendix B.
As observed in Equation (5), MC_{t+1}, MR_{t+1} and MP_{t+1} are influenced by both the previous action a ∈ A and the previous states MC_t, MR_t and MP_t (i.e. they are interdependent). For example, Table 2c shows a probability of 0.88 of transiting to the new state MP_{t+1} = True when the current state of the system is MC_t = False, MR_t = True and MP_t = True. In POMDPs, these states are not directly observable. Instead, Bayesian inference (as presented in Section 2) is used to compute a belief about the states. The conditional probability tables (CPTs) for the transition function of the RDM SAS are shown in Tables 2a, 2b and 2c. These conditional probabilities are defined by the domain experts [10]. For the case studies used in this paper, we have taken [30,37,64,78] as the sources of the domain knowledge needed to come up with the initial probabilities for the case studies RDM [37,64] and IoT [30,78].

Monitoring variables (MON) and the POMDP observation function
Based on Definition 6, the factored observation function for the RDM case study is presented in Equation (6).

Utility value for system stakeholders when executing adaptation actions
The utility value for system stakeholders when executing an adaptation action a ∈ A depends on the effects that the adaptation action has on the system's state s ∈ S. Thus, different effects may represent different utility values for the system stakeholders. In this paper, the system's state represents whether its NFRs are satisfied (as presented in Definitions 1 and 3). In a POMDP, the utility values correspond to the reward function R(s, a) (as presented in Definition 4).
Reward function for the RDM SAS. A reward function assigns a numeric value to each 2-tuple (state, action) of the system [69], indicating its desirability level in the decision-making process. The initial stakeholders' utility values of the RDM SAS are shown in the column "Reward values R(s,a)" of Table 3b. As an example of the initial utility values for system stakeholders, in Table 3b we observe that, for the effect of executing an adaptation action (a = MST or a = RT) that moves the system to the new state MC = False, MR = True and MP = False (as described in rows r6 and r14), the stakeholders favour the topology MST (see row r6 = 0.0660) over the topology RT (see row r14 = 0.0377). This suggests that, under this specific state (i.e. when MP and MC are not being satisfied), the Minimum Spanning Tree (MST) topology would be preferred over a Redundant Topology (RT), which would offer more reliability. All other possible reward values R(s, a) in Table 3b favour the topologies MST or RT based on the initial utility value specifications determined at design time. It is important to note that the effects on the system's state of executing adaptation actions may change over time. Hence, the perceived utility values for system stakeholders (i.e. their preferences) can also change.
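As a toy illustration of how such a reward table over NFR-satisfaction states can be generated, one might write the following; the scoring rule and the numbers are invented, not Table 3b's elicited values:

```python
from itertools import product

ACTIONS = ("MST", "RT")

def make_reward_table():
    """Toy R(s, a): favour MST when cost/performance lag, RT when reliability lags.

    The weights below are arbitrary illustrative values, not the paper's
    stakeholder-elicited rewards from Table 3b.
    """
    reward = {}
    for s in product([True, False], repeat=3):   # s = (MC, MR, MP)
        mc, mr, mp = s
        for a in ACTIONS:
            base = 0.1 * (mc + mr + mp)          # more satisfied NFRs, more reward
            bonus = 0.05 if a == "MST" and not (mc and mp) else 0.0
            bonus += 0.05 if a == "RT" and not mr else 0.0
            reward[(s, a)] = base + bonus
    return reward

R = make_reward_table()
# In the state MC=False, MR=True, MP=False, MST is favoured over RT,
# analogous in spirit to the paper's rows r6 and r14.
```

Even with this toy rule, the qualitative preference from the paper's example emerges: when cost and performance are not being satisfied but reliability is, MST receives the higher reward.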
To illustrate the advantages provided by RE-STORM, Section 6.1.3 shows how different dynamic contexts have been simulated to produce changes in the effects of the execution of adaptation actions and, therefore, trigger the need for a reassessment of the stakeholders' utility values. Next, details on the decision-making process performed by RE-STORM are presented.

RE-STORM: DECISION MAKING DRIVEN BY PARTIALLY OBSERVABLE NFRS
This section presents details of the proposed approach for self-adaptation based on partially observable NFRs. The runtime behaviour of RE-STORM is based on a POMDP model within a feedback control loop (see Fig. 2). The different activities of the MAPE-K loop [70] and the details of the decision-making in the RDM SAS are presented as follows.

RE-STORM: MAPE-K loop activities
(1) Monitoring. In this activity, the observable data, i.e. observations z ∈ Z from the managed system, are collected by sensors (see Fig. 2). Specifically, in the RDM SAS, the MON variables Ranges of Bandwidth Consumption (RBC), Active Network Links (ANL) and Total Time for Writing (TTW) are monitored. The observed values for each MON variable, i.e. values from the ranges (RBC < x, RBC in [x, y) and RBC >= y), (ANL < r, ANL in [r, s) and ANL >= s) and (TTW < f, TTW in [f, g) and TTW >= g), constitute the evidence used in the next activity to compute the current belief about the state of the system s ∈ S.
(2) Analysis. Any data transformation required to enable the data to be used at the Planning stage is performed at this step. The component labelled Belief Estimator in Fig. 2 is responsible for updating the belief about the system's state. This update is performed by using Equation (1). The result is a belief b_{t+1} about the new state of the system after the execution of the previous adaptation action a. This belief will be the input for the Planning activity, and it is also recorded in the Knowledge Base as part of the POMDP requirements-aware model (see details in Fig. 2).
(3) Planning. The component labelled Action Planner in Fig. 2 is the policy a = π(b) responsible for generating actions as a function of the current belief about the satisfaction of the NFRs in the system. We use two different solvers for POMDP planning [50,54] to choose the best adaptation action. The first solver uses online POMDP planning, a technique that interleaves planning with plan execution: at each time slice, the system searches for an optimal action a ∈ A at the current belief b. It then executes the chosen action immediately [54]. The second solver makes use of a point-based value iteration approach for solving POMDPs [16,56].
(4) Execution. Once an action a has been selected, it is executed on the managed system (see Fig. 2). As a result, the system reaches a new state s′ with probability T(s, a, s′) = P(s′ | s, a). This state is not directly observable; instead, an observation z ∈ Z with probability O(s′, a, z) = P(z | s′, a) is received. Then, the MAPE-K loop starts again.
(5) Knowledge. The Knowledge Base of an autonomic system holds models/abstractions that support the Monitoring, Analysis, Planning and Execution activities [2]. In the case of RE-STORM, a POMDP model is kept in the Knowledge Base (see Fig. 2). The POMDP model contains the current beliefs about the satisfaction of the NFRs and the stakeholders' utility values regarding the effects on the system of executing adaptation actions, i.e. the reward values R(s, a). It also contains the SLAs of the system and the current representation of the system's uncertainty, i.e. the transition and observation functions, both modelled in terms of the system's NFRs.
The next section presents details on the Planning activity to select an adaptation action.
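The MAPE-K cycle described above can be sketched as a loop skeleton. StubModel and its method names are placeholders standing in for the real sensors, Belief Estimator, Action Planner and actuators; they are not RE-STORM's actual API:

```python
# Hedged skeleton of one RE-STORM MAPE-K cycle around a POMDP model.
# All names here are illustrative placeholders, not the real implementation.

def mape_k_step(belief, last_action, model):
    # Monitoring: collect MON-variable readings as an observation z.
    z = model.monitor()
    # Analysis: Bayesian belief update (Equation (1)) by the Belief Estimator.
    belief = model.belief_update(belief, last_action, z)
    # Planning: the Action Planner policy maps the belief to an action.
    action = model.plan(belief)
    # Execution: apply the chosen adaptation action to the managed system.
    model.execute(action)
    # Knowledge: record the updated belief and chosen action.
    model.record(belief, action)
    return belief, action

class StubModel:
    """Minimal stand-in so the loop can run; replace with the POMDP model."""
    def monitor(self): return "z1"
    def belief_update(self, b, a, z): return b
    def plan(self, b): return "MST" if b.get("reliable", 0) >= 0.5 else "RT"
    def execute(self, a): pass
    def record(self, b, a): pass
```

Running one step with the stub, e.g. `mape_k_step({"reliable": 0.6}, "RT", StubModel())`, walks through all five activities and returns the updated belief together with the next adaptation action.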

Online planning activity using POMDPs
We use the Determinized Sparse Partially Observable Tree (DESPOT) algorithm [54] as the planner implementation. The steps of the planning activity are as follows:
(1) Planning step 1: Detect NFRs below their thresholds of satisfaction. If the current belief of an NFR indicates a satisfaction level below its Service Level Agreement (SLA), then the Weights Updater module [26] (see Fig. 2) is executed to update the current stakeholders' preferences according to changes in the utility values, specified as reward values R(s, a).
The Weights Updater module is used to reassess the stakeholders' utility values in order to assign more importance (i.e. preference) to adaptation actions with a more positive effect on the satisfaction of the NFRs below their thresholds of satisfaction.
The latter explains how RE-STORM supports the runtime reassessment and update of preferences due to changes in utility values, using the ARRoW implementation presented in [26]. Due to the modularity of the RE-STORM architecture, other techniques, such as [40,41,55,60], can be used to reassess and update preferences regarding NFRs.
(2) Planning step 2: Build a DESPOT tree to project future evolutions of the satisfaction of NFRs.
One desirable capability of autonomic self-adaptive systems is anticipation, which is defined as being able to anticipate, to some extent, needs and behaviours so as to manage itself in a proactive way [39]. This capability is known as proactive self-adaptation, and it implies predictions of how the environment will evolve in the near future [38]. RE-STORM supports decisions under the uncertainty implied by those predictions. Its implementation is based on a belief tree provided by the DESPOT algorithm [54] in order to select an adaptation action. Next, we present relevant details related to the capabilities for proactive self-adaptation used by the approach.
Future evolutions of the system's state. The Action Planner module of RE-STORM (see Fig. 2) considers future evolutions of the belief about the satisfaction of the NFRs to decide the next adaptation action a ∈ A, i.e. to reason about the long-term effects of immediate actions [54]. The future evolutions of the belief about the system's state are represented by the DESPOT tree shown in Fig. 2b. At each time slice, the algorithm builds a sparse approximation of a standard belief tree, a DESPOT tree, by using a simulation model [18]. The root node of the tree is the belief b0, which represents the belief about the current satisfaction of the NFRs of the running system. Each edge in the tree represents an action-observation pair, i.e. (i) an adaptation action MST or RT (e.g. a1 or a2) and (ii) an observation z ∈ Z (e.g. z1, z2, ...). Except for the root node b0, each circular node in the tree represents a projected belief about a future state of the system, i.e. the predicted satisfaction of its NFRs. The DESPOT tree also represents the neighbourhood of the current belief b0 about the satisfaction of the NFRs of the system.
(3) Planning step 3: Select the optimal action a ∈ A. Bellman's principle of optimality [48], used by the DESPOT POMDP solver, is shown in Equation (7). It is applied over the DESPOT tree to choose the best adaptation action.
The DESPOT algorithm searches the tree rooted at the current belief b0. Specifically, the Action Planner module of the DESPOT algorithm (Fig. 2) uses look-ahead search [49] to approximate the optimal discounted reward value V*(b0) [52,54,73]. The search is guided by a lower bound l(b0) and an upper bound u(b0) on the approximated optimal discounted reward value V*(b0). The exploration continues until the gap between the bounds l(b0) and u(b0) reaches a target level or the allocated planning time runs out. Equation (7) recursively computes over the tree the maximum value of the action branches and the average value of the observation branches [18,46,54,73]. The result is an approximately optimal policy based on the belief b0 about the current system's state, i.e. the current satisfaction of its NFRs. The system then executes the optimal action of the policy π(b0). The selection of the optimal action a ∈ A in Equation (7) is based on the current and projected beliefs about the system's state, i.e. about the satisfaction of its NFRs.
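The shape of this backup, maximise over actions, average over observations, can be sketched as a one-step lookahead over beliefs. This is a didactic simplification: a real DESPOT search performs many such backups over a sampled sparse tree, and the `leaf_value` estimator here stands in for the value of the updated successor belief:

```python
# Hedged sketch of the one-step Bellman lookahead applied recursively by
# Equation (7): max over action branches, average over observation branches.
# Model tables and the leaf-value estimator are illustrative placeholders.

GAMMA = 0.95  # discount factor, as in the POMDP tuple

def expected_reward(belief, a, R):
    """Immediate expected reward of action a under the belief."""
    return sum(belief[s] * R[(s, a)] for s in belief)

def obs_prob(belief, a, z, T, O):
    """P(z | b, a): observation likelihood averaged over successor states."""
    return sum(O[s2][a][z] * sum(T[s][a][s2] * belief[s] for s in belief)
               for s2 in belief)

def one_step_lookahead(belief, actions, observations, T, O, R, leaf_value):
    """Maximise over actions; average the discounted value over observations."""
    best_a, best_v = None, float("-inf")
    for a in actions:
        v = expected_reward(belief, a, R)
        for z in observations:
            pz = obs_prob(belief, a, z, T, O)
            if pz > 0.0:
                v += GAMMA * pz * leaf_value(belief, a, z)
        if v > best_v:
            best_a, best_v = a, v
    return best_a, best_v
```

With a zero leaf-value estimate and no observation branches, the lookahead degenerates to picking the action with the highest immediate expected reward, which makes the max-over-actions structure easy to verify in isolation.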

Details on how the DESPOT algorithm implements the Bellman equation can be consulted in Appendix D.
Next, the evaluation of the proposal is described.

EVALUATION
This section presents the set of experiments used to evaluate RE-STORM, applied to two case studies: RDM [59] and IoT [30]. Different dynamic contexts have been carefully designed to be used during the experiments. As a proof of generalization of the RE-STORM approach, we have executed experiments using two different POMDP solvers: DESPOT [54] and Perseus [50]. Both solvers use simulations for the experiments. The difference is that DESPOT has a simulation capability that is used to simulate the RDM whereas, in the case of Perseus, we connect it to an external simulator (RDMSim [59] in this paper), or it can even be connected to a real system. The solvers are described in Appendices D and E. For the purpose of reproducibility, we have provided the results of the experiments in [65].
We argue that RE-STORM allows quantifying the uncertainty about the effects of adaptation actions on the levels of satisfaction of the NFRs of the running system using probabilities (see Definition 2). For instance, after the execution of an adaptation action, RE-STORM can quantify the uncertainty about the satisfaction level of an NFR through probabilities; e.g. the probability of Minimization of Energy Consumption may have increased due to the action performed, and the probability conveys the quantification of uncertainty needed to support decision-making. A primary goal of the evaluation is to assess RE-STORM's capability of quantifying this kind of uncertainty. Therefore, we state the following research question: RQ1: Can the uncertainty associated with the stochastic effects of executing an adaptation action be quantified?
Once the answer to RQ1 has been proven affirmative, a second goal of the evaluation is to assess whether the capabilities of quantification of uncertainty by RE-STORM, described above, can be used to improve the decision-making and behaviour of the running system. To do so, we have explored opportunities to reassess and enhance the trade-off among NFRs based on new knowledge and evidence acquired during runtime, supported by RE-STORM's quantification capabilities.
RE-STORM can highlight situations where the utility/reward associated with a given NFR has changed, which can influence changes in the preferences shown by stakeholders (e.g. in the newly identified environmental context, performance can turn out to be less crucial than reliability). We argue that, based on knowledge and evidence found during execution, the trade-off among NFRs can be improved by updating the utility value for system stakeholders (i.e., the preference of the stakeholders) with respect to executing adaptation actions. Therefore, we propose our second research question: RQ2: Under new and unexpected contexts observed, can the trade-off among the NFRs be improved by updating the stakeholders' preferences based on the new knowledge found at runtime? Next, we present the experiments performed using the DESPOT solver.

Evaluation using DESPOT
This subsection describes the experiments performed using the DESPOT solver. Next, we provide the details of the infrastructure used and the initial setup for the experiments.
6.1.1 Infrastructure used during the evaluation. The behaviour of the RDM SAS has been implemented with the simulation model [18] provided by the DESPOT toolkit [18,54]. The RDM network and its behaviour are based on the case study presented in [64] and the more general specifications and expert-based knowledge shown in [31,37]. The RDM case study of this paper comprises 25 RDM servers with 300 physical network links. For the RDM configuration, the minimum number of Active Network Links (ANL) expected is equal to 24 [64]. The RDM application has been simulated over 1000+ time slices (i.e. simulations of 1000, 2000, and 3000 time slices have been performed). During each simulation, periods of dynamic perturbations have been randomly inserted. Uncertainty Management in RE-STORM: Two main POMDP elements deal with the uncertainty in the RDM SAS due to unpredictable environments: namely, the transition function T(s, a, s′) and the observation function O(s′, a, z). Both have been specified in Sections 4.4 and 4.5, respectively. In this work, we specifically quantify and evaluate the uncertainty related to the transition function T(s, a, s′) = P(s′ | s, a) (see Definition 5). For the case of the RDM SAS, network parameters have been initialized according to the known probability that certain network links will fail at any given point during the system's execution, as described in [64] and [37].
Configuration of dynamic changes in the environment: For the RDM SAS, the topologies MST and RT represent the adaptation actions. The effects on the system's state of executing these topologies may change at runtime. Simulations considering changes in the effects of these topologies have been implemented by randomly modifying the transition function T(s, a, s′) = P(s′ | s, a) during the simulation, as presented in Section 4.4 (see Tables 2a, 2b and 2c).
The probability distributions of the transition function T(s, a, s ′ ) = P(s ′ |s, a) have been randomly changed at runtime by introducing disturbances to generate dynamic changes in the environment.
The new effects are designed to make the SAS exhibit periods of deteriorated satisfaction of the NFRs, against which RE-STORM is evaluated. Accordingly, the probability P(s′ = True | s, a) is randomly decreased. Different dynamic contexts envisioned to represent this behaviour are presented in Section 6.1.3. The duration of each period of deteriorated satisfaction has been randomly selected from a range between 5 and 15 time slices, based on the data provided by [37]. Further details about the configuration of RE-STORM and the dynamic contexts for its evaluation can be found in Appendix C.
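The random degradation of P(s′ = True | s, a) described above can be sketched as follows. The perturbation magnitude and the helper names are our own illustrative assumptions; only the 5–15 time-slice window is taken from the text:

```python
import random

random.seed(42)  # for a reproducible run of this sketch

def perturb(p_true, max_drop=0.4):
    """Randomly decrease P(s' = True | s, a) by up to max_drop of its value
    (an assumed magnitude), keeping the two-outcome row normalized to 1."""
    drop = random.uniform(0.0, max_drop) * p_true
    new_true = p_true - drop
    return new_true, 1.0 - new_true

def perturbation_window():
    """Duration of one deteriorated-satisfaction period, per the 5-15 range."""
    return random.randint(5, 15)

print(perturb(0.9), perturbation_window())
```

During a simulation, each randomly inserted window would apply `perturb` to the relevant rows of T for its duration and then restore the original probabilities.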

6.1.3 Experiments. This section presents how RE-STORM performs under different dynamic contexts. First, we showcase the behaviour of the RDM under its stable conditions. RDM SAS under stable conditions: Fig. 3 shows the behaviour of the RDM SAS using the specifications presented in Sections 4.3 to 4.6. This behaviour is taken as that shown by the RDM SAS in stable conditions. The stable conditions of the system represent foreseen scenarios envisioned by the experts, based on their knowledge of the context of the system. Under stable conditions, the following behaviour has been identified: • The preferred configuration is to use a Minimum Spanning Tree Topology (MST) (see Fig. 4).
• The average belief about the satisfaction of the NFRs Minimization of Cost (MC), Maximization of Reliability (MR) and Maximization of Performance (MP) complies with the SLAs of the system (see Fig. 3). The stochastic effects on the system's state of executing the adaptation actions MST and RT are quantified as beliefs about the satisfaction of the system's NFRs. Under stable conditions, these beliefs comply with the established SLAs. DC 6 Description: This context represents an unusual scenario explicitly used to evaluate our approach under extremely detrimental conditions. A case like this would usually be related to a significant site failure [31,37], where both repeated and multiple concurrent failures are expected [37], as in the previous two dynamic contexts but all at the same time. For example, a full-scale site failure may be caused by a power outage affecting all the buildings on different campuses, or by an earthquake or a flood affecting structures within several metropolitan areas. Under this context, the worst-case data loss [31] may occur in different sites (RDM nodes), i.e. a site can be destroyed or inoperative before the full backup of information is shipped offsite. Site failure disasters are usually modelled with a failure rate of once per year [31]. The main goal of this dynamic context is to study the behaviour of the RDM SAS using RE-STORM in situations where it may be challenging to meet the SLAs regardless of the adaptation action selected.

Results of Experiments:
This section presents results on the dynamic context DC 1 and an aggregated view of the results for the six (6) dynamic contexts under evaluation. Specific results on the dynamic contexts DC 2 to DC 6 can be consulted in Appendix A.
Dynamic context DC 1. To better explain this context, first, we showcase the behaviour of the RDM SAS before using RE-STORM to update stakeholders' preferences; then, the behaviour of the system after their update is presented.
Behaviour before updating stakeholders' preferences due to changes of utility values. The following findings are presented: • Without the update of preferences, the preferred configuration is to use a Minimum Spanning Tree Topology (MST) (see Fig. 4a). • Without the update of preferences, the execution of the MST topology reduces the system's reliability compared to its stable conditions. While the average beliefs about the satisfaction of the NFRs Minimization of Cost (MC) and Maximization of Performance (MP) agree with their SLAs, the belief about the satisfaction of Maximization of Reliability (MR) is below its SLA (see Fig. 3b: Beliefs about NFRs satisfaction without update of preferences). The current stakeholders' preferences (see Section 4.6), suitable for the stable conditions of the system identified in Section 6.1.3, are not suitable for the dynamic context DC 1. Instead, they favour the MST topology, which reduces the system's reliability. Therefore, reassessment and update of stakeholders' preferences are needed under the new and unexpected context detected at runtime. Let us explore the results of this: Behaviour after updating stakeholders' preferences due to changes of utility values. The behaviour presented below has benefited from the update of stakeholders' preferences (Planning Step 1 of RE-STORM in Section 5.2). Next, the main findings are highlighted: • By updating the stakeholders' preferences, the preferred configuration is the Redundant Topology (RT) (see Fig. 4b). In this step, more importance is given to an adaptation action (i.e. RT) that has a more positive effect on the system's reliability (MR), which was below its threshold of satisfaction.
In the dynamic context DC 1, the update of stakeholders' preferences, i.e. the reward values R(s, a) in a POMDP, contributes to improving the satisfaction of reliability in the RDM SAS. As a result, all the NFRs meet their SLAs.
Aggregated view of results. Fig. 5 synthesizes the NFRs' satisfaction under the dynamic contexts DC 1 to DC 6. It can be observed that, when the decision-making process was applied under newly detected contexts that were not foreseen in advance and RE-STORM was not used to update the stakeholders' preferences (see Fig. 5: Beliefs about NFRs satisfaction without update of preferences), the beliefs about the satisfaction of cost (MC) and performance (MP) met their SLAs. Still, the belief about the reliability (MR) satisfaction was below its required SLA. Under this scenario, the preferred adaptation action is the MST topology (see Fig. 4: selected topology without update of preferences), even though the RT topology offers higher levels of reliability based on the trade-offs specified in Section 4.1. This behaviour is due to the initial stakeholders' preferences introduced in Section 4.6, which were only suitable for the RDM SAS under stable conditions (see Section 6.1.3). On the other hand, when the stakeholders' preferences are updated by RE-STORM according to the newly detected contexts, the RT topology is more frequently selected. As a result, the belief about the satisfaction of MR is improved and taken to a value that meets its SLA. As a trade-off, a reduction in the beliefs about the satisfaction of cost (MC) and performance (MP) is also observed, but they still continue to meet their SLAs (see Fig. 5: Beliefs about NFRs satisfaction with update of preferences).
The reassessment and update of preferences allowed the RDM SAS to select more suitable adaptation actions to improve the general performance and trade-offs among its NFRs.
Hence, RE-STORM has proven to provide the infrastructure to support the reassessment and update of the stakeholders' preferences, i.e. the reward values R(s, a) in a POMDP, when dynamic contexts not previously foreseen are found during the system's execution. Therefore, RQ2 is answered in the affirmative.
Next, we present the experiments performed using the Perseus solver.

Evaluation using Perseus
In this section, we present experiments using the Perseus POMDP solver. Details regarding the implementation of RE-STORM with the Perseus algorithm can be consulted in Appendix E. This time, we evaluate the application of RE-STORM when it is connected to the simulation environments RDMSim [59] and DELTA-IoT [30]. This allows us both to test the validity of the approach when it interacts with external environments and to present RE-STORM's applicability in different domains. Due to space limitations, the experiments shown do not cover the update of preferences, as the experiments related to RQ2 have already been addressed in the previous section. However, these results have been reported in [60].

RDMSim Experiments:
In this section, we describe the experimental setup and results of RE-STORM (using Perseus) for the RDMSim network.
Experimental Setup: The RDMSim tool provides a simulation environment for the remote data mirroring network [31,37]. The simulator is designed according to the infrastructure presented in [64] and is equivalent to the case study presented in Section 4. Similar to the experiments presented in previous sections, we have considered the three NFRs of Maximization of Reliability (MR), Maximization of Performance (MP) and Minimization of Cost (MC). In the case of RDMSim, Operational Cost is measured in terms of Bandwidth Consumption, Performance is measured in terms of Time to Write Data, and Reliability is measured using the number of active network links. For the purpose of satisfaction of the NFRs, the network is required to take adaptation actions by switching between the MST and RT topologies. The simulation tool of RDMSim provides us with the implementation of the different dynamic contexts for the RDM network, similar to the ones presented in Section 4.3. According to the specifications of RDMSim, the following threshold requirements for the satisfaction of NFRs are considered: R1: Bandwidth consumption should be less than or equal to 40 per cent of the total bandwidth consumption to satisfy MC.
R2: The number of active links should be greater than or equal to 35 per cent of the total links to satisfy MR. R3: The time to write data should be less than or equal to 45 per cent of the total writing time to satisfy MP.
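As an illustration, the three threshold requirements R1–R3 can be encoded as a simple per-NFR check; the function and flag names are ours:

```python
# Illustrative encoding of RDMSim's threshold requirements R1-R3.
# The percentage thresholds come from the text; the names are assumptions.

def sla_satisfied(bandwidth_pct, active_links_pct, write_time_pct):
    """Return per-NFR satisfaction flags for MC, MR and MP."""
    return {
        "MC": bandwidth_pct <= 40,     # R1: bandwidth consumption <= 40%
        "MR": active_links_pct >= 35,  # R2: active links >= 35% of total
        "MP": write_time_pct <= 45,    # R3: time to write data <= 45%
    }

print(sla_satisfied(38, 40, 44))  # all three NFRs satisfied
```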
Experiment Results: We have executed experiments using all the dynamic contexts DC1 to DC6 for the RDMSim network for 1000 simulation time slices. These experiments correspond to those presented in Section 6.1.4 and Appendix A, using RE-STORM without updating stakeholders' preferences. The experimental results show that MC and MP meet the threshold requirements, having an average satisfaction level below the threshold for satisfaction, as presented in Fig. 6. For MC, the satisfaction threshold is a total bandwidth consumption below 3700 Gbps, whereas the satisfaction threshold for MP is a total writing time below 2700 milliseconds. In the case of MR, the average satisfaction level lies above the threshold, i.e. 105 active links, in several dynamic contexts. The exception lies in the results for DC1, DC3 and DC6, where the average satisfaction level is below the threshold.
The reason behind this is the uncertain contextual situations, like link failures, that occur at runtime under these dynamic contexts. Fig. 7 shows the behaviour in terms of topology selection for the RDMSim network as a result of the adaptations offered by RE-STORM. Let us consider the case of DC4. The uncertain situation of link failures during the execution of the MST topology affects the satisfaction of MR, and the NFRs MC and MP are also affected due to synchronous mirroring. In such a situation, RE-STORM increases the usage of the RT topology to support the satisfaction of MR along with maintaining the satisfaction levels of MC and MP. In summary, the results achieved using the Perseus implementation of RE-STORM for the case of the RDMSim network are comparable to those of the DESPOT implementation of the RE-STORM approach without updating preferences, which leads to similar conclusions. Moreover, the topology selection behaviour in all the cases using the Perseus implementation of RE-STORM is also similar to that of DESPOT without an update of preferences, as presented in Section 4 and Appendix A.

DELTA-IoT Experiments:
In this section, we describe the experimental setup for the DELTA-IoT network and the experimental evaluation of RE-STORM (using Perseus) for this case.
Experimental Setup: The simulation environment of DELTA-IoT presents a multi-hop IoT network for a smart campus. The network consists of 15 motes based on LoRa (Long-Range) radio communication. The motes comprise RFID sensors, passive infrared sensors and temperature sensors that are installed across different buildings of the KU Leuven campus for the purpose of providing access to the laboratories, monitoring the occupancy status and sensing the temperature. The functional goal of the communication of motes is to relay information to the central gateway at the central monitoring facility. The IoT network is required to survive for a long period of time with minimal battery usage and to achieve communication reliability. Hence, the NFRs considered in the case of the DELTA-IoT network are the Minimization of Energy Consumption (MEC) and the Reduction of Packet Loss (RPL), under uncertain environmental conditions of communication interference on the links and dynamic traffic load. In order to achieve the required satisfaction levels for the NFRs, tuning the network link settings, such as the transmission power, communication range and distribution factor for the links, using different adaptation strategies is needed. The threshold requirements for the satisfaction of the NFRs according to the specifications of the DELTA-IoT network are as follows: R1: The total energy consumption for the network should be less than or equal to 20 coulombs in order to satisfy MEC. R2: The total packet loss for the network should be less than or equal to 20 per cent in order to satisfy RPL.
For the purpose of the experimental evaluation, we have performed adaptations using RE-STORM for 100 simulation time slices of the DELTA-IoT network. In the case of the DELTA-IoT network, one simulation time slice is equal to 15 minutes of network activity. During each simulation time slice, local adaptation decisions for each mote are made using the Perseus implementation of RE-STORM. Based on the monitored link interference, measured in the form of the Signal-to-Noise Ratio (SNR) for each mote, the decision to increase transmission power (ITP) or decrease transmission power (DTP) on the links is taken to support the satisfaction of MEC and RPL. Next, we present the experimental results for the case of the DELTA-IoT network.
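Before turning to the results, the per-mote decision loop described above can be sketched as follows. The SNR threshold and the function names are illustrative assumptions, since the exact decision boundary used by the solver is not specified here; in RE-STORM the choice is of course made via the POMDP policy rather than a fixed rule:

```python
# Hedged sketch of the per-mote control loop: each simulated time slice
# (15 minutes of network activity), the monitored SNR on a mote's links
# drives a choice between the two adaptation actions ITP and DTP.

SNR_TARGET_DB = 0.0  # hypothetical target: negative SNR means noise dominates

def choose_action(snr_db):
    """Increase power on a noisy link, decrease it on a clean one, trading
    Reduction of Packet Loss against Minimization of Energy Consumption."""
    return "ITP" if snr_db < SNR_TARGET_DB else "DTP"

def plan_slice(snr_per_mote):
    """One local adaptation decision per mote at the end of a time slice."""
    return {mote: choose_action(snr) for mote, snr in snr_per_mote.items()}

print(plan_slice({"mote2": -3.5, "mote7": 1.2, "mote11": -0.4}))
```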
Experiment Results: First, we describe the results of the adaptations offered by RE-STORM at each mote level during each simulation time slice. As the network comprises 14 motes and 1 gateway, adaptation decisions for the 14 motes are taken individually at the end of each simulation time slice. Fig. 8a shows the satisfaction level of the NFRs MEC and RPL maintained by RE-STORM under the dynamic environmental changes caused by the link interference on the network links associated with the motes. The results show that the average satisfaction level for both NFRs is below the required threshold. The average satisfaction for MEC is 11.9758 coulombs, which is less than 20 coulombs, whereas, for RPL, the average satisfaction is 0.147, which is below the required threshold of 0.20, as shown in Fig. 8a. Hence, the required satisfaction thresholds are met.
Furthermore, we have also compared the satisfaction level of the NFRs at the end of each simulation time slice with and without the application of RE-STORM, as shown in Fig. 8b. Considering the case of MEC, the average satisfaction level of MEC is above the required threshold when the DELTA-IoT network operates without the application of RE-STORM. The adaptations offered by RE-STORM maintain the required threshold constraint of 20 coulombs by achieving an average satisfaction below the threshold, as shown in Fig. 8b. Furthermore, for RPL, the application of RE-STORM shows an increase in packet loss, offering a trade-off in favour of the satisfaction of MEC. However, despite the increase in packet loss, RE-STORM maintains the required satisfaction level by keeping the average satisfaction level of RPL below the satisfaction threshold of 0.20. Hence, RE-STORM shows better results in terms of maintaining the requirements of the system when compared to the DELTA-IoT system working without the adaptations offered by RE-STORM.

DISCUSSION
This section discusses the evaluation of the results presented and the threats to the validity of this study.

Evaluation of results and implications
To validate our findings, we have applied several statistical tests to the data of the experiments. These tests supported the evaluation of the null hypotheses described next.
Let us recall from Definition 2 that RE-STORM allows for the quantification of the uncertainty of adaptation actions on the levels of satisfaction of the NFRs by using probabilities (i.e. these probabilities represent the beliefs about the levels of satisfaction of the NFRs). Further, Definition 2 also states that these probabilities are conditional on the stochastic effects of the adaptation actions executed over the system. Therefore, we define the following null hypothesis: H 0,1 : "The probabilities (i.e. beliefs) about the actual state of the system's NFRs do not represent the real observations collected." Based on this hypothesis, with the experiments, we assessed whether the belief probabilities represent the actual observations. If the belief probabilities do account for the actual observations, the null hypothesis would be rejected.
For the second hypothesis, let us recall from Section 6 that RE-STORM can highlight situations such as those where the utility/reward associated with a given NFR has changed. These new utilities can affect the preferences shown by stakeholders (e.g. if some new environmental context is identified, performance can be found to be less crucial than reliability, and as such, the end user needs to be informed). Based on the new knowledge found by RE-STORM, the trade-off among NFRs may be improved by updating the preferences of the stakeholders with respect to adaptation actions. Therefore, we define our second null hypothesis: H 0,2 : "There is no difference between updating and not updating stakeholders' preferences to improve the trade-off among the system's NFRs." With the experiments performed, we assessed whether the update of preferences improves the trade-off among the system's NFRs. The eventual rejection of the null hypotheses H 0,1 and H 0,2 supports the answers provided to the research questions RQ1 and RQ2, respectively.
Hypothesis H 0,1 evaluation. In POMDPs, the system's state s ∈ S is not directly observable. Instead, the decision-making is driven by the beliefs about these states. Let us recall from Definition 2 that RE-STORM allows quantifying the uncertainty of adaptation actions on the levels of satisfaction of the NFRs of the running system using belief probabilities. Given that we have used the DESPOT toolkit [54] to simulate the dynamic contexts in our experiments, we had access to the satisfaction state of the NFRs based on the SLAs, i.e. s ∈ S. Therefore, a logistic regression analysis could determine how well the beliefs represent the satisfaction state of the NFRs, i.e. the state of the system. Statistical significance. The results related to the dynamic context DC 1 are presented in Table 4. We found in all cases that our belief representation about the satisfaction of the NFRs in the RDM SAS was statistically significant. Specifically, for Minimization of Cost (MC), the logistic regression analysis shows that our belief representation had an 85.1% success rate in predicting the actual system's state (p-value = 0.000, Wald statistic = 18.498). In the case of Maximization of Reliability (MR), the success rate for predicting the real system's state was 92.8% (p-value = 0.000, Wald statistic = 36.422); finally, for Maximization of Performance (MP), we obtained a success rate of 84.3% (p-value = 0.000, Wald statistic = 51.368).

Table 4. DC 1 -Logistic regression models
Equivalent results have also been obtained for our probabilistic representations in the dynamic contexts DC 2 to DC 6 (see Appendix A). The use of logistic regression analysis allowed us to demonstrate that the beliefs about the system's state s′ in the transition function T(s, a, s′) = P(s′ | s, a) are statistically significant. The new state s′ represents the stochastic effects on the system of executing the adaptation action a ∈ A (see Definition 2). Therefore, we have been able to quantify the uncertainty related to the state s′ as its belief by using RE-STORM.
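As an illustration of this analysis, a logistic regression of the (known, simulated) satisfaction state of one NFR on its belief probability can be fitted and scored as follows. The synthetic trace and the plain gradient-ascent fit are our own stand-ins for the DESPOT data and the statistical package used in the study:

```python
import math
import random

random.seed(0)
# Synthetic trace: the true state tends to be 1 (satisfied) when the belief is high.
beliefs = [random.random() for _ in range(400)]
states = [1 if random.random() < b else 0 for b in beliefs]

# Fit P(state = 1 | belief x) = sigmoid(w * x + b) by batch gradient ascent
# on the log-likelihood (a minimal substitute for a stats package).
w, b = 0.0, 0.0
lr = 0.5
for _ in range(2000):
    gw = gb = 0.0
    for x, y in zip(beliefs, states):
        p = 1.0 / (1.0 + math.exp(-(w * x + b)))
        gw += (y - p) * x
        gb += (y - p)
    w += lr * gw / len(beliefs)
    b += lr * gb / len(beliefs)

# Success rate: how often the fitted model predicts the actual state.
predictions = [1 if 1.0 / (1.0 + math.exp(-(w * x + b))) >= 0.5 else 0
               for x in beliefs]
success_rate = sum(int(p == y) for p, y in zip(predictions, states)) / len(states)
print(f"slope w = {w:.2f}, success rate = {success_rate:.1%}")
```

A positive, significant slope `w` is what licenses the claim that the beliefs predict the hidden satisfaction state; the paper's Wald statistics and p-values play that role for the real traces.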
Furthermore, we have also performed logistic regression to evaluate the results of the implementation of RE-STORM using the Perseus solver when used with the simulation environments of RDMSim and DELTA-IoT. In these cases, we have both the belief probabilities and the observations coming from the simulators. Hence, using logistic regression, we further investigate, when both the belief probability and the observations are considered, to what extent they represent the actual state of the system's NFRs. We have used the statistical measures of precision and recall [62] to evaluate the results. Let us first consider the case of RDMSim. Under DC1, the logistic regression model shows an accuracy score of 0.971 for MC, with a precision and recall of 0.977 and 0.989, respectively. For MR and MP, the accuracy scores for classification are 0.986 and 0.965, with precisions of 0.968 and 0.966, respectively. The recall for both MR and MP is 1.0. The logistic regression model has shown similar results for the other dynamic context scenarios. The data set and the results for all the contexts are reported in [65]. Moreover, for the case of DELTA-IoT, the logistic regression model has been applied in a similar way to study the relationship of the actual NFRs' state to the beliefs and observations. The model shows a classification accuracy score of 0.999 for MEC, with a precision and recall of 0.999 and 1.0, respectively. For RPL, the accuracy score is 0.954, with a precision and recall of 0.943 and 1.0, respectively.
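The precision and recall measures used above can be computed from first principles; the toy prediction/truth vectors are illustrative only:

```python
def precision_recall(y_true, y_pred):
    """Precision = TP / (TP + FP); Recall = TP / (TP + FN)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Toy trace: 4 correctly predicted satisfactions, 1 false alarm, 1 miss.
p, r = precision_recall([1, 1, 1, 1, 0, 1], [1, 1, 1, 0, 1, 1])
print(p, r)  # prints: 0.8 0.8
```

The reported recall of 1.0 for MR, MP and MEC means no satisfied state was ever classified as unsatisfied (zero false negatives).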
The results and the behaviour described above enable us to reject the null hypothesis H 0,1 . Therefore, RQ1 is answered in the affirmative.
Hypothesis H 0,2 evaluation: With regard to the null hypothesis H 0,2 , we evaluated the behaviour of the RDM SAS under two specific scenarios: without and with the update of stakeholders' preferences about the effects of executing adaptation actions on the system's state. RE-STORM has quantified these stochastic effects as probability distributions, i.e. beliefs about the satisfaction of the NFRs in the RDM SAS.
Because these beliefs are continuous variables, we have been able to run a two-sample comparison test to determine if there is a statistically significant difference between the means of the beliefs when the update of stakeholders' preferences is performed. Statistical significance: The results related to the dynamic context DC 1 are depicted in Table 5. In all cases, we found a statistically significant difference between the means of the satisfaction of the NFRs when updating versus not updating stakeholders' preferences. For the case of Maximization of Reliability (MR), Table 5a shows that the update of stakeholders' preferences effectively contributes towards a higher belief about the satisfaction of MR (mean = 0.93, SD = 0.058) compared to the belief when stakeholders' preferences are not updated (mean = 0.86, SD = 0.038). These results are statistically significant (Table 5b: t = 19.708, p-value = .000). In Table 5a, it is observed that the update of stakeholders' preferences produces a lower belief about the satisfaction of Minimization of Cost (MC) (mean = 0.86, SD = 0.056) compared to the belief when stakeholders' preferences are not updated (mean = 0.88, SD = 0.041). The results are significant (Table 5b: t = 12.129, p-value = .000). Finally, Table 5a shows that the update of stakeholders' preferences also produces a lower belief about the satisfaction of Maximization of Performance (MP) (mean = 0.83, SD = 0.066) compared to the belief when stakeholders' preferences are not updated (mean = 0.88, SD = 0.055). The results are also significant (Table 5b: t = 17.771, p-value = .000). In general, during the experiments under different dynamic contexts, it was observed that, when the initial stakeholders' preferences are not updated, the beliefs about the satisfaction of the NFRs can be drastically compromised, e.g. in the dynamic contexts DC 1 (Fig. 3b), DC 3 (Fig. 11) and DC 6 (Fig. 13). These beliefs can go even below their required thresholds due to the unsuitability of the initial stakeholders' preferences. In contrast, when the stakeholders' preferences are updated, the results show that the decision-making process improves the NFR with the lowest belief about its satisfaction, eventually taking it to a suitable zone. As a trade-off, a slight reduction in the beliefs about the satisfaction of the other NFRs involved can also be observed. Moreover, the aggregated results presented in Section 6.1.4 also show that the reassessment and update of preferences and the decision-making by RE-STORM can improve the general performance of the RDM SAS.
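The two-sample comparison behind Table 5 can be reproduced in miniature with a pooled-variance t statistic; the belief samples below are synthetic stand-ins for the experimental traces (the exact test variant used in the study is not specified here):

```python
import math

def t_statistic(xs, ys):
    """Two-sample t statistic with pooled variance (equal-variance form)."""
    nx, ny = len(xs), len(ys)
    mx, my = sum(xs) / nx, sum(ys) / ny
    vx = sum((x - mx) ** 2 for x in xs) / (nx - 1)
    vy = sum((y - my) ** 2 for y in ys) / (ny - 1)
    sp2 = ((nx - 1) * vx + (ny - 1) * vy) / (nx + ny - 2)  # pooled variance
    return (mx - my) / math.sqrt(sp2 * (1 / nx + 1 / ny))

# Synthetic beliefs about MR satisfaction, with and without the update.
with_update = [0.93, 0.95, 0.90, 0.94, 0.92]
without_update = [0.86, 0.84, 0.88, 0.85, 0.87]
t = t_statistic(with_update, without_update)
print(round(t, 2))  # well above the 0.05 critical value of 2.306 for 8 df
```

With the full traces (and their far larger sample sizes), the same computation yields the t values and near-zero p-values reported in Table 5b.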
The results and findings reported above enabled us to reject the null hypothesis H 0,2 and answer RQ2.
Next, the research questions are discussed based on the results and the insights gained from the experiments.

Research questions revisited
In this section, we highlight our contributions by answering the research questions of the paper. The two research questions identified in this work were presented in Section 6. The answers to these questions are as follows: RQ1: Can the uncertainty associated with the stochastic effects of executing an adaptation action be quantified?
We have shown how the analysis activity of RE-STORM, which is part of the approach's runtime behaviour embedded in a MAPE-K loop, allows for the quantification of the uncertainty of the effects of adaptation actions. The system's state is not directly observable; however, the beliefs about the effects on the system's state of executing an adaptation action are updated based on collected observations. Specifically, Equation (1) applies Bayesian inference (see Section 2) to quantify the uncertainty of the current system's state based on probability distributions. In Section 7.1, and based on the case studies, it has been shown that the belief representation of the system's state is statistically significant in predicting the system's state.
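A minimal sketch of the Bayesian belief update that Equation (1) performs is shown below; the two-state model and all the numbers are hypothetical, chosen only to illustrate the mechanics:

```python
# Standard POMDP belief update via Bayes' rule:
#   b'(s') ∝ O(s', a, z) * sum_s T(s, a, s') * b(s)
# with a two-state model (the NFR is satisfied or not satisfied).

STATES = ["sat", "unsat"]

def belief_update(b, T, O, a, z):
    """b: dict state -> prior prob; T[a][s][s']; O[a][s'][z]."""
    unnorm = {s2: O[a][s2][z] * sum(T[a][s][s2] * b[s] for s in STATES)
              for s2 in STATES}
    total = sum(unnorm.values())
    return {s2: p / total for s2, p in unnorm.items()}

# Hypothetical numbers: the action tends to keep the NFR satisfied, and
# observations are noisy indicators of the hidden state.
T = {"MST": {"sat": {"sat": 0.8, "unsat": 0.2},
             "unsat": {"sat": 0.5, "unsat": 0.5}}}
O = {"MST": {"sat": {"good": 0.9, "bad": 0.1},
             "unsat": {"good": 0.3, "bad": 0.7}}}
b0 = {"sat": 0.5, "unsat": 0.5}
b1 = belief_update(b0, T, O, "MST", "good")
print({s: round(p, 3) for s, p in b1.items()})  # {'sat': 0.848, 'unsat': 0.152}
```

Starting from a uniform prior, a single "good" observation shifts the belief that the NFR is satisfied to about 0.85, which is exactly the kind of quantified uncertainty RQ1 asks for.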
Therefore, by answering RQ1, we have shown the first contribution: a decision-making technique for self-adaptation based on partially observable NFRs. RQ2: Under new and unexpected contexts observed, can the trade-off among the NFRs be improved by updating the stakeholders' preferences based on the new knowledge found at runtime?
In Section 6, we have shown that the trade-offs among NFRs can be improved by reassessing and updating stakeholders' preferences about the effects of executing adaptation actions on the system's state. To this effect, we have used the Weights Updater module presented as part of Planning Step 1 of RE-STORM (see Section 5.2). The module shows how the original stakeholders' preferences inhibited better adaptation choices that would allow achieving higher satisfaction of the NFRs with regard to their SLAs, and that the reassessment of the preference values is therefore needed. The results evaluated in Section 7.1 have shown that, when the original stakeholders' preferences that do not match the current runtime context are updated with better-fitting values, the trade-off and the decision-making in a SAS are improved accordingly. Under complex and extremely hostile environments (e.g. dynamic context DC 6) or unusually high SLAs (see Appendix B), the trade-off performed by the approach may not necessarily guarantee the fulfilment of every SLA. However, better results were observed than with the original preference values.
Our main aim in approaching research question RQ2 was to explore further opportunities to reassess and improve the trade-off among NFRs based on new knowledge acquired during runtime, supported by RE-STORM. Specifically, we focused on the reassessment and update of stakeholders' preferences to improve the trade-offs among the NFRs in a system.

Threats to validity
In this section, the main threats that might have an impact on the validity of the results of this work are presented.
• Internal validity: Internal validity refers to the degree of confidence that the relationships being tested are not influenced by other factors and that the evidence supports our claims [11,63]. Accordingly, there is a potential risk that, when we try to determine whether the update of stakeholders' preferences affects the beliefs about the satisfaction of the NFRs, the beliefs may also be affected by other causes (e.g. different environmental conditions under the same evaluation). We have mitigated this threat by performing experiments with randomized dynamic contexts (see Section 6.1.2), under which the case studies have been evaluated. Specifically, we use the same configuration (i.e. the same randomized scenario) for the RDM whether or not RE-STORM was applied to update stakeholders' preferences during the system's execution. Another internal threat to validity is related to the extent to which the presented approach performs in an actual environmental setup. In this paper, we have used a case study approach based on simulators. The experimental results are based on the environmental factors presented by a simulated environment, not an actual physical network. However, the simulation artefacts selected for the case studies [30,59] provide simulations that are close to real settings. Both RDMSim [64] and DELTA-IoT [30] are public software artefacts of the research community and have been used by different research teams, which adds confidence to the results. • External validity: This aspect of validity is concerned with studying the extent to which it is possible to generalize the findings [76]. We support the generalizability of the approach by applying it to two case studies from different domains and executing experiments using two different POMDP solvers. The case studies [30,59] that we have selected provide simulations that are close to real environmental settings. Accordingly, we believe that our work is achievable in real settings of other domains similar to those in
[30, 59]. From the point of view of scalability, we also support the generalizability of the approach: RE-STORM uses DESPOT and Perseus, algorithms that overcome the scalability issues related to the "curse of history" and the "curse of dimensionality" [54, 56] in POMDPs. Given the current state of the art, RE-STORM is a novel solution to represent the evolution of the beliefs about the satisfaction of partially observable NFRs and their trade-offs. However, the current implementation of the RDM SAS using DESPOT is based on a simulated environment provided by the algorithm [54]; nevertheless, the simulation is based on real data [37]. In RE-STORM, the states are defined in terms of combinations of satisfaction levels of NFRs: two NFRs yield four POMDP states, three NFRs yield eight, and so on. In practice, the approach should therefore target applications with few NFRs. Hence, while using our approach, SAS design experts must limit their reasoning to the critical NFRs that drive self-adaptation, which, according to [47], still covers a considerable number of applications.
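This exponential growth of the state space can be made concrete by enumerating the states directly. The following minimal Python sketch uses Boolean satisfaction levels and the NFR names of the RDM case study:

```python
from itertools import product

def enumerate_states(nfrs):
    """Each POMDP state is one combination of the (Boolean)
    satisfaction levels of the NFRs: 2^n states for n NFRs."""
    return [dict(zip(nfrs, combo))
            for combo in product([True, False], repeat=len(nfrs))]

# Two NFRs yield four states; three yield eight.
print(len(enumerate_states(["MC", "MR"])))        # 4
print(len(enumerate_states(["MC", "MR", "MP"])))  # 8
```

The dictionary representation is only illustrative; any encoding of the combinations grows at the same 2^n rate.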
In the next section, we contrast our contribution against related work.

COMPARISON WITH RELATED WORK
There is a variety of approaches for decision-making under uncertainty driven by NFRs [7, 17, 27, 34, 36, 51, 55, 57]. Table 6 shows a summary of them; in this section, a comparison against RE-STORM is presented. We compare RE-STORM with other approaches based on the following criteria: 1) Scalability issues, 2) "Long-term effects" in decision-making under uncertainty, 3) Representation of partially observable NFRs, 4) Stakeholders' preferences specification at design-time, and 5) Runtime reassessment and update of stakeholders' preferences. In Table 6, ✗ indicates that an approach does not fulfill the criterion, ✓- or ✓-- that it partially fulfills the criterion at different levels, and ✓ that it fulfills the criterion. The first column, Scalability issues, is mainly related to the curse of history and the curse of dimensionality, which have been explained in Section 7.3. The approaches in [28, 55] fail to deal with such scalability issues, while those in [4, 7, 20, 25, 34, 36, 38, 40, 51] deal with them only partially. A limitation of RE-STORM relates to the curse of dimensionality, i.e. the fact that the states are defined in terms of combinations of satisfaction levels of NFRs, whose number can grow exponentially. Therefore, when using RE-STORM, design experts should restrict reasoning to critical NFRs; nevertheless, this still constitutes a substantial number of applications [47]. RE-STORM has been applied using two different POMDP implementations (DESPOT [54] and Perseus [77]), which allows it to overcome the curse of history (i.e. in RE-STORM, beliefs do not grow exponentially along with the planning horizon). For instance, RE-STORM can take a maximum of 1 second to make a decision if this is required by the stakeholders. RE-STORM is therefore considered one step ahead of the other approaches towards fulfilling the scalability criterion, represented by ✓ in Table 6.
The second column, Long-term effects, allows us to classify the evaluated approaches into two main categories: (i) those that use reactive control decision-making [7, 27, 34, 36, 51, 55], whose decision-making process relies on techniques that focus on short-term effects, and (ii) those that take into consideration the long-term effects of their immediate actions [4, 15, 20, 25, 38] by using sequential decision-making approaches such as Markov Decision Processes (MDPs) and Partially Observable MDPs. RE-STORM supports decision-making based on Bellman's principle of optimality, which considers the current state of the system while projecting future evolutions of the satisfaction of the NFRs [see Section 5.2 (Planning step 2)]. This capacity enables RE-STORM to overcome the well-known problem present in reactive approaches: choosing attractive short-term actions with potentially undesirable long-term consequences [38].
The third column, Representation of partially observable NFRs, shows how the approaches deal with the observability of the current state of the NFRs, whether they use a general goal model [7, 27, 34, 36, 51, 55], an implementation of an MDP [4, 20, 25, 38], or Dynamic Decision Networks [14, 15, 57]. Different from RE-STORM, these approaches do not model the uncertainty related to the satisfiability of the NFRs in a system, as they assume that the state of the NFRs is fully observable at every time step. As we argue in this paper, this assumption often does not hold in reality. Instead, using RE-STORM, a belief over the state of the NFRs is maintained based on observations and actions. Such beliefs (probability distributions) quantify the uncertainty about the real system's state and are used to drive the decision-making process presented in Section 5.2. Some approaches make use of Bayesian Artificial Intelligence to tackle the uncertainty in the model selection process for SAS [22, 45]; however, they do not provide modelling of partial observability for NFRs.
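The maintenance of such a belief can be sketched as a standard discrete Bayesian filter over the POMDP. This is a hedged illustration: the two-state model, the transition and observation tables, and the indices are hypothetical toy values, not the RDM model.

```python
def update_belief(b, T, O, a, z):
    """One Bayesian belief update:
    b'(s') is proportional to O(s', a, z) * sum_s T(s, a, s') * b(s),
    normalized so the new belief sums to 1."""
    states = range(len(b))
    unnorm = [O[s2][a][z] * sum(T[s][a][s2] * b[s] for s in states)
              for s2 in states]
    total = sum(unnorm)
    return [u / total for u in unnorm]

# Toy two-state, one-action example: T[s][a][s'], O[s'][a][z].
T = [[[0.9, 0.1]], [[0.2, 0.8]]]
O = [[[0.8, 0.2]], [[0.3, 0.7]]]
b1 = update_belief([0.5, 0.5], T, O, a=0, z=0)
```

After observing z = 0 (which state 0 emits more often), the belief shifts towards state 0; this shifted distribution is what drives the decision-making instead of an assumed fully observable state.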
In the fourth column, Preferences specification, it is observed that most of the approaches present an explicit specification of preferences. The exception is [25], which is oblivious to the specification of preferences. The other approaches range from preferences provided by system stakeholders [17, 51, 55] to preferences determined by using a simulator, as in [20]. In between, some authors use weights to specify the preferences of stakeholders [3, 38]. Specifically, the authors in [38], as in our case, use rewards based on utility functions to specify the preferences included in the SLAs; however, they have done it in an ad hoc manner for a specific pair of preferences. The latter is the reason why both [3, 38] were qualified as partially fulfilling the criterion Preferences specification. In RE-STORM, the initial preferences provided by system stakeholders are encoded in the requirements-aware runtime model using POMDPs. During runtime, these preferences can be reassessed and updated if needed, as explained in the next paragraph.
Finally, the fifth column, Update of preferences, shows that most approaches do not update preferences at runtime, except for the approaches in [40, 51, 55] and RE-STORM. Moreover, the update of preferences in [51, 55] is not autonomous. Different from the above, we have shown how initial assumptions made at design-time (i.e. stakeholders' preferences) can be updated during the system's execution to explore opportunities to improve the trade-offs among NFRs according to new contexts, which may not have been foreseen. Specifically, we support runtime preference reassessment and update by using the Weights Updater module [26] presented in Section 5.2. Furthermore, we also have a complementary approach to support autonomous tuning of NFR priorities to keep SLAs compliant in an autonomous system [60, 61] using Multi-Reward POMDPs.

CONCLUDING REMARKS AND NEXT STEPS
In this paper, we have presented the RE-STORM approach to quantify the uncertainty of possible fluctuations in the effects of adaptation actions, based on the partial observability of the state of a SAS, using a POMDP model. The state of the SAS is represented by the levels of satisfaction associated with the NFRs. The POMDP that supports RE-STORM balances the required trade-off of the conflicting NFRs over time. Instead of assuming fixed effects of the adaptation actions on the system over time, which is common in traditional approaches, RE-STORM models the effects of the adaptation actions on the state of the SAS by using probability distributions that change depending on the observations captured during runtime.
Possible future steps include: Runtime learning of stakeholders' preferences: Our work has shown the potential of runtime updates of stakeholders' preferences to improve the trade-offs among NFRs and, therefore, the decision-making in SAS. We have used the implementation in [26], but different approaches can be used given the modular architecture of RE-STORM. We believe that a feasible next step is using Bayesian and machine learning approaches to compute optimal or approximately optimal preferences that fit new environmental contexts detected at runtime. At the present stage, we have obtained initial results on a new approach for autonomous tuning of NFR priorities [60, 61] to maintain the SLAs in a SAS based on data gathered at runtime.
Further ways to quantify uncertainty: A future version of RE-STORM will extend its current system's state with soft requirements (e.g. human values [43, 44]) to quantify uncertainty as beliefs about the effects of executing adaptation actions on them. We envision the modelling and continuous monitoring of soft requirements to collect runtime insights regarding potential violations of stakeholders' values and other soft requirements. Insights collected in this context may suggest improving design features in a software product or represent socially oriented implications for systems at a higher level.
Nevertheless, the reward values R(s,a) were updated in DC 2 and DC 4 when, in specific time slices during the system's execution, the beliefs about the satisfaction of the NFRs were below their thresholds. The final result was an improvement in the reliability of the system (See Figs. 9 and 10: behaviour under dynamic contexts DC 2 and DC 4 with update of preferences). As a trade-off effect, a reduction in the beliefs about the satisfaction of cost and performance was observed, but they were still able to meet their SLAs. Statistical significance. The logistic regression analysis shows that for Minimization of Cost (MC) (See Table 7), our belief representation had an 85.2% success rate in predicting the actual system's state (p-value = 0.000, Wald statistic = 40.346). In the case of Maximization of Reliability (MR), the success rate for predicting the real system's state was 93.3% (p-value = 0.000, Wald statistic = 37.709), and for Maximization of Performance (MP), a success rate of 82.8% was obtained (p-value = 0.000, Wald statistic = 56.831).
For the dynamic context DC 4, it was observed that for Minimization of Cost (MC) (See Table 8), the predictive capacity of the real system's state by the belief representation had a success rate of 84.2% (p-value = 0.000, Wald statistic = 24.440). In the case of Maximization of Reliability (MR), the success rate for predicting the real system's state was 92.5% (p-value = 0.000, Wald statistic = 29.000), and for Maximization of Performance (MP), a success rate of 79.9% was obtained (p-value = 0.000, Wald statistic = 46.826). In this context, the effects of the new environment on the system's state produced an important reduction in the reliability of the system. The reliability was under its satisfaction threshold (See Fig. 11: behaviour under dynamic context DC 3 without update of preferences). Despite the newly detected context, the preferred topology continued to be MST: 100% (See Fig. 4: selected topology without update of preferences). After the reassessment of stakeholders' preferences performed by the Weights Updater module (Planning step 1 of RE-STORM), the preferences were updated at runtime, and the reliability of the system was taken by the decision-making process to levels where its average satisfaction addressed the required SLA (See Fig.
11: behaviour under dynamic context DC 3 with update of preferences). A slight reduction in performance and cost was also observed due to the trade-offs among NFRs. However, this reduction does not imply any risk to continue meeting the SLAs of cost and performance: both NFRs were over their thresholds. Statistical significance. Results of the logistic regression analysis show that for Minimization of Cost (MC) (See Table 9), the predictive capacity of the real system's state by the belief representation had a success rate of 81.0% (p-value = 0.000, Wald statistic = 37.959). In the case of Maximization of Reliability (MR), the success rate for predicting the real system's state was 91.8% (p-value = 0.000, Wald statistic = 37.959); finally, for Maximization of Performance (MP), we obtained a success rate of 81.5% (p-value = 0.000, Wald statistic = 50.125). Finally, the dynamic context DC 6, the most hostile environment designed for this evaluation, was studied. DC 6 reflects the negative impacts of the dynamic contexts DC 4 and DC 5 simultaneously. Under DC 6, when applying RE-STORM with no reassessment of stakeholders' preferences, it was found that, regardless of the adaptation action selected by the RDM SAS, the environment showed a trade-off behaviour with a tendency to increase the cost (MC) while reducing the reliability (MR) and performance (MP). The satisfaction of all the NFRs in DC 6 was lower than in the experiments with stable conditions. When repeating the experiments of DC 6 but using RE-STORM with the update of the reward values R(s,a), it was found that the satisfaction of the reliability of the system increased. However, the increment was not enough to meet its SLA. Additionally, as a result of the trade-offs to give priority to reliability, the satisfaction of the performance was reduced to a level below its threshold (See Fig.
13: behaviour under dynamic context DC 6 with update of preferences). The beliefs about the satisfaction of MR and MP are not in compliance with their SLAs, due to the extremely hostile dynamic context (See Fig. 13: behaviour under dynamic context DC 6 with update of preferences). Statistical significance. The analysis has shown in this context that for Minimization of Cost (MC) (See Table 11), the predictive capacity of the real system's state by the belief representation had a success rate of 74.9%. The average satisfaction after the update (See Fig. 15) is MC = 0.8608, MR = 0.9373 and MP = 0.8354. Although it is possible to observe in the figure that the satisfaction of reliability is, during some scarce time slices (upper whisker), over its threshold, on average it was not possible to meet its SLA after the update of stakeholders' preferences.
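The predictive capacity reported above can be illustrated with a simple success-rate computation: predict that an NFR is satisfied whenever the belief exceeds a cut-off, and compare against the actual state. This is a simplified sketch of the quantity the logistic regression models assess; the belief and state values below are made up.

```python
def success_rate(beliefs, actual, cutoff=0.5):
    """Fraction of time slices in which the belief correctly
    predicts the Boolean satisfaction of an NFR."""
    hits = sum((b >= cutoff) == s for b, s in zip(beliefs, actual))
    return hits / len(beliefs)

# Hypothetical beliefs about one NFR and the actual satisfaction states.
beliefs = [0.9, 0.8, 0.4, 0.7, 0.2]
actual = [True, True, False, False, False]
rate = success_rate(beliefs, actual)  # 4 of 5 predictions correct
```

The logistic regressions additionally estimate significance (p-values and Wald statistics), which this sketch omits.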
RE-STORM improves the NFR with the lowest satisfaction with respect to its SLA. However, unusually high SLAs may prevent taking the NFR to a suitable zone of satisfaction. In Fig. 16 (beliefs about the NFRs' satisfaction without update of preferences), it is observed that the belief about the satisfaction of reliability is always below its SLA. Therefore, the stakeholders' preferences are updated by the Weights Updater module according to this context. Fig. 16 (beliefs about the NFRs' satisfaction with update of preferences) then shows the new beliefs about the satisfaction of the NFRs. In this case, despite the increment of the average satisfaction of the reliability of the system in comparison to Example 01 (from 0.9373 to 0.9418), its satisfaction is constantly under its SLA, as shown in Fig. 16. Even if it is not always possible to address the SLAs in a system under extreme conditions, RE-STORM is able to perform the trade-off of the NFRs independently of the SLAs in use.

C APPENDIX: SETUP OF RE-STORM
This appendix presents details on the initial configuration of RE-STORM to implement its runtime behaviour (See Fig. 17). The initial representation of the SAS environment should be specified at design-time to be used during the system's execution. The configuration files of this environment are briefly described as follows:

C.1 RE-STORM configuration file
The main parameters of the RE-STORM configuration file are the transition function, the observation function, the reward values R(s,a), and the thresholds for the levels of satisfaction of the NFRs; their descriptions accompany Figs. 16 and 17. Other SLAs have also been evaluated and reported in Appendix B. Next, the configuration file to enable the behaviour of the dynamic contexts DC 1 to DC 6 is described.

C.2 Dynamic contexts configuration file
This configuration file supports further simulation of the SAS under different dynamic environments. Its main parameters are:
• Dynamic context to be activated. An integer value (between 0 and 5) which represents the dynamic context to be activated.
• Noise factor. A real value in [0,1] which represents the probability of activation of the selected dynamic context. The default value used during this evaluation is 0.5.
• Deviation range of the selected dynamic context. A range of values [lowerBound, upperBound], from which a real number is randomly selected to decrease specific probability values (See Tables 2a, 2b and 2c), according to the dynamic context selected. For example, for the range [0.1, 0.15] and the dynamic context DC 1: P(MR_{t+1} = True | NFR_t, MST_t), a random value between 10% and 15% will be selected to reduce the current positive impact of the topology MST on the reliability of the system (MR_{t+1} = True).
• Length of the selected dynamic context. A range of values [lowerBound, upperBound], from which an integer number is randomly selected to specify the number of time slices during which the dynamic context is performed. The default range of values is [5, 15].
• Flag to update reward values R(s,a). An integer value (0 or 1) to determine whether the Weights Updater module of RE-STORM is executed when an NFR is detected below its threshold of satisfaction (i.e. below its SLA). For example, the behaviour for the dynamic context DC 1 reported as "without update of preferences" has been obtained by using the flag value 0. Conversely, the behaviour reported as "with update of preferences" has been obtained with the flag value 1.
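The parameters above can be pictured as a small configuration structure together with the perturbation they control. This is a hypothetical encoding: the field names and the clamping behaviour are our own illustration, not the artefact's actual file format.

```python
import random

# Hypothetical in-memory form of the dynamic-contexts configuration file.
config = {
    "active_context": 0,              # integer in 0..5 selecting DC 1..DC 6
    "noise_factor": 0.5,              # activation probability per time slice
    "deviation_range": (0.10, 0.15),  # reduction applied to the targeted probability
    "length_range": (5, 15),          # time slices the context lasts
    "update_rewards": 1,              # 1 = run the Weights Updater when below SLA
}

def perturb(p_true, cfg, rng):
    """Apply the configured deviation to a transition probability such as
    P(MR_{t+1} = True | NFR_t, MST_t), as DC 1 does for the MST topology."""
    if rng.random() < cfg["noise_factor"]:
        lo, hi = cfg["deviation_range"]
        return max(0.0, p_true - rng.uniform(lo, hi))
    return p_true

p = perturb(0.9, config, random.Random(7))
```

On each activated time slice the targeted probability drops by a value drawn from the deviation range; otherwise it is left unchanged.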

D APPENDIX: DETAILS ON THE RE-STORM IMPLEMENTATION SUPPORTED BY THE DESPOT ALGORITHM
In this appendix, a further explanation of the planning activity of RE-STORM, represented by Algorithms 1, 2 and 3, is presented. Additional details on the implementation of the online POMDP solver tool can be accessed at the DESPOT toolkit repository [6].
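Before the detailed walkthrough of the algorithms, the anytime build-explore-backup loop of Algorithm 1 can be sketched as follows. This is a structural sketch only: the node class, the toy explore/backup callbacks and all numbers are hypothetical stand-ins for DESPOT's actual data structures.

```python
import time

class RootNode:
    """Hypothetical stand-in for the DESPOT root with belief b0."""
    def __init__(self, action_upper, lower):
        self.action_upper = dict(action_upper)  # per-action upper bounds
        self.lower = lower                      # lower bound at the root

    @property
    def upper(self):
        return max(self.action_upper.values())  # root upper bound

    def best_action(self):
        # First action of the policy: the branch with the highest upper bound.
        return max(self.action_upper, key=self.action_upper.get)

def despot_search(root, epsilon0, time_budget, explore, backup):
    """Explore and back up until the root gap (upper - lower bound)
    reaches epsilon0 or the planning time budget is exhausted."""
    deadline = time.monotonic() + time_budget
    while root.upper - root.lower > epsilon0 and time.monotonic() < deadline:
        leaf = explore(root)   # Algorithm 2: descend to a promising leaf
        backup(root, leaf)     # Algorithm 3: update bounds up to the root
    return root.best_action()

# Toy callbacks that simply halve the root gap on each iteration.
root = RootNode({"MST": 1.0, "RT": 0.4}, lower=0.0)
action = despot_search(root, epsilon0=0.1, time_budget=1.0,
                       explore=lambda r: r,
                       backup=lambda r, leaf: setattr(
                           r, "lower", (r.lower + r.upper) / 2))
```

The time budget is what allows RE-STORM to bound decision time (e.g. the 1-second limit mentioned in the comparison with related work), trading off the remaining gap between the bounds.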
Algorithm 1 provides a high-level view of the process to build, search and choose an action from a DESPOT tree [54] (See Fig. 2b). Initially, the tree contains only a root node with the belief b_0 about the current satisfaction of the NFRs in the system, together with the initial upper and lower bounds μ(b_0) and ℓ(b_0) associated with b_0 (lines 4-5). The algorithm performs explorations using Algorithm 2 to expand the DESPOT tree and to reduce the gap ε(b_0) = μ(b_0) - ℓ(b_0) between the bounds at the root node b_0. Each exploration aims at choosing and expanding a promising leaf node (line 8) and adds its child nodes into the tree. For each new child b, initial bounds μ(b) and ℓ(b) are computed. The process continues until the current leaf node is no longer heuristically promising [54]. Then, the algorithm traces the path back to the root and performs backup using Algorithm 3, i.e. the upper and lower bound values μ(b) and ℓ(b) are updated at each tree node along the way to the root node (line 9). The updated values at the root node b_0 are used to compute a new ε(b_0). The explorations continue until the gap between the bounds, ε(b_0) = μ(b_0) - ℓ(b_0), reaches the target level ε_0 (where ε_0 >= 0) or the planning time finishes (line 7). Then, the system executes the first action of the policy π*(b_0), i.e. the action branch a* with the highest upper bound μ(b_0, a). In Algorithm 2, the exploration to expand the DESPOT tree starts at the root node b_0. At each node b along the exploration path, the best action branch a* (i.e.
the topologies MST or RT in the RDM SAS) is selected according to the upper bound μ(b, a) (line 10). Afterwards, the observation branch that leads to the child node b′ (reached via a* and observation z) maximizing the excess uncertainty E(b′) [32, 54] is selected (line 11). The excess uncertainty E(b′) measures the difference between the current gap at b′ and the expected gap at b′ if the target gap ε(b_0) at b_0 is satisfied. Each node in the tree is created by a simulation model provided by the DESPOT algorithm [54] as part of a strategy to compute an approximated optimal policy [54]. Initial upper and lower bounds μ(b) and ℓ(b) are created at each tree node. These bounds are influenced by the current reward values R(s,a) of the system. As stated in Definition 1, the reward values correspond to stakeholders' utility values associated with the state of the system (NFRs in RE-STORM) and the adaptation actions. Therefore, an update of preferences due to changes of utility values in Planning step 1 (See Section 5.2) can determine new bound values and thus a different adaptation action selected by the policy π(b_0). The exploration at a node b is terminated under the following conditions (line 6): first, when the maximum tree height is exceeded, and second, when the excess uncertainty at b indicates that the expected gap at b has already been reached, so that further exploration from b onwards may be unproductive. When the exploration terminates, Algorithm 3 is performed: the path back to the root node is traced and backup is carried out.
Next, an explanation of the planning activity of RE-STORM using an alternative POMDP solver, Perseus, is presented (Appendix E). Additional details on the implementation of this solver can be accessed at [77].
We have used the Perseus algorithm as an alternative for planning. Perseus is based on the Point-Based Value Iteration (PBVI) framework [50]. The algorithm is divided into two main parts:
1)Random Exploration of the belief space to collect the belief values for NFRs.
2) Update of the Value function using Bellman backup [56].
In point-based methods, the value function V is represented as a set of α-vectors. Hence, the value over a belief b is

V(b) = max_α b · α.    (9)

Algorithm 4 (Perseus: Random Exploration) repeatedly selects a random action, simulates it to obtain an observation, computes the resulting belief point via the belief update, and adds it to the set B of belief points, returning B once enough points have been collected. During this first step of random exploration, an action a is randomly selected and executed in offline mode to generate an observation z. Consequently, a new belief sample point is computed, based on the selected action a and observation z, using the belief update procedure [56]:

b′(s′) = O(s′, a, z) Σ_s T(s, a, s′) b(s) / P(z | b, a),

where P(z | b, a) = Σ_{s′} O(s′, a, z) Σ_s T(s, a, s′) b(s) is a normalizing factor.
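Equation (9) reduces to a dot product and a max over the α-vector set. A minimal sketch (the α-vectors below are invented for illustration):

```python
def value_of_belief(b, alpha_vectors):
    """Point-based value function: V(b) = max over alpha of b . alpha.
    Returns the value and the index of the maximizing alpha-vector,
    whose associated action would be executed."""
    vals = [sum(bi * ai for bi, ai in zip(b, alpha))
            for alpha in alpha_vectors]
    best = max(range(len(vals)), key=vals.__getitem__)
    return vals[best], best

# Two alpha-vectors over a two-state belief space.
v, idx = value_of_belief([0.7, 0.3], [[1.0, 0.0], [0.0, 1.0]])
```

In Perseus, the Bellman backup of the second part improves this α-vector set at the belief points collected during the random exploration.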

Fig. 1 .
Fig. 1. General POMDPs and RE-STORM. When using POMDPs in RE-STORM, the reward values R(s,a) at design time reflect the initial stakeholders' preference values for executing an adaptation action based on the satisfaction of NFRs. During execution, these initial preference values are re-assessed according to new evidence collected at runtime. The reassessment can prompt new perceived utility values by stakeholders, which better fit the newly found context (i.e. the stakeholders' preferences have changed). Based on the above, we present Definition 4 as follows: Definition 4. In RE-STORM, the reward values R(s,a) correspond to the utility value of arriving at the new state s ∈ S after executing an adaptation action a ∈ A. A SAS based on RE-STORM uses Bayesian inference to update the belief about the new state s′ ∈ S as evidence arrives [i.e. new observations z ∈ Z collected after executing adaptation actions a ∈ A]. Furthermore, based on Definition 2, the policy π, represented by the expression a = π(b), defines the adaptation action taken by a SAS based on the current belief about the satisfaction of its NFRs. The goal is to select the adaptation action that maximises the EV. The definitions to represent NFRs and their evolution over time, supported by the POMDP, are presented next. Specifically, it is shown how the transition function T(s, a, s′) and the observation function O(s′, a, z) of the POMDP have been extended to support the modelling and treatment of partially observable NFRs.
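A one-step reading of a = π(b) under Definition 4 can be sketched as picking the action with the highest belief-weighted reward. This is a myopic simplification (the full policy also accounts for future belief evolutions), and the reward numbers for the RDM topologies are hypothetical:

```python
def myopic_policy(b, R):
    """Choose the action a maximising sum_s b(s) * R(s, a)
    under the current belief b (one-step sketch of a = pi(b))."""
    def expected_reward(a):
        return sum(bs * r for bs, r in zip(b, R[a]))
    return max(R, key=expected_reward)

# Belief over two states and hypothetical rewards for the RDM topologies.
R = {"MST": [1.0, 0.0], "RT": [0.2, 0.9]}
chosen = myopic_policy([0.9, 0.1], R)
```

Because the choice depends on the belief, an updated belief (or updated rewards after a preference reassessment) can flip the selected topology without any change to the observable inputs.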
(a) CPT Minimization of Cost (MC) (b) CPT Max. of Reliability (MR) (c) CPT Max. of Performance (MP). Table 2. CPTs for the POMDP transition function.
4.4 NFRs and the POMDP transition function. Based on Bayes' theorem [71, 74], the transition function T(s, a, s′) is factored as a product of conditional distributions [75]. Hence, using Definition 5, we derived a transition function expressed in terms of the NFRs of the system. Applying this concept to the RDM SAS with the NFRs Minimization of Cost (MC), Maximization of Reliability (MR) and Maximization of Performance (MP) yields the factored transition function:

T(s, a, s′) = P(MC_{t+1} | NFR_t, a_t) P(MR_{t+1} | NFR_t, a_t) P(MP_{t+1} | NFR_t, a_t).

For the observation function O(s′, a, z), the MON variables Ranges of Bandwidth Consumption (RBC), Active Network Links (ANL) and Total Time for Writing (TTW) represent indirect observations of the state of the RDM SAS, i.e. of the NFRs Minimization of Cost (MC), Maximization of Reliability (MR) and Maximization of Performance (MP), respectively. The conditional probability tables (CPTs) modelling the observation function of the RDM SAS are shown in Table 3a.
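The factored form multiplies one CPT entry per NFR. A sketch with made-up CPT values (the real entries are those of Tables 2a, 2b and 2c):

```python
from itertools import product

# Hypothetical entries P(NFR_{t+1} = True | NFR_t, a_t) for one
# current state "s0" and the MST topology action.
CPTS = {
    "MC": {("s0", "MST"): 0.7},
    "MR": {("s0", "MST"): 0.9},
    "MP": {("s0", "MST"): 0.8},
}

def transition_prob(s, a, s_next):
    """Factored transition function:
    T(s, a, s') = P(MC'|s,a) * P(MR'|s,a) * P(MP'|s,a)."""
    p = 1.0
    for nfr, cpt in CPTS.items():
        p_true = cpt[(s, a)]
        p *= p_true if s_next[nfr] else 1.0 - p_true
    return p

# The factored probabilities over all 2^3 next states sum to one.
total = sum(transition_prob("s0", "MST", dict(zip(CPTS, combo)))
            for combo in product([True, False], repeat=3))
```

The factoring keeps the specification linear in the number of NFRs (one CPT each) even though the joint state space grows as 2^n.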
(a) CPTs for RBC, ANL and TTW (b) Reward values R(s,a) (a) RE-STORM runtime architecture for decision making (b) DESPOT Belief tree with 2 sampled trajectories represented by red and green lines.

6.1.2
Initial setup of experiments with the RDM SAS. The initial configurations used in the evaluation of the RDM SAS are explained next.
RDM SAS under dynamic contexts: Six different dynamic contexts have been defined. Each dynamic context represents variations of the stable conditions of the RDM SAS that trigger the need to reassess stakeholders' utility/preference values. The nature of the changes is presented next.
• Dynamic Context DC 1. Changes in the environment during the execution of the MST topology are introduced to reduce the reliability of the system. These changes are implemented by altering the following conditional probability in the transition function T(s, a, s′): P(MR_{t+1} = True | NFR_t, MST_t).

Fig. 3 .
Fig. 3. RDM SAS: behaviour under different environmental conditions. The average belief about the satisfaction of Maximization of Reliability (MR) improves as a result of the better-informed decision-making provided by RE-STORM. It is also observed that the trade-off among NFRs slightly reduces the beliefs about the satisfaction of Minimization of Cost (MC) and Maximization of Performance (MP) in comparison to the stable conditions of the system (See Fig. 3b: beliefs about NFRs satisfaction with update of preferences). Nevertheless, MC and MP are still over their satisfaction thresholds. The behaviour above corresponds to Planning step 1 of the online planning activity of RE-STORM (See Section 5.2).

Fig. 9 .
Fig. 9. RDM SAS: behaviour under dynamic context DC 2. Both DC 2 and DC 4 represent situations where, despite dynamic changes in the environment, the current RDM configuration and preferences still keep the beliefs about the satisfaction of the NFRs over their thresholds (See Figs. 9 and 10: behaviour under dynamic contexts DC 2 and DC 4 without update of preferences).

Fig. 13 .
Fig. 13. RDM SAS: behaviour under dynamic context DC 6.

(2) SLAs example 02: Different from the previous example, an even higher threshold of satisfaction for the reliability of the system is required. The newly established SLAs are: P(MC = True) >= 0.80, P(MR = True) >= 0.99 and P(MP = True) >= 0.85. Fig. 16 (beliefs about NFRs satisfaction without update of preferences) shows the behaviour of the RDM SAS before the update of the stakeholders' preferences.

Fig. 16. Beliefs about NFRs satisfaction: SLAs example 02.
Fig. 17. Initial setup of RE-STORM: inputs and output.
• Transition function. A set of real values in [0,1] representing the probability distributions of the transition function T(s, a, s′) = P(s′ | s, a) in a POMDP. Details on their specification in terms of the NFRs of a SAS have been presented in Section 3.1. The specific values assigned to this parameter for the RDM SAS have been presented in Tables 2a, 2b and 2c.
• Observation function. A set of real values in [0,1] representing the probability distributions of the observation function O(s′, a, z) = P(z | s′, a) in a POMDP. Details on their specification in terms of the MON variables of a SAS have been presented in Section 3.2. The specific values assigned to this parameter for the RDM SAS have been presented in Table 3a.
• Reward values R(s,a). A set of real values in [0,1] representing the initial preferences of the system's stakeholders, i.e. the reward R(s,a) obtained after taking action a ∈ A at time t to arrive at the new state s ∈ S at time t+1. Details on their specification have been presented in Section 4.6. The values assigned to this parameter for the RDM SAS have been presented in Table 3b.
• Thresholds for the levels of satisfaction of NFRs. A set of real values in [0,1]. The thresholds of satisfaction represent the Service Level Agreements (SLAs) to be monitored during the system's execution; RE-STORM uses them to trigger the need to update the reward values R(s,a) in the POMDP. The default values used for the NFRs Minimization of Cost (MC), Maximization of Reliability (MR) and Maximization of Performance (MP) are [0.7, 0.9, 0.75]. Details on their specification have been presented in Section 4.3.
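The threshold parameter drives a simple runtime check: when any belief falls below its SLA, the Weights Updater is triggered. A sketch using the default thresholds quoted above:

```python
# Default SLA thresholds for the RDM SAS quoted in the text.
SLAS = {"MC": 0.70, "MR": 0.90, "MP": 0.75}

def nfrs_below_sla(beliefs, slas=SLAS):
    """Return the NFRs whose belief of satisfaction is below its SLA;
    a non-empty result would trigger the update of R(s,a)."""
    return [nfr for nfr, b in beliefs.items() if b < slas[nfr]]

violations = nfrs_below_sla({"MC": 0.80, "MR": 0.85, "MP": 0.80})
```

Here only MR falls below its 0.90 threshold, so a preference reassessment would be requested for reliability while cost and performance are left untouched.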

Algorithm 3
Algorithm to perform backup on the bounds of each node b using Bellman's principle
1: Parameter(s):
2: - A DESPOT tree D and the current belief b
3: Runtime execution:
4: for each node x on the path from b to the root of D do
5:     Perform backup on μ(x) and ℓ(x)
6: end for
The backup applies Bellman's principle of optimality to recompute the bounds μ(x) and ℓ(x) at each node of the tree [54].
ACM Trans. Autonom. Adapt. Syst.
E APPENDIX: DETAILS ON THE RE-STORM IMPLEMENTATION SUPPORTED BY THE PERSEUS ALGORITHM

Table 3 .
CPTs for POMDP observation function and Reward values R(s,a)

Table 5 .
DC 1 -NFRs Statistics and Independent Samples t-tests for equality of means

Table 6 .
Comparison of RE-STORM to other approaches