Abstract
We propose
1 INTRODUCTION
Most Cyber Security Operations Centers (CSOCs) are flooded by alerts. For instance, speaking about the Sony breach in 2015, Reference [13] says that “while the tools were able to identify the malicious activity, those alerts were lost in a sea of 40,000 other alerts that same month.” Other sources state that “the security operations center (SOC) is drowning in cybersecurity alerts”; they go on to state that banks see over 100,000 alerts per day.
We propose Probabilistic Cyber-Alert Management (PCAM).
Fig. 1. PCAM architecture.
The organization of the article and the main contributions are as follows. Section 2 explains the differences between past work and ours. Section 3 describes the data used by PCAM.
2 RELATED WORK
In this section, we review related work on scheduling security personnel for the manual analysis of security alarms.
The idea that managed security services need to optimize the allocation of analysts was studied in Reference [11]; that paper developed a combination of game theory and probabilistic temporal logic to reason about scheduling analysts in the presence of an adversary who takes advantage of the fact that many security alerts might be false. The problem was addressed by Dunstatter et al. [4, 5] from a Markov games perspective using deep reinforcement learning. Wang et al. [19] presented a survey of game-theoretic methods that have been applied to improve cyber security. These works focus on a game-theoretic setting with an adversary, which is not the focus of this article.
Ganesan et al. [8, 9] introduced the CSOC workforce optimization problem and used a reinforcement learning method to allocate the incoming alerts to security staff. This was extended in Reference [18] to a setting with distributed CSOCs. Their work does not address what happens before a shift.
Shah et al. [17] proposed an optimization framework to allocate cyber alert detection sensors to alert analysts. This was extended in Reference [16], which focused on fairness issues in the sensor-analyst allocation problem. Okimoto et al. [15] proposed a system to optimize cyber-security systems with respect to multiple criteria, e.g., risk (security), surveillance (privacy), and cost. Altner et al. [1] investigated optimizing staffing and shift scheduling decisions given unknown demand on weekly shifts with an on-call mechanism. In two less related (as they do not work on analyst scheduling) but still relevant works, Franklin et al. [7] proposed a visualization tool to help cyber analysts understand the analytic process and structure their analytic thinking, and Larriva-Novo et al. [12] proposed a dynamic risk management system capable of reacting to rapid changes in the context of an organization with various types of sensors.
In contrast to all of these efforts that do not rely on a mechanism to separate real from false alerts,
3 PCAM Data
Figure 1 shows the proposed architecture of the PCAM system.
(1) Alert data statistics. We obtained 44 days of real alert data. These data contained 8,119,858 alerts, an average of 180,441 alerts per day. From this entire set of alerts, according to Dartmouth’s security analysts, there were just 21,916 true alerts, i.e., about 0.27% of the alerts were real, while about 99.73% were false alarms. This constitutes an average of 498 true alerts per day, lost in a sea of the 180,441 alerts generated every day. We obtained statistics about the distribution of these alerts (both all alerts and true alerts) within 10-minute windows during the course of any given day. In other words, we broke the day down into 10-minute windows, obtained statistics (mean and standard deviation of the number of alerts per window, and mean and standard deviation of the number of true alerts per window), and used these in Section 4 to generate data-driven schedules for analysts for the next shift so that the shift schedules satisfy various workplace requirements. Figure 2(a) and (b) show the temporal distribution of the number of true alerts during weekdays and weekends. The results show that the distribution of true alerts is very different on weekdays and weekends. During weekdays, the true alerts largely occur during the 12 noon–to–3 pm window, but during weekends, we see three spikes: during the 3 am–to–4 am window, during the 12 noon–to–3 pm window (as was the case during weekdays), and during the 6 pm–to–9 pm window. All times are U.S. East Coast times.
Fig. 2. Temporal distribution of true alerts for weekdays and weekends. The \( x \)-axis represents the time and the \( y \)-axis represents the number of true alerts per hour. The shadowed areas represent the standard deviation.
(2) Raw Alerts. In addition, once workers are on a shift, new alerts keep coming in from the security products used by the enterprise. For this case, we developed a “True Alert Predictor” to predict which of the alerts arriving in real time were true and which were false. We were able to do this with high accuracy.
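The 10-minute window statistics described in item (1) can be computed directly from timestamped historical alerts. The following is a minimal sketch, assuming alerts are given as Python `datetime` timestamps; the function name is illustrative and not part of PCAM.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean, pstdev

def window_stats(alert_times, window_minutes=10):
    """Compute per-window mean and standard deviation of alert counts.

    alert_times: list of datetime objects, one per (true) alert,
    spanning several days of historical data.
    Returns {window_index: (mean, stdev)} where window_index is the
    slot within a day (0..143 for 10-minute windows).
    """
    # Count alerts per (day, window) pair.
    counts = defaultdict(int)
    days = set()
    for t in alert_times:
        days.add(t.date())
        window = (t.hour * 60 + t.minute) // window_minutes
        counts[(t.date(), window)] += 1

    windows_per_day = 24 * 60 // window_minutes
    stats = {}
    for w in range(windows_per_day):
        # Days with no alerts in this window contribute a count of 0.
        per_day = [counts.get((d, w), 0) for d in days]
        stats[w] = (mean(per_day), pstdev(per_day))
    return stats
```

The per-window means computed this way are exactly the \(TA_j\) estimates used by the scheduler in Section 4.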
As a consequence, our
4 PCAM Analyst Shift Scheduling as a Nonlinear Bi-Level Optimization
In this section, we describe the problem of scheduling an analyst’s shift for Maximal True Alert Coverage (MTAC). Past work [1] suggests that most shifts are either 8-hour shifts or 12-hour shifts. Because our
We assume a shift is divided up into contiguous time slices \(t_1,\ldots ,t_n\) where each time slice \(t_j\) represents a user-selected interval of time that is sufficiently small for their purposes. In our work, we chose the time slices to be 10 minutes long, and hence a shift of 12 hours corresponds to 72 time slices, i.e., \(n=72\). We assume that the company employs \(m\) analysts \(a_1,\ldots ,a_m\). Each analyst is a junior analyst, a senior analyst, or a principal analyst, and their alert-processing capabilities follow statistics reproduced from the study in Reference [1]. We assume that analyst \(a_i\) is capable of analyzing \(c_i\) alerts per time slice in accordance with these statistics. In addition, we use the following notation:
\(MT\) is a constant integer that denotes the maximum number of time slices that an analyst may work during a shift.
\(CT\) is a constant integer that denotes the maximum number of contiguous time slices that an analyst may work before taking a break for one time slice.
\(L\) is a constant integer that says that each analyst must have a lunch break consisting of at least \(L\) contiguous time slices. (We recognize that the “lunch break” in the night shift is a break for a meal rather than for lunch, but we will abuse notation and call it a lunch break.)
\(LS,LE\) where \(LS\le LE\) are constant integers that bound the period when each analyst gets their lunch break. For instance, during the day shift, we may set \(LS\) to 37 (i.e., 11:00–11:10 am) and \(LE\) to 51 (i.e., 2:20–2:30 pm), and this imposes the constraint that every analyst gets their lunch break during the time interval \([LS,LE]\). In particular, we require that each analyst’s lunch break be fully contained within the \([LS,LE]\) interval.
We use \(A_j\) and \(TA_j\) to denote the expected number of alerts (respectively, true alerts) that arrive during time interval \(t_j\). These numbers can be estimated from historical statistics and computed as expected values.
The problem of scheduling shifts is now the problem of allocating analysts to shifts so that the expected number of “uncovered true alerts” is minimized. We first formalize this as a bi-level mixed integer nonlinear optimization problem as shown in Figure 3.
Fig. 3. Nonlinear Optimization problem for Shift Scheduling.
4.1 Constraints
We now explain the constraints in the mixed integer nonlinear optimization formulation provided in Figure 3.
Our formulation uses binary-valued variables \(v_{i,j}\). The idea is that \(v_{i,j}\) should be set to 1 if analyst \(a_i\) is assigned to work during time slice \(t_j\) and 0 otherwise, i.e., \[\begin{equation*} v_{i,j} = {\left\lbrace \begin{array}{ll}1 & \text{if analyst}\;\; a_i\;\; {\text{works during time slice}}\;\; {t_j}\\ 0 & \text{otherwise}\end{array}\right.}. \end{equation*}\]
Constraint (1). Constraint (1) of Figure 3 sets one constraint for each analyst saying that that analyst cannot work for more than \(MT\) time slices during any given shift.
Constraint (2). The goal of the second type of constraint is to ensure that each analyst gets a break periodically. More formally, every window of \(CT+1\) consecutive time slices is required to have at least one break in it; this guarantees that no analyst works more than \(CT\) contiguous slices. For this, suppose we set \(BR_h=[h,CT+h]\) to be such a window starting at time slice \(t_h\). Hence, we have time intervals such as the following: \[\begin{eqnarray*} BR_1 & = & [1,CT+1]\\ BR_2 & = & [2,CT+2]\\ & \ldots & \\ BR_{n-CT} & = & [n-CT,n]. \end{eqnarray*}\] For each such window \(BR_h=[h,CT+h]\) where \(1\le h\le n-CT\), a break is needed for each analyst. Thus, for each \(1\le i\le m\) and each \(1\le h\le n-CT\), we require that \(\Sigma _{j=h}^{j=CT+h} v_{i,j} \le CT\). Because each window \(BR_h\) contains \(CT+1\) time slices, this constraint ensures that at least one of the \(v_{i,j}\)’s is set to 0; that would then be a time slice in which analyst \(a_i\) gets a break.
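For concreteness, the periodic-break requirement (an analyst never works more than \(CT\) consecutive time slices) can be checked on a candidate schedule row with a sliding window. This is an illustrative sketch (the function name and 0-indexed slices are our assumptions), not part of the MILP itself.

```python
def satisfies_break_constraint(v_row, CT):
    """Check Constraint (2) for a single analyst.

    v_row: list of 0/1 values; v_row[j] = 1 if the analyst works slice j.
    Returns True iff every window of CT+1 consecutive slices contains
    at least one break, i.e., the analyst never works more than CT
    contiguous time slices.
    """
    n = len(v_row)
    for h in range(n - CT):
        if sum(v_row[h:h + CT + 1]) > CT:  # no break slice in this window
            return False
    return True
```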
Constraints (3) and (4). The third constraint requires that every analyst gets a lunch break. For each analyst \(a_i\), suppose \(LT^i_h = [h,L+h]\). Intuitively, each \(LT^i_h\) interval corresponds to a possible lunch period for analyst \(a_i\) that starts at time \(t_h\) and goes until time \(t_{L+h}\). We require this entire time to be a break for the analyst, i.e., the value of the variable \(v_{i,k}\) should be 0 for each \(k\in LT^i_h\) if this interval happens to be the analyst’s lunch break. To express this idea, we define an intermediate variable \(u_{i,h}\) that is set to 1 if analyst \(a_i\)’s lunch break starts at time \(t_h\). Constraint (3) is therefore as follows: \[\begin{eqnarray*} \displaystyle u_{i,h} & = &\displaystyle \mathop {\prod }\limits _{k=h}^{k=L+h} \left(1-v_{i,k}\right)\!. \end{eqnarray*}\] If this is indeed analyst \(a_i\)’s lunch break, then \(v_{i,k}=0\) for all \(h\le k\le L+h\). This means that \((1-v_{i,k})=1\) for all \(h\le k\le L+h\), which in turn means that \(\Pi _{k=h}^{k=L+h} \left(1-v_{i,k}\right)\) equals 1 if the interval \(LT^i_h=[h,L+h]\) is indeed the analyst’s lunch break.
Because the analyst gets one and only one lunch break, Constraint (4) says that one and only one of the possible \(u_{i,h}\)’s must be set to 1. Thus, Constraints (3) and (4) jointly enforce the fact that each analyst gets a contiguous lunch break of length \(L\) during his shift. But this comes at a cost, because Constraint (3) is nonlinear.
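The product form of Constraint (3) and the exactly-one requirement of Constraint (4) can be evaluated directly on a candidate row of \(v\) values. This sketch assumes a lunch window of \(L+1\) consecutive slices \([h, L+h]\), mirroring the \(LT^i_h\) notation above; the function names are illustrative.

```python
def lunch_indicators(v_row, L):
    """Compute the u_h values of Constraint (3) for one analyst.

    u[h] = prod_{k=h}^{h+L} (1 - v_row[k]) is 1 exactly when slices
    h..h+L (a window of L+1 slices) are all breaks.
    """
    n = len(v_row)
    u = {}
    for h in range(n - L):
        prod = 1
        for k in range(h, h + L + 1):
            prod *= 1 - v_row[k]
        u[h] = prod
    return u

def has_unique_lunch(v_row, L):
    """Constraint (4): exactly one window qualifies as the lunch break."""
    return sum(lunch_indicators(v_row, L).values()) == 1
```

Note that a break strictly longer than \(L+1\) slices would make several windows qualify, which is why Constraint (4) forces exactly one \(u_{i,h}\) to equal 1.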
Constraints (5) and (6). These two constraints ensure that the variables \(v_{i,j}\) and \(u_{i,h}\) are binary variables.
4.2 Objective Function
The formulation of the MTAC problem assumes that we have an estimate of the expected number \(TA_j\) of true alerts that occur during any given time period \(t_j, 1\le j\le n\) — we can see that we do have this information from our Dartmouth College data as shown in Figure 2(a) and (b).
Figure 2(a) and (b) show that the distribution of true alerts during weekdays is dominated (at least at Dartmouth College) by the 12 pm–to–3 pm window. In contrast, there are multiple peaks during the weekends, with attacks in the 3 am–to–4 am window, the 12 pm–to–3 pm window, and the 6 pm–to–9 pm window. Though these statistics only apply to the true alerts at Dartmouth College, most medium and large enterprises can generate similar distributions.
We note that we can estimate \(TA_j\) from historical data to simply be the mean of the set \(\lbrace TA_j(d) \mid d\in TD\rbrace\), where \(TD\) is the set of days in the historical data and \(TA_j(d)\) is the number of true alerts generated during time slot \(t_j\) on day \(d\). For example, if \(j\) is the 10-minute time slot between 10 am and 10:10 am, then the mean (respectively, standard deviation) is simply the mean (respectively, standard deviation) of the number of true alerts generated during this time window in the historical data.
The term \(v_{i,j}\cdot c_i\) is the number of alerts that analyst \(a_i\) can handle (on average) in one time slice. Note that when \(v_{i,j}=0\), i.e., when the analyst is on a break, he or she can handle no alerts at all. Thus, the summation \(\Sigma _{i=1}^m v_{i,j}\cdot c_i\) is the total number of alerts that the \(m\) analysts can handle in one time slice \(t_j\), and hence \(TA_j - \Sigma _{i=1}^m v_{i,j}\cdot c_i\) is an estimate of the total number of alerts that would be left “uncovered” in that one time slice \(t_j\). By uncovered, we mean that this is the number of alerts that are not examined/handled by any of the analysts during this time slice. Because this number could be negative, we take the max of it and 0 in the objective function, i.e., the number of uncovered alerts in time slice \(t_j\) is \(\mbox{max}(0, TA_j - \Sigma _{i=1}^m v_{i,j}\cdot c_i)\). Finally, we sum up the number of uncovered alerts over time to get the total number of uncovered alerts during a given shift, i.e., the total number of uncovered alerts during the entire shift consisting of the time slices \(t_1,\ldots ,t_n\) is \[ \displaystyle \sum \limits _{j=1}^n \left(\mbox{max}\left\lbrace 0,TA_j - \displaystyle \sum \limits _{i=1}^m v_{i,j}\cdot c_i \right\rbrace \right). \] The goal of our objective function is therefore to minimize the number of uncovered alerts across the shift as a whole.
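The objective above can be evaluated on a candidate schedule in a few lines; this is an illustrative sketch (names are our assumptions), useful for checking solver output.

```python
def uncovered_true_alerts(TA, v, c):
    """Objective of Figure 3: expected number of uncovered true alerts.

    TA: TA[j] = expected true alerts in slice j (length n).
    v:  v[i][j] = 1 if analyst i works slice j (m x n matrix of 0/1).
    c:  c[i] = alerts analyst i can handle per slice (length m).
    """
    m, n = len(v), len(TA)
    total = 0
    for j in range(n):
        capacity = sum(v[i][j] * c[i] for i in range(m))
        total += max(0, TA[j] - capacity)  # deficit cannot go negative
    return total
```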
Discussion: We see immediately that the objective function essentially has two levels, where the outer-level optimization minimizes the total number of uncovered true alerts and the inner-level optimization ensures that the number of uncovered true alerts cannot be “negative.”
5 EFFICIENT SOLUTION AS A MIXED INTEGER LINEAR PROGRAM
Because the problem of computing shifts involves optimizing an integer program that is bi-level and nonlinear, it is not directly solvable using existing commercial optimization solvers. In the rest of this section, we show how this problem can be neatly encoded as a single-level mixed integer linear optimization problem.
5.1 Non-Linear Constraints Transformation
The formulation of our problem of maximizing the expected number of true alerts that are covered is nonlinear, which suggests that it can be very expensive to solve. In this section, we show that we can represent it in a linear form.
To linearize the nonlinear constraints in Equation (3), we replace each of the nonlinear constraints with two linear constraints: (7) \[\begin{eqnarray} u_{i,h} & \le & 1-\displaystyle \sum \limits _{k=h}^{L+h}v_{i,k}/M \quad \forall a_i,\forall h=1,\ldots ,n-L, \end{eqnarray}\] (8) \[\begin{eqnarray} u_{i,h} &\ge &1-\displaystyle \sum \limits _{k=h}^{L+h}v_{i,k}\qquad \ \ \forall a_i,\forall h=1,\ldots ,n-L. \end{eqnarray}\] Here \(M\gt L+1\) is any constant that is larger than the number of time slices in a lunch window.
The following lemma shows that the linearized constraints preserve the set of solutions.
Lemma 5.1. In the formalized MTAC problem, the constraints expressed in Equations (5)–(8) have the exact same set of solutions as Constraints (3), (5), and (6).
Proof. Since Equations (5) and (6) require both the \(v_{i,j}\)’s and the \(u_{i,h}\)’s to be binary variables, we only need to prove the equivalence of Equations (7) and (8) with Equation (3).
We now prove that for each analyst \(a_i\) and each \(h=1,\ldots ,n-L\), the same values of \(v_{i,k}, k=h,\ldots ,L+h\) will yield the same values of \(u_{i,h}\) in both cases. For ease of reference, we denote the \(u_{i,h}\)’s in Equation (3) and in Equations (7) and (8) as \(u_{i,h}^I\) and \(u_{i,h}^{II}\), respectively. We split the discussion into the following two cases.
Case 1: \(v_{i,k}=0, \forall k=h,\ldots ,L+h\). In this case, it is easy to see that \(u_{i,h}^I=\Pi _{k=h}^{k=L+h} (1-v_{i,k})=1\). At the same time, we have \[ u_{i,h}^{II}\le 1-\displaystyle \sum \limits _{k=h}^{L+h}v_{i,k}/M=1, \] \[ u_{i,h}^{II}\ge 1-\displaystyle \sum \limits _{k=h}^{L+h}v_{i,k}=1, \] which implies that \(u_{i,h}^{II}=1\). Thus, \(u_{i,h}^I=u_{i,h}^{II}\).
Case 2: \(\exists k\in \lbrace h,\ldots ,L+h\rbrace\) with \(v_{i,k}=1\). Denote the sum \(\Sigma _{k=h}^{L+h}v_{i,k}\) as \(\sigma\). In this case, \(\sigma\) is an integer value between 1 and \(L+1\): \(1\le \sigma \le L+1\). For \(u_{i,h}^I\), we have \(u_{i,h}^I=\Pi _{k=h}^{k=L+h} (1-v_{i,k})=0\). For \(u_{i,h}^{II}\), we have \[ u_{i,h}^{II}\le 1-\displaystyle \sum \limits _{k=h}^{L+h}v_{i,k}/M=1-\displaystyle \sigma /M, \] \[ u_{i,h}^{II}\ge 1-\displaystyle \sum \limits _{k=h}^{L+h}v_{i,k}=1-\sigma . \] Because \(1\le \sigma \le L+1 \lt M\), we have \(0\lt 1-\sigma /M \lt 1\). So \(1-\sigma \le u_{i,h}^{II} \lt 1\). Combining this with the constraint that the \(u_{i,h}\)’s are binary variables, we have \(u_{i,h}^{II} = 0\). Thus, \(u_{i,h}^I=u_{i,h}^{II}\) still holds.□
This result therefore shows a way of linearizing the constraints while preserving the same set of solutions as the solutions of the initial nonlinear integer program.
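The lemma can also be sanity-checked by brute force: for every binary assignment of the \(v\) values in a window, the big-M pair in Equations (7) and (8) should admit exactly the value \(u=\Pi_k(1-v_k)\). The following sketch (with an illustrative function name) enumerates all cases, and also shows that the constraints break down if \(M\) is chosen too small.

```python
from itertools import product

def check_linearization(window_len, M):
    """Verify that, for binary u and binary v's, the pair of constraints
        u <= 1 - sum(v)/M   and   u >= 1 - sum(v)
    admits exactly the value u = prod(1 - v_k), for every assignment of
    the v's in a window of `window_len` slices.
    """
    for vs in product([0, 1], repeat=window_len):
        sigma = sum(vs)
        prod_val = 1 if sigma == 0 else 0  # value of prod(1 - v_k)
        feasible = [u for u in (0, 1)
                    if u <= 1 - sigma / M and u >= 1 - sigma]
        if feasible != [prod_val]:
            return False
    return True
```

With `M` larger than the window length the feasible set is always the singleton `{prod_val}`, matching the case analysis in the proof; with `M` too small, the constraint `u <= 1 - sigma/M` can become infeasible.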
5.2 Bi-Level Objective Function Transformation
As pointed out above, the bi-level objective function is also a source of high computational complexity of the MTAC problem. To handle this problem, we introduce an auxiliary variable \(w_j\), where for each time slice \(t_j\) we require that \(w_j \ge TA_j - \Sigma _{i=1}^m v_{i,j}\cdot c_i\) and \(w_j \ge 0\). We now show that the optima are preserved as well.
Lemma 5.2. Minimizing the objective in the optimization problem of Figure 3 is equivalent to the following optimization problem: (18) \[\begin{eqnarray} \text{minimize} & \displaystyle \sum _{j=1}^n w_j, \end{eqnarray}\] (19) \[\begin{eqnarray} \text{subject to} & \quad w_j \ge TA_j - \displaystyle \sum \limits _{i=1}^m v_{i,j}\cdot c_i \qquad \forall t_j, \end{eqnarray}\] (20) \[\begin{eqnarray} &w_j \ge 0 \qquad \forall t_j. \end{eqnarray}\]
Proof. Let \(\mathbf {v}_j=\langle v_{i,j}\rangle , i=1,\ldots ,m\) denote the decision variables (whether they are working or not) for all the analysts at time slice \(t_j\). For each time slice \(t_j\), we define \(f_j(\mathbf {v}_j):= \max \lbrace 0,TA_j-\sum _{i=1}^mv_{i,j}\cdot c_{i}\rbrace\). The objective can be represented as \(f(\mathbf {v}_1,\ldots ,\mathbf {v}_n)=\sum _{j=1}^n f_j(\mathbf {v}_j)\).
Because for each time slice \(t_j\) we have \(w_j\ge TA_j - \Sigma _{i=1}^m v_{i,j}\cdot c_i\) and \(w_j\ge 0\), we thus have \(w_j\ge f_j(\mathbf {v}_j)\). Let \(w=\sum _{j=1}^n w_j\); we then have \(w\ge f(\mathbf {v}_1,\ldots ,\mathbf {v}_n)\). Therefore, we can define the epigraph of the function \(f(\mathbf {v}_1,\ldots ,\mathbf {v}_n)\) as \[ \text{epi} f = \lbrace (\mathbf {v},w)|\mathbf {v}\in \text{dom} f, f(\mathbf {v})\le w\rbrace , \] where \(\mathbf {v}=\langle \mathbf {v}_j \rangle , j=1,\ldots ,n\) is the decision variable for all the analysts throughout all the time slices. It is easy to see that the optimization described in Equations (18)–(20) is the epigraph form of minimizing the objective in Figure 3. According to Reference [2], the epigraph form of an optimization problem is equivalent to the original problem.□
Combining Lemmas 5.1–5.2, we immediately have the following theorem.
The bi-level nonlinear formulation of the MTAC problem described in Figure 3 is equivalent to the single-level Mixed Integer Linear Program (MILP) in Figure 4.
Fig. 4. MILP formulation for Shift Scheduling.
Important Note. The above theorem is very important, because it reduces the problem of shift scheduling to maximize expected true alert coverage (or alternatively minimize uncovered true alerts) to a single-level MILP problem instead of a bi-level mixed integer nonlinear programming problem. Though MILPs are in general NP-hard to solve, they have been well studied, and there are existing commercial optimization solvers (e.g., Gurobi and GLPK) that can be used to efficiently solve these types of optimization problems. For instance, in our experiments with a team of 6 junior, 8 senior, and 8 principal analysts, the problem can be solved within 0.4943 seconds (averaged over 50 runs, using Gurobi).
Note on Identifying the Optimal Mix of Personnel. Given a (biweekly) budget \(B\) for the day/night shift, a CISO (or similar senior leader) of an organization needs to identify the optimal mix of people to hire. This can be easily done. Suppose \(Sal_j\), \(Sal_s\), and \(Sal_p\) denote the biweekly salaries of a junior, senior, and principal analyst, respectively. Then the organization can hire at most \(n_j=\lfloor B/Sal_j\rfloor\) junior, \(n_s=\lfloor B/Sal_s\rfloor\) senior, and \(n_p=\lfloor B/Sal_p\rfloor\) principal analysts. We can then identify the optimal mix of junior, senior, and principal analysts to hire within the budget via a simple procedure. A staffing triple is a triple of the form \((K_j,K_s,K_p)\) where \(1\le K_j\le n_j\), \(1\le K_s\le n_s\), \(1\le K_p\le n_p\), and the total salary \(K_j\cdot Sal_j + K_s\cdot Sal_s + K_p\cdot Sal_p\) does not exceed \(B\). Given a budget \(B\), let \(ST(B)\) denote the set of all such staffing triples. For each staffing triple \((K_j,K_s,K_p)\in ST(B)\), let \(MTA_{K_j,K_s,K_p}\) be the number of missed true alerts computed by running the Mixed Integer Linear Program shown in Figure 4. The optimal staffing is then \[\begin{eqnarray*} (J^*,S^*,P^*) = \text{arg\,min}_{(K_j,K_s,K_p)\in ST(B)} MTA_{K_j,K_s,K_p}. \end{eqnarray*}\]
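The staffing enumeration above can be sketched as follows. In PCAM, the value \(MTA_{K_j,K_s,K_p}\) would come from solving the MILP of Figure 4; here it is passed in as a callable, and the stub used in the usage example below is purely hypothetical.

```python
from itertools import product

def optimal_staffing(B, salaries, max_counts, missed_true_alerts):
    """Enumerate staffing triples (K_j, K_s, K_p) within budget B and
    return the one minimizing the number of missed true alerts.

    salaries:   (Sal_j, Sal_s, Sal_p) biweekly salaries.
    max_counts: (n_j, n_s, n_p) per-type upper bounds, e.g., floor(B/Sal).
    missed_true_alerts: callable (K_j, K_s, K_p) -> missed true alerts;
    in PCAM this would solve the MILP of Figure 4.
    """
    best, best_val = None, float("inf")
    for triple in product(*(range(1, n + 1) for n in max_counts)):
        cost = sum(k * s for k, s in zip(triple, salaries))
        if cost > B:
            continue  # this mix is not affordable
        val = missed_true_alerts(*triple)
        if val < best_val:
            best, best_val = triple, val
    return best, best_val
```

Usage with a toy stub in which more (and more senior) analysts always reduce missed alerts:

```python
def stub(kj, ks, kp):  # hypothetical stand-in for the MILP solve
    return 100 - (kj + 2 * ks + 3 * kp)
```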
6 IMPROVING ROBUSTNESS OF SCHEDULING
In the above, we use the average number of true alerts to derive a schedule for the analysts. However, the actual number of true alerts might differ from the averages due to (1) fluctuations/randomness or (2) the existence of an adversary. To handle these two cases and therefore improve the robustness of PCAM, we generate schedules against adversarial samples of the true alert distribution.
More specifically, an “adversarial sample” is a given true alert distribution \(TA_1^{(l)},\ldots ,TA_{n}^{(l)}\) denoted by a superscript \(l\). The adversarial true alert distribution can differ from the averages of the distribution from which the probabilities used by PCAM are derived.
7 EXPERIMENTAL EVALUATION
We now describe the results of our experiments. We only show results of the PCAM shift scheduler here; the PCAM alert classifier is described in Appendix A.
7.1 PCAM Analyst Shift Scheduler
We split 24-hour periods into a day shift (7 am to 7 pm) and a night shift (7 pm to 7 am). We consider time slices of 10 minutes; hence, our shifts have 72 time slices in them.
The average number of historical true alerts during any given day (respectively, night) time slice is used to estimate the number of true alerts during the same time slice in the next day (respectively, night) shift.
7.1.1 Settings.
As mentioned earlier, our goal is to come up with a shift schedule that satisfies various workplace constraints while minimizing the number of uncovered true alerts. As the number of uncovered true alerts obviously depends on the number of analysts at each level (junior, senior, and principal), which in turn depends on the organization’s budget, we vary these numbers in the range \(\lbrace 2,3,\ldots ,9\rbrace .\) Note that these values are used to evaluate how our method works under different settings with different numbers of analysts. Our main formulation is still the bi-level optimization introduced in Figure 3 with a fixed number of analysts of each type. Therefore, we do not include the choice of the number of analysts as part of the optimization itself.
For each combination tuple of junior, senior, and principal analysts, we use existing optimization solvers such as Gurobi and GLPK to solve the MILP in Figure 4, minimizing the number of uncovered true alerts given the analysts’ pay and capabilities as described in Table 1. It is important to note that
Baseline: We introduce a baseline that maximizes the amount of time worked by each analyst, subject to the workplace constraints. Specifically, we use the result of the optimization problem posed in Figure 5 as our baseline with which to compare.
Fig. 5. Baseline shift scheduling.
7.1.2 PCAM Shift Scheduling Results.
Figure 6 compares the PCAM-optimized schedule and the baseline for day shifts and night shifts, respectively. The \(x\)-axis shows the budget. Given a budget, PCAM finds the optimal schedule (i.e., mix of junior, senior, and principal analysts) to hire, and we compare the number of uncovered true alerts during the day and night, respectively. For the day shift, the optimal mix found by PCAM to get the number of uncovered true alerts to 0 consists of 6 junior, 8 senior, and 8 principal analysts at a cost of $98K for a 2-week period. In contrast, for the night shift, PCAM finds that the best combination consists of 3 junior, 7 senior, and 3 principal analysts at a cost of $55K for a biweekly period. The reason fewer analysts are needed during the night (at least for the Dartmouth College setting) is, intuitively, that there are fewer true alerts during the night on weekdays.
Fig. 6. Number of uncovered true alerts of the optimized schedule and the baseline schedule for day and night shifts. The \( x \)-axes show the biweekly budget. The \( y \)-axes show the number of uncovered true alerts. The vertical orange line marks the budget at which the number of uncovered true alerts for the PCAM scheduler reaches 0.
The figures show that the PCAM-optimized schedule beats the baseline by a hefty margin. It is better in two respects:
(1) for any given budget, the PCAM schedule leaves fewer true alerts uncovered than the baseline;
(2) the PCAM schedule drives the number of uncovered true alerts to 0 at a smaller budget.
Financial Implications. In short, the use of
Detailed comparison. Figures 7 and 8 show the detailed number of uncovered true alerts when varying exactly one of \(n_{junior},n_{senior},n_{principal}\) while keeping the other two fixed, using the PCAM and baseline schedulers, respectively.
Fig. 7. Results for the PCAM-optimized day shift. Each plot shows the number of uncovered alerts when varying one element of the combination tuple (junior, senior, or principal) while the other two are fixed (the fixed values are shown in the titles of the subfigures). The first, second, and third rows fix the number of junior, senior, and principal analysts to hire, respectively. The columns indicate the fixed number of that analyst type. The \( x \)- and \( y \)-axes represent the numbers of the two types of analysts being varied. The \( z \)-axis represents the number of uncovered alerts.
Fig. 8. Results for baseline day shift. Each plot shows the number of uncovered alerts when varying one number of the combination tuple (junior, senior, or principal), while the other two are fixed (the fixed values are in titles of the subfigures). The meanings of the rows, columns, and \( x \)-, \( y \)-, and \( z \)-axes are the same as in Figure 7.
Humanistic practices. From a humanistic perspective, work shifts may be affected by various factors such as sickness, family issues, and so on. We use the uncovered rate (UR) of true alerts to evaluate the performance, which is defined as the number of uncovered true alerts divided by the total number of true alerts. We define the humanistic absence rate to be the absent work capacity divided by the total work capacity. Table 2 shows the uncovered rates for the alerts under various humanistic absence rates. The uncovered rate is obtained via simulations based on historical data. The table shows that
7.2 PCAM Shift Schedule Performance with Adversary
We also tested the performance of
Fluctuations of true alert distributions: In this experiment, we consider an adversarial setting where the true alert distributions are made to fluctuate around the average values of the historical data. To do this, we randomly sample 100 true alert distributions using the Normal distribution in Equation (22) with a 0.95 confidence interval. Table 3 shows the performance of PCAM-Fluct and PCAM.
Fig. 9. Performance of PCAM-Fluct, PCAM-Shift, and PCAM-Mix with respect to distance of the original and adversarial true alert distributions, for day (upper row) and night (lower row) schedules. The \( x \)-axis represents the Euclidean Distance or number of shifts. The \( y \)-axis represents the UR by each schedule. Results of other distance metrics are shown in the appendix.
Table 3. Performance of PCAM-Fluct and PCAM with Fluctuations in True Alert Distribution

| | Total true | \(UR_{PCAM}\) | \(UR_{PCAM-Fluct}\) | Distance |
|---|---|---|---|---|
| Day | 502.270 | 0.717 | 0.340 | 132.437 |
| Night | 498.220 | 0.772 | 0.416 | 154.811 |

Results are averaged from 100 samples for both day and night schedules.
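One simple way to generate such fluctuated samples, together with the Euclidean distance used to quantify how far a sample is from the original distribution, is sketched below. The clipping at 0 and the exact noise model are our assumptions and need not match Equation (22).

```python
import random

def sample_fluctuated(TA, stds, seed=None):
    """Draw one adversarial sample of the true alert distribution by
    perturbing each slice's average with Gaussian noise (clipped at 0,
    since alert counts cannot be negative)."""
    rng = random.Random(seed)
    return [max(0.0, rng.gauss(mu, sd)) for mu, sd in zip(TA, stds)]

def euclidean_distance(p, q):
    """Distance between the original and adversarial distributions."""
    return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5
```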
Shifts of true alert distributions: In this experiment, the adversarial samples are generated by shifting the original true alert distribution by \(0.5, 1, 1.5, \ldots , 6\) (i.e., 12 shift amounts). Table 4 shows the performance of PCAM-Shift and PCAM.
Table 4. Performance of PCAM-Shift and PCAM with Shift in True Alert Distribution

| | Total true | \(UR_{PCAM}\) | \(UR_{PCAM-Shift}\) | Distance |
|---|---|---|---|---|
| Day | 279.000 | 0.434 | 0.021 | 58.348 |
| Night | 213.000 | 0.425 | 0.033 | 42.795 |

Results are averaged over 12 samples for both day and night schedules.
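A shifted adversarial sample can be produced by circularly rotating the per-slice distribution. This sketch assumes the shift amounts are expressed in hours and that the distribution wraps around the shift boundary; both are our assumptions.

```python
def shift_distribution(TA, shift_hours, slice_minutes=10):
    """Circularly shift a per-slice true alert distribution in time.

    A shift of `shift_hours` moves each slice's expected count
    shift_hours * 60 / slice_minutes slots later, wrapping around.
    """
    k = int(shift_hours * 60 // slice_minutes)
    n = len(TA)
    k %= n
    return TA[-k:] + TA[:-k] if k else TA[:]
```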
Mixed adversarial true alert distribution: In this experiment, we consider both fluctuations and time shifts in the true alert distribution. Table 5 shows the performance of PCAM-Mix and PCAM.
Table 5. Performance of PCAM-Mix and PCAM with Both Fluctuations and Shifts in True Alert Distribution

| | Total true | \(UR_{PCAM}\) | \(UR_{PCAM-Mix}\) | Distance |
|---|---|---|---|---|
| Day | 502.270 | 0.745 | 0.340 | 143.456 |
| Night | 498.220 | 0.801 | 0.409 | 166.163 |

Results are averaged over 100 samples for both day and night schedules.
7.3 Live test of PCAM
The acid test of the
We also did an end-to-end test using the previously determined numbers of analysts for the day and night shifts, respectively, on the live test (6 days). In this live test, we created the analyst schedules before the day and night shifts and then played back the actual arrival of alerts during the 6 days of the testing period. Not a single true alert went uncovered by the schedules generated by the PCAM shift scheduler.
8 CONCLUSION
Most cyber-security operations centers experience a huge flood of security alerts. This deluge is typically managed by a small team that is usually overworked. Because of the sheer volume of alerts, true alerts can be missed in a sea of false alarms.
In this article, we propose a data-driven bi-level optimization algorithm to solve the following problems. First, given a distribution of true alerts during the day, how best should a given set of analysts be scheduled to minimize the expected number of true alerts that are not handled by an analyst? Second, given a desire to reduce the expected number of uncovered true alerts below a desired upper bound, how many analysts of each type do we need? Third, how can a predictor that predicts whether a given alert is real or merely a false alarm be incorporated into the system?
To solve these problems, we propose the
APPENDICES
A PCAM Alert Classifier
We also briefly introduce our alert classifier that distinguishes true alerts from false ones. A number of threat detection systems [6, 10, 14, 20] have sought to reduce the false alarm rate. As we can use any such framework within
Systems-Related Features: Systems-related features included information such as the receipt time (of a suspect packet), the identity of the logging server, and more.
Traffic-Related Features: Traffic-related features include proxies for IP addresses, communication ports, the transmission protocol used, and more.
Threat-Related Features: Threat-related features include the category of the threat and the severity of the threat according to that security product.
We preprocessed the data by categorizing sparse attributes and discarding single-value attributes, duplicate attributes, and attributes with too many missing values. Table 6 describes the different types of variables we finally ended up with.
| Variable | Description |
|---|---|
| Serial Number | Serial number of the firewall that generated the log. |
| Log Forwarding Profile | The profile used for log filtering and tagging. |
| Source Zone | Indicator of the direction of the connection, which is either outgoing or incoming. |
| Ingress Interface | The network interface that the session was sourced from. |
| Egress Interface | The network interface that the session was destined to. |
| Repeat Count | Number of sessions with the same Source IP, Destination IP, Application, and Subtype seen within 5 seconds. |
| Flags | 32-bit field that provides details on the session, including PCAP (packet capture), IPv6, SSL, and so on. |
| Protocol | IP protocol associated with the session. |
| Direction | Indicates the direction of the attack, client-to-server or server-to-client. |
| Source Location | Source country or Internal region for private addresses. |
| Destination Location | Destination country or Internal region for private addresses. |
| Source Port | Source port utilized by the session. |
| NAT Source Port | The source port used by the connection after NAT (network address translation). |
| Destination Port | Destination port utilized by the session. |
| NAT Destination Port | The destination port used by the connection after NAT (network address translation). |
| Subtype | Subtype of the threat log. "Vulnerability" means the threat is a vulnerability exploit detected via a Vulnerability Protection profile. |
| Action | Action taken for the session. |
| Category | The category of the URL. |
| Severity | Severity associated with the threat. Reported by threat detectors. |
| Rule Name | Name of the rule that the session matched. |
Table 6. Details of Variables
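The preprocessing steps described above can be sketched as follows. This is a minimal illustration, not our exact pipeline; the function name and the thresholds `max_missing_frac` and `min_category_count` are illustrative choices.

```python
import pandas as pd

def preprocess_alert_logs(df: pd.DataFrame,
                          max_missing_frac: float = 0.5,
                          min_category_count: int = 10) -> pd.DataFrame:
    """Sketch of the preprocessing described in the text (illustrative thresholds)."""
    # Discard attributes with too many missing values.
    df = df.loc[:, df.isna().mean() <= max_missing_frac]
    # Discard single-value attributes (they carry no discriminative signal).
    df = df.loc[:, df.nunique(dropna=False) > 1]
    # Discard duplicate attributes (columns with identical contents).
    df = df.T.drop_duplicates().T
    # Categorize sparse attributes: merge rare categorical values into "other".
    for col in df.select_dtypes(include="object"):
        rare = df[col].value_counts() < min_category_count
        df[col] = df[col].where(~df[col].isin(rare[rare].index), "other")
    return df
```

For example, a constant column, an exact duplicate of another column, and a mostly-missing column would all be dropped, leaving only informative attributes for the classifier.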
We ran 10 off-the-shelf classification algorithms to learn classifiers that separate true alerts from false ones. Seven of the 10 are "traditional" classifiers: Decision Trees, Logistic Regression, Bernoulli Naive Bayes, Gaussian Naive Bayes, Multinomial Naive Bayes, Random Forest, and Support Vector Machines (SVM). The other three are neural classifiers: Multi-Layer Perceptrons (MLP), Google's Deep and Wide system (DeepWide), and Convolutional Neural Networks (CNNs). We do not make any claim of novelty about these algorithms: as they give very high F1-scores and AUCs, there was no point in developing a new classifier.
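A benchmark over such off-the-shelf classifiers can be sketched with scikit-learn as below. This is an illustration, not our experimental code: the hyperparameters are library defaults rather than our settings, and of the three neural models only an MLP is shown, since DeepWide and CNN architectures require more setup.

```python
# Sketch: fit several off-the-shelf classifiers and score them on held-out data.
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import BernoulliNB, GaussianNB, MultinomialNB
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import f1_score, roc_auc_score

MODELS = {
    "Decision tree": DecisionTreeClassifier(),
    "Logistic": LogisticRegression(max_iter=1000),
    "NB Bernoulli": BernoulliNB(),
    "NB Gaussian": GaussianNB(),
    "NB multinomial": MultinomialNB(),  # requires nonnegative features
    "Random forest": RandomForestClassifier(n_estimators=100),
    "SVM": SVC(probability=True),
    "DNN-MLP": MLPClassifier(max_iter=500),
}

def benchmark(X_train, y_train, X_test, y_test):
    """Fit each model and report ROC-AUC / F1 on the held-out set."""
    results = {}
    for name, model in MODELS.items():
        model.fit(X_train, y_train)
        proba = model.predict_proba(X_test)[:, 1]   # scores for ROC-AUC
        pred = model.predict(X_test)                # hard labels for F1
        results[name] = {"roc_auc": roc_auc_score(y_test, proba),
                         "f1": f1_score(y_test, pred)}
    return results
```
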
B TRUE ALERT PREDICTION RESULTS
The shift schedule experiments above determine each analyst's breaks and lunch slots; these schedules are finalized just before the analyst starts a shift.
The True Alert Predictor component of PCAM classifies each incoming alert as real or a false alarm. We evaluated it in two ways: on historical data and via a live test.
Historical Data Analysis. We tested each of the 10 classifiers on the features described in Appendix A. Because our alerts reflect temporal data, cross validation is not an appropriate test.8 To avoid its pitfalls, we use rolling window prediction: we train our models on the first \(T\) days and then predict for day \(T+1\); we then train on the first \(T+1\) days of data and predict for day \(T+2\), and so forth. We report the results for rolling window prediction below; for the sake of completeness, Table 9 contains the results of 10-fold cross validation as well. In our rolling window experiments, we train on each of the intervals \([1,30],[1,31],\ldots ,[1,43]\) and then predict for days \(31, 32,\ldots , 44\), respectively, i.e., we train on the first \(T\) days and then make predictions for day \(T+1\) with \(T=30,\ldots ,43\).
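The rolling window procedure can be sketched as follows. The `day_index` array (mapping each alert to its 1-based arrival day) and the `model_factory` callable are illustrative names, not part of our implementation; a fresh model is fit for every window so that no future data leaks into training.

```python
import numpy as np

def rolling_window_predict(model_factory, X, y, day_index, first_train_days=30):
    """Train on days [1, T] and predict day T+1, for T = first_train_days, ...

    day_index[i] is the (1-based) day on which alert i arrived.
    Returns {day: (predictions, true_labels)} for each predicted day.
    """
    predictions = {}
    last_day = int(day_index.max())
    for T in range(first_train_days, last_day):
        train = day_index <= T        # only past data enters training
        test = day_index == T + 1     # predict the very next day
        model = model_factory()       # fresh model per window
        model.fit(X[train], y[train])
        predictions[T + 1] = (model.predict(X[test]), y[test])
    return predictions
```

With \(T=30,\ldots,43\) as in the text, this yields one prediction set per day for days 31 through 44.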
| Classifier | ROC-AUC | F1 score | Precision | Recall |
|---|---|---|---|---|
| Decision tree | 0.9936 | 0.8634 | 0.8866 | 0.8595 |
| Logistic | 0.9992 | 0.8449 | 0.8918 | 0.8124 |
| NB Bernoulli | 0.9970 | 0.2621 | 0.1572 | 0.9064 |
| NB Gaussian | 0.9905 | 0.2507 | 0.1471 | 0.9941 |
| NB multinomial | 0.9967 | 0.2713 | 0.1639 | 0.9059 |
| Random forest | 0.9957 | 0.8806 | 0.9073 | 0.8600 |
| SVM | 0.9967 | 0.8538 | 0.9258 | 0.8064 |
| DNN-MLP | 0.9998 | 0.8740 | 0.9350 | 0.8300 |
| DNN-DeepWide | 0.9856 | 0.8634 | 0.8990 | 0.8466 |
| DNN-CNN | 0.9810 | 0.8634 | 0.8833 | 0.8300 |
Table 7. Classifier Performance (True Alert vs. False Alert prediction) on 44 Days of Data Using Rolling Window Prediction
Table 7 shows that Random Forest achieves the best F1-score (88.06%) among the 10 classifiers under rolling window prediction, while DNN-MLP achieves the highest ROC-AUC.
Live Testing Experiments. We also conducted a live test. In this test, we used the best model (Random Forest) obtained by training on the 44-day window and then tested it on 6 days of live data to assess the predictive efficacy of our models. Table 8 shows the per-day results of this live test.
| Day | ROC-AUC | F1 score | Precision | Recall |
|---|---|---|---|---|
| Day 1 | 0.9931 | 0.9677 | 0.9633 | 0.9722 |
| Day 2 | 0.9786 | 0.9649 | 0.9910 | 0.9402 |
| Day 3 | 0.9998 | 0.9672 | 0.9987 | 0.9365 |
| Day 4 | 0.9913 | 0.9652 | 0.9652 | 0.9652 |
| Day 5 | 0.9758 | 0.9374 | 0.9450 | 0.9299 |
| Day 6 | 0.9867 | 0.9364 | 0.9717 | 0.9035 |
Table 8. Live Test: Performance of the Random Forest Classifier on True Alert vs. False Alert Prediction over 6 Days of Live Data. Random Forest has the best F1 among all the classifiers
| Classifier | ROC-AUC | F1 score | Precision | Recall |
|---|---|---|---|---|
| Decision tree | 0.9895 | 0.9065 | 0.9488 | 0.8678 |
| Logistic | 0.9992 | 0.8898 | 0.9455 | 0.8404 |
| NB Bernoulli | 0.9967 | 0.2537 | 0.1476 | 0.9018 |
| NB Gaussian | 0.9906 | 0.2510 | 0.1437 | 0.9915 |
| NB multinomial | 0.9968 | 0.2643 | 0.1548 | 0.9013 |
| Random forest | 0.9923 | 0.9085 | 0.9519 | 0.8688 |
| SVM | 0.9974 | 0.8887 | 0.9488 | 0.8357 |
| DNN-MLP | 0.9998 | 0.9044 | 0.9614 | 0.8539 |
| DNN-DeepWide | 0.9825 | 0.8616 | 0.9576 | 0.7830 |
| DNN-CNN | 0.9840 | 0.8765 | 0.9256 | 0.8378 |
Table 9. Experimental Results Showing Classifier Performance in Predicting Whether an Alert Is Real or False Using Cross Validation
We see that the predictive accuracy of Random Forest actually increased in the live test: the per-day F1-scores rose to between 93.64% and 96.77%, with precision between 94.50% and 99.87% and recall between 90.35% and 97.22%. The fact that the trained models improved during live testing suggests that over-fitting did not occur during the training phase.
C PCAM Detailed Scheduling Results for Night Shifts
Figures 10 and 11 show the detailed number of uncovered true alerts when varying exactly one of \(n_{junior}\), \(n_{senior}\), and \(n_{principal}\) while keeping the other two fixed, using both the PCAM-optimized and baseline night shift schedules.
Fig. 10. Results for PCAM Optimized night shift. Each plot shows the number of uncovered alerts when varying one number of the combination tuple (junior, senior, or principal), while the other two are fixed. The meanings of the rows, columns, and \( x \)-, \( y \)-, and \( z \)-axes are the same as in Figure 7.
Fig. 11. Results for baseline night shift. Each plot shows the number of uncovered alerts when varying one number of the combination tuple (junior, senior, or principal), while the other two are fixed. The meanings of the rows, columns, and \( x \)-, \( y \)-, and \( z \)-axes are the same as in Figure 7.
Fig. 12. Performance for day and night schedules of PCAM-Fluct and PCAM-Mix. The \( x \)-axis represents the distance metrics, including Euclidean Distance, Bray–Curtis Distance, Cosine Distance, and Wave–Hedges Distance. The \( y \)-axis represents the Uncovered Rate by each schedule.
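For reference, the four distance metrics in Figure 12 can be computed as below. SciPy provides the first three; Wave–Hedges is implemented by hand using one common form from the literature, \(\sum_i |p_i-q_i|/\max(p_i,q_i)\) (see Reference [3] for definitions). This is an illustrative sketch, not our evaluation code.

```python
import numpy as np
from scipy.spatial.distance import euclidean, braycurtis, cosine

def wave_hedges(p, q):
    """Wave-Hedges distance: sum_i |p_i - q_i| / max(p_i, q_i),
    equivalently sum_i (1 - min(p_i, q_i) / max(p_i, q_i))."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    hi = np.maximum(p, q)
    with np.errstate(invalid="ignore", divide="ignore"):
        terms = 1.0 - np.minimum(p, q) / hi
    # Convention: a coordinate where both vectors are 0 contributes 0.
    return float(np.nansum(np.where(hi > 0, terms, 0.0)))

DISTANCES = {"Euclidean": euclidean, "Bray-Curtis": braycurtis,
             "Cosine": cosine, "Wave-Hedges": wave_hedges}
```

All four metrics report 0 for identical distributions, so they can be compared directly on the same axis as in Figure 12.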
Footnotes
1 https://bricata.com/blog/how-many-daily-cybersecurity-alerts/.
2 An additional 6 days of data were later used for live testing.
3 If so desired, then we can compute the mean and standard deviation using just the portion of the training data from the last \(w\) time windows, as opposed to all of them.
4 Please note that the salary rates used were from Reference [1] and from glassdoor.com, as opposed to Dartmouth College salaries, which were kept confidential from us. Moreover, benefits are not included in these costs but would add around 30% of salary. Finally, the numbers of junior, senior, and principal analysts computed above were based on the alert data for Dartmouth College's network. Though these numbers may differ for other organizations, the process followed in this article can be applied directly to the alert statistics of those organizations.
5 Results for night shifts are similar; we put them in the appendix for space reasons.
6 We also consider other distance metrics such as Bray–Curtis Distance, Cosine Distance, and Wave–Hedges Distance. We refer to Reference [3] for the definitions of these metrics. The results using these other metrics are similar and are shown in the appendix.
7 The schedule assumed 6 junior, 8 senior, and 8 principal analysts for daytime schedules and 3 junior, 7 senior, and 3 principal analysts for nighttime schedules.
8 This is because \(K\)-fold cross validation would randomly split the data into \(K\) chunks. It would then do \(K\) iterations: In each iteration, one of the \(K\) chunks would be the test set, while the remaining \((K-1)\) chunks would be used for training the model. The performance of a classifier would then be obtained by aggregating the performance over the \(K\) folds. However, this is inappropriate for temporally sensitive data, because the training folds can (with \(\frac{K-1}{K}\) probability) contain data points from the future and those might be used to predict outcomes for data in the test fold that might be from the past.
9 In contrast, the results of K-fold cross validation shown in Table 9 are slightly inflated (due to the methodological flaw with using cross validation on temporal data) with Random Forest yielding the best F1-Score of 90.85% with a precision of 95.19% and a recall of 86.88%.
REFERENCES
- [1] Altner et al. 2018. A two-stage stochastic program for multi-shift, multi-analyst, workforce optimization with multiple on-call options. J. Schedul. 21, 5 (2018), 517–531.
- [2] S. Boyd and L. Vandenberghe. 2004. Convex Optimization. Cambridge University Press.
- [3] S.-H. Cha. 2007. Comprehensive survey on distance/similarity measures between probability density functions. City 1, 2 (2007), 1.
- [4] Dunstatter et al. 2018. Allocating security analysts to cyber alerts using Markov games. In Proceedings of the National Cyber Summit (NCS’18). IEEE, 16–23.
- [5] Dunstatter et al. 2019. Solving cyber alert allocation Markov games with deep reinforcement learning. In Proceedings of the International Conference on Decision and Game Theory for Security. Springer, 164–183.
- [6] 2013. An efficient false alarm reduction approach in HTTP-based botnet detection. In Proceedings of the IEEE Symposium on Computers & Informatics (ISCI’13). 201–205.
- [7] Franklin et al. 2017. Toward a visualization-supported workflow for cyber alert management using threat models and human-centered design. In Proceedings of the IEEE Symposium on Visualization for Cyber Security (VizSec’17). IEEE, 1–8.
- [8] Ganesan et al. 2017. Optimal scheduling of cybersecurity analysts for minimizing risk. ACM Trans. Intell. Syst. Technol. 8, 4 (2017), 52.
- [9] Ganesan et al. 2019. Optimizing alert data management processes at a cyber security operations center. In Adversarial and Uncertain Reasoning for Adaptive Cyber Defense. Springer, 206–231.
- [10] 2014. False alarm minimization techniques in signature-based intrusion detection systems: A survey. Comput. Commun. 49 (2014), 1–17.
- [11] 2016. Using temporal probabilistic logic for optimal monitoring of security events with limited resources. J. Comput. Secur. 24, 6 (2016), 735–791.
- [12] Larriva-Novo et al. 2020. Dynamic risk management architecture based on heterogeneous data sources for enhancing the cyber situational awareness in organizations. In Proceedings of the International Conference on Availability, Reliability and Security. 1–9.
- [13] 2017. Crying Wolf: Combatting Cybersecurity Alert Fatigue. SC Magazine (June 9, 2017). Retrieved from https://www.scmagazine.com/home/security-news/in-depth/crying-wolf-combatting-cybersecurity-alert-fatigue/.
- [14] 2013. Enhancing false alarm reduction using voted ensemble selection in intrusion detection. Int. J. Comput. Intell. Syst. 6, 4 (2013), 626–638.
- [15] Okimoto et al. 2013. Cyber security problem based on multi-objective distributed constraint optimization technique. In Proceedings of the 43rd Annual IEEE/IFIP Conference on Dependable Systems and Networks Workshop (DSN-W’13). IEEE, 1–7.
- [16] 2019. A methodology for ensuring fair allocation of CSOC effort for alert investigation. Int. J. Inf. Secur. 18, 2 (2019), 199–218.
- [17] Shah et al. 2018. Optimal assignment of sensors to analysts in a cybersecurity operations center. IEEE Syst. J. 13, 1 (2018), 1060–1071.
- [18] 2019. Adaptive alert management for balancing optimal performance among distributed CSOCs using reinforcement learning. IEEE Trans. Parallel Distrib. Syst. 31, 1 (2019), 16–33.
- [19] Wang et al. 2016. A survey of game theoretic methods for cyber security. In Proceedings of the IEEE 1st International Conference on Data Science in Cyberspace (DSC’16). IEEE, 631–636.
- [20] 2019. Alert correlation for cyber-manufacturing intrusion detection. Proc. Manufact. 34 (2019), 820–831.