Stationary Algorithmic Balancing For Dynamic Email Re-Ranking Problem

Email platforms need to generate personalized rankings of emails that satisfy user preferences, which may vary over time. We approach this as a recommendation problem based on three criteria: closeness (how relevant the sender and topic are to the user), timeliness (how recent the email is), and conciseness (how brief the email is). We propose MOSR (Multi-Objective Stationary Recommender), a novel online algorithm that uses an adaptive control model to dynamically balance these criteria and adapt to preference changes. We evaluate MOSR on the Enron Email Dataset, a large collection of real emails, and compare it with other baselines. The results show that MOSR achieves better performance, especially under non-stationary preferences, where users value different criteria more or less over time. We also test MOSR's robustness on a smaller down-sampled dataset that exhibits high variance in email characteristics, and show that it maintains stable rankings across different samples. Our work offers novel insights into how to design email re-ranking systems that account for multiple objectives impacting user satisfaction.


INTRODUCTION
Email is one of the most popular online activities, with millions of users exchanging messages every day. However, managing a large and diverse email inbox can be overwhelming and frustrating for users, reducing their satisfaction and productivity [6,26]. Therefore, designing email platforms that can help users cope with email overload and find the most important messages to send or reply to is a key challenge. Email recommender systems aim to provide personalized suggestions for ranking emails based on users' preferences [14]. For example, Google's 'Priority Inbox' feature ranks emails according to their inferred priority for reading based on users' past behavior [1].
However, user preferences are not static; they may change over time depending on various factors such as context or mood. To account for this dynamic nature of preferences, email recommender systems need to learn from feedback and update their ranking strategies accordingly. Offline methods that assume fixed or stable preferences may fail to capture the evolving interests of users over time [15,20]. Thus, an online algorithm that can adapt to preference changes in real time is crucial.
Moreover, email recommendation is not a single-objective problem; it involves multiple criteria that affect user satisfaction with different aspects of emails. In this paper we focus on three criteria: closeness (how relevant the sender and topic are to the user), timeliness (how urgent a reply is), and conciseness (how brief the email is). These criteria reflect different dimensions of importance that users may value differently at different times. For instance, a user may prefer timely and concise emails during busy workdays but close yet lengthy ones during leisure time. Hence, email recommender systems need to balance these multiple objectives while generating personalized rankings.
Existing approaches for email re-ranking or recommendation have mostly focused on maximizing relevance or priority based on certain features. For example, some methods use sender-receiver relationship features [2,7], others use topic models [1,4,13,30], while others combine text similarity with temporal features [8]. However, these methods have limitations in terms of accuracy and adaptability. They neglect factors beyond relevance, such as novelty or diversity, which may also influence user satisfaction [20]. Most importantly, they do not explicitly account for preference changes over time. Recent research has started considering "beyond relevance" objectives in recommendation systems, such as exploration vs. exploitation and serendipity vs. familiarity, which optimize factors affecting user engagement rather than just item relevance [3,5,20]. We argue that similar objectives apply to email ranking settings, where users may value different aspects of emails more or less at different times depending on their context.
In this paper, we address this problem as a multi-objective online recommendation task based on three criteria: closeness, timeliness, and conciseness. Closeness estimates the relationship between the sender and receiver, timeliness captures the urgency of a reply, and conciseness captures the use of words in the email. We argue that these aspects reflect different dimensions of user satisfaction with respect to emails, and they may vary across different users and over time.
We propose MOSR (Multi-Objective Stationary Recommender), a novel online algorithm that uses an adaptive control model to balance these criteria and adapt to preference changes. Our algorithm learns each criterion's weight from historical data and updates it using gradient descent based on observed feedback signals. It then combines these weights into a single score for each email using a linear aggregation function. By doing so, our algorithm can adjust its ranking strategy according to changing preferences without requiring prior knowledge or explicit input from users.
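To make this idea concrete, the following is a minimal sketch of linear aggregation with an online weight update; the criterion scorers and the feedback signal (a single observed user choice) are simplified placeholders, not the exact formulation developed later in the paper:

```python
import numpy as np

# Sketch of MOSR's core idea: one score per criterion per email,
# combined linearly, with weights nudged online by user feedback.
criteria = ["closeness", "timeliness", "conciseness"]
weights = np.ones(len(criteria)) / len(criteria)   # uniform initial preference

def rank_emails(score_matrix: np.ndarray) -> np.ndarray:
    """score_matrix: (n_emails, n_criteria). Returns email indices, best first."""
    combined = score_matrix @ weights              # linear aggregation
    return np.argsort(-combined)

def update_weights(score_matrix: np.ndarray, chosen: int, lr: float = 0.1) -> None:
    """Gradient-style step: move the weights toward the criterion profile of
    the email the user actually chose, away from the average candidate."""
    global weights
    grad = score_matrix[chosen] - score_matrix.mean(axis=0)
    weights = np.clip(weights + lr * grad, 0.0, None)
    weights /= weights.sum()                       # keep a convex combination
```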
The main contributions of our work are as follows: (1) We formulate email re-ranking as a multi-objective online recommendation problem that aims to optimize three criteria: closeness, timeliness, and conciseness. These are key factors that influence user actions in email. We show how preferences w.r.t. these criteria vary across users and over time. (2) We propose MOSR, an adaptive control model that learns a reference vector from historical data and adjusts it based on online feedback. The reference vector represents the relative importance of each criterion for each user at each moment. MOSR adapts the reference vector dynamically using reinforcement learning techniques, without requiring re-training or compromising privacy. (3) We evaluate MOSR on the Enron Email Dataset [16]. We show that MOSR outperforms several baselines in terms of ranking quality measured by NDCG. We also demonstrate that MOSR handles non-stationary preferences well by providing consistent recommendations even when users change how they value different criteria over time. Furthermore, we test MOSR's robustness on a smaller dataset sampled randomly at different time intervals and show that MOSR still performs better than other methods under high variance conditions.

MOSR FRAMEWORK
Our goal is to design a recommendation system that helps users choose when and how to send emails based on their preferences w.r.t. relationships, urgency, and brevity. We model this as a dynamic problem involving multiple objectives that may conflict or change over time. Our algorithm takes the email stream as input and tries to find the optimal trade-offs among these objectives for each email ranking decision.

Problem Definition
Our re-ranking problem is a type of recommendation problem that consists of two stages: candidate generation and ranking. However, it differs from the typical recommendation problem in two ways:
• First, we need to balance multiple and sometimes conflicting criteria to achieve the highest level of satisfaction among them.
• Second, users' email ranking preferences are not fixed but may change depending on external factors. Figure 2 illustrates some scenarios where users' preferences vary or remain constant due to different influences.
Definition 2.1 (Email object). We represent an email object as $g = \{s, r, c, t\}$, where $s \in E$ is the email address of the sender ($E$ being the set of all email addresses), $r \subseteq E$ is the set of email addresses of the receivers, $c$ is the content of the email, and $t$ is the timestamp when $g$ is sent. $G$ is the set of all email objects.
We want to rank the emails of a specific email address $a_u$ according to the user's preferences, which may change over time. The ranking candidates are the emails that have been received or sent by $a_u$.
Definition 2.2 (Candidate Set). We define the candidate set $Q = \{q_1, q_2, \ldots, q_n\}$, with $q_i \in G$. There are two types of candidates in $Q$: unanswered emails in the inbox and follow-up emails after no response. Hence, the candidate set $Q$ covers the people $a_u$ has sent emails to or received emails from. At each timestamp $t_k$, the candidate set is updated over a time window $T$.
Note that the candidate set $Q$ is not fixed, since new emails may arrive or be sent at any time. To rank the candidates, we assign each email a score based on the multiple criteria $\Phi$, $\Xi$, $\Upsilon$ for the current timestamp $t_k$. These criteria reflect how close, timely, and concise an email is for the user. We also use a feedback-based aggregation function that can adjust the scores online as we learn from different users' choices. Then we sort the emails by their scores to get a personalized ranking for each user at any time.

Proposed Approach
The overall architecture of our algorithm is depicted in Figure 1b. In this section, we introduce the details of the MOSR algorithm. We use several ordered weighted averaging (OWA) aggregators to combine the criteria of closeness, timeliness, and conciseness and obtain the predicted scores of the candidates $Q$. Then, we apply a weighted sum aggregator to re-rank the scores from the OWA operators. To adapt to users' choices, we adjust the weights of the different scores by adaptive control over the multi-score aggregation. For each email address $a_u$, we update its sending preference online according to the loss between the true ranking and the predicted ranking. When $a_u$ sends an email to a candidate $q_i$, this raises the priority of $q_i$, and the MRAC (Model Reference Adaptive Control, defined below) modifies the weights of the relevant scores.
We formulate our problem as a dynamic multi-objective optimization problem to achieve algorithmic balance over closeness, timeliness, and conciseness. Conventional multi-objective optimization aims to optimize the weights of different objectives under constrained or conflicting situations [17]. However, this is not suitable for our case because the email history changes over time. Therefore, we propose a dynamic version that involves multi-stage ranking setups and time windows.
Most existing recommendation systems use a two-stage mechanism: they first extract potential candidates and then model their features to get one score per candidate [18,19,30]. However, this is inefficient for data streams because learning over large candidate sets becomes impractical. Unlike previous systems, we use multiple scores based on different user habits instead of one general static score. We also use time windows to enable fast switching among different scores as the email history evolves.
We propose an MRAC (Model Reference Adaptive Control) model to create an online mechanism for the multi-objective optimization problem. In this model, we use different rankers to order the solutions according to various criteria, with the aim of discovering each user's personalized preferences over these rankers. We assume that there is a true preference ranking that reflects the user's ideal ordering of solutions, and our goal is to estimate and update the user's preference over the different rankers as they interact with them. To do this, we treat each ranker as a fixed model, and we use the distance between the true ranking and the ranking predicted by each ranker to drive the controller.

BACKGROUND

3.1 Email Overloading Problem
Many users face the problem of email overload, where their inboxes are filled with too many emails and they struggle to identify or respond to the important ones [6,26]. One possible solution is to re-rank incoming emails and create a priority inbox based on various factors [1]. Previous studies have explored different aspects of this problem, such as how people decide whether to reply or not, depending on interpersonal differences, email content, attachments, and other features [4,11,12,30]. They also proposed methods to predict the priority of emails in the inbox using content-based features [10,13,24].
However, most of these methods rely on analyzing the content of emails, which may raise privacy concerns. Aberdeen et al. [1] used a linear logistic regression model with multiple content-based features for real-time online ranking. Yang et al. [30] included attachments as an additional feature for analysis. Feng et al. [13] developed a doc2vec-based generative model to rank inbox emails. Bedekar et al. [4] re-ranked emails according to topic analysis.
In this work, we examine how different criteria affect email ranking jointly.

3.2 Model Reference Adaptive Control
MRAC is a control method that uses a reference system (model) as a target for the process being controlled. The reference system is described by a model with state, input, and output variables. The controller adjusts the process input so that the process output tracks the output of the reference model, while an adaptation algorithm tunes the controller parameters based on the tracking error.

• Reference Model
The reference model defines the desired behavior of a process and is usually expressed in a parametric form (e.g., transfer-function or state-space models) that can be implemented in the control computer. To achieve an exact match between the reference model and the actual process, the reference model must have some properties: it must be stable and minimum phase (i.e., its poles and zeros lie in the left-half plane), and it must represent the process well.

• Controller
An MRAC system requires a controller that meets some criteria. First, it must ensure "perfect model matching", which means that there must exist control parameters that make the closed-loop response identical to that of the reference model. Second, it must use direct adaptation, which means that the control parameters depend on a linear function of the error signal. In our model, we use OWA-related algorithms to estimate these control parameters by minimizing an objective function.

3.2.2 Adaptive control with multiple fixed models. MRAC aims to optimize the controller parameters for the entire system. However, some controllers may rely on multiple models in the system [21,22]. How to switch and tune between models is a common topic. The models can be either fixed or adaptive: a fixed model has constant controller parameters, while an adaptive model requires parameter adjustment. An MRAC algorithm with multiple models should specify how to select the appropriate controller for different environments.

3.3 Multi-Objective Optimization
One way to combine multiple criteria into a single decision function is to use ordered weighted averaging (OWA) functions [28]. These functions aggregate scores that measure how well different criteria are satisfied [9]. However, unlike weighted sum functions that assign a fixed weight to each criterion, OWA functions assign weights based on the magnitude of the scores: the weight a criterion receives depends on where its score ranks among the others, not on the criterion's identity. OWA functions are often used in recommendation settings that involve several satisfaction criteria, such as music recommendation and COVID-19 policy [20,23].
As a symmetric aggregation function, OWA assigns weights according to the values of the attributes; thus, each weight is not associated with a particular attribute. Given an input $\mathbf{x}$ and a weighting vector $\mathbf{w}$, the OWA function is
$$\mathrm{OWA}_{\mathbf{w}}(\mathbf{x}) = \sum_{i=1}^{n} w_i x'_i,$$
where $x'_i$ is the $i$-th largest element in $\mathbf{x}$, also written $x_{\sigma(i)}$. There are many methods to obtain the weighting vector $\mathbf{w}$. One typical method is the use of Regular Increasing Monotone (RIM) quantifiers [29]. A RIM quantifier $q$ generates the weights by
$$w_i = q\left(\frac{i}{n}\right) - q\left(\frac{i-1}{n}\right),$$
in which $i$ indexes the $i$-th largest value, $n$ is the number of criteria in the OWA, and $q$ is the RIM quantifier. Furthermore, RIM requires that $q(0) = 0$, $q(1) = 1$, and $q$ is monotone non-decreasing. Changing the parameter $\alpha$ of the quantifier (e.g., the power quantifier $q(r) = r^{\alpha}$) moves the OWA operator between different cases: when $\alpha \to 0$ it becomes the MAX operator, when $\alpha = 1$ it becomes the arithmetic mean, and when $\alpha \to \infty$ it becomes the MIN operator.
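As a concrete illustration, the sketch below generates OWA weights from the power RIM quantifier $q(r) = r^\alpha$ (assumed here because it is the standard choice with exactly the limiting behaviors described above) and applies them to a vector of criterion scores:

```python
import numpy as np

def rim_weights(n: int, alpha: float) -> np.ndarray:
    """OWA weights from the power RIM quantifier q(r) = r**alpha:
    w_i = q(i/n) - q((i-1)/n), attached to the i-th largest input."""
    i = np.arange(1, n + 1)
    return (i / n) ** alpha - ((i - 1) / n) ** alpha

def owa(x: np.ndarray, alpha: float) -> float:
    """Ordered weighted average: weights attach to sorted positions,
    not to particular criteria (symmetric aggregation)."""
    w = rim_weights(len(x), alpha)
    return float(w @ np.sort(x)[::-1])   # sort scores in descending order

scores = np.array([0.9, 0.4, 0.7])       # e.g. closeness, timeliness, conciseness
print(owa(scores, alpha=0.01))           # alpha -> 0 approaches MAX (0.9)
print(owa(scores, alpha=1.0))            # alpha = 1 gives the arithmetic mean
print(owa(scores, alpha=50.0))           # alpha -> inf approaches MIN (0.4)
```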

USER PREFERENCES
Our goal is to understand what makes an email more or less important to users based on closeness, timeliness, and conciseness. Here, we define the criteria we use to assess these factors and quantify them in our recommendation.
Definition 4.1 (Insider space). Suppose $E_{in}$ is a subset of $E$, indicating the insider email addresses. Here, we define insider email addresses as Enron email addresses, and outsider email addresses as non-Enron email addresses. We define the insider space $\mathbb{I} = E_{in} \times G$ and a function over $\mathbb{I}$.
A flow list is a key concept for online email re-ranking because it enables online training updates. Each item in the flow list corresponds to an email in the set $G$ that is either sent or received by the email address $a_u$, and we predict a priority for each new item in the flow list to re-rank the emails. Rather than re-training the model on a new large-scale dataset, we can add the new data to the flow list and train on them together.
Definition 4.3 (Job level). We introduce a surjective function $J$ that maps $a_i \in E$ to $l \in L$, in which $L$ is the set of job levels. For each $a_i \in E$, we have $l_i = J(a_i)$.
We treat the email re-ranking problem as a recommendation problem that has two stages: candidate generation and ranking. However, our re-ranking problem differs from typical recommendation problems in three main ways:
• The set of candidate emails changes dynamically as new emails arrive or old ones are replied to.
• We need to balance different and sometimes conflicting criteria to achieve optimal satisfaction for the user.
• There is no universal formula that captures all users' email ranking preferences, because they may vary depending on the context and mood of the user. Figure 2 illustrates this.
For a given email address $a_u$, we compute a set of candidates $Q = \{q_1, q_2, \ldots, q_n\}$. Note that $Q$ changes as time flows. Our goal is to score these candidates with an aggregation function and update the function online with different users' choices. Sorting the resulting scores yields the ranking of candidates.

Key Concepts
In this paper, we use three key concepts: closeness, timeliness, and conciseness. We define and measure them in the following subsections.
4.1.1 Closeness. Closeness represents the relationship between users. People may prefer to reply to those who are closer to them when they prioritize their emails. We distinguish between insider closeness and outsider closeness. Insider closeness captures the relationship between Enron employees, based on their relative job level. Outsider closeness reflects the relationship between Enron employees and people from other organizations.
Quantifying closeness: We adopt two criteria to quantify closeness between two email addresses $a_i$ and $a_j$: (1) their previous email history frequency, and (2) their business relationship. To capture the dynamics in previous email frequency, we adopt a sliding time window $T_\Phi$. We first define the frequency $f_{(t_a, t_b)}(a_i, a_j)$ as the number of emails from $a_i$ to $a_j$ between $t_a$ and $t_b$. Suppose $a_i$ is the sender and $a_j$ is the recipient; then at timestamp $t_k$, the to-frequency is $f_{to} = f_{(t_k - T_\Phi,\, t_k)}(a_i, a_j)$ and the from-frequency is $f_{from} = f_{(t_k - T_\Phi,\, t_k)}(a_j, a_i)$. To quantify the previous email history frequency $F$ at $t_k$, we apply a weighting to the to-frequency and the from-frequency.
• Insider closeness. Insider closeness $\Phi_1$ is defined when the sender and receiver are in the same company. In this case, we include a job-level ratio in the measure: letting the job levels of sender $a_i$ and recipient $a_j$ be $l_i$ and $l_j$, respectively, the insider closeness combines the frequency $F$ with the ratio of $l_i$ to $l_j$.
• Outsider closeness. Outsider closeness $\Phi_2$ is defined when the receiver is from a different company than the sender; it combines the frequency $F$ with the social distance $d(a_i, a_j)$ between $a_i$ and $a_j$, which we further describe in the section below.
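The frequency component can be sketched as follows; the window length and the equal to/from weighting are illustrative assumptions (the paper treats them as parameters), and the flow list is simplified to (sender, recipient, timestamp) tuples:

```python
from datetime import timedelta

def window_frequency(history, sender, recipient, now, window_days=30):
    """Count emails from `sender` to `recipient` within the sliding window
    [now - T_phi, now]. `history` is a list of (sender, recipient, timestamp)
    tuples standing in for the email flow list; the 30-day window length is
    an illustrative assumption."""
    start = now - timedelta(days=window_days)
    return sum(1 for s, r, t in history
               if s == sender and r == recipient and start <= t <= now)

def history_frequency(history, a_i, a_j, now, w_to=0.5, w_from=0.5):
    """Weighted combination of to-frequency and from-frequency. The equal
    weighting is a placeholder for the paper's weighting parameters."""
    f_to = window_frequency(history, a_i, a_j, now)
    f_from = window_frequency(history, a_j, a_i, now)
    return w_to * f_to + w_from * f_from
```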
4.1.2 Timeliness. Timeliness indicates how long it has been since the last email and affects the urgency of a reply. Usually, people tend to reply sooner to emails they received earlier [30]. We also account for cases where people send follow-up emails.
Quantifying timeliness: Timeliness quantifies the urgency of replying to an email and, in turn, contributes to the preference over email replying priority. Suppose for sender $a_i$ and recipient $a_j$ we take the subset of the chatting history $h_i$ that involves only $a_j$; call it $H = \psi(h_i)$, in which $\psi$ is a filter. Then the timestamps at which $a_i$ sent emails to $a_j$ form the set $T_{send}$, and the timestamps at which $a_i$ received emails from $a_j$ form the set $T_{recv}$. Timeliness consists of two aspects: reply timeliness and follow-up timeliness, and we estimate a score for each. Suppose the current timestamp is $t_k$. For reply timeliness, the score is $\Xi_r = t_k - \max(T_{recv})$. For follow-up timeliness, the score is $\Xi_f = t_k - \max(T_\Xi, \max(T_{send}))$, in which $T_\Xi$ is the time window for follow-up timeliness. We apply weights to $\Xi_r$ and $\Xi_f$ to form the timeliness score $\Xi$.
4.1.3 Conciseness. Conciseness measures the ratio of useful information in an email.
Quantifying conciseness: The ratio of stop-words helps us approximate how much useful information is in an email. Thus we quantify conciseness as the ratio of non-stop-words: if the content $c_k$ at time $t_k$ contains $n$ words of which $n_s$ are stop-words, then the conciseness is $\Upsilon = 1 - n_s/n$.
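Conciseness is the simplest of the three scores to compute; a minimal sketch, assuming a small illustrative stop-word list rather than the one used in the paper:

```python
# Conciseness as the fraction of non-stop-words, following the definition
# above. The stop-word list here is a tiny illustrative subset.
STOP_WORDS = {"the", "a", "an", "and", "or", "to", "of", "in", "is", "it"}

def conciseness(content: str) -> float:
    words = content.lower().split()
    if not words:
        return 0.0
    n_stop = sum(1 for w in words if w in STOP_WORDS)
    return 1.0 - n_stop / len(words)

print(conciseness("please send the report to me"))   # 1 - 2/6 ~ 0.667
```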

Relationships Between Multiple Objectives
The heatmaps in Figure 2 illustrate how the different criteria relate to each other for different users. To construct these heatmaps, we consider seven distinct scores, namely the insider score, outsider score, length of email content, effective length ratio of email content, receiving time, number of 2-paths, and the corresponding replying priority ranking. These scores are calculated for each email within the flow list of a user $a_u$. Subsequently, we compute the correlation coefficients between the aforementioned objectives for each user.
The graphs show how user behaviors changed before and after the Enron Scandal. We analyzed how different types of users prioritized replying to emails of different types over time. For example, Kenneth Lay, the CEO of Enron, replied more quickly to outsider emails after the scandal broke out. On the other hand, Kim Ward, a Trader Manager, delayed responding to outsider emails. Marie Heard, an Enron lawyer, maintained a similar pattern of replies before and after the scandal. However, her boss, Stephanie Panus, responded less promptly to insider emails.
Our analysis reveals two noteworthy patterns. First, the attributes we use vary in how much they reveal and how well they match different users' needs, so there is no one-size-fits-all model for user behavior. Second, the relationships among attributes can shift over time depending on external events. For example, after the Enron Scandal, some users kept their email habits unchanged, while others changed their preferences drastically.
These observations motivate our approach. First, since the attributes represent different aspects of satisfaction that may conflict with each other, simply adding them up with fixed weights will not work well; we need a flexible model that can adapt to each user's situation. Second, we assume that unpredictable factors can affect users' preferences significantly over time. Hence, an online algorithm that can adjust to changing conditions is essential for re-ranking emails effectively.
Based on these observations, we develop an MRAC-based symmetric aggregation method to address these challenges.

MOSR DETAILS

5.1 Candidate Set Construction
5.1.1 Graph construction and social distance calculation. In outsider closeness, we introduced the distance $d(a_i, a_j)$, the social distance between $a_i$ and $a_j$. To calculate social distance, we first establish a social network graph between email addresses.
Definition 5.1. For two email addresses $a_i$ and $a_j$, suppose the number of emails $a_i$ sent to $a_j$ is $N_{a_i \to a_j}$, while the number of emails $a_j$ sent to $a_i$ is $N_{a_j \to a_i}$. If $N_{a_i \to a_j}$ and $N_{a_j \to a_i}$ satisfy the pre-defined restriction, we establish an edge between $a_i$ and $a_j$. Note that more edges appear over time.
Here, $\theta_1$, $\theta_2$, $\theta_3$ are parameters that we set in the experiments. After computing the graph between email addresses, we measure the social distance $d(a_i, a_j)$. We adopt two options to calculate it: the first is the shortest-path distance between $a_i$ and $a_j$, and the second is the number of 2-paths between $a_i$ and $a_j$ [27]. We further discuss these two options in the experiments. Suppose an email is sent by $a_i$ to $a_j$ at time $t_k$; then:
• Unanswered emails in the inbox. At timestamp $t_k$, $a_i$ is added to the candidate set of $a_j$, denoted $P_{a_j}$. If $a_j$ sends $a_i$ an email in $[t_k, t_k + T]$ or the current timestamp reaches $t_k + T$, it is removed from $P_{a_j}$.
• Follow-up emails after no response. Denote the candidate set of $a_i$ as $P_{a_i}$. On the second day after $t_k$, $a_j$ is added to $P_{a_i}$. If $a_j$ sends $a_i$ an email in $[t_k, t_k + T]$ or the current timestamp reaches $t_k + T$, it is removed from $P_{a_i}$.
We compare the predicted candidate set $P$ to the true candidate set $Q$ of emails sent by the users. Since the candidate set may not cover the whole true set, we further analyze the composition of the undiscovered candidates in $Q - P$. We found that around 35.5% of these undiscovered candidates have mutual connections with the sender. However, as the graph grows larger, adding more mutual connections as candidates lowers the precision. We also explored the impact of adding carbon-copy recipients to the candidate set but found no significant effect. The analysis of mutual connections is included in the Appendix for further exploration.
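Since the social distance drives the outsider closeness criterion, here is a minimal sketch of the graph construction and the 2-path option. The bidirectional-traffic threshold is one reading of the pre-defined edge restriction above, with `theta` as an illustrative placeholder for the actual parameters:

```python
from collections import defaultdict

def build_graph(send_counts, theta=1):
    """Build the social graph. `send_counts[(a, b)]` is the number of emails
    a sent to b; we add an undirected edge when traffic in both directions
    meets the threshold `theta` (an assumed form of the restriction)."""
    adj = defaultdict(set)
    for (a, b), n_ab in send_counts.items():
        if n_ab >= theta and send_counts.get((b, a), 0) >= theta:
            adj[a].add(b)
            adj[b].add(a)
    return adj

def two_paths(adj, a, b):
    """Number of 2-paths a-x-b, i.e. common neighbours of a and b;
    this is the second social-distance option."""
    return len(adj[a] & adj[b])

adj = build_graph({("u", "v"): 3, ("v", "u"): 2,
                   ("u", "w"): 1, ("w", "u"): 1,
                   ("v", "w"): 2, ("w", "v"): 1})
print(two_paths(adj, "u", "v"))   # u and v share neighbour w -> 1
```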

Loss
Our prediction process consists of two steps: generating a candidate set and re-ranking it. Given that our predicted candidate set may not align precisely with the ground truth, we calculate how much our prediction $\mathbf{y}$ deviates from the true ranking $\mathbf{y}_m$ as follows.
Definition 5.2. We define our candidate set $P$ as a set of emails, $P = \{p_1, p_2, \ldots\}$, and our prediction $\mathbf{y}$ as the ranking of our candidates. For a given algorithm $\Omega$, the predicted ranking is $\mathbf{y} = [r_\Omega(p_1), r_\Omega(p_2), \ldots]^\top$, in which $r_\Omega(p_i)$ represents the predicted rank of $p_i$. Suppose the predicted score for a candidate $p$ is $\Omega(p)$; then $r_\Omega(p) = i$ such that $\Omega(p) = \Omega(\mathbf{e})_{\sigma(i)}$. Here, $\Omega(\mathbf{e})$ is the vector of predicted scores for all candidates under algorithm $\Omega$, and $\Omega(\mathbf{e})_{\sigma(i)}$ follows Definition 3.1.
Definition 5.3. Suppose our predicted candidate set and ranking are $P$ and $\mathbf{y}$, and the ground-truth candidate set and ranking are $Q$ and $\mathbf{y}_m$. Then, for algorithm $\Omega$, the loss between $\mathbf{y}$ and $\mathbf{y}_m$ comes from two parts: the difference in ranking over the discovered candidates $P \cap Q$, and a penalty for the undiscovered candidates $Q - P$. We combine the two parts into a single loss function (Equation 5), where $\gamma$ is the parameter we use to weight the cover rate of our predicted candidate set. We define $\mathbf{y}_r$ as the ranking results of MOSR and $\mathbf{y}_1, \mathbf{y}_2, \ldots, \mathbf{y}_n$ as the ranking results of algorithms $\Omega_1, \Omega_2, \ldots, \Omega_n$.
When $\gamma = 0$, the loss between $\mathbf{y}$ and $\mathbf{y}_m$ considers only their rankings. As $\gamma$ grows larger, the cover rate of the candidate set matters more.
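A minimal sketch of a loss in this spirit follows; the exact functional form of Equation (5) is not reproduced here, so the rank-disagreement term and the $\gamma$-weighted cover-rate penalty below are illustrative assumptions:

```python
import numpy as np

def ranking_loss(pred_rank, true_rank, true_set, gamma=0.5):
    """Illustrative loss in the spirit of Definition 5.3: a rank-disagreement
    term over discovered candidates plus a gamma-weighted penalty for the
    fraction of true candidates never generated.
    pred_rank / true_rank: dicts mapping a candidate to its rank position."""
    common = [q for q in true_set if q in pred_rank]
    rank_term = (np.mean([abs(pred_rank[q] - true_rank[q]) for q in common])
                 if common else 0.0)
    miss_rate = 1.0 - len(common) / max(len(true_set), 1)
    return rank_term + gamma * miss_rate

pred = {"q1": 0, "q2": 1}                 # q3 was never discovered
true = {"q1": 1, "q2": 0, "q3": 2}
print(ranking_loss(pred, true, true_set=true.keys(), gamma=0.5))
# mean(|0-1|, |1-0|) = 1.0, plus 0.5 * (1/3 missed) ~ 1.167
```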

Training Process
To generate a ranking from email flows, we use the following steps:
(1) Step 1: Use the RIM quantifier $q$ to generate OWA weights and construct multiple OWA operators $\Omega_i$.
(2) Step 2: Construct a weighted sum aggregator over the generated OWA operators. The weight for $\Omega_i$ is $v_i$, with $\sum_{i=1}^{k} v_i = 1$.
(3) Step 3: Construct the graph and update the candidate set $P$.
(4) Step 4: Calculate the scores and rank the candidates, with score $\mathbf{y}_r = \sum_{i=1}^{k} v_i \mathbf{y}_i$.
(5) Step 5: Compute the loss of the results in Step 4, update the weights with MRAC, and go back to Step 3.
To reduce the loss with MRAC, we adjust the weights of each OWA operator. Unlike traditional machine learning methods, MRAC aims to match the current system and minimize the gap between the predicted output and the actual output. We believe that email reply preferences may vary and are not fixed. Therefore, we need an algorithm that can track the user's dynamics, so that it can continuously optimize the parameters to adapt to uncertainty.
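As a sketch of Steps 4-5 under simplifying assumptions (squared-error tracking of a reference score vector, rather than the paper's Equation (5)), one adaptation step might look like this:

```python
import numpy as np

def mosr_step(owa_scores, reference_scores, v, lr=0.99):
    """One simplified MRAC-style adaptation step (a sketch, not the exact
    update in the paper). owa_scores: (n_candidates, k) matrix with one
    column per OWA operator. reference_scores: target scores implied by the
    user's observed choices (playing the role of the reference model's
    output). v: current weighted-sum aggregator weights.
    We take a gradient step on the squared error between predicted and
    reference scores w.r.t. v -- one simple instance of direct adaptation."""
    pred = owa_scores @ v                       # Step 4: weighted-sum scores
    grad = owa_scores.T @ (pred - reference_scores) / len(pred)
    v = np.clip(v - lr * grad, 0.0, None)       # Step 5: MRAC weight update
    return v / v.sum()                          # keep the weights summing to 1
```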
The proof of Theorem 5.4 is in the Appendix.

EXPERIMENTS

6.1 Dataset
In this paper, we use the Enron email dataset, which consists of 500K emails sent by 1K employees of the Enron Corporation. To better understand their priority preferences, we extract the job titles of 200 key employees from the web. We filter the dataset to include only emails from 1999 to 2002 and show its statistics in Figure 3. We organize the dataset as a time flow to mimic a real-world email recommendation system. The data for 1999 is incomplete, so we exclude it from our analysis; as Figure 3b shows, email activity was continuous in 2000 and 2001, with at least one email sent or received every day.

Methods Compared
We evaluate different baselines for ranking email messages: Logistic Regression (MS-LR), AdaBoost (MS-ADA) [30], four rankers based on ordered weighted averaging (OWA), a simple time-based ranker, and our proposed method MOSR. Table 2 shows the average loss of each method, which measures how well it matches user preferences. The OWA-based rankers have lower losses than MS-LR and MS-ADA, confirming that symmetric aggregation is better than asymmetric weighting. Furthermore, MOSR achieves the lowest loss among all methods and adapts to changing user preferences during re-ranking. Figures 4-5 compare the daily losses of our method and the baselines. The figures show a noticeable increase in loss for the baselines that use MSFT features in late 2001, when the Enron scandal broke out. This matches the observed shift in sentiment towards Enron at that time, as reported by [25]. Figures 4-5 also reveal the variability of user preferences over time. In later sections, we show how MOSR copes with such variations and maintains stable performance regardless of external factors affecting user preferences. We provide full results in the Appendix due to space limitations.

Non-stationary check
We use Figure 2 to show that many users have changing preferences over time. In this section, we compare how user preferences change as a group before and after the Enron scandal. Figures 6a and 6b show that the correlation factors between user preferences change significantly after the scandal. To quantify this change, we calculate the non-stationary behavior coefficient $\beta = \|A - B\|_2^2$, where $A$ and $B$ are the coefficient matrices of two subsets of the data. Figure 6c shows how $\beta$ changes between two pairs of dates: 01/01/00 vs. 03/01/00 and 09/01/01 vs. 12/01/01. We see that $\beta$ increases after the scandal, which means that user preferences are more affected by external events like the scandal. To separate users with stable and unstable preferences, we set a threshold of $\beta = 1$: users with $\beta < 1$ have stable preferences, while users with $\beta \geq 1$ have unstable ones. Figure 6d shows that there are more users with unstable preferences after the scandal.
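For concreteness, here is a minimal sketch of computing $\beta$ from two periods of per-email objective scores; the norm choice is our assumption, since $\|\cdot\|_2$ in the text could be read as either the Frobenius or the spectral norm:

```python
import numpy as np

def nonstationarity(scores_before, scores_after):
    """Non-stationary behavior coefficient beta = ||A - B||^2, where A and B
    are the correlation matrices of the objective scores in two time periods.
    We use the squared Frobenius norm here (an assumption). Inputs:
    (n_emails, n_objectives) arrays for the two periods."""
    A = np.corrcoef(scores_before, rowvar=False)
    B = np.corrcoef(scores_after, rowvar=False)
    return float(np.linalg.norm(A - B, ord="fro") ** 2)

# Following the threshold in the text: beta < 1 -> stable preferences,
# beta >= 1 -> unstable preferences.
```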
Our algorithm outperforms the baselines because it adapts to unstable user preferences better than they do. The loss and NDCG metrics reflect how well an algorithm matches user preferences: higher loss means lower match quality, while higher NDCG means higher match quality. Figures 7a and 7b show that the loss increases and NDCG decreases as $\beta$ increases for both our algorithm and the baselines, but our algorithm changes less than the baselines do. This means that our algorithm is less affected by preference changes.
To explain why MOSR is better at adapting to preference changes than the baselines, we conduct an additional experiment to study how non-stationary behavior affects loss. Figures 7a and 7b show how loss and NDCG vary with $\beta$: when $\beta$ gets larger, MOSR's loss rises more slowly than its competitors' (Figure 7a) and its NDCG falls more slowly (Figure 7b). This suggests that MOSR can adjust to preference changes faster than its competitors.

Robustness check
To test the robustness of our method, we reduced the size of the dataset by random sampling. Figure 8 shows how our method performed in these experiments. The results confirm that MOSR, which uses a symmetric aggregator with MRAC control, is better than the weighted sum aggregators (such as MS-LR and MS-ADA) at adapting to changing user preferences and maintaining stable performance in the email re-ranking problem. Reducing the size of the dataset increases its variance and makes it more non-stationary. The experiments demonstrate that MOSR can effectively balance the trade-off between stability and adaptability under non-stationary behavior, resulting in more robust performance than the baselines. We discuss two hyper-parameters: the learning rate and the distance measurement. The best learning rate is 0.99 and the best distance measurement is the number of 2-paths. Due to space limitations, the discussion of the loss parameter $\gamma$ is in the Appendix.

Complexity Analysis
In this section, we analyze the complexity of our algorithm versus the baselines. Suppose there are $n$ email objects, $m$ email addresses, and $p$ features. In our model, each OWA operator can be regarded as an estimator; suppose the number of estimators in our model and in the AdaBoost model is $k$. In Table 3, $O(F)$ refers to the complexity of feature construction. For MSFT-Logistic Regression (MS-LR) and MS-AdaBoost (MS-ADA) [30], computing the global features HistIndiv and HistPair costs $O(n)$ and $O(n^2)$, respectively, so feature construction for the MS-methods is $O(F) = O(n + n^2) = O(n^2)$. For the plain LR and ADA models, the training complexities are $O(p^2)$ and $O(kp)$; hence the complexities of MS-LR and MS-ADA are $O(n^2 + p^2)$ and $O(n^2 + kp)$. When adding a new data point, MS-LR needs to recompute the whole model while MS-ADA only re-estimates the weights over its estimators, so their update complexities are $O(n^2 + p^2)$ and $O(n^2 + k)$, respectively. For the OWA algorithm, computing the closeness feature requires the social distance: if we adopt the shortest-path distance, the complexity is $O(F) = O(m^2)$; if we adopt the number of 2-paths, it becomes $O(F) = O(m)$. Hence, the complexity of OWA is $O(F + n)$. When one data point is added to OWA, we do not need to re-train on the previous data, but we do need to update the global features, so the complexity is $O(F + 1)$.
For the time-based ranker, we only compare the receiving/sending times of the email entities and decide the ranking based on the time feature, so the complexity is $O(n)$.
For our model, we adopt OWA operators as estimators and use MRAC to train. Our training complexity is therefore $O(F + kn)$, where $O(F)$ is determined by the social distance option, either $O(m^2)$ or $O(m)$. When adding a new data point, the complexity is $O(F + k)$.

CONCLUSION
In this paper, we addressed the email re-ranking problem as a recommendation task based on three criteria: closeness, timeliness, and conciseness. We argued that these criteria reflect user satisfaction and thus cannot be combined by simple weighted sums. We designed MOSR (Multi-Objective Stationary Recommender), an online algorithm that uses MRAC (Model Reference Adaptive Control) to dynamically balance the criteria and adapt to preference changes. We evaluated MOSR on the Enron Email Dataset, a large-scale real-world collection of emails, and showed that it outperforms other baselines in terms of ranking quality, especially under non-stationary preferences. We also demonstrated that MOSR is robust to high variance in email characteristics and does not require content analysis, which could raise privacy concerns. Our work contributes to the field of email re-ranking by proposing a novel method that accounts for multiple objectives impacting user satisfaction and adapts to changing user needs over time.

A.4 Hyper-Parameters

In this section, we discuss two important hyper-parameters: the learning rate $\eta$ and the distance measurement method. $\eta$ serves as the learning rate in MRAC methods, while the distance measurement method is used to estimate the social distance $d(a_i, a_j)$ between two users. The distance measurement method is crucial as it directly relates to the key objective of our approach, the calculation of closeness.
The parameter tuning results are in Table 6. Based on the tuning results, we choose $\eta = 0.99$ and the number of 2-paths as the distance measurement method.

A.5 Robustness Check
In this section, we present a more comprehensive analysis of the robustness of our approach. Figure 12 clearly demonstrates that the results for the OWA methods are more stable than those for the MSFT methods. Additionally, our algorithm, MOSR, exhibits even greater stability than the other OWA-related results. These results align with our conclusions that: (1) the email re-ranking problem involves conflicting satisfaction criteria, which makes weighted sum aggregators an ineffective solution; and (2) the use of an adaptive control model can further enhance the stability of OWA algorithms.

Figure 1: (a) The flow chart of MRAC, consisting of the reference model, process model, controller, and adaptation algorithm. (b) The architecture of MOSR. The detailed training process is in Section 5.3.

Figure 2: The upper graphs show user behaviors before the Enron Scandal happened, while the lower graphs show behaviors after it happened.

Figure 3: Email activity in the Enron Dataset by year. (a) The number of emails sent and received each year; the data for 1999 is incomplete, so we exclude it from our analysis. (b) The continuity of email activity each year, i.e., the percentage of days in a year when at least one email was sent or received. The activity rate was 100% in 2000 and 2001, indicating daily email communication.

Figures 4 and 5: Cumulative loss curves by date on EnronA and EnronB. Each curve represents the sum of losses from the start of the period until a given date. Both datasets show steeper slopes after October 2001, meaning that losses grew faster in the subsequent months; this coincides with the Enron Scandal.

Figure 6: Comparison of users' preferences before/after the Enron scandal, showing a preference shift due to external factors.

Figure 7: Scatter plots showing how non-stationary user preferences impact performance. (a) Non-stationary behavior coefficient $\beta$ vs. loss: as user behavior becomes more non-stationary, our loss grows more slowly than the baselines'. (b) $\beta$ vs. NDCG: as user behavior becomes more non-stationary, our NDCG drops more slowly than the baselines'. Each subfigure has an enlarged version on the right side, providing a closer look at the data.
Figure 8: Robustness check on the two down-sampled datasets, (a) EnronA and (b) EnronB. The results indicate the stability of the algorithm, demonstrating its stationarity.

Figure 10: Analysis of the candidate set.

Figure 11: Cumulative loss curves by date with different $\gamma$ over EnronA and EnronB.

Table 1: Notations used in this paper.