Fairness-Driven Private Collaborative Machine Learning

The performance of machine learning algorithms can be considerably improved when trained over larger datasets. In many domains, such as medicine and finance, larger datasets can be obtained if several parties, each having access to limited amounts of data, collaborate and share their data. However, such data sharing introduces significant privacy challenges. While multiple recent studies have investigated methods for private collaborative machine learning, the fairness of such collaborative algorithms has been overlooked. In this work, we suggest a feasible privacy-preserving pre-process mechanism for enhancing fairness of collaborative machine learning algorithms. An extensive evaluation of the proposed method shows that it is able to enhance fairness considerably with only a minor compromise in accuracy.


Introduction
The performance of machine learning (ML) models inherently depends on the availability of a large quantity of useful training data. In neural network learning, for example, recent studies have shown that accuracy can be improved considerably by having access to large datasets (Krizhevsky, Sutskever, and Hinton 2012). In many applications, however, data is scattered and held by multiple different parties that may be reluctant to share their information for multiple reasons, such as commercial competition, privacy concerns, and in some domains even legal constraints. For example, the Health Insurance Portability and Accountability Act (HIPAA) in the United States places strict constraints on the ability of health care providers to share patient data (U.S. Department of Health and Human Services 1996).
As a result, a variety of methods have been recently proposed to allow multiple parties to collaboratively train ML models while preventing the disclosure of private information. Hereinafter, we refer to this setting as Private Collaborative Machine Learning (PCML). While these methods address a very similar problem, they are often associated with different (though overlapping) research domains including privacy-preserving data mining (Lindell and Pinkas 2000; Agrawal and Srikant 2000), privacy-preserving machine learning (Jayaraman et al. 2018), collaborative machine learning (Chase et al. 2017; Hu et al. 2019b), and federated machine learning (Yang et al. 2019).
These methods take two main approaches for preserving privacy. The first approach is based on perturbation (Agrawal and Srikant 2000); it adds noise to the training data in order to obscure sensitive information. The main limitation of this approach is that adding noise to the data may yield an inferior model. The second approach is based on Secure Multi-party Computation (SMC) (Lindell and Pinkas 2000). SMC achieves privacy protection by applying cryptographic techniques that enable several parties to perform a joint computation on private data that is distributed among them, where only the outcome of the computation is revealed to the parties, but no other information is exposed. In contrast to perturbation, SMC does not change the data, and therefore the output issued by such algorithms is identical to the output of their non-privacy-preserving counterparts. However, since in many cases a generic and perfectly secure SMC solution is infeasible (Domingo-Ferrer and Blanco-Justicia 2020), lower security requirements are typically accepted for the sake of higher efficiency (Du, Han, and Chen 2004).
Despite the growing number of studies proposing algorithms for PCML, the fairness of such algorithms has been overlooked. Since many automated decisions (including which individuals will receive jobs, loans, medication, bail or parole) can significantly impact people's lives, there is great importance in assessing and improving the fairness of the decisions made by such algorithms. Indeed, in recent years, the concern for algorithmic fairness has made headlines. The evidence of algorithmic bias (see e.g. Angwin 2016) and the resulting concerns for algorithmic fairness have led to growing interest in the literature on defining, evaluating and improving fairness in ML algorithms (see a recent review in Pessach and Shmueli 2020). All of these studies, however, focused on centralized settings.
In this paper, we consider a learning setting similar to the PCML setting described above, where data is scattered among several parties who wish to engage in a joint ML procedure, without disclosing their private information. Our setting, however, also adds a fairness requirement, mandating that the learnt model satisfies a certain level of fairness.
To address the new fairness requirement, we suggest a privacy-preserving pre-process mechanism for enhancing fairness of collaborative ML algorithms. Similarly to the pre-process fairness mechanism suggested in (Feldman et al. 2015), our method improves fairness through decreasing distances between the distributions of attributes in the privileged and unprivileged groups. As a pre-process mechanism, our approach is not tailored to a specific algorithm, and therefore can be used with any PCML algorithm. In contrast to (Feldman et al. 2015), our method was designed to allow privacy-preserving enhancements, which are obtained through SMC techniques.
Experimentation that we conducted over a real-world dataset shows that the proposed method is able to improve fairness considerably, with almost no compromise in accuracy. Furthermore, we show that the runtime of the proposed method is feasible, especially considering that it is executed once as a pre-process procedure.

Related Work
We organize the survey of relevant related work as follows. In Section 2.1 we survey related studies that deal with algorithmic fairness in centralized settings. In Section 2.2 we review literature on private collaborative machine learning. We conclude, in Section 2.3, with a discussion of recent research efforts on private and fair machine learning. We note that all of the latter studies focused on either a centralized setting or a distributed setting which is non-collaborative, while we consider here a distributed collaborative setting.

Algorithmic Fairness in Centralized Settings
In the algorithmic fairness literature, multiple measures were suggested. (The reader is referred to Pessach and Shmueli 2020; Verma and Rubin 2018, for a comprehensive review of fairness definitions and measures.) The most prominent measures include demographic parity (Calders and Verwer 2010; Dwork et al. 2012) and equalized odds (Hardt, Price, and Srebro 2016). Demographic parity ensures that the proportion of positive predictions is similar across groups. One disadvantage of this measure is that a fully accurate classifier may be considered unfair when the base rates (i.e., the proportions of actual positive outcomes) of the various groups are considerably different. Moreover, in order to satisfy demographic parity, two similar individuals may be treated differently since they belong to two different groups, and in some cases, such treatment is prohibited by law. In this paper we focus on a variation of the equalized odds measure. This measure was devised by Hardt, Price, and Srebro (2016) to overcome the disadvantages of measures such as demographic parity. It was designed to assess the difference between the two groups by measuring the difference between the false positive rates (FPR) in the two groups, as well as the difference between the false negative rates (FNR) in the two groups. In contrast to the demographic parity measure, a fully accurate classifier will necessarily satisfy the two equalized odds constraints. Nevertheless, since equalized odds relies on the actual ground truth, it assumes that the base rates of the two groups are representative and were not obtained in a biased manner. It is important to note that there is an inherent trade-off between accuracy and fairness: as we pursue a higher degree of fairness, we may compromise accuracy (see, for example, Kleinberg, Mullainathan, and Raghavan 2017).
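To make the contrast between the two measures concrete, the following sketch (plain Python with NumPy; the data is hypothetical) evaluates both for a toy, perfectly accurate classifier over two groups with different base rates. As noted above, such a classifier satisfies equalized odds but may violate demographic parity:

```python
import numpy as np

def demographic_parity_gap(y_pred, group):
    """Absolute difference in positive-prediction rates between the two groups."""
    return abs(y_pred[group == 0].mean() - y_pred[group == 1].mean())

def equalized_odds_gaps(y_true, y_pred, group):
    """Return (FPR gap, FNR gap) between the two groups."""
    gaps = []
    for actual, predicted in [(0, 1), (1, 0)]:  # FPR: actual=0; FNR: actual=1
        rates = []
        for g in (0, 1):
            mask = (group == g) & (y_true == actual)
            rates.append(np.mean(y_pred[mask] == predicted))
        gaps.append(abs(rates[0] - rates[1]))
    return tuple(gaps)

# Two groups with different base rates (0.25 vs 0.75); predictions are fully accurate.
group  = np.array([0, 0, 0, 0, 1, 1, 1, 1])
y_true = np.array([1, 0, 0, 0, 1, 1, 1, 0])
y_pred = y_true.copy()

print(demographic_parity_gap(y_pred, group))       # 0.5 -> unfair by demographic parity
print(equalized_odds_gaps(y_true, y_pred, group))  # (0.0, 0.0) -> fair by equalized odds
```

This illustrates why a fully accurate classifier necessarily satisfies the equalized odds constraints, while the demographic parity gap simply mirrors the difference in base rates.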
Fairness-enhancing mechanisms are broadly categorized into three types: pre-process, in-process and post-process. Pre-process mechanisms involve changing the training data before feeding it into the ML algorithm (e.g. Louizos et al. 2017; Feldman et al. 2015; Calmon et al. 2017; Zemel et al. 2013). In-process mechanisms involve modifying ML algorithms to account for fairness during training time (such as Kamishima et al. 2012; Zafar et al. 2017; Louizos et al. 2017; Agarwal et al. 2018). Post-process mechanisms perform post-processing on the output scores of the model to make decisions fairer (Corbett-Davies et al. 2017; Dwork et al. 2018; Hardt, Price, and Srebro 2016; Menon and Williamson 2018). The reader is referred to (Pessach and Shmueli 2020) for further information regarding different fairness-enhancing mechanisms. Note that the above mentioned papers focused on a centralized setting, in which all information is held by one party, and therefore did not have to deal with privacy considerations.

Private Collaborative Machine Learning
We consider a setting in which several parties wish to collaboratively train ML models on data that is distributed among them, while preventing the disclosure of private information. We refer to this setting as Private Collaborative Machine Learning (PCML). While many methods were proposed in the literature to address this setting (or a very similar one), they are often associated with different (though overlapping) research domains including privacy-preserving data mining (Lindell and Pinkas 2000; Agrawal and Srikant 2000), privacy-preserving machine learning (Jayaraman et al. 2018), collaborative machine learning (Chase et al. 2017; Hu et al. 2019b), and federated machine learning (Yang et al. 2019).
The manner in which data is distributed among the collaborating parties is typically categorized as vertical, horizontal or mixed. In a vertical distribution, each party holds all records with a different subset of attributes, whereas in a horizontal distribution each party holds a subset of the records with all attributes. The mixed scenario refers to an arbitrary partition of the data among the collaborating parties.
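For illustration, a toy dataset distributed in the two basic manners (all values and party names are hypothetical):

```python
# A toy dataset: rows are individuals, columns are attributes.
records = [
    {"id": 1, "age": 34, "income": 50, "label": 1},
    {"id": 2, "age": 29, "income": 40, "label": 0},
    {"id": 3, "age": 41, "income": 65, "label": 1},
    {"id": 4, "age": 50, "income": 70, "label": 0},
]

# Horizontal distribution: each party holds a subset of the records, all attributes.
horizontal = {"P1": records[:2], "P2": records[2:]}

# Vertical distribution: each party holds all records, a different subset of attributes.
vertical = {
    "P1": [{k: r[k] for k in ("id", "age")} for r in records],
    "P2": [{k: r[k] for k in ("id", "income", "label")} for r in records],
}
```

The method proposed in this paper assumes the horizontal case: every party holds complete rows for its own individuals.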
When analyzing the privacy preservation of a protocol, it is common to distinguish between two types of adversaries: semi-honest and malicious. A semi-honest adversary is a party that follows the protocol properly, but tries to infer sensitive information about other parties from intermediate messages received during the protocol. A malicious adversary, on the other hand, may deviate from the prescribed protocol in an attempt to learn sensitive information about others. In addition, such a malicious adversary can corrupt more than just one party, in order to increase the data at its disposal. The semi-honest model is often more realistic and very common in the literature.

Privacy and Fairness
While many recent studies have investigated algorithms for PCML, the fairness of such algorithms was overlooked. There were, however, several recent studies that incorporated both privacy and fairness considerations into ML algorithms in non-collaborative settings. Most of these studies have investigated the case of a centralized setting, in which all of the dataset is held by a single party (Jagielski et al. 2019; Xu, Yuan, and Wu 2019; Mozannar, Ohannessian, and Srebro 2020; Bagdasaryan, Poursaeed, and Shmatikov 2019; Huang et al. 2018; Cummings et al. 2019). The goal of those studies was to train an ML model over the centralized dataset, making sure that the released model and its future outputs are fair, as well as private, in the sense that one cannot infer from them (meaningful) information about individual data records of the dataset.
A few other studies have investigated a distributed setting which is non-collaborative in the following sense. Their distributed setting includes a "main" party that wishes to train a fair ML model over the data it holds, and a third party to which the sensitive attributes are outsourced. The motivation behind this setting is that while the sensitive attributes should not be exposed to the main party, they should still be used in the training process of the ML model (in a privacy-preserving manner) to ensure that the resulting model is fair. To obtain this goal, these studies used either random projections (Hu et al. 2019a; Fantin 2020) or SMC techniques (Kilbertus et al. 2018).

The Proposed Method
We propose a private pre-process fairness-enhancing mechanism based on SMC for the PCML setting. Our solution assumes that data is horizontally distributed between the parties and that adversaries are semi-honest and non-colluding.

Terminology and Notations
Let D be a dataset that is distributed horizontally among L parties, P_ℓ, ℓ ∈ [L]. (Hereinafter, if N is any integer then [N] := {1, . . ., N}.) The parties wish to engage in a fair collaborative ML classification process, without disclosing the private information that each of them holds. To that end, we develop herein a pre-process mechanism for enhancing fairness in distributed settings, and then we devise a privacy-preserving implementation of that mechanism, using SMC.
We proceed to introduce the basic notations that will be used throughout the paper. Other notations will be introduced later on. Let W represent a given population consisting of n individuals, and let A be a set of attributes or features that relate to each of those individuals. Then we consider a dataset D in which there is a row for every individual in W and a column for every attribute in A. We distinguish between three types of attributes (columns):
• S represents a sensitive attribute (e.g., race or gender). In this work we focus on a binary attribute S that attains one of two possible values, S ∈ {U, V}, where U means unprivileged and V means privileged. (By abuse of notation, we use S to denote the attribute, as well as its values in different rows of the dataset.)
• X stands for the collection of all non-sensitive attributes. To simplify our discussion, we assume that X consists of one attribute. When X consists of several non-sensitive attributes, we apply the same pre-process mechanism on each such attribute, independently. We shall also assume hereinafter that X is a numerical attribute.
• Y is the binary class attribute that needs to be predicted (e.g., "hire/no hire").
The set of rows, or W, is split in two different manners. The first split is the one induced by the sensitive attribute S. Namely, W = W^U ⊍ W^V, where W^U is the subset of all individuals in W that are unprivileged (i.e., S = U for them) and W^V is the subset of all individuals in W that are privileged (S = V). The other manner in which W is split is according to the distribution of the records of D among the L parties. Namely, W = ⊍_{ℓ∈[L]} W_ℓ, where W_ℓ is the subset of individuals whose information is held by the party P_ℓ. Finally, we let n^S_ℓ denote the size of W^S_ℓ := W^S ∩ W_ℓ, namely, the number of individuals in W^S whose records are held by P_ℓ:

n^S_ℓ := |W^S ∩ W_ℓ|, S ∈ {U, V}, ℓ ∈ [L]. (1)

We shall adopt these superscript and subscript conventions hereinafter. Namely, a superscript S will denote a restriction to the subset W^S, S ∈ {U, V}; a subscript ℓ will denote a restriction to the subset W_ℓ, ℓ ∈ [L]; a combination of the two will denote a restriction to W^S_ℓ; and no superscript S nor subscript ℓ relates to the entire population W.
In addition, we let D(X) denote the collection of all values appearing in the X-column of D. Similarly, D^S(X), D_ℓ(X) and D^S_ℓ(X) denote the collection of all values in the X-column of D, restricted to the rows in W^S, W_ℓ or W^S_ℓ, respectively. We assume that D^S_ℓ(X), for any S and ℓ, are multisets; namely, they may contain repeated values.
The rest of Section 3 is organized as follows. In Section 3.2 we introduce our pre-process fairness-enhancing mechanism. We do that for a centralized setting, in which all information is held by one party (i.e., L = 1); in such a setting privacy is irrelevant. Then, in Section 3.3 we show how to implement that fairness mechanism in the (horizontally) distributed setting, L > 1, in a manner that offers privacy to the interacting parties P_ℓ, ℓ ∈ [L].

A Pre-Process Fairness-Enhancing Mechanism
Inspired by Feldman et al. (2015), we devise a methodology that improves fairness through decreasing distances between the distributions of attributes of the privileged and unprivileged groups, in a pre-process stage. The goal in performing such a repair is to reduce the dependency of the ML model on the group to which an individual belongs, even when that dependency is not a direct one but rather an indirect one through proxy variables. That is, we reduce the ability to differentiate between groups using the presumably legitimate non-sensitive variables. In contrast to Feldman et al. (2015), our method is designed specifically to be suitable for privacy-preserving enhancements (see Section 3.3).
To illustrate the intuition behind the proposed method, consider the example depicted in Figure 1. The figure illustrates a case in which SAT scores of individuals are used to predict their success (or failure) in a given job. The plot on the left shows the distribution of SAT scores within the two sub-populations in the dataset: the privileged group and the unprivileged group. The plot on the right shows the job success rate as a function of the SAT score, within each of those two sub-populations.
In this example, SAT scores may be used to predict success in a given job, since the higher the SAT score is, the higher is the probability of success. However, relying solely on the SAT scores, while ignoring the group to which the candidates belong, may create an undesired bias. More specifically, as can be seen from the figure, unprivileged candidates with SAT scores of approximately 1100 perform just as well as privileged candidates with SAT scores of 1200 (possibly because they encountered harder challenges along the way to achieving their scores). Therefore, if SAT scores were used for hiring, say by just placing a threshold, unprivileged candidates with high potential would be excluded, whereas lower potential privileged candidates would be hired instead. The goal of the pre-process mechanism that we suggest herein is to rectify this bias.
For each group S ∈ {U, V}, we first partition the multiset of values in D^S(X) into (nearly) equal-sized bins of sequential values. Namely, if we let B denote the number of bins, then we partition D^S(X) into D^S(X) = ⊍_{i=1}^{B} b^S_(i), where the following two conditions hold: (a) for all i ∈ [B − 1], β ∈ b^S_(i) and β′ ∈ b^S_(i+1), we have β ≤ β′; and (b) the bins are of (nearly) equal sizes, in the sense that the sizes of the bins (viewed as multisets) are either ⌊n^S/B⌋ or ⌈n^S/B⌉. These two conditions simply state that the resulting bins are the B-quantiles of D^S(X). Below we provide the precise manner in which those bins/quantiles are computed.
After completing the binning process, we compute the minimal value in each of the B bins, for each group S ∈ {U, V},

m^S_(i) := min{b^S_(i)}, i ∈ [B], S ∈ {U, V}, (2)

and also the maximal value in the last bin,

m^S_(B+1) := max{b^S_(B)}, S ∈ {U, V}. (3)

Those values enable us to perform a repair of D^V(X), as follows: for every bin b^V_(i) we shift all values in it "towards" the values in the corresponding bin in D^U(X), according to the following repair rule:

x̄ = (1 − λ) · x + λ · [ m^U_(i) + (x − m^V_(i)) · (m^U_(i+1) − m^U_(i)) / (m^V_(i+1) − m^V_(i)) ]. (4)

Here, x is an original value extracted from the bin b^V_(i), while x̄ is its repaired value. This repair procedure represents a linear mapping that performs min-max scaling of the values in each bin of the privileged group to the range of values in the corresponding bin in the unprivileged group. The computation brings the distributions of both groups closer.
The repair tuning parameter λ ∈ [0, 1] controls the strength of the repair. If λ = 0 we perform no repair, since then x̄ = x. If λ = 1 then we get a full repair, since then all values in b^V_(i) are replaced with values in the range of the corresponding bin b^U_(i), while keeping a distribution similar to the original one in b^V_(i). Note that if all values in b^V_(i) are equal, then both the numerator and the denominator in the fraction on the right hand side of Eq. (4) are zero. In such a case we interpret that fraction as 1/2. Such a setting implies that all the (equal) values in b^V_(i) will be mapped to the same value in the middle of the range of the corresponding bin b^U_(i). At the completion of the repair pre-process procedure, an ML model is trained with the repaired dataset.
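A minimal sketch of the repair rule in plain Python follows. As a simplifying assumption, the endpoints of the privileged bin are taken as the bin's own minimum and maximum, and the target endpoints of the unprivileged bin are passed in explicitly; the degenerate case of a constant bin is handled by the 1/2 convention described above:

```python
def repair_bin(values_V, m_U_lo, m_U_hi, lam):
    """Min-max scale the privileged-group bin values into the range
    [m_U_lo, m_U_hi] of the corresponding unprivileged-group bin,
    interpolated by the repair tuning parameter lam (cf. Eq. (4))."""
    lo_V, hi_V = min(values_V), max(values_V)
    repaired = []
    for x in values_V:
        if hi_V == lo_V:
            frac = 0.5          # constant bin: map to the middle of the target range
        else:
            frac = (x - lo_V) / (hi_V - lo_V)
        x_bar = (1 - lam) * x + lam * (m_U_lo + frac * (m_U_hi - m_U_lo))
        repaired.append(x_bar)
    return repaired

# Full repair (lam = 1): a privileged bin [1200, 1250, 1300] is mapped
# into the unprivileged bin range [1100, 1150].
print(repair_bin([1200, 1250, 1300], 1100, 1150, 1.0))  # [1100.0, 1125.0, 1150.0]
```

With lam = 0 the values are returned unchanged, matching the no-repair case x̄ = x.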
To conclude this section, we provide the formal details of the above-described binning scheme. Let us first order all values in D^S(X), the size of which is n^S, in a non-decreasing manner, as follows:

x_(1) ≤ x_(2) ≤ ⋯ ≤ x_(n^S). (5)

Next, we define a set of indices K^S = {k_(0), k_(1), . . ., k_(B)} in the ordered multiset D^S(X) in the following manner. Let q^S and r^S be the quotient and remainder, respectively, when dividing n^S by B; i.e., n^S = q^S · B + r^S, where r^S ∈ [0, B). Then the sequence of indices k_(i) is defined by the following equation,

k_(0) = 0; k_(i) = i · q^S + min{i, r^S}, i ∈ [B]. (6)–(7)

It is easy to verify that k_(B) = n^S. Finally, the bins in D^S(X), Eq. (5), are defined by K^S as follows:

b^S_(i) = {x_(k_(i−1)+1), . . ., x_(k_(i))}, i ∈ [B]. (8)

We see, in view of Eq. (8), that the size of the first r^S bins is q^S + 1, while all other bins are of size q^S. Moreover, m^S_(i) = min{b^S_(i)} (Eq. (2)) equals the (k_(i−1) + 1)-th ranked element in D^S(X), for all i ∈ [B] and S ∈ {U, V}, while m^S_(B+1) := max{b^S_(B)} (Eq. (3)) equals the maximal element in D^S(X), S ∈ {U, V}.
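The binning scheme above can be sketched in a few lines of plain Python; the closed-form index k_(i) = i·q^S + min(i, r^S) yields the first r^S bins of size q^S + 1 and the remaining bins of size q^S:

```python
def quantile_bins(values, B):
    """Partition a multiset of values into B nearly equal-sized bins of
    sequential values (the B-quantiles), following the scheme above."""
    xs = sorted(values)               # Eq. (5): non-decreasing order
    n = len(xs)
    q, r = divmod(n, B)               # n = q*B + r
    # k[i] = i*q + min(i, r): the first r bins get q+1 elements, the rest get q.
    k = [i * q + min(i, r) for i in range(B + 1)]
    return [xs[k[i - 1]:k[i]] for i in range(1, B + 1)]

bins = quantile_bins([7, 3, 9, 1, 5, 8, 2], B=3)
print(bins)                                   # [[1, 2, 3], [5, 7], [8, 9]]
print([b[0] for b in bins] + [bins[-1][-1]])  # bin boundaries m_(i): [1, 5, 8, 9]
```

Here n = 7 and B = 3 give q = 2, r = 1, so the first bin has three elements and the other two bins have two each, as the scheme prescribes.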

An SMC Algorithm for Enhancing Fairness
In Section 3.2 we described our fairness-enhancing mechanism, when all of D is held by a single party (the centralized setting). We now revisit that algorithm, and devise a secure implementation of it, given that the dataset D is horizontally distributed among L parties, as described in Section 3.1.
The steps that pose a challenge when privacy is of concern are the first two: dividing the non-sensitive attribute into equal-sized bins for each group S ∈ {U, V}, i.e., D^S(X) = ⊍_{i=1}^{B} b^S_(i), and then computing the boundaries of those bins, see Eqs. (2)–(3). Performing those computations poses a challenge in the distributed setting, since they depend on data that is distributed among the L parties and cannot be shared due to privacy concerns.
The third step, performing the repair, poses no problem since it can be carried out by each party locally, independently of the others, once the computed bin boundaries m^S_(i), i ∈ [B + 1], S ∈ {U, V}, become known to each of the parties. Even though in that step the parties do not collaborate, they must agree upfront on λ (the repair tuning parameter), see Eq. (4). Note that at this step, every P_ℓ, ℓ ∈ [L], repairs the values of D^V(X) that it possesses, but it does not share the repaired values with anyone else.
As for the fourth and last step, training an ML classifier, the parties need to learn a classifier on distributed data, without sharing that data. Since this is, again, a computational problem that involves all of the distributed dataset, privacy issues kick in. However, for such problems of privacy-preserving distributed ML classification there are existing SMC-based solutions (e.g. Lindell and Pinkas 2000; Slavkovic, Nardi, and Tibbits 2007; Samet 2015; Kikuchi et al. 2016; Fienberg et al. 2006).
Therefore, we focus on the problem of privacy-preserving binning of a distributed dataset, D^S(X). We assume that all L parties, P_1, . . ., P_L, agreed upfront on the number of bins, B, and that they know the overall size of the dataset in each group, n^S, S ∈ {U, V}. As n^S = ∑_{ℓ∈[L]} n^S_ℓ and n^S_ℓ is known to P_ℓ, ℓ ∈ [L], n^S may be computed by a secure summation sub-protocol (Clifton et al. 2002; Benaloh 1986; Shi et al. 2011).
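As an illustration of the kind of secure summation sub-protocol referenced above, the following sketch simulates a simple additive-sharing scheme in plain Python (this is an illustrative toy, not the cited constructions): each party splits its private value into random additive shares modulo a public ring size, one share per party, so that only the final total is revealed.

```python
import random

PRIME = 2**61 - 1  # a public modulus larger than any possible sum

def secure_sum(private_values):
    """Additive-sharing secure summation sketch: each party splits its value
    into L random shares mod PRIME, sends one share to every party; each
    party sums the shares it received, and the share-sums combine to the total."""
    L = len(private_values)
    received = [0] * L
    for v in private_values:
        shares = [random.randrange(PRIME) for _ in range(L - 1)]
        shares.append((v - sum(shares)) % PRIME)   # shares sum to v mod PRIME
        for j, s in enumerate(shares):
            received[j] = (received[j] + s) % PRIME
    return sum(received) % PRIME                   # only the total is revealed

# Three parties jointly compute n^S as the sum of their local group counts.
print(secure_sum([120, 75, 305]))  # 500
```

Each individual share is uniformly random, so no single share-holder learns anything about another party's count; only the final sum becomes public.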
The privacy-preserving version of our fairness-enhancing mechanism assumes performing computations over integer values. To be able to handle real values, we apply the following simple procedure to convert real values into integers (with some precision loss), prior to executing our mechanism. If we are interested in preserving a precision of d digits after the decimal point, we multiply each real value by 10^d and round the resulting value to the nearest integer. After executing our mechanism, we divide the values by 10^d.
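This fixed-point conversion is straightforward:

```python
def to_fixed(x, d=4):
    """Convert a real value to an integer, keeping d digits after the decimal point."""
    return round(x * 10**d)

def from_fixed(v, d=4):
    """Convert a fixed-point integer back to a real value."""
    return v / 10**d

print(to_fixed(3.14159))              # 31416
print(from_fixed(to_fixed(3.14159)))  # 3.1416
```

The round trip loses only the digits beyond the chosen precision d; the experiments below use d = 4.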
Note that hereinafter, whenever we speak of D(X) (or D_ℓ(X), D^S(X), D^S_ℓ(X)), we consider the values in that attribute after they were converted to integers, as described above. We also assume that all parties know a lower and an upper bound on the values of D(X), denoted α and β respectively. Namely, all entries in D(X) (after they were converted to integers) are within the interval [α, β].
As n^S and B are both known to all parties, everyone can compute the sequence K^S of increasing indices in the range [0, n^S], as defined in Eqs. (6)–(7). Therefore, what remains is to compute, in a privacy-preserving manner, the (k + 1)-th ranked element in D^S(X) for every k = k_(i−1), i ∈ [B], and S ∈ {U, V}. An SMC protocol for the privacy-preserving solution of this computational problem was introduced by Aggarwal, Mishra, and Pinkas (2010).
The solution in Aggarwal, Mishra, and Pinkas (2010) relies on standard cryptographic building blocks of secure comparison (Yao 1982) and secure summation. Secure comparison can be applied using generic protocols for SMC, such as (Goldreich, Micali, and Wigderson 1987; Franklin and Yung 1992; Beaver, Micali, and Rogaway 1990), while secure summation is a very simple computation, see e.g. (Clifton et al. 2002; Benaloh 1986). Hence, we can implement the protocol in (Aggarwal, Mishra, and Pinkas 2010) on top of standard libraries (such as Damgård et al. 2009).
The protocol in (Aggarwal, Mishra, and Pinkas 2010) takes an iterative approach. Assume that the parties wish to find the value of the k-th ranked element, for some publicly known k. They perform that search iteratively, applying a binary-search approach. As α and β are the known lower and upper bounds on all values in D(X), the first "guess" for the value of the k-th ranked element is g := ⌊(α + β)/2⌋. Each P_ℓ, ℓ ∈ [L], counts how many elements in D^S_ℓ(X) are smaller than g. Then, by applying a secure summation sub-protocol, combined with a secure comparison sub-protocol, the parties find out whether the number of elements smaller than g in the unified dataset D^S(X) is smaller than k or not. Based on the result of this iteration, the range is trimmed and a new guess is computed in the next iteration.
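A plaintext sketch of this binary search follows. The secure summation and comparison steps are simulated in the clear (and this variant counts elements ≤ the guess), so it illustrates only the search logic, not the privacy mechanism:

```python
def kth_ranked(local_datasets, k, alpha, beta):
    """Binary search for the k-th ranked element of the union of the parties'
    integer datasets, in the spirit of Aggarwal, Mishra and Pinkas (2010).
    In the actual protocol, the global count is obtained via secure summation
    and compared to k via secure comparison; here both are done in the clear."""
    lo, hi = alpha, beta
    while lo < hi:
        g = (lo + hi) // 2                         # public "guess"
        # each party counts locally; only the comparison outcome would be revealed
        below = sum(sum(1 for x in ds if x <= g) for ds in local_datasets)
        if below >= k:
            hi = g                                  # k-th element is at most g
        else:
            lo = g + 1                              # k-th element exceeds g
    return lo

parties = [[3, 9, 12], [1, 7], [5, 11]]            # horizontally distributed data
print(kth_ranked(parties, k=4, alpha=0, beta=20))  # 7 (4th smallest of the union)
```

The number of iterations is logarithmic in the size of the value range [α, β], which is what makes the secure version practical.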

Evaluation
We evaluated the proposed method using the real-world publicly available ProPublica Recidivism dataset. This dataset includes data from the COMPAS risk assessment system (see Angwin 2016; Larson et al. 2016). The dataset includes 10 attributes, such as the number of previous felonies, charge degree, age, race, gender, etc., and it has 6,167 individual records. Similarly to many studies on algorithmic fairness, we use accuracy as a measure of prediction performance. Accuracy is measured by the proportion of correct classifications. For measuring fairness, we consider a measure based on equalized odds. As mentioned in Section 2.1, one advantage of this measure (in contrast, for example, to demographic parity) is that a perfectly accurate classifier will be considered fair. More specifically, we define the unfairness measure φ = D_FNR + D_FPR, where D_FNR (resp. D_FPR) is the absolute difference between the FNR (resp. FPR) of the two groups, and FNR (resp. FPR) is the False Negative Rate (resp. False Positive Rate). While the equalized odds measure dictates two separate measures, we use the sum of their absolute values in order to obtain a single combined measure. Such a combined measure allows an easier examination of the fairness-accuracy trade-off at later stages, where higher values of φ indicate lower levels of fairness (or higher levels of unfairness). To better understand the repairing mechanism of our method, we also measure the distances between the distributions of attributes within the two groups. We do so by computing the earth mover's distance (EMD) (Rubner, Tomasi, and Guibas 1998), divided by M (the size of the range of possible values of the attribute).
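The unfairness measure φ can be computed as follows (a sketch in plain Python with NumPy, on hypothetical labels and predictions):

```python
import numpy as np

def unfairness(y_true, y_pred, group):
    """phi = |FNR gap| + |FPR gap| between the two groups."""
    def fnr(mask):
        pos = (y_true == 1) & mask          # actual positives in the group
        return np.mean(y_pred[pos] == 0)    # fraction predicted negative
    def fpr(mask):
        neg = (y_true == 0) & mask          # actual negatives in the group
        return np.mean(y_pred[neg] == 1)    # fraction predicted positive
    g0, g1 = group == 0, group == 1
    return abs(fnr(g0) - fnr(g1)) + abs(fpr(g0) - fpr(g1))

y_true = np.array([1, 1, 0, 0, 1, 1, 0, 0])
y_pred = np.array([1, 0, 0, 1, 1, 1, 0, 0])
group  = np.array([0, 0, 0, 0, 1, 1, 1, 1])
print(unfairness(y_true, y_pred, group))  # 1.0 (FNR gap 0.5 + FPR gap 0.5)
```

A perfectly accurate classifier yields φ = 0, consistent with the equalized odds discussion in Section 2.1.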
For our experiments, we examined varying values of parameters, as follows: number of bins: B ∈ {1, 2, 3, 4, 6, 8, 10}; repair tuning parameter: λ ∈ {0.1, 0.2, . . ., 0.9, 1}; and number of parties: in the majority of our experiments we applied a procedure based on three parties (L = 3), whereas for the sake of measuring runtimes, we used higher numbers of parties (L ∈ {3, 4, 5, 6, 7}). Our ML classifier was logistic regression. In all experiments we used a fixed number of d = 4 digits after the decimal point, and the same ratio in splitting the dataset into a train set (66.7%) and a test set (33.3%). Splits were repeated 10 different times in a random manner, and the reported results are the average and the 90%-confidence interval over these 10 repetitions. In each of the 10 repetitions we used the number m of the repetition, m ∈ {0, . . ., 9}, as a random seed for shuffling the dataset records and creating a new train-test split.
Our method was implemented in Python, assisted with the VIFF library for secure multi-party computations (see Damgård et al. 2009). For our purpose herein, to examine the effects of our pre-process method, we used a simple non-distributed non-private implementation of the ML algorithm (logistic regression with parameters penalty = "ℓ2" and max_iter = 1000). All experiments were executed on a server running Windows Server 2008 R2, having two 6-core CPU processors with a clock speed of 1.9GHz, and 128GB of RAM.
Results. We first evaluated the effect of our method on unfairness and accuracy. In order to do so, we executed the method using varying values of λ and B, as mentioned in the previous section, and measured the resulting unfairness and accuracy values. To better understand the repairing mechanism of our method, we also measured the resulting distance between the distributions of attributes within the two groups. Recall that accuracy was measured by the proportion of correct classifications, unfairness by φ, and distance by EMD.
Figure 2 shows the effect of the repair tuning parameter λ on the three considered measures for the real-world dataset. Each chart represents a different measure: distance (left), unfairness (center) and accuracy (right). In each chart, the x-axis represents λ, while the y-axis represents the value of the corresponding measure. Colors and line thickness represent the value of B, where thicker lines represent higher values of B. Note that the distances are calculated for each attribute separately and are then averaged over the set of attributes in each dataset. For all three measures, the reported results represent an average over 10 train-test splits. The vertical bars represent a 90%-confidence interval. As can be seen from the figure, by using the proposed method with higher values of λ, it is possible to improve fairness considerably with only a minor compromise in accuracy. The considerable reduction in unfairness is a result of reducing the distances between the distributions of attributes within the two groups, as shown in the left chart in Figure 2. For example, unfairness is reduced from 0.29 (λ = 0.0) to 0.08 (λ = 1.0), a reduction of 72%; this is achieved with almost no compromise in accuracy (a decrease of less than 1%).
Figure 3 presents an analysis similar to the one in Figure 2, assessing the effect of the number of bins B on the three considered measures. Here, the x-axis of each chart represents the value of B, while the y-axis represents the value of the corresponding measure. Colors and line thickness represent the value of λ, where thicker lines represent higher values of λ. As can be seen from the figure, when increasing B, unfairness is reduced with only a minor compromise in accuracy. However, increasing B beyond B = 3 has almost no effect on all three measures. In particular, it barely contributes to the decrease in unfairness. For example, using 10 bins with λ ∈ {0.9, 1} obtains about the same results as with only 3 bins. The latter analysis indicates that it is preferable to use a small number of bins, around B = 3. A larger number of bins does not contribute towards enhancing fairness, but it does entail higher computational and communication costs, as well as increased leakage of information.
We then turned to evaluate the efficiency of our method, as reflected by its runtimes. Figure 4 shows the effect of B (the number of bins) and L (the number of parties) on the runtime of our method. Specifically, we report the runtime for the secure distributed computation of bin boundaries, for both of the groups. The x-axis represents L, while the y-axis represents runtime in minutes. Line colors represent the number of bins. For clarity, we present the runtime for repairing one non-sensitive attribute, the "prior count" attribute. The figure shows that the runtime depends linearly on L and linearly on B. It is essential to note that the runtimes of the proposed method (as demonstrated in Figure 4) are practical, especially considering that it is applied once.
To conclude, in the above experiments we showed that our method is able to improve fairness considerably with only a minor compromise in accuracy, despite their inherent trade-off. We further showed that privacy is highly maintained and that the information leakage is minimal, especially as it appears that low values of B suffice for the sake of fairness. Finally, we showed that the runtime of the proposed method is feasible for a one-time pre-process procedure.

Conclusions
We proposed herein a privacy-preserving pre-process mechanism for enhancing fairness of collaborative ML algorithms. Our method improves fairness by decreasing distances between the distributions of attributes of the privileged and unprivileged groups. We use a binning approach that enables the implementation of privacy-preserving enhancements by means of SMC. As a pre-process mechanism, our method is not limited to a specific algorithm, and hence it can be used with any collaborative ML algorithm.

The evaluation that we conducted, using a real-world dataset, revealed that the proposed method is able to improve fairness considerably, with only a minor compromise in accuracy. We also observed that using a small number of bins (e.g., B = 3), it is possible to achieve that considerable improvement in fairness, with very minor and benign leakage of information. Finally, we demonstrated that the runtime of the proposed method is practical, especially considering that it is executed once as a pre-process procedure.

As future research, we suggest performing a broader analysis of the effect of the different parameters on the mechanism's computational costs. We also suggest generalizing the techniques presented here to handle non-binary sensitive attributes, such as age (with different age groups), race, or residential area. In addition, the current technique handles only numerical non-sensitive attributes; extending the technique to cope also with categorical attributes is in order. Finally, while we focused here on a horizontal distribution, a comprehensive discussion should include a more general distribution framework. Collaborative machine learning is encountered in many application scenarios, and questions of privacy and fairness naturally arise in all of them; hence, we see the current study as a first step in a long quest.

Figure 1: The distribution of SAT scores within each of the two sub-populations (left), and the success rate as a function of the SAT score, within each sub-population (right).

Figure 2: The effect of the repair tuning parameter λ on distance, unfairness, and accuracy.
Figure 3: The effect of the number of bins B on distance, unfairness, and accuracy.

Figure 4: The effect of parameters on runtime.