The Conflict Between Algorithmic Fairness and Non-Discrimination: An Analysis of Fair Automated Hiring

AI-based automated hiring systems cover a wide range of tools of varying complexity, from resume-parsing tools to candidate-selection models. Their close entanglement with economic and social life has prompted rising demands and investigations aimed at reducing the potential discrimination such systems may cause. This article covers the intersection of EU non-discrimination law and algorithmic fairness in the context of automated hiring systems. The paper analyzes the balance between equality of opportunity (formal and substantive) and equality of outcome, critiques the focus on non-conservative group fairness in machine learning, and discusses the legal implications of automated hiring systems under EU law. Additionally, it highlights commonly committed fallacies related to the process of de-biasing and advocates for a broader understanding of fairness in machine learning that aligns with EU legal standards and societal values.


INTRODUCTION
Automated decision-making systems are increasingly being used to render high-impact decisions regarding human beings. Due to the many concerns about the potential societal impacts of machine learning, governments are beginning to put forward policy positions and draft regulations. In the AI Bill of Rights, the White House states that automated decisions should be designed and deployed in an equitable way [63]. In Europe, the AI Act states that automated decisions should not perpetuate historic patterns of discrimination or create new forms of discriminatory impacts [29]. To a limited extent, researchers in the field have understood that they had not happened upon an empty field (of research) but instead a garden that has been fostered, cared for, and in some cases ignored for a very long time. The garden is that of Justice. While perspectives from many domains on the concept of distributive social justice have been incorporated into the algorithmic fairness literature (egalitarian philosophies of distribution [7,10,42], socio-technical critiques of technological solutionism [18,57], and concepts from feminist communications and data science like the myths of objectivity and meritocracy [26,35]), the contentions between distributive and non-distributive justice, between comparativist and non-comparativist conceptions of discrimination, between egalitarianism and individualism, and between equality of outcome and equality of treatment have largely gone underdeveloped.
In an effort to increase viewpoint diversity and protect the fundamental rights of individuals in the EU, this article's contributions are twofold. First, the article applies European Union (EU) non-discrimination law to a highly exemplative use case: algorithmic fairness in automated hiring systems. To achieve this goal, we: describe the policies of equality of opportunity (both formal and substantive) and equality of outcome, making the legally relevant distinctions therein (2); put forward and analyze the related case-law of the Court of Justice of the European Union (CJEU), extracting the legal rules (3); and examine the process of de-biasing in light of the rules surrounding the use case, finding that real-world, unlawful discrimination is likely taking place (6).
The second contribution is a broader effort, carried out concurrently with the first, to dispel common misunderstandings that lead to harmful effects (of which the unlawful discrimination of fair automated hiring is an example). To that end, we: attempt to surmise the predominant approach to algorithmic fairness through an examination of individual, causal, and group fairness, finding that group fairness is the only substantive approach to fairness that has been put forward (4); explain the trade-off between generalizable outcomes and group similar outcomes, arguing that the realization of that trade-off means that de-biasing in accordance with an independence-based fairness metric is akin to the use of quotas (5); and identify oft-committed fallacies that result from a failure to realize the trade-off (5).

BACKGROUND
In employment decisions, equality is sought in opportunity or in outcome. Equality of opportunity can be defined as formal or substantive. Formal equality of opportunity requires that applicants be assessed according to their qualifications, that those qualifications be appropriate,1 and that the most qualified applicant receive the position [8]. Selection processes that enforce formal equality of opportunity result in inequalities of outcome between groups when the individuals of a given group are, on the whole, less qualified than those of another in a given field.2 While substantive equality of opportunity requires all that formal equality of opportunity insists upon during a selection process, it is first and foremost an effort to ensure that each individual in society, regardless of group membership, has the same opportunities to gain the prerequisite qualifications for positions, so that differences between groups are minimal or nonexistent [1,2,6]. Equality of outcome, sometimes referred to as equity in the social justice literature, requires group equality or similarity in results, irrespective of the differences individuals of those groups may have in terms of qualifications for a given position, generally for the purpose of providing a shortcut from opportunity to representation when substantive equality has not yet been fully realized [19].
In the European Union, positive action is an umbrella term for soft measures, such as the voluntary pruning of facially neutral employment criteria that may lead to disparate impacts, mainstreaming initiatives, accommodations, the use of impact assessments, and outreach programs [17], for achieving substantive equality of opportunity for members of groups that deal with the consequences of past or present discrimination or disadvantage, so that they may compete on an equal footing with others. Positive discrimination, by contrast, describes strong measures for achieving equality of outcome through preferential treatment when, in a given field of employment, members of the discriminated-against or disadvantaged group are not yet, on the whole, equally qualified.3 Positive discrimination in employment decisions is controversial, and the practice has been repeatedly restrained by the Court of Justice of the European Union (CJEU) whenever employment selection processes move from the goal of ensuring formal and substantive equality of opportunity into the pursuit of equal outcomes. The CJEU case-law pertaining to positive discrimination in employment has been settled for nearly two decades, and legal scholars have repeatedly concluded that the CJEU "systematically rejects" selection processes that turn towards equality of outcome [11,51,59].
Automated hiring systems based on machine learning are becoming increasingly commonplace, concerns about algorithmic indirect discrimination in employment decisions are front-and-center, and the technical solutions provided by the research community often systematically deviate from the principle of equal treatment to combat disparate impacts.4 Legal scholarship on algorithmic discrimination has predominately focused on analyzing the training data of automated systems for features that, if used, may constitute direct or indirect discrimination, and the corresponding decisions of those systems for disparate or adverse outcomes [61,67]. Indirect or "covert" discrimination is understood in contrast with direct or "overt" discrimination [66] and is fundamentally aimed at achieving substantive equality [25].5
Indirect discrimination takes place when a neutral provision, criterion, or practice results in a disparate impact on a protected group, "unless that provision, criterion or practice is objectively justified by a legitimate aim and the means of achieving that aim are appropriate and necessary" [4]. Thus, using a hiring criterion that causes a disparate impact is not automatically discriminatory. Instead, such criteria are discriminatory only if the principle of proportionality is violated. The proportionality test, therefore, "opens the path for the legality of using a factor that correlates with economically or otherwise favorable traits even though the choice of that factor also leads to the unfavorable treatment of a protected group" [37].

1 Where appropriateness is defined in relation to moral relevance [7] or to the lawfulness of desiderata in accordance with indirect discrimination doctrine [24].
2 See e.g. Plato's Laws discussing the notion of equality: "[W]hen equality is given to unequal things, the resultant will be unequal . . ." [53]; see also Hayek on Equality, Value and Merit: "From the fact that people are very different it follows that, if we treat them equally, the result must be inequality in their actual position . . ." [40].
3 See [59] for a detailed explanation of the difference between positive action and positive discrimination in EU non-discrimination law; see also [11]. But see e.g. [37,60] for the conflation of the two distinct concepts in the algorithmic fairness literature.
Whether a legal analysis determining the lawfulness of using features that result in a disparate impact based on a protected attribute is performed in practice by designers of automated hiring systems is difficult to know and beyond the scope of this article. It is clear, however, that designers of these systems are aware that criteria which cause a disparate impact based on a protected attribute can potentially be deemed discriminatory [9,31]. Their solution: require equality or similarity in employment outcomes [54,65]. For example, the authors of [54] investigated automated hiring systems and found that a number of the commercially available preselection systems either remove or curate the training data that produce a disparate or adverse impact, or modify the objective function of the learning algorithm to achieve the same result, often in accordance with the disparate impact metric.

DISTINGUISHING POSITIVE ACTION FROM PREFERENTIAL TREATMENT IN HIRING
There are a number of equality directives in EU law [2][3][4][5]. Each directive is an embodiment of the principle of equal treatment. Equal treatment means that there shall be no discrimination whatsoever, either directly or indirectly, based on the protected attribute laid out in a given directive. However, each equality directive moves from formal to substantive equality of opportunity by allowing Member States to adopt special measures to prevent or compensate for disadvantages linked to the protected attribute. Thus, while the exception to the individual right of equal treatment must be interpreted strictly, measures which take advantage of the derogation, while discriminatory in appearance, "are in fact intended to eliminate or reduce actual instances of inequality which may exist in the reality of social life" [19][20][21][22][23]. For instance, in Badeck, the Court drew a distinction between training opportunities and employment opportunities [22]. To what extent the exceptions of other equality directives will be treated similarly is a matter of academic debate [51]. Additionally, while the CJEU has held that direct horizontal effect can be found in the relationship between equality directives and the Charter of Fundamental Rights, that legal issue will not be discussed here [24]. As the reader moves through the following case-law, bear in mind the distinction between soft positive action measures implemented to provide substantive equality of opportunity, like the ones described above, and strong positive discrimination measures implemented to provide equality of outcome, such as the ones under review in the following cases.
In Kalanke, two candidates were shortlisted for the position of Section Manager at the Bremen Parks Department in Germany. Mr. Kalanke, one of the two candidates, had a diploma in horticulture and landscape gardening, had worked for the Parks Department since 1973, and had been acting as the permanent assistant to the previous Section Manager before the position was vacated; Ms. Glibmann, the other candidate, also had a diploma in landscape gardening, granted in 1983, and had worked in the Parks Department as a horticultural employee since 1975. The Parks Department management put forward Mr. Kalanke for the position, but the Staff Committee refused its consent to his promotion. The Staff Committee refused its consent in accordance with the Bremen Law on Equal Treatment for Men and Women in the Public Services (LGG), passed in 1990, which stated that women who have the same qualifications as men applying for the same post are to be given priority where women are underrepresented in the sector. Mr. Kalanke was successful in arbitration, but the Staff Committee appealed to the conciliation board, where the two candidates were found to be equally qualified and priority was to be given to Ms. Glibmann. The case made its way through the labor courts, and eventually the Bundesarbeitsgericht sought a preliminary ruling from the CJEU clarifying the scope of the exception under Article 2(4) of the Directive from the principle of equal treatment.
The Court began by stating that the purpose of Directive 76/207 is to put into effect the principle of equal treatment for men and women regarding access to employment and promotion within Member States, and that the principle of equal treatment means that there shall be no discrimination whatsoever, either directly or indirectly, on the grounds of sex. The exception under Article 2(4) of Directive 76/207 permits national measures which, although discriminatory in appearance, are intended to eliminate or reduce actual instances of inequality and consequently give a specific advantage to women with a view to improving their ability to compete on the labor market and to pursue a career on an equal footing with men. Since Article 2(4) is a derogation from an individual right, the Court determined that the exception must be strictly interpreted. The Court found that national rules that guarantee women automatic and/or absolute and unconditional priority go beyond promoting equal opportunities and overstep the limits of the exception. The Court reasoned that such measures take a shortcut from ensuring substantive equality in fact to mere equality in outcome: "Furthermore, in so far as it seeks to achieve equal representation of men and women in all grades and levels within a department, such a system substitutes for equality of opportunity as envisaged in Article 2(4) the result which is only to be arrived at by providing such equality of opportunity." Thus, the Court ruled that Art. 2(1) and (4) of Directive 76/207 preclude national rules which automatically and/or absolutely and unconditionally give priority to women in sectors where they are underrepresented.
Turning next to Marschall v. Land Nordrhein-Westfalen [20]: in 1994, a teacher named Mr. Marschall applied for a promotion to an open position at a German comprehensive school. In response, Mr. Marschall was informed that, in accordance with the civil service law of the Land, a female candidate of equal suitability, competence, and professional performance was to be appointed to the position because there were fewer women than men in that particular grade post in the career bracket. Mr. Marschall brought legal action. The Administrative Court of Gelsenkirchen found that the outcome of the case depended on the compatibility of the Land's provision with Art. 2(1) and (4) of Directive 76/207, and so a preliminary ruling was sought from the CJEU.
The Court began by distinguishing the case from Kalanke. Unlike in Kalanke, the provision in question contained a 'savings clause' stating that where an individual male candidate had qualifications that might tilt the balance in his favor, a female candidate would not be given priority. After citing the third recital in the preamble to Recommendation 84/635/EEC on the promotion of positive action for women [1], which highlights the need for positive action to counteract prejudices that arise in the employment context due to social attitudes, behaviors, and structures, the Court agreed with the Land and other governments that, even when candidates of the opposite sex are equally qualified, male candidates tend to be promoted in preference to female candidates because of a multitude of stereotypes. Thus, ". . . the mere fact that a male candidate and a female candidate are equally qualified does not mean that they have the same chances." The Court reasoned that a national rule may be lawful under Article 2(4) if, in each individual case: "it provides for male candidates who are equally as qualified as the female candidates a guarantee that the candidatures will be the subject of an objective assessment which will take account of all criteria specific to the individual candidates and will override the priority accorded to female candidates where one or more of those criteria tilts the balance in favour of the male candidate. In this respect, however, it should be remembered that those criteria must not be such as to discriminate against female candidates" [emphasis added].
Thus, the Court ruled that a national rule which gives priority to equally qualified female candidates for the purpose of counteracting prejudiced tie-breaking is compatible with Art. 2(1) and (4) of Directive 76/207, conditional on a guarantee that the candidatures will be subject to an objective assessment on an individual basis and that the priority will be overridden where that assessment tilts in favor of a male candidate.
Finally, in Abrahamsson and Anderson v Fogelqvist, a Swedish university appointed Ms. Fogelqvist to a post under a national regulation giving priority to sufficiently qualified candidates of an underrepresented sex, on the view that the difference in qualifications between Mr. Anderson and Ms. Fogelqvist was not so great as to violate the requirement of objectivity in the selection process. Mr. Anderson and Ms. Abrahamsson brought legal action that eventually came before the Överklagandenämnden för Högskolan, and a preliminary ruling was requested from the CJEU. The Court held that national rules which give priority to candidates of an underrepresented sex who possess sufficient qualifications for a given post, over a candidate of the opposite sex who would otherwise have been appointed on the basis of merit, are precluded under Article 2(1) and (4) of Directive 76/207 and Article 141(4) EC, even if the difference between the candidates' qualifications is not so great as to breach the requirement of objectivity. The Court also ruled that national legislation which limits the scope of positive discrimination to a predetermined number of posts, or to posts specifically designed for that purpose, remains precluded because of the absolute and disproportionate nature of the positive discrimination practice.
Recap: in the employment context, the CJEU has stated that special measures are derogations from the principle of equal treatment and thus must be proportional. Absolute and unconditional preferences are always automatic, but not all automatic preferences are absolute and unconditional. Absolute and unconditional preferences are disproportional because such preferences make the protected attribute the key criterion when comparing candidates between sub-groups of the attribute. Automatic preferences, on the other hand, have the potential to be proportional, for instance in a tie-breaking scenario between two equally qualified candidates for the purpose of combating stereotypes. For an automatic preference to be proportional, the candidates must be subject to an objective assessment ensuring that, where the candidates are not equally qualified, the preference will be overridden.

THE TOOLS
Fairness metrics are definitions of equality formulated mathematically, and they are commonly split into three categories: group fairness, causal fairness, and individual fairness. In this section we use the following notation: Y is the target variable, Ŷ is the predicted variable, X are the features, τ is the decision threshold, and S denotes the protected attributes.

Individual Fairness
The mantra of individual fairness is that similar individuals should be treated similarly. The maxim of similar treatment that individual fairness embodies is an Aristotelian principle of consistency [13]. The individual fairness definition states that there should be consistency between the relevant features of two different persons and their respective outcomes in comparison to one another. More specifically, the similarity between the features of two individuals (measured as a distance) should be preserved between their respective labels.6 Note that the principle of consistency, defined as a distance between spaces, could be used to detect inconsistencies between the relevant features and the ground truth of the sample, as well as inconsistencies between the sample and the outcomes. For instance, such inconsistencies could be an indicator of an unreliable data collection process in which data was incorrectly reported, or of a data sample missing a set of uncollected features that would explain the inconsistency. We emphasize, however, that the individual fairness metric itself is concerned neither with determining the representativeness of the sample nor with determining how well the outcomes generalize to a target population.
Individual fairness defines fairness as a comparison of geometric distance between the features of two data points and the distance between the predictions assigned to these two data points.Once distance is defined, individuals can be compared, and inconsistencies (unfairness) can be rectified.However, the distance must be defined, and defining a distance presupposes prior knowledge about "fairness." In other words, the principle of consistency is empty [56], and so requires a substantive notion of fairness to define what makes similar cases similar (i.e. the distance).Thus, there is a circularity in the proposition that individual fairness is a definition of fairness.It may be that the principle of consistency is a necessary requirement for fairness to be achieved, but consistency or similarity alone is not sufficient to constitute an independent notion of fairness [32].
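The consistency check can be made concrete in a few lines. The sketch below is a minimal illustration, not any particular system's implementation: it assumes a Euclidean distance over features and two hypothetical thresholds, and it is precisely these choices that smuggle in the prior, substantive notion of similarity discussed above.

```python
import numpy as np

def consistency_violations(X, y_pred, dist_threshold=0.5, pred_threshold=0.1):
    """Flag pairs of individuals whose features are close but whose
    predicted scores diverge, violating the consistency condition
    behind individual fairness.

    X: (n, d) feature matrix; y_pred: (n,) predicted scores in [0, 1].
    The distance metric (Euclidean here) and both thresholds are
    assumptions that encode a prior notion of which cases are "similar".
    """
    violations = []
    n = len(X)
    for i in range(n):
        for j in range(i + 1, n):
            feat_dist = np.linalg.norm(X[i] - X[j])
            pred_dist = abs(y_pred[i] - y_pred[j])
            if feat_dist <= dist_threshold and pred_dist > pred_threshold:
                violations.append((i, j, feat_dist, pred_dist))
    return violations

# Two near-identical candidates with very different scores are flagged.
X = np.array([[1.0, 2.0], [1.0, 2.1], [5.0, 5.0]])
y_pred = np.array([0.9, 0.2, 0.5])
print(consistency_violations(X, y_pred))  # pair (0, 1) is flagged
```

Note that every choice in the sketch, the metric and the thresholds, is itself a fairness judgment, which is exactly the circularity identified above.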
Philosophers and jurists might best understand this point by analyzing Aristotle's principle: similar individuals should be treated similarly and dissimilar individuals should be treated dissimilarly. What does it mean for individuals to be similar? There are three possible interpretations, the first two of which attempt to draw an ought from an is: (1) it might mean that individuals similar in every respect should be treated similarly, or (2) it might mean that individuals similar in some respect should be treated similarly [62]. Regarding the first, individuals cannot be identical and yet still be distinct; it is a contradiction in terms. The second interpretation leads to the absurdity that all individuals should be treated similarly, because every individual is similar in some respect. The third interpretation is that of individual fairness, which derives an ought from an ought: (3) individuals that are similar in some morally significant respect should be treated similarly. Hence, the principle becomes a simple tautology: "People who by a rule should be treated alike should by the rule be treated alike" [62].
Some advocates of the individual fairness approach argue that substantive notions of fairness need be defined by domain experts [28,34], while others argue that the group fairness metrics should fill the void [32].In any case, individual fairness should be understood as a tool to implement fairness once defined, rather than as a conception of fairness in and of itself.Having set aside individual fairness as a definition of fairness, causal fairness may be examined.

Causal Fairness
Causality-based metrics define the effect of the protected attributes on the decision; thus, these definitions do not rely on the observational data alone but require a study of the causal relationships that reflect the social and economic aspects of the data collection process. Causal fairness shares the same conception of fairness as group fairness and differs only in the set of techniques used to achieve that goal [15,44,47]. For example, the observational statistical parity measure (equation 1), which is a group fairness definition, requires equality between the probabilities of inclusion in the positive predicted class for each protected group:

P(Ŷ = 1 | S = s) = P(Ŷ = 1 | S = s′) for all groups s, s′    (1)

while its causal version, causal parity (equation 2), slightly changes this definition by introducing the notion of intervention: the value of the protected attribute is set to a specific value and its effect on the decision is observed:

P(Ŷ = 1 | do(S = s)) = P(Ŷ = 1 | do(S = s′)) for all groups s, s′    (2)
Thus, these causal measures of fairness still link back to the idea of group similarity in outcomes; they reach it, however, via the causal effect that a change in the value of the protected attribute may have on the decision.
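The difference between conditioning and intervening can be illustrated with a toy structural causal model. Everything below is hypothetical and chosen only to separate the two measures: a confounder influences both the protected attribute and the qualification score, so the observational rates differ between groups even though the protected attribute has no causal effect on the decision.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Toy structural causal model: a confounder U (e.g. socio-economic
# background) influences both the protected attribute S and the
# qualification score Q; S itself has no causal effect on Q.
U = rng.normal(size=n)
S = (U + rng.normal(size=n) > 0).astype(int)
Q = U + rng.normal(size=n)
Y_hat = (Q > 0).astype(int)

# Observational (statistical) parity compares P(Y_hat = 1 | S = s):
# the confounder induces a gap even though S plays no causal role.
p_obs = [float(Y_hat[S == s].mean()) for s in (0, 1)]

# Causal parity compares P(Y_hat = 1 | do(S = s)): forcing S leaves
# the mechanism generating Q untouched, so the intervened rates match.
p_do = [float(Y_hat.mean()) for s in (0, 1)]

print(p_obs)  # noticeably unequal (roughly 0.33 vs 0.67)
print(p_do)   # equal: intervening on S does not change the decision
```

In this sketch, equation (1) is violated while equation (2) is satisfied, which is why causal metrics require a model of the data-generating process rather than observational data alone.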

Group Fairness
To build machine learning models that produce outcomes that are group similar, the first step is to define a measure or metric that reflects a notion of acceptable group dissimilarity.There exists a "zoo" of these metrics [15] that define the acceptability of group dissimilarity differently using notions of statistical independence, sufficiency, and separation.A number of surveys and reviews on the taxonomy of metrics and interventions have been published [14,15,49,52].
Separation-based metrics, namely equalised odds, require independence between the protected attributes and the prediction, conditioned on the target Y. In other words, separation ensures that the model has the same false-positive rate and false-negative rate across groups. Taking the case law as an example, this means that an equal proportion of the suitable men and women applying for the job are predicted to be suitable employees.
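A minimal sketch of measuring separation, on hypothetical toy data:

```python
import numpy as np

def separation_gaps(y_true, y_pred, s):
    """Per-group true-positive and false-positive rates; equalised odds
    (a separation metric) asks that both be equal across groups."""
    gaps = {}
    for group in np.unique(s):
        m = s == group
        tpr = float(y_pred[m & (y_true == 1)].mean())  # P(Yhat=1 | Y=1, S=group)
        fpr = float(y_pred[m & (y_true == 0)].mean())  # P(Yhat=1 | Y=0, S=group)
        gaps[int(group)] = (tpr, fpr)
    return gaps

# Toy screening data: among the actually suitable candidates (y_true = 1),
# group 0 is approved at twice the rate of group 1, so odds are not equalised.
y_true = np.array([1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0])
y_pred = np.array([1, 1, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0])
s      = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1])
print(separation_gaps(y_true, y_pred, s))  # {0: (1.0, 0.0), 1: (0.5, 0.0)}
```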
Sufficiency-based metrics, namely calibration and predictive parity, ensure that, given the prediction, the target is independent of the group, meaning that the prediction Ŷ is sufficient for Y. With the same example, a sufficiency-based de-biasing algorithm will ensure that an equal proportion of the men and women predicted to be suitable employees are actually suitable. Both sufficiency and separation use the target variable and thus make an assumption about its objectivity, for example, whether the labelling of the data was carried out objectively and under a rigorous inter-annotator agreement process.
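Sufficiency can be measured in the same style; the sketch below, again on hypothetical data, computes the per-group positive predictive value that predictive parity asks to be equal:

```python
import numpy as np

def sufficiency_gaps(y_true, y_pred, s):
    """Per-group positive predictive value, P(Y = 1 | Yhat = 1, S = group).
    Predictive parity (a sufficiency metric) asks these to be equal."""
    return {
        int(group): float(y_true[(s == group) & (y_pred == 1)].mean())
        for group in np.unique(s)
    }

# Of the candidates predicted suitable, 100% in group 0 but only 50% in
# group 1 are actually suitable: the prediction is not sufficient for Y.
y_true = np.array([1, 1, 0, 0, 1, 0, 0, 0])
y_pred = np.array([1, 1, 0, 0, 1, 1, 0, 0])
s      = np.array([0, 0, 0, 0, 1, 1, 1, 1])
print(sufficiency_gaps(y_true, y_pred, s))  # {0: 1.0, 1: 0.5}
```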
Independence-based fairness metrics, namely demographic parity, statistical parity, and disparate impact, are measurements of group similarity in outcomes by ensuring statistical independence between the outcomes (the predictions) and the protected features.
The mathematical formulation of disparate impact (equation 3) is built around independence between a protected attribute S and the classification outcome Ŷ of a binary classifier. When the event Ŷ = 1 is the positive outcome, the ratio of the acceptance rates of the different groups must be greater than a threshold determined by a predefined term τ:

P(Ŷ = 1 | S = 0) / P(Ŷ = 1 | S = 1) ≥ τ    (3)

For disparate impact, τ is 80%, the "4/5 rule". Demographic parity is a widely used metric for independence-based group fairness, especially in fairness-aware automated hiring systems; the authors of [54] found that disparate impact is the most common metric used by commercially available pre-selection systems. Our use case is automated hiring, so we elaborate on this metric throughout this paper. Independence metrics require the same positive prediction ratio across the groups identified by the protected attribute. Furthermore, independence measures rely only on the distribution of the features (protected and non-protected) and the decisions, namely on (X, S, Ŷ); thus, even a perfect prediction algorithm7 does not necessarily satisfy the independence metrics. To explain: a perfect predictor satisfies independence only if the target is evenly distributed across all groups, which is not always the case. Therefore, independence fairness metrics do not conserve the status quo and are thus known as "non-conservative" [55]. Now, simply measuring the group similarity of the outcomes alone has no effect on the decision-making process. For instance, the measurement could be used to analyze whether a given feature might create a disparate impact in order to determine its proportionality in accordance with indirect discrimination doctrine, or to determine whether diversity goals have been met. Even the removal of proportional features that lead to a disparate impact could be in alignment with the goal of substantive equality of opportunity. However, measuring the group similarity of outcomes (substantive equality of opportunity) is different from constraining those outcomes to be group similar (equality of outcome). Other suggestions from the algorithmic fairness literature that are in accordance with substantive equality of opportunity include: outreach programs designed to attract talent from underrepresented groups, stakeholder involvement throughout the ML pipeline (including feature selection) [48], and diverse group composition amongst the designers of the system [46]. None of these suggestions "imply an attempt to achieve a final result" [22].
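Measuring (as opposed to constraining) an independence metric is straightforward. A sketch of the disparate impact ratio and the 4/5 rule on hypothetical selection data:

```python
import numpy as np

def disparate_impact_ratio(y_pred, s, privileged):
    """Ratio of selection rates: P(Yhat = 1 | unprivileged group) over
    P(Yhat = 1 | privileged group). The 4/5 (80%) rule flags ratios
    below 0.8 as evidence of adverse impact."""
    rate_priv = y_pred[s == privileged].mean()
    rate_unpriv = y_pred[s != privileged].mean()
    return float(rate_unpriv / rate_priv)

# 6 of 10 candidates in group 1 are selected versus 3 of 10 in group 0,
# giving a ratio of 0.5, below the 0.8 threshold.
y_pred = np.array([1] * 6 + [0] * 4 + [1] * 3 + [0] * 7)
s = np.array([1] * 10 + [0] * 10)
ratio = disparate_impact_ratio(y_pred, s, privileged=1)
print(ratio, ratio >= 0.8)  # 0.5 False
```

Used this way, the metric is diagnostic: it can feed a proportionality analysis or a diversity audit without itself constraining the outcomes.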

FROM MEASURING FAIRNESS TO DE-BIASING
The process of ensuring a final result is known as de-biasing. Before explaining how de-biasing is performed technically, it is important to understand how bias is defined and the implications of that definition. In data-driven processes like machine learning, bias is traditionally defined as a deviation from the true value of a parameter or variable [30]. In fair machine learning, bias is defined as a deviation from group similarity in outcomes [15,16,49,52]. Why is this an important distinction? The distinction between these two definitions of bias illuminates the goal of each process. Where machine learning is a historical, descriptive, and predictive process, de-biasing is an ahistorical, prescriptive process. While philosophers might best understand the thrust of the point through these remarks on the distinction between is and ought, jurists might best understand by comparing the separation thesis found in legal positivism to the differentiation made here [39]. The separation thesis insists on the separation between (1) what the law is and (2) what the law ought to be. Now, when the true value of a parameter leads to group dissimilarity in outcomes, the true value is dubbed biased. This is because group dissimilarities in outcomes can be the result of either: (1) group disparities existing in the target population that are reflected in a representative sample and carried into the outcomes by generalizable hypothesis assumptions (accuracies), or (2) an unrepresentative sample and/or non-generalizable assumptions that have the potential to underestimate or exaggerate group disparities (inaccuracies). De-biasing in accordance with an independence-based fairness metric is the purposeful underestimation of group disparities. In other words, decisions made on a representative sample have the potential to reflect the target population in the model outcomes, and those outcomes would have the same disparities between groups that exist in the target population. Thus
to reach group similar outcomes, the sample must be made unrepresentative or the hypothesis assumptions non-generalizable. Why is it important to understand the difference between is and ought statements? Sometimes people erroneously draw conclusions about what ought to be based solely on observations of what is, without providing a justifying logical bridge. This gap forms the basis of what is commonly referred to as the is/ought fallacy [43]. When one infers an "ought" from an "is" without justification, they commit this fallacy. It represents a logical error, premised on the implicit assumption that the state of affairs necessarily dictates how it should be. Conversely, a less discussed but equally fallacious form of reasoning is what might be called the "ought/is" fallacy. This involves a reverse projection, where ideals about how things should be are assumed to reflect the actual state of the world. This form of reasoning often leads to a kind of wishful thinking, mistaking one's moral vision for empirical reality. For instance, if a person holds that all individuals should be treated equally (a normative statement) and, based on this belief alone, assumes that all individuals are equal (a descriptive statement), they are engaging in this reverse fallacy. This assumption, precisely the axiomatic assumption that "we are all equal," has already been noted as a common underlying axiom of algorithmic fairness [33,34]. Such an assumption is especially absurd when one considers that the realization that "we are not all equal" is the exact motivation behind the more egalitarian strands of algorithmic fairness. Dismissing the trade-off between generalizable outcomes and group similar outcomes based on the insistence that "we are all equal" is an example of the ought/is fallacy. There are other common mistakes that result from the failure to understand the trade-off. For instance, authors in [18] argue that the conflict between "accuracy" and "fairness" is the result of framing the trade-off
as an optimization problem.Their argument rests on a causal fallacy.Recognizing the "inherent conflict" between generalizable outcomes and group similar outcomes in a data setting which contains group disparities and then optimizing between those competing interests cannot be the cause of differences between subgroups of a target population that exist independently in that data setting.
To understand the trade-off technically, reference must be made to the trade-off between accuracy and fairness. The lower bound of that trade-off has been estimated via proof [36,70]. The authors in [50] have proven that, in the case of a binary classifier, it is asymptotically possible to maximize both accuracy and fairness simultaneously only if the protected attribute and the target variable are perfectly independent. At the other extreme, if the protected attribute is highly correlated with the target variable, then it is only possible to maximize either the accuracy or the fairness, not both. In between those two extremes, the trade-off is determined by the strength of the correlation between the target and the protected attribute. As the proof in [50] states, if the protected attribute and the target variable are perfectly independent of one another, the more generalizable the model is, the more group parity will be present. Some authors use this fact to argue that accuracy and fairness are complementary [18,27,41], even going so far as to state that the "fairness-accuracy trade-off formulation also forecloses the very reasonable possibility that accuracy is generally in accord with fairness" [18]. While it is true that under certain conditions generalizable outcomes and group similar outcomes are complementary, relying on that truth to minimize the importance of the trade-off is highly misleading. Generalizable outcomes and group similar outcomes can only be complementary in a data setting where no group disparities exist (necessarily defined as group parity in the context of perfect independence). If there is no group disparity in the data setting, there is no need for de-biasing. If group disparity exists in the data setting, generalizable outcomes and group similar outcomes will be in conflict (i.e. the protected attribute and the target variable will be correlated). Others observe that, in practice, constraining outcomes to be group similar can sometimes increase accuracy [64]. Again, the observation is correct but can lead to a misunderstanding. When the use of a fairness constraint increases accuracy, either the protected feature and the target variable are independent (in which case, see the above argument) or the data sample was so unrepresentative that enforcing group similarity increased accuracy by happenstance. And that increase in accuracy by happenstance could never go beyond the group similarity present in the target population without decreasing the generalizability of the model.
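The trade-off can be made concrete with a minimal synthetic sketch (all numbers here are hypothetical, chosen only for illustration): when the protected attribute and the target variable are correlated, constraining a classifier to produce group similar outcomes (equal selection rates) necessarily lowers its accuracy relative to the unconstrained classifier.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
a = rng.integers(0, 2, n)                  # protected attribute (two groups)
base_rate = np.where(a == 0, 0.3, 0.6)     # group disparity: P(Y=1|A) differs
y = rng.random(n) < base_rate              # target is correlated with A
x = y + rng.normal(0, 0.7, n)              # informative, noisy score feature

# (1) Accuracy-oriented classifier: one threshold on the score for everyone.
pred_acc = x > 0.5
acc_unconstrained = (pred_acc == y).mean()

# (2) Demographic-parity classifier: per-group thresholds chosen so that
# both groups are selected at the same overall rate.
target_rate = pred_acc.mean()
pred_fair = np.zeros(n, dtype=bool)
for g in (0, 1):
    mask = a == g
    t = np.quantile(x[mask], 1 - target_rate)  # equalize selection rates
    pred_fair[mask] = x[mask] > t
acc_constrained = (pred_fair == y).mean()

# Because the base rates differ, parity is bought with accuracy: the
# constrained classifier over-selects in one group and under-selects in
# the other relative to the underlying qualifications.
rate_gap = abs(pred_fair[a == 0].mean() - pred_fair[a == 1].mean())
assert acc_constrained < acc_unconstrained
```

Under perfect independence of `a` and `y` (equal base rates), the two classifiers would coincide and no accuracy would be sacrificed, which is exactly the boundary case discussed above.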
The trade-off between generalizable outcomes and group similar outcomes is obvious. The logical conclusion of the trade-off is also obvious: where there exists the greatest need for de-biasing (i.e. data settings that contain large group disparities), data-driven processes like machine learning are least useful. In other words, as the connection between the model outcomes and the target population is made more tenuous in order to make outcomes less dissimilar amongst subgroups, the use of machine learning becomes harder to justify. The more the outcome is already known (manually coded), the less need there is for a data-driven approach: a script or quota could fulfill the same purpose. Thus, the trade-off presents a threat to the field. Beyond practical implications about energy sustainability and the waste of compute, why is this an important point? The use of quotas and preferential treatment for the purpose of balancing group disparities in society is not a new phenomenon, and the normative and legal questions surrounding their use have likely already been developed in a given jurisdiction. Indeed, quotas are directly the subject matter of the entire first half of this article. When a technology is understood, it is much easier to identify whether its use in a given context is lawful.
Once a metric is chosen, one of the three following de-biasing strategies can be adopted: (1) pre-processing the input data to remove, alter, or curate the underlying data that lead to group dissimilarities [31,38,69]; (2) in-processing, where the model is constrained to produce group similar outcomes by modifying the learning algorithm's objective functions [12,68]; and/or (3) post-processing the output of the model, rather than changing anything about the sample or hypothesis assumptions, by using an algorithm based on a function that detects potential group dissimilarities and adjusts the labels accordingly [45]. If the chosen de-biasing process requires the elimination of differences between groups based on a protected attribute, while disregarding the base-rate differences between those groups, the effect would be to give systematic, preferential treatment to one group at the expense of the other. The frequency or severity of that systematic deviation from equal treatment would depend on the strength of the correlation between the protected attribute and the target variable in the original, unmodified sample. The trade-off between an automated hiring system that seeks to achieve equality of treatment and one that seeks equality of outcome is inextricably linked to the trade-off between generalizable outcomes and group similar outcomes, where generalizability in employment decisions is an instantiation of qualification-assessment objectivity (the Marschall test), and group similarity in outcomes (fairness) is an instantiation of preferential treatment. Placed in this context, the "cost of fairness" [50] is the sacrifice of the individual, fundamental right to equal treatment [6].
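As a concrete sketch of strategy (1), consider the standard reweighing construction, in which each (group, label) cell is weighted by P(A)P(Y)/P(A,Y) so that the protected attribute and the target variable become statistically independent under the weighted sample (the data here are synthetic and purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50_000
a = rng.integers(0, 2, n)                              # protected attribute
y = (rng.random(n) < np.where(a == 0, 0.3, 0.6)).astype(int)  # disparate base rates

# Reweighing (pre-processing): weight each (a, y) cell so that A and Y
# are independent in the weighted empirical distribution.
w = np.empty(n)
for g in (0, 1):
    for label in (0, 1):
        cell = (a == g) & (y == label)
        expected = (a == g).mean() * (y == label).mean()  # P(A) * P(Y)
        observed = cell.mean()                            # P(A, Y)
        w[cell] = expected / observed

# Under the weights, the per-group base rates are exactly equalized:
rate0 = np.average(y[a == 0], weights=w[a == 0])
rate1 = np.average(y[a == 1], weights=w[a == 1])
```

This is precisely the modification of the sample the text describes: the empirical group disparity is erased before any model is fit, and any model trained on the weighted data no longer reflects the base-rate differences present in the original sample.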

ANALYSIS
To begin with the question posed at the heart of algorithmic fairness: should we address group base-rate differences through a process of de-biasing, thereby creating preferential conditions for some based on their protected attributes in order to reach an equitable distribution, or should we maintain equal treatment in the competition itself, relying instead on institutions committed to substantive equality of opportunity and positive action policies to redress factual inequalities between groups in society? In the context of hiring in the EU, that normative question has already been answered. In Kalanke, the CJEU put forward the primary concern and determining factor of proportionality in the employment context: whether the practice substitutes, for substantive equality of opportunity, the equality of outcome that is only to be reached by the realization of factual equality in society.
In Marschall, the preferential treatment of the under-represented sex was limited to tie-breaking scenarios between equally qualified candidates, to counteract the prejudiced tie-breaking that existed in social reality, in accordance with the goal of substantive equality of opportunity. The Marschall "savings clause" ensured that outcome equality would not be pursued, requiring the employer to subject the candidatures to an objective assessment in which the preference would be overridden if the candidate of the over-represented sex had qualifications that would tilt the balance in their favor. Unlike the practice in Marschall, candidates subjected to fair automated hiring processes that de-bias in accordance with an independence-based metric are not objectively assessed in the first place, let alone given the assurance of an override. Prior to the selection process and the comparison of applicants, the data sample would have been modified and/or the hypothesis assumptions trained to undervalue the qualifications of some and overvalue the qualifications of others based on their group membership. In other words, the weights of the applicants' features, without preferential treatment, are never brought to bear on the hiring decision. Thus, the integrity of a tie-breaking scenario is compromised at the outset.
In Abrahamsson, the CJEU considered legislation which required that a candidate for a public post belonging to the under-represented sex and possessing sufficient qualifications for that post be given preference over a candidate of the opposite sex who would otherwise have been appointed, in order to achieve equal gender representation in the given field of employment. The Court found that the objectivity of the selection process could, therefore, not be precisely determined. Such a practice, the Court reasoned, would result in the selection of candidates with qualifications not equal but inferior to those of candidates of the opposite sex, ultimately substituting group membership for the individual assessment of candidate merit. The Court also ruled that even if the scope of such a practice were limited to a predetermined number of posts, or to posts specifically created for that purpose, it would still be precluded because of the absolute and unconditional nature of the practice. Unlike in Abrahamsson, the objectivity, or lack thereof, of the selection process of an automated hiring system can be determined. A system which simply rids the outcome of group skew (defined as the quotient of between-group distance and within-group distance) or group dissimilarity (in accordance with an independence-based fairness metric) and, in the process of doing so, necessarily disregards the representativeness of the sample and the generalizability of the model, demonstrates its lack of an objective assessment. Further, Abrahamsson tells us that creating a threshold at which candidates are deemed qualified, and then ensuring equal outcomes between groups once the threshold is satisfied, would likely be precluded under an interpretation of the derogation of the relevant equality directive. For the above reasons, fair automated hiring systems that de-bias in accordance with independence-based metrics would likely be deemed unlawful due to their automatic preferential treatment. Such systems, certainly if used for the selection processes of public posts, are simply a high-tech evasion of law which has been settled for decades.

CONCLUSION
It is widely recognized that automated hiring systems must not discriminate. Fair machine learning and the toolset it provides is often seen as the answer to creating non-discriminatory automated hiring systems. However, the most commonly used metric in fair automated hiring systems ensures a discriminatory effect when used as the basis for a de-biasing process. If the chosen de-biasing process, using a non-conservative metric, requires the elimination of differences between groups based on a protected attribute, while disregarding the base-rate differences between those groups, the effect would be to give systematic, preferential treatment to one group at the expense of the other. The frequency or severity of that systematic deviation from equal treatment would depend on the strength of the correlation between the protected attribute and the target variable in the original, unmodified sample. While algorithmic unfairness and discrimination are often used synonymously, the importance of "accuracy" and the estimation and preservation of model generalizability should not be ignored when determining the legality of such systems. Algorithmic fairness and algorithmic non-discrimination are not one and the same, and further research into the conflicts between the two in different jurisdictions and applications is required to ensure that automated decision-making systems are just.
We suspect that the reason equality of outcome is a dominant approach on the policy side of the algorithmic fairness literature is largely that the traditional machine learning approach has the potential to satisfy a meritocratic conception but can never satisfy an equitable conception in a data setting that contains group disparities. Our concern is that by defining fairness as equality of outcomes, the community may be leading policy-makers and regulators to believe that fairness is absent from automated decisions unless the equitable approach is used. We hope for an expansion in how fairness is conceived so that the literature can capture the same diversity of opinion that is present in the wider societal discourse. We also hope to have shown that the rejection of the trade-off between generalizable outcomes and group similar outcomes has, in some cases, resulted in fallacious reasoning and misleading assertions. Researchers should confront the reality that group similar outcomes require the introduction of inaccuracies in a data setting where group disparities are present. The research community should be more straightforward about what is being sacrificed in the name of equal outcomes. Obfuscating the nature of that sacrifice may lead to unlawful discrimination. As was once wisely said, "There are no solutions. There are only trade-offs" [58].
) of Directive 76/207. The final case to discuss is Abrahamsson and Anderson v. Fogelqvist [21]. In 1996, eight candidates applied for a professorship at the University of Göteborg, including Ms. Abrahamsson, Ms. Destouni, Ms. Fogelqvist, and Mr. Anderson. The selection board voted twice: (1) in relation to the scientific qualifications of all candidates, Mr. Anderson received five votes and Ms. Destouni received three votes; (2) taking into account both scientific merits and a positive action provision, Ms. Destouni received six votes and Mr. Anderson two votes. The selection board proposed that Ms. Destouni be appointed, placing Mr. Anderson second and Ms. Fogelqvist third. Later, Ms. Destouni withdrew her application, and the Rector of the University appointed Ms. Fogelqvist to the position. The Rector stated that the difference between Mr.