Structural Interventions and the Dynamics of Inequality

Recent conversations in the algorithmic fairness literature have raised several concerns with standard conceptions of fairness. First, constraining predictive algorithms to satisfy fairness benchmarks may lead to non-optimal outcomes for disadvantaged groups. Second, technical interventions are often ineffective by themselves, especially when divorced from an understanding of structural processes that generate social inequality. Inspired by both these critiques, we construct a common decision-making model, using mortgage loans as a running example. We show that under some conditions, any choice of decision threshold will inevitably perpetuate existing disparities in financial stability unless one deviates from the Pareto optimal policy. Then, we model the effects of three different types of interventions. We show how different interventions are recommended depending upon the difficulty of enacting structural change upon external parameters and depending upon the policymaker's preferences for equity or efficiency. Counterintuitively, we demonstrate that preferences for efficiency over equity may lead to recommendations for interventions that target the under-resourced group. Finally, we simulate the effects of interventions on a dataset that combines HMDA and Fannie Mae loan data. This research highlights the ways that structural inequality can be perpetuated by seemingly unbiased decision mechanisms, and it shows that in many situations, technical solutions must be paired with external, context-aware interventions to enact social change.


INTRODUCTION
Algorithms are increasingly used in socially impactful settings to guide decision-making. Several methods have been proposed for debiasing algorithms to ensure that they are nondiscriminatory. Many of these methods consist of technical guidelines for algorithm development, which include altering parameters within machine learning models and constraining algorithms to satisfy certain fairness metrics [6,69].
This framing of algorithmic fairness has been criticized on several fronts. Some have demonstrated that constraining algorithms to satisfy fairness criteria may lead to suboptimal welfare outcomes for members of both advantaged and disadvantaged subgroups. Others have argued that the focus on formal, technical interventions to ensure fairness distracts from more pressing concerns about how the predictive algorithm's results are used to make unjust decisions that lead to undesirable social consequences. Still others are concerned that it is sometimes impossible to separate the effects of historical discrimination from existing social facts encoded in the training data; such historical inequities are often "baked in" to datasets.
When designing policies to make algorithmic decision-making systems more fair and just, these concerns lead us to ask several questions. First, what are the welfare effects and outcomes of these decision-making processes? Second, how can our understanding of the social contexts of these algorithms enable us to target the most crucial leverage points when it comes to policymaking? Third, to what extent are these algorithmic decision-making systems reifying unjust inequities that are a result of historical discrimination, and how can we design policies that ameliorate the effects of this injustice?
We borrow from feminist and political philosophy, as well as previous discussions in the algorithmic fairness literature [35,52], to propose how we can use a framework of structural injustice and structural interventions to address these concerns in a formal model. We describe a class of decision structures, which we more fully characterize in Section 3, in which applicants in a population undergo a decision process that is a function of their initial financial status. Depending on the result of this decision, applicants then gain or lose financial stability proportionally to their initial status. This is a type of feedback loop, in which the decision structure exacerbates existing inequalities present in the population over time [77]. We will use mortgage loans as a running example.
The main contributions of this paper are as follows:
• We create a quantitative model of the effects of structural interventions on a system with persistent inequality, based on theories of structural injustice previously qualitatively described in the algorithmic fairness literature.
• We describe conditions under which any intervention upon decision thresholds will either perpetuate systemic inequality, or will harm the better-off group without any improvement to the plight of the worse-off group.
• We characterize conditions under which structural interventions may be preferable to algorithmic ones, depending on a policymaker's preference for equity or efficiency.
• We apply our framework and propose examples of structural interventions in the housing domain.
In Section 2, we lay out some existing debates within the algorithmic fairness community surrounding the inadequacy of predominant fairness models and situate our contributions within the context of related work. Then in Section 3, we set up our model and prove three propositions about its behavior. Section 4 describes and simulates the effects of different kinds of policy interventions upon sample distributions. Section 5 contains an empirical demonstration and simulation of these results using HMDA and Fannie Mae data, and we conclude in Section 6.

BACKGROUND
Structural injustice has been the focus of recent discussions in the algorithmic fairness literature [44,52,63,65]. To illustrate an example of structural injustice, we begin with an enduring puzzle: as of 2024, discrimination based on identity categories like race, gender, and nationality is prohibited across domains like education, employment, lending, and housing. However, unequal outcomes, such as the wealth gap and the homeownership gap between Black and White Americans, or the income gap between men and women, still remain, and in many cases have grown [3,19,54,83].
Why might this be the case? One explanation may be that membership in some identity categories may affect an individual's preferences and behavior. A second explanation is implicit bias: decision-makers might still be affected by subconscious discrimination. A third explanation is that while present-day decision-makers may be unbiased, very little has been done to rectify past harm, the effects of which still pervade the present. A fourth explanation is that unequal outcomes persist, not only because of the biased actions of individual actors, nor because of injustice that exclusively happened in the past, but because of larger systems that are structurally biased against marginalized groups.
Our definition of structural injustice focuses on the third and fourth explanations for how decision-making systems generate social inequality. First, decision-making systems are often trained on data that reflect unjust historical patterns, which causes them to reproduce status quo disparities that are a result of historical inequity [33,67]. This problem cannot always be resolved by better data collection or by clarifying gaps between theoretical constructs and measurement [30,49]. In these most difficult cases, membership in the minoritized group can be constitutively related to features in the construct space that are, in turn, causally influential upon target outcomes [41,46]. Parallels can be drawn between this type of injustice and Young's account of structural injustice, which often arises in situations where no actor is liable for blame or wrongdoing [92]. Similarly, algorithmic decision-making systems can reify status quo inequalities without being biased or discriminatory in a traditional sense, and without being the result of blameworthy behavior by a discrete bad actor [45].
Second, decision-making systems perpetuate inequality because they are situated in present-day, unjust social structures. A social structure can be thought of as a particular configuration of a network of causal influence between social phenomena [38,80]. Haslanger also describes a socio-structural explanation as one that answers not the what but the why of causal relationships. These investigations can be useful for understanding the effects of unjust social structures. For instance, a recidivism risk algorithm may inform a judge's decision for pre-trial detainment. In many cases, we might ask not just "was this algorithm fair to sentence Black defendants in these cases?" but also, "what are the social structures that have created a causal connection between recidivism risk and imprisonment, or between imprisonment rate and community well-being?" These questions illuminate how the prison-industrial system structures networks of causal influence between algorithms and their social consequences.
While there is debate about moral responsibility for structural injustice [68], we take as a starting point that data scientists and decision-makers are responsible for the social consequences of algorithmic decision-making systems [34,52]. We investigate the conditions under which decision-making systems perpetuate injustice and the mechanisms by which they do. Some solutions have included developing more careful guidelines for how algorithmic predictions affect decision-making, using distributive justice as a standard for making better decisions [60]. We draw in particular from Green's concept of a "structural intervention" toward algorithmic justice [35], which alters the way that algorithmic predictions affect downstream social outcomes. In the recidivism case, this may mean decreasing incarceration time overall while increasing social services, so that an algorithmic prediction may not have such high-stakes social consequences. We model these structural responses by changing the parameters that govern the magnitude of downstream effects. Figure 1 demonstrates this graphically: instead of focusing on debiasing a predictive algorithm at the site of the algorithm, we must examine the larger social structure that the algorithm is a part of, and if necessary, consider interventions upstream and downstream of the decision. This structural injustice framework describes how the entire causal chain may generate injustice without blameworthiness or liability at any particular point.

Fairness and the consequentialist and structural critique
The socio-structural explanation for persistent inequality responds to many existing concerns raised by the algorithmic fairness literature, and in many cases, offers a solution to these existing dilemmas. First, the sociotechnical critique argues that many fairness interventions that focus exclusively on technical interventions at the site of the algorithm [36,37] can obfuscate larger patterns of unjust social systems that these algorithms are a part of [72]. Instead, algorithms should be evaluated with their social context, with a particular focus on the interactions between the social and technical aspects of a decision-making system. This broader evaluative framework may investigate legal and political contexts, data collection processes, and social impacts of the decision-making systems [14,25,82]. In practice, some have suggested frameworks that use value chains or bottleneck theory [5,50] to identify vulnerable or high-impact locations within the algorithmic pipeline that are most in need of intervention.
Second, the consequentialist critique describes how seemingly fair procedural mechanisms may generate non-optimal outcomes for disadvantaged groups [15,47,86], especially over a longer-term time horizon [18,64,74]. Additionally, compliance with legal standards of non-discrimination by "blinding" an algorithm to an individual's protected attribute may also generate non-optimal outcomes [55,57,71]. Overall, the consequentialist perspective encourages evaluating outcomes rather than procedures [9,32,53], arguing that practitioners concerned with justice ought to directly study welfare for disadvantaged groups. Consequentialist-flavored case studies on algorithmic justice have been explored in domains such as education [15], lending [10], and criminal justice [16,48].
Third, impossibility theorems have proven that when there exist differences in the base rate of outcomes, decision-makers often cannot simultaneously satisfy multiple intuitive fairness metrics. For example, equal opportunity and calibration can be incompatible [56], as can equal opportunity and predictive parity [12]. Green discusses how this result is a reflection of the social world and its existing inequalities, demonstrating that "in an unequal society, decisions based in formal equality are guaranteed to produce substantive inequality" [35].
Fourth, the historical bias critique argues that, because of existing biases embedded in data, standard intuitions about what constitutes a fair decision frequently fail. Existing inequities can exist in the relevant construct space; for instance, racial discrimination can generate a predictive relationship between race and socioeconomic status that can have a real bearing on some outcome of interest [30]. When these relationships exist due to historical discrimination, standard assumptions that are used to reason about intuitive fairness standards may fail [29,41,94]. Furthermore, some algorithmic decision-making systems may dynamically interact with a social system at multiple points, where input parameters at time $t$ may be dependent upon outputs of the algorithm itself at time $t-1$ or earlier. In these cases, algorithms may not simply perpetuate historical injustice but sometimes may even exacerbate discrepancies [24,27,31,62,76].
The structural explanation addresses concerns raised by these dilemmas. Our model responds to the sociotechnical critique, making explicit the values that policymakers might prefer [14] and thinking beyond technical solutions for social change [82]. The structural change model also addresses the consequentialist critique, explicitly defining a welfare function and assessing how policy changes may alter outcomes for both advantaged and disadvantaged populations. Next, the structural change model is based on Green's suggestion that the concern with misaligned fairness metrics is largely a result of existing inequalities in the social world. Our model aims to capture situations in which such "base rate" differences can or cannot be eliminated. Finally, the dynamic nature of our model allows for the assessment of long-term consequences in a feedback-based setting, showing how existing inequalities might be perpetuated or mitigated over time.

Related Work
Our work builds on a body of literature studying the dynamics of qualification rates. Liu et al. model a one-step lending process where an individual's score is affected by a selection mechanism. This model shows that constraining selection policies to satisfy certain fairness criteria, such as demographic parity or equal opportunity, can sometimes lead to greater declines in qualification status compared to the unconstrained policy [64]. Jorgensen et al. extend this result to several other fairness criteria [51]. Williams and Kolter model a loan approval setting similar to that of Liu et al., with slight modifications to the update function [91]. In contrast, they show that unconstrained policies may increase inequality while fairness-constrained policies can sometimes lead to convergent outcomes. Our lending setting is inspired by these models, but instead of evaluating their behavior when constrained by specific fairness rules, we show that there are certain conditions where no possible policy constraints can decrease inequality without harming the advantaged group.
Other related work studies how qualification rates dynamically evolve over a longer time period [70], how the model might change when individuals can choose to invest resources in improving their qualification status over time [93], or how within-group disparities are exacerbated by these decision systems [86]. Like Mouzannar et al., Zhang and Tu, and Sun et al., we demonstrate that there exist conditions under which no threshold policy will lead to equality in the long term. We additionally examine a certain type of policy intervention that alters the relationship between late payment and financial penalties, which we call a structural intervention. Most of the works mentioned above explicitly assume a fixed background structure, examining only interventions on approval rates. Zhang and Tu allude to a certain type of "transitional intervention" that increases the equilibrium approval rate for both groups, but do not show how such interventions may increase the equilibrium equity between these groups. We also study the differences between group-blind and group-specific interventions.
The most similar work that formally models structural interventions may be that of Cruz Cortés et al. [17], who model three separate interventions: one that enforces fairness constraints, one that changes the initial score distribution of the two populations, and one where the disadvantaged population receives a boost. Their results largely agree with ours, showing that the latter two structural interventions are more effective in many situations. While Cruz Cortés et al. simulate only two structural interventions, we show how structural interventions of different magnitudes may affect outcomes. We then give an example of a policymaker's utility function that places different weights on short-term equality and efficiency, and show the desirability of different interventions in each case.

MODEL

3.1 Single Time Step Model
We use a loan approval setting as a running example. Consider a loan applicant pool that consists of two subgroups: an advantaged group $A$ and a disadvantaged group $D$. At time $t$, every applicant has some likelihood of paying off their requested loan. Let $Y \sim \mathrm{Ber}(\pi_t^j)$ be a random variable that corresponds to whether an individual pays off their loan or not, and suppose that the parameter $\pi_t^j$ depends on the individual's subgroup identity $j \in \{A, D\}$ and the time $t$.
We assume that there exist two different distributions of the repayment probability parameter $\pi_t^j$ for the two subgroups. Now consider a threshold decision rule that allocates loans to applicants: we choose $\beta_j \in [0, 1]$ such that applicants with repayment probability parameter $\pi_t^j \ge \beta_j$ are granted the loan, while applicants with repayment probability parameter $\pi_t^j < \beta_j$ are denied. In similar settings, it has been demonstrated that a threshold rule is optimal, and that many fairness constraints, such as equal opportunity, can be enforced in terms of different thresholds [64,93].
We examine the evolution of the distributions of $\pi_t^j$ over time, where $Y \sim \mathrm{Ber}(\pi_t^j)$, $k$ is a scaling parameter, and $c$ measures the severity of the financial penalty for a late payment. We will examine interpretations of $c$ more closely in the last section. At each time step, part of the population is approved and part of the population is denied. Denied individuals maintain the same likelihood of repayment; in this step, we hold external facts constant, assessing the effects of the decision-making process alone. Approved individuals either accrue a benefit $k$, if they make a timely payment at time $t$, or are penalized $ck$, if their payment is late. This update rule captures the intuition that timely payment of loans helps build credit, and in the home mortgage case, that homeownership is an avenue for wealth-building. It also captures the intuition that when $c > 1$, the penalty for late payment is larger than the benefit accrued from repayment at each time step. Finally, the update is clipped by minimum and maximum bounds, which ensure that $\pi_{t+1}^j \in [0, 1]$ for all $t$.
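The update described in this paragraph can be written out explicitly. What follows is a reconstruction from the prose, not a quotation of the paper's displayed equation. It assumes additive steps (a gain of $k$ after a timely payment, a loss of $ck$ after a late one), and the symbol names ($\pi$ for repayment probability, $\beta$ for the threshold, $j$ for the group label, $k$, $c$) are assumptions where the text does not fix them.

```latex
\[
\pi^{j}_{t+1} =
\begin{cases}
\pi^{j}_{t} & \text{if } \pi^{j}_{t} < \beta \quad \text{(denied)},\\[2pt]
\min\{1,\; \pi^{j}_{t} + k\} & \text{if } \pi^{j}_{t} \ge \beta \text{ and } Y = 1 \quad \text{(approved, timely payment)},\\[2pt]
\max\{0,\; \pi^{j}_{t} - ck\} & \text{if } \pi^{j}_{t} \ge \beta \text{ and } Y = 0 \quad \text{(approved, late payment)}.
\end{cases}
\]
```

Under this form, an applicant below the threshold never moves, which is what makes the region below $\beta$ absorbing in the long-run analysis.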
Taking expectations of this update involves $F(\beta)$, the cumulative distribution function of $\pi_t^j$ evaluated at $\beta$. We assume that the policymaker is blind to each individual's group identity; they must set a universal $\beta$, so $\beta_A = \beta_D$. This assumption corresponds to legal standards such as disparate treatment law in mortgage lending, which prohibits using race, sex, nationality, and other protected characteristics as a reason for setting different approval rules [39,59].
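As a worked step, the expected update for a single approved applicant can be computed directly. This sketch assumes an additive update rule (gain $k$ on timely payment, loss $ck$ on late payment; both the form and the symbol names are assumptions) and ignores the bounds at 0 and 1. Here $p$ denotes the applicant's current repayment probability.

```latex
\begin{align*}
\mathbb{E}\left[\pi_{t+1} \mid \pi_t = p,\; p \ge \beta\right]
  &= p\,(p + k) + (1 - p)\,(p - ck) \\
  &= p + k\,\bigl(p - c\,(1 - p)\bigr).
\end{align*}
```

Under these assumptions, the expected one-step change is nonnegative exactly when $p \ge c/(1+c)$; for example, $c = 2$ gives a break-even point of $2/3$. Denied applicants ($p < \beta$) keep $p$ unchanged, so the population-level expectation weights this change by the share of applicants above the threshold.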
All proofs for this and subsequent propositions can be found in Appendix A. We make the assumption because of the linearity of our function, and because it is bounded at its endpoints. This assumption is not difficult to satisfy in practice: $\mathbb{E}[\pi_t^j] = 1$ if and only if everyone in the population has repayment probability 1, and $\mathbb{E}[\pi_t^j] = 0$ if and only if everyone in the population has repayment probability 0. These two cases are both trivial, and all possible policies will maintain those equilibria.
In the intermediate cases, this proposition is related to the speed of convergence of $\mathbb{E}[\pi_t^A]$ and $\mathbb{E}[\pi_t^D]$. Not only are inequalities between the subpopulations maintained; they are deepened by the process.
The corollary states that under certain conditions, any inequalities will be deepened by this distribution mechanism. No choice of $\beta$ will be able to avoid this outcome; that is, no change in policy will be enough to avoid perpetuating structural inequality. We show later that this is plausible in many empirical demonstrations.
Next, we show what happens when we choose the optimal policy, that is, if we set the threshold to be $\hat{\beta}$.
To ensure that $\mathbb{E}[\pi_{t+1}^A] - \mathbb{E}[\pi_t^A] = \mathbb{E}[\pi_{t+1}^D] - \mathbb{E}[\pi_t^D]$, we must lower the advantaged population mean without a corresponding increase in the disadvantaged population mean. It has been shown in similar situations that many group fairness criteria, such as demographic parity or equal opportunity, can be implemented through a decision threshold rule that permits different thresholds for different groups [64]. In other words, these fairness criteria are often not Pareto optimal.

Long Term Behavior
We examine the steady-state behavior of this system while making one key assumption: that there exists no external influence on the dynamics of the system.
Proposition 3.5. Assume that $c, k \in \mathbb{Q}$, and let $\beta \in [0, 1]$ be some arbitrary threshold. Then $P(\beta < \lim_{t \to \infty} \pi_t^j < 1) = 0$ for all group identities $j \in \{A, D\}$.
The long-term steady state of the process results in a bifurcation in the population within each identity subgroup: part of the population attains repayment probability 1, while another part of the population is relegated to being below $\beta$ in perpetuity. In reality, external interventions can often propel members of the sub-$\beta$ group to a higher status. However, we are merely examining the long-term effects of this process in isolation, showing that no matter the choice of $\beta$, low-resource members of the population may not be able to be helped by this process. In Appendix A we demonstrate how we can compute the probability of certain stationary distributions given initial parameters.
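The bifurcation can be illustrated with a small simulation. This is a minimal sketch, not the paper's code: it assumes an additive update (gain `k` on timely payment, loss `c * k` on late payment, clipped to [0, 1]), and the parameter values, sample size, and function names are hypothetical choices for illustration.

```python
import random

def step(p, beta, k, c, rng):
    """One update for a single applicant under the assumed additive rule."""
    if p < beta:                      # denied: repayment probability unchanged
        return p
    if rng.random() < p:              # approved and pays on time
        return min(1.0, p + k)
    return max(0.0, p - c * k)        # approved and pays late

def simulate(initial, beta, k, c, steps, seed=0):
    """Apply `step` to every applicant for `steps` rounds."""
    rng = random.Random(seed)
    pop = list(initial)
    for _ in range(steps):
        pop = [step(p, beta, k, c, rng) for p in pop]
    return pop

def share_in_between(pop, beta):
    """Fraction of applicants with beta <= p < 1, i.e. not yet absorbed."""
    return sum(1 for p in pop if beta <= p < 1.0) / len(pop)

# Hypothetical starting sample and parameters (illustrative only).
rng0 = random.Random(1)
start = [rng0.betavariate(4, 8) for _ in range(1000)]
end = simulate(start, beta=0.3, k=0.05, c=2.0, steps=500)
print(share_in_between(start, 0.3), share_in_between(end, 0.3))
```

In this sketch, applicants who fall below `beta` are never approved again and applicants who reach 1 never pay late, so the share remaining strictly between the two should shrink toward zero as the number of steps grows.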

INTERVENTIONS
In the previous section, we showed that under certain assumptions, no decision-making policy will eliminate initial status inequalities. Now we examine several different types of interventions. Recall from subsection 3.1 that the policymaker can choose to set the threshold $\beta$. We consider situations under which the policymaker can change $c$ as well.
Consider the following aggregate utility function for a population of size $N$ that is divided according to some identity label $j \in \{A, D\}$. Let $\mathbb{E}[\hat{\pi}_T^j]$ denote the maximum mean repayment rate prior to any intervention, that is, the expected value at $T$ after applying the optimal decision threshold $\beta = \hat{\beta}$ at every time step. Let $\mathbb{E}[\pi_T^j]$ denote the mean repayment rate at time $T$ under some intervention policy. The policymaker's utility $U$ from an intervention then has two terms. The first term corresponds to outcome parity: to what extent does it matter that the two groups have different outcome means? The second term corresponds to aggregate utility change across both groups, the difference between the optimal unconstrained policy and the post-intervention policy. This is a version of a utilitarian equity-efficiency tradeoff, where a higher $\lambda$ represents a preference for outcome equity over outcome efficiency and vice versa. In this particular formulation, we assume that the populations are of equal sizes, but it can be modified for the case where the populations are of different sizes. We examine the effects of three possible interventions.
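• Policy 1: "beta-only intervention" This intervention modifies only the threshold $\beta$.
• Policy 2: "group-blind intervention" This intervention modifies $c$. Let $\hat{c}$ be the pre-intervention level of $c$. Let $r$ correspond to the magnitude of change that we can make to $\hat{c}$. When $r$ is large, we have the resources to decrease $\hat{c}$ significantly. We choose the new $c = \hat{c} - r\hat{c}/2$. When ĉ < , then π  ≤    . Then, we also choose the optimal $\beta$ to maximize $U$.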
• Policy 3: "group-conscious intervention" We now assume that $c$ can be different for different groups: $c_j$ depends on the subgroup. This intervention exclusively modifies $c_D$. Because we are directing resources specifically to modify $c_D$, assume that $c_A = \hat{c}_A$ and $c_D = \hat{c}_D - r\hat{c}_D$.
As in policy 2, we permit changes in $\beta$ to maximize $U$.
Policy type 1 is a technical intervention within the existing decision structure, which holds $k$ and $c$ constant. Policy types 2 and 3 are structural interventions that depend on the amount of resources $r$ available to modify the structural parameter $c$. More generally, $r$ can also be thought of as the responsiveness of the social world to policies that attempt to alter $c$.
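The three policies and the equity-efficiency utility can be sketched in a few lines. This is an illustration under stated assumptions, not the paper's implementation: the symbol `r` for the resource level, the policy labels, and especially the exact form of `utility` (a weight `lam` on the negative outcome gap and `1 - lam` on the summed gain over the pre-intervention optimum, for equal-sized groups) are hypothetical reconstructions from the prose.

```python
def apply_policy(policy, c_hat_a, c_hat_d, r):
    """Return post-intervention (c_A, c_D) for a resource level r in [0, 1]."""
    if policy == "beta_only":          # Policy 1: c is left untouched
        return c_hat_a, c_hat_d
    if policy == "group_blind":        # Policy 2: c = c_hat - r * c_hat / 2 for both groups
        return c_hat_a * (1 - r / 2), c_hat_d * (1 - r / 2)
    if policy == "group_conscious":    # Policy 3: only the disadvantaged group's c changes
        return c_hat_a, c_hat_d * (1 - r)
    raise ValueError(f"unknown policy: {policy}")

def utility(lam, mean_a, mean_d, base_a, base_d):
    """Assumed form: lam * (-|gap|) + (1 - lam) * (gain over pre-intervention optimum)."""
    parity = -abs(mean_a - mean_d)                       # outcome parity term
    efficiency = (mean_a - base_a) + (mean_d - base_d)   # aggregate change term
    return lam * parity + (1 - lam) * efficiency

# Hypothetical numbers for illustration only.
for name in ("beta_only", "group_blind", "group_conscious"):
    print(name, apply_policy(name, 2.0, 2.0, 0.5))
print(utility(0.2, 0.6, 0.4, 0.5, 0.3), utility(0.8, 0.6, 0.4, 0.5, 0.3))
```

With the same post-intervention means, a larger `lam` penalizes the remaining gap more heavily, which is the sense in which it encodes a preference for equity over efficiency.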
In the following simulations, we start with two populations, one advantaged and one disadvantaged. We choose a sample size of 1000 and simulate the effects of different policies and interventions on samples of these populations. First, we determine the optimal $\beta$ such that $\lim_{t \to \infty} \mathbb{E}[\pi_t^A]$ is maximized, and likewise for $D$. Then, we compare these outcomes, $\lim_{t \to \infty} \mathbb{E}[\pi_t^j]$, with the outcomes generated when we choose certain interventions.
In Figure 2, we sample $\pi_0^A$ from the beta(4, 8) distribution and $\pi_0^D$ from the beta(3, 8) distribution. We display a probability density estimate at time $t = 0$ in the top graph. The bottom three graphs display the recommended policies after simulating policy outcomes over $T = 20$ timesteps. They represent three different policymaker preferences: the leftmost graph shows the recommended interventions when the policymaker's $\lambda = 0.2$, that is, when they prefer outcome efficiency over equity; the middle graph uses $\lambda = 0.5$, a balanced preference; and the rightmost uses $\lambda = 0.8$, a preference for equity over efficiency. For each of these individual graphs, the $x$-axis corresponds to the starting $c$ value, and the $y$-axis corresponds to the magnitude of change that is possible. In other words, a high intervention % on the $y$-axis means that the starting $c$ can be significantly reduced by the policy intervention.
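The simulation setup can be sketched as follows. The beta(4, 8) and beta(3, 8) starting distributions, the sample size of 1000, and the 20 timesteps come from the text; everything else is an assumption made for illustration, including the additive update, the values `k=0.05` and `c=2.0`, the threshold grid, and the function names.

```python
import random

def run(pop, beta, k, c, T, rng):
    """Mean repayment probability after T rounds with a fixed threshold beta."""
    pop = list(pop)
    for _ in range(T):
        for i, p in enumerate(pop):
            if p >= beta:                               # approved this round
                if rng.random() < p:
                    pop[i] = min(1.0, p + k)            # timely payment
                else:
                    pop[i] = max(0.0, p - c * k)        # late payment
    return sum(pop) / len(pop)

def best_threshold(pop, k, c, T, grid, seed=0):
    """Grid search for the threshold with the highest simulated mean at time T."""
    scores = {b: run(pop, b, k, c, T, random.Random(seed)) for b in grid}
    best = max(scores, key=scores.get)
    return best, scores[best]

rng = random.Random(42)
group_a = [rng.betavariate(4, 8) for _ in range(1000)]   # advantaged sample
group_d = [rng.betavariate(3, 8) for _ in range(1000)]   # disadvantaged sample
grid = [i / 20 for i in range(21)]                       # 0.00, 0.05, ..., 1.00
for name, pop in (("A", group_a), ("D", group_d)):
    beta_star, mean_T = best_threshold(pop, k=0.05, c=2.0, T=20, grid=grid)
    print(name, beta_star, round(mean_T, 3))
```

Because a threshold of 1 approves essentially no one, the best simulated mean on this grid can never fall below the starting mean; a single simulated run per threshold is noisy, so averaging over several seeds would give a steadier estimate.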
Figure 2 shows that group-conscious structural interventions are more strongly recommended when $\lambda = 0.2$, especially when the intervention magnitude is large, while the beta-only interventions are sufficient when $\lambda = 0.8$. These recommendations are highly contingent upon the starting distributions of $\pi_0^j$, however, and patterns emerge when we compare how recommendations differ depending on the initial distribution. First, we can look at the two graphs on the left side of Figure 3, where the overall mean is low for both populations. Here there is a strong recommendation for group-conscious interventions when the starting $c$ level is low, or when there are enough resources that the $c$ level can be reduced significantly for the disadvantaged population. This pattern shows up very clearly in the bottom left corner, where there is high intergroup difference. In these cases, only a large intervention will have any effect, because both populations are starting at a place where they are more likely to fail. The "band" pattern exhibited when the mean is low but the populations are close together can be explained by the fact that a too-large intervention would benefit the disadvantaged population to a point where they would be better off than the advantaged population. In this case, a preference for outcome parity would lead to recommending the group-conscious intervention only where it leads exactly to parity and increased efficiency in outcomes.
Next, when the overall mean is high, corresponding to the two graphs on the right side of Figure 3, a different pattern emerges. When the populations have initially close starting positions, if the policymaker has fewer resources and $c$ is high, the recommendation is to invest in group-conscious interventions. However, when $c$ is already low, or when the intervention is large, the recommended intervention is the group-blind intervention. This is likely because if the starting point for the disadvantaged population is fairly high, and it is easy for them to "catch up" to the advantaged population, a large intervention is not required for them to reach outcome parity and stability at the right-hand side of the distribution. In this case, it is most beneficial overall to improve the condition of both parties.
But this pattern is different in the case where the population means are far apart, even when the overall mean is high. In this case, there is a strong preference for improving the plight of the disadvantaged group through structural means when $c_D$ can be significantly reduced. Since the advantaged population is already so well off, and the disadvantaged population can achieve much better outcomes through a c-type intervention, the recommendation is to invest all resources into the latter.
In general, the recommended intervention depends strongly on the policymaker's preference for outcome equity versus efficiency, and also depends on the starting conditions of the two populations. Note that each of these represents only the long-term effect of a one-time intervention. An expanded study may look at how dynamic interventions may respond to changing population distributions to improve overall conditions or to increase the speed of convergence of the two distributions in time.

EMPIRICAL DEMONSTRATION

5.1 Results
We apply these methods to demonstrate how a mortgage loan approval procedure may perpetuate existing financial inequalities between Black and White applicants. We use two datasets. The first is the publicly available Home Mortgage Disclosure Act (HMDA) dataset from 2021, an individual loan-level dataset that records important demographic, geographic, and economic features of applicants, as well as labels of whether each application was approved or denied. It has two limitations: although HMDA collects information on credit scores, this information is not released to the public, and the HMDA data crucially does not contain data on default or late payment.
The second dataset is the publicly available Fannie Mae Single Family Loan Performance Data. Fannie Mae (FNMA) and Freddie Mac are government-sponsored enterprises that buy conforming mortgages on the secondary mortgage market from lending institutions. As of 2020, Fannie Mae and Freddie Mac owned approximately sixty percent of conforming mortgage loans in the U.S. [1]. The Fannie Mae dataset contains a nationwide subset of all mortgage loans owned by Fannie Mae, originated between 2000 and 2022. FNMA contains loan-level information on the financial features of borrowers at loan origination. FNMA also contains time series, loan-level data on repayment on a monthly basis. For ease of data processing, we limited the analysis to mortgages originating in Massachusetts, and we used a subset of the FNMA data from the first two quarters of 2017.
Our goal was to determine the distribution of risk of mortgage non-repayment across different subpopulations in the 2021 HMDA dataset. To do this, we found shared features between the HMDA and FNMA datasets: loan amount, loan-to-value ratio, debt-to-income ratio, and number of units in the property. Then, we fit a logistic regression model on the FNMA dataset to predict risk of late payment. Using this model, we predicted risk of late payment in the 2021 HMDA sample. Because relatively few features were used, this model may not have extremely high predictive accuracy; however, it is useful for illustrative purposes to demonstrate how different distributions of risk across different populations may result in sustained inequality. For additional information on the datasets and methodology, see Appendix B.
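The train-on-one-dataset, predict-on-another step can be sketched without any real loan data. The example below is a minimal illustration under stated assumptions: the rows are synthetic stand-ins for standardized versions of the four shared features, the coefficients are made up, the logistic regression is a plain gradient-descent fit instead of whatever software the authors used, and treating repayment probability as one minus predicted late-payment risk is an assumption.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, lr=0.1, epochs=300):
    """Plain batch gradient descent on the logistic loss; returns (weights, bias)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict_risk(X, w, b):
    """Predicted probability of a late payment for each row."""
    return [sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) for xi in X]

# Synthetic stand-ins: columns play the role of standardized loan amount,
# loan-to-value ratio, debt-to-income ratio, and number of units.
rng = random.Random(0)

def fake_rows(n):
    return [[rng.gauss(0, 1) for _ in range(4)] for _ in range(n)]

true_w = [0.2, 0.8, 1.0, 0.1]                      # made-up coefficients
train_X = fake_rows(500)                           # stands in for the FNMA sample
train_y = [1 if rng.random() < sigmoid(sum(a * x for a, x in zip(true_w, row)) - 1.0) else 0
           for row in train_X]
w, b = fit_logistic(train_X, train_y)
target_X = fake_rows(200)                          # stands in for the HMDA sample
repay_prob = [1.0 - risk for risk in predict_risk(target_X, w, b)]
print(round(sum(repay_prob) / len(repay_prob), 3))
```

The design point the sketch is meant to show is that only features present in both datasets can enter the model, since the fitted coefficients must be applied to rows that lack any repayment label.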
The predicted repayment values are displayed in the top panel of Figure 4. This distribution fulfills the condition of stochastic dominance described above, and an empirical test across all  ∈ [0, 1] at a step size of 0.01 verifies this assumption. See Figure 6 in Appendix B for a visual demonstration. Since this is true, if we let  [   ] be the expectation of repayment probability for the White population and  [   ] be the expectation of repayment probability for the Black population, Theorem 3.3 tells us that for all possible ,  [   ] ≥  [   ]. We then simulated the repayment status of each loan for  = 20 timesteps to model the dynamics of population inequality over time. Figure 4 displays the recommended intervention for each value of  and the policymaker's choice of . Notice that when  < 1, the choice of intervention becomes relatively unimportant, as both distributions converge to 1 without any external intervention. When  > 1, there are two cases. When we have the resources to perform a large intervention that significantly reduces , the optimal policy is a group-blind intervention that benefits both groups. However, when resources are limited and the intervention size is small, the optimal policy depends on our preferences: if we prefer outcome parity, the beta-only intervention is optimal, and if we prefer efficiency, the group-conscious intervention that lowers  for one group only is optimal.
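The grid-based dominance test mentioned above can be sketched as follows. This is a minimal illustration that assumes the condition is first-order stochastic dominance in the CDF sense (the dominating group's empirical CDF never exceeds the other group's); the precise condition is the one defined earlier in the paper and may differ. The function names and the toy samples are hypothetical.

```python
def empirical_cdf(sample, x):
    """Fraction of the sample that is less than or equal to x."""
    return sum(1 for s in sample if s <= x) / len(sample)


def dominates(sample_a, sample_b, step=0.01):
    """Check F_A(x) <= F_B(x) at every grid point x in [0, 1].

    Returns True when sample_a (weakly) first-order stochastically
    dominates sample_b on the grid.
    """
    n_steps = round(1 / step)
    grid = [i / n_steps for i in range(n_steps + 1)]
    return all(empirical_cdf(sample_a, x) <= empirical_cdf(sample_b, x) for x in grid)


# Toy predicted repayment probabilities for two groups (illustrative only).
group_a = [0.80, 0.90, 0.95]
group_b = [0.50, 0.60, 0.70]
a_dominates_b = dominates(group_a, group_b)
b_dominates_a = dominates(group_b, group_a)
```

A check on a finite grid can miss crossings that occur between grid points, so it supports the assumption rather than proving it.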
There are several caveats to the model: first, we calculated repayment risk based on a single late payment.While it is true that a single late payment can substantially lower credit score [21], penalties for late payments on mortgage loans tend to be relatively lenient, compared to auto or credit loans [61].Mortgage default tends to result in more serious consequences, but there were not enough defaults in the dataset to construct a meaningful prediction model.
Second, our model evaluates the entire population at every time step to determine whether they are above or below the given threshold, "accepting" or "rejecting" them each time. In practice, individuals do not have to periodically re-apply for a loan, and furthermore, rejected applicants are not likely to re-apply for a loan so soon after a rejection. Regardless, the outcomes of the model conform to the likely result, which is that approved households remain approved at every time step unless they fall below some threshold, after which they remain perpetually in the rejection state. Future work may model the effects of interventions on rejected members below the threshold.
Third, this model imperfectly captures the relationship between wealth accrual from homeownership and loan repayment risk. Households accrue wealth the longer they own their homes, but wealth may have heterogeneous or nonlinear effects on repayment risk [79].
Finally, notice that this sample only includes individuals who applied for a mortgage loan. If one's concern is only with the fairness of the model, this is not a problem. However, for the purposes of understanding larger-scale patterns of housing injustice, our sample likely understates the racial gap in mortgage risk. This is because the mortgage applicant pool is subject to self-selection bias: many individuals who want to own a home but do not have enough financial assets to do so will not apply.

Discussion and Policy Recommendations
Persistent racialized inequalities in housing exist despite the de jure elimination of segregation and discrimination in the U.S. [85]. Homeownership is one of the primary ways to build wealth in the U.S., and the gap between the percentage of White Americans who are homeowners and the percentage of Black Americans who are homeowners has increased since the 1960s [11,40,58]. This is a result of the legacy of historical redlining and segregation, as well as present-day structural barriers to homeownership for Black Americans. This can be seen in the 2008 financial crisis: well-intentioned policies that attempted to reduce barriers to homeownership nevertheless harmed the communities they intended to help by subjecting largely low-income, single-head-of-household Black women to predatory and exploitative loans [87].
We chose this application domain because it demonstrates how loan approval algorithms are embedded in an unjust social context. The statistical association between race and financial capital cannot be eliminated only by better data collection, blinding methods, or fairness constraints. Black Americans, who on average have less wealth and income, are less likely to be able to afford and maintain a mortgage [23,43,78]. In line with recent work on reparative algorithms, we suggest a framework for evaluating policies that actively seek to remedy these injustices, rather than merely engaging in formalist procedures that gesture at fairness without any improvement of material circumstances [20,84].
What kind of policies will close the homeownership and wealth gap? On a pure egalitarian view, outcome parity can be achieved by harming the advantaged group enough such that their average outcome is closer to the disadvantaged average. This is the recommended policy when we place a heavy weight on bare egalitarianism; a simple alteration of the decision threshold may lead to worse outcomes for the advantaged group. However, this outcome may not be intuitively desirable. This intuition powers the well-known leveling-down objection to egalitarianism [73]. Prioritarianism is an attractive alternative to egalitarianism which argues that benefits to the worst-off members of society should be weighed more heavily than benefits to advantaged members [88]. Prioritarian concerns apply most saliently when we are faced with the question of reallocating benefits from a more advantaged group to a less advantaged group. But in this case, the act of deliberately harming members of the advantaged group does not entail material benefit to the worse-off group.
Pure egalitarianism, and its non-consequentialist counterpart, demographic parity, have drawn a plethora of criticism [2,26]. If we wish to design algorithms that advance material justice and want to reject pure egalitarian approaches towards equality, we must alter the decision structure itself.
Our model includes one such approach. In practice, several policy mechanisms can lower . For instance, late mortgage payments have a significant impact on credit score [21], which may impact one's ability to access credit in other markets. Mitigating the drop in credit score upon a single late payment may lower the financial impact of late payment, as credit ratings can causally impact future creditworthiness [66,75]. Another intervention may involve making mortgage forbearance programs with favorable interest rates more widely available during times of financial crisis. Mortgage forbearance programs were used largely by minority and low-income borrowers during the first couple of years of the COVID-19 pandemic, and such programs reduced inequalities, helped maintain cash reserves for families affected by unemployment, and allowed borrowers to pay off other pressing debts [4,28,81].
There is evidence that  for mortgage loans is already lower than it is for some other forms of debt [61]. Late mortgage payments may not result in immediate foreclosure, whereas credit card loans or auto loans tend to default much more quickly. In the housing domain, our model may be more useful for modeling the housing process for renters: a single missed rent payment is much more likely to result in an eviction filing. Eviction may significantly negatively affect many aspects of a renter's life, including health outcomes [8,90], employment prospects [13,22], and future housing outcomes [89]. A structural change in this area would decouple late payments from the cascade of harmful consequences that result from eviction.
Mortgage forbearance programs or eviction harm reduction programs are policy changes that we call "structural modifications to the decision system." Another structural intervention may increase the affordable housing stock, an intervention which operates upstream of the decision process. Mathematically, we can model this as a shift in the distribution of risk across all populations: everybody is more likely to find affordable housing. Both types of interventions can be modeled by changing the parameters that govern the behavior of the dynamic system, and future work can evaluate the tradeoffs and the impacts of upstream and downstream interventions in the housing space.
However, such programs can be limited in their imaginative capacity. One structural intervention that is harder to model may involve the restructuring of debt markets entirely. For instance, Herzog calls into question the ethics of private debt markets, arguing that they are a mechanism for dynamically reinforcing injustice [42]. Such sweeping social reconfigurations are currently difficult to evaluate, model, or implement, and more research is necessary to understand how to design and assess such alternative structures.

CONCLUSION
In this project, we have modeled a common decision-making mechanism in a loan approval setting and shown how certain initial inequitable conditions can be perpetuated over time. Then, inspired by accounts of structural injustice in algorithmic decision-making [35,52], we imagine how a structural intervention might be incorporated into this formal model as alterations upon parameters that govern the relationship between an algorithmic decision and its social consequences. We then demonstrate the effects of policies that target these parameters, showing how different policies may be recommended depending on a policymaker's preference for equity or efficiency. In particular, we simulate the effects of resource limitations and constraints on the magnitude of possible social change on our recommended interventions. We find that in many cases, structural interventions are widely preferred and are often recommended over exclusively technical interventions, especially when efficiency is valued over bare outcome equality.
Our project imagines a type of structural intervention upon a penalty parameter, which represents only one way in which we can model the effects of structural change. More research is necessary to assess the types of policies which cause social change. Further research can also create more sophisticated models of larger-scale social intervention.
We wish to conclude by emphasizing that such interventions can hold the key to addressing many of the conundrums and contradictions that the fairness community has grappled with for the past few years: why is it that so many common, intuitive conceptions of fairness are incompatible with each other in practice? And why is it that adhering to these fairness metrics may not materially benefit disadvantaged populations in the long run? Industry-standard fairness criteria should not be rejected entirely; they operate at a different scale, and serve a different function, from structural interventions that aim to advance justice in various domain areas of social importance. However, it is important to clarify the extent to which fairness guidelines can deliver on desirable outcomes. If policymakers are interested in more equitable, efficient social outcomes, structural interventions may be most suited towards advancing those ends.

A PROOFS
Proof. For ease of notation, let  +  +1 = max{min{   +  −  (1 − ), 1}, 0}. First, we use the tower rule:
) which, by our assumption, simplifies to Now we want to show that This is the conditional expectation of a function of a continuous variable    , where the function consists of products, maximums, and minimums that are all continuous. Therefore we know   +1 is a continuous function of . By the extreme value theorem,   +1 attains a maximum value. That is, there exists  such that   +1 () attains a maximum. Now we show that the maximum does not depend on the distribution of   . We can write   +1 as a conditional sum: The function () =  max{min{ + , 1}, 0} + (1 − ) max{min{ − , 1}, 0} is nondecreasing in  if ,  ≥ 0 and linear between 0 and 1. So either () ≤  for all  ∈ [0, 1] or there exists some minimal  0 ∈ [0, 1] such that for all  >  0 , () > .
In the first case, set  = 1, and in the second, set  = min{max{ 0 , 0}, 1}. We can see that this would maximize the sum above. Finding  here also does not depend on the distribution of   , so the optimal β is the same for both the advantaged and disadvantaged populations.
Second, we prove that    +1 ≥    +1 . Because of the dominance assumption, we know that ], starting at some point between  0 and  1 . One moves some distance to the right,  ≥ 0 with some probability , and some distance to the left,  ≤ 0 with probability 1 − . Suppose that any point  >  1 is all one absorbing state, and any point  <  0 is another absorbing state, and suppose ,  ∈ Q. Then there exists a finite number of positions between  0 and  1 , plus the two absorbing states, that the random walk could attain with probability > 0.
Proof. Since ,  ∈ Q, we can represent them as  =   and  =   , where , , ,  ∈ Z. Then at any arbitrary time  ≥ 0, we have taken  steps to the right and  steps to the left, where ,  ∈ N. So our position at time  is either  0 ,  1 , or can be represented as . There are a finite number of multiples of 1  between  0 and  1 , so there are a finite number of positions that the random walk can attain. □

Proposition A.5. Assume that ,  ∈ Q, and let  ∈ [0, 1] be some arbitrary threshold. Then  ( < lim  →∞   < 1) = 0.
Proof. Consider a Markov chain, and let  0 be the starting state. Since ,  ∈ Q, this means there are finitely many states that   can attain in [0, 1], as per the above lemma. Suppose there are  states, and  of them are absorbing states that correspond to 0, 1, and any point < . Then we can describe this with a  ×  matrix , where each entry    corresponds to the transition probability between state   and state   . This transition matrix can also be written as the block matrix where the identity block in the top left corresponds to all the absorbing states. Then it can be shown that lim We show using induction that The  = 1 base case is . For the inductive step, assume that this holds for  ≥ 2. We show that this is true for  + 1.
=0     +1 as desired. Since the process is time-homogeneous, it is also the stationary distribution. Now we show lim →∞   = 0. Consider all the non-absorbing states in the Markov chain: we know that it is possible for them to reach 0 in a finite number of steps −, and also that it is possible for them to reach  in a finite number of steps. For each state , let   be the minimum number of steps required to reach an absorbing state. Then let  = max    . Then in  steps, the probability of reaching an absorbing state is > 0 for every starting state. Therefore there exists this  such that the maximum row sum of   < 1, since   is row stochastic. By Gershgorin's disc theorem, the spectral radius  (  ) < 1, since the diagonal entries of   = 0 (when in a transient state at time , it is impossible to stay at that state at  + 1 by design). We can then write   =  −1  , where  is the Jordan normal form. Then lim because all eigenvalues  < 1 in J.
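The argument above can be checked numerically on a small example. The sketch below is only an illustration under assumed values: it uses a five-state walk with two absorbing endpoints and three transient interior states, hypothetical step probabilities of 0.6 (up) and 0.4 (down), and looks at powers of the transient-to-transient block of the transition matrix (called `Q` here; the name is ours). The maximum row sum of the n-th power is the largest probability of still being unabsorbed after n steps.

```python
def mat_mul(A, B):
    """Product of two square matrices given as lists of rows."""
    size = len(A)
    return [
        [sum(A[i][k] * B[k][j] for k in range(size)) for j in range(size)]
        for i in range(size)
    ]


def mat_pow(A, n):
    """n-th power of a square matrix by repeated multiplication."""
    size = len(A)
    result = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    for _ in range(n):
        result = mat_mul(result, A)
    return result


# Assumed step probabilities for the toy walk (not estimated from data).
p_up, p_down = 0.6, 0.4

# Transient block: from interior state i the walk moves to i+1 or i-1.
# Moves out of the first and last interior states into the absorbing
# endpoints are not part of this block, so those rows sum to less than 1.
# The diagonal is zero: the walk never stays in place.
Q = [
    [0.0, p_up, 0.0],
    [p_down, 0.0, p_up],
    [0.0, p_down, 0.0],
]


def transient_mass(n):
    """Largest probability, over starting states, of being unabsorbed after n steps."""
    return max(sum(row) for row in mat_pow(Q, n))


mass_after_2 = transient_mass(2)
mass_after_200 = transient_mass(200)
```

In this example every interior state can reach an absorbing endpoint within two steps, so the maximum row sum is already strictly below 1 after two steps, and it decays geometrically from there.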

B EMPIRICAL
In Section 5, we created a prediction of risk for a sample of applicants drawn from the HMDA 2021 applicant dataset, subsetted to include applicants who were applying for a home loan in Massachusetts. We use the 2021 data because HMDA datasets before 2020 did not publicly release crucial features, such as the debt-to-income ratio and loan-to-value ratio of applicants. As of 2022, HMDA datasets still do not include another key feature: credit score. A subset of the available features from HMDA can be found in Table 1. As described above, HMDA does not contain data about default or late payment. To supplement this dataset, we make use of the FNMA dataset, which does contain information on default and late payment, but does not contain information on crucial demographic characteristics of applicants. Relevant FNMA variables are also listed in Table 1.
Our methodology involves first constructing a prediction model for risk of late payment, trained on FNMA data, and then using this to predict risk for the HMDA applicants. Because of this, the prediction model can only include features that are shared between the two datasets. Table 1 shows that six key variables are shared: debt-to-income ratio, loan-to-value ratio, loan amount, number of units, metropolitan statistical area (MSA), and loan purpose. In our methodology, we filtered both datasets to include entries where the loan purpose was "purchase" rather than "refinance." We also did not use MSA in prediction. The remaining four variables were used to construct the predictive model.
First, using the FNMA dataset, we ran both an OLS and a logistic regression with probability of late payment as the response variable. The results can be seen in Table 2. Comparing regressions (1) and (2) shows that there was a substantial increase in adjusted R^2 when including credit score, which could indicate that credit score contains important explanatory information about financial risk that is not captured by other variables, such as loan-to-value ratio and debt-to-income ratio. This demonstrates a limitation in our study. The ultimate goal of this exercise was to construct an individual loan-level dataset that contained both risk of late payment and demographic information about an applicant's membership in historically marginalized identity categories. Because the former was not available in HMDA, and the latter was not available in FNMA, our prediction for risk of late payment is an imperfect proxy for risk level. For the purposes of this study, however, this proxy is still informative for demonstrating how differences in initial qualification status can be exacerbated in the long term.
We used logistic regression (3) from Table 2 to predict late payment risk from the variables shared between the HMDA and FNMA datasets: loan amount, loan-to-value ratio, debt-to-income ratio, and number of units. This was the final model that was used to generate the predictions for the HMDA dataset, and the results can be seen in Figure 4.
The results are reproduced here in Figure 5, alongside the empirical cdf and maximum mean at different  values in Figure 6. The empirical estimated cdf shows that our assumption of stochastic dominance is satisfied since there is no overlap in the cdf. The maximum means plot shows the maximum attainable average result for each subgroup. For instance, if  = 2, this simulation shows roughly that the optimal policy will generate  [   ] = 0.94, and  [   ] = 0.90. This simulation was based on 25 runs, so  = 25 here. We can see from this that since our applicant pool has a fairly high initial probability of timely payments, a small  will likely lead to convergence of the two populations to 1.

Figure 1: Structural interventions in the algorithmic decision-making pipeline change the effects of decisions on downstream consequences.

Figure 2: The top graph displays the distribution of  0 for the advantaged (pink) population and the disadvantaged (blue) population. The three bottom graphs depict the recommended intervention for different levels of . The y-axis is  , as described in the earlier sections that enumerate each policy. We choose the policy that maximizes lim  →∞   . Higher opacity levels correspond to a stronger recommendation for a certain intervention over the other two (all effect sizes are scaled to 1). The lower left coordinate of each box is the anchor point for each box; for example, a yellow square with lower left coordinate (a,b) means that at starting  =  and intervention %  = , the group-blind intervention is recommended.

Figure 3 depicts these results for 4 different distributions. Across all distributions, it appears that increasing  leads to a stronger recommendation for the -only intervention. In contrast, when  is low, the group-conscious interventions are more strongly recommended in general.  measures the degree to which one prefers "outcome equity" over "overall efficiency." Somewhat counterintuitively, this means that an increased preference for efficiency over outcome equity increases the strength of recommendations for group-specific interventions. Other results depend more specifically on the starting distributions. First, we can look at the two graphs on the left side of Figure 3, where the overall mean is low for both populations. Here there is a strong recommendation for group-conscious interventions when the starting  level is low, or when there are enough resources such that the  level can be reduced significantly for the disadvantaged population. This pattern shows up very clearly in the bottom left corner, when there is high intergroup difference. In these cases, only a large intervention will have any effect, because both populations are starting at a place where they are more likely to fail. The "band" pattern exhibited when the mean is low but the populations are close together can be explained by the fact that a too-large intervention would benefit the disadvantaged population to a point where they would be better off than the advantaged population. In this case, a preference for outcome parity would only lead to recommending the group-conscious intervention in the case where it leads exactly to parity and increased efficiency in outcomes.

Figure 3: Each quadrant depicts starting distributions and recommended interventions at  = 0.2, 0.5, 0.8. Blue corresponds to the group-conscious intervention, yellow corresponds to the group-blind, and gray corresponds to beta-only. The lower left coordinate of each box is the anchor point for each box; for example, a blue square with lower left coordinate (a,b) means that at starting  =  and intervention %  = , the group-conscious intervention is recommended. Opacity corresponds to the marginal utility of this intervention over the next best intervention type.

Figure 4: The top panel shows the empirical kernel density estimate of the distribution of predicted risk of late payment in the HMDA Massachusetts 2021 data. The bottom three graphs show the recommended interventions for different , , and intervention size.

Figure 5: Empirical kernel density estimate of the distribution of predicted risk of late payment in the HMDA Massachusetts 2021 data.

Figure 6: The left panel shows the estimated empirical cdf of the two distributions in Figure 5. The right panel shows the maximum attainable mean for the two populations, given the starting level of .
where  is the cumulative distribution function.

Table 1: Partial list of data fields in the FNMA and HMDA datasets.

Therefore the probability of reaching any absorbing state, given that we start at  ∈ [, 1], is 1.