Gender Animus Can Still Exist Under Favorable Disparate Impact: a Cautionary Tale from Online P2P Lending

This paper investigates gender discrimination and its underlying drivers on a prominent Chinese online peer-to-peer (P2P) lending platform. Existing studies on P2P lending focus on disparate treatment (DT), but DT narrowly recognizes direct discrimination and overlooks indirect and proxy discrimination, providing an incomplete picture. In this work, we measure a broadened discrimination notion called disparate impact (DI), which encompasses any disparity in the loan's funding rate that is not commensurate with the actual return rate. We develop a two-stage predictor substitution approach to estimate DI from observational data. Our findings reveal that (i) female borrowers, given identical actual return rates, are 3.97% more likely to receive funding, (ii) at least 37.1% of this DI favoring female borrowers is indirect or proxy discrimination, and (iii) DT indeed underestimates the overall female favoritism by 44.6%. However, we also find that the overall female favoritism can be explained by one specific discrimination driver, rational statistical discrimination, wherein investors accurately predict the expected return rate from imperfect observations. Furthermore, female borrowers still require a 2% higher expected return rate to secure funding, indicating that another driver, taste-based discrimination, co-exists and works against female borrowers. These results altogether tell a cautionary tale: on one hand, P2P lending provides a valuable alternative credit market where affirmative action to support female borrowers naturally emerges from the rational crowd; on the other hand, while the overall discrimination effect (in terms of both DI and DT) favors female borrowers, concerning taste-based discrimination can persist and can be obscured by other co-existing discrimination drivers, such as statistical discrimination.


INTRODUCTION
Online peer-to-peer (P2P) lending is a fast-growing FinTech innovation that allows individual investors to directly crowdfund individual borrowers' loans through online platforms, bypassing traditional intermediaries like banks. In this digital lending landscape, P2P lending platforms often leverage machine learning (ML) models and alternative data to assess borrower creditworthiness. ML-based credit scoring largely supports investors' lending decisions and improves borrowers' access to credit [47,63]. As a result, P2P lending has shown great success in extending financial access and promoting financial inclusion [40,86]. Notably, the global P2P lending market was valued at USD 82.3 billion in 2021 and is projected to reach USD 804.2 billion by 2030 [8].
The extent of gender discrimination against borrowers on P2P lending platforms, whether male or female, is an important yet under-explored topic. While some studies show that female borrowers are equally [21,33] or more likely [32,77,78] to have their P2P loans funded, they focus on disparate treatment (DT), a notion that narrowly recognizes direct discrimination. Specifically, the existing studies test the direct effect of borrower gender on the loan's funding success, controlling for all other observed characteristics. However, DT fails to account for indirect and proxy discrimination [2], wherein seemingly neutral practices can still have a disproportionately negative impact on a specific group. For instance, in Griggs v. Duke Power Co. [1], the US Supreme Court deemed the company's standardized testing requirement for job transfers illegal: although the requirement was not intentionally discriminatory, it had a disparate impact on black employees and was not reasonably job-related. Thus, to fully realize the legal guarantees of equal citizenship, it is crucial to adopt a broader perspective beyond DT.
More importantly, the extent to which discrimination can be eliminated depends on the discrimination drivers [37,48], but identifying discrimination drivers, such as taste-based discrimination [23] and statistical discrimination [9,16,75], is a non-trivial task. In taste-based discrimination, the decision makers discriminate because they have a personal taste that favors (or disfavors) members of a particular group. Statistical discrimination, on the other hand, refers to the situation where the decision makers perceive a group-level difference in decision-related characteristics and use this information to infer expectations from noisy decision signals. Unlike taste-based discrimination, statistical discrimination is considered rational and economical. However, little is known about whether taste-based and statistical discrimination can co-exist and contribute to the overall discrimination in online P2P lending.
Online P2P lending provides a unique opportunity to study gender discrimination and its underlying drivers in ML-assisted collective human decisions. In this study, we introduce a broadened discrimination notion called disparate impact (DI), which is motivated by the legal doctrine of disparate impact [1]. Disparate impact allows the plaintiff to initiate a discrimination case by showing a disparate adverse impact on the protected group without showing explicit categorization. To refute the claim, the defendant must prove that the shown disparity can be "justified as serving a legitimate business goal" [4]. In other words, DI encompasses any unwarranted disparity that is not commensurate with individuals' true qualifiedness, regardless of whether it is direct discrimination. In online P2P lending, since investors can decide the amount they wish to invest in each loan, the loan's eligibility for funding is solely determined by its return rate. As such, we define DI as the disparate funding of loans that yield identical return rates but differ in the gender of the borrower. The notion of DI studied in this work relates to and expands upon the concept of equality of opportunity in computer science [25,52,82] and disparate impact in economics [15,18,19].
However, empirical estimation of DI remains a challenge, mainly because the return rates of unfunded loans are unobservable. In a related work, Arnold et al. [15] develop a quasi-experimental solution to measure broadly defined discrimination in the context of bail decisions. But this approach has limitations, as (i) it only pertains to binary qualifiedness, and (ii) it relies on random assignment of decision-makers. It thus may not be applicable in many other settings, including online P2P lending. Alternative estimation strategies are needed.
We develop a two-stage predictor substitution (2SPS) approach to estimate DI using only observational data. In the first stage, we predict and impute the return rates for the unsuccessful loans. Specifically, we decompose the return rate as the product of the repayment ratio and (1 + interest rate), where the interest rate is fully observed. We construct a survival model [35,66], which explicitly models the amount of time it takes for a borrower to default, to predict the repayment ratio of the loan. In the second stage, we estimate DI non-parametrically. We bootstrap the whole framework 500 times to obtain confidence intervals that account for both stages. We also analyze the asymptotic properties of the 2SPS approach and show its robustness.
We obtain anonymized backend data from one of the largest online P2P lending platforms in China and empirically investigate gender discrimination following the 2SPS approach. We find that female borrowers, given identical return rates, are 3.97% (95% CI: 3.95%∼3.98%) more likely to be funded. While the DI favoring female borrowers is marginal for loans with return rate ≤ 1.16, it is highly significant for loans with return rate > 1.16, which constitute 90% of our data. Through a decomposition analysis, we show that at least 37.1% of the DI favoring female borrowers is due to indirect or proxy discrimination. Corroboratively, DT indeed underestimates the favoritism towards female borrowers by 44.6%, equivalent to a 2.15% higher likelihood of funding for female borrowers compared to male borrowers, ceteris paribus. Our results remain consistent across a number of robustness checks.
We further investigate a decision model that decomposes DI into two components, capturing taste-based and statistical discrimination respectively, by incorporating a threshold test [76,84] in the second stage of the 2SPS approach. Surprisingly, we find that the overall female favoritism can be fully explained as rational statistical discrimination, which posits that investors accurately predict the expected return rate from noisy signals of the repayment ratio (intuitively, borrower creditworthiness). The statistical discrimination favors female borrowers because they are factually less likely to default and have a higher repayment ratio. Consequently, given identical but noisy repayment ratio signals, the expected return rate is higher for female borrowers than for male borrowers. Despite this, female borrowers still need a higher expected return rate of 1.099, compared to 1.079 for male borrowers, to secure funding. This suggests that another driver, taste-based discrimination, co-exists and works against female borrowers.
We conclude this work with a discussion of our empirical results. We contend that the gender gap in financial access still exists [40] and that females are still disadvantaged in many other credit markets [29,31,89]. P2P lending, therefore, complements traditional bank lending by providing an alternative credit market where affirmative action to support females is driven by market forces rather than bureaucratic rules, is win-win, and can emerge naturally from the rational crowd. On the other hand, it is important to recognize that different discrimination drivers can co-exist and contribute to the overall discrimination effect. This paper illustrates a real-world scenario where, although the overall discrimination effect favors female borrowers (in terms of both DI and DT), gender animus against them can still persist because it is obscured by a substantial amount of statistical discrimination. We hope the insights in our study act as a starting point for future research to deepen the current understanding of discrimination drivers in human and/or machine decisions.

RELATED WORK
P2P Lending. Given the importance of FinTech-enabled financial inclusion, a growing body of research studies online P2P lending. One stream of research tries to identify various determinants of funding success and borrower default [55,60], including appearance [44,78], social ties [49,64], and textual descriptions [63]. Herzenstein et al. [56] and Zhang and Liu [91] study online investors' herding behavior. Berger and Gleisner [24] look into the role of group leaders in crowdfunding. Lin and Viswanathan [65] find that home bias, the tendency for transactions to be more likely when the two parties are geographically closer, still exists in online P2P lending. Tang [86] finds that P2P lending is a substitute for traditional bank lending in terms of infra-marginal borrowers, but serves as a complement in terms of small loans.
Gender Discrimination in P2P Lending. Although females are often discriminated against in traditional bank lending [11,26,69], findings from P2P lending tend to show otherwise. For example, Barasinska and Schäfer [21] find no effect of gender on funding success on a German P2P lending platform. Pope and Sydnor [77] and Ravina [78] find that females are favorably treated on the American P2P lending platform Prosper.com. Notably, Chen et al. [32] and Chen et al. [33] study P2P lending platforms in China. While Chen et al. [33] find no significant gender effect on funding success, Chen et al. [32] find that females are more likely to be funded than males. This line of literature, however, focuses on disparate treatment and provides an incomplete picture of gender discrimination. It also does not identify how taste-based discrimination and statistical discrimination can co-exist.
Disparate Impact. In economics, the idea of recognizing any unwarranted disparity that is not commensurate with individuals' true qualifiedness traces back to Aigner and Cain [9]. Ayres [18,19] discusses the distinction between testing disparate treatment and disparate impact. Arnold et al. [15] develop a quasi-experimental tool to measure disparate impact in bail decisions. Bohren et al. [28] further decompose disparate impact into direct discrimination and systemic discrimination, where the latter is the combined effect of indirect and proxy discrimination. A closely related notion is equality of opportunity (EOpp) in the computer science literature [25,52,82]. Motivated by a binary classification setting, EOpp argues that the false negative rate, intuitively the opportunity denied to those who deserve it, should be equalized across different protected groups. Notably, EOpp similarly controls narrowly for the true qualifiedness.
Taste-based and Statistical Discrimination. Classical taste-based discrimination is first theorized in Becker [23], which considers discrimination as blatantly applying disparate decision thresholds. Then, Aigner and Cain [9], Arrow [16], and Phelps [75] formalize the statistical discrimination model, where the decision makers do not have an intrinsic preference for one group but are incentivized to use the protected attributes to accurately assess the individual's quality of interest, which might lead to unfair outcomes. There is increasing interest in distinguishing taste-based and statistical discrimination, for example in healthcare [20], bail decisions [15], and road policing [67], as well as in algorithmic decisions [73] and reinforcement learning [45].

BACKGROUND
The P2P Lending Platform
We collaborate with one of the largest online P2P lending platforms in China (hereinafter referred to as "the platform"). The platform connects individual investors to individual borrowers across the country and offers unsecured personal loans. When borrowers submit a loan application, they self-specify the desired loan amount, interest rate, and duration of the installment payments from a range allowed on the platform. The borrowers are also required to provide a set of loan and borrower information. Then, a loan listing containing this information is generated on the platform.
The loans are open for investors to view and subscribe to for a fixed period of time. Investors may subscribe a small amount to each of many loans to diversify risk. The platform operates an All-or-Nothing crowdfunding policy: a loan is considered successful if and only if the borrowing amount is fully reached. Otherwise, the loan is unsuccessful and the borrower does not receive any money. For the successful loans, the borrowers make repayments in equated monthly installments (EMI) according to the loan term.
The platform develops a machine learning (ML)-based credit scoring model to support investors' lending decisions, making the loan's funding success an ML-assisted collective human decision. At every loan application, the credit scoring model categorizes the borrower into credit grades from I to VIII, with grade I representing the lowest risk and grade VIII the highest. The credit grade is then shown on the loan's information page to assist the investors' lending decisions. However, our data does not include the credit grade information.

Data
We obtained anonymized backend data on loans and borrowers on the platform. Our data consists of 1,006,161 loan listings with 12-month installment plans between January 1, 2016 and June 30, 2016. As listed in Table A1 in Appendix, for every loan, our data includes (i) borrower characteristics such as gender, marriage, age, employment, education, whether he/she is a repeated borrower, number of past failed borrowings, number of past aborted borrowings, number of past on-time payments, and number of past late payments; and (ii) loan characteristics including borrowing amount, interest rate, whether the loan is requested on the mobile application, and whether the loan is an express loan. Furthermore, our data also includes whether a loan is successfully funded and, if so, the borrower's installment payment record over the loan term. Consistent with the platform's operational definition, an installment is considered defaulted if the payment is more than 90 days late. Table A1 in Appendix reports the descriptive statistics.
We note that the 2SPS estimate of DI only requires us to include covariates that are predictive of the repayment ratio; it does not require us to control for all factors that enter the investors' lending decisions, a fundamental difference from DT. Two comments are in order. First, our data in fact includes more attributes, such as the borrower's city, district, and registration time. We do not include them because a backward feature selection procedure using the Akaike Information Criterion [10] finds them unpredictive. Second, the 2SPS estimate of DI might be biased due to omitted variables in predicting the repayment ratio. Concrete examples are the loan's credit grade and description, which are not available in our data but are reasonably believed to contain additional information about the borrower's default risk. This issue is generally unavoidable. We discuss this bias theoretically in Sec. 4.3 and empirically test it in Sec. 5.4.
We conduct the following pre-processing steps. First, for ease of analysis, we drop the loans whose payments are non-conventional, including (1) those with some installment partially paid (N=2,806) and (2) those that first default on some installment(s) but then resume payment (N=3,301). Second, to eliminate outliers, we drop the loans whose amount, age, # past on-time payments, # past late payments, # past failed borrowings, or # past aborted borrowings fall in the top or bottom 0.5% quantile (N=29,985). Lastly, we drop loans whose interest rates are lower than 16% (N=293,596), which are always unfunded. This reflects that the investors on the platform simply do not consider loans whose interest rates are lower than 16%. The final sample consists of 676,473 loans.
Borrower Gender. Given the centrality of gender discrimination in this paper, it is worth elaborating on how the gender attribute is obtained. When borrowers first sign up on the platform, they are required to provide their 18-digit Chinese Citizen Identity Number. The borrower's biological sex is then extracted from the second-to-last digit: odd numbers are issued to males and even numbers to females. Therefore, in this paper we treat gender as the binary biological sex. We acknowledge that this treatment risks marginalizing and miscategorizing transgender and non-binary people. Finally, we note that borrower gender is explicitly shown on the loan's information page.
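Two bookkeeping facts in this subsection can be checked mechanically. The sketch below is only illustrative: the function name `sex_from_id` and the placeholder ID strings are hypothetical (they are not real identity numbers), and the exclusion counts are simply the four N values reported above.

```python
def sex_from_id(id_number: str) -> str:
    """Read the sex digit of an 18-character Chinese Citizen Identity Number.

    Assumption of this sketch: the first 17 characters are digits and the
    18th is a check character, which may be a digit or 'X'.
    """
    body = id_number[:17]
    if len(id_number) != 18 or not (body.isascii() and body.isdigit()):
        raise ValueError("expected 17 digits followed by one check character")
    # Second-to-last character (index 16): odd means male, even means female.
    return "male" if int(id_number[16]) % 2 == 1 else "female"


# Made-up placeholder strings, used purely for illustration.
EXAMPLE_ODD = "0" * 16 + "1" + "X"
EXAMPLE_EVEN = "0" * 16 + "2" + "X"

# Sample-size bookkeeping using the counts reported in the text.
INITIAL_LISTINGS = 1_006_161
DROPPED = {
    "partially paid installments": 2_806,
    "default then resume payment": 3_301,
    "outliers (top/bottom 0.5%)": 29_985,
    "interest rate below 16%": 293_596,
}
final_sample = INITIAL_LISTINGS - sum(DROPPED.values())

print(sex_from_id(EXAMPLE_ODD), sex_from_id(EXAMPLE_EVEN), final_sample)
```

Subtracting the four exclusion counts from the initial listings reproduces the final sample size of 676,473 loans.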

Notations
The loans are indexed by a subscript $i \in [N]$. We use $X_i$ to denote the vector of loan and borrower characteristics shown in Table A1. $G_i \in \{m, f\}$ denotes the borrower's gender: $m$ means male and $f$ means female. $r_i > 0$ denotes the interest rate, and $Y_i = 1$ (or $0$) denotes that loan $i$ is successfully funded (or unsuccessful). For every loan $i$, we construct two variables: the repayment ratio $\lambda_i$ and the return rate $R_i$. The repayment ratio $\lambda_i \in [0, 1]$ is defined as the ratio of the borrower's successful repayment amount to the total amount the borrower should repay, including the principal and the interest. $\lambda_i < 1$ means the borrower defaults and $\lambda_i = 1$ means the borrower does not default. The return rate $R_i \in [0, 1 + r_i]$ is defined as the ratio of the borrower's successful repayment amount to the principal. $R_i = 0$ corresponds to the complete loss of the principal and interest; $R_i = 1$ corresponds to the borrower repaying the principal but not the interest; and $R_i = 1 + r_i$ corresponds to the borrower repaying the principal and interest in full. Importantly, the return rate $R_i$ admits the following decomposition:
$$R_i = \lambda_i \, (1 + r_i). \tag{1}$$
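As a small worked example of the two constructed variables, the sketch below computes the repayment ratio and the return rate from a repaid amount and checks that the return rate equals the repayment ratio times (1 + interest rate). The function names and the numbers (principal 1000, interest rate 0.20, repaid 600) are hypothetical, and the total amount due is assumed to be principal × (1 + interest rate), in line with the stated range of the return rate.

```python
import math


def repayment_ratio(repaid: float, principal: float, interest_rate: float) -> float:
    """Repaid amount divided by the total amount due (principal plus interest)."""
    total_due = principal * (1.0 + interest_rate)
    return repaid / total_due


def return_rate(repaid: float, principal: float) -> float:
    """Repaid amount divided by the principal."""
    return repaid / principal


# Hypothetical loan: the borrower repays 600 out of 1200 due on a 1000 principal.
principal, interest_rate, repaid = 1000.0, 0.20, 600.0
ratio = repayment_ratio(repaid, principal, interest_rate)
rate = return_rate(repaid, principal)

# The decomposition: return rate = repayment ratio * (1 + interest rate).
decomposition_holds = math.isclose(rate, ratio * (1.0 + interest_rate))
print(ratio, rate, decomposition_holds)
```

In this example the borrower repays half of what is due, so the investors recover 60% of the principal, a return rate below 1.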

Disparate Treatment and Disparate Impact
We start by discussing disparate treatment (DT), a notion that is also legally based but narrowly recognizes direct discrimination.
From the litigation perspective, the disparate treatment doctrine requires the plaintiff to show that the prima facie disparities are at least in part motivated by the legally protected attributes. Then, when the defendant articulates some legitimate non-discriminatory reasons [3] or is able to "demonstrate that it would have taken the same action in the absence of the impermissible motivating factor" [5], the burden shifts back to the plaintiff to demonstrate that the defendant's stated reasons are insufficient and pretextual [42]. In other words, it is ultimately the plaintiff's burden to show that the legally protected attributes indeed enter the defendant's decision making. Statistically, this requires examining whether the protected attributes still have a significant effect on decisions after controlling for all other relevant factors. The previous literature on P2P lending [21,32,33,77,78] follows this approach in testing disparate treatment.
Disparate impact has distinct legal elements compared to disparate treatment. The disparate impact doctrine was formalized in the landmark U.S. Supreme Court case Griggs v. Duke Power Co. [1]. The Court concluded that the company's facially neutral standardized testing requirement was illegal because (i) the tests prevented a disproportionate number of African-American employees from transferring to higher-paying jobs and (ii) they were not reasonable measures of job performance. From the litigation perspective, a plaintiff bringing a disparate impact case bears the initial burden to show both a disparate adverse effect on a protected group and a specific policy that causes such statistical disparity. The defendant, then, must prove that the shown disparity can be "justified as serving a legitimate business goal" and that no less discriminatory alternative exists [4,6].
A statistical measure of disparate impact, therefore, should measure any unwarranted disparity that is not commensurate with individuals' true qualifiedness, irrespective of whether it is direct discrimination. In the P2P lending context, since investors can determine their own investment amount for every loan, the loan's qualifiedness for funding refers solely to the return rate. Correspondingly, we define disparate impact as the disparate funding of loans that differ in gender but yield identical return rates.
Definition 1. Disparate Impact (DI) at a return rate level $R$ is defined as follows:
$$D(R) = P(Y = 1 \mid G = m, R) - P(Y = 1 \mid G = f, R). \tag{2}$$
The average level of DI is given by:
$$D = \mathbb{E}_{R}\left[ D(R) \right], \tag{3}$$
where the expectation is taken over the population distribution of the return rate $R$.
We acknowledge that two important policy components of disparate impact are missing in our statistical measure. One is the plaintiff's burden to establish a causal relation between a specific policy and the shown statistical disparity. The other is the defendant's burden to show that no less discriminatory practice exists. We recognize the significance of these components and encourage future research to consider them.

TWO-STAGE PREDICTOR SUBSTITUTION
We develop a two-stage predictor substitution approach to estimate DI from observational data. Estimating DI poses a challenge because the return rate is only observable for the funded loans. To overcome this, in the first stage of 2SPS, we take a predictive approach to impute the return rate for the unsuccessful loans. Then, in the second stage, DI can be directly estimated non-parametrically.

First Stage: Survival Model
We leverage the decomposition of the return rate as the product of the repayment ratio and (1 + the interest rate) (Eq. 1), where the interest rate is fully observed. We construct a survival model to predict the repayment ratio using the observed loan and borrower characteristics. Then, the predicted repayment ratio is multiplied by (1 + the interest rate) to produce the predicted return rate.
Survival models focus on modelling the time to the occurrence of some event [35], such as a patient's time-to-death in biomedical studies [36,72], a machine's time-to-failure in engineering, and a borrower's time-to-default [41,70,83,85] in the financial context. We choose a survival model as the predictive model for the repayment ratio for several reasons. First, a borrower's time-to-default provides a sufficient statistic for the repayment ratio because the borrowers make repayments in equated monthly installments and the loans have a fixed 12-month installment plan. Second, a survival model explicitly characterizes the default risk over time, leading to more accurate predictions than single-period classification models [83]. Lastly, a survival model automatically addresses right-censoring of the borrower's payment record, which occurs in our data due to data cut-off.
Formally, the survival model focuses on $T \in \mathbb{N} = \{0, 1, 2, \dots\}$, the random variable that represents the number of months until default occurs. We call $T$ the default time for short. In our context, $T = 0$ means the borrower defaults at the 1st month's installment (and all subsequent months'), and thus corresponds to repayment ratio $\lambda = 0$. $T = 1$ means the borrower defaults at the 2nd month (and all subsequent months'), and has repayment ratio $\lambda = 1/12$. The same holds for $T \in \{2, \dots, 11\}$. $T \geq 12$ means the borrower does not default and has repayment ratio $\lambda = 1$.
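The mapping from default time to repayment ratio described above can be written in a few lines. This is a minimal sketch under the stated setting of equal monthly installments over a 12-month term; the function name is hypothetical.

```python
LOAN_TERM_MONTHS = 12


def repayment_ratio_from_default_time(default_time: int, term: int = LOAN_TERM_MONTHS) -> float:
    """Fraction of installments repaid when the first default happens after
    `default_time` successful monthly payments.

    A default time of `term` or more means the borrower never defaults.
    """
    if default_time < 0:
        raise ValueError("default time must be non-negative")
    return min(default_time, term) / term


for months in (0, 1, 11, 12, 15):
    print(months, repayment_ratio_from_default_time(months))
```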
A survival model is completely defined by the hazard function $h(t)$, which defines the instantaneous rate of default given that the borrower has not defaulted up to and including the $t$-th month. Since we consider monthly installments and $T \in \mathbb{N}$, $h(t)$ is the probability that the borrower defaults at the $(t + 1)$-th month conditional on not having defaulted up to and including the $t$-th month. With a slight abuse of language, we call $h(t)$ the hazard rate at the $t$-th month (rather than at the $(t + 1)$-th month).
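To make concrete why the hazard function determines the whole distribution of the default time, the sketch below converts a sequence of monthly hazard rates into the probability of defaulting at each month and the probability of never defaulting. The function name and the example hazard values are hypothetical.

```python
def default_time_distribution(hazard):
    """Turn discrete hazard rates into a default-time distribution.

    hazard[t] is the probability of defaulting at month t + 1 given no
    default through month t. Returns (pmf, no_default), where pmf[t] is the
    unconditional probability that the default time equals t and no_default
    is the probability of repaying every installment.
    """
    pmf = []
    still_paying = 1.0  # probability of no default so far
    for h in hazard:
        if not 0.0 <= h <= 1.0:
            raise ValueError("hazard rates must lie in [0, 1]")
        pmf.append(still_paying * h)
        still_paying *= 1.0 - h
    return pmf, still_paying


# Hypothetical two-month example with a 50% hazard in each month.
pmf, no_default = default_time_distribution([0.5, 0.5])
print(pmf, no_default)
```

The default probabilities and the no-default probability always sum to one, which is a convenient sanity check on any fitted hazard curve.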
We start with the classical Cox proportional hazard (PH) model [38], which assumes the following hazard function,
$$h(t \mid X) = h_0(t) \exp(\beta X). \tag{5}$$
The baseline hazard $h_0(t)$ is non-parametric and describes the effect of time for individuals with $X = 0$, who serve as a reference cell. The parametric component $\exp(\beta X)$, then, describes the relative increase or decrease of hazard associated with $X$. The Cox PH model imposes two strong assumptions, the log-linearity assumption and the proportional hazard assumption, that are unlikely to hold strictly in reality. We generalize the model to relax these assumptions.
The Cox PH model is a log-linear model, where the continuous covariates act exactly linearly on the log-hazard. Our first generalization is to allow non-linear relationships by applying a natural spline transformation [87],
$$h(t \mid X) = h_0(t) \exp\big(\beta \, s(X)\big), \tag{6}$$
where $s$ denotes the natural spline transformation. In implementation, the natural spline transformation is only applied to the continuous covariates.
The proportional hazard (PH) assumption is the distinguishing feature of the Cox PH model. It nonetheless requires that all loans' hazard rates over time share a common shape, determined by $h_0(t)$, and that the covariates $X$ affect the hazard rate time-independently. Our second generalization is to allow time-dependent effects by adding time interactions,
$$h(t \mid X) = h_0(t) \exp\big(\beta^{(1)} s(X) + \beta^{(2)} s(X) \, s(t)\big), \tag{7}$$
where both the covariates and time are natural spline transformed.
We explain how the survival model of Eq. 7 is fitted in Appendix A.2, following standard practices in survival analysis [35,66]. Using the fitted survival model, we obtain the predicted repayment ratio by sampling from the predicted hazard rate over the loan term, with pseudocode shown in Procedure 1 in Appendix. Finally, the predicted repayment ratio is multiplied by (1 + interest rate) to produce the predicted return rate. We acknowledge that more sophisticated survival models, such as mixed cure models [74] and machine learning models [88], can potentially improve the predictive power, but we strike a balance between the model's simplicity and its predictive power.
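One plausible reading of the sampling step is sketched below. It is not the paper's Procedure 1, which is given in the Appendix; the function names, the example hazard values, and the 20% interest rate are assumptions of this illustration. For each month, a default is drawn with probability equal to that month's predicted hazard, and the first default determines the repayment ratio.

```python
import random


def sample_default_time(hazard, rng):
    """Draw a default time by walking through the monthly hazard rates.

    Returns the number of installments paid before the first default, or
    len(hazard) if no default is drawn over the whole loan term.
    """
    for month, h in enumerate(hazard):
        if rng.random() < h:
            return month
    return len(hazard)


def sample_return_rate(hazard, interest_rate, rng):
    """Predicted return rate = sampled repayment ratio * (1 + interest rate)."""
    term = len(hazard)
    ratio = sample_default_time(hazard, rng) / term
    return ratio * (1.0 + interest_rate)


rng = random.Random(0)
# Hypothetical predicted hazard: risky first month, then a low constant hazard.
hazard = [0.05] + [0.01] * 11
draws = [sample_return_rate(hazard, 0.20, rng) for _ in range(5)]
print(draws)
```

Because the draw is random, repeated calls for the same loan give different imputed return rates, which is one reason confidence intervals should reflect the first stage as well as the second.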

Second Stage: Non-Parametric Estimate
Using the predicted return rate for the unsuccessful loans, as well as for a small portion of successful loans (< 3%) whose payment records are right-censored due to data cut-off, DI of Eq. 2 and 3 can be directly estimated non-parametrically. To address the challenge of small sample sizes at unique return rate values, we divide the return rate into small intervals and assume a constant loan success rate within each interval. We obtain confidence intervals by bootstrapping both stages 500 times. In every iteration, we resample the data, fit the survival model, and estimate DI using the newly fitted survival model.
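The second stage can be sketched as a binned difference in funding rates. The code below is a simplified illustration rather than the exact estimator: the function names, bin edges, and toy records are hypothetical, DI is taken as the male funding rate minus the female funding rate, bins lacking either gender are skipped, and the bootstrap here resamples only the second stage, whereas the approach described above also refits the survival model in every iteration.

```python
import bisect
import random


def binned_di(loans, edges):
    """Estimate DI per return-rate bin and its population-weighted average.

    loans: iterable of (gender, return_rate, funded), gender in {"m", "f"},
           funded in {0, 1}; return rates are observed or imputed.
    edges: sorted interior bin edges.
    """
    counts = {}  # bin index -> {gender: [n_loans, n_funded]}
    for gender, rate, funded in loans:
        cell = counts.setdefault(bisect.bisect_right(edges, rate),
                                 {"m": [0, 0], "f": [0, 0]})
        cell[gender][0] += 1
        cell[gender][1] += funded
    di_by_bin, weighted_sum, weight = {}, 0.0, 0
    for b, cell in counts.items():
        (n_m, k_m), (n_f, k_f) = cell["m"], cell["f"]
        if n_m == 0 or n_f == 0:
            continue  # DI is undefined in a bin without both genders
        di_by_bin[b] = k_m / n_m - k_f / n_f
        weighted_sum += di_by_bin[b] * (n_m + n_f)
        weight += n_m + n_f
    if weight == 0:
        raise ValueError("no bin contains both genders")
    return di_by_bin, weighted_sum / weight


def bootstrap_ci(loans, edges, n_boot=500, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the average DI (second stage only)."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_boot):
        resample = rng.choices(loans, k=len(loans))
        try:
            estimates.append(binned_di(resample, edges)[1])
        except ValueError:
            continue  # skip degenerate resamples
    estimates.sort()
    lower = estimates[int(alpha / 2 * len(estimates))]
    upper = estimates[int((1 - alpha / 2) * len(estimates)) - 1]
    return lower, upper


toy = [("m", 1.18, 1), ("m", 1.18, 0), ("f", 1.18, 1), ("f", 1.18, 1),
       ("m", 1.30, 1), ("f", 1.30, 1)]
per_bin, average = binned_di(toy, edges=[1.16, 1.24])
print(per_bin, average, bootstrap_ci(toy, edges=[1.16, 1.24]))
```

A negative value indicates that female borrowers are funded more often than male borrowers at the same return rate.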

Analysis of Bias
Using Bayes' rule, DI can be rewritten in terms of $P_g$, where the subscript $g$ denotes that a distribution is additionally conditional on borrower gender $G = g$. We use $\hat{D}(R)$ to denote our 2SPS estimate of DI at return rate level $R$, which uses the predicted repayment ratio $\hat{\lambda}$ for the unsuccessful loans. The first-stage bias of 2SPS is measured at a specific repayment ratio level $\lambda$, conditional on borrower gender $g$, interest rate $r$, and the loan being unsuccessful ($Y = 0$). The asymptotic bias of our 2SPS estimate of DI can then be expressed in terms of this first-stage bias. Therefore, our 2SPS estimate is unbiased as long as the resulting asymptotic bias term is zero. A stricter but sufficient condition is that both average first-stage biases are zero.
Two comments are worth highlighting. First, as defined in Eq. 10, the first-stage bias is measured without conditioning on $X$. We only need the predicted repayment ratio $\hat{\lambda}$ to induce a distribution that is unbiased for the distribution of the repayment ratio among unsuccessful loans of a given gender and interest rate, which is a much weaker condition than unbiasedness of the predicted repayment ratio itself. Second, the first-stage bias is further marginalized over the interest rate $r$, i.e., only the average first-stage biases affect the 2SPS estimate.
The 2SPS estimate is also robust to the average first-stage biases when they are non-zero, for two reasons. First, both average biases are down-scaled in the asymptotic bias: the coefficients in front of them are always smaller than 1. Second, when the two average biases have the same sign, they partially cancel each other. Intuitively, the 2SPS estimate is partially protected from systematic over- or underestimation of the repayment ratio.

Survival Model
We plot the fitted hazard curve in Fig. 1 and report the survival model's fitted coefficients in Tab. A2 and Fig. A1 in Appendix. For both male and female borrowers, the hazard rate is the highest at the 1st month, drops abruptly at the 2nd month, then slightly increases until around the 8th month, and finally decreases to around 0 at the 12th month. Our interpretation is that the uncollateralized nature of online P2P lending invites a number of ill-intentioned borrowers who intend to default on all installments. Fewer borrowers default at the last several installments because it is uneconomical: they could either default earlier for a higher economic gain, or not default at all to maintain their credit. Notably, male borrowers have an increased hazard rate, 1.503 (log-S.E.=0.021) times that of female borrowers. Female borrowers have a lower propensity to default at all months.
We report and discuss diagnostics of the fitted survival model in Appendix A.5. We assess the proportional hazard (PH) assumption by plotting the scaled Schoenfeld residuals [51,80]. We assess the model's goodness-of-fit by plotting the Cox-Snell residuals [39]. And we report the fitted survival model's predictive performance in terms of the concordance index [53]. Results show that the survival model is well-specified, fits the data well, and is predictive of borrowers' default. We note that these are standard and widely accepted techniques for assessing the adequacy of survival models, and we refer readers to Section 4 of Collett [36] for a detailed review.

Repayment Ratio and Return Rate
We plot the gender-specific repayment ratio and return rate distributions in Fig. 2 and 3, using the predicted values for the unsuccessful loans. We find that female borrowers' repayment ratio distribution dominates male borrowers'. A higher fraction of female borrowers (92.4%, compared to 88.9% for male borrowers) do not default and have repayment ratio $\lambda = 1$. And at any repayment ratio level $\lambda \in \{0, 1/12, \dots, 11/12\}$ where default does occur, there is always a lower fraction of female borrowers than male borrowers. Nonetheless, female borrowers' return rate distribution does not dominate male borrowers'. Female borrowers' return rates are more concentrated in an intermediate range, between 1.16 and 1.24. Male borrowers' return rates vary more: their loans are more likely to yield both very high (> 1.24) and very low (< 1.16) return rates. The reason is that, as reported in the descriptive statistics (Tab. A1), males on average borrow at a higher interest rate, which partially compensates for their lower repayment ratio.

Disparate Impact
Fig. 4 reports the estimated loan success rate and disparate impact. We first focus on the loan success rate. The trend that loans with higher return rates are more likely to be funded holds separately within the loans whose return rate is < 1.16 and within the loans whose return rate is ≥ 1.16. However, the loan success rate drops abruptly from the return rate interval [1.1, 1.16) to [1.16, 1.2). Our explanation is that, while these two sets of loans yield similar return rates, they differ greatly in their interest rates. Since the interest rate in our data ranges within [0.16, 0.36], the loans with return rate in [1.16, 1.2) are mostly loans that do not default but offer some of the lowest interest rates on the platform (footnote 6). In contrast, the loans with return rate in [1.1, 1.16) are loans that offer higher interest rates but later defaulted. Therefore, the investors appear to be risk-seeking in funding loans with return rate in [1.1, 1.16) at a much higher rate than loans with return rate in [1.16, 1.2).
DI is estimated as the difference between male and female borrowers' loan success rates, conditional on identical return rates. We observe marginal DI favoring female borrowers for loans with return rate < 1.16 but very significant DI favoring female borrowers, with magnitude varying between 2.48% and 6.68%, for loans with return rate ≥ 1.16. The CIs are especially sharp for the latter because around 90% of the loans in our data have return rate ≥ 1.16. Averaged over the population distribution of the return rate, we find female borrowers are 3.97% more likely to have their loans successfully funded, given identical return rates (DI = -3.97%, 95% CI: -3.98% to -3.95%). We note that, as reported in the descriptive statistics (Tab. A1), female borrowers prima facie have a 2.2% higher loan success rate, 85.9% compared to 83.7% for male borrowers. By additionally controlling for the return rate, DI reveals that the female favoritism on the platform is around 1.8 times the prima facie gender difference.
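The averaging step can be sketched as follows. This is a simplified binned estimator on synthetic data, not the paper's 2SPS second stage: the function name `average_di`, the bin edges, the 30% female share, and the built-in 4-point funding gap are all assumptions made for illustration.

```python
import numpy as np

def average_di(return_rate, funded, is_female, bin_edges):
    """Population-weighted mean of (male - female) funding-rate gaps within return-rate bins."""
    bins = np.digitize(return_rate, bin_edges)
    total, weight = 0.0, 0
    for b in np.unique(bins):
        in_bin = bins == b
        male = in_bin & ~is_female
        female = in_bin & is_female
        if male.sum() == 0 or female.sum() == 0:
            continue  # the gap is undefined unless both genders appear in the bin
        gap = funded[male].mean() - funded[female].mean()
        total += gap * in_bin.sum()
        weight += in_bin.sum()
    return total / weight

# Synthetic data (hypothetical): female borrowers are funded 4 points more often
rng = np.random.default_rng(0)
n = 400_000
is_female = rng.random(n) < 0.3
ret = rng.uniform(1.0, 1.4, n)
funded = rng.random(n) < (0.80 + 0.04 * is_female)

di = average_di(ret, funded, is_female, np.linspace(1.0, 1.4, 11))
print(f"estimated DI (male minus female): {di:.4f}")
```

With the male-minus-female sign convention used in the text, a negative value indicates DI favoring female borrowers.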

Analysis
Indirect and Proxy Discrimination. We use a decomposition technique to investigate how much of DI is indirect and proxy discrimination. The procedure is as follows. We first replace the second stage of 2SPS with an OLS regression, regressing loan success on borrower gender and return rate. The return rate is spline transformed to allow non-linear effects. The gender coefficient gives a different 2SPS estimate of DI in which the second stage is parametric. Then, by additionally controlling for the observed loan and borrower characteristics in the OLS regression, we measure the reduction of the gender effect, which identifies the portion of DI that these characteristics can explain. This estimate gives a lower bound for indirect and proxy discrimination, since there might be unobserved mediators and proxy variables. In a concurrent work, Bohren et al. [28] develop a similar decomposition-based approach to distinguish direct and systemic discrimination.
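The decomposition logic can be sketched on synthetic data as below. This is an illustration under stated assumptions, not the paper's specification: the helper `gender_coefficient`, the single mediator `x`, and all coefficients are hypothetical, the return rate enters linearly instead of through a spline, and the gender coefficient is signed so that positive values favor female borrowers.

```python
import numpy as np

def gender_coefficient(funded, female, controls):
    """OLS coefficient on the female indicator, given a list of control columns."""
    design = np.column_stack([np.ones(len(funded)), female] + list(controls))
    coef, *_ = np.linalg.lstsq(design, funded, rcond=None)
    return coef[1]

rng = np.random.default_rng(0)
n = 400_000
female = (rng.random(n) < 0.5).astype(float)
ret = rng.uniform(1.0, 1.4, n)
# Hypothetical observed characteristic that differs by gender and affects funding
x = 0.5 * female + rng.normal(size=n)
prob = 0.70 + 0.02 * female + 0.03 * x + 0.10 * (ret - 1.2)
funded = (rng.random(n) < prob).astype(float)

base = gender_coefficient(funded, female, [ret])     # controls: return rate only
full = gender_coefficient(funded, female, [ret, x])  # adds the observed characteristic
explained_share = (base - full) / base
print(f"base={base:.4f}, full={full:.4f}, explained share={explained_share:.2f}")
```

The share of the gender effect removed by the added controls is the quantity interpreted as a lower bound on indirect and proxy discrimination.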
The results are reported in Tab. ??. First, the OLS estimate finds that female borrowers are 3.88% more likely to be funded given identical return rates. This is very close to, and corroborates, our original 2SPS estimate of 3.97%. Second, additionally controlling for observed loan and borrower characteristics significantly reduces the gender effect to 2.44%. This means at least 37.1% of the DI favoring female borrowers is indirect or proxy discrimination. Lastly, we also report in Tab. ?? an estimate of DT, which controls for loan and borrower characteristics but not the return rate. DT indeed underestimates the female favoritism by 44.6%: it amounts to saying female borrowers are 2.15% more likely to be funded than male borrowers, ceteris paribus.

Data Disaggregation. To investigate how DI varies across different cohorts of loans or borrowers, we report in Tab. A4 in the Appendix the re-estimated DI in data subsets disaggregated by borrowing time, marriage, repeated borrower, age, borrowing amount, interest rate, employment, and education. We find consistent DI favoring female borrowers in almost all subsets, with mild variation in its magnitude. The DI favoring female borrowers is largest for student borrowers, at 6.45%, and is larger for loans with higher interest rates. Notably, we find the magnitude of DI favoring female borrowers decreases with higher education, and DI even turns against female borrowers for those with Master or Doctorate degrees. But these borrowers constitute only around 0.2% of our data. This is the only case where we find DI turns against female borrowers.

Sensitivity to Overestimation of the Repayment Ratio. In the 2SPS approach, we predict the repayment ratio for the unsuccessful loans using data from the successful loans. The concern is that we might systematically overestimate the unsuccessful loans' repayment ratio because the investors base their lending decisions on information that is unrecorded but is indeed predictive of default. One concrete example is the loan's description, which the investors do observe, is not available in our data, and is reasonably believed to be predictive of default [61,90]. This issue is generally unavoidable, as it is unrealistic to account for all factors that are predictive of default. Although Section 4.3 theoretically shows the 2SPS estimate is partially protected from systematic overestimation of the repayment ratio, we are still interested in empirically testing its sensitivity.
Thus, we simulate the scenario where there is an omitted fixed effect associated with the unsuccessful loans. A fixed effect in the survival model is a constant multiplicative factor to the hazard rate (defined in Eq. 4) across all months. We assume the omitted fixed effect is in the direction that the actual hazard rates are higher than our current predictions, and vary its strength over {1.5, 2, 2.5, 3}.

[Table caption: Various 2SPS estimates. Some covariates in X are eliminated in the backward feature selection procedure of OLS regression using the Akaike Information Criterion (AIC) [10].]

[Figure value labels: -0.0397, -0.0376, -0.0355, -0.0332, -0.0310]
Concretely, the unsuccessful loans' predicted hazard rates across all months are multiplied by 1.5, 2, 2.5, or 3. The results are reported in Fig. 5. We observe a mild linear decrease in the magnitude of DI favoring female borrowers. The DI favoring female borrowers is still more than 3% when the unsuccessful loans' default risks are underestimated three-fold. Applying a linear extrapolation, for the actual DI to be zero or against female borrowers, the unsuccessful loans' actual hazard rates would have to be at least 10 times our current predictions. Using Fig. 1 as a reference, this means their hazard rates would be at least around 5% in every month except the very last months. We argue this scenario is extremely unlikely. Thus, our finding of DI favoring female borrowers is robust to potential overestimation of the repayment ratio.
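The linear extrapolation can be reproduced with a short sketch. It assumes that the five values printed alongside the figure in the extracted text (-0.0397, -0.0376, -0.0355, -0.0332, -0.0310) are the DI estimates at hazard multipliers 1 (the baseline), 1.5, 2, 2.5, and 3; this pairing is an inference from the text, not something the source states explicitly.

```python
import numpy as np

# Assumed pairing of hazard multipliers with the printed DI values
multipliers = np.array([1.0, 1.5, 2.0, 2.5, 3.0])
di = np.array([-0.0397, -0.0376, -0.0355, -0.0332, -0.0310])

# Least-squares straight line through the five points
slope, intercept = np.polyfit(multipliers, di, 1)

# Hazard multiplier at which the fitted line reaches DI = 0
zero_crossing = -intercept / slope
print(f"slope per unit multiplier: {slope:.5f}")
print(f"DI reaches zero at a multiplier of about {zero_crossing:.1f}")
```

Under this assumed pairing, the fitted line crosses zero at a multiplier close to 10, in line with the statement above.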

TASTE-BASED AND STATISTICAL DISCRIMINATION

6.1 Decision Model
Next, we investigate a decision model which decomposes DI into two components due to taste-based [23] and statistical discrimination [9,16,75]. Concretely, we assume every loan $i$'s repayment ratio is sampled from a gender-specific Gaussian distribution, $\lambda_i \sim \mathcal{N}(\mu_g, \sigma_{g,0}^2)$, with repayment ratio mean $\mu_g$ and repayment ratio variance $\sigma_{g,0}^2$. We stylize the process of investors evaluating various information of the loan as perceiving a noisy repayment ratio signal $\tilde{\lambda}_i$, $\tilde{\lambda}_i = \lambda_i + \sigma_{g,1} \epsilon_i$, $\epsilon_i \sim \mathcal{N}(0, 1)$, with gender-specific noise variance $\sigma_{g,1}^2$. We assume the investors are able to accurately predict the expected repayment ratio $\hat{\lambda}_i$, which under this Gaussian model takes the standard signal-extraction form

$$\hat{\lambda}_i = \mathbb{E}[\lambda_i \mid \tilde{\lambda}_i] = \gamma_g \tilde{\lambda}_i + (1 - \gamma_g) \mu_g, \qquad \gamma_g = \frac{\sigma_{g,0}^2}{\sigma_{g,0}^2 + \sigma_{g,1}^2},$$

where $\gamma_g$ is known as the signal reliability. $\gamma_g$ exactly captures the effect of the repayment ratio variance $\sigma_{g,0}^2$ and the noise variance $\sigma_{g,1}^2$ in computing the expected repayment ratio $\hat{\lambda}_i$. The above process formalizes statistical discrimination, where the investors leverage the easily observable borrower gender to accurately predict the expected repayment ratio $\hat{\lambda}_i$. But doing so might have a disparate impact on male and female borrowers.

Then, since the interest rate $r_i$ is fully observed, we assume the loan's funding success is determined by whether the expected return rate, computed as the product of the expected repayment ratio $\hat{\lambda}_i$ and $(1 + r_i)$, exceeds a gender-specific threshold $\tau_g$:

$$\hat{\lambda}_i (1 + r_i) \geq \tau_g.$$

The funding threshold $\tau_g$ formalizes taste-based discrimination. The investors are said to have an intrinsic gender taste that favors (or disfavors) female borrowers if $\tau_f < \tau_m$ (or $\tau_f > \tau_m$). We will use gender taste, gender animus, and taste-based discrimination interchangeably.
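The decision rule can be sketched for a single loan as follows. The sketch assumes the standard Gaussian signal-extraction form for the expected repayment ratio (reliability times the signal plus one minus reliability times the group mean). The function names, the variances, the signal value, the interest rate, and the threshold are hypothetical; only the group means 0.957 and 0.934 are the values reported in the results.

```python
def signal_reliability(prior_var, noise_var):
    """Weight placed on the noisy signal in the Gaussian posterior mean."""
    return prior_var / (prior_var + noise_var)

def expected_repayment(signal, group_mean, reliability):
    """Posterior mean of the repayment ratio given the noisy signal."""
    return reliability * signal + (1.0 - reliability) * group_mean

def is_funded(signal, interest_rate, group_mean, reliability, threshold):
    """Fund the loan when the expected return rate reaches the group's threshold."""
    expected_return = expected_repayment(signal, group_mean, reliability) * (1.0 + interest_rate)
    return expected_return >= threshold

# Hypothetical variances giving a low reliability of 0.1
gamma = signal_reliability(prior_var=0.15 ** 2, noise_var=0.45 ** 2)

# Same signal and interest rate for both borrowers; only the group mean differs
female_hat = expected_repayment(0.90, 0.957, gamma)
male_hat = expected_repayment(0.90, 0.934, gamma)
print(f"gamma={gamma:.2f}, female={female_hat:.4f}, male={male_hat:.4f}")
print(is_funded(0.90, 0.20, 0.957, gamma, threshold=1.13))  # female borrower
print(is_funded(0.90, 0.20, 0.934, gamma, threshold=1.13))  # male borrower
```

With a low reliability, the group mean dominates the prediction, so two borrowers with identical signals can land on opposite sides of a common threshold.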
Two comments are in order. First, this model gives the investors the benefit of the doubt by assuming they accurately predict the expected repayment ratio. While it is likely the investors are inaccurate, inaccurate statistical discrimination can be observationally equivalent to gender taste [14,59]. We therefore follow the common approach of regarding inaccurate statistical discrimination as also arising from gender taste [15]. Second, this model gives an "as-if" characterization of the investors' collective funding behavior, but does not reveal their actual intentions, which would be a much more difficult task. This issue is inherent to most works inferring discrimination drivers from observational data [59,84]. The findings from this model are still valuable, for example in ruling out statistical discrimination as the only driver of disparate impact.
We estimate the parameters of this model using a similar 2SPS approach. The first stage is identical: a survival model is fitted to predict the repayment ratio for the unsuccessful loans. This directly allows us to estimate the repayment ratio mean $\mu_g$ and variance $\sigma_{g,0}^2$. In the second stage, using the predicted repayment ratio and the estimated $\mu_g$ and $\sigma_{g,0}^2$, we use Bayesian inference to infer the posteriors of the noise variance $\sigma_{g,1}^2$ and the funding threshold $\tau_g$. Finally, the signal reliability $\gamma_g$ can be derived from the estimated $\sigma_{g,0}^2$ and $\sigma_{g,1}^2$. The Bayesian inference step is elaborated in Appendix A.7. Convergence is good, as indicated by a Gelman-Rubin diagnostic $\hat{R}$ [50] of at most 1.0023 for all parameters, as well as by visual inspection of the traces shown in Fig. A5. We bootstrap both stages 500 times to obtain confidence intervals. We note the second stage closely resembles a threshold test for discrimination [76,84].
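For readers unfamiliar with the convergence diagnostic, the following is a minimal sketch of the classic (non-split) Gelman-Rubin statistic on synthetic chains. The inference software used in the paper may compute a refined variant, so this is an illustration of the idea, not a reproduction of the reported value.

```python
import numpy as np

def gelman_rubin(chains):
    """Classic potential scale reduction factor for an array of shape (chains, draws)."""
    chains = np.asarray(chains, dtype=float)
    m, n = chains.shape
    within = chains.var(axis=1, ddof=1).mean()     # mean within-chain variance
    between = n * chains.mean(axis=1).var(ddof=1)  # variance between chain means
    pooled = (n - 1) / n * within + between / n    # pooled posterior variance estimate
    return float(np.sqrt(pooled / within))

rng = np.random.default_rng(0)
mixed = rng.normal(size=(4, 2000))        # four chains sampling the same distribution
stuck = mixed + np.arange(4)[:, None]     # four chains centered at different values

r_mixed = gelman_rubin(mixed)
r_stuck = gelman_rubin(stuck)
print(f"well-mixed chains: {r_mixed:.4f}, poorly mixed chains: {r_stuck:.4f}")
```

Values close to 1 indicate that the chains agree with each other, which is the sense in which a maximum of 1.0023 signals good convergence.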

Results
Tab. 1 reports the estimated parameters. Most importantly, we find (i) the observed DI favoring female borrowers can be completely explained as rational statistical discrimination and (ii) female borrowers still need a 2% higher expected return rate in order to be funded, i.e., gender animus against female borrowers still exists. To put it another way, if the investors engaged only in statistical discrimination and had no taste-based discrimination, female borrowers would be funded even more favorably. The statistical discrimination is mainly driven by two factors: (i) female borrowers have a higher repayment ratio mean $\mu_g$, 0.957 compared to 0.934 for male borrowers; and (ii) because the investors' signal reliability $\gamma_g$ is very low, the repayment ratio mean $\mu_g$ plays a significant role in computing the expected repayment ratio $\hat{\lambda}_i$.
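How a favorable overall effect can coexist with an adverse threshold is illustrated by the simulation sketch below. It assumes the Gaussian signal-extraction model with a common interest rate. The standard deviations, interest rate, and male threshold are hypothetical, and the 2% higher requirement for female borrowers is modeled as a threshold that is 2% higher; only the group means 0.957 and 0.934 come from the reported estimates.

```python
import numpy as np

def funding_rate(rng, n, mean, prior_sd, noise_sd, interest_rate, threshold):
    """Share of simulated loans whose expected return rate reaches the threshold."""
    repayment = rng.normal(mean, prior_sd, n)            # latent repayment ratio
    signal = repayment + noise_sd * rng.normal(size=n)   # noisy signal seen by investors
    gamma = prior_sd ** 2 / (prior_sd ** 2 + noise_sd ** 2)
    expected = gamma * signal + (1 - gamma) * mean       # posterior mean of repayment
    return float(np.mean(expected * (1 + interest_rate) >= threshold))

rng = np.random.default_rng(0)
male_threshold = 1.05                      # hypothetical
female_threshold = 1.02 * male_threshold   # stricter bar for female borrowers

male_rate = funding_rate(rng, 200_000, 0.934, 0.15, 0.45, 0.20, male_threshold)
female_rate = funding_rate(rng, 200_000, 0.957, 0.15, 0.45, 0.20, female_threshold)
print(f"male: {male_rate:.3f}, female: {female_rate:.3f}")
```

In this parameterization female borrowers face the stricter threshold yet are funded more often, because the higher group mean outweighs the threshold gap when signal reliability is low.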

DISCUSSIONS
P2P Lending as a Market-based Affirmative Action. Given that the gender gap in financial access still persists [40] and female borrowers are still disadvantaged in many other credit markets [29,31,89], our empirical results suggest online P2P lending provides an alternative credit market where affirmative action to support female borrowers can arise naturally from the rational crowd. We take a liberal interpretation of affirmative action that focuses on the effect of narrowing existing inequality but does not imply intentions on the part of the investors and/or the platform, similar to the discussion of market affirmative action in Cooter [37]. Below, we discuss two reasons why P2P lending can be a desirable policy instrument to narrow the gender gap in financial access.
First, the affirmative action in P2P lending is driven by market forces, which is arguably more efficient than heavy-handed bureaucratic rules such as quota systems [22,34]. The P2P lending market functions like a perfectly competitive market because (i) the size of P2P loans is typically small and (ii) there is a large number of investors. Based on standard demand-supply analysis [37], market competition will impose the cost of gender animus, in the form of a lower expected return, upon the investors who demand it. Therefore, competition reduces gender animus. Competition, however, will not eliminate statistical discrimination because it is efficient and reflects rational behavior [75]. Consequently, because female borrowers are indeed less likely to default on their P2P loans, market forces will drive the investors to engage in statistical discrimination that favors female borrowers.
Second, affirmative action resulting from statistical discrimination leads to an important narrative shift. Female borrowers' favorable DI is incentivized by their own higher expected return rates rather than being granted because of their disadvantaged socioeconomic status. In particular, it provides a viable defence against the criticism of affirmative action that less qualified female borrowers are selected over more qualified male borrowers [30]. In fact, it is the more qualified female borrowers, in terms of expected return rates, who are selected by the investors.
On a cautionary note, statistical discrimination by itself is likely to be illegal in many jurisdictions. In order to qualify as affirmative action, a practice must be narrowly tailored, not violate the rights of non-protected groups, and meet other legal requirements. Since we lack expertise in law, we refrain from delving into the discussion of legality in this paper.

Gender Animus Can Still Exist Under Favorable Disparate Impact. While our study on disparate impact reveals a larger female favoritism on the collaborating P2P lending platform than disparate treatment does, we caution against making overly simplistic or absolute interpretations of the findings. We provide counter-evidence: the overall female favoritism can be completely explained as rational statistical discrimination, and gender animus against female borrowers can still exist. Gender animus does not become right or justified when its effect is obscured by statistical discrimination. The underlying reason is that different discrimination drivers, such as gender taste and statistical discrimination, can co-exist and contribute to the overall discrimination. We call for future research that not only tests the existence of discrimination but also identifies the underlying drivers of discrimination, as highlighted by Bohren et al. [27] and Hull [59].

LIMITATIONS
There are several limitations in this work, including (1) treating gender as binary biological sex, (2) the risk of overestimating the unsuccessful loans' repayment ratio due to omitted factors, and (3) the nature of an "as-if" characterization in attempting to identify discrimination drivers. This work is also unable to discuss individual investors' gender discrimination due to data constraints. Similarly, we are unable to explicitly investigate the effect of the ML-driven credit grade on gender discrimination because the credit grade information is unavailable. This also presents an opportunity for future research.

CONCLUSION
This work presents a case study of gender discrimination and its underlying drivers on a prominent Chinese online P2P lending platform. We find that the disparate impact (DI) favoring female borrowers is around 3.97%, which reveals a larger female favoritism than the commonly studied discrimination notion of disparate treatment (DT). But we also find that this female favoritism can be explained as rational statistical discrimination. Gender animus against female borrowers can still exist and be obscured by other discrimination drivers. We conclude by discussing the positive role P2P lending can play in reducing the existing gender gap in financial access and the importance of, besides measuring the overall discrimination, identifying what drives discrimination and decomposing their effects.

constant coefficient. For these four covariates, the visual test shows the violations are not economically meaningful. Constant coefficients still provide satisfactory approximations to the smoothed curve, and always lie within the 95% CI. In fact, although we do not report it due to limited space, the approximation is visually no worse than for the other covariates for which the $\chi^2$ test is unable to reject the null. Combining evidence from both the $\chi^2$ test and the visual test, we show the PH assumption is reasonably satisfied.

Survival Model Has Goodness-of-Fit. We use the Cox-Snell residual [39] to assess the model's goodness-of-fit. If the model is correct and well-fitted, the Cox-Snell residuals should resemble a censored sample from a unit exponential distribution. And if the residuals follow a censored unit exponential, their estimated cumulative hazard plotted against the residuals themselves should be a straight line with zero intercept and unit slope.

Survival Model is Predictive. The fitted survival model is predictive of the borrower's default, with a concordance index of 0.638 (S.E.=0.001) [53].
This means the fitted model predicts which loan has the higher repayment ratio, or is more trustworthy, with 63.8% accuracy over all possible pairs of comparable loans. In Fig. A4, we plot the defaulted loans' ranks as predicted by the survival model against their default months. Each defaulted loan's rank ranges from 0 to 1. A rank of 1 means the loan has the highest predicted default probability in its default month among that month's risk set, which includes all loans that have not defaulted so far. A rank of 0 means the loan has the lowest predicted default probability. We see that the survival model correctly predicts higher risk for the actually defaulted loans across all months. Its predictability varies mildly and is highest in the very first month.
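The pairwise interpretation of the concordance index can be sketched as follows. This is a direct implementation of Harrell's C for right-censored data on a hypothetical five-loan example; whether it matches the exact variant in [53] is an assumption.

```python
def concordance_index(time, event, risk):
    """Share of comparable pairs in which the earlier defaulter has the higher predicted risk."""
    concordant, comparable = 0.0, 0
    n = len(time)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when loan i defaulted strictly before
            # loan j's last observed month.
            if event[i] and time[i] < time[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5  # ties in predicted risk count as half
    return concordant / comparable

# Hypothetical loans: default month (12 with event 0 means fully repaid)
time = [1, 3, 8, 12, 12]
event = [1, 1, 1, 0, 0]
risk = [0.9, 0.5, 0.6, 0.1, 0.2]  # predicted default risk

c_index = concordance_index(time, event, risk)
print(f"concordance index: {c_index:.3f}")
```

In this toy example, 8 of the 9 comparable pairs are ordered correctly; the one discordant pair is the loan defaulting in month 3, which is assigned a lower risk than the loan defaulting in month 8.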

Fig. A1: The figures plot the log-risk (with robust S.E.) of the continuous covariates in the fitted survival model. The exponentiated log-risk is the multiplicative increase or decrease of the hazard relative to the baseline hazard ($h_0(t)$ in Eq. 7).
Fig. A3 implements this test. The estimated cumulative hazard of the Cox-Snell residual closely matches the 45° line, and thus the survival model is well-fitted.

Fig. A2: The visual test for the PH assumption. A smoothed curve with 95% CI is fitted to the scaled Schoenfeld residual. The blue line is a horizontal fit.
[Fig. A3 axes: Cox-Snell residual versus cumulative hazard of the Cox-Snell residual.]
Fig. 4: Left and right figures plot the estimated conditional loan success rate and disparate impact (DI), respectively. In the left figure, the red solid line denotes female borrowers and the blue dashed line denotes male borrowers. Since the interest rate in our data ranges within [0.16, 0.36], all loans with return rate < 1.16 are defaulted loans. Around 84.4% of loans with return rate ≥ 1.16 do not default.
Tab. A3: The $\chi^2$ test for the PH assumption. The null hypothesis is that the regression line fitted between the scaled Schoenfeld residual and the observation time has zero slope.