Privacy-Preserving Gross Domestic Product (GDP) Calculation Using Paillier Encryption and Differential Privacy

Gross Domestic Product (GDP) is the total value of the goods and services produced by a country over a period of time, such as one year. It is widely used by most countries to measure their economies. GDP can be computed with several methodologies, including three major approaches: (i) production, (ii) expenditure, and (iii) income. However, applying these methodologies to actual country-level GDP assessment remains unsatisfactory because of inherent limitations in accessing individual-level data. This paper studies GDP calculation with a focus on the income approach and explores methods that protect the privacy of the participants in this computation, using encryption and differential privacy. Experimental results show that the proposed methods are promising and strongly protect individuals' privacy. To the best of our knowledge, this is the first attempt to calculate GDP and related values while safeguarding contributors' privacy.


INTRODUCTION
The Gross Domestic Product (GDP), a pivotal economic metric, is the monetary value of the final goods and services (those purchased by end consumers) produced within a nation during a given period. It is the sum of all value added in an economy. The term "gross" signifies that products are counted regardless of their subsequent use, which may be consumption, investment, or the replacement of an asset.
Several methodologies are available for GDP computation, including three approaches: (i) production, (ii) expenditure, and (iii) income. This paper adopts the income approach as the framework for GDP assessment. In the income approach, GDP is the sum of total national income (TNI), sales tax (SL), depreciation (D), and net foreign factor income (F). TNI is the total of all earnings that a nation's residents and businesses accrue over a specified period. When computing GDP (and, by extension, TNI), preserving the privacy of each participant in the computation is paramount.
This research calculates the true GDP using the income approach, which has a pitfall that may hinder the calculation: it requires individuals to cooperate by reporting their genuine incomes, and people may have privacy concerns, such as taxation and hacking, when they submit their incomes. This research therefore uses two methods to safeguard the confidentiality of the subjects involved in the evaluation. The first relies on Paillier encryption to preserve privacy. The second is based on the principles of differential privacy. To the best of our knowledge, this is the first attempt to compute GDP and TNI while upholding the privacy of the contributors.
The rest of this paper is organized as follows. Section 2 gives the background of this research and related work. Section 3 introduces the three approaches used for GDP calculation, of which the income approach is the focus of this paper. The two major methods used in this research, Paillier encryption and differential privacy, are presented in Sections 4 and 5, respectively. Section 6 gives the conclusion and future research directions.

RELATED LITERATURE
A genuine GDP (Gross Domestic Product) can be obtained only if individuals have no privacy concerns. This research proposes methods for privacy-preserving GDP calculation that involve three themes, discussed in this section: (i) GDP computation, (ii) Paillier encryption, and (iii) differential privacy.

GDP Calculation
GDP is the value of the total products and services provided by a country in a time period [14]. GDP calculation is straightforward but tedious. Nevertheless, finding genuine GDP values has many pitfalls [3]. Grishin, Ustyuzhanina, and Komarova [10] discuss the problems of using GDP to assess the level of a country's economic development; one of them is the neglect of labor and of the exchange of goods and services within households. Fioramonti [9] shows that GDP cannot capture all gains and losses in an economy, which is why it is a "gross" indicator. For example, it disregards the value of natural resources, such as water, consumed in the economic process, because these are obtained free of charge from nature. In addition, it does not account for the economic costs of pollution and environmental degradation caused by industrial development.

Paillier Encryption (PE)
This research uses Paillier encryption, which is widely used in public-key cryptography applications. Paillier encryption is based on the composite residuosity class problem [13]. Paillier proposes a trapdoor mechanism from which three encryption schemes are derived: (i) a trapdoor permutation and (ii) two homomorphic probabilistic encryption schemes computationally comparable to RSA, a well-known public-key cryptosystem. Paillier's cryptosystems, based on ordinary modular arithmetic, are proved secure under appropriate assumptions in the standard model. Orlandi, Scholl, and Yakoubov [12] present constructions of homomorphic secret sharing and pseudorandom correlation functions, which can be used to solve the distributed discrete logarithm problem in Paillier groups, allowing two parties to locally convert multiplicative shares of a secret (in the exponent) into additive shares. Related research can be found in [2,4,8].

Differential Privacy (DP)
Dwork et al. [5,7] study privacy-preserving statistical databases. Consider a trusted server that holds a database of sensitive information. Given a query function $f$ mapping databases to reals, the so-called true answer is the result of applying $f$ to the database. To protect privacy, the true answer is perturbed by adding random noise drawn from a carefully chosen distribution, and this response (the true answer plus noise) is returned to the user. For general functions $f$, they prove that privacy can be preserved by calibrating the standard deviation of the noise to the sensitivity of $f$. Wei et al. [16] propose a framework based on differential privacy in which artificial noise is added to parameters on the clients' side before aggregation, namely noising before model aggregation federated learning (NbAFL). They prove that, with Gaussian noise whose variance is properly adapted, NbAFL satisfies DP with respect to global data under distinct protection levels, and they develop a theoretical convergence bound on the loss function of the FL model trained with NbAFL. Related research can be found in [1,6,11].

THE THREE APPROACHES USED FOR GDP CALCULATION
The strength of both global and local economies affects every individual. At the core of this economic understanding lies the Gross Domestic Product (GDP), a gauge of an economy's size, performance, and overall well-being [14]. In the United States, GDP is measured both annually and quarterly. India assesses GDP quarterly and annually: each quarter's statistics are released two months after the end of that quarter, and the annual GDP data are released on 31 May, with a similar two-month delay. The United Kingdom differs slightly: new GDP data are compiled every month, although the quarterly figures (three months taken together) attract the most attention. GDP calculation is complicated because it must consider many factors. It rests on three methodological frameworks: (i) expenditure, (ii) income, and (iii) production approaches.

The Expenditure Approach
Among these, the expenditure approach is the most commonly used GDP derivation technique; it is based on the spending of final consumers. This includes, for example, consumer spending on food and services, corporate investment in industrial equipment, and purchases of goods and services by governments and foreign entities. The expenditure approach is given by Equation 1.
$$\mathrm{GDP} = C + G + I + NX \tag{1}$$
where the variables are defined as follows:
• $C$ is consumption, the total private consumer spending in a nation's economy.
• $G$ is aggregate government expenditure, including salaries of government employees, infrastructure projects such as road construction and repair, funding for public schools, and military spending.
• $I$ is the total of a country's investment expenditures.
• $NX = X - M$ denotes net exports, the difference between a nation's exports ($X$) and imports ($M$).
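As a quick illustration of Equation 1, the following Python sketch computes GDP from the four expenditure components. The function name `gdp_expenditure` and all figures are hypothetical, chosen only to make the arithmetic easy to follow.

```python
def gdp_expenditure(consumption: float, government: float, investment: float,
                    exports: float, imports: float) -> float:
    """Equation 1: GDP = C + G + I + NX, where NX = X - M."""
    net_exports = exports - imports  # NX is negative when imports exceed exports
    return consumption + government + investment + net_exports


# Hypothetical figures in arbitrary currency units, for illustration only.
print(gdp_expenditure(consumption=700.0, government=200.0, investment=150.0,
                      exports=120.0, imports=170.0))  # 1000.0
```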

The Income Approach
This approach sums the earnings generated by the provision of goods and services, as shown in Equation 2.
$$\mathrm{GDP} = \text{Total National Income} + \text{Sales Tax} + \text{Depreciation} + \text{Net Foreign Factor Income} \tag{2}$$
where
• Total National Income is the sum of all wages, rents, interest, and profits, among other income.
• Sales Tax is the tax imposed by the government on consumer spending on goods and services.
• Depreciation is the allocation of the cost of a tangible asset over its useful life.
• Net Foreign Factor Income is the difference between the income earned abroad by a country's citizens and businesses and the income earned within the country by foreign citizens and businesses.
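Because the income approach is the focus of this paper, the sketch below spells out Equation 2 in Python. It is a simplification that assumes Total National Income consists only of wages, rents, interest, and profits; the class name `IncomeComponents` and the numbers are illustrative, not data from this study.

```python
from dataclasses import dataclass


@dataclass
class IncomeComponents:
    """Components of the income approach (Equation 2), all in one currency unit."""
    wages: float
    rents: float
    interest: float
    profits: float
    sales_tax: float
    depreciation: float
    net_foreign_factor_income: float

    def total_national_income(self) -> float:
        # Simplifying assumption: TNI = wages + rents + interest + profits.
        return self.wages + self.rents + self.interest + self.profits

    def gdp(self) -> float:
        # Equation 2: GDP = TNI + sales tax + depreciation + net foreign factor income.
        return (self.total_national_income() + self.sales_tax
                + self.depreciation + self.net_foreign_factor_income)


example = IncomeComponents(wages=500.0, rents=60.0, interest=40.0, profits=200.0,
                           sales_tax=80.0, depreciation=110.0,
                           net_foreign_factor_income=10.0)
print(example.total_national_income())  # 800.0
print(example.gdp())                    # 1000.0
```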
Figure 1 shows the items considered in the income approach of GDP calculation. The income approach is the focus of the GDP computation in this paper.

The Production Approach
The production approach, on the other hand, sums the value added at each stage of production. Here, value added is the revenue from total sales minus the value of the intermediate inputs used in production. For instance, flour is an intermediate input, whereas bread is the final product. Computing GDP is essential for assessing a nation's performance. At present, GDP computation relies on a variety of estimates, which may not achieve the desired level of precision. This shortfall is closely linked to the fact that individual participation may fall short of the level needed.
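The value-added idea can be checked with the flour-and-bread example. The sketch below assumes a simple linear chain in which each stage buys the previous stage's entire output as its only intermediate input; under that assumption the value added across all stages sums to the value of the final good, so intermediate inputs are not double-counted. The helper `value_added_chain` and the prices are hypothetical.

```python
def value_added_chain(stage_sales: list[float]) -> list[float]:
    """Value added at each stage: sales minus the intermediate input, which is
    assumed to be the previous stage's sales (0 for the first stage)."""
    added = []
    previous = 0.0
    for sales in stage_sales:
        added.append(sales - previous)
        previous = sales
    return added


# Hypothetical chain: wheat -> flour -> bread (arbitrary currency units).
stages = [30.0, 50.0, 100.0]
va = value_added_chain(stages)
print(va)       # [30.0, 20.0, 50.0]
print(sum(va))  # 100.0, the value of the final good (bread)
```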
A key factor is the reluctance of individuals to disclose their private savings or earnings to the government. The fear of penalties imposed by the authorities for tax evasion amplifies this reluctance and is the principal concern. A solution emerges if the GDP calculation mechanism can offer individual users a degree of confidentiality: the system never learns an individual's full income, yet individuals can still submit their income data in segments that are combined with similar inputs from other participants. Such confidentiality makes GDP computation more attractive and encourages users to disclose their financial details.

PAILLIER ENCRYPTION FOR PRESERVING PRIVACY IN GDP COMPUTATION
Safeguarding privacy is essential to protect users' information from unauthorized exposure. This section and the next present two methods for calculating GDP while preserving privacy. This section computes GDP using Paillier encryption; the next section computes GDP based on Differential Privacy (DP).

Paillier Encryption (PE)
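Within this context, a nation's GDP can be computed while preserving privacy by using Paillier encryption, as shown in Figure 2. PE is a form of additively homomorphic, probabilistic public-key encryption. The public key is $(n, g)$, where $n = pq$ is the product of two large primes $p$ and $q$. For a plaintext message $m$ with $m < n$, a random number $r$ with $r < n$ and $\gcd(r, n) = 1$ is selected, and an element $g \in \mathbb{Z}^*_{n^2}$ (integers from 1 to $n^2 - 1$ that are coprime to $n$) serves as the generator. Encryption is given by Equation 3.
$$c = g^{m} \cdot r^{n} \bmod n^{2} \tag{3}$$
Decryption uses Equation 4 to recover the plaintext $m$.
$$m = L\big(c^{\lambda} \bmod n^{2}\big) \cdot \mu \bmod n \tag{4}$$
where $\lambda = \operatorname{lcm}(p - 1, q - 1)$, $L(x) = (x - 1)/n$, and $\mu = \big(L(g^{\lambda} \bmod n^{2})\big)^{-1} \bmod n$.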
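To make Equations 3 and 4 concrete, here is a minimal textbook Paillier sketch in Python. It assumes the common choice $g = n + 1$ and uses tiny hard-coded primes so the numbers stay readable; such keys are completely insecure, and a real deployment would use a vetted library with primes of at least 1024 bits. The function names `keygen`, `encrypt`, and `decrypt` are illustrative.

```python
import math
import secrets


def keygen(p: int, q: int):
    """Textbook Paillier key generation with g = n + 1.

    Assumes p and q are distinct primes of similar size, so gcd(n, (p-1)(q-1)) = 1.
    """
    n = p * q
    n2 = n * n
    g = n + 1
    lam = math.lcm(p - 1, q - 1)
    # mu = (L(g^lam mod n^2))^-1 mod n, where L(x) = (x - 1) // n
    mu = pow((pow(g, lam, n2) - 1) // n, -1, n)
    return (n, g), (lam, mu)


def encrypt(pub, m: int) -> int:
    """Equation 3: c = g^m * r^n mod n^2 with a fresh random r coprime to n."""
    n, g = pub
    if not 0 <= m < n:
        raise ValueError("plaintext must satisfy 0 <= m < n")
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1  # 1 <= r < n
        if math.gcd(r, n) == 1:
            break
    return pow(g, m, n2) * pow(r, n, n2) % n2


def decrypt(pub, priv, c: int) -> int:
    """Equation 4: m = L(c^lam mod n^2) * mu mod n."""
    n, _ = pub
    lam, mu = priv
    return (pow(c, lam, n * n) - 1) // n * mu % n


pub, priv = keygen(1117, 1151)  # toy primes: insecure, for illustration only
c = encrypt(pub, 42_000)
print(decrypt(pub, priv, c))  # 42000
```

Because a fresh random $r$ is drawn on every call, encrypting the same income twice almost always yields different ciphertexts, which is what makes the scheme probabilistic.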

Paillier Encryption Applied to GDP Calculation
This research computes a privacy-preserving GDP using the additive homomorphic property of Paillier encryption, given in Equation 6.
$$D\big(E(m_1) \cdot E(m_2) \bmod n^2\big) = m_1 + m_2 \bmod n \tag{6}$$
That is, decrypting the product of encrypted values yields the sum of the original plaintext values. This research introduces "candidates" as the agents that facilitate this computation. The system generates a variable number of candidates, which must stay below a predefined threshold, as shown in Figure 3. For clarity, consider three candidates: Alice, Bob, and Charlie. The candidates receive fractional segments of the participants' incomes, and each participant's income is randomly apportioned among them. The candidates jointly decide how often they will accept these income portions from each participant and announce this to all participants. Participants then split their income data into segments, and the candidates share their public encryption keys with all participants, who encrypt the segments with the Paillier scheme under the candidates' public keys. A single leader is elected within each group of participants. In each time division, the leader receives the encrypted values from all participants in the group and accumulates them by homomorphic addition, then sends the aggregated result to the designated candidates. An example is given in Table 1. The workflow of Figure 3 is summarized in Algorithm I.
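The additive property in Equation 6 is what lets a group leader combine ciphertexts it cannot read. The sketch below, again textbook Paillier with $g = n + 1$ and toy primes (insecure, illustration only), shows a single leader multiplying encrypted income segments modulo $n^2$ and a single candidate decrypting only the total. The names `leader_aggregate`, `encrypt`, `decrypt` and the segment values are hypothetical.

```python
import math
import secrets

# Compact textbook Paillier (g = n + 1) with toy primes: insecure, illustration only.
P, Q = 1117, 1151
N = P * Q
N2 = N * N
LAM = math.lcm(P - 1, Q - 1)
MU = pow(LAM, -1, N)  # valid for g = n + 1, since L(g^lam mod n^2) = lam mod n


def encrypt(m: int) -> int:
    """Encrypt under the candidate's public key (Equation 3)."""
    while True:
        r = secrets.randbelow(N - 1) + 1
        if math.gcd(r, N) == 1:
            break
    return pow(N + 1, m, N2) * pow(r, N, N2) % N2


def decrypt(c: int) -> int:
    """Candidate-side decryption with the private key (Equation 4)."""
    return (pow(c, LAM, N2) - 1) // N * MU % N


def leader_aggregate(ciphertexts: list[int]) -> int:
    """The leader multiplies ciphertexts mod n^2 (Equation 6), which adds the
    underlying plaintexts without decrypting any of them."""
    acc = 1  # 1 = g^0 * 1^n, a valid (deterministic) encryption of 0
    for c in ciphertexts:
        acc = acc * c % N2
    return acc


# Hypothetical income segments sent by three participants in one time division.
segments = [12_500, 8_300, 20_000]
aggregate = leader_aggregate([encrypt(m) for m in segments])
print(decrypt(aggregate))  # 40800
```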
Note that no candidate learns the income values of individual participants. The leader likewise learns nothing about the values held by the other participants in its group, because these values remain encrypted.

DIFFERENTIAL PRIVACY (DP) FOR PRIVACY PRESERVATION
Differential privacy (DP) provides an indistinguishability property: query outputs are perturbed so that an inquirer cannot tell whether any individual's data is present or absent, let alone learn the individual data itself. In DP, two datasets $D$ and $D'$ are neighbors if they differ in a single row. A randomized query $M$ is $\varepsilon$-differentially private if, for all sets of outcomes $S$ and all neighboring datasets $D$ and $D'$, Equation 7 holds.
$$\Pr[M(D) \in S] \le e^{\varepsilon} \cdot \Pr[M(D') \in S] \tag{7}$$
Here, $\varepsilon$ measures the level of privacy protection (smaller values give stronger protection). Equation 7 states that the probability of the mechanism producing an output in $S$ on dataset $D$ is at most $e^{\varepsilon}$ times the probability of producing such an output on dataset $D'$. If the system involves $n$ participants, then $D, D' \in \mathbb{R}^{n \times T}$, where $T$ is the number of partitions. The sensitivity $\Delta f$ of a function $f: \mathbb{R}^{n \times T} \to \mathbb{R}^{T}$ is defined in Equation 8.
$$\Delta f = \max_{D, D'} \lVert f(D) - f(D') \rVert_1 \tag{8}$$
In differential privacy, sensitivity is the maximum amount by which the output can change when a single individual's data is added or removed. It quantifies the influence of one participant in the dataset.
A lower sensitivity means that less information about individual data can be derived from the final output. Here, $\lVert \cdot \rVert_1$ denotes the $\ell_1$ norm, which measures the difference between the function outputs on neighboring datasets. The overall sensitivity combines the sensitivities $\Delta f_t$ of the individual partitions over the time index $t$, as shown in Equation 9.
$$\Delta f = \sum_{t=1}^{T} \Delta f_t \tag{9}$$
The Laplace mechanism perturbs the output in any given time interval $t$ so as to preserve differential privacy. It adds noise drawn from the Laplace distribution, whose density function is given by Equation 10.
$$\mathrm{Lap}(x \mid \lambda) = \frac{1}{2\lambda} \exp\!\left(-\frac{|x|}{\lambda}\right) \tag{10}$$
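A minimal sketch of the Laplace mechanism in Python, assuming the standard calibration in which the scale is the sensitivity divided by $\varepsilon$ (in line with the calibration by Dwork et al. described in Section 2). Laplace noise is sampled as the difference of two independent exponential variables, which has exactly the $\mathrm{Lap}(0, \lambda)$ distribution. The function names, clipping bound, and totals are hypothetical.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw from Lap(0, scale): the difference of two i.i.d. exponentials
    with mean `scale` has exactly this distribution."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def laplace_mechanism(true_answer: float, sensitivity: float, epsilon: float,
                      rng: random.Random) -> float:
    """Release true_answer + Lap(0, sensitivity / epsilon), which is
    epsilon-differentially private for a query with this L1 sensitivity."""
    return true_answer + laplace_noise(sensitivity / epsilon, rng)


rng = random.Random(7)
# Hypothetical query: a sum of incomes, each clipped to [0, 100_000], so adding or
# removing one person changes the sum by at most 100_000 (the sensitivity).
print(laplace_mechanism(5_000_000.0, sensitivity=100_000.0, epsilon=1.0, rng=rng))
```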
Here, $\lambda$ is the scale of the added noise: a larger $\lambda$ adds noise of larger amplitude. This paper obscures the actual participant data with Laplace noise, thereby hiding the true GDP values from external observers. Each participant adds noise independently, without depending on the other participants in the system, so external observers cannot detect the perturbation. To do so, at each time $t$ each participant adds noise drawn from the Gamma distribution. This works because the Laplace distribution can be decomposed into a sum of independent individual distributions, as shown in Equation 11, where $G_1^{(i)}(n, \lambda)$ and $G_2^{(i)}(n, \lambda)$ are mutually independent, identically distributed Gamma random variables with shape parameter $1/n$ and scale parameter $\lambda$.
$$\mathrm{Lap}(\lambda) \overset{d}{=} \sum_{i=1}^{n} \big(G_1^{(i)}(n, \lambda) - G_2^{(i)}(n, \lambda)\big) \tag{11}$$
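Equation 11 can be checked numerically. In the sketch below, each of $n$ hypothetical participants draws $G_1 - G_2$ with $G_1, G_2 \sim \mathrm{Gamma}(1/n, \lambda)$ using Python's `random.gammavariate(alpha, beta)` (shape `alpha`, scale `beta`); the $n$ shares are summed and the sample moments are compared with those of $\mathrm{Lap}(0, \lambda)$: mean 0, variance $2\lambda^2$, and mean absolute value $\lambda$. The value $\lambda = 0.5$ mirrors the scale reported later in this section; $n$, the trial count, and the helper names are arbitrary.

```python
import random
import statistics


def gamma_share(n: int, scale: float, rng: random.Random) -> float:
    """One participant's share: G1 - G2 with G1, G2 ~ Gamma(shape=1/n, scale)."""
    return rng.gammavariate(1.0 / n, scale) - rng.gammavariate(1.0 / n, scale)


def summed_shares(n: int, scale: float, rng: random.Random) -> float:
    """Sum of n independent shares; distributed as Lap(0, scale) (Equation 11)."""
    return sum(gamma_share(n, scale, rng) for _ in range(n))


rng = random.Random(2024)
scale, n, trials = 0.5, 10, 20_000
samples = [summed_shares(n, scale, rng) for _ in range(trials)]
# Lap(0, b) has mean 0, variance 2 * b**2 and mean absolute value b.
print(statistics.fmean(samples), statistics.pvariance(samples), 2 * scale**2)
```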
At each discrete time $t$, each participant adds Gamma noise, as in Equation 11, to its partitioned GDP value and sends the result to the aggregator, which sums all data points received at time $t$. The value reported by participant $i$ at time $t$ is given by Equation 12, where $x_i(t)$ is the participant's true value and $G_1^{(i)}$, $G_2^{(i)}$ are the Gamma noise terms of Equation 11.
$$\hat{x}_i(t) = x_i(t) + G_1^{(i)}(n, \lambda) - G_2^{(i)}(n, \lambda) \tag{12}$$
Thus, at any time $t$, the sum over all participants is given by Equation 13.
$$S(t) = \sum_{i=1}^{n} \hat{x}_i(t) \tag{13}$$
Using Equation 12, Equation 13 can be rewritten as Equation 14.
$$S(t) = \sum_{i=1}^{n} x_i(t) + \sum_{i=1}^{n} \big(G_1^{(i)}(n, \lambda) - G_2^{(i)}(n, \lambda)\big) \tag{14}$$
Applying Equation 11, Equation 14 becomes Equation 15.
$$S(t) = \sum_{i=1}^{n} x_i(t) + \mathrm{Lap}(\lambda) \tag{15}$$
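The participant and aggregator steps of Equations 12-15 can be sketched as follows. This is an illustrative simulation under assumed inputs: five hypothetical participants, one time instance $t$, and $\lambda = 0.5$. Each participant perturbs only its own value and the aggregator sees only perturbed reports, yet their sum equals the true total plus a single $\mathrm{Lap}(0, \lambda)$ draw. The function names `perturb` and `aggregate` are not from the paper.

```python
import random


def perturb(value: float, n: int, scale: float, rng: random.Random) -> float:
    """Participant-side step (Equation 12): add G1 - G2, each Gamma(1/n, scale)."""
    return value + rng.gammavariate(1.0 / n, scale) - rng.gammavariate(1.0 / n, scale)


def aggregate(values: list[float], scale: float, rng: random.Random) -> float:
    """Aggregator-side step (Equations 13-15): sum the perturbed reports.
    The n Gamma differences add up to a single Lap(0, scale) draw."""
    n = len(values)
    return sum(perturb(v, n, scale, rng) for v in values)


rng = random.Random(1)
# Hypothetical partitioned incomes of five participants at one time t.
incomes_t = [3_200.0, 4_150.0, 2_980.0, 5_010.0, 3_660.0]
noisy_total = aggregate(incomes_t, scale=0.5, rng=rng)
print(sum(incomes_t), noisy_total)
```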
Hence, the central idea is to add Gamma noise in each time interval so that the true partitioned GDP values remain hidden from potential intruders. Over $T$ time partitions, GDP is computed by Equation 16, where $S(t)$ is the aggregated sum at time $t$ (Equation 13) and $x_i(t)$ is the true value of participant $i$ at time $t$.
$$\mathrm{GDP} = \sum_{t=1}^{T} S(t) = \sum_{t=1}^{T} \sum_{i=1}^{n} x_i(t) + \sum_{t=1}^{T} \mathrm{Lap}_t(\lambda) \tag{16}$$
Figure 4 shows the structure of the differential privacy method. The computed GDP therefore differs from the true GDP by the sum of $T$ independent Laplace noise terms $\mathrm{Lap}_t(\lambda)$. The computed value is accepted only if its deviation is within an error threshold of 5%; if so, it is deemed suitable for use in further calculations. Figure 5 shows the discrepancy between the actual GDP and its perturbed counterpart, using a scale parameter of 0.5. This value was chosen through a series of trials on the available data, as a scale that effectively obscures the true GDP data at time $t$. Comparing the actual GDP values with the perturbed values shows a discrepancy of only 0.1%, well within the permissible margin. In the setting of Figure 4, the group of individuals involved in a computation is called a "capsule". If the nation's population is divided into capsules of equal size, the number of capsules is the population divided by the capsule size. Once the differentially private GDP of each capsule is obtained, the capsules form the leaf nodes of a tree. Following the differential privacy calculation of Equation 15, these values are processed recursively up the tree until they reach the root node. The final value at the root node is the privacy-preserved GDP of the nation.
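The capsule tree and the 5% acceptance rule can be sketched as below. The text does not specify how internal tree nodes combine their children, so this sketch assumes each internal node simply adds its children's values and that every capsule computes its noisy total with the Gamma-noise scheme of Equation 15. The synthetic incomes, capsule size, and helper names (`noisy_capsule_total`, `tree_reduce`, `within_threshold`) are hypothetical.

```python
import random


def noisy_capsule_total(values: list[float], scale: float, rng: random.Random) -> float:
    """Distributed Gamma-noise sum for one capsule (net noise ~ Lap(0, scale))."""
    n = len(values)
    return sum(v + rng.gammavariate(1.0 / n, scale) - rng.gammavariate(1.0 / n, scale)
               for v in values)


def tree_reduce(leaves: list[float]) -> float:
    """Combine node values pairwise, level by level, until one root remains.
    Assumption: an internal node is just the sum of its children."""
    level = list(leaves)
    while len(level) > 1:
        level = [sum(level[i:i + 2]) for i in range(0, len(level), 2)]
    return level[0]


def within_threshold(estimate: float, actual: float, threshold: float = 0.05) -> bool:
    """Acceptance rule from the text: relative error of at most 5%."""
    return abs(estimate - actual) / abs(actual) <= threshold


rng = random.Random(42)
population = [rng.uniform(1_000, 9_000) for _ in range(1_000)]  # synthetic incomes
capsule_size = 100
capsules = [population[i:i + capsule_size] for i in range(0, len(population), capsule_size)]
leaves = [noisy_capsule_total(c, scale=0.5, rng=rng) for c in capsules]
root = tree_reduce(leaves)
print(len(capsules), within_threshold(root, sum(population)))  # 10 True
```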

CONCLUSION AND FUTURE RESEARCH
In summary, Gross Domestic Product (GDP) is a fundamental economic indicator that captures the value of goods and services produced within a nation over specific timeframes. GDP can be calculated with three approaches: production, expenditure, and income. The calculation is straightforward but laborious. Moreover, the calculated GDP may not be accurate because of individuals' privacy concerns, such as taxation and hacking. This paper examines GDP calculation with a focus on the income approach and presents privacy-preserving techniques based on encryption and differential privacy. To the best of our knowledge, this is the first attempt to calculate GDP and related values while safeguarding contributors' privacy. The system is available on GitHub [15] for interested readers.
A limitation of the differential privacy approach is that Laplace noise accumulates with each time partition: over $T$ time divisions, $T$ independent Laplace noise terms are added to the computed GDP value. Although the result remains acceptable within the 5% threshold, this shortcoming warrants further study. In response, the authors are considering a new approach that avoids noise entirely and yields an exact, error-free GDP while still safeguarding the confidentiality of individual participants. The underlying idea is a gradual self-healing process that offsets previously introduced errors, so that the users' true values are transmitted exactly by the end of the final time division $T$.
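The growth of the accumulated noise can be quantified: the sum of $T$ independent $\mathrm{Lap}(0, \lambda)$ draws has variance $2T\lambda^2$, so its standard deviation grows like $\sqrt{T}$. The sketch below compares empirical and theoretical standard deviations for a few values of $T$, assuming $\lambda = 0.5$; the helper `accumulated_noise` is illustrative.

```python
import math
import random
import statistics


def accumulated_noise(T: int, scale: float, rng: random.Random) -> float:
    """Total noise after T partitions: the sum of T independent Lap(0, scale) draws,
    each sampled as the difference of two exponentials with mean `scale`."""
    return sum(rng.expovariate(1 / scale) - rng.expovariate(1 / scale) for _ in range(T))


rng = random.Random(3)
scale = 0.5
for T in (1, 4, 16):
    draws = [accumulated_noise(T, scale, rng) for _ in range(5_000)]
    empirical = statistics.pstdev(draws)
    theoretical = math.sqrt(2 * T) * scale  # variances add: 2 * T * scale**2
    print(T, round(empirical, 3), round(theoretical, 3))
```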

Figure 1: Items to Be Considered for the Income Approach of GDP Calculation

Figure 3: Work Flow of the Proposed Method Using Paillier Encryption

Table 1: An Example of the Three Candidates

After receiving the encrypted values from participants over all rounds, the candidates homomorphically add all of the encrypted ciphertexts. Each candidate then decrypts the resulting composite ciphertext with its own private key on the client side. The sum of the decrypted values from all candidates is the total aggregate submitted by the entire cohort of participants.

Algorithm I: Paillier Encryption for GDP Calculation
(1) Candidates are randomly selected within the system.
(2) Each candidate generates a public and private key pair using Paillier encryption.
(3) The candidates' public keys are distributed to all entities within the capsule.
(4) Participants form groups, and each group designates a single leader.
(5) Each participant splits its data into several distinct segments, chosen so that the segments are not uniform.
(6) Participants encrypt their segments and send them to their designated leaders.
(7) Leaders homomorphically add the accumulated ciphertexts and transmit the result to the candidates.
(8) Candidates receive this data, perform homomorphic addition on it, and decrypt their GDP components.
(9) The sum of the values across all candidates constitutes the GDP.
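Algorithm I can be simulated end to end in a few dozen lines. The sketch below is a hedged illustration, not the authors' implementation: it assumes three candidates named after the example (Alice, Bob, Charlie), each holding a textbook Paillier key pair with $g = n + 1$ and insecure toy primes; one share of each income per candidate; and hypothetical incomes and groups. Each candidate decrypts only the sum of the shares addressed to it, and the candidates' partial sums add up to the total. The names `Candidate`, `split`, and `run_protocol` are illustrative.

```python
import math
import random

# Toy prime pairs for three hypothetical candidates: insecure, illustration only.
CANDIDATE_PRIMES = {"Alice": (1117, 1151), "Bob": (1123, 1129), "Charlie": (1153, 1163)}


class Candidate:
    """Holds a textbook Paillier key pair (g = n + 1)."""

    def __init__(self, p: int, q: int):
        self.n = p * q
        self.n2 = self.n * self.n
        self.lam = math.lcm(p - 1, q - 1)
        self.mu = pow(self.lam, -1, self.n)

    def encrypt(self, m: int, rng: random.Random) -> int:
        # Public-key operation that any participant can run with (n, g = n + 1).
        # random.Random is NOT cryptographically secure; use `secrets` in practice.
        while True:
            r = rng.randrange(1, self.n)
            if math.gcd(r, self.n) == 1:
                break
        return pow(self.n + 1, m, self.n2) * pow(r, self.n, self.n2) % self.n2

    def decrypt(self, c: int) -> int:
        return (pow(c, self.lam, self.n2) - 1) // self.n * self.mu % self.n


def split(income: int, parts: int, rng: random.Random) -> list[int]:
    """Step 5: random, non-uniform positive shares that sum to the income."""
    cuts = sorted(rng.sample(range(1, income), parts - 1))
    return [b - a for a, b in zip([0] + cuts, cuts + [income])]


def run_protocol(groups: list[list[int]], rng: random.Random) -> int:
    # Steps 1-3: candidates create key pairs and publish their public keys.
    candidates = {name: Candidate(p, q) for name, (p, q) in CANDIDATE_PRIMES.items()}
    names = list(candidates)
    totals = {name: 1 for name in names}  # 1 encrypts 0 under every key
    for group in groups:  # step 4: one leader per group
        leader_acc = {name: 1 for name in names}
        for income in group:
            for name, share in zip(names, split(income, len(names), rng)):
                c = candidates[name].encrypt(share, rng)  # step 6
                leader_acc[name] = leader_acc[name] * c % candidates[name].n2  # step 7
        for name in names:  # step 8: candidate-side homomorphic addition
            totals[name] = totals[name] * leader_acc[name] % candidates[name].n2
    # Steps 8-9: each candidate decrypts only its aggregate; the partial sums add up.
    return sum(candidates[name].decrypt(totals[name]) for name in names)


rng = random.Random(5)
groups = [[42_000, 38_500, 51_200], [27_300, 60_000], [45_750]]  # hypothetical incomes
print(run_protocol(groups, rng), sum(map(sum, groups)))  # 264750 264750
```

Because each candidate's plaintext space is bounded by its modulus $n$, every per-candidate aggregate must stay below $n$; with realistic key sizes this is not a practical constraint.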

Figure 4: Structure of the Proposed Method Using Differential Privacy

Figure 5: The Actual GDP (Blue) and the Noised GDP from the Proposed Method (Red)