Enhancing Sustainability Marketing Strategies in Online Transactions: A Categorical Factorization Approach

The rapid advancement of internet technology has revolutionised business operations, particularly in the realm of online purchasing. Companies are increasingly embracing online platforms to showcase their products and services, as consumers are drawn to the cost-effectiveness and convenience of online shopping. This study employs Categorical Factor Analysis (FA) to analyse click data from JD.com, with the objective of investigating the relationship between online consumer characteristics and purchase history. The primary focus is to identify the influential factors and latent variables that impact the average transaction value of online orders. The findings of this research highlight the pivotal role played by sustainability marketing strategies, along with other factors such as cognitive quality, financial capacity, group demand levels, and online shopping convenience. Interestingly, the study reveals that the utilization of coupons, often associated with sustainability marketing efforts, may have unintended consequences on overall transaction values. Each identified factor is statistically significant at the 1% level (p<0.01) in the regression analysis. This study contributes valuable insights into online consumer behaviour, emphasising the importance of sustainability marketing strategies in shaping consumer choices and guiding companies in the development of environmentally conscious pricing and allocation strategies.


INTRODUCTION
The expansive capability of the Internet has achieved global ubiquity, reshaping its primary focus from mere information exchange to a powerful platform that not only facilitates business transactions and shopping experiences but also serves as a catalyst for sustainability-driven endeavors [1]. The evolution of Internet technology has effectively transcended the temporal and spatial limitations associated with traditional models, solidifying the Internet as the predominant marketplace where sustainability marketing plays an increasingly pivotal role. Beyond its convenience and efficiency, the Internet has become the preferred platform for a diverse array of activities, including sustainable shopping, banking, and bill payments. From a commercial perspective, the Internet has revolutionized how businesses market, sell, and engage with customers, with sustainability marketing strategies gaining prominence. The surge in online shopping, especially for sustainable products, can be attributed to its multifaceted client benefits and strengthened security protocols [2], underscoring the pressing need for a comprehensive understanding by both vendors and consumers to optimize their online spending while aligning with sustainability principles.
JD.com, a technological pioneer in China, specializes not only in supply chain solutions and logistics but also places a significant emphasis on sustainability and green initiatives. Their commitment extends to digital transformation through the integration of advanced technologies like 5G, AI, big data, cloud computing, and IoT, while concurrently incorporating sustainability marketing into their business model. This seamless technological integration has empowered JD.com to amass a substantial wealth of user-centric data, encompassing user demographics, order details, and access modalities, offering valuable insights into the variables that influence order values, particularly those associated with sustainability-driven choices. This study postulates a correlation between these influential factors and order transaction amounts, employing Factor Analysis (FA) to elucidate this relationship and identify predominant features, including sustainability marketing strategies. Consequently, the insights gleaned from this research can prove instrumental not only in the formulation of pricing strategies and the enhancement of average transaction values on e-commerce platforms but also in the development of precise predictive models that incorporate sustainability as a critical factor. This paper is organized as follows: The subsequent section provides a contextual background, offering an insight into the realm of online purchasing with a sustainability marketing perspective. The third section outlines the methodology, presenting the analytical processes, research findings, and regression analyses of the study, highlighting the significance of sustainability in the results. The final section concludes the paper, drawing inferences from the study conducted, emphasizing the pivotal role of sustainability marketing in shaping online consumer behavior and offering recommendations for sustainable e-commerce strategies.

BACKGROUND

Online Shopping and the Role of Sustainability Marketing
China's inaugural online purchase in 1998 marked the inception of its remarkable online shopping journey. As of 2020, China's annual online market sales had soared to 10.8 trillion yuan, constituting a substantial 21.9% of the total societal sales, with projections indicating a further increase to 1.122 trillion yuan in 2021 [3]. On a global scale, online shopping has transcended periodic activity to become an integral part of daily life, culminating in a staggering $4.28 trillion in sales in 2020, with expectations of further growth to $5.4 trillion in 2022. The onset of the COVID-19 pandemic further solidified the indispensability of online shopping due to widespread quarantine and lockdown measures [4].
The concept of online shopping, anticipated by Doddy and Davidson (1967) and put into practice from the mid-1990s, has now materialized as a fundamental aspect of modern consumerism, thanks to the continuous evolution of Internet technology. This transformation has not only reshaped consumer habits but also revolutionized sales methodologies, necessitating a shift from traditional seller-centric strategies to a more holistic approach centered around consumer behavior and data-driven insights [5]. It is within this evolving landscape that sustainability marketing finds its foothold, as consumers increasingly seek ethically and environmentally responsible products and businesses. This evolution presents not only challenges but also substantial opportunities to analyze customer data and identify influential factors in online sales, with sustainability marketing strategies playing a pivotal role in shaping purchasing decisions. The next section outlines the methodology used in this study to pinpoint the determinants of online transaction value, with particular attention to the sustainability aspect.

Factor Analysis (FA)
Factor Analysis (FA) is a statistical approach employed to explore the underlying constructs affecting observed variables and is pivotal for dimensionality reduction [6]. It provides profound insights into correlations among variables, integrating several correlated variables into fewer factors, revealing how independent factors control intricate relationships among measured indicators [7].
The two types of factor analysis, exploratory factor analysis (EFA) and confirmatory factor analysis (CFA), serve different purposes.
In particular, this study focuses on EFA, which attempts to discover the latent structure underlying the observed variables. Let $\xi_1, \dots, \xi_m$ be latent variables (i.e. factors) and $x_1, \dots, x_p$ be the observed variables (i.e. items) that act as measures of the factors. Under the conventional standardisation of the factors, they are related in matrix form by

$$x = \Lambda \xi + \varepsilon,$$

of which each row is

$$x_i = \lambda_{i1}\xi_1 + \dots + \lambda_{im}\xi_m + \varepsilon_i,$$

where $\lambda_{ij}$ is the factor loading for the $i$-th item and $j$-th factor, and $\varepsilon_i$ is normally distributed with variance $\psi_i$, called the unique variance, denoted by $\varepsilon_i \sim N(0, \psi_i)$ for $i = 1, \dots, p$.

To conduct the EFA, first calculate the correlation matrix of $x_1, \dots, x_p$. The most commonly used is Pearson's correlation, expressed as a square matrix $R = (r_{ik})$ with entries

$$r_{ik} = \frac{\sum_{t=1}^{n}(x_{it}-\bar{x}_i)(x_{kt}-\bar{x}_k)}{\sqrt{\sum_{t=1}^{n}(x_{it}-\bar{x}_i)^2}\,\sqrt{\sum_{t=1}^{n}(x_{kt}-\bar{x}_k)^2}},$$

where $\bar{x}_i$ is the mean of the observed values $x_{it}$ of item $i$ over $t = 1, \dots, n$. In this article, however, an alternative correlation is introduced specifically to handle categorical observed variables. Next, choose a factor extraction method, such as principal component, maximum likelihood (ML), principal axis factoring or generalised least squares, to estimate the factor loadings and the unique variances and to select the factors that account for the most variance. Lastly, rotate the factors obtained to find a more interpretable factor solution. This paper compares the results of the categorical EFA to better serve the research aim, which is to derive the most influential characteristics of the online transaction value.
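The first two EFA steps described above (correlation matrix, then principal-component extraction) can be sketched in a few lines. The following is a minimal illustration under stated assumptions, not the procedure used in the study: the toy data and the helper names `pearson_matrix` and `leading_factor` are hypothetical, only the first factor is extracted (via power iteration), and no rotation is applied.

```python
import math

def pearson_matrix(rows):
    """Pearson correlation matrix of the columns of `rows` (a list of equal-length lists)."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    norms = [math.sqrt(sum(c[j] ** 2 for c in centered)) for j in range(p)]
    return [[sum(c[j] * c[k] for c in centered) / (norms[j] * norms[k])
             for k in range(p)] for j in range(p)]

def leading_factor(corr, iters=500):
    """Largest eigenvalue of `corr` and the matching principal-component loadings."""
    p = len(corr)
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iters):  # power iteration towards the dominant eigenvector
        w = [sum(corr[j][k] * v[k] for k in range(p)) for j in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigval = sum(v[j] * sum(corr[j][k] * v[k] for k in range(p)) for j in range(p))
    # Principal-component loadings: eigenvector scaled by sqrt(eigenvalue).
    loadings = [math.sqrt(eigval) * x for x in v]
    return eigval, loadings

# Hypothetical toy data: items 1 and 2 move together, item 3 does not.
data = [
    [1.0, 2.0, 5.0],
    [2.0, 4.1, 3.0],
    [3.0, 6.2, 6.0],
    [4.0, 7.9, 2.0],
    [5.0, 10.1, 4.0],
]
R = pearson_matrix(data)
eigval, loadings = leading_factor(R)
print(eigval, loadings)
```

In this toy case the first factor loads heavily on the two items that move together and only weakly on the third, which is the kind of grouping the extraction step is meant to reveal.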

Data Source and Variables Selection
The analysis utilizes datasets from JD.com, provided for the 2020 MSOM Data Driven Research Challenge and detailed in [8]. These datasets, from March 2018, encapsulate the entire customer experience cycle from product browsing to reception, focus on a specific consumable category, and are free of the influence of major promotions or holidays. They are structured into seven tables detailing SKUs, users, orders, and clicks, among others.
In summary, the user table describes 457,298 users, showing a predominance of female customers (over 60%); individuals aged 16-45 make up around 75% of the users. The order table consists of 486,928 unique orders, and the click table records over 20 million click instances from 2.5 million users across various platforms.
For analyzing transaction volumes, understanding user shopping behavior is crucial. The relevant variables are grouped logically to understand correlations between them and identify key contributors to shopping behavior. The twelve selected variables are categorized into order-related variables, user characteristics related to online shopping accounts, and demographic characteristics such as age and education level. Refer to Table 1 for detailed groupings and meanings of the selected variables.

Pearson's vs Polychoric Correlation
This section aims to assess the appropriateness of Pearson's and polychoric correlations for EFA involving a mixture of numerical and categorical variables.

Table 1: Groupings and meanings of the selected variables

Group | Meaning of variable
Key Measure | The final transaction volume the user paid for the item in the order.
Order Feature | The discount rate when the user directly purchased the product in that order.
Order Feature | The discount rate applied when the user used a coupon in that order.
Order Feature | The discount rate applied when the products provide a bundle discount in that order.
Order Feature | Whether the order was completed on the PC channel (coded as 1) or the Mobile channel (coded as 0).
User's Online Shopping Characteristics | Whether the user has plus membership or not.
User's Online Shopping Characteristics | A number ranging from 1 to 5 indicating the user's total past purchases.
User's Online Shopping Characteristics | The estimated purchasing power of the user.
User's Personal Characteristics | Age level of the user; there are five age levels: "16∼25", "26∼35", "36∼45", "46∼55" and ">= 56", coded from 1 to 5 respectively.
User's Personal Characteristics | Whether the user is female or not.
User's Personal Characteristics | Whether the user is single or not.
User's Personal Characteristics | The education level of the user, ranging from 1 (less than high school) to 4 (holds a graduate degree).
User's Personal Characteristics | The city development level of the user according to the order address, ranging from 1 (rural small cities) to 5 (the most developed cities such as Beijing and Shanghai).

Table 2 reports Pearson's correlations for the variables listed in Table 1. Given that the dataset involves millions of clicks, even correlations above 10% are significant and are marked.
Observations: Variables related to users' online shopping and personal features are notably correlated, hinting at personal characteristics as potentially significant determinants of online shopping behavior, in line with [9]. They likely fall under the same factor. 'User level' and 'Purchase power' correlate with numerous other variables, reflecting their information-rich nature and suitability as key consumer feature measures, and corroborating the significance JD.com assigns to them.
Pearson's Correlation-Based Principal Component EFA (Unrotated):
Polychoric correlation is advantageous over Pearson's correlation in factor analysis because it treats the categorical variables as arising from underlying latent continuous variables, assumed to follow a multivariate Normal distribution, and it remains robust even when this assumption is violated [10]. This has been substantiated by studies such as [11], which demonstrate its superior performance over Pearson's correlation in ML EFA, providing more precise structure estimates under varying conditions.
In this subsection, factor analysis utilizes a polychoric correlation matrix, depicted in Table 4, to integrate categorical variables more effectively.
The correlations are larger than their Pearson counterparts; for instance, the correlation between 'user_level' and 'plus' is 0.456 with Pearson's and 0.654 with polychoric. Importantly, the polychoric approach leaves the continuous numerical variables intact while offering insight into the latent distribution behind the categorical variables, which is consistent with the aim of this analysis. Based on this matrix, the EFA was repeated using principal-component factors; the eigenvalues are reported in Table 4.
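The gap between the two correlation measures can be reproduced with a small simulation. The sketch below is an illustration under stated assumptions rather than the estimation used in the study: the data are synthetic, the latent correlation `rho = 0.6` is chosen arbitrarily, both variables are cut at zero, and the recovery step uses the closed-form tetrachoric formula that holds only for that zero-threshold binary case (the general polychoric estimator requires numerical maximum likelihood).

```python
import math
import random

def pearson(x, y):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(0)
rho = 0.6      # assumed latent correlation (illustrative value)
n = 50_000

# Latent bivariate normal pair with correlation rho.
z1 = [rng.gauss(0.0, 1.0) for _ in range(n)]
z2 = [rho * a + math.sqrt(1 - rho ** 2) * rng.gauss(0.0, 1.0) for a in z1]

# Observed binary indicators obtained by thresholding the latent variables at 0.
b1 = [1 if a > 0 else 0 for a in z1]
b2 = [1 if a > 0 else 0 for a in z2]

r_latent = pearson(z1, z2)   # close to rho
r_binary = pearson(b1, b2)   # attenuated by the dichotomisation

# Zero-threshold tetrachoric estimate: P(both = 1) = 1/4 + arcsin(rho) / (2*pi).
p11 = sum(a * b for a, b in zip(b1, b2)) / n
r_tetra = math.sin(2 * math.pi * (p11 - 0.25))

print(r_latent, r_binary, r_tetra)
```

Pearson's correlation on the binary indicators understates the latent association, while the latent-variable estimate recovers a value close to `rho`; this mirrors the direction of the difference reported for 'user_level' and 'plus'.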

Rotation Selection
Factor Analysis results can be rotated to yield better interpretability. The initial EFAs were unrotated, serving as a reference for evaluating the subsequent rotational choices: varimax and promax.
Varimax Rotation (Polychoric correlation-based Principal Component EFA): Varimax, an orthogonal rotation, assumes factors are orthogonal and hence uncorrelated. It maximizes the variance of the squared loadings within each column of the factor loadings matrix, so that each variable is associated with as few factors as possible. Varimax is prevalent due to its computational simplicity, with a 47% application rate in studies over six years [12]. However, its simplicity can sometimes compromise the interpretability of the results [13]. The results of the Polychoric correlation-based Principal Component EFA (Varimax rotated) are not shown because they were not ultimately used in this study.
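For reference, the quantity that varimax maximizes can be written out explicitly. The expression below is the standard raw varimax criterion, given here as a sketch with assumed notation (not taken from the study): $\lambda_{ij}$ denotes the rotated loading of item $i$ on factor $j$, with $p$ items and $m$ factors.

```latex
V = \sum_{j=1}^{m} \left[ \frac{1}{p} \sum_{i=1}^{p} \lambda_{ij}^{4}
    - \left( \frac{1}{p} \sum_{i=1}^{p} \lambda_{ij}^{2} \right)^{2} \right]
```

Each bracketed term is the variance of the squared loadings in one column, so $V$ is large when every factor has a few loadings near $\pm 1$ and the rest near zero.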
Promax Rotation (Polychoric correlation-based Principal Component EFA): Promax, an oblique rotation, allows for correlated factors and is purported to yield simpler structures. Its results are depicted in Table 5 and Figure 1.
Comparing the unrotated, varimax and promax scatter plots and loading tables reveals the superior interpretability of rotated results. Promax rotations are chosen for subsequent discussions due to their allowance for correlations, aiding in more nuanced interpretations. The aim is not just data reduction, but also summarizing key influences. Factor 1 is indicative of consumer cognitive quality and spending power; Factor 2, influenced by age and marital status, represents average consumer demand [14]; Factor 3 encapsulates online shopping convenience, contingent on purchase platform and city development level [15]; Factor 4 is synonymous with the impact of general marketing, contrasting the applications of direct and bundle discounts; and Factor 5 represents the additional effort consumers exert for discounts through coupons. The interpretations of all five factors are consolidated in Table 6.

Regression Analysis
Regression analyses were performed to validate the extracted factors and investigate their influence on the transaction values of online purchases (Table 7). Linear regressions on the five factors were conducted, with final transaction value as the dependent variable. All factors were statistically significant at the 1% level (p<0.01).
Examination of the regression coefficients supports several conclusions. Higher average consumer perceptions and purchasing power (Factor 1) and higher average consumer demand within a consumer group (Factor 2) significantly increase spending amounts. Further, a consumer is likely to spend more in a more amenable shopping environment (Factor 3). Marketing campaigns (Factor 4) do foster spending, albeit less effectively than the previously mentioned factors. Notably, the presence of a coupon discount (Factor 5) seemingly detracts from a consumer's willingness to pay, aligning with the findings of [16]. This suggests two plausible explanations: the additional effort required to obtain the coupon, or the perceived lack of cost-effectiveness when the coupon is unavailable, causing consumers to abandon purchases.
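The regression step can be illustrated with a small ordinary least squares fit of a transaction value on five factor scores. This is a minimal sketch on synthetic data, not a reproduction of Table 7: the coefficients in `true_beta`, the sample size and the helper names `solve` and `ols` are all hypothetical, and only the negative sign of the fifth coefficient is chosen to echo the coupon effect discussed above.

```python
import random

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def ols(X, y):
    """OLS coefficients (intercept first) from the normal equations X'X b = X'y."""
    Z = [[1.0] + row for row in X]
    k = len(Z[0])
    xtx = [[sum(r[i] * r[j] for r in Z) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(Z, y)) for i in range(k)]
    return solve(xtx, xty)

rng = random.Random(1)
# Hypothetical intercept and factor effects; the last one is negative by construction.
true_beta = [10.0, 2.0, 1.5, 1.0, 0.5, -0.8]

# Synthetic factor scores (five per order) and a noisy transaction value.
scores = [[rng.gauss(0, 1) for _ in range(5)] for _ in range(5000)]
y = [true_beta[0] + sum(b * f for b, f in zip(true_beta[1:], row)) + rng.gauss(0, 1)
     for row in scores]

beta_hat = ols(scores, y)
print(beta_hat)
```

With real factor scores, a statistics package would normally be used instead so that standard errors and p-values are reported alongside the coefficients.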

CONCLUSION
Online shopping has become the preferred choice for consumers, a trend that has accelerated significantly during the pandemic. This research, which leveraged unique datasets from JD.com and employed PCA and categorical FA, has delved into the factors influencing online consumer behavior and their impact on transaction values. One of the principal findings of this study is the significance of sustainability marketing strategies, which emerged as a key factor affecting consumer choices and transaction values. All factors included in the final model are statistically significant at the 1% level (p<0.01).
The implications of this research extend to guiding future investigations into the influence of these factors and providing strategic recommendations for e-commerce platforms. It emphasizes the importance of improving the environmental convenience offered by online platforms and suggests caution when implementing marketing campaigns, especially those involving coupon issuance. The potential negative effects of coupons on consumer purchase intentions, which could outweigh the benefits of discounts, underscore the need for a more nuanced approach to sustainability marketing.
However, it is essential to acknowledge the limitations of this study, particularly the unexplored interactions among variables due to constraints in factor analysis and the imprecise classification of factors based on existing literature and data distribution. Future research endeavors can focus on refining factor classification using multi-dimensional datasets and conducting lab experiments to uncover causal relationships, ultimately providing a deeper understanding and practical insights for optimising e-commerce strategies with sustainability marketing at the forefront.

Table 2: Matrix of Pearson's correlation