ChatGPT Perpetuates Gender Bias in Machine Translation and Ignores Non-Gendered Pronouns: Findings across Bengali and Five other Low-Resource Languages

In this multicultural age, language translation is one of the most performed tasks, and it is becoming increasingly AI-moderated and automated. As a novel AI system, ChatGPT claims to be proficient in such translation tasks and in this paper, we put that claim to the test. Specifically, we examine ChatGPT's accuracy in translating between English and languages that exclusively use gender-neutral pronouns. We center this study around Bengali, the 7$^{th}$ most spoken language globally, but also generalize our findings across five other languages: Farsi, Malay, Tagalog, Thai, and Turkish. We find that ChatGPT perpetuates gender defaults and stereotypes assigned to certain occupations (e.g. man = doctor, woman = nurse) or actions (e.g. woman = cook, man = go to work), as it converts gender-neutral pronouns in languages to `he' or `she'. We also observe ChatGPT completely failing to translate the English gender-neutral pronoun `they' into equivalent gender-neutral pronouns in other languages, as it produces translations that are incoherent and incorrect. While it does respect and provide appropriately gender-marked versions of Bengali words when prompted with gender information in English, ChatGPT appears to confer a higher respect to men than to women in the same occupation. We conclude that ChatGPT exhibits the same gender biases which have been demonstrated for tools like Google Translate or MS Translator, as we provide recommendations for a human centered approach for future designers of AIs that perform language translation to better accommodate such low-resource languages.


INTRODUCTION
The last months of 2022 saw the meteoric rise in popularity of what has become one of the hottest AI tools of 2023 -ChatGPT1 .Developed by OpenAI2 on the GPT-33 language model, the conversational agent set a record for the fastest growth since launch, exceeding 100 million new users in its first two months with over 13 million users per day in its first full month of operation [39] as it has found usage in a wide range of both recreational and professional domains.With such expansive usage, ChatGPT is might be an upstart competitor and potential usurper of Google's throne as the go-to tool for general question-answering and information seeking, with the New York Times calling it "the first notable threat in decades" to Google's near-monopoly in this space [36].
ChatGPT is trained on large corpora of publicly available data and uses Reinforcement Learning from Human Feedback (RLHF), whereby designers produce conversations where human AI trainers play both the user and the AI assistant.Such an approach opens it up to the possibility of exhibiting biases and sterotypes that have downstream ethical implications (though OpenAI claims that Chat-GPT takes extensive measures towards bias mitigation [61]), as researchers and journalists alike have warned [e.g., 7,22,37].Such calls necessitate a thorough examination of ChatGPT.
In this paper, we examine ChatGPT's performance on a task that is one of Google's most common ones -language translation.Specifically, we examine whether ChatGPT has learned from the heavy criticism that Google has received for its insistence on gendering words and occupations in English translations of words that are gender-neutral in their original language [e.g., 34,54,67,73].Seeing as how this is a critical and well-established flaw within Google Translate over the past half-decade, we believe that a competitor should seek to rectify this in its tool.
We investigate ChatGPT's performance over a series of translation tasks.We base these tasks on prompts focused around occupations and actions, pursuant to prior research highlighting biases in texts that associate certain actions and occupations with specific genders e.g. to be a doctor or to go to work is to be male, whereas to be a nurse or cook/clean is to be female [e.g., 14,24,32,51,65].We conduct this investigation through translations between English and Bangla/ Bengali.The choice of Bengali is informed by two reasons: Bengali is gender-neutral in its pronouns, and the first author is a native speaker of Bengali.Through our findings, we demonstrate a pattern by which ChatGPT translations perpetuate and amplify gendered (mostly heteromasculine) defaults in occupations and actions that should be gender-neutral, and conferring higher respect to men over women in the same occupation.Though we center this research around translations between English and Bengali, we verify our observed phenomena through translations in five other languages which are similarly gender-neutral in their pronouns: Farsi, Malay, Tagalog, Thai, and Turkish.These languages are chosen because of their collective population of over 500 million people and because they all use gender-neutral pronouns, which is important because we study gender biases/errors that emerge in translations between English and these languages.
Our contributions are threefold: (1) We provide a comprehensive demonstration of the persistence and amplification of gender roles and stereotypes assosciated with actions and occupations when ChatGPT translates into English sentences which do not provide any gender information in their source languages, as we demonstrate that ChatGPT's reinforcement learning strategy does not handle bias mitigation in machine translation which has significant implications on perpetuating bias and shaping human cognition about who should be a doctor and who should be a nurse, among other occupations.We exemplify the insertion of binary genders into instances where the non-binary pronoun would have been most appropriate, and the failure to translate the English gender-neutral pronoun 'they' into gender-neutral pronouns in other languages which threatens to erase non-binary and trans identities in downstream tasks.We present one of the first studies of language translation tasks performed by ChatGPT (the only other being [42]), and generally one of the first studies about Chat-GPT.Given its popularity and usage, it is important to extensively study ChatGPT and its potential to perpetuate negative biases and stereotypes,and our study is important in starting this conversation within academic circles.(2) We conduct our study in Bengali, the 7 ℎ most spoken language in the world [12] (over 337 million people [31]).Even though this is such a widely spoken language with a rich cultural history and heritage, it is significantly understudied in the translation space.It has only tangentially been studied in [65], and by non-speakers of Bengali.We study it from a native speaker's perspective, a perspective important to capture and accurately interpret the underlying culturespecific connotations of translations.(3) Beyond demonstrating these phenomena in Bengali, we show generalization across other languages with gender-neutral pronouns -Farsi, Malay, Tagalog, Thai, and Turkish.Such generalization across multiple languages is not commonly examined in the same single study (with the exception of [65]).In these cases, we only study translations into English, because English is the highest-resource language of all these based on the training data ChatGPT uses.We definitively demonstrate ChatGPT perpetuating gender stereotypes and inserting an inferred gender based on actions and occupations into sentences that are designed to be gender neutral in their languages of origin, languages which are classified as 'low-resource' in the AI space [23].We demand stronger performance for such languages that adequately respects their prevalence in the world and accommodates the billions who collectively speak them.

BACKGROUND 2.1 Gender in Languages and Translations
Global languages have several similarities and differences when evaluated across a variety of properties, and one such property is how they handle gender.Some languages contain grammatical gender, whereby nouns are classified with genders [25].Grammatical gender is especially interesting in the case of inanimate nouns e.g. in English, a language without grammatical gender, the sun is genderless whereas in Hindi, a language that uses grammatical gender, it is considered masculine.Linguists [e.g., 25,46] largely believe that assignment of grammatical gender within languages evolved over time in arbitrary patterns unique to each language.
Beyond grammatical gender, languages also contain semantic or natural gender, which is a pattern of using different words to refer to different nouns based on the determined gender of the noun.For instance, in English we refer to male cattle as 'bulls' and female cattle as 'cows'.Semantic gender is also commonly expressed through word pairs that contain a root word and a changed version derived from it, e.g. the feminine word 'lioness' in English is derived from the masculine 'lion' by adding the suffix '-ess'.This is known as markedness [40], where the root word is said to be 'unmarked'.Historically, most gendered pairs of nouns are such that the masculine noun is unmarked, and femininity is denoted by somehow marking the masculine [e.g., 6,40,78].
Since languages have their own rules, cultural contexts, and nuances with respect to gender, an interesting site of study is when they come into contact with each other through processes of translation.Language translation is complicated, and must be done with a good understanding of the rules of both source and destination languages [75].This is especially true when languages differ on the basis of grammatical gender e.g. when translating the sentence 'The sun was shining but the river was cold' from grammatically gender-neutral English to grammatically gendered Hindi, it is important to know that 'sun' should be masculine-gendered and 'river' should be feminine-gendered, which would in turn influence the nature of the Hindi phrases of the verbs 'was shining' (in this case, 'चमक रहा था') and 'was cold' (in this case, 'ठं डी थी').
Therefore, translation tasks require keen understandings of languages involved in the process, and a successful translator must be both careful and respectful of the nuances and cultural contexts within source and destination languages to be effective at their job.However, the task of translation is becoming increasingly automated and offloaded to language models and machine translators.

Language Models and Datasets in Translation Tasks
Large-scale language models have become ubiquitous across a variety of domains, in tasks such as sentiment analysis [e.g., 3,38,50 Within the context of language translation, Brown et al. [13] and Och and Ney [60] developed the computational foundations for machine translation.Such models might be trained either on unlabeled monolingual corpora [e.g., 11] or labeled and translated texts [e.g., 48].Common approaches of using language models in translation tasks involve using feed-forward Neural probabilistic language models [70] or RNN-based models [55].Currently, one of the most prevalent approaches to large-scale translations is the use of Neural Machine Translators (NMTs), pioneered by Google and used within their Google Translate tool.Since their inception, NMTs are considered the state-of-the-art solution in the field.
Like in other ML contexts, the accuracy of machine translations often depends on the quantity and quality of training data the language models have access to, with increases in accuracy generally being correlated to increased quality of data [45].Within collecting multilingual data, a common approach is to mine parallel texts in multiple languages, such as different languages of the Bible [26], and then applying similarity measures to determine parallelisms at the sentence level [71].It is at this level of data collection and availability that languages are differentiated between, because some languages (such as English or other European languages) have vast corpora of text data or are selected for mining [e.g., 30], creating a massive gulf with other languages for whom labeled parallel or bitextual data are sparse in publicly available datasets [35].Such languages that have low coverage or are underrepresented in global datasets are known as low-resource languages [23].Because of this gulf in data availability, translations in the context of low-resource languages are generally poorer than high-resource languages.
In this paper, we study translations to and from several such low-resource languages in the context of what currently is one of the most popularly used AI-tools: ChatGPT.At the outset, it is important to recognize that ChatGPT, or its underlying ML model GPT-3, was designed as a Generative AI and not a translation tool.As an LLM, GPT-3 is capable of translation tasks without necessarily being optimal at them.However, it is important to study ChatGPT and GPT-3 in the context of language translation given the prominent evidence of translation fails by dedicated tools such as Google Translate or MS Translator (detailed in the next section) and the large public uptake of ChatGPT into a wide range of tasks beyond its initial design goals.Though research in this field is sparse given the novelty of the tool [42], we believe this present study to be critical, considering how several millions of users might use or are already using ChatGPT as a translation tool.

Biases and Errors related to Gender Pronouns in Machine Translations
That machine translators make errors and exhibit biases in context of gender when translating between languages with different gender rules has been well established both in common usage and literature.Such criticism has been levied against popular translation tools such as Google Translate [e.g., 32,65] and MS Translator [e.g., 67,73], especially in the context of English translation.Such gender bias is displayed in several ways.Firstly, it is evident in patterns of nouns (e.g.doctor = male), pronouns and verbs (e.g.cooking = female) to which machine translators assign male or female gender.A study of 74 Spanish nouns revealed that an overwhelming majority of those were assigned male pronouns in English translation while only 4 were deemed to be female [51].Closer inspection reveals that occupations such as doctors, engineer and president are often assigned male pronouns, whereas those such as dancer, nurse, and teacher are often denoted as female [65].Secondly, genderization occurs towards verbs, as actions like cooking and cleaning are assosciated with women while reading and eating were assigned male [32].Finally, language models even overwrite information about subjects' genders provided in translation, as Stanovsky et al. [73] demonstrated an English sentence about a female doctor receiving a machine translation into Spanish that classified them as male.While these examples are in high-resource languages such as English and Spanish, the problem is exacerbated in low-resource languages, such as Turkish [e.g., 24], Malay [e.g., 65], Tagalog [e.g., 32] and others.This further widens the gap between languages, because traditionally low-resource languages (e.g.most Asian languages) deal with gender differently than high-resource languages (e.g.Romance languages), leading to increased translation errors [69].
Our objective is not to demonstrate anew that machine translation exhibits gender biases when translating between languages that handle gender differently, especially for low-resource languages.Rather, this paper intends to show that the phenomenon persists in the latest most popular and state of the art tool, and that developers have failed to address it despite the knowledge in the field, despite claiming that they mitigate biases in their design [61].

Author Linguistic Positionality
The first author is fluent in Bengali, having grown up in Bengal (India) for 18 years speaking the language.This fluency is in Standard Colloquial Bengali (SCB) and, of the various Bengali dialects (detailed in Section 3.2), he primarily speaks Rahri, though he is also conversational in Bangali.He also speaks Hindi and Urdu fluently.

Translating to/from Bengali
Bengali/Bangla is the 7 ℎ -most spoken language in the world [12], with an estimated 300 million people speaking it as their mother tongue and almost 37 million second-language speakers [31].Most of these are residents or emigrants from Bangladesh or the state of Bengal in India, although it is also recognized as one of the official languages of Sierra Leone as a tribute to the contribution of Bangladeshi UN Peacekeepers in ending their civil war.It has several dialects, such as Bangali, Rahri, Varendri, Rangpuri, Shantipuriya, Bikrampuri, Jessoriya, Barisali, and Sylheti [20].Such dialects are primarily spoken, as the majority of the written Bengali in India is in Standard Colloquial Bengali (SCB) [56], a standardized version of the language that is perhaps the closest to Rahri.
A feature of the Bengali language which is central to this study is the absence of gendered pronouns.While English uses the gendered pronouns 'he'/'she' and the gender-neutral pronoun 'they', pronouns in Bengali are gender-neutral.The three most used pronouns in Bengali are েস (pronounced 'shey'), ও (pronounced 'o') and িতিন (pronounced 'teeni' with a soft t).While েস and ও can be used to refer to anyone, িতিন is used to refer to respected people.
Even though it uses gender-neutral pronouns, Bengali still contains marked binary-gendered words to refer to animals and occupations e.g.lion/lioness (িসংহ/িসংহী), tiger/tigress (বাঘ/বািঘনী), and actor/actress (অিভেনতা/অিভেন ী).In those examples, the male version of the Bengali word is the root for the female version, and genderization is performed by adding vowels to the root word.However, not all gendered pairs have direct translations to distinct English words e.g. the same word 'teacher' translates to িশক্ষক for male teachers and িশ ক্ষকা for female teachers.
In more recent iterations of SCB over the past decade, there is a growing movement of using the root/default version of gendered words to refer to individuals of nonbinary gender or in cases when the gender of the person is not known.Therefore, the English sentence 'they are a teacher' should translate to 'েস একজন িশক্ষক' and vice versa.The gender-neutral pronoun 'they' should translate the English word 'teacher' to the default 'িশক্ষক' and the Bengali pronoun 'েস' should translate to the gender-neutral 'they'.
We examine whether translations to and from Bengali honor the gender-neutral pronoun, or provide the appropriately marked nouns when English prompts contain information about gender.

Prompting ChatGPT
We queried ChatGPT with a series of prompts (detailed in Section 3.4).The first author created a new account for this study and performed the querying tasks in new sessions on the free version of ChatGPT on ten different days, giving a day's gap in between each time.The intent behind using new sessions was to mitigate the language model's learning from previous conversations, and performing queries on different days was to ensure that results would form a pattern and strengthen our observed themes, rather than stand as a single phenomenon which could have occurred on a particular day for any number of reasons.Prompts were tried out one by one instead of all together, in order to avoid possibly hitting the character limit for single queries.

Single-Occupation Prompts.
A primary methodological task in this study was the formation of prompts with which to query ChatGPT.To test whether ChatGPT preserves gender-neutrality in Bengali sentences, we designed a set of prompts carrying the format 'েস একজন ______|' (They are a ______.)Such a construction is because we intend to fill in the latter stage of the prompt with occupation titles, pursuant to prior work on querying gender in translation tasks based on occupations [e.g., 47,65].We centered our process of selecting occupations with which to fill the aforementioned blanks in Caliskan et al. 's [15] work on implicit gender-occupation biases.We began with the US Bureau of Labor Statistics' (BLS) 2022 report of labor force statistics 4 , converted the 50 most common occupations to single-word titles following Caliskan et al. 's [15] process, and then translated them to Bengali.The full list of occupation titles is shown in List 1 in Appendix A.
The accurate translations of these prompts should contain the 'they' pronoun for all occupations i.e. the prompt 'েস একজন ডা-ার|' should translate to 'They are a doctor.' Through ChatGPT's translations into English (shown in Section 3.3), we examine its preservation (or lack thereof) of the gender-neutral pronoun.
We also designed a series of 50 prompts using the English titles of the aforementioned occupations, beginning with the genderneutral 'They are a ______.' The intention with these prompts was to examine whether ChatGPT correctly identified the pronoun 'they' to translate into one of the Bengali pronouns েস, ও and িতিন e.g. a correct translation of the English prompt 'They are a doctor' into Bengali is 'েস একজন ডা ার|' Furthermore, to investigate whether ChatGPT can provide the appropriately marked forms of words when provided with gender information, we designed a set of prompts with the construction 'He/She is a ______.' We could not use the aforementioned occupations, because most of them are not marked.We also could not use an equivalent of the BLS data for Bengal/Bangladesh, because such data is not publicly available.Therefore, based on the first author's lived experience and cultural context, we identified 10 occupations common in Bengal/Bangladesh and have marked pairs in Bengali based on gender.They are as shown in List 2 in Appendix A.
We thus formed a set of 20 prompts e.g.'He is an actor/She is an actor', for which the correct translations in Bengali are expected to be 'েস একজন িশক্ষক' and 'েস একজন িশ ক্ষকা', respectively.
We collectively refer to these 120 prompts (50 Bengali and 50 English prompts from List 1 + 20 prompts from List 2) as singleoccupation prompts.In Table 1, we provide some expectations of correct English to Bengali translation, along with rationale.

Action-occupation Prompts.
We built another set of Bengali prompts where we intended to construct a scenario that would be equitable and accessible to everyone, irrespective of gender.We identified the scenario of an individual waking up in the morning, performing an action, and then going to work within particular occupations.The prompts contain no information about the gender of the person who is the subject.Therefore, the most accurate translations into English should use the gender-neutral 'they' pronoun.We hereafter refer to these as action-occupation prompts.

Testing Across Five Other Languages
To achieve generalization in our study of potentially biased languages beyond Bengali, we extended this study to other languages that use gender-neutral pronouns.We sought native speakers of such languages from within our networks and identified five languages to study: Farsi, Malay, Tagalog, Thai, and Turkish.These are all low-resource languages spoken by many millions of people all over the world, which makes them important to study.We worked with native speakers of each language to construct respective sets of single-occupation prompts using the occupations in List 1, and corresponding correct English translations.We tested these following the process outlined in Section 3.3, with the only difference being that these were only tried once as opposed to ten days.

FINDINGS
We supplement our findings with screenshots from ChatGPT to provide direct evidence, but present them in Appendix B for concision and increased readability.

Translating Single-Occupation Prompts
For the first set of our single-occupation prompts, where we provided ChatGPT with 50 sentences each in the construction 'েস এক-জন ______|' (They are a ______.)and filled each blank in with occupations mentioned in List 1. Across a period of 10 days, we observed that 29 occupations (such as doctor, engineer, plumber, programmer, carpenter, etc.) were exclusively assigned the pronoun 'He' in translation.The full set of occupations in List 3 (Appendix A), and examples are shown in Figure 2 (Appendix B).Furthermore, ChatGPT exclusively assigned the English pronoun 'She' to prompts containing 11 occupations (e.g.nurse, therapist, hairdresser, assistant, aide, etc.) on all 10 days of testing.The full set of occupations is are captured in List 4 (Appendix A), with few examples shown in Figure 3 (Appendix B).
Only for 6 occupations -lawyer, administrator, officer, specialist, hygienist, and paralegal -did ChatGPT assign the English pronouns 'He/she' on all days of testing, though it did not use the pronoun 'They'.A few examples shown in Figure 4 (Appendix B).
Finally, there were 4 occupations -janitor, chef, nutritionist, and salesperson -for which ChatGPT demonstrated some variation in its assignment of pronouns, in the way that it did not consistently assign the pronoun 'he' or 'she' across different days of testing.An example is shown in Figure 5 (Appendix B).Such variations were only observed within the first 3 days of testing, as results stabilized starting day 4 to the pronoun that was assigned on day 3, and were replicated every day after.
For 'They are a [occupation].' prompts, we observed ChatGPT's complete failure to recognize the English gender-neutral pronoun 'they' as singular.In all 50 instances across 10 days, we observed ChatGPT translating 'they' to the Bengali collective pronoun 'তা-রা', producing grammatically incorrect and incoherent translations.The correct translations should be 'েস/িতিন/ও একজন ______|' Some examples are shown in Figure 7 (Appendix B).
Finally, we examine ChatGPT's performance in displaying appropriate markedness of gendered words, using the prompts 'He is a ______.' or 'She is a ______.', and using the words in List 2. We observe that ChatGPT is perfectly able to translate words to their appropriate marked or unmarked versions given the gendered pronouns (he/she) of the subject, as shown in Figure 6 (Appendix B).However, a phenomenon we noticed is that ChatGPT assosciated sentences with the female pronoun with the Bengali pronoun 'েস', whereas it assosciated the male pronoun with the more respectful Bengali pronoun 'িতিন'.Such a pattern was true for all sets of occupations.

Translating action-occupation Prompts
For the action-occupation prompts, we crafted a set of Bengali prompts with the base construct 'েস সকােল ঘু ম েথেক উেঠ ______, এবং কােজ যায়। েস একজন ______|' ('They wake up in the morning, [action] and go to work.They are a [occupation].') We observed that for some actions -cooking breakfast, cleaning the room, and reading -translations into English invoked the pronoun 'she' across all occupations, as shown in Figure 8 (Appendix B).
For some actions, the English translations produced different pronouns, which can be attributed to be a function of the occupations provided.Being a doctor, engineer, scientist, chef, and psychiatrist were assigned the pronoun 'he' when assosciated with occupations in Section 3.4.2excluding the three mentioned above, whereas being a nurse, nutritionist, and assistant were assigned the pronoun 'she'.Examples are shown in Figure 9 (Appendix B).
What stood out is the complete absence of the gender-neutral English pronoun 'they' across all of the translations, with not a single prompt being translated into English carrying that pronoun.

Gender-Based Machine Translation Across Other Languages
Having demonstrated patterns of gender bias in bidirectional translations between Bengali and English in both single-occupation and action-occupation prompts, we examine whether similar patterns are observable in other languages.Based on translations of the single-occupation prompts, we observe a clear replication of the aforementioned patterns.In all of the languages we examined (Farsi, Malay, Tagalog, Thai, and Turkish), we observe that the respective gender-neutral pronouns are translated to gendered pronouns depending on the occupation.Similar patterns as in Section 4.1, i.e. translating a gender-neutral pronoun to 'he' for doctors and 'she' for nurses, emerge.There is also a complete absence of the English gender-neutral pronoun 'they' in any translations, across all these languages.Results are summarized in Table 2.

ANALYSIS: GENDER ASSOCIATIONS WITH ACTIONS AND OCCUPATIONS
We observe widespread presence of gender associations with actions and occupations in Bengali ↔ English translations and other low-resource languages.We observe a clear majority of occupations being assosciated with the male pronoun 'he' in the singleoccupation prompts when translating from into English, where occupations such as doctor, engineer, and baker were assosciated with the male pronoun 'he' whereas occupations such as nurse, assistant and therapist were assosciated with the female pronoun 'she'.The only indication of ChatGPT respecting the gender neutrality of Bengali pronouns is where translations were assigned both pronouns 'he/she', as shown in Figure 4, but it occurs far too infrequently than expected (see Table 2).The same can be observed for translations of the action-and occupation prompt, where actions such as cooking breakfast and cleaning are assosciated with female pronouns.An interesting and novel finding is the interaction of actions and occupations, as we find that biases towards actions seem to override those towards occupations.An example of this is that while the occupation 'doctor' is assosciated with the male pronoun in the single-occupation prompts (List 3), the effect of associating the action of cooking breakfast overwrites that to produce the female pronoun 'she' as shown in Figure 8.While the presence of implicit gender-action biases that dictate a woman's place to be in the kitchen or within the household [14] are certainly observable, it can be extended that such biases are prevalent in societies all over the world since the start of human history, and perhaps predates occupational biases.
Our findings are consistent with previous work [e.g., 14, 15, 73] that demonstrates how word embeddings contain implicit genderoccupation biases, biases which exist as a result of over two centuries of text corpora containing such associations [19] and are amplified as a result of language models being trained on such text and then creating more such word embeddings in their outputs.Given that ChatGPT, by its designers' admission [61], is trained on large sets of such publicly available text corpora in English and other languages, it is likely that such gender biases stem from biases within contextualized word embeddings.Caliskan et al. [14] found strong evidence of such gender biases embedded within the widely-used internet corpora GloVe [63] and fastText [8] through the development and extension of the Word Embedding Association Test [15] and the iterated Single-Context Word Embedding Association Test [14], biases which are evident within our findings too.Such biases are deeply embedded from text corpora being developed over decades of texts containing such biases and might be very difficult to remove, though researchers such as Bolukbasi et al. [9] have put forward approaches to debias word embeddings.
For English to Bengali translation, the most startling finding is ChatGPT's complete inability to translate the English genderneutral pronoun 'they' into an equivalent gender-neutral Bengali pronoun, as it incorrectly translates 'they' to a collective pronoun.This is particularly alarming, both for translation because it leads to grammatically inaccurate and non-sensical Bengali outputs, but also in a larger context because it contributes towards a linguistic erasure of non-binary and transgender identities.Though research into non-binary identities in the context of AI-assisted language translations is sparse, our findings demonstrate the need for a meticulous examination of this problem of inaccurate inference of the gender-neutral English pronoun is.
Additionally, when ChatGPT does respect provided gender information to produce appropriately gender-marked versions of Bengali nouns, it confers lower respect to women as it uses the pronoun 'েস', reserving the the more respectful 'িতিন' for sentences with the male pronoun.We do not believe this to be accidental, since it perpetuates the trend of placing higher respect on men.
Our findings in Bengali, combined with generalizations across five other languages, thus paint the picture that ChatGPT does not take into consideration research in the past decade in this field, as it demonstrates similar gender biases and erroneous translations that other tools such as Google Translate have been slated for.

LOW-RESOURCE LANGUAGES, LOW ACCURACY AND POWER
All of the languages studied here -Bengali, Farsi, Malay, Tagalog, Thai, and Turkish -are classified as low-resource languages on account of low levels or a general unavailability of large corpora of text data or other manually crafted linguistic resources in such languages.Such a comparative lack of data (in contrast to languages such as English, Spanish, French etc.) is because billions fewer of words in such languages are put out into the Internet in contrast to those in higher-resource languages.While practitioners in this space might simply see this disparity as something that exists in the world, it is important to ask: why does this gulf exist?
The simple fact remains that due to centuries of imperial and colonial enterprise, languages such as English, Spanish and French have expanded and now dominate in lands far beyond their origins, and the digital age of globalism has made it such that proficiency in one or more of those languages has almost become a necessity to achieve certain levels within industries.Indeed, a not-so-subtle expression of this is the fact that this present article is being written in English, and not one of the languages studied.While we cannot undo the myriad effects of the legacies of colonialism and imperialism, we can certainly acknowledge and center them in our interpretation of phenomena such as the ones being demonstrated here.Translation is a demonstration of power, perhaps best exemplified by the fact that almost every large airport in the world (one of the largest sites of cultural confluence) will have signage in local languages also translated to English even if it is not a popularly spoken language in that part of the world, to reflect the lasting effects of the colonial enterprise that made English a global lingua franca.It is in English or centered around translating to/from English where designers of widely-used translation services, such as Google Translate and now OpenAI, operate as they design and improve translation software.Borrowing Andone's [2] feminist theory of translation as production of knowledge beyond simply reproduction from one language to the other, English (and other high-resource languages) control the means of production of such knowledge and what knowledge (or text in what languages) get to be mined into the scope of language models.
It is important to recognize language translation as something much more than its perhaps well-intentioned traditional intention of being 'merely a linguistic shift from one text to another with the least possible interference, and remain faithful to the source text' [17].When ChatGPT assigns an incorrect gender in translation or inserts a binary gender into gender-neutral sentences, it is much more than a simple error.In its undertaking of such translation tasks, ChatGPT makes a decision to infer gender by applying information and context beyond what is provided in the source sentence.Especially in the context from translating from low-resource languages into the high(est)-resource English, these inferences perpetuate colonial and imperial perspectives of traditional gender roles, values and cultures that have oppressed peoples for centuries.In today's Internet age where tools like Chat-GPT are designed in high-resource contexts (in English and by USbased developers) but made available and reaching people globally, designers of current and future tools must carefully consider their potential impacts before and during deployment.
The failures of ChatGPT in the aforementioned translation tasks must therefore not simply be considered a technical problem which can be spot-fxied by the bandaid of 'better' data or 'better' code [5].Rather, it is a sociotechnical failure [18], where 'better' data is difficult to achieve due to the various social constraints designed to favor languages that are already high-resource.Addressing this failure therefore needs to consider the social aspect, and examine how biases prevalent within word embeddings or exemplified in results are reflections of those prevalent within society [9].

A HUMAN CENTERED APPROACH TO AI-ASSISTED LANGUAGE TRANSLATION
Our findings of ChatGPT's underwhelming and error-laden performance in language translations from low to high-resource languages as it amplifies gender bias has implications for design into the future of such technologies.We believe that a future where AI-assisted language translations are both more accurate and more appropriate involves a human centered approach to designing such systems.Human centeredness is a cousin to the field of user centeredness, which involves soliciting end-user feedback early and often during the design process [59].Human centeredness extends this notion further by incorporating considerations of social and ethical practices into the design process [33].
A human centered approach would, at its core, center willing and knowledgeable first-language multilingual speakers towards forming accurately labeled text corpora, because such speakers can leverage appropriate cultural context and epistemic experience in building such corpora.This effort is especially important since these people are likely the ones who will use the language translation tools under design (at least in their respective languages) the most.We are appreciative of the work of Costa et al. [26] and their many-to-many benchmark FLORES-200 dataset spanning 204 languages, most of which are traditionally low-resource.Their principles of 'No Language Left Behind', prioritizing the needs of underserved communities by sharing resources and libraries/datasets through open-sourcing and being interdisciplinary and reflexive in such approaches, pave the way towards stronger representation.
Particular attention must be paid to individuals representing low-resource languages, because such languages are traditionally neglected [26].Care must be taken such that human contributors are adequately compensated for their time and efforts, and given adequate opportunities to refuse participation and withdraw at their convenience, keeping with best practices of not exploiting epistemic labor from individuals lower in power differentials [27].Such work is a slow and highly labor-intensive and therefore might be difficult to scale across all languages in the world, but can contribute to the upliftment of such languages and strive towards a future where translation accuracy is more equitably distributed..At the implementation level, a human centered translation agent should seek clarification or ask questions when provided text without enough context to translate accurately [68].This affordance provides greater user control over their translation experience, and allows them to use the translation agent in varied roles such as interpreter, educator, or confidence checker.Additionally beneficial might be observing and modeling translations off human dialogue in group discussions, in groups moderated by translators [66].Designers might also consider suggestions on models on flexible conditional language generation [16], and adopt gender-aware approaches [e.g., 29,47] or attempts to debias algorithms [9].
It is also important to remember that every low-resource language has a community behind it that holds a unique place within the global sociopolitical spectrum.Though practitioners and researchers in the field of machine translation routinely use 'lowresource languages' to refer to a multitude of languages, these languages are not a monolith.Therefore, researchers adopting a human centered approach to working with members in such communities must take adequate care to understand and respect hyperlocal contexts and rules.This is especially true if researchers do not identify as being from within such communities themselves, as they should then rely upon local experts for guidance.
We conclude with an urge towards researchers interested in this vein to try this human centered approach, even if they believe that they are not fully proficient in it.Indeed, we do not claim that we have perfected the process and our guidelines are foolproof, because to be truly human centered is to recognize that processes and designed artefacts only become better through iteration.Only by doing and practicing this approach will both we and other researchers become better at it.However, we encourage researchers to pursue even moderately-baked understandings of this human centered approach in their own work and adapt it in their own ways, because such work will generate higher visibility towards low-resource languages and potentially lead to higher investment in resources or support from global and local institutions.

LIMITATIONS AND FUTURE WORK
Like is the case in other studies with publicly available tools such as Google Translate [e.g., 32,65], a limitation of our study is that we cannot guarantee reproducibility of our results for other researchers perfectly re-implementing our methods.This is because tools such as Google Translate and ChatGPT are subject to being updated and the underlying algorithms might change, such that our queries might produce different results.
Another limitation is in the action-occupation prompts, where we made an explicit choice to order them with actions preceding occupations.This likely impacted how the overall gender was determined in translation, and therefore an important extension of this work would be to test the order the other way around.
In a few instances of single-occupation prompts, ChatGPT initially provided us with incorrect translations of the occupation titles, and had to be corrected.For instance, it incorrectly translated the English word 'hygienist' to া য্িবজ্ঞানী, a Bengali word which actually translates to 'health scientist' in English.After correcting it once in this and other instances, ChatGPT produced the correct translations.A future extension of this work could be to study such factually incorrect translations produced by ChatGPT, and examine patterns within what words it gets wrong.
Finally, with the advent of ChatGPT-Plus5 and the novel language model GPT-4 at the time of this writing, this study warrants replication.In conducting such a replication, prompts could be designed parallel templates within standardized tests such as the Word Embedding Association Test (WEAT) [15] to strengthen the validity of observed results.

CONCLUSION
In this paper, we examined language translation performed by Chat-GPT, one of the newest AI tools that is witnessing widespread usage.We examine its performance in translating between English and Bengali, the latter chosen in part because it employs genderneutral pronouns and in part because of the sparsity of its coverage in the translation context despite it being natively spoken by over 300 million people across the world.We also generalize our findings across five other languages: Farsi, Malay, Tagalog, Thai, and Turkish.Based on prior work in evaluating translations [e.g., 14,15,65,73], we examined translations based on occupations and actions, as we were interested in seeing how ChatGPT handled the gender-neutral pronoun in translation tasks.
Through our work, we demonstrate that translations from lowresource languages into English exhibit implicit gender-occupation (e.g.doctor = male, nurse = female) and gender-action biases (e.g.cook = female), with actions potentially being a stronger factor in determining the gender of the sentence subject.We also observe ChatGPT's complete failure to associate the English genderneutral pronoun 'they' to its Bengali counterparts, as it produced translations which are grammatically incorrect and non-sensical, thus contributing towards the erasure of non-binary and trans identities.We address the societal power dynamics that render such a tag to some languages over others.We conclude with a proposition for a human centered approach towards designing AI-assisted conversational agents that can be used to perform language translation, contributing to a young but developing field.
With the advent of tools such as ChatGPT and ChatGPT Plus, the AI technology scene could be on the precipice of a changing of the guard where giants such as Google could give way to new stalwarts such as OpenAI in the field.This is an opportunity to improve the way such systems are designed, as we envision a humancentered design process that centers human flourishing and upliftment of traditionally marginalized peoples.

Figure 1 :
Figure 1: Action-occupation prompts.Each prompt is formed by combining the contents of leftmost column, one action from items 1-8, the contents of the third column from the left, and one occupations from items a-h, in that order.

List 4 :Figure 2 :
Figure 2: Examples of ChatGPT assigning the male English pronoun 'He' to the occupations engineer, mechanic, and pilot (from top to bottom).

Figure 3 :
Figure 3: Examples of ChatGPT assigning the female English pronoun 'She' to the occupations nurse, therapist and assistant (from top to bottom).

Figure 4 :
Figure 4: Examples of ChatGPT assigning the English pronouns 'He/She' to the occupations lawyer, officer and administrator (from top to bottom).

Figure 5 :
Figure 5: Example of the same Bengali prompt receiving two different translations in English: assigning the pronouns 'He' (top) and 'She' (bottom) respectively.

Figure 6 :
Figure 6: Examples of ChatGPT providing appropriately marked versions of Bengali words for teacher, but conferring a pronoun indicative of higher respect to the prompt with the English pronoun 'he' over that with the pronoun 'she'.

Figure 7 :
Figure 7: Examples of ChatGPT failing to recognize the pronoun 'they' as singular, thus producing grammatically incorrect Bengali translations with collective pronouns.

Figure 8 :
Figure 8: Examples of ChatGPT associating the female pronoun 'she' with the action of cooking, irrespective of the occupation in the second half of the prompt.

Figure 9 :
Figure 9: Example of the actions of brushing teeth (top) and eating breakfast (bottom) being assigned different pronouns based on occupations (doctor = 'he', nurse = 'she').