Better Little People Pictures: Generative Creation of Demographically Diverse Anthropographics



Figure 1: Examples of demographically diverse anthropographics (here titled "Visualizing Diversity in the US in 2021", showing 100 Americans) with assets in various visual styles created using the Stable Diffusion generative model. We propose a conceptual workflow for crafting anthropographic assets from text-to-image models with a human designer in the loop, and identify risks and challenges involved in the design process.

ABSTRACT
We explore the potential of generative AI text-to-image models to help designers efficiently craft unique, representative, and demographically diverse anthropographics that visualize data about people. Currently, creating data-driven iconic images to represent individuals in a dataset often requires considerable design effort. Generative text-to-image models can streamline the process of creating these images, but risk perpetuating designer biases in addition to stereotypes latent in the models. In response, we outline a conceptual workflow for crafting anthropographic assets for visualizations, highlighting possible sources of risk and bias as well as opportunities for reflection and refinement by a human designer. Using an implementation of this workflow with Stable Diffusion and Google Colab, we illustrate a variety of new anthropographic designs that showcase the visual expressiveness and scalability of these generative approaches. Based on our experiments, we also identify challenges and research opportunities for new AI-enabled anthropographic visualization tools.

INTRODUCTION
Anthropographics are human-shaped visualizations that are frequently used in visualization research and data journalism to represent data about people. Most current anthropographics are unit visualizations that use human shapes of varying levels of complexity to represent individual data points. Although past research on anthropographics has primarily investigated their effectiveness at provoking prosocial feelings towards humanitarian causes [4, 24], there has been renewed interest in their potential to help viewers appreciate the humanity of the people being represented [11, 12, 32]. The field has also seen growing calls for designers and researchers to critically consider how they design data visualizations. For instance, Schwabish and Feng [31] argue that designers need an awareness of racial equity when creating visualizations of data about marginalized populations and propose a number of design guidelines. Within visualization research, Dhawka et al. [11] defined demographically diverse anthropographics as ones that represent demographic data using physical characteristics of diversity, in contrast to homogeneous anthropographics that use generic human shapes to represent varied groups and individuals, which may lead to misrepresentation of marginalized groups. Currently, creating demographically diverse anthropographics requires considerable design effort and is restricted by access to design tools. Most published anthropographics in data journalism represent large datasets about people. For instance, news outlets have used complex anthropographic visualizations with hundreds of thousands of data points to represent COVID-19 deaths and infection counts in attempts to evoke a sense of togetherness among readers [1] as well as to highlight the scale of the pandemic [6]. Thus, creating complex diverse anthropographics of these datasets can be time-consuming and also requires access to specialized design talent. However, generative text-to-image models can support
designers by streamlining the process of creating and prototyping diverse anthropographic assets of varying complexity. In fact, the visualization research community has increasingly been envisioning opportunities for designers and researchers to benefit from the capabilities of generative text-to-image models. For instance, Schetinger et al. [29] propose research opportunities to integrate existing text-to-image models such as DALL•E 2, Midjourney, and Stable Diffusion within a generic visualization design workflow and explicitly identify the creation of anthropographics as an area of interest. Among these calls to integrate generative models in visualization research, there has also been much-needed scrutiny of the limitations and impact of the biases latent in text-to-image models. For instance, extensive media coverage by Bloomberg [26] has underscored how the ageist, sexist, and racist biases present in these models are amplified in the outcomes they produce. These existing limitations may also interact with designers' own biases, resulting in potentially harmful or misleading images. Additionally, there are ethical concerns [10] surrounding the use of generative models, such as a lack of data agency and privacy, the spread of misinformation via deepfakes [22], human rights concerns around data annotation [38], copyright issues [20], and harmful impacts on artists [17].
To help navigate these challenges, we propose a conceptual workflow for rapidly designing diverse anthropographic assets for data visualizations, highlighting possible sources of risk and bias, as well as opportunities for reflection and refinement. Building on previous work on diverse anthropographics and biases in generative text-to-image models, we explore how these models can streamline the process of creating anthropographic assets. We make three contributions: First, we contribute a visualization workflow that foregrounds the importance of the human designer in mitigating biases in the output of generative text-to-image models. Second, using an implementation of our workflow with Stable Diffusion and Google Colab, we illustrate various anthropographic designs to showcase the range of possible visual styles and the scalability of these generative approaches. Lastly, we contribute a collection of challenges and research opportunities to advance the creation of AI-enabled tools for diverse anthropographics.

1: https://openai.com/dall-e-2 (Permalink: perma.cc/4YMQ-XABE)
2: https://www.midjourney.com/ (Permalink: perma.cc/J4XX-ZRHG)
3: https://stability.ai/news/stable-diffusion-public-release (Permalink: perma.cc/2G2N-U7E7)

RELATED WORK
We first summarize related work within information visualization, human-computer interaction (HCI), computer vision, and algorithmic systems with a focus on (1) diverse anthropographics, and (2) biases in generative AI systems.

Diverse Anthropographics
Previous work on anthropographics has focused on their effectiveness for evoking prosocial feelings in donation allocation experiments as well as their potential for representing demographic data with an awareness of racial equity. Boy et al. [4] proposed a design space for anthropographics and carried out charitable donation experiments to investigate their effect on provoking feelings such as empathy in viewers. Subsequent work by Morais and colleagues [23] has included a more extensive design space for anthropographics and large-scale donation allocation experiments for humanitarian causes. Ivanov and colleagues [16] explored the potential of anthropographics in the shape of human silhouettes in immersive environments, while Liem et al. [19] used black-and-white anthropographic sketches to investigate attitudes towards immigration in Europe.
Considering the potential of anthropographics for representing demographic data, Dhawka et al. [11] proposed demographically diverse anthropographics that visualize human diversity using physical characteristics and distinguish these from current homogeneous anthropographics with generic human shapes that obscure the demographic differences between distinct individuals and groups. Additionally, Dhawka et al. [11] proposed several research directions involving generative AI to rapidly create diverse anthropographics. Similarly, Schetinger et al. [29] outline research opportunities for visualization researchers to experiment with generative models and specifically identify the creation of anthropographics as an area of interest. Despite much discussion around potential uses of generative AI for anthropographics, previous work has yet to explore how to rapidly and effectively create diverse anthropographic assets. Our work bridges this gap by contributing a conceptual workflow that involves human designers in the creation of anthropographic assets, and demonstrates it using a variety of visual styles, input datasets, and prompts.

Bias in AI Systems
Ongoing work in related fields has documented the many biases present in AI models and systems, emphasizing the importance of data audits to detect biases and encouraging more inclusive research practices. In their early analysis of image datasets used for training facial analysis algorithms, Buolamwini and Gebru [7] found that these training datasets were heavily skewed towards White individuals and that the algorithms severely misclassified people with darker skin. In their examination of 92 publicly available AI training datasets, Park et al. [27] found that older adults are severely underrepresented, with the majority of images skewing towards younger people. Birhane and colleagues [3] carried out a data audit of the LAION-400M training dataset on which generative models such as Stable Diffusion have been trained, with their findings underscoring that the training dataset contains deeply disturbing NSFW images, including images that amplify racist, sexist, and harmful stereotypes against marginalized and underrepresented populations. Similarly, Scheuerman et al. [30] examined the values present in image datasets used in the computer vision community and found that the collection, curation, and creation processes behind these training datasets often prioritize efficiency over care for the individuals whose data is being used. Considering the outputs produced by AI systems, Salminen and colleagues [28] examined the biases present in 10,000 artificially generated images of individual faces and found that the images skewed towards younger White women. Wolfe and Caliskan [35] examined three language-and-image models for previously documented biases and found that the images produced by these models strongly associated being American with being White. Other recent work by Bianchi et al. [2] has demonstrated how input prompts for basic social roles used with text-to-image generative models can result in images with racist and sexist stereotypes. These findings [2, 35] further support the consensus in the AI fairness community that these models are significantly biased towards White and male individuals and perform poorly for the majority of demographic groups. In our workflow, we discuss the various steps in a visualization pipeline where a designer needs to be aware of the biases latent in generative text-to-image models and be actively involved in detecting and refining biased image outputs.

[Figure 2 annotations: Aesthetic inconsistency. "Native American" labels produce caricatured representations. Similar demographics result in a lack of individual variation. Anomalous results and visual defects.]

CREATING DIVERSE ANTHROPOGRAPHICS WITH GENERATIVE MODELS
Building on the research opportunities outlined by Schetinger et al. [29] and Dhawka et al. [11], we set out to examine the potential for generative text-to-image models to support the rapid creation of diverse and representative human images for use in anthropographics. Our own prior experiences and those of other researchers and designers highlight that designing complex anthropographics typically requires considerable design effort and is often restricted by a lack of dedicated design tools. These challenges are particularly apparent with anthropographic approaches like face charts [24], which represent individuals or groups using human-like images with high specificity and realism, and thus require the creation of large numbers of unique assets which accurately reflect the demographics of the visualized populations. Given recent rapid advances in the capabilities of generative text-to-image models, we hoped to explore their ability to help designers streamline the creation of anthropographic assets, while also characterizing their risks. Our team of three authors was diverse, including two researchers (Dhawka and Willett) with previous experience creating demographically diverse anthropographics using other tools, and a third (Perera) who had no prior experience with these visualizations. The three members also spanned a range of ages, genders, ethnicities, and countries of origin. Throughout, we sought to leverage these varied perspectives, especially when identifying risks and challenges posed by generative anthropographic tools and brainstorming future strategies to mitigate these issues.

[Figure 2 caption fragment: For each we show results for a 20-year-old "woman" and "man" across five race categories ("Asian", "Black", "Hispanic or Latino", "Multiracial", and "White") drawn from the US census [9].]

Background and Current Approaches
Currently, creating diverse anthropographics is a labor-intensive process. Dhawka et al. [11] outline design-related challenges in the creation of demographically diverse anthropographics, including a lack of specialized tools for designers. Given these constraints, anthropographics have predominantly been used in data journalism stories at large news outlets with access to design teams and considerable technical resources. Without these, individual designers have often relied on labor-intensive combinations of hand-drawn illustrations and digital tools to create anthropographic visualizations. For instance, Mona Chalabi, a data journalist at The Guardian and freelance illustrator, describes a complex process for creating her unique hand-drawn infographics [14], which includes drawing the illustrations by hand before using a combination of Excel, R, and Photoshop to compose the final visualizations. Given the limitations around how designers currently create these visualizations, our workflow can support designers in using text-to-image generators for the rapid prototyping and creation of diverse anthropographics while aiming to mitigate potential issues that may arise during this process.
Compared to these labor-intensive approaches, constructing diverse anthropographics using text-to-image generators has the potential to be considerably faster and easier. However, as documented by ongoing work on the downsides of these systems [2, 26, 35], the output images produced by text-to-image generators may perpetuate or even amplify societal biases around age, gender, race, and other demographic characteristics that are often present in the training datasets of these models. Figure 2 (inspired by the experiments run by Nicoletti and Bass at Bloomberg [26]) illustrates how naive approaches for generating diverse anthropographic assets without designer intervention may result in biased, caricatured, and inconsistent images. Both the promise and pitfalls evident in these base images motivated us to explore more nuanced human-in-the-loop models for generative anthropographic asset creation.

Methodology
We took a reflective and design-oriented approach to these explorations, spending roughly six months using a variety of text-to-image models to generate assets for demographically diverse anthropographics in distinct visual styles and with various input datasets. During the first three months, the second author (Perera) conducted an iterative and open-ended set of experiments using both ArtBreeder and Stable Diffusion. These design experiments examined a variety of models, prompts, and seeding techniques with a focus on identifying reliable prompts for creating anthropographic assets and probing model biases. During this period, Perera created over 1500 example images, meeting weekly with the first and third authors (Dhawka and Willett) to discuss design progress, outcomes, and potential opportunities.
Building on these initial explorations, all three authors worked together over the following three months to more systematically test a larger set of text-to-image models and a range of visual styles, create assets, and author new anthropographic visualizations. Generative text-to-image models are evolving quickly, with a large and growing range of new models regularly available for experimentation. To maximize our ability to explore a wider variety of tailored models, we chose to focus all of our later explorations on Stable Diffusion, which supports a broad range of open-source and community-generated models across many art styles shared on platforms such as Civitai and Hugging Face. Throughout these six months, we developed and tested custom prompts using demographic category labels and by varying descriptors for human features such as hair, facial features, and clothing. To streamline the process of creating anthropographic assets, we also implemented a reproducible workflow that leveraged interactive Google Colab notebooks as well as our own custom anthrogen Python library (see supplementary materials). We based the majority of our later experiments, as well as our final visualizations, on publicly available population demographic data drawn from the United States Census Public Use Microdata Sample database (PUMS) [9, 33] via its public API. PUMS provides access to fine-grained de-identified census data for individual US residents, each with several hundred attributes including age, race, geographic location, occupation, and ancestry. We then used this data to systematically populate text-to-image prompts and quickly generate batches of tens or hundreds of unique assets representing individuals with those demographic characteristics. These assets then served as the building blocks for a series of new anthropographic visualizations (including those shown in Figures 1, 6, 7, and 8), which we created using a mix of design tools (Adobe Illustrator, Miro), visualization applications (Tableau), and other platforms (D3/Observable).
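The data-to-prompt step described above can be sketched in a few lines of Python. This is an illustrative approximation rather than the actual anthrogen API: the template wording, record field names, and the `build_prompts` helper are all our own assumptions.

```python
# Hypothetical sketch of populating text-to-image prompts from
# demographic records (e.g., rows sampled from PUMS). The template
# wording and field names are illustrative assumptions.

PROMPT_TEMPLATE = (
    "portrait of a {age}-year-old {race} {gender}, "
    "{style} style, single person, plain background"
)

def build_prompts(records, style):
    """Fill the prompt template once per demographic record."""
    return [
        PROMPT_TEMPLATE.format(age=r["age"], race=r["race"],
                               gender=r["gender"], style=style)
        for r in records
    ]

sample = [
    {"age": 34, "race": "Asian", "gender": "woman"},
    {"age": 52, "race": "Black", "gender": "man"},
]
prompts = build_prompts(sample, style="flat icon")
print(prompts[0])
```

Each resulting prompt would then be passed to a Stable Diffusion pipeline to generate one asset, so a batch of a hundred records yields a hundred unique prompts with no extra design effort.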
Our results include examples from at least ten different open-source Stable Diffusion variants, spanning a wide range of visual styles and levels of expressiveness (Figure 3), such as models trained on abstract shapes (Flat Design Icons), work by historical artists (Van Gogh Diffusion), and images from contemporary 3D animated films (Disney Pixal), as well as more photo-realistic images (Open Journey, Realistic Vision).
Throughout this period, we met weekly to discuss our design choices, itemize risks and biases we observed, and share strategies to address the challenges we encountered. Based on our discussions, we refined our prompts to create more diverse anthropographic assets that reflected the input demographic labels and introduced greater individual variation. We also iteratively refined and abstracted our process, developing a conceptual workflow that actively involves human designers in the generative creation of demographically diverse anthropographic assets.

Conceptual Workflow
Due to their socially sensitive implications, creating demographically diverse anthropographics often involves challenging design decisions [11]. This makes automating the use of text-to-image models for anthropographic authoring challenging, given their unpredictability and tendency to reproduce biases in their (often obfuscated) training data. Instead, we believe these tools lend themselves better to iterative human-in-the-loop design processes, in which designers have the opportunity to guide image generation while actively guarding against potential sources of risk. With that in mind, we propose a conceptual workflow (Figure 4) for semi-automated anthropographic asset creation, highlighting both opportunities for designer input as well as possible hazards. At each step, we pose questions that highlight possible risks, including opportunities for biases from the text-to-image model or the designer to manifest in the resulting assets. While informed primarily by our experiments with Stable Diffusion variants, the workflow is model-agnostic and can be implemented with other text-to-image models or applications.
In addition to prompting critical thinking around the decisions and risks associated with using text-to-image models for anthropographic creation, the workflow can also serve as a guide for new authoring tools. For example, during our own experimentation, we developed custom Google Colab notebooks (see Figure 5 and supplementary material) that instantiate the workflow's steps and decision points using Colab Forms. These provide a simple but extensible implementation of the workflow that foregrounds important choices and encourages reflection on risks at each step.

Workflow Phases and Decision Points
We divide the workflow into two phases: a Data Curation and Model Selection phase, in which designers consider and augment their input data and choose a suitable text-to-image model, and an Image Generation and Refinement phase, in which designers iteratively manage and evaluate the generation of anthropographic assets.

Data Curation and Model Selection.
Creating demographically diverse anthropographics typically requires that the input dataset (the dataset to be visualized) contains demographic attributes and/or descriptors of physical diversity. However, the lack of demographic attributes in a dataset need not be a barrier to creating diverse anthropographics. Building on the data generation strategies from Dhawka et al. [11], a designer may decide to create anthropographic assets by joining input data with existing population-level datasets such as census or survey samples. Alternatively, designers may choose to randomly generate or simulate demographic and physical diversity attributes.
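As a concrete illustration of the simulation strategy, the sketch below randomly attaches demographic attributes to records that lack them. The category labels, age range, and the `augment_with_demographics` helper are hypothetical placeholders for illustration, not census-derived values or part of our tooling.

```python
import random

# Hedged sketch: simulate demographic attributes for an input dataset
# that has none. Labels and ranges here are illustrative assumptions.

RACE_CATEGORIES = ["Asian", "Black", "Hispanic or Latino",
                   "Multiracial", "White"]
GENDERS = ["woman", "man"]

def augment_with_demographics(records, seed=0):
    """Return copies of the records with sampled race, gender, and age."""
    rng = random.Random(seed)  # fixed seed keeps the simulation reproducible
    out = []
    for r in records:
        enriched = dict(r)
        enriched["race"] = rng.choice(RACE_CATEGORIES)
        enriched["gender"] = rng.choice(GENDERS)
        enriched["age"] = rng.randint(18, 80)
        out.append(enriched)
    return out
```

Seeding the generator makes the simulated attributes reproducible, which also makes it easier for a designer to audit and revisit any biased sampling choices later.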
Designers must also choose which attributes in the dataset they wish to visually encode in the anthropographics. These design decisions will influence the choice of text-to-image model and initial template prompts. Given the highly tuned nature of many text-to-image models, at this stage designers need to consider the documented capabilities and limitations of the available models.
Although our examples primarily examine varying visual styles to create demographically diverse anthropographics with high specificity, designers may also choose to vary other anthropographic dimensions (including those identified in Morais et al.'s design space [23]).

Image Generation and Refinement.
Due to the inscrutability of text-to-image models, using them often involves considerable effort in engineering input prompts to generate the desired output images. After designers begin generating images, they may refine and iterate on the input prompts as needed, depending on the quality and plausibility of the outputs. For erroneous images or ones with visible social biases, the designer may fine-tune the prompts and demographic attributes in the dataset as required. Given the relatively high computational cost associated with synthesizing large sets of images, our model treats image generation as a two-part process, with initial iterations devoted to tuning prompts using small samples of the dataset and generation of larger batches of images done only after the initial prompts perform satisfactorily. Visible errors or biases in these larger sets may require further designer action to correct them, and pervasive issues at any point may require designers to step back and revisit their initial data, model, or prompt template choices.
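The two-part process can be skeletonized as follows. Here `generate_image` stands in for a real diffusion pipeline call and `review` for the designer's manual inspection of the tuning sample; both names, and the `sample_size` default, are hypothetical.

```python
# Sketch of the two-part generation loop: tune prompts on a small
# sample, then batch-generate only after the designer approves.
# generate_image and review are placeholders, not real APIs.

def generate_image(prompt):
    # Placeholder: a real implementation would invoke Stable Diffusion.
    return {"prompt": prompt, "image": None}

def tune_then_batch(records, build_prompt, review, sample_size=5):
    """Phase 1: generate a small tuning sample; phase 2: full batch."""
    sample = records[:sample_size]
    trial = [generate_image(build_prompt(r)) for r in sample]
    if not review(trial):
        # Designer rejects the sample: revisit prompts, data, or model.
        return None
    return [generate_image(build_prompt(r)) for r in records]
```

Gating the expensive full batch behind an explicit review step is also where the workflow surfaces its bias checks: the designer inspects the cheap sample before committing compute to hundreds of images.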

Risks when Generating Anthropographics
At each stage in our workflow, we highlight potential sources of risk that designers may wish to consider. As in a conventional visualization design process, designers need to remain mindful of how existing biases may spread or new issues may arise. Datasets to be visualized are usually not neutral and contain existing biases that reflect data measurement and collection processes. Often, unaddressed biases at this step propagate through the visualization pipeline and can manifest in the final visualization. Designers also need to carefully consider whether demographically diverse anthropographics are appropriate for the data or context being visualized. For instance, diverse anthropographics may be suitable for smaller datasets where demographic data such as age, gender, and occupation (among other characteristics) could provide additional context and encourage audiences to better interact with the visualizations. However, they may be less appropriate when preserving the privacy and anonymity of the people behind the data is a higher priority. Diverse anthropographics are also less suitable in cases where revealing the demographic and physical characteristics of individuals may lead audiences to interpret the data from a deficit perspective or through their existing negative biases.
When choosing the input dataset, a designer may introduce biases when deciding to randomly generate or simulate additional data about demographic or physical attributes. More importantly, designers need to be conscientious that exposing demographic details does not pose any risk to the people in the data. In these cases, designers can choose to experiment with diverse anthropographics of varying levels of realism, where abstracted visual styles may offer more privacy to individuals than more photo-realistic anthropographics. The generated data also may not accurately depict the populations being visualized. Here, the designer runs the risk of creating inaccurate physical depictions that may reinforce stereotypes or may not be credible to viewers. The designer's positionality and the intended purpose of the final anthropographic visualization may influence their choice of text-to-image model and visual style. Images in the training data for these models may contain ageist, sexist, racist, and other societal biases from the data collection and annotation process (as documented by previous work [2, 3, 27, 35]). Prompt templates provided by the creators of model variants may also be biased in similar ways. Likewise, the input prompts developed by a designer may reflect their own unconscious biases, which may compound the societal inequities present in the model's training data. Furthermore, the opacity of current text-to-image models makes it challenging to tailor input prompts to specific model variants. The designer's unconscious biases or personal expectations could also hinder the process of identifying whether output images misrepresent or harm the people in the data. Towards the end of this process, the designer needs to ensure that the final visualization contains adequate provenance and contextual information that accurately and credibly explains the input dataset, choice of model, prompts, and other design decisions to viewers. These risks are not exhaustive but rather reflect the main challenges we encountered during our experiments. We provide a deeper discussion of the overarching challenges around the use of generative text-to-image models for anthropographic creation in Section 4.

Case Studies
To illustrate our exploratory process, we describe several of our experiments creating anthropographic assets of varying visual styles and realism, each using different input datasets and fine-tuned Stable Diffusion variants. Through three case studies, we discuss the creation of new anthropographic designs and highlight challenges we encountered along the way. These explorations each incorporate assets generated using Stable Diffusion, our anthrogen Python library, and custom Colab templates, with additional graphic design tools (Miro, Illustrator) for final composition. Links to live Colab notebooks for each case study can be found in the footnotes as well as the supplementary materials.
3.6.1 Generating Abstract and Highly-Stylized Assets. In our first case study, we experimented with creating both demographically homogeneous and demographically diverse assets for a simple infographic (Figure 6) highlighting access to parental leave in the US, inspired by a Vox Media data story titled "The Economic Case for Abortion Rights" [15]. We used the base model of Stable Diffusion to create simple homogeneous human shapes representing women who do not qualify for parental leave, emulating the icons used in the original Vox story. We then paired these with diverse brushstroke portraits created using Van Gogh Diffusion, a variant of Stable Diffusion tuned using images of Van Gogh paintings, to evocatively highlight the diversity of the population impacted by these leave policies. Since the original data story does not provide additional demographic data other than gender, we used only the descriptor "woman" from the original graphic when creating the homogeneous icons. With the Van Gogh variant, we used additional race demographic labels from the US census [9] to generate unique demographically diverse anthropographic assets. Using the base model of Stable Diffusion, it took us over an hour to generate simple, homogeneous human shapes. This primarily involved experimenting with a number of input prompts to generate coherent, legible images. Comparatively, our experiments with the Van Gogh variant took less than 30 minutes to generate roughly 50 images, from which we then picked the 6 that we felt best suited the piece. Traditionally, most anthropographics have been built around simple human-shaped icons, which, as Scott McCloud [21] and others suggest, can function as more universal symbols reflecting a wide range of individuals. Here, however, generating simple human shapes with the base Stable Diffusion model was challenging because text-to-image models may not contain images labelled that way ("simple human shapes", "human icons") and most of the fine-tuned Stable Diffusion models we experimented with were not trained on large datasets of icons. Although generative text-to-image models excel at creating complex, realistic images, most current approaches are unable to produce recognizable, coherent, or legible geometric icons (as demonstrated in recent work by Wu et al. [37]). In our experiments, the base Stable Diffusion model failed to produce legible images that even remotely resembled human icons. Furthermore, even when creating homogeneous anthropographics, designers may still need to make encoding choices that may reflect their own biases. For instance, by choosing to keep homogeneous anthropographic assets with stereotypical female outlines and hair styles, our resulting visualization only illustrates one representation of gender.

15: https://huggingface.co/dallinmackay/Van-Gogh-diffusion (Permalink: perma.cc/N7V2-4WF5)

[Figure 7 shows "More Women Than Men Are Going To College In The US": 59% women, 41% men.]
3.6.2 Rapidly Generating Cartoonized Assets. In our second case study, we explore how a designer might rapidly create a percentile visualization featuring a larger number of diverse anthropographic assets (Figure 7), here illustrating a statistic about the gender gap among undergraduate college students in the US [36]. To create more playful and expressive cartoonized assets evoking a younger population, we experimented with a variant of Stable Diffusion trained on computer-animated Disney characters. The input data only contained gender information. However, to create a more inclusive, diverse, and racially balanced anthropographic, we chose to add simulated race and age data. We generated data for 100 individuals by randomly sampling ages between 17 and 25 (stereotypical ages for US undergraduates) and sampling race labels from 5 top-level race/ethnicity labels drawn from the US census [9] (similar to the approach illustrated in Figure 3).
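The simulated dataset for this case study can be sketched in a few lines. The split of 59 women and 41 men matches the statistic visualized; the sampling details and the `simulate_students` helper are otherwise illustrative assumptions.

```python
import random

# Minimal sketch of simulating 100 undergraduates: 59 "women" and
# 41 "men", each with a random age (17-25) and a race label sampled
# from five top-level census categories. Sampling choices are
# illustrative assumptions, not our exact procedure.

RACES = ["Asian", "Black", "Hispanic or Latino", "Multiracial", "White"]

def simulate_students(n_women=59, n_men=41, seed=42):
    rng = random.Random(seed)
    people = []
    for gender, count in (("woman", n_women), ("man", n_men)):
        for _ in range(count):
            people.append({
                "gender": gender,
                "age": rng.randint(17, 25),
                "race": rng.choice(RACES),
            })
    return people
```

Each simulated record can then be fed into a prompt template, yielding one cartoonized asset per individual in the percentile chart.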

Figure 8: An example infographic with photo-realistic anthropographic assets generated using the Realistic Vision variant and PUMS census data [33], representing individuals residing in the US states of Wyoming and Florida. In contrast to the naive example in Figure 2, this version features more representative and photo-realistic anthropographic assets generated using age, gender, race, and ancestry category labels derived from the census.
Our experiments generating hundreds of anthropographic assets in this specific visual style took under an hour, and we refined erroneous images (such as ones with extra limbs or floating objects) by re-generating assets using the same prompt until we were satisfied with the quality of the image. Our final visualization contains 100 images of individuals, with 59 images representing "women" and 41 images representing "men". We observed that the assets in this visual style contained facial features that were somewhat similar across the 5 race categories but that may not necessarily reflect real-life human features. For instance, several of the images we generated for the "White" and "Black" race categories contained white hair. Additionally, the images produced were still skewed towards certain demographics and exhibited a beauty bias (faces featuring perfect hair, skin, and facial features), as well as stereotypical representations of gender (long hair and feminine makeup) and race (culturally specific hair styles) latent in text-to-image models [3].

3.6.3 Creating Data-Driven Photo-Realistic Assets. Our third case study (Colab notebook: https://bit.ly/42I22cY) illustrates how a human designer in the loop can help guide and mitigate potential bias when creating more photo-realistic representations of individuals, here visualizing data about Electoral College [13] representation in two US states, Wyoming and Florida (Figure 8). To create assets for this infographic, we used the Realistic Vision variant of Stable Diffusion (https://huggingface.co/SG161222/Realistic_Vision_V2.0, permalink: https://perma.cc/4UCH-WJKL), which is fine-tuned for creating highly photo-realistic images. For this experiment, we used demographic data for individual residents of the two states drawn from PUMS, the United States Census Public Use Microdata Sample [33], which contains fine-grained data about age, gender, race, ancestry, and US state. To more accurately represent the demographics of the two states, we randomly sampled data from 20 anonymized individuals from Wyoming and 75 from Florida, then used their age, gender, race, and ancestry data to parameterize the image generation step.
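Parameterizing generation from a PUMS record can be sketched as a simple template function. The field names and template wording below are illustrative assumptions, not the exact template we converged on:

```python
def realistic_prompt(person: dict) -> str:
    """Build a photo-realistic prompt from one PUMS-like demographic record.

    The field names and the template wording are illustrative assumptions,
    not the exact prompt template used in this case study.
    """
    return (
        f"a color photograph of a {person['age']} year old "
        f"{person['race']} {person['sex']} of {person['ancestry']} ancestry, "
        "headshot, natural lighting, high-quality"
    )

example = {"age": 42, "sex": "woman", "race": "White", "ancestry": "German"}
prompt = realistic_prompt(example)
```

Mapping each sampled resident through this function yields one prompt per individual, preserving the joint distribution of attributes in the sample.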
Creating the final set of assets entailed numerous rounds of iteration to develop a reliable prompt template. We then batch-generated the final set of 95 images, manually inspecting and repeating generation for the roughly 10% of images with visible aesthetic issues (cropped faces, malformed accessories, etc.) or which raised bias concerns for us (namely clothing that might reflect unrepresentative stereotypes). As a result of decisions and improvements during the workflow, the resulting images appear more consistent, realistic, and representative than those generated using a naive approach (Figure 2). However, they still exhibit a consistent beauty bias, tending towards youthful and idealized faces that may not accurately correspond to target ages. The binary gender categories in the underlying census data also produce images that exhibit a strong and stereotyped gender divide, even though the underlying model is capable of producing more gender-ambiguous faces. Moreover, relying on a photo-realistic model tuned using a smaller set of training images may result in even more individuals who look extremely similar to one another. These synthetic photo-realistic images, which could easily be mistaken for real photos, also highlight challenges related to explaining the provenance of anthropographic assets to viewers. We elaborate on these risks and their impact on anthropographic visualizations in deeper detail in Section 4.
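The inspect-and-regenerate loop described above can be sketched as follows. Here `generate_image` and `passes_inspection` are stand-ins for a real text-to-image call (e.g. a diffusers pipeline) and the designer's manual review, respectively:

```python
import random

def generate_image(prompt: str, seed: int) -> dict:
    """Stand-in for a text-to-image call (e.g. a diffusers pipeline run)."""
    return {"prompt": prompt, "seed": seed}

def passes_inspection(image: dict) -> bool:
    """Stand-in for the designer's manual review for aesthetic/bias issues."""
    return True  # in practice, a human judgment made per image

def generate_until_accepted(prompt: str, rng: random.Random,
                            max_tries: int = 10) -> dict:
    """Re-run the same prompt with fresh seeds until an output is accepted."""
    for _ in range(max_tries):
        image = generate_image(prompt, seed=rng.randrange(2**32))
        if passes_inspection(image):
            return image
    raise RuntimeError(f"no acceptable image after {max_tries} tries: {prompt}")

asset = generate_until_accepted("a color photograph of a 42 year old woman",
                                random.Random(0))
```

Keeping the prompt fixed and varying only the seed matches the refinement strategy used in this case study; the roughly 10% rejection rate we observed determines how many extra generations this loop costs in practice.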

CHALLENGES
Designing demographically diverse anthropographics can be challenging, particularly when using generative text-to-image models with a documented history of producing biased and flawed outputs.We extend the set of challenges related to the design of demographically diverse anthropographics outlined by Dhawka et al. [11] with a specific focus on risks particular to text-to-image models.

Ethical Concerns with Generative Models
Despite their potential, generative text-to-image models are flawed and opaque tools. Currently, very little is known about how these models generate highly realistic and complex images. In fact, the AI fairness research community has devoted significant effort to understanding and mitigating the harms from text-to-image models [3,8,10,17]. Generative models frequently produce "hallucinations" and can result in a wide range of unrealistic or problematic imagery, in some cases contributing to the spread of misinformation online. Additionally, most contemporary text-to-image models also come with deep-rooted data agency and privacy concerns. These models are frequently trained and maintained using large datasets that have been collected without much care for data privacy and transparency [3] and are increasingly being commercialized by companies for profit. Meanwhile, individuals whose data is collected and used in these training datasets are rarely remunerated and are typically unaware of their inclusion. Furthermore, these datasets are annotated by human data workers who are largely situated in the Global South, work for little pay, and are exposed to disturbing content [38]. There have also been growing concerns about copyright issues and the impact of text-to-image models on artists [17], with upcoming class-action litigation against the companies behind Stable Diffusion, Midjourney, and other models [20]. As designers and researchers who use AI in our work, we should be aware of and critically weigh these ethical concerns and the ongoing harm caused by rapid advances in these technologies against their potential benefits.

Model-Specific Language
Using text-to-image models to create anthropographic assets may involve considerable prompt engineering effort from designers to generate desired output images. In our experiments with Stable Diffusion, we found that the character of our prompts and the quality of the resulting images varied considerably depending on whether the model variant was fine-tuned for the specific visual style we wanted and whether our source data contained keywords to which the model was responsive. For instance, when generating simple shapes for use in homogeneous anthropographics, using a Stable Diffusion variant that has been trained on iconic images will typically yield better results than a variant that has been fine-tuned for more complex, photo-realistic images. However, because of the opacity of these models, it is very difficult to anticipate whether an image prompt designed for one model will work well with another or will perform robustly when generating large numbers of different anthropographic assets.
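One lightweight way to manage this variability is to keep a separate prompt template per model variant. In the sketch below, the repository IDs match those referenced in this paper, but the template wordings and style keywords are illustrative assumptions:

```python
# Per-variant prompt templates. The repository IDs are the Hugging Face
# models referenced in this paper; the template wordings are illustrative
# assumptions, not tested prompts.
VARIANT_TEMPLATES = {
    "stabilityai/stable-diffusion-2-1":
        "a color photograph of a {age} year old {race} {sex}, headshot",
    "dallinmackay/Van-Gogh-diffusion":
        "portrait of a {age} year old {race} {sex}, expressive brushstrokes",
    "SG161222/Realistic_Vision_V2.0":
        "RAW photo, a {age} year old {race} {sex}, headshot, natural lighting",
}

def prompt_for(variant: str, person: dict) -> str:
    """Render the variant-specific template for one demographic record."""
    return VARIANT_TEMPLATES[variant].format(**person)

p = prompt_for("SG161222/Realistic_Vision_V2.0",
               {"age": 30, "race": "Black", "sex": "woman"})
```

Centralizing templates this way makes the variant-dependence explicit and lets a designer swap models without rewriting the batch-generation code, though each new template still requires its own round of prompt experimentation.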
Prompts that include demographic terms pose particular challenges, since different training datasets and processes mean that each text-to-image model learns different associations between data attributes like race or ethnicity labels and phenotypic attributes like skin colors and hairstyles. Because the training data for these models relies heavily on scraped metadata and labels assigned by human data workers from a wide range of backgrounds [18], these labels can be highly subjective or even biased. They are also extremely unlikely to align cleanly with the category labels used in demographic datasets. For example, models designed for general-purpose image synthesis are unlikely to have learned reliable or accurate mappings between verbose or ambiguous US Census categories for underrepresented populations (like "American Indian or Alaska Native" or "two or more races") and the diverse physical appearances of the real people who identify with those labels. Even common race descriptors like "Black" or "White" have extremely different cultural connotations in different communities around the world and, depending on the source of the training data, can be confounded with other colloquial, biased, or insensitive terms. Yet, because they are less-aliased, prompts that mirror those biased or offensive descriptors may actually result in higher-quality image outputs. Thus, having a human designer in the loop is essential for detecting and mitigating these issues.

Ensuring Individual Variation
One of the major challenges of using text-to-image models to create anthropographic assets is their tendency to produce extremely similar images in response to related prompts. As seen in our case study with photo-realistic assets (Sec 3.6.3), this issue can manifest dramatically when generating images based on limited sets of demographic attributes, which may in turn map to relatively narrow corners of a model's feature space. For example, with most text-to-image models, requests for images showing a "45 year old Black woman", "50 year old Black woman", and "50 year old Hispanic woman" are likely to produce images that are uncannily similar to one another, creating the unintended impression that the images show relatives or even different versions of the same person, rather than individuals sampled from a broader population.
To produce more distinct anthropographic assets, we experimented with multiple strategies to induce individual variation, including adding prompt modifiers for occupation and ancestry based on census data, along with randomized modifiers for hair, nose, and facial features (Figure 9). Broadly, we found that adding these kinds of additional modifiers introduced only limited variation. However, due to the opacity of these models and our own design biases, it is challenging to extrapolate from these experiences to other text-to-image models. Additionally, due to the lack of transparency around how text-to-image models produce these photo-realistic images, it is difficult to assess how wide a range of individuals a given model can possibly create.
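The randomized-modifier strategy above can be sketched as a small prompt-augmentation function. The specific modifier lists here are illustrative, not the ones we used:

```python
import random

# Illustrative appearance modifiers appended to a base prompt to induce
# individual variation; the specific lists are assumptions.
HAIR = ["short cropped hair", "long braided hair", "curly hair", "shaved head"]
FACE = ["round face", "angular jawline", "prominent nose", "freckles"]

def vary_prompt(base: str, rng: random.Random) -> str:
    """Append one randomly chosen hair and facial-feature modifier."""
    return f"{base}, {rng.choice(HAIR)}, {rng.choice(FACE)}"

rng = random.Random(7)
variants = {vary_prompt("a 50 year old Black woman, headshot", rng)
            for _ in range(20)}
```

As noted above, in our experiments even considerably different modifier combinations produced only subtle visual changes, so this kind of augmentation should be treated as a partial mitigation rather than a guarantee of distinct individuals.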
Discussions in online communities devoted to using Stable Diffusion to create game characters and AI art suggest that generating variation in photo-realistic images is a common challenge. User communities on platforms such as Reddit and Civitai have experimented with approaches that use random seeds, image-to-image prompts, and targeted starter images to create photo-realistic outputs with greater individual variation. However, designers and researchers also need to consider the unintended side effects of these approaches. For instance, generating variation using image-to-image approaches may require considerable additional effort when choosing starter images. Designers may also risk producing images that look too similar to real individuals. Recent work by Carlini et al. [8] found that text-to-image models "remember" the images they were trained on, and this memory is reflected in output images that retain characteristics of the training data, raising concerns about the privacy, anonymity, and consent of the individuals in the data.
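One of the community strategies above, assigning each individual its own generation seed, can also be made reproducible by deriving seeds deterministically from the demographic record itself. The hashing scheme below is our own illustrative choice, not a community standard:

```python
import hashlib

def seed_for(person: dict, salt: str = "anthro-v1") -> int:
    """Derive a stable, per-individual seed from a demographic record.

    Distinct records get (almost certainly) distinct seeds, so re-running a
    batch reproduces the same images while still varying across individuals.
    Changing the salt re-rolls the entire batch at once. The salt value and
    scheme are illustrative assumptions.
    """
    key = salt + "|" + "|".join(f"{k}={person[k]}" for k in sorted(person))
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:4], "big")

a = seed_for({"age": 50, "race": "Black", "sex": "woman"})
b = seed_for({"age": 45, "race": "Black", "sex": "woman"})
```

Because seeds are derived rather than stored, the same approach also supports auditing: anyone with the input records, the salt, and the model can regenerate the exact batch.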

Mitigating Biases in Image Outputs
Previous work on the various biases latent in generative text-to-image models has raised concerns about corresponding biases within the datasets used to train these models [3]. Online user communities have experimented with training the base Stable Diffusion model on curated datasets 22 to mitigate beauty and gender biases. However, streamlined processes to resolve these biases are few and relatively understudied.
In our workflow, we provide options for designers to refine faulty or biased anthropographic assets until they decide that the output is usable and meets their expectations. However, this process relies on designers' positionalities, their awareness of their own biases, and their understanding of the potential harm that biased images can cause to the populations being visualized. Designers may lack the cultural context to appreciate inaccuracies or stereotypes in synthetic images of people whose backgrounds differ from their own, especially when they include culturally-specific clothing, hairstyles, accessories, and other markers. Additionally, as Park et al. [27] point out, most AI training datasets are skewed towards younger populations and are ageist towards older ones. Indeed, our experiments with a variant of Stable Diffusion fine-tuned for photo-realistic images showed that expanding the age range to include older adults resulted in only slight, and somewhat implausible, changes in the output images. Analogous biases related to other demographic attributes such as gender, race, religion, and occupation exist in current text-to-image models.
22 https://civitai.com/models/98755/humans (permalink: https://perma.cc/2H7J-8V5F)

Explaining Provenance and Testing Credibility with Viewers
The convincing appearance of many text-to-image outputs (particularly photo-realistic ones) is likely to create confusion for many viewers, who may be surprised or feel deceived when they learn that the individuals shown in an anthropographic are synthetic. This potential mistrust emerges from an increasing public awareness of pernicious uses of AI-generated images of people, including the use of deepfakes (fake images with the likenesses of real individuals) to spread misinformation online. In the fashion industry, companies such as Levi's [34] have also garnered public criticism for using synthetic images of non-White individuals in their marketing to make disingenuous claims about diversity and inclusion. As a result, honest and transparent approaches to creating diverse anthropographics call for ways in which designers can explain the provenance of, and rationale for, the generated images to viewers. This likely entails explicitly labeling generated images and the anthropographic designs that include them, and explicitly communicating information about the underlying models, datasets, and design decisions. The potential costs and benefits of these kinds of anthropographics (particularly photo-realistic ones) also remain poorly understood, since little research has yet examined the extent to which viewers find them legible, humanizing, or credible. Therefore, although we illustrate example anthropographic designs with photo-realistic images, we suspect they may be appropriate for only a relatively narrow range of use cases in which the audiences and groups being represented are well-understood and the risks to individuals are low. We urge designers to strongly consider the impacts of these representations on the people or groups they represent, especially when showing data about marginalized and underrepresented populations. Wherever possible, we believe that designers should foreground the possibility of using real data about real people (with consent and when such data is available) and consider more abstracted visual styles (including approaches like stippled or cartoonized portraits) that are less likely to be misinterpreted.

RESEARCH OPPORTUNITIES
Based on our reflections, we present several research opportunities for visualization researchers to support the development of AI-driven tools for demographically diverse anthropographics. We see these opportunities as a starting point for researchers to engage more deeply with the challenges faced by designers working on diverse anthropographics outside of the academy.

Training Anthropographic-Specific Models
The potential of text-to-image models for creating assets for diverse anthropographics is currently hampered by the limitations of the data used to train these models. As Birhane et al. [3] caution in their audit of the LAION-400M dataset, on which earlier variants of the Stable Diffusion model are trained, current training datasets are rife with harmful biases towards marginalized and underrepresented populations. Moreover, existing datasets lack the kinds of demographic labels and examples of representative styles that would be most useful in a visualization context. Given these concerns, it may be beneficial to consider training custom text-to-image models specifically to support the creation of diverse anthropographics. First and foremost, we expect that anthropographics-oriented models would benefit from training on debiased and audited datasets that reduce opportunities for the underlying model to perpetuate systemic social and cultural biases. The gender-balanced dataset used by Buolamwini and Gebru in their Gender Shades project [7] highlights the potential of these approaches, which could be extended to categories such as age diversity, (dis)ability status, race, religion, and geographical location, among others. Labeling training images with standard demographic keywords that align with those used in demographic datasets, like national censuses, could also make it easier to generate assets that accurately reflect those characteristics. However, doing so presents a variety of coding and privacy challenges, and datasets linking images of real individuals to systematic demographic encodings are rare outside of government ID databases; even these may not accurately reflect the ways in which individuals self-identify. The creation of opt-in community-generated datasets to support this kind of training represents an intriguing (but possibly daunting) opportunity for future research. Finally, training models using anthropographic-style images (including more naturalistic photos of real individuals as well as more abstract and privacy-preserving representations) could considerably improve the quality of the resulting assets. In particular, naturalistic photos of diverse individuals, as well as examples of more abstracted visual marks such as ISOTYPE figures, human silhouettes, and stippled portraits, could reduce the impact of beauty bias while allowing designers to more readily generate images that suit a variety of data visualization forms.

Interactive Anthropographic Design Tools
Supporting the creation of diverse anthropographics also calls for new tools tailored to various steps of the visualization design process. While our initial implementations of our conceptual workflow via Colab notebooks (Sec 3.3) illustrate one way of operationalizing anthropographic asset creation, we anticipate that the needs of designers will differ based on the intended purpose of the visualizations they create.
Tools for integrating anthropographic elements into existing visualizations may be particularly useful for data journalists and other designers who want to quickly prototype and deploy designs that include human representations. These could include interfaces that bring anthropographic creation into graphical visualization tools alongside other generative glyph-creators (such as Brehmer et al.'s Diatoms [5]) as well as more tailored applications that streamline the iterative refinement and deployment of generated assets. Meanwhile, tools for auditing outputs and assessing credibility could help designers identify biases or inconsistencies in both assets and visualizations. Auditing tasks like these could benefit from (semi-)automated visual tools for clustering and comparing assets, taking cues from machine learning-oriented bias detection tools like Munechika et al.'s Visual Auditor [25]. Similarly, community- and crowd-oriented tools for eliciting feedback on assets, ideally from representatives of both the target audiences and the visualized individuals or groups, could help designers understand the impact of their designs and build awareness of their own biases. This genre of tools may be of particular interest to designers seeking to use anthropographics to humanize datasets or elicit specific emotional responses. Finally, tools for exposing asset provenance are likely to be an essential complement to any anthropographics that target an audience for whom the origin of the images may be cause for interest or concern. Hence, we call for more research into tools and strategies that can support transparency and provide context for viewers. These might include visualization software that supports embedding interactive tooltips, footnotes, or other augmentations with longer text descriptions of the input datasets, prompts, and text-to-image models used (as illustrated in Figure 10). Ultimately, we believe that future research in this space calls for more direct input from designers to clarify challenges they face at different steps in their anthropographic creation and storytelling processes.
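Embedding provenance alongside each asset could be as simple as serializing a small metadata record for a tooltip or footnote to display. The fields below are an illustrative minimum, not a proposed standard:

```python
import json

def provenance_record(prompt: str, model_id: str, dataset: str,
                      seed: int) -> str:
    """Serialize provenance details a tooltip or footnote might display.

    The field set here is an illustrative assumption about what viewers
    would need, not a standardized schema.
    """
    return json.dumps({
        "synthetic": True,          # flag the image as AI-generated
        "model": model_id,          # which text-to-image model/variant
        "source_data": dataset,     # underlying demographic dataset
        "prompt": prompt,           # exact generation prompt
        "seed": seed,               # seed, for reproducibility
    }, indent=2)

meta = provenance_record(
    prompt="a color photograph of a 42 year old White woman, headshot",
    model_id="SG161222/Realistic_Vision_V2.0",
    dataset="US Census PUMS [33]",
    seed=123456,
)
```

A visualization tool could attach such a record to each generated asset and surface it on hover, making the synthetic origin of every image inspectable rather than implicit.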

CONCLUSION
In this paper, we proposed a conceptual human-in-the-loop workflow to support the creation of anthropographics using generative text-to-image models. Our goal was to explore how these technologies can support designers in rapidly crafting assets for unique, representative, and demographically diverse anthropographics. Because this work draws primarily on qualitative reflection grounded in our own experiences creating anthropographic assets, rather than a structured comparison between our curated process and more automated approaches, additional research would be necessary to validate the efficacy of our proposed workflows. Nevertheless, our experiments highlight a set of nuanced and context-dependent challenges that are unlikely to be solved in the near term by automated systems, along with forward-looking opportunities to reduce the risks associated with these models.
As with any technology, generative text-to-image models can have undesirable impacts. However, in the context of designing data-driven diverse anthropographics, the risks associated with text-to-image models can cause great harm, especially to marginalized and underrepresented populations. At the same time, these models can considerably lower the barriers to creating demographically diverse anthropographics quickly and easily, making these kinds of visualizations more accessible to designers and audiences. Moreover, because these technologies are trained on our collective data and benefit from public use, we strongly believe that a considerable communal effort is necessary to improve them. As these technologies become mainstream, we need to reflect, as a research community, on how they can benefit us without causing harm to the most vulnerable among us. With that in mind, we hope that our contributions in this work can initiate community conversations and foster new opportunities to support more designers in accessing and adopting these technologies to create better representations of people.

Figure 2: Images of 36 individuals generated using Stable Diffusion 2.1 based on US Census data from Wyoming [33] with the prompt "a color photograph of a {age} year old {race} {sex}, headshot, high-quality" (based on Nicoletti and Bass's Bloomberg piece [26]) reveal a range of aesthetic issues as well as cultural and individual biases in the source model.

Figure 3: A snapshot of our experiments with visual style, expressiveness, and realism using five different Stable Diffusion model variants. For each, we show results for a 20-year-old "woman" and "man" across five race categories ("Asian", "Black", "Hispanic or Latino", "Multiracial", and "White") drawn from the US census [9].

Figure 4: Our conceptual workflow for anthropographic asset generation outlines multiple decisions and opportunities for iteration during data curation and model selection as well as image generation and refinement. Throughout, we highlight potential risks that designers may need to consider.

Figure 5: An instantiation of our conceptual workflow using Google Colab Forms. These notebooks, which we developed alongside our model during our design explorations, scaffold designers through the process of generating anthropographic assets while drawing attention to risks and encouraging critical reflection at each stage. See supplementary material for complete notebooks.

Figure 6: Our diverse anthropographic re-creation of a statistic about women's access to parental leave from "The Economic Case for Abortion Rights" by Vox Media [15] with assets generated from the base Stable Diffusion model and Van Gogh Diffusion, a variant trained on Van Gogh paintings.

Figure 7: Our anthropographic visualization of the college gender gap in the US [36], created using 100 cartoonized individual anthropographic assets in the Disney character visual style, representing college students between the simulated ages of 17 and 25. The resulting assets manifest beauty biases and contain facial features that may not reflect the real population.

Figure 9: Experiments with generating variation in the Realistic Vision model. All images use the same race ("Black") and gender ("woman"), and each column uses a different age. The second row adds occupation and ancestry modifiers from the US Census PUMS dataset, while the third row also adds randomized modifiers for hair and facial features. Despite considerably different prompts, most additions result in relatively subtle changes.

Figure 10: A snapshot from our third case study showing photo-realistic assets generated using the Realistic Vision variant with PUMS census data [33], representing individuals residing in the US state of Wyoming. We illustrate how a designer can use interactivity to explain the provenance of the anthropographic assets. Details include the input dataset and specific prompts used.