1 Introduction
Large language models like GPT-3 [21, 97, 103] are increasingly becoming part of human communication. Enabled by developments in computer hardware and software architecture [97], large language models produce human-like language [56] by iteratively predicting likely next words based on the sequence of preceding words. Applications like writing assistants [38], grammar support [66], and machine translation [45] inject the models' output into what people write and read [51].
Using large language models in our daily communication may change how we form opinions and influence each other. In conventional forms of persuasion, a persuader crafts a compelling message and delivers it to recipients, either face-to-face or mediated through contemporary technology [94]. More recently, user researchers and behavioral economists have shown that technical choice architectures, such as the order in which options are presented, affect people's behavior as well [42, 72]. With the emergence of large language models that produce human-like language [25, 56], interactions with technology may influence not only behavior but also opinions: when language models produce some views more often than others, they may persuade their users. We call this new paradigm of influence latent persuasion by language models, illustrated in Figure 1.
Latent persuasion by language models extends the insight that choice defaults affect people's behavior [42, 72] to the field of language and persuasion. Where nudges change behavior by making some choices more convenient than others, AI language technologies may shift opinions by making it easy to express certain views but not others. Such influence could be latent and hard to pinpoint: choice architectures are visible, but opinion preferences built into language models may be opaque to users, policymakers, and even system developers. While in traditional persuasion a central designer intentionally creates a message to convince a specific audience, a language model may be opinionated by accident, and its opinions may vary by user, product, and context.
Prior research on the risks of generative language models has focused on conventional persuasion scenarios, where a human persuader uses language models to automate and optimize the production of content for advertising [39, 61] or misinformation [25, 67, 106]. Initial audits also highlight that language models reproduce stereotypes and biases [23, 54, 83] and support certain cultural values more than others [57]. While emerging research on co-writing with large language models suggests that models are becoming increasingly active partners in people's writing [70, 104, 105], little is known about how the opinions produced by language models affect users' views. Work by Arnold et al. [3] and Bhat et al. [16, 17] shows that a biased writing assistant may affect movie or restaurant reviews, but whether co-writing with large language models affects users' opinions on public issues remains an open and urgent question.
This study investigates whether large language models that generate certain opinions more often than others affect what their users write and think. In an online experiment (N=1,506), participants wrote a short statement discussing whether social media is good or bad for society. Treatment group participants were shown suggested text generated by a large language model. The model, GPT-3 [103], was configured to generate text that argued either that social media is good for society or that it is bad for society. Following the writing task, we asked participants to assess social media's societal impact in a survey. A separate sample of human judges (N=500) evaluated the opinions expressed in participants' writing.
Our quantitative analysis tests whether the interactions with the opinionated language model shifted participants’ writing and survey opinions. We explore how this opinion shift may have occurred in secondary analyses. We find that both participants’ writing and their attitude towards social media in the survey were considerably affected by the model’s preferred opinion. We conclude by discussing how researchers, AI practitioners, and policymakers can respond to the possibility of latent persuasion by AI language technologies.
3 Methods
To investigate whether interacting with opinionated language models shifts people’s writing and affects people’s views, we conducted an online experiment asking participants (N=1,506) to respond to a social media post in a simulated online discussion using a writing assistant. The language model powering this writing assistant was configured to generate text supporting one or the other side of the argument. We compared the essays and opinions of participants to a control group that wrote their social media posts without writing assistance.
3.1 Experiment design
To study interactions between model opinion and participant opinion in a plausibly realistic and relevant setting, we created the scenario of an opinionated discussion on social media platforms like Reddit. Such discussions have a large readership [79], pertain to political controversies, and are plausible application settings for writing assistants and language models. We searched ProCon.org, an online resource for research on controversial issues, to identify a discussion topic and selected "Is Social Media Good for Society?". We chose this topic because it is easily accessible and politically relevant, but not considered so controversial that entrenched views would limit constructive debate.
To run the experiment, we created a custom experimental platform combining a mock-up of a social media discussion page, a rich-text editor, and a writing assistant. The assistant was powered by a language generation server and included comprehensive logging tools. To provide a realistic-looking social media mock-up, we copied the design of a Reddit discussion page and drafted a question based on the ProCon.org discussion topic. Figure 2 shows a screenshot of the experiment. We asked participants to write at least five sentences expressing their take on social media's societal impact. We randomly assigned participants to one of three groups:
(1) Control group: participants wrote their answers without a writing assistant.
(2) Techno-optimist language model treatment: participants were shown suggestions from a language model configured to argue that social media is good for society.
(3) Techno-pessimist language model treatment: participants received suggestions from a language model configured to argue that social media is bad for society.
3.2 Building the writing assistant
Similar to Google's Smart Compose [29] and Microsoft's predictive text in Outlook, the writing assistant in the treatment groups suggested possible continuations (sometimes called "completions") of the text that participants had entered. We integrated the suggestions into a customized version of the rich-text editor Quill.js. The client sent a generation request to the server whenever a participant paused their writing for 750 ms. Including round-trip and generation time, a suggestion appeared on participants' screens about 1.5 seconds after they paused their writing.
When the editor client received a text suggestion from the server, it revealed the suggestion letter by letter, with random delays calibrated to resemble a co-writing process (cf. [71]). Once the end of a suggested sentence was reached, the editor would pause and request an extended generation from the server until at least two sentences had been suggested. Participants could accept each suggested word by pressing the tab key or clicking an accept button in the interface. In addition, they could reset the generation and request a new suggestion by pressing a button or key.
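The editor client itself was implemented in JavaScript on top of Quill.js; the following Python sketch merely illustrates the letter-by-letter reveal with randomized delays, with delay values chosen for illustration rather than taken from the study's calibration.

```python
import random
import time

def reveal_suggestion(suggestion: str, base_delay: float = 0.03, jitter: float = 0.05) -> None:
    """Reveal a suggested continuation character by character with random delays,
    loosely mimicking the pacing of a human co-writer. Delay values are illustrative."""
    shown = ""
    for char in suggestion:
        shown += char
        print("\r" + shown, end="", flush=True)
        time.sleep(base_delay + random.uniform(0.0, jitter))  # per-character random delay
    print()

reveal_suggestion("Social media also helps to create a sense of community.")
```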
We hosted the required cloud functions, files, and interaction logs on Google’s Firebase platform.
3.3 Configuring an opinionated language model
In this study, we experimented with language models that strongly favored one view over another. We chose a strong manipulation as we wanted to explore the potential of language models to affect users' opinions and understand whether they could be used or abused to shift people's views [8].
We used GPT-3 [23] with manually designed prompts to generate text suggestions for the experiment in real time. Specifically, we accessed OpenAI's most capable 175B-parameter model ("text-davinci-002"). We used temperature sampling, a method inspired by statistical thermodynamics for choosing a specific next token from the set of likely next tokens. We set the sampling temperature (randomness parameter) to 0.85 to generate suggestions that are varied and creative. We set the frequency and presence penalty parameters to 1 to reduce the chance that the model's suggestions would become repetitive. We also prevented the model from producing new lines, placeholders, and lists by setting logit bias parameters that reduced the likelihood of the respective tokens being selected.
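For illustration, a generation request with these parameters could look like the sketch below, written against the legacy OpenAI Python client (openai < 1.0). The max_tokens value and the specific logit_bias token IDs are our assumptions; the sampling parameters mirror the values reported above.

```python
import openai

openai.api_key = "sk-..."  # placeholder

# Illustrative logit biases banning newline tokens in the GPT-2/3 BPE vocabulary
# (198 = "\n", 628 = "\n\n"); the study's full suppression list is not reproduced here.
SUPPRESSED_TOKENS = {"198": -100, "628": -100}

def generate_suggestion(prompt: str) -> str:
    """Request a continuation suggestion from text-davinci-002."""
    response = openai.Completion.create(
        model="text-davinci-002",
        prompt=prompt,
        max_tokens=64,            # assumed length cap for a one- to two-sentence suggestion
        temperature=0.85,         # sampling temperature used in the study
        frequency_penalty=1,      # penalize repeated tokens
        presence_penalty=1,       # penalize tokens already present in the text
        logit_bias=SUPPRESSED_TOKENS,
    )
    return response["choices"][0]["text"]
```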
We evaluated different techniques to create an opinionated model, i.e., a model that likely supports a certain side of the debate when generating a suggestion. We used prompt design [73], a technique for guiding frozen language models to perform a specific task. Rather than updating the weights of the underlying model, we concatenated an engineered prompt to the input text to increase the chance that the model generates a certain opinion. Specifically, we inserted the prefix "Is social media good for society? Explain why social media is good/bad for society:" before participants' written texts when generating continuation suggestions. The engineered prompt was not visible to participants in their editor UI; it was inserted in the backend before generation and removed from the generated text before suggestions were shown to participants.
Initial experimentation and validation suggested that the prompt produced the desired opinion in the generated text, but when participants argued strongly for another opinion in their writing, the model's continuations would follow their opinion. In addition to the prefix prompt, we therefore developed an infix prompt that was inserted throughout participants' writing to reinforce the desired opinion. We inserted the snippet "One sentence continuing the essay explaining why social media is good/bad:" right before the last sentence that participants had written. This additional prompt guided the model's continuation towards the target opinion even if participants had articulated a different opinion earlier in their writing. Validation of the model opinion configuration is provided in section 4.5. We also experimented with fine-tuning [53] to guide the model's opinion, but the fine-tuned models did not consistently produce the intended opinion.
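To make the prompting scheme concrete, the sketch below shows how a backend might assemble the hidden prompt from the engineered prefix, the participant's text, and the infix inserted before their last sentence. The sentence-splitting heuristic and names are our own and simpler than the study's implementation.

```python
import re

PREFIXES = {
    "good": "Is social media good for society? Explain why social media is good for society:",
    "bad": "Is social media good for society? Explain why social media is bad for society:",
}
INFIXES = {
    "good": "One sentence continuing the essay explaining why social media is good:",
    "bad": "One sentence continuing the essay explaining why social media is bad:",
}

def build_prompt(participant_text: str, stance: str) -> str:
    """Concatenate the engineered prefix and the participant's text, with the infix
    inserted right before the last sentence; the result is hidden from participants."""
    text = participant_text.strip()
    sentences = re.split(r"(?<=[.!?])\s+", text)  # naive sentence splitter for illustration
    if len(sentences) > 1:
        text = " ".join(sentences[:-1]) + "\n" + INFIXES[stance] + "\n" + sentences[-1]
    return PREFIXES[stance] + "\n" + text
```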
3.4 Outcome measures and covariates
We collected different types of outcome measures to investigate interactions between participants’ opinions and the model opinion:
Opinion expressed in the post: To evaluate expressed opinion, we split participants' written texts into sentences and asked crowd workers to evaluate the opinion expressed in each sentence. Each crowd worker assessed 25 sentences, indicating whether each argued that social media is good for society, bad for society, or both good and bad. A fourth label was offered for sentences that argued neither or were unrelated. For example, "Social media also promotes cyber bullying which has led to an increase in suicides" (P#421) was labeled as arguing that social media is bad for society, while "Social media also helps to create a sense of community" (P#1169) was labeled as arguing that social media is good for society. We collected one to two labels for each sentence participants wrote, as well as labels for a sample of the writing assistant's suggestions. For sentences with multiple labels, the labels provided by different raters agreed 84.1% of the time (Cohen's κ = 0.76).
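For reference, raw agreement and Cohen's κ on the doubly labeled sentences can be computed as in the following sketch using scikit-learn; the labels shown are made up for illustration.

```python
from sklearn.metrics import cohen_kappa_score

# Hypothetical labels from two raters for the same six sentences:
# "good", "bad", "both", or "neither".
rater_1 = ["good", "bad", "bad", "both", "neither", "good"]
rater_2 = ["good", "bad", "both", "both", "neither", "bad"]

raw_agreement = sum(a == b for a, b in zip(rater_1, rater_2)) / len(rater_1)
kappa = cohen_kappa_score(rater_1, rater_2)  # chance-corrected agreement
print(f"Raw agreement: {raw_agreement:.1%}, Cohen's kappa: {kappa:.2f}")
```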
Real-time writing interaction data: We gathered comprehensive keystroke-level logs of how participants interacted with the model's suggestions. We recorded which text the participant had written, what text the model had suggested, and which suggestions participants had accepted from the writing assistant. We measured how long they paused to consider suggestions and how many suggestions they accepted.
Opinion survey (post-task): After finishing the writing task, participants completed an opinion survey. The central question, "Overall, would you say social media is good for society?", was designed to assess shifts in participants' attitude. This question was not shown immediately after the writing task to reduce demand effects. Secondary questions were asked to understand participants' opinions in more detail: "How does social media affect your relationships with friends and family?", "Does social media usage lead to mental health problems or addiction?", "Does social media contribute to the spread of false information and hate?", "Do you support or oppose government regulation of social media companies?" The questions were partially adapted from Morning Consult's National Tracking Poll [34]; answers were given on typical 3- and 5-point Likert scales.
User experience survey (post-task): Participants in the treatment groups completed a survey about their experience with the writing assistant following the opinion survey. They were asked, “How useful was the writing assistant to you?”, whether “The writing assistant understood what you wanted to say” and whether “The writing assistant was knowledgeable and had expertise.” To explore participants’ awareness of the writing assistant’s opinion and its effect on their own views, we asked them whether “The writing assistant’s suggestions were reasonable and balanced” and whether “The writing assistant inspired or changed my thinking and argument.” Answers were given on a 5-point Likert scale from “strongly agree” to “strongly disagree.” An open-ended question asked participants what they found most useful or frustrating about the writing assistant.
Covariates: We asked participants to self-report their age, gender, political leaning, and highest level of education at the end of the study. We also constructed a "model alignment" covariate estimating whether the opinion the model supported was aligned with the participant's opinion. We did not ask participants to report their overall judgment before the writing task to avoid commitment effects. Instead, we asked them at the end of the study whether they had believed social media was good for society before participating in the discussion. While imperfect, this provides a proxy for participants' pre-task opinions; it is biased by the treatment effect observed on this covariate, which amounted to 14% of its standard deviation.
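As an illustration of how such an alignment covariate could be coded, consider the following sketch; the coding scheme and names are hypothetical and simplified relative to the study's analysis.

```python
def model_alignment(pre_task_opinion: int, condition: str) -> int:
    """Hypothetical coding of model alignment.

    pre_task_opinion: retrospective self-report, coded -1 (social media is bad),
    0 (neutral), or +1 (good). Returns +1 when the assigned model argued the
    participant's side, -1 when it argued against it, and 0 otherwise
    (neutral participants or control group).
    """
    model_stance = {"techno_optimist": 1, "techno_pessimist": -1}.get(condition, 0)
    return model_stance * pre_task_opinion
```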
3.5 Participant recruitment
We recruited 1,506 participants (post-exclusion) for the writing task, corresponding to 507, 508, and 491 individuals in the control, techno-optimist, and techno-pessimist groups, respectively. The sample size was calculated to achieve 80% power to detect the effect size observed in the pilot studies for the post-task question, "Overall, would you say social media is good for society?" The sample was recruited through Prolific [84] and included US-based participants at least 18 years old (M=37.7, SD=14.2); 48.5% self-identified as female and 48.6% as male, 38 participants identified as non-binary, and eight preferred to self-describe or not disclose their gender identity. Six out of ten indicated liberal leanings; 57.1% had received at least a Bachelor's degree. Participants who failed the pre-task attention check (8%) were excluded. Six percent of participants admitted to the task did not finish it. We paid participants $1.50 for an average task time of 5.9 minutes, based on an hourly compensation rate of $15. For the labeling task, we recruited a similar sample of 500 participants through Prolific. The experimental protocols were approved by the Cornell University Institutional Review Board.
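For reference, a power calculation of this kind, assuming a two-sample comparison between one treatment group and the control, can be run as in the sketch below; the effect size is a placeholder rather than the pilot-study estimate.

```python
from statsmodels.stats.power import TTestIndPower

effect_size = 0.25  # placeholder Cohen's d; not the value observed in the pilot studies

n_per_group = TTestIndPower().solve_power(
    effect_size=effect_size,  # standardized mean difference between two groups
    alpha=0.05,               # two-sided significance level
    power=0.80,               # target power
)
print(f"Required participants per group: {n_per_group:.0f}")
```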
3.6 Data sharing
The experiment materials, analysis code, and data collected are publicly available through an Open Science repository (https://osf.io/upgqw/). A research assistant screened the data, and records with potentially privacy-sensitive information were removed before publication.
5 Discussion
The findings show that opinionated AI language technologies can affect what users write and think. In our study, participants assisted by an opinionated language model were more likely to support the model's opinion in a simulated social media post than control group participants who did not interact with a language model. Even participants who took five minutes to write their post (ample time to write the five required sentences) were significantly affected by the model's preferred view, showing that the influence cannot be explained by participants merely accepting suggestions out of convenience. Most importantly, the interactions with the opinionated model also led to opinion differences in a later attitude survey, suggesting that the differences in written opinion were associated with a shift in personal attitudes. We attribute the shifts in written opinion and post-task attitude to a new form of technology-mediated influence that we call latent persuasion by language models.
5.1 Theoretical interpretation
The literature on social influence and persuasion [92] provides ample evidence that our thoughts, feelings, and attitudes shift due to interaction with others. Our results demonstrate that co-writing with an opinionated language model similarly shifted people's writing and attitudes. We discuss below how latent persuasion by AI language technologies extends and differs from traditional social influence and conventional forms of technology-mediated persuasion [94]. We consider how the model's influence can be explained by discussing two possible vectors of influence inspired by social influence theory [92], informational and normative persuasion, and a third vector of influence that extends the nudge paradigm [42, 72] to the realm of opinions.
5.1.1 Informational influence.
The language model may have influenced participants' opinions by providing new information or compelling arguments, that is, through informational influence [81]. Some of the suggestions the language model provided may have made participants think about benefits or drawbacks of social media that they would not have considered otherwise, thus influencing their thinking. While the language model may have provided new information to writers in some cases, our secondary findings indicate that informational influence may not fully explain the observed shifts in opinion. First, the model influenced participants consistently throughout the writing process. Had the language model influenced participants' views through convincing arguments, one would expect a gradual or incremental change of opinion, as has been observed for human co-writers [63]. Further, our participants were largely unaware of the language model's skewed opinion and influence. This lack of awareness supports the idea that the model's influence operated not only through conscious processing of new information but also through subconscious [88] and intuitive processes [58].
5.1.2 Normative influence.
The language model may have shifted participants' views through normative influence [81]. Under normative influence, people adapt their opinions and behaviors based on a desire to fulfill others' expectations and gain acceptance. This explanation aligns with the computers are social actors paradigm [82], where the writing assistant may have been perceived as an independent social actor. People may have felt the need to reciprocate the language model's contributions, applying the social heuristics they apply in interactions with other humans. The normative influence explanation is supported by the finding that participants in our experiment attributed a high degree of expertise to the assistant (see Figure 8). The wider literature similarly suggests that people may regard AI systems as authoritative sources [2, 60, 76]. However, our experimental design presented the language model as a support tool and did not personify the assistant. An ad-hoc analysis of participants' comments on the assistant suggested that they did not feel obliged to reciprocate or comply with the model's suggestions, indicating that the strength of normative influence may have been limited.
5.1.3 Behavioral influence.
Large language models may affect people's views by changing behaviors related to opinion formation. The suggestions may have interrupted participants' thought processes and driven them to spend time evaluating the suggested argument [17, 27]. Similar to nudges, the suggestions changed participants' behavior, prompting them to consider the model's view and even accept it in their writing. According to self-perception theory [13], such changes in behavior may lead to changes in opinion. People who do not have strongly formed attitudes may infer their opinion from their own behavior. Even participants with pre-formed opinions on the topic may have changed their attitudes by being encouraged to communicate a belief that runs counter to their own [12, 99]. The finding that the model strongly influenced participants who accepted the model's suggestions frequently corroborates that some of the opinion influence operated through behavioral routes. The behavioral influence route implies that the user interface and interaction design of AI language systems mediate the model's influence, as they determine when, where, and how the generated opinions are presented.
We conclude that further research will be required to identify the mechanisms behind latent persuasion by language models. Our secondary findings suggest that the influence was at least partly subconscious and not simply due to the convenience and new information that the language model provided. Rather, co-writing with the language model may have changed participants’ opinion formation process on a behavioral level.
5.2 Implications for research and industry
Our results caution that interactions with opinionated language models affect users’ opinions, even if unintended. The results also show how simple it is to make models highly opinionated using accessible methods like prompt engineering. How can researchers, AI practitioners, and policymakers respond to this finding? We believe that our results imply that we must be more careful about the opinions we build into AI language technologies like GPT-3.
Prior work on the societal risks of large language models has warned that models learn stereotypes and biases from their training data [14, 28, 44] that may be amplified through widespread deployments [19]. Our work highlights the possibility that large language models reinforce not only stereotypes but all kinds of opinions, from whether social media is good to whether people should be vegetarians and who should be the next president. Initial tools have been developed for monitoring and mitigating generated text that is discriminatory [23, 54, 83] or otherwise offensive [7]. We have no comparable tools for monitoring the opinions built into large language models and into the text they generate during use. A first exploration of the opinions built into GPT-3 by Johnson et al. [57] suggests that the model's preferred views align with dominant US public opinion. In addition, a version of GPT trained on 4chan data led to controversy about the ideologies that training data should not contain. We need theoretical advancements and a broader democratic discourse on what kind of opinions a well-designed model should ideally generate.
Beyond unintentional opinion shifts through carelessly calibrated models, our results raise concerns about new forms of targeted opinion influence. If large language models affect users' opinions, their influence could be used for beneficial social interventions, like reducing polarization in hostile debates or countering harmful false beliefs. However, the persuasive power of AI language technology may also be leveraged by commercial and political interest groups to amplify views of their choice, such as a favorable assessment of a policy or product. In our experiment, we have explored the scenario of influence through a language-model-based writing assistant in an online discussion, but opinionated language models could be embedded in other applications like predictive keyboards, smart replies, and voice assistants. Like search engine and social media network operators [65], operators of these applications may choose to monetize the persuasive power of their technology.
As researchers, we can advance an early understanding of the mechanisms and dangers of latent persuasion through AI language technologies. Studies that investigate how latent persuasion differs from other sorts of influence and how it is mediated by design factors and users' traits, together with engineering work on how to measure and guide model opinions, can support product teams in reducing the risk of misuse and legislators in drafting policies that preempt harmful forms of latent persuasion.
5.3 Limitations and generalizability
As appropriate for an early study, our experiment has several limitations: We only tested whether a language model affected participants’ views on a single topic. We chose this topic as people had mixed views on it and were willing to deliberate. Whether our findings generalize to other topics, particularly where people hold strong entrenched opinions, needs to be explored in future studies. Further, we only looked at one specific implementation of a writing assistant powered by GPT-3. Interacting with different language models through other applications, such as a predictive keyboard that only suggests single words or an email assistant that handles entire correspondences, may lead to different influence outcomes.
Our results provide initial evidence that language models in writing assistance tasks affect users' views. How large this influence is compared to other types of influence, and to what extent its effects persist over time, will need to be explored in future studies. For this first experiment, we created a strongly opinionated model. In most cases, model opinions in deployed applications will be less definite than in our study and subject to chance variation. However, our design may also underestimate the opinion shifts that even weakly opinionated models could cause: in the experiment, participants interacted with the model only once, whereas people will regularly interact with deployed models over an extended period. Further, in real-world settings, people will not interact with models individually; millions will interact with the same model, and what they write with the model will be read by others. Finally, when language models insert their preferred views into people's writing, they increase the prevalence of their opinion in future training data, leading to even more opinionated future models.
5.4 Ethical considerations
The harm participants incurred through interacting with the writing assistant in our study was minimal. The opinion shift was likely transient, inconsequential, and not greater than shifts ordinarily encountered in advertising on the web and TV. Yet, given the weight of our research findings, we decided to share our results with all participants in a late educational debrief: in a private message, we invited crowd workers who had participated in the experiment and pilot studies to a follow-up task explaining our findings. We reminded participants of the experiment, explained the experimental design, and presented our results in understandable language. We also provided them with a link to a website with a nonpartisan overview of the pros and cons of social media and asked them for open-ended feedback on our experiment and results so they could voice potential concerns. 1,469 participants completed the educational debrief in a median time of 109 seconds, for which they received a bonus payment of $0.50. 839 participants provided open-ended comments; their feedback was exceptionally positive and is included in the Open Science repository.
Considering the broader ethical implications of our results, we are concerned about misuse. We have shown how simple it is to create highly opinionated models, and our results might motivate some to develop technologies that exploit the persuasive power of AI language technology. In disclosing a new vector of influence, we face ethical tensions similar to those of cybersecurity researchers: on the one hand, publicizing a new vector of influence increases the chance that someone will exploit it; on the other hand, only through public awareness and discourse can effective preventive measures be taken at the policy and development level. While risky, decisions to share vulnerabilities have led to positive developments in computer safety [77]. We hope our results will contribute to an informed debate and early mitigation of the risks of opinionated AI language technologies.