Reinforcement Learning-based Counter-Misinformation Response Generation: A Case Study of COVID-19 Vaccine Misinformation

The spread of online misinformation threatens public health, democracy, and the broader society. While professional fact-checkers form the first line of defense by fact-checking popular false claims, they do not engage directly in conversations with misinformation spreaders. On the other hand, non-expert ordinary users act as eyes-on-the-ground who proactively counter misinformation -- recent research has shown that 96% of counter-misinformation responses are made by ordinary users. However, research also found that two out of three of these responses are rude and lack evidence. This work seeks to create a counter-misinformation response generation model to empower users to correct misinformation effectively. This objective is challenging due to the absence of datasets containing ground-truth ideal counter-misinformation responses, and the lack of models that can generate responses backed by communication theories. In this work, we create two novel datasets of misinformation and counter-misinformation response pairs, one from in-the-wild social media and one crowdsourced from college-educated students. We annotate the collected data to distinguish poor from ideal responses that are factual, polite, and refute misinformation. We propose MisinfoCorrect, a reinforcement learning-based framework that learns to generate counter-misinformation responses for an input misinformation post. The model rewards the generator for increasing politeness, factuality, and refutation attitude while retaining text fluency and relevancy. Quantitative and qualitative evaluation shows that our model outperforms several baselines by generating high-quality counter-responses. This work illustrates the promise of generative text models for social good -- here, to help create a safe and reliable information ecosystem. The code and data are accessible at https://github.com/claws-lab/MisinfoCorrect.


INTRODUCTION
Online misinformation reduces trust in vaccines and health policies [7,46,68], leads to violence and harassment [6,84], questions democratic processes and elections [80-82], increases polarization [86], and harms well-being [97]. Most people receive information and news from social media [108], which is often "ground-zero" for health misinformation and where misinformation spreads faster and farther than truth [46,101]. COVID-19 vaccine misinformation, including false claims that the vaccine causes infertility, contains microchips, and even changes DNA and genes, has fueled vaccine hesitancy, reduced vaccine uptake, and prolonged the pandemic. Besides, misinformation also causes harm to people directly. For example, misinformation that Bill Gates created vaccines to depopulate people led to distrust and verbal attacks [25]. Thus, it is critical to curb the spread of online misinformation [13,28,47,50,58,112,120]. In this work, we use a broad definition of misinformation which includes falsehoods, inaccuracies, rumors, decontextualized truths, or misleading leaps of logic [45,115].
Professional fact-checkers and journalists provide objective fact-checks for viral claims and release their determinations on their websites, which are incredibly useful for creating detection models. However, fact-checkers do not actively engage with misinformation spreaders on social media platforms [58]. On the other hand, non-expert social media users, i.e., ordinary users or the crowd, act as eyes-on-the-ground who proactively question and counter misinformation, including emerging misinformation [9,58,60,84,94,98,119]. They complement fact-checkers, who can only verify a handful of stories after they have gone viral [3,42]. Recent evidence shows that 96% of counter-misinformation responses are made by ordinary users, while professionals account for the rest [58].
Alarmingly, linguistic analyses of in-the-wild crowd-generated counter-responses revealed that 2 out of 3 counter-misinformation posts are rude and do not use fact-checking evidence to support their counter-response [58]. Uncivil counter-responses can lead to reduced trust in the correcting user [23,93] and result in arguments [15,44,57]. This implies an urgent need to empower crowds so that they counter misinformation more effectively.
Thus, in this work, we seek to facilitate healthy misinformation correction by the crowd, which includes being objective, evidenced, and polite -- properties that have been shown to be effective [77,91]. To do so, we propose to create a counter-misinformation response generator, which generates a desirable counter-response for a misinformation post (as illustrated in Figure 1). Our study is focused on countering misinformation on Twitter, given its prominence in the spread of online misinformation.
Challenges. Generating effective counter-misinformation responses poses several challenges. First, there is no existing dataset containing pairs of annotated misinformation posts and counter-responses. Second, there is no counter-misinformation response generator model. The closest research, on fact-check generation [99], is non-conversational, and related research on counter-hate-speech/counter-argument generation [5,16,92,121] does not apply directly since it is not evidence-based or not specific to misinformation. Third, counter-misinformation responses are effective if they have the following desirable properties: they are objective and evidenced [77,91], make rational arguments [65], refute fallacies in reasoning [87], and are polite [55,85]. Off-the-shelf text generator models do not directly generate counter-responses with these desiderata. Fourth, bot-generated or template-based responses are not effective since they are non-personalized and non-contextualized with respect to the false claims made in the misinformation post. Thus, the counter-response needs to be relevant to the misinformation post.
Present work. We propose to create two novel datasets containing misinformation and counter-responses (solution to challenge 1) -- one collected from in-the-wild social media responses on Twitter and another created by crowd-sourcing from college students. We focus on four popular COVID-19 vaccine misinformation topics on Twitter (e.g., Bill Gates created vaccines to depopulate people [22,79], and vaccines can cause infertility [1,39], contain microchips [83], or alter DNA [52,62]). To create the in-the-wild dataset, for each misinformation topic, we collect all the replies to misinformation tweets identified in prior research [34]. We annotate the associated replies to identify the responses that counter the tweet, along with their textual attributes of refuting, politeness, and factuality. In total, we obtain 754 (misinformation tweet, counter-response) pairs. For the crowd-sourced response generation, we recruit and train 17 college students to write counter-misinformation replies when given misinformation posts. In total, we collect 591 crowdsourced replies.
Next, we propose a reinforcement learning-based framework, called MisinfoCorrect, that learns to generate counter-misinformation responses that are polite, evidenced, and refute misinformation (solutions to challenges 2 and 3). Specifically, this agent utilizes a policy network on a transformer-based language model adapted from GPT-2 [72]. During training, we reward generations that increase politeness and refutation attitude. Additionally, we ensure text fluency and relevancy to the misinformation post by adding fluency and relevance rewards in the reinforcement learning framework (solution to challenge 4).
MisinfoCorrect is evaluated against five representative baselines on the task of counter-misinformation response generation. Quantitative and qualitative experiments show that it outperforms the baselines by generating high-quality counter-responses.
To summarize, our contributions are as follows:
• We create two novel, annotated datasets containing misinformation and counter-response pairs from social media (in-the-wild) and generated via crowd-sourcing (in-lab). Together, both datasets contain 1,345 counter-misinformation responses.
• We propose a reinforcement learning-based counter-response generation framework, where the counter-response is especially rewarded for being polite, evidenced, and refuting misinformation.
• Results on actual COVID-19 vaccine misinformation conversations show that the proposed model outperforms existing representative baselines.

RELATED WORK
Social Correction of Misinformation by Non-Expert Ordinary Users
Recent studies have shown the remarkable effectiveness of social correction by non-expert users through interviews [12,43,94], surveys [43,96], and in-lab experiments [94]. This correction has been shown to be as effective as professional correction [77], curbs misinformation spread [17,24,113], and works across topics [8,9,11,103-105], platforms, and demographics [102,105-107]. Notably, users' polite and evidenced responses that refute misinformation are shown to effectively counter misinformation and reduce belief in misinformation [14,55,65,77,85,87,91]. Users correct others, typically friends [56], owing to a sense of social duty [24,30,61,66,96], anger, or guilt [88]. These works provide considerable evidence that correction by ordinary users is effective at countering misinformation and mitigating its spread. On the other hand, considering the limited capacity of professional fact-checkers, the large number of ordinary users and their efforts in social correction show great potential for a scalable solution to countering misinformation.

Analysis of Crowd-Generated Misinformation Flagging and Countering
Crowd-generated counter-misinformation complements fact-checking and correction by professionals -- the latter has already been studied extensively [32,33,67,77,110,117]. Emerging research has analyzed the role that non-experts play in flagging and countering misinformation. Twitter's Birdwatch [70] is a recently-launched platform that allows users to report and flag misinformation. Studies have analyzed the data from Twitter Birdwatch [4,21,60,70] and have shown how users actively engage to identify tweets that they believe are misleading and provide contextual notes to debunk them. Users have different levels of debunking capability. However, Birdwatch only allows users to flag misinformation and does not allow user-to-user communication and countering of misinformation on Twitter. Thus, user flagging behavior within the Birdwatch ecosystem is not representative of user behavior on the broader Twitter platform or on other social media platforms. Recent works by Micallef et al. [58,59] have analyzed how users counter misinformation in-the-wild on Twitter, Facebook, and Reddit. They showed that 96% of all counter-misinformation posts on Twitter are made by "ordinary citizens" [58] and that counter-misinformation behavior happens on multiple platforms [59]. Existing works, however, have not studied how to empower the crowd to counter and correct misinformation by generating effective responses.

Fact-Check Generation Methods
The goal of fact-check generation methods [99,100] is to respond to misinformation with a fact-checking URL. However, we consider the broader task of counter-response generation, where the response text has to be generated. Existing works [99,100] consider any post with a fact-checking URL to two websites (Snopes and PolitiFact) as a fact-checking response, which is an inaccurate assumption -- a fact-checking URL can be present to ridicule or oppose the fact-check [59] and can be taken out of context [59]. Importantly, only 1 out of 3 users use URL evidence when correcting misinformation [58], and YouTube is the most frequently used URL domain, instead of fact-checking sites [59]; consequently, studies relying only on fact-checking URLs are limited in their scope and do not learn from the majority of user-generated corrective posts. Our work overcomes these shortcomings by creating two novel datasets (Section 4), one using social media, including both URL and non-URL responses, and another using crowdsourced data collection. We further perform several manual annotation steps while creating the data to ensure only exact counter-responses are present in the data.

Counter-Hate and Counter-Argument Text Generation
Counter-hate [16,36,92,121] and counter-argument [5,37,40] text generation tasks are also related to our problem setting, where the generated text aims to refute the original post spreading hate or a generic argument, respectively. Some proposed models fine-tune large-scale unsupervised language models on hate-speech or argument text for text generation [75,92]. Other models first generate a set of candidate counter-hate/counter-argument replies, and then select one based on relevance to the original post in a generate-then-retrieve or identify-substitute manner [37,40,121]. Meanwhile, some related counter-hate/counter-argument datasets have also been released [40,71,74]. However, compared to counter-misinformation response generation, the task of counter-hate generation does not necessitate responses to be evidence-based. Similarly, counter-argument generation is a generic task (e.g., arguing whether immigration is good) and is not specific to misinformation. Additionally, large annotated and curated datasets exist for counter-hate and counter-argument generation [71,74], which is not the case for counter-misinformation generation.
To fill these gaps, we curate two novel datasets and propose a counter-misinformation generator that can refute misinformation while being polite and providing evidence.

PROBLEM DEFINITION
Given a misinformation post m, we aim to build a text generator G such that it can output a counter-response ĉ = G(m) that has certain desirable properties P.
The desirable properties of ĉ are motivated by research from social scientists, journalists, and psychologists on misinformation correction, which shows that counter-responses are effective if they are polite [55,85], objective and evidenced [77,91], make rational arguments [65], convey the competence of the commenter [65], and refute fallacies in reasoning [14,87]. More elaborately, the desirable properties include:
• Refuting: the response explicitly refutes the misinformation to correct the misinformation spreader. Explicitly and objectively refuting misinformation in the counter-response can reduce the misinformation's impact [91].
• Evidence: the response contains supporting sentences to back up the refutation. Evidence-based responses can more effectively debunk misleading claims and likely reduce the misinformation poster's belief [14]. More importantly, people are more willing to agree with a countering response when it is evidence-based [14].
• Politeness: the response is polite to avoid possible backfire.
When countering misinformation, uncivil responses can aggravate the misinformation poster, while it is more likely that the misinformation spreader favorably considers the true information when responses are polite [55,85].
Beyond these requirements specific to the misinformation correction domain, other textual properties are also required of the generated text:
• Fluency: the generated text should be fluent in expression such that it is natural for people to read and understand.
• Relevance: the response should be relevant to the misinformation post and ensure coherent expression.
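The problem setup above can be summarized as a small interface: a generator maps a post to a candidate response, and each desirable property acts as a scorer over (post, response) pairs. The sketch below is purely illustrative -- the names and the toy generator/scorers are our own, not the paper's implementation:

```python
from typing import Callable, Dict

# A generator G maps a misinformation post m to a candidate counter-response c_hat.
Generator = Callable[[str], str]

# Each desirable property P (refuting, evidence, politeness, fluency, relevance)
# is modeled as a scorer mapping (post, response) to a score in [0, 1].
PropertyScorers = Dict[str, Callable[[str, str], float]]

def evaluate_response(m: str, c_hat: str, scorers: PropertyScorers) -> Dict[str, float]:
    """Score a candidate counter-response against every desirable property."""
    return {name: scorer(m, c_hat) for name, scorer in scorers.items()}

# Toy generator and keyword-based scorers, for illustration only.
dummy_generator: Generator = lambda m: "That claim is false; please check the facts."
dummy_scorers: PropertyScorers = {
    "politeness": lambda m, c: 1.0 if "!" not in c else 0.5,
    "refuting": lambda m, c: 1.0 if "false" in c.lower() else 0.0,
}

post = "Vaccines alter DNA."
scores = evaluate_response(post, dummy_generator(post), dummy_scorers)
```

In MisinfoCorrect, these property scorers are realized as trained classifiers that later double as reward functions (Section 5).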

COUNTER-RESPONSE DATASETS: IN-THE-WILD AND CROWDSOURCED
We create two novel counter-response datasets, first containing in-the-wild social media counter-responses and second containing crowdsourced in-lab counter-responses.

Misinformation Topics
We focus on COVID-19 vaccine misinformation due to its impact across the world. We mainly choose four popular misinformation topics to which a large number of users have been exposed [1,22,39,52,58,62,79,83]. These misinformation topics gained popularity starting in December 2020, when the COVID-19 vaccines were approved by the FDA [79]:
• Bill Gates conspiracy theories [22,79]: conspiracies claiming that Bill Gates created the COVID-19 vaccine to depopulate people or that he holds the patents for the COVID-19 vaccine to profit from vaccine sales.
• COVID-19 vaccines cause infertility [1,39].
• COVID-19 vaccines contain microchips [83].
• COVID-19 vaccines alter DNA, or the vaccine is gene therapy [52].
To identify misinformation tweets, we first create a COVID-19 vaccine misinformation tweet classifier using BERT [20], based on the tweet annotations provided by Hayawi et al. [34]. This classifier achieves precision, recall, and F1 scores of 0.972, 0.979, and 0.975, respectively. Then, we use this classifier to classify all remaining non-annotated tweets, yielding 141,766 classified misinformation tweets. We crawl all their direct replies, resulting in 793,828 replies.
To create a high-quality dataset, we manually annotate all 1,655 classified misinformation tweets based on their textual content to remove false positives, and only retain original tweets (no retweets) with English-language content, as is common practice [38,58].
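As a quick consistency check, F1 is the harmonic mean of precision and recall, and the reported scores for the misinformation-tweet classifier satisfy this relation:

```python
def f1_score(precision: float, recall: float) -> float:
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Reported precision and recall of the BERT misinformation-tweet classifier.
f1 = f1_score(0.972, 0.979)  # ≈ 0.975, matching the reported F1
```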

Annotating Counter-Misinformation Replies and Training the Classifier
Naturally, not all responses to misinformation tweets counter it. Therefore, to develop a counter-response dataset, we use the following procedure.
Training a counter-response classifier: Since annotating all 11,970 responses manually is labor-intensive, we leverage existing work by Jiang et al. [41] to create a belief-versus-disbelief classifier for social media responses. Specifically, following their pipeline, we create the classifier using RoBERTa [51] and train it on their annotated responses. Since the topics of the original data and the trained classifier in Jiang et al. [41] differ from ours, we annotated additional responses. Specifically, two students annotated 500 randomly-selected responses from the unlabeled 11,970 responses, resulting in an inter-rater agreement score of 0.7033 measured by percent agreement. This gave 244 responses expressing belief and 118 expressing disbelief, while the remaining expressed neither belief nor disbelief. We used these annotated responses to fine-tune the disbelief classifier on our data and topic. Under five-fold cross-validation, the classifier achieved precision, recall, and F1 scores of 0.695, 0.687, and 0.691, respectively. Finally, we use the fine-tuned classifier to identify all potential disbelief replies among the 11,970 responses. This resulted in 2,852 responses classified as disbelief or counter-responses. Then, we manually verify all the classified responses through their textual content to remove all false positives. Finally, 754 true counter-responses are identified, which we use in our work.

Annotating Linguistic Properties of Counter-Responses.
Two students annotated 50 counter-responses as per the three desired properties [14,55,85]:
• Refuting: is the response explicitly rejecting the false claim or the misinformation spreader?
• Evidence: does the response contain evidence or supporting words or sentences to back up the counter-response?
• Politeness: is the reply rude, neutral, or polite, e.g., having a soft and friendly tone in its expression?
The measured inter-rater agreement score by percent agreement is 78%. Disagreements were discussed and a final label was given. Next, each annotator annotated the remaining counter-responses to assign final labels to them.
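Percent agreement, the inter-rater measure used here, is simply the fraction of items on which the two annotators assign identical labels; a minimal sketch with toy labels:

```python
def percent_agreement(labels_a, labels_b):
    """Fraction of items on which two annotators give the same label."""
    assert len(labels_a) == len(labels_b)
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return matches / len(labels_a)

# Toy example: two annotators agree on 7 of 9 politeness labels.
a = ["rude", "polite", "neutral", "rude", "polite", "polite", "neutral", "rude", "polite"]
b = ["rude", "polite", "neutral", "rude", "neutral", "polite", "neutral", "rude", "rude"]
agreement = percent_agreement(a, b)  # 7/9 ≈ 0.778
```

Unlike chance-corrected measures such as Cohen's kappa, percent agreement does not adjust for agreement expected by chance, which is why scores like 0.70-0.78 are interpreted directly as raw overlap.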
Finally, this results in 754 annotated (misinformation tweet, counter-response) pairs from 238 misinformation tweets. The distribution of the linguistic properties of the counter-responses is shown in Table 1. As per the statistics, in-the-wild counter-responses are of very low quality -- 38.19% of responses are rude, 75.99% do not have evidence, and 22.02% do not explicitly refute the misinformation. This indicates they may not be effective, and further reinforces the critical and timely need for our research to develop an effective counter-response generator.

Crowdsourced In-lab Counter-Misinformation Responses
The above statistics show that most in-the-wild responses are rude and lack evidence. As a result, it would be challenging to train an effective counter-response text generator using this data alone. We therefore recruit participants to write counter-responses in a controlled setting. Out of the applicants, 17 participants met the criteria of having at least a high-school education, being fluent in English, being highly familiar with online misinformation, and having seen online debunking.
Each subject is then provided written guidance about writing an effective counter-misinformation response, grounded in the existing literature [14,55,85]. Representative counter-misinformation examples, manually selected by the authors from the in-the-wild dataset (Section 4.2.3), are also shown. Each subject is given up to 50 randomly-selected COVID-19 vaccine misinformation tweets (from the in-the-wild social media dataset) identified in Section 4.2.1. These tweets span all four misinformation topics (Section 4.1) to ensure diverse responses from different subjects.
After filtering out 90 written responses that do not satisfy any desirable properties (Section 3), we finally created a high-quality counter-misinformation response dataset containing 591 crowd-generated responses. A representative example is shown below:
Misinformation Post: It's not a vaccine, it's gene therapy. Gene therapy is an experimental technique. It's the same technology used in cloning, DNA editing, and stem cell research.
In-the-wild Counter-response: You are born to speak nothing but lies.
Crowdsourced Counter-response: Sorry to see you think in this way. It is not correct. The vaccine is not gene therapy. Instead, it uses mRNA to generate spike protein to protect people. Please do not say the misinformation again.

MISINFOCORRECT: A COUNTER-RESPONSE GENERATION MODEL
Here we describe our proposed counter-response generation model that leverages the two datasets to generate counter-responses for a given misinformation post.The generated counter-responses should have the desirable properties described in Section 3.

A Reinforcement Learning Framework
We choose a reinforcement learning-based approach due to its success in a variety of controllable text generation tasks [49,78]. Specifically, we apply reinforcement learning (RL) on top of a GPT-2 transformer-based text generation model, since GPT-2 can generate quality text from a limited number of training examples, owing to its strong generation capability, and is widely used in text generation tasks [78]. With this design, we can bias the text generation process such that the generated counter-response is of high quality. Figure 2 presents an overview of MisinfoCorrect. Below we describe the components of the RL agent:

State:
The misinformation post provides the conversational context. The RL agent takes the misinformation post as input to enhance the quality of the counter-response text, so that the response is relevant to the misinformation claims. Formally, the state s ∈ S is the content of the misinformation post m, i.e., s = m.
Our policy uses a string containing m for the representation, as is also widely done in BERT-like models [20].

Action:
Given state s, the agent generates a candidate counter-response ĉ. This generation action is represented as a, lying in the action space A, a ∈ A, which is composed of all arbitrary-length sentences. Denoting the text generator as G, the action is a = G(s).

Policy:
The policy is based on the transformer language model with masked multi-head self-attention layers from GPT-2 [72,95]. The input is an encoded representation of the state s, and the output is the action a. The generation task is framed as a language modeling problem where the goal is to generate ĉ that maximizes the conditional probability P(ĉ | m). Using the transformer component of GPT-2, we first encode the input string m. Then, after transforming the encoded representation into a vocabulary-sized vector using a softmax layer, we obtain a probability distribution over all vocabulary tokens. Next, top-k sampling is applied to this probability distribution to sequentially output a sequence of tokens forming a sentence. When the sampling process selects the special end-of-sequence token, generation stops. This produces the candidate counter-response ĉ.
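The decoding loop described above (softmax over the vocabulary, top-k sampling until an end-of-sequence token) can be sketched with a toy vocabulary; the logit function, token ids, and EOS behavior below are illustrative only, not GPT-2's actual outputs:

```python
import math
import random

def top_k_sample(logits, k, rng):
    """Sample one token id from the top-k entries of a logit vector."""
    # Keep the k highest-scoring token ids.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Softmax restricted to the surviving logits (shifted for numerical stability).
    mx = max(logits[i] for i in top)
    weights = [math.exp(logits[i] - mx) for i in top]
    return rng.choices(top, weights=weights, k=1)[0]

def generate(step_logits_fn, k, eos_id, max_len, seed=0):
    """Sequentially sample tokens until the end-of-sequence token appears."""
    rng = random.Random(seed)
    tokens = []
    while len(tokens) < max_len:
        tok = top_k_sample(step_logits_fn(tokens), k, rng)
        if tok == eos_id:
            break
        tokens.append(tok)
    return tokens

# Toy 6-token vocabulary; token 5 is EOS and becomes very likely after 3 steps.
def toy_logits(prefix):
    base = [0.5, 2.0, 1.5, 1.0, -1.0, -5.0]
    if len(prefix) >= 3:
        base[5] = 10.0  # push EOS to the top of the distribution
    return base

out = generate(toy_logits, k=3, eos_id=5, max_len=10)
```

Because only the top-k tokens survive at each step, every emitted token is drawn from the k most probable candidates, which is what keeps sampled text on-distribution.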

Reward:
Research has shown that counter-misinformation responses are effective if they are polite, provide evidence, and explicitly refute the misinformation (Section 3). We design multiple novel reward functions to encourage the generated response to have these properties, while ensuring that the generated text is fluent, coherent, and relevant to the misinformation post. We describe the rewards below.
• Politeness Reward: Polite counter-responses are preferred (Section 3). We quantify the preference toward politeness as a politeness reward r_polite and create a politeness classifier C_polite using BERT [20] to measure the politeness of text, leveraging existing work [19]. The classifier, fine-tuned and tested on our data from Section 4, has a classification performance measured via precision, recall, and F1 of 0.8864, 0.9512, and 0.8001. The politeness reward is formally computed as r_polite = C_polite(ĉ).
• Refutation Reward: Counter-responses that explicitly refute the misinformation are more effective (Section 3). Thus, we define the refutation reward r_refute to reward actions that increase the refutation of ĉ and penalize actions that decrease it. Following similar disbelief and polarity classification research [2,41], we build a refutation classifier C_refute using BERT [20], which measures whether the text expresses refutation. However, distinct from Jiang et al. [41], who only use the response text for classification, we use both the tweet and the generated response as input, since the refutation relationship is better predicted by capturing the relative stance between the tweet and its response. We quantify the refutation reward as r_refute = C_refute(m, ĉ). In our experiments, the refutation classifier is first trained on the annotated data by Jiang et al. [41], then fine-tuned and tested on our data (Section 4), finally achieving reasonable performance with precision, recall, and F1 scores of 0.7917, 0.8085, and 0.7999, respectively.
• Evidence Reward: Responses containing evidence are more effective in countering misinformation [14]. Thus, we seek to generate responses that provide textual evidence. We do not seek to provide a fact-checking URL as evidence, since readers are unlikely to click through and read an external article from social media platforms [26,27].
To effectively quantify the presence of evidence in responses, we consider the counter-response content where the response counters the misinformation post with supporting and relevant sentences.
We create an evidence classifier C_evidence to predict whether the response provides evidence that counters the misinformation post. The classifier is trained by combining two sets of evidence-providing responses -- first, the in-the-wild social media counter-responses that contain evidence (Section 4.2.3), and second, the subset of crowdsourced responses (Section 4.3) with evidenced responses. Finally, we create a balanced dataset of 573 evidenced responses and 573 non-evidenced responses to train the classifier.
We use BERT [20] as the classifier, which takes both the post and the response as inputs in a pair-wise setting [73] to measure the post-response relationship. Under five-fold cross-validation, it achieves precision, recall, and F1 scores of 0.8864, 0.9512, and 0.9176. The output of the classifier is the evidence reward r_evidence, computed as r_evidence = C_evidence(m, ĉ).
• Fluency Reward: The agent needs to ensure that the response is fluent and grammatically correct. Thus, we want to reward actions that generate fluent outputs and penalize ones that result in non-fluent responses. To achieve this goal, following previous work [53], we design the fluency reward r_fluency as the inverse of the perplexity of the generated countering reply ĉ: r_fluency = 1 / PPL_GPT-2(ĉ), where PPL_GPT-2(ĉ) = P_GPT-2(ĉ)^(-1/n), P_GPT-2 is the GPT-2 language model for English, and n is the number of words in ĉ.
• Coherence Reward: Given a misinformation post, the generated response should be relevant to the post. We design a coherence reward r_coherence computed via the semantic similarity between m and ĉ: r_coherence = sim(m, ĉ), where sim measures the semantic similarity between two posts. In practice, we embed the two text pieces with the BERT model [20] and compute their cosine similarity.
Total reward: Finally, the total reward is r = α·r_polite + β·r_refute + γ·r_evidence + δ·r_fluency + ε·r_coherence, where α, β, γ, δ, and ε are weights indicating the importance of each reward.
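For concreteness, the inverse-perplexity fluency reward and the weighted combination of the five reward components can be sketched as follows; the toy per-token probabilities and the equal weights are illustrative assumptions, not the values used in the paper:

```python
import math

def fluency_reward(token_probs):
    """Inverse perplexity: PPL = (prod p_i) ** (-1/n), computed in log space
    for numerical stability. Higher values mean the language model finds
    the text more likely (i.e., more fluent)."""
    n = len(token_probs)
    log_ppl = -sum(math.log(p) for p in token_probs) / n
    return 1.0 / math.exp(log_ppl)

def total_reward(rewards, weights):
    """Weighted sum of the reward components (politeness, refutation,
    evidence, fluency, coherence)."""
    assert rewards.keys() == weights.keys()
    return sum(weights[k] * rewards[k] for k in rewards)

# If every token has probability 0.5, perplexity is exactly 2, reward 0.5.
r_fluency = fluency_reward([0.5, 0.5, 0.5, 0.5])

# Illustrative component scores combined with equal weights.
rewards = {"polite": 0.9, "refute": 0.8, "evidence": 0.6,
           "fluency": r_fluency, "coherence": 0.7}
weights = {k: 0.2 for k in rewards}
r_total = total_reward(rewards, weights)  # 0.2 * 3.5 = 0.7
```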

Optimization and Training
Warm-up start: We first use the pre-trained weights of DialoGPT [118] to initialize the weights of the transformer-based GPT-2 language model. Next, motivated by the warm-up approaches in reinforcement learning for dialogue generation by Li et al. [49], we apply a warm-start strategy on the paired data of misinformation posts and countering replies.

Reward Increment Training for Reinforcement Learning:
To train the agent in the reinforcement learning framework, we use the existing reward-increment training approach (i.e., REINFORCE), in which the non-negative factor, offset reinforcement, and characteristic eligibility are considered in the standard reinforcement learning setting [89]. In our setting, for simplicity, we consider the reward r of the generated post and the probability of generating this post given the misinformation post, P(ĉ | m). The loss function is then L(θ) = −r · log P_θ(ĉ | m), where θ is the set of model parameters. We use the log probability to facilitate computation, and the negative of the reward so that the conventional gradient descent approach can be deployed in experiments. Adam is used as the optimizer for model training [29].
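The loss above is a standard policy-gradient objective; a minimal numeric sketch (the reward and probability values are made up for illustration):

```python
import math

def reinforce_loss(reward, log_prob):
    """Policy-gradient loss L(theta) = -r * log P_theta(c_hat | m).
    Minimizing it with gradient descent pushes up the probability of
    high-reward responses and down that of low-reward ones."""
    return -reward * log_prob

# A response with total reward 0.8 generated with probability 0.5.
loss = reinforce_loss(0.8, math.log(0.5))

# If the model assigns a higher probability to the same high-reward
# response, the loss is smaller -- the desired training direction.
better = reinforce_loss(0.8, math.log(0.9))
```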

EXPERIMENTAL EVALUATION
We examine the performance of the proposed counter-misinformation response generation model. In particular, we focus on answering the following research questions:
• RQ1: Can the proposed model generate counter-misinformation responses of high quality with the desirable properties (Section 3)?
• RQ2: What is the impact of using in-the-wild data versus crowdsourced data on the generated text output?
• RQ3: What is the contribution of each component of the proposed method?
• RQ4: Is the generated text good as evaluated by humans?

Baselines
We compare our model with representative dialog generation baselines and prior work in fact-checking text generation:
• Fact-checking Text Generation (FC-GEN) [99]: The fact-checking text generation model takes in the tweets and replies and generates text using gated recurrent units.
Table 2: Performance comparison of counter-response generators when trained on social media and crowdsourced responses.

• DialoGPT [118]: A dialogue generation model built on GPT-2 framework and pre-trained on Reddit conversations.
• BART [48]: A large pre-trained language model framework for sequence-to-sequence text generation.
• Partner [78]: A reinforcement-learning-based text rewriting method.

Evaluation Metrics
To quantitatively evaluate the performance of the model, we use several metrics to measure both the effectiveness of the counter-response and the text quality, as follows:
• Politeness: We use the politeness classifier C_polite to test the level of politeness expressed in generated responses (Section 5.1.4).
• Refutation: We use the trained refutation classifier C_refute to measure the refutation score, as defined in Section 5.1.4.
• Evidence: We use the trained evidence classifier C_evidence (Section 5.1.4) to measure how much evidence the reply provides.
• Perplexity: Following previous research [18,53], we use the pre-trained GPT-2 language model to quantify perplexity, which evaluates text fluency.
• Relevance: Following previous research [116], we compute the semantic similarity using BERT [20] to capture the coherence between posts and generated responses.
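In practice, the relevance metric reduces to cosine similarity between the BERT embeddings of the post and the response. The similarity computation itself, sketched on toy vectors (real embeddings would come from a BERT encoder):

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

# Identical embeddings are maximally similar; orthogonal ones score zero.
same = cosine_similarity([0.2, 0.5, 0.1], [0.2, 0.5, 0.1])  # 1.0
ortho = cosine_similarity([1.0, 0.0], [0.0, 1.0])           # 0.0
```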

RQ1: Evaluation of the Proposed Model
We train all the models with counter-responses from both the social media dataset (Section 4.2) and the crowdsourced counter-responses (Section 4.3). Specifically, we create a "clean" social media dataset by only selecting counter-responses with at least one dimension among politeness, refutation, and evidence labeled as positive. This is because training with low-quality counter-responses leads to poor generation results. In addition, we use all crowdsourced counter-responses, as they are all manually verified to be polite, refuting, and evidenced.
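The "clean" selection rule above can be sketched as a simple filter. The record format and example texts below are hypothetical; in the paper, each social media response carries annotations along the three dimensions.

```python
# Hypothetical annotated social media responses: each carries a binary
# label for politeness, refutation, and evidence.
responses = [
    {"text": "That claim is false; see the official fact-check.",
     "polite": 1, "refuting": 1, "evidence": 1},
    {"text": "You are an idiot, that's wrong.",
     "polite": 0, "refuting": 1, "evidence": 0},
    {"text": "lol ok",
     "polite": 0, "refuting": 0, "evidence": 0},
]

def is_clean(r):
    # Keep a response if at least one dimension is labeled positive.
    return bool(r["polite"] or r["refuting"] or r["evidence"])

clean = [r for r in responses if is_clean(r)]
print(len(clean))  # prints: 2 (the "lol ok" response is filtered out)
```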
The results comparing the generation models are shown in Table 2. As can be seen, our proposed model generates the best counter-responses. Compared with the baselines, our model has the highest politeness, refutation, and evidence scores while still maintaining significantly lower perplexity and comparable relevance scores, ensuring high-quality text.

RQ2: Impact of Dataset Quality
Here we examine the impact of dataset quality on the quality of the generated responses. We train the models using only the "clean" social media responses (i.e., responses that are evidenced, refuting, neutral, or polite) and no crowdsourced counter-responses. The performance results are shown in Table 4. First, we observe that, compared to Table 2, the quality of responses generated by each model degrades. This highlights the importance of collecting crowdsourced data, which is of higher quality than social media data. Second, we note that our proposed model still generates the best counter-responses on all metrics except relevance, on which it performs second best.

RQ3: Ablation Study
We examine the contribution of the key components for effective counter-response generation (i.e., the politeness, refutation, and evidence rewards) in MisinfoCorrect on the social media and crowdsourced response data. We compare the following model variations when using RL:
• Base MisinfoCorrect model (Base): the basic GPT-2 model fine-tuned on our dataset in a dialogue manner, as in DialoGPT [118], but without using any rewards for training.
• Base + politeness reward: we only use the politeness reward.
• Base + refutation reward: we only use the refutation reward.
• Base + evidence reward: we only use the evidence reward.
• MisinfoCorrect model: this is the complete model with all the reward functions.
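The ablation contrasts single-reward models against the combined reward. A minimal sketch of such a combined reward is below; the classifier stubs, keyword rules, and equal weights are illustrative assumptions, not the paper's trained classifiers or exact reward formulation.

```python
# Stand-ins for the trained attribute classifiers (assumptions for
# illustration only -- the real ones are learned models).
def politeness_score(text):
    return 0.0 if "idiot" in text.lower() else 1.0

def refutation_score(text):
    return 1.0 if "not true" in text.lower() else 0.0

def evidence_score(text):
    return 1.0 if "cdc" in text.lower() else 0.0

def reward(text, w_polite=1.0, w_refute=1.0, w_evidence=1.0):
    """Combined reward: the RL generator is simultaneously encouraged to
    be polite, refuting, and evidence-backed (weights are assumed)."""
    return (w_polite * politeness_score(text)
            + w_refute * refutation_score(text)
            + w_evidence * evidence_score(text))

good = reward("This is not true; the CDC page explains why.")
bad = reward("You idiot, that's wrong.")
print(good, bad)  # the polite, refuting, evidenced response scores higher
```

Using only one reward corresponds to zeroing the other two weights, which is exactly the single-reward ablation variants above.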
The results are shown in Table 5. When we use only the politeness, refutation, or evidence reward function in the reinforcement learning framework, the corresponding politeness, refutation, or evidence score is the highest and shows a significant increase compared to the Base model without any reward. When all the reward functions are combined in the full MisinfoCorrect framework, there is a slight drop in each of the individual politeness, refutation, and evidence metrics, but the model still achieves the second-highest values along these dimensions.

RQ4: Qualitative Evaluation
Experimental Setup: In addition to the quantitative evaluation of response generation, we follow previous research [35] and conduct human evaluation experiments to qualitatively examine the model performance. In particular, we recruited 10 subjects following the same procedure described in the counter-response annotation process (Section 4.3). Each subject is presented with 30 data points, where each data point consists of one misinformation post and two counter-responses, and is then asked "which response is better when countering the misinformation post: the first, the second, or are they equally effective?". We test three settings: (1) the real counter-response versus the response generated by MisinfoCorrect; (2) the response generated by MisinfoCorrect versus the closest method, i.e., the fact-checking generator (FC-GEN) [99]; and (3) the response generated by MisinfoCorrect versus the most methodologically comparable baseline, i.e., DialoGPT [118]. We do not inform the subjects which response is generated by which method. Within each setting, we randomly select 50 data points for comparison, and each data point is annotated by two users. In the analysis of the results, we only keep the data points on which the two users provide the same label, i.e., disagreement cases are discarded. In total, we received 300 annotations in the human evaluation across the three settings. Ethics: This protocol was approved by Georgia Tech's IRB. Results: We obtain the following results: (1) Real response versus MisinfoCorrect: In 46 out of 50 cases, both annotators provided the same answer. Among these, the responses generated by MisinfoCorrect were preferred in 76% of cases, while in 6.5% of cases both responses were rated as equal. The real responses were preferred in the remaining cases.
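The agreement filter in the protocol above (keep only data points where both annotators give the same label) can be sketched as follows. The labels and counts are illustrative, not the study's actual annotations.

```python
# Each data point receives two annotations; possible labels are
# "ours" (MisinfoCorrect preferred), "other", or "equal".
pairs = [
    ("ours", "ours"), ("ours", "ours"), ("other", "other"),
    ("equal", "equal"), ("ours", "other"),  # disagreement -> discarded
]

# Keep only data points where both annotators agree.
agreed = [a for a, b in pairs if a == b]

# Fraction of agreed cases in which MisinfoCorrect was preferred.
prefer_ours = sum(1 for lab in agreed if lab == "ours") / len(agreed)
print(len(agreed), prefer_ours)  # prints: 4 0.5
```

The reported percentages (e.g., 76% preference out of 46 agreed cases) are computed over the agreed subset in exactly this way.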
From all three comparison settings, we can see that responses generated by MisinfoCorrect are preferred over the responses generated by the competing methods and over the real responses. A representative example in Table 3 also illustrates the difference between these models and real responses. Altogether, the qualitative results show the potential of MisinfoCorrect in a real application to empower users to counter misinformation.

DISCUSSION AND LIMITATIONS
Generalization across topics, languages, and entire conversations: While MisinfoCorrect only studied one topic (COVID-19 vaccine misinformation) on one platform (Twitter) and in one language (English), the proposed model is general. It can easily be adapted to other topics by providing topic-specific data and content from other platforms. Non-English or multi-lingual language models can be used to develop response generators beyond English. Additionally, our method only generates one direct response and does not generate entire conversations, which we leave to future work.
Intended use of the model: The model can be made available via a web portal or an API, where a user can input a misinformation post and our model will generate one or more counter-responses.
Counter-responses may lead to online arguments: One may wonder whether using the generated counter-responses can lead to online arguments. Our model is intended to encourage users who already voluntarily and proactively counter misinformation to do so in a polite and respectful manner; recall that 96% of all counter-misinformation responses are already generated by ordinary users, even though 2 out of 3 times their responses are rude and abusive. Since our model generates polite responses, it is less likely to lead to online fights.
Limitation of machine-based evaluation: The evaluation relying on classifiers has limits, as the classifiers themselves can be faulty. This may lead to inaccurate comparison results between models. More human evaluations are needed for a comprehensive comparison.

CONCLUSION
Overall, this work shows the potential to build on recent advancements in generative text models and use them for social good applications. In this work, we extended these models for counter-misinformation response generation. Our proposed model showed promise by generating responses that were qualitatively and quantitatively better than real responses and responses from other generators.
Future work lies in three directions: (i) deploying and evaluating the model in practice, (ii) collecting data from professional fact-checkers as expert-generated counter-responses and comparing the model performance against the current setup, and (iii) developing multi-lingual and multi-modal models to generate visual counter-responses.

A APPENDIX A.1 Data Annotation and Collection
The brief guide to writing counter-misinformation responses is:
Application Setting: On Twitter, when someone writes a misinformation tweet, we would like to write a reply to counter the misinformation so that we can mitigate its spread.
Guidance: Please write a response as if you were trying to engage with or counter the misinformation. When writing replies, you may want to consider the following: i. You may want to refute the tweet or express disagreement with it; ii. You may want to include supporting sentences, reasons, or evidence to make the reply reliable; iii. You may want to be polite, avoid confrontation, and avoid any impolite or rude expressions in the response.
One Example: Tweet: "The Biden "vaccine passport" is here. You either get a non-FDA approved experimental gene modification therapy (euphemistically called a "vaccine") or you'll be denied access to public transportation, sports venues, air travel and more. Obey or stay home." Reply: "To correct you: it is not a gene modification therapy, there is no proof of this nor a scientific rationale, mRNA does not integrate in the human genome. I am for freedom of choice and against vaccine passports but let's stick to the facts"
One Tweet You Write A Response To: Tweet: "It's not a vaccine, it's gene therapy. Gene therapy is an experimental technique. It's the same technology used in cloning, DNA editing, and stem cell research." Your Response: "Sorry to see you think in this way. It is not correct. The vaccine is not gene therapy. Instead, it uses mRNA to generate spike protein to protect people. Please do not say the misinformation again."

A.2 Experiment Details
Some experiment details are included here: • During our experiments, all methods are fine-tuned or trained from scratch on the annotated tweet-response pairs from the in-the-wild and crowdsourced datasets from Section 4.

Figure 1: An overview of the counter-misinformation response generation task.

Figure 2: An overview of the MisinfoCorrect framework.
We create an alternate dataset via crowdsourcing. Motivated by similar text generation for healthy and social good online communication [71,78,92], we recruit users familiar with Twitter to generate counter-misinformation responses that have the desired properties mentioned earlier in Section 3.
Ethics: This protocol was approved by Georgia Tech's IRB.
Procedure: We use the following three-step process. First, we recruited 20 college undergraduate and graduate students majoring in engineering domains in March 2022. During the screening, subjects provided background information including: (1) highest education level: high school, bachelors, masters, or doctorate; (2) fluency in English: basic, intermediate, or advanced (fluent or native speaker); (3) familiarity with the concept of online misinformation on Twitter: not familiar, somewhat familiar, or highly familiar; and (4) whether they had witnessed countering of misinformation online: yes or no.
Table 3 illustrates responses generated by the proposed model and the baselines. As we can see, compared to other methods, MisinfoCorrect generates text with the desirable properties.
Misinformation post: "It's not a vaccine, it's gene therapy. Gene therapy is an experimental technique. It's the same technology used in cloning, DNA editing, and stem cell research."
MisinfoCorrect (proposed): "This is not true. And, the vaccine is not gene therapy. It has nothing to do with cloning or DNA, and only uses mRNA for immunization goal. Please stop this misinformation."
DialoGPT: "This is so unbelievably wrong. It is not gene therapy. The vaccine does not change DNA."
FC-GEN: "It is misinformation. The vaccine is not gene therapy not gene therapy."

Table 3: Examples of counter-responses generated by the proposed and baseline methods.

Table 4: Performance comparison of counter-response generators when trained on social media responses only.