DOI: 10.1145/2872427.2883085

Disinformation on the Web: Impact, Characteristics, and Detection of Wikipedia Hoaxes

Published: 11 April 2016

Abstract

Wikipedia is a major source of information for many people. However, false information on Wikipedia raises concerns about its credibility. One way in which false information may be presented on Wikipedia is in the form of hoax articles, i.e., articles containing fabricated facts about nonexistent entities or events. In this paper we study false information on Wikipedia by focusing on the hoax articles that have been created throughout its history. We make several contributions. First, we assess the real-world impact of hoax articles by measuring how long they survive before being debunked, how many pageviews they receive, and how heavily they are referred to by documents on the Web. We find that, while most hoaxes are detected quickly and have little impact on Wikipedia, a small number of hoaxes survive for a long time and are well cited across the Web. Second, we characterize the nature of successful hoaxes by comparing them to legitimate articles and to failed hoaxes that were discovered shortly after being created. We find characteristic differences in terms of article structure and content, embeddedness into the rest of Wikipedia, and features of the editor who created the hoax. Third, we successfully apply our findings to address a series of classification tasks, most notably to determine whether a given article is a hoax. Finally, we describe and evaluate a task in which humans distinguish hoaxes from non-hoaxes. We find that humans are not particularly good at the task and that our automated classifier outperforms them by a wide margin.

Information

Published In

WWW '16: Proceedings of the 25th International Conference on World Wide Web
April 2016
1482 pages
ISBN: 9781450341431
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Sponsors

  • IW3C2: International World Wide Web Conference Committee

Publisher

International World Wide Web Conferences Steering Committee

Republic and Canton of Geneva, Switzerland

Author Tags

  1. detection
  2. disinformation
  3. hoax
  4. misinformation
  5. rumor
  6. wikipedia

Qualifiers

  • Research-article

Funding Sources

  • Stanford Data Science Initiative
  • Defense Advanced Research Projects Agency
  • SAP
  • Volkswagen
  • Boeing
  • Army Research Office
  • Yahoo
  • Wikimedia Research Fellowship
  • National Science Foundation
  • Facebook

Conference

WWW '16
Sponsor:
  • IW3C2
WWW '16: 25th International World Wide Web Conference
April 11 - 15, 2016
Montréal, Québec, Canada

Acceptance Rates

WWW '16 Paper Acceptance Rate 115 of 727 submissions, 16%;
Overall Acceptance Rate 1,899 of 8,196 submissions, 23%
