ABSTRACT
Social media sites are challenged by both the scale and variety of deviant behavior online. While algorithms can detect spam and obscenity, behaviors that break community guidelines on some sites are difficult to detect because they have multimodal subtleties (images and/or text). Identifying these posts is often relegated to a few moderators. In this paper, we develop a deep learning classifier that jointly models textual and visual characteristics of pro-eating disorder content that violates community guidelines. Using a million Tumblr photo posts, our classifier discovers deviant content efficiently while also maintaining high recall (85%). We use human sensitivity throughout to guide the creation, curation, and understanding of this approach to challenging, deviant content. We discuss how automation might impact community moderation, and the ethical and social obligations of this area.
Index Terms
Multimodal Classification of Moderated Online Pro-Eating Disorder Content