DOI: 10.1145/2872427.2883062
research-article

Abusive Language Detection in Online User Content

Online: 11 April 2016

ABSTRACT

Detection of abusive language in user-generated online content has become an issue of increasing importance in recent years. Most current commercial methods rely on blacklists and regular expressions; however, these measures fall short when contending with more subtle, less ham-fisted examples of hate speech. In this work, we develop a machine-learning-based method to detect hate speech in online user comments from two domains, which outperforms a state-of-the-art deep learning approach. We also develop a corpus of user comments annotated for abusive language, the first of its kind. Finally, we use our detection tool to analyze abusive language over time and in different settings to further enhance our knowledge of this behavior.
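
The limitation of blacklists and regular expressions mentioned above can be illustrated with a minimal sketch. This is not the method or data from the paper: the word list, the function name is_flagged, and the example comments are all hypothetical and chosen only to show how exact-match filtering behaves.

```python
import re

# Hypothetical blacklist for illustration only; not taken from the paper.
BLACKLIST = {"idiot", "moron"}

# One case-insensitive pattern that matches any listed word as a whole word.
PATTERN = re.compile(
    r"\b(?:" + "|".join(re.escape(w) for w in sorted(BLACKLIST)) + r")\b",
    re.IGNORECASE,
)


def is_flagged(comment: str) -> bool:
    """Return True if the comment contains a blacklisted word."""
    return PATTERN.search(comment) is not None


examples = [
    "You are an idiot.",                                # explicit term
    "You are an id1ot.",                                # obfuscated spelling
    "People like you should not be allowed to vote.",   # no listed term at all
]

for text in examples:
    print(is_flagged(text), text)
```

Only the first comment is flagged. The obfuscated spelling and the comment that contains no listed term both pass the filter, which is the kind of gap a learned classifier is meant to close.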

    • Published in

      WWW '16: Proceedings of the 25th International Conference on World Wide Web
      April 2016
      1482 pages
      ISBN:9781450341431

      Copyright © 2016 Copyright is held by the International World Wide Web Conference Committee (IW3C2)

      Publisher

      International World Wide Web Conferences Steering Committee

      Republic and Canton of Geneva, Switzerland

      Acceptance Rates

      WWW '16 Paper Acceptance Rate 115 of 727 submissions, 16%
      Overall Acceptance Rate 1,087 of 7,181 submissions, 15%
