research-article

Analyzing and Mining Comments and Comment Ratings on the Social Web

Published: 08 July 2014

Abstract

An analysis of the social video-sharing platform YouTube and the news aggregator Yahoo! News reveals vast amounts of community feedback, both as comments on published videos and news stories and as meta-ratings on those comments. This article presents an in-depth study of commenting and comment-rating behavior on a sample of more than 10 million user comments from YouTube and Yahoo! News. In this study, comment ratings are treated as first-class citizens: their dependencies on textual content, on the thread structure of comments, and on associated content (e.g., videos and their metadata) are analyzed to obtain a comprehensive understanding of community commenting behavior. Furthermore, the article explores the applicability of machine learning and data mining to detect comments that the community accepts, comments likely to trigger discussions, controversial and polarizing content, and users who exhibit offensive commenting behavior. The results of this study may help guide the design of community-oriented online discussion platforms.
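To make the idea of detecting community acceptance of comments more concrete, here is a minimal, hypothetical sketch. It is not the method described in the article. The labeling rule (a comment counts as "accepted" when it has more likes than dislikes), the tokenizer, the names `label_accepted` and `NaiveBayesCommentClassifier`, and the choice of a multinomial Naive Bayes classifier over bag-of-words features are all illustrative assumptions. The toy comments are invented and do not come from the study's data.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def label_accepted(likes, dislikes):
    """Hypothetical labeling rule (an assumption, not from the article):
    a comment is 'accepted' if it received more likes than dislikes."""
    return likes > dislikes


class NaiveBayesCommentClassifier:
    """Hypothetical multinomial Naive Bayes over bag-of-words features,
    using Laplace (add-one) smoothing."""

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        self.total_words = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def log_scores(self, text):
        """Return the unnormalized log-posterior for each class."""
        n_docs = sum(self.doc_counts.values())
        v = len(self.vocab)
        scores = {}
        for c in self.classes:
            score = math.log(self.doc_counts[c] / n_docs)  # log prior
            for token in tokenize(text):
                if token not in self.vocab:
                    continue  # skip words never seen during training
                count = self.word_counts[c][token]
                score += math.log((count + 1) / (self.total_words[c] + v))
            scores[c] = score
        return scores

    def predict(self, text):
        """Return the class with the highest log-posterior."""
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


# Invented toy comments with (likes, dislikes) counts; not data from the study.
comments = [
    ("great video thanks for sharing", 12, 1),
    ("really helpful explanation thanks", 8, 0),
    ("this is stupid and you are an idiot", 0, 9),
    ("worst video ever idiot", 1, 7),
]
texts = [text for text, _, _ in comments]
labels = [label_accepted(likes, dislikes) for _, likes, dislikes in comments]
clf = NaiveBayesCommentClassifier().fit(texts, labels)

print(clf.predict("thanks great explanation"))  # True
print(clf.predict("you idiot"))                 # False
```

Naive Bayes is used here only because it fits in a few self-contained lines. A real study would need far more data, a held-out evaluation, and a principled definition of acceptance derived from the observed rating distributions.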

    • Published in

      ACM Transactions on the Web, Volume 8, Issue 3 (June 2014)
      256 pages
      ISSN: 1559-1131
      EISSN: 1559-114X
      DOI: 10.1145/2639948

      Copyright © 2014 ACM

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 8 July 2014
      • Accepted: 1 March 2014
      • Revised: 1 November 2013
      • Received: 1 September 2012

      Qualifiers

      • research-article
      • Research
      • Refereed
