Abstract
An analysis of the social video sharing platform YouTube and the news aggregator Yahoo! News reveals the presence of vast amounts of community feedback through comments for published videos and news stories, as well as through metaratings for these comments. This article presents an in-depth study of commenting and comment rating behavior on a sample of more than 10 million user comments on YouTube and Yahoo! News. In this study, comment ratings are considered first-class citizens. Their dependencies on the textual content, the thread structure of comments, and associated content (e.g., videos and their metadata) are analyzed to obtain a comprehensive understanding of community commenting behavior. Furthermore, this article explores the applicability of machine learning and data mining to detect acceptance of comments by the community, comments likely to trigger discussions, controversial and polarizing content, and users exhibiting offensive commenting behavior. Results from this study have potential application in guiding the design of community-oriented online discussion platforms.
Analyzing and Mining Comments and Comment Ratings on the Social Web