research-article

Sentiment-Focused Web Crawling

Published: 06 November 2014

Abstract

Sentiments and opinions expressed in Web pages towards objects, entities, and products constitute an important portion of the textual content available on the Web. In the last decade, the analysis of such content has gained importance due to its high potential for monetization. Despite the vast interest in sentiment analysis, somewhat surprisingly, the discovery of sentimental or opinionated Web content has been mostly ignored. This work aims to fill this gap and addresses the problem of quickly discovering and fetching the sentimental content present on the Web. To this end, we design a sentiment-focused Web crawling framework. In particular, we propose different sentiment-focused Web crawling strategies that prioritize discovered URLs based on their predicted sentiment scores. Through simulations, these strategies are shown to achieve considerable performance improvement over general-purpose Web crawling strategies in the discovery of sentimental Web content.
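The core idea of prioritizing discovered URLs by predicted sentiment score can be illustrated with a small best-first frontier. The sketch below is only an illustration under stated assumptions, not the strategies evaluated in the article: the word list `SENTIMENT_WORDS`, the anchor-text scorer `predict_sentiment`, the in-memory link graph `TOY_WEB`, and the `crawl` function are all hypothetical names introduced here, and the abstract does not say which features or predictors the authors use.

```python
import heapq
from typing import Callable, Iterable, List, Set, Tuple

# Hypothetical toy lexicon (assumption); the abstract does not describe
# the predictor actually used in the article.
SENTIMENT_WORDS = {"love", "hate", "great", "awful", "best", "worst"}


def predict_sentiment(anchor_text: str) -> float:
    """Return the fraction of anchor-text tokens found in the toy lexicon."""
    tokens = anchor_text.lower().split()
    if not tokens:
        return 0.0
    return sum(t in SENTIMENT_WORDS for t in tokens) / len(tokens)


def crawl(
    seeds: Iterable[str],
    get_links: Callable[[str], List[Tuple[str, str]]],
    budget: int,
) -> List[str]:
    """Fetch up to `budget` pages, always taking the highest-scored URL next.

    `get_links(url)` stands in for downloading and parsing a page; it
    returns (target_url, anchor_text) pairs.
    """
    frontier: List[Tuple[float, int, str]] = []  # min-heap of (-score, order, url)
    seen: Set[str] = set()
    order = 0  # insertion counter, breaks ties in discovery order
    for url in seeds:
        heapq.heappush(frontier, (-1.0, order, url))  # seeds get top priority
        seen.add(url)
        order += 1

    fetched: List[str] = []
    while frontier and len(fetched) < budget:
        _, _, url = heapq.heappop(frontier)
        fetched.append(url)
        for link, anchor in get_links(url):
            if link not in seen:  # enqueue each URL at most once
                seen.add(link)
                heapq.heappush(frontier, (-predict_sentiment(anchor), order, link))
                order += 1
    return fetched


# Hypothetical in-memory link graph standing in for the Web (assumption).
TOY_WEB = {
    "seed": [("a", "company press release"),
             ("b", "worst phone ever"),
             ("c", "great hotel review")],
    "a": [],
    "b": [("d", "love it")],
    "c": [],
    "d": [],
}

crawl_order = crawl(["seed"], lambda u: TOY_WEB[u], budget=4)
```

With a budget of four pages, the sketch visits `seed`, `b`, `d`, and `c` and leaves the neutral page `a` unfetched, whereas a breadth-first crawler would fetch `a` before `d`. Scoring each URL once at discovery time keeps the example short; a real crawler might rescore a URL as more in-links to it are found.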

    • Published in

      ACM Transactions on the Web  Volume 8, Issue 4
      October 2014
      178 pages
      ISSN:1559-1131
      EISSN:1559-114X
      DOI:10.1145/2686863

      Copyright © 2014 ACM

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 6 November 2014
      • Accepted: 1 July 2014
      • Revised: 1 May 2014
      • Received: 1 July 2013

      Qualifiers

      • research-article
      • Research
      • Refereed
