research-article

Measuring, Characterizing, and Detecting Facebook Like Farms

Published: 20 September 2017

Abstract

Online social networks offer convenient ways to reach large audiences. In particular, Facebook pages are increasingly used by businesses, brands, and organizations to connect with multitudes of users worldwide. As the number of likes of a page has become a de facto measure of its popularity and profitability, an underground market of services that artificially inflate page likes ("like farms") has emerged alongside Facebook's official targeted advertising platform. Nonetheless, apart from a few media reports, little work has systematically analyzed how Facebook pages are promoted. Aiming to fill this gap, we present a honeypot-based comparative measurement study of page likes garnered via Facebook advertising and from popular like farms. First, we analyze likes based on demographic, temporal, and social characteristics and find that some farms appear to be operated by bots and make little effort to hide the nature of their operations, while others follow a stealthier approach, mimicking regular users' behavior. Next, we examine fraud detection algorithms currently deployed by Facebook and show that they perform poorly at detecting stealthy farms that spread likes over longer timespans and like popular pages to mimic regular users. To overcome these limitations, we investigate the feasibility of timeline-based detection of like farm accounts, characterizing the content that Facebook accounts generate on their timelines as an indicator of genuine versus fake social activity. We analyze a wide range of features extracted from timeline posts, which we group into two main categories: lexical and non-lexical. We find that, compared to normal users, like farm accounts tend to re-share content more often, use fewer words and a poorer vocabulary, and more often generate duplicate comments and likes. Using relevant lexical and non-lexical features, we build a classifier to detect like farm accounts that achieves precision above 99% and recall of 93%.
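The abstract names several account-level signals: re-sharing frequency, words per post, vocabulary richness, and duplicate content. The sketch below is a rough illustration only. It computes simple proxies for a few of these signals from a list of timeline posts. The `Post` type, the feature names, and the use of the type-token ratio as a vocabulary measure are all hypothetical assumptions made for this example. They are not the authors' feature definitions, and duplicate likes are not modeled.

```python
from __future__ import annotations

import re
from collections import Counter
from dataclasses import dataclass


# Hypothetical record for one timeline item (an assumption, not the paper's schema).
@dataclass
class Post:
    text: str          # textual content of the post or comment
    is_reshare: bool   # True if the item re-shares someone else's content


# Simple word tokenizer: runs of word characters or apostrophes.
WORD_RE = re.compile(r"[\w']+")


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens."""
    return WORD_RE.findall(text.lower())


def timeline_features(posts: list[Post]) -> dict[str, float]:
    """Compute illustrative lexical and non-lexical proxies for one account."""
    n = len(posts)
    if n == 0:
        # No activity: return neutral zeros rather than dividing by zero.
        return {
            "reshare_ratio": 0.0,
            "avg_words_per_post": 0.0,
            "type_token_ratio": 0.0,
            "duplicate_ratio": 0.0,
        }

    tokens_per_post = [tokenize(p.text) for p in posts]
    all_tokens = [tok for toks in tokens_per_post for tok in toks]

    # Non-lexical: fraction of items that are re-shares.
    reshare_ratio = sum(p.is_reshare for p in posts) / n

    # Lexical: average post length in words.
    avg_words = len(all_tokens) / n

    # Lexical: distinct words / total words (lower suggests a poorer vocabulary).
    ttr = len(set(all_tokens)) / len(all_tokens) if all_tokens else 0.0

    # Duplicates: posts whose normalized text repeats an earlier post.
    # Empty posts (e.g., bare re-shares) are ignored here.
    counts = Counter(" ".join(toks) for toks in tokens_per_post if toks)
    duplicates = sum(c - 1 for c in counts.values())

    return {
        "reshare_ratio": reshare_ratio,
        "avg_words_per_post": avg_words,
        "type_token_ratio": ttr,
        "duplicate_ratio": duplicates / n,
    }


if __name__ == "__main__":
    sample = [
        Post("Great page, like it!", is_reshare=False),
        Post("great page like it", is_reshare=False),
        Post("check this out", is_reshare=True),
        Post("", is_reshare=True),
    ]
    print(timeline_features(sample))
```

For the sample timeline, the re-share ratio is 0.5 and the duplicate ratio is 0.25, because the first two posts normalize to the same text. Feature vectors like these could then be passed to any standard supervised classifier. This abstract does not specify which model or which full feature set the paper actually uses.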



            • Published in

              ACM Transactions on Privacy and Security, Volume 20, Issue 4
              November 2017
              150 pages
              ISSN: 2471-2566
              EISSN: 2471-2574
              DOI: 10.1145/3143524

              Copyright © 2017 ACM

              Publisher

              Association for Computing Machinery

              New York, NY, United States

              Publication History

              • Published: 20 September 2017
              • Accepted: 1 June 2017
              • Revised: 1 March 2017
              • Received: 1 July 2016


              Qualifiers

              • research-article
              • Research
              • Refereed
