Abstract
Online social networks offer convenient ways to reach out to large audiences. In particular, Facebook pages are increasingly used by businesses, brands, and organizations to connect with multitudes of users worldwide. As the number of likes of a page has become a de facto measure of its popularity and profitability, an underground market of services artificially inflating page likes (“like farms”) has emerged alongside Facebook’s official targeted advertising platform. Nonetheless, besides a few media reports, there is little work that systematically analyzes Facebook pages’ promotion methods. Aiming to fill this gap, we present a honeypot-based comparative measurement study of page likes garnered via Facebook advertising and from popular like farms. First, we analyze likes based on demographic, temporal, and social characteristics and find that some farms seem to be operated by bots and do not really try to hide the nature of their operations, while others follow a stealthier approach, mimicking regular users’ behavior. Next, we look at fraud detection algorithms currently deployed by Facebook and show that they do not work well to detect stealthy farms that spread likes over longer timespans and like popular pages to mimic regular users. To overcome their limitations, we investigate the feasibility of timeline-based detection of like farm accounts, focusing on characterizing content generated by Facebook accounts on their timelines as an indicator of genuine versus fake social activity. We analyze a wide range of features extracted from timeline posts, which we group into two main categories: lexical and non-lexical. We find that, compared to normal users, like farm accounts tend to re-share content more often, use fewer words and a poorer vocabulary, and more often generate duplicate comments and likes. Using relevant lexical and non-lexical features, we build a classifier to detect like farm accounts that achieves precision higher than 99% and recall of 93%.
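To make the timeline-based signals above more concrete, here is a minimal Python sketch of how such features might be computed for one account, together with the precision and recall metrics used to report classifier quality. The post format (`text`, `is_share`), the function names, and the four feature names are assumptions made for illustration; they are not the feature set or implementation used in the paper.

```python
from collections import Counter


def timeline_features(posts):
    """Compute illustrative timeline features for one account.

    `posts` is a hypothetical list of dicts with keys 'text' (str)
    and 'is_share' (bool); this schema is an assumption.
    """
    n = len(posts)
    if n == 0:
        return {"share_ratio": 0.0, "avg_words": 0.0,
                "vocab_richness": 0.0, "duplicate_ratio": 0.0}

    # Tokenize naively on whitespace; real lexical analysis is more involved.
    tokenized = [p["text"].lower().split() for p in posts]
    all_words = [w for words in tokenized for w in words]

    # Count repeated non-empty posts (each extra copy counts as one duplicate).
    counts = Counter(" ".join(words) for words in tokenized if words)
    duplicates = sum(c - 1 for c in counts.values())

    return {
        # Non-lexical: fraction of posts that are re-shares.
        "share_ratio": sum(1 for p in posts if p["is_share"]) / n,
        # Lexical: average number of words per post.
        "avg_words": len(all_words) / n,
        # Lexical: distinct words divided by total words.
        "vocab_richness": len(set(all_words)) / len(all_words) if all_words else 0.0,
        # Fraction of posts that repeat an earlier post verbatim.
        "duplicate_ratio": duplicates / n,
    }


def precision_recall(y_true, y_pred):
    """Precision and recall for binary labels (1 = like farm account)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```

Feature dictionaries like these could be fed to any standard classifier. Under the findings stated above, like farm accounts would be expected to show a higher `share_ratio` and `duplicate_ratio` and a lower `avg_words` and `vocab_richness` than normal users.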
Index Terms
Measuring, Characterizing, and Detecting Facebook Like Farms