Abstract
Semi-automatic anti-spam algorithms propagate either trust through links from a good seed set (e.g., TrustRank) or distrust through inverse links from a bad seed set (e.g., Anti-TrustRank) to the entire Web. These algorithms have proven effective in combating link-based Web spam because they integrate both human judgement and machine intelligence. Nevertheless, there is still much room for improvement. One issue with most existing trust/distrust propagation algorithms is that only trust or only distrust is propagated, and only a good seed set or only a bad seed set is used. According to Wu et al. [2006a], combining trust and distrust propagation can lead to better results, and an effective framework is needed to realize this insight. Another, more serious issue is that existing algorithms propagate trust or distrust in nondifferential ways; that is, a page propagates its trust or distrust score uniformly to its neighbors, without considering whether each neighbor should be trusted or distrusted. Such blind propagation schemes are inconsistent with the original intention of trust/distrust propagation. However, it seems impossible to implement differential propagation if only trust or only distrust is propagated. In this article, we take the view that each Web page has both a trustworthy side and an untrustworthy side, and we thus assign two scores to each Web page: T-Rank, scoring the trustworthiness of the page, and D-Rank, scoring the untrustworthiness of the page. We then propose an integrated framework that propagates both trust and distrust. In the framework, the propagation of T-Rank/D-Rank is penalized by the target's current D-Rank/T-Rank. In other words, the propagation of T-Rank/D-Rank is decided by the target's current (generalized) probability of being trustworthy/untrustworthy; thus a page propagates more trust/distrust to a trustworthy/untrustworthy neighbor than to an untrustworthy/trustworthy neighbor.
In this way, the propagation of both trust and distrust with target differentiation is implemented. We use T-Rank scores to realize spam demotion and D-Rank scores to accomplish spam detection. The proposed Trust-DistrustRank (TDR) algorithm reduces to TrustRank when the penalty factor is set to 1 and to Anti-TrustRank when it is set to 0. Thus TDR can be seen as a combinatorial generalization of both TrustRank and Anti-TrustRank. TDR not only makes full use of both trust and distrust propagation, but also overcomes the disadvantages of both TrustRank and Anti-TrustRank. Experimental results on benchmark datasets show that TDR outperforms other semi-automatic anti-spam algorithms for both spam demotion and spam detection tasks under various criteria.
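The abstract describes the propagation scheme only in words, so the following is a minimal sketch under stated assumptions rather than the authors' implementation. It assumes a per-target weight of the form w_t(p) = βt(p) / (βt(p) + (1-β)d(p)), with β playing the role of the penalty factor, and a PageRank-style damping factor `alpha`. The function name `tdr`, its parameters, and the fallback used when a page has neither trust nor distrust yet are all hypothetical choices made for illustration.

```python
def tdr(out_links, good_seeds, bad_seeds, beta=0.5, alpha=0.85, iters=50):
    """Toy T-Rank/D-Rank propagation with target differentiation.

    Assumed update (not given in the abstract), for each page p:
      w_t(p) = beta*t(p) / (beta*t(p) + (1-beta)*d(p)),  w_d(p) = 1 - w_t(p)
      t(p) = alpha * w_t(p) * sum_{q -> p} t(q)/outdeg(q) + (1-alpha)*s_t(p)
      d(p) = alpha * w_d(p) * sum_{p -> q} d(q)/indeg(q)  + (1-alpha)*s_d(p)
    """
    # Collect every page that appears as a source or as a target.
    nodes = set(out_links)
    for targets in out_links.values():
        nodes.update(targets)
    out = {p: list(out_links.get(p, [])) for p in nodes}
    inc = {p: [] for p in nodes}
    for q in nodes:
        for p in out[q]:
            inc[p].append(q)

    # Seed vectors: uniform mass on the good seeds and on the bad seeds.
    s_t = {p: 1.0 / len(good_seeds) if p in good_seeds else 0.0 for p in nodes}
    s_d = {p: 1.0 / len(bad_seeds) if p in bad_seeds else 0.0 for p in nodes}
    t, d = dict(s_t), dict(s_d)

    for _ in range(iters):
        new_t, new_d = {}, {}
        for p in nodes:
            denom = beta * t[p] + (1.0 - beta) * d[p]
            # Assumption: a page with no scores yet receives the prior
            # weight beta, so propagation can start from the seeds.
            w_t = beta * t[p] / denom if denom > 0 else beta
            w_d = 1.0 - w_t
            # Trust flows along links; distrust flows along inverse links.
            trust_in = sum(t[q] / len(out[q]) for q in inc[p])
            distrust_in = sum(d[q] / len(inc[q]) for q in out[p])
            new_t[p] = alpha * w_t * trust_in + (1.0 - alpha) * s_t[p]
            new_d[p] = alpha * w_d * distrust_in + (1.0 - alpha) * s_d[p]
        t, d = new_t, new_d
    return t, d


# Hypothetical toy graph: g1/g2 form a good cluster, s1/s2 a spam cluster.
graph = {"g1": ["g2"], "g2": ["g1", "x"], "s1": ["s2", "x"], "s2": ["s1"], "x": []}
t_rank, d_rank = tdr(graph, good_seeds={"g1"}, bad_seeds={"s1"})
```

Under these assumptions, setting `beta=1.0` makes every trust weight equal to 1, so the T-Rank update becomes a plain TrustRank iteration, and setting `beta=0.0` makes every distrust weight equal to 1, so the D-Rank update becomes a plain Anti-TrustRank iteration. This mirrors the reduction stated in the abstract, although the exact formula in the article may differ.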
References
- L. Becchetti, C. Castillo, D. Donato, R. Baeza-Yates, and S. Leonardi. 2008. Link analysis for web spam detection. ACM Trans. Web 2, 1, 2.
- L. Becchetti, C. Castillo, D. Donato, S. Leonardi, and R. Baeza-Yates. 2006. Using rank propagation and probabilistic counting for link-based spam detection. In Proceedings of the Workshop on Web Mining and Web Usage Analysis (WebKDD'06). ACM Press, New York.
- A. A. Benczur, K. Csalogany, T. Sarlos, and M. Uher. 2005. Spamrank -- Fully automatic link spam detection. In Proceedings of the 1st International Workshop on Adversarial Information Retrieval on the Web (AIRWeb'05). 25--38.
- P. Boldi. 2005. Totalrank: Ranking without damping. In Proceedings of the 14th International Conference on World Wide Web Special Interest Tracks and Posters (WWW'05). ACM Press, New York, 898--899.
- S. Brin and L. Page. 1998. The anatomy of a large-scale hypertextual web search engine. Comput. Netw. ISDN Syst. 30, 1--7, 107--117.
- J. Callan, M. Hoy, C. Yoo, and L. Zhao. 2009. The clueweb09 data set. http://boston.lti.cs.cmu.edu/Data/clueweb09/.
- C. Castillo and B. D. Davison. 2011. Adversarial web search. Foundat. Trends Inf. Retr. 4, 5, 377--486.
- J. Caverlee and L. Liu. 2007. Countering web spam with credibility-based link analysis. In Proceedings of the 26th Annual ACM Symposium on Principles of Distributed Computing (PODC'07). ACM Press, New York, 157--166.
- Q. Chen, S.-N. Yu, and S. Cheng. 2008. Link variable trustrank for fighting web spam. In Proceedings of the International Conference on Computer Science and Software Engineering (CSSE'08). Vol. 4, IEEE Computer Society, 1004--1007.
- G. Cormack, M. Smucker, and C. Clarke. 2011. Efficient and effective spam filtering and re-ranking for large web datasets. Inf. Retr. 14, 1--25.
- B. D. Davison. 2000. Recognizing nepotistic links on the web. In Proceedings of the Workshop on Artificial Intelligence for Web Search (AAAI'00). 23--28.
- A. Deif. 1982. Advanced Matrix Theory for Scientists and Engineers. Routledge.
- Z. Gyongyi and H. Garcia-Molina. 2005a. Link spam alliances. In Proceedings of the 31st International Conference on Very Large Data Bases (VLDB'05). VLDB Endowment, 517--528.
- Z. Gyongyi and H. Garcia-Molina. 2005b. Web spam taxonomy. In Proceedings of the 1st International Workshop on Adversarial Information Retrieval on the Web (AIRWeb'05). 39--47.
- Z. Gyongyi, H. Garcia-Molina, and J. Pedersen. 2004. Combating web spam with trustrank. In Proceedings of the 13th International Conference on Very Large Data Bases (VLDB'04). Vol. 30, VLDB Endowment, 576--587.
- M. R. Henzinger, R. Motwani, and C. Silverstein. 2002. Challenges in web search engines. SIGIR Forum 36, 2, 11--22.
- P. Heymann, G. Koutrika, and H. Garcia-Molina. 2007. Fighting spam on social web sites: A survey of approaches and future challenges. IEEE Internet Comput. 11, 6, 36--45.
- Q. Jiang, L. Zhang, Y. Zhu, and Y. Zhang. 2008. Larger is better: Seed selection in link-based anti-spamming algorithms. In Proceeding of the 17th International Conference on World Wide Web (WWW'08). ACM Press, New York, 1065--1066.
- J. M. Kleinberg. 1999. Authoritative sources in a hyperlinked environment. J. ACM 46, 604--632.
- V. Krishnan and R. Raj. 2006. Web spam detection with anti-trust rank. In Proceedings of the 2nd International Workshop on Adversarial Information Retrieval on the Web (AIRWeb'06). ACM Press, New York, 37--40.
- R. Lempel and S. Moran. 2001. Salsa: The stochastic approach for link-structure analysis. ACM Trans. Inf. Syst. 19, 2, 131--160.
- X. Liu, Y. Wang, S. Zhu, and H. Lin. 2013. Combating web spam through trust-distrust propagation with confidence. Pattern Recogn. Lett. 34, 13, 1462--1469.
- P. Metaxas. 2009. Using propagation of distrust to find untrustworthy web neighborhoods. In Proceedings of the 4th International Conference on Internet and Web Applications and Services (ICIW'09). IEEE Computer Society, 516--521.
- L. Nie, B. Wu, and B. Davison. 2007. Winnowing wheat from the chaff: Propagating trust to sift spam from the web. In Proceedings of the 30th Annual ACM SIGIR International Conference on Research and Development in Information Retrieval (SIGIR'07). Vol. 23, 869--870.
- C. Silverstein, H. Marais, M. Henzinger, and M. Moricz. 1999. Analysis of a very large web search engine query log. SIGIR Forum 33, 6--12.
- N. Spirin and J. Han. 2012. Survey on web spam detection: Principles and algorithms. ACM SIGKDD Explor. Newslett. 13, 2, 50--64.
- I. Witten, E. Frank, L. Trigg, M. Hall, G. Holmes, and S. Cunningham. 1999. Weka: Practical machine learning tools and techniques with Java implementations. http://researchcommons.waikato.ac.nz/bitstream/handle/10289/1040/uow-cs-wp-1999-11.pdf?sequence=1&isAllowed=y.
- B. Wu and K. Chellapilla. 2007. Extracting link spam using biased random walks from spam seed sets. In Proceedings of the 3rd International Workshop on Adversarial Information Retrieval on the Web (AIRWeb'07). ACM Press, New York, 37--44.
- B. Wu, V. Goel, and B. D. Davison. 2006a. Propagating trust and distrust to demote web spam. In Proceedings of the Workshop on Models of Trust for the Web (MTW'06).
- B. Wu, V. Goel, and B. D. Davison. 2006b. Topical trustrank: Using topicality to combat web spam. In Proceedings of the 15th International Conference on World Wide Web (WWW'06). ACM Press, New York, 63--72.
- Yahoo!. 2007. Yahoo! research: Web spam collections. http://barcelona.research.yahoo.net/webspam/datasets/. Crawled by the Laboratory of Web Algorithmics, University of Milan, http://law.dsi.unimi.it/.
- L. Zhang, Y. Zhang, Y. Zhang, and X. Li. 2006. Exploring both content and link quality for anti-spamming. In Proceedings of the 6th IEEE International Conference on Computer and Information Technology (CIT'06). IEEE Computer Society, 37.
- X. Zhang, B. Han, and W. Liang. 2009b. Automatic seed set expansion for trust propagation based anti-spamming algorithms. In Proceeding of the 11th International Workshop on Web Information and Data Management (WIDM'09). ACM Press, New York, 31--38.
- X. Zhang, Y. Wang, N. Mou, and W. Liang. 2011. Propagating both trust and distrust with target differentiation for combating web spam. In Proceedings of the 25th AAAI Conference on Artificial Intelligence (AAAI'11).
- Y. Zhang, Q. Jiang, L. Zhang, and Y. Zhu. 2009b. Exploiting bidirectional links: Making spamming detection easier. In Proceeding of the 18th ACM Conference on Information and Knowledge Management (CIKM'09). ACM Press, New York, 1839--1842.
- L. Zhao, Q. Jiang, and Y. Zhang. 2008. From good to bad ones: Making spam detection easier. In Proceedings of the 8th IEEE International Conference on Computer and Information Technology Workshops (CIT'08). IEEE Computer Society, 129--134.
- B. Zhou and J. Pei. 2009. Link spam target detection using page farms. ACM Trans. Knowl. Discov. Data 3, 13:1--13:38.
Propagating Both Trust and Distrust with Target Differentiation for Combating Link-Based Web Spam