DOI: 10.1145/1244408.1244413
Article

Transductive link spam detection

Published: 08 May 2007

ABSTRACT

Web spam can significantly degrade the quality of search engine results. Early web spamming techniques mainly manipulated page content. Since linkage information is widely used in web search, link-based spamming has also developed. Many techniques have been proposed to detect link spam, and those approaches are essentially built on link-based web ranking methods.

In contrast, we cast the link spam detection problem into a machine learning problem of classification on directed graphs. We develop discrete analysis on directed graphs, and construct a discrete analogue of classical regularization theory via discrete analysis. A classification algorithm for directed graphs is then derived from the discrete regularization. We have applied the approach to real-world link spam detection problems, and encouraging results have been obtained.
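
The abstract does not spell out the algorithm, so the following is only a minimal sketch of what transductive classification on a directed graph can look like. It assumes the regularization framework of Zhou, Huang, and Schölkopf ("Learning from labeled and unlabeled data on a directed graph", ICML 2005), which may differ from the method derived in this paper. The function names, the teleport probability `eta`, the regularization weight `alpha`, the iteration counts, and the toy graph are all illustrative assumptions.

```python
import math

def teleporting_walk(adj, eta=0.85):
    """Row-stochastic matrix of a random walk that follows an out-link with
    probability eta and jumps to a uniformly random page otherwise."""
    n = len(adj)
    P = [[(1.0 - eta) / n] * n for _ in range(n)]
    for u, outs in enumerate(adj):
        if outs:
            for v in outs:
                P[u][v] += eta / len(outs)
        else:
            # Page without out-links: spread the remaining mass uniformly.
            for v in range(n):
                P[u][v] += eta / n
    return P

def stationary_distribution(P, iters=200):
    """Power iteration for the stationary distribution pi with pi = pi P."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[u] * P[u][v] for u in range(n)) for v in range(n)]
    return pi

def classify(adj, labels, alpha=0.9, iters=500):
    """Score every page; positive means 'normal', negative means 'spam'.

    labels maps a few page indices to +1 (normal) or -1 (spam).
    The scores converge to (1 - alpha) * (I - alpha * Theta)^{-1} y, where
    Theta = (Pi^{1/2} P Pi^{-1/2} + Pi^{-1/2} P^T Pi^{1/2}) / 2.
    """
    n = len(adj)
    P = teleporting_walk(adj)
    pi = stationary_distribution(P)
    root = [math.sqrt(p) for p in pi]  # teleporting keeps every pi[u] > 0
    theta = [[0.5 * (root[u] * P[u][v] / root[v] + root[v] * P[v][u] / root[u])
              for v in range(n)] for u in range(n)]
    y = [float(labels.get(i, 0)) for i in range(n)]  # 0 for unlabeled pages
    f = y[:]
    for _ in range(iters):
        # Smooth the scores along the graph while staying close to the labels.
        f = [alpha * sum(theta[u][v] * f[v] for v in range(n))
             + (1.0 - alpha) * y[u] for u in range(n)]
    return f

# Toy graph (hypothetical): pages 0-2 link in one cycle, pages 3-5 in another.
adj = [[1], [2], [0], [4], [5], [3]]
labels = {0: +1, 3: -1}  # one known normal page, one known spam page
scores = classify(adj, labels)
predicted = ["normal" if s > 0 else "spam" for s in scores]
print(predicted)
```

In this toy graph each unlabeled page takes the sign of the labeled page in its own cycle, because the regularizer penalizes score differences between pages that the random walk connects strongly. The fixed-point iteration is used here only to avoid a matrix inverse; a direct linear solve would serve equally well on small graphs.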

