DOI: 10.1145/3394486.3403302 · KDD Conference Proceedings
Research Article · Open Access

Grale: Designing Networks for Graph Learning

Published: 20 August 2020

ABSTRACT

How can we find the right graph for semi-supervised learning? In real world applications, the choice of which edges to use for computation is the first step in any graph learning process. Interestingly, there are often many types of similarity available to choose as the edges between nodes, and the choice of edges can drastically affect the performance of downstream semi-supervised learning systems. However, despite the importance of graph design, most of the literature assumes that the graph is static.

In this work, we present Grale, a scalable method we have developed to address the problem of graph design for graphs with billions of nodes. Grale operates by fusing together different measures of (potentially weak) similarity to create a graph which exhibits high task-specific homophily between its nodes. Grale is designed to run on large datasets. We have deployed Grale in more than 20 different industrial settings at Google, including datasets which have tens of billions of nodes and hundreds of trillions of potential edges to score. By employing locality-sensitive hashing techniques, we greatly reduce the number of pairs that need to be scored, allowing us to learn a task-specific model and build the associated nearest-neighbor graph for such datasets in hours, rather than the days or even weeks that might be required otherwise.
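The key scalability idea described above, using locality-sensitive hashing (LSH) to avoid scoring all pairs, can be illustrated with a minimal sketch. This is not Grale's actual implementation (the paper does not publish its code); it is a generic random-hyperplane LSH scheme in which items that hash to the same bucket become candidate pairs, and only those candidates would be passed to the learned pairwise similarity model:

```python
import itertools
from collections import defaultdict

import numpy as np

def lsh_buckets(features, n_planes=8, seed=0):
    """Hash each row of `features` to a bucket via random-hyperplane LSH.

    Points on the same side of all random hyperplanes share a sign
    pattern, so only pairs within a bucket need to be scored, which is
    typically a tiny fraction of the quadratic number of possible pairs.
    """
    rng = np.random.default_rng(seed)
    planes = rng.normal(size=(features.shape[1], n_planes))
    signs = features @ planes > 0  # (n_items, n_planes) boolean sign bits
    buckets = defaultdict(list)
    for idx, bits in enumerate(signs):
        buckets[bits.tobytes()].append(idx)
    return buckets

def candidate_pairs(buckets):
    """Yield only the item pairs that share an LSH bucket."""
    for members in buckets.values():
        yield from itertools.combinations(members, 2)

# Toy data: two pairs of coincident points pointing in opposite directions.
X = np.array([[1.0, 0.0], [1.0, 0.0], [-1.0, 0.0], [-1.0, 0.0]])
pairs = set(candidate_pairs(lsh_buckets(X)))
print(sorted(pairs))  # only within-cluster pairs survive as candidates
```

In a full pipeline along the lines the abstract describes, each surviving candidate pair would then be scored by a model trained to predict task-specific similarity, and high-scoring pairs become edges of the nearest-neighbor graph.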

We illustrate this through a case study where we examine the application of Grale to an abuse classification problem on YouTube with hundreds of millions of items. In this application, we find that Grale detects a large number of malicious actors on top of hard-coded rules and content classifiers, increasing the total recall by 89% over those approaches alone.


Supplemental Material

3394486.3403302.mp4

Presentation Video.

