
Approximation Algorithms for Submodular Data Summarization with a Knapsack Constraint

Published: 22 February 2021

Abstract

Data summarization, i.e., selecting representative subsets of manageable size out of massive data, is often modeled as a submodular optimization problem. Although numerous algorithms exist for submodular optimization, many of them incur large computational overhead and are therefore unsuitable for mining big data. In this work, we consider the fundamental problem of (non-monotone) submodular function maximization with a knapsack constraint, and propose simple yet effective and efficient algorithms for it. Specifically, we propose a deterministic algorithm with approximation ratio 6 and a randomized algorithm with approximation ratio 4, and show that both of them can be accelerated to achieve nearly linear running time at the cost of weakening the approximation ratio by an additive factor of ε. We then consider a more restrictive setting without full access to the whole dataset, and propose streaming algorithms with approximation ratios of 8+ε and 6+ε that make one pass and two passes over the data stream, respectively. As a by-product, we also propose a two-pass streaming algorithm with an approximation ratio of 2+ε when the considered submodular function is monotone. To the best of our knowledge, our algorithms achieve the best performance bounds among state-of-the-art approximation algorithms with efficient implementations for the same problem. Finally, we evaluate our algorithms in two concrete submodular data summarization applications, revenue maximization in social networks and image summarization, and the empirical results show that our algorithms outperform the existing ones in terms of both effectiveness and efficiency.
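To make the problem setting concrete, the following is a minimal sketch of submodular maximization under a knapsack constraint. It is not one of the algorithms proposed in the paper and carries none of the approximation guarantees stated above; it is a generic density-greedy baseline with a best-single-item fallback. The objective (a graph cut function, which is submodular and non-monotone), the toy graph, the item costs, and all function names are assumptions chosen for illustration only.

```python
from typing import Callable, Dict, Hashable, Iterable, List, Set, Tuple

Item = Hashable


def cut_value(selected: Set[Item], edges: List[Tuple[Item, Item]]) -> int:
    """Count edges with exactly one endpoint in `selected`.

    The cut function is a standard example of a non-monotone submodular function.
    """
    return sum(1 for u, v in edges if (u in selected) != (v in selected))


def density_greedy(
    items: Iterable[Item],
    cost: Dict[Item, float],
    budget: float,
    f: Callable[[Set[Item]], float],
) -> Set[Item]:
    """Illustrative baseline (assumes strictly positive costs).

    Repeatedly add the affordable item with the largest positive marginal
    gain per unit cost, then return the better of the greedy set and the
    best affordable single item.
    """
    all_items = list(items)
    remaining = set(all_items)
    chosen: Set[Item] = set()
    spent = 0.0
    while True:
        base = f(chosen)
        best_item, best_density = None, 0.0
        for x in sorted(remaining, key=repr):  # deterministic tie-breaking
            if spent + cost[x] > budget:
                continue  # item does not fit in the remaining budget
            density = (f(chosen | {x}) - base) / cost[x]
            if density > best_density:
                best_item, best_density = x, density
        if best_item is None:
            break  # nothing affordable improves the objective
        chosen.add(best_item)
        remaining.discard(best_item)
        spent += cost[best_item]

    # Fallback: compare against the best affordable singleton.
    singles = [{x} for x in all_items if cost[x] <= budget]
    best_single = max(singles, key=f, default=set())
    return chosen if f(chosen) >= f(best_single) else best_single


# Toy instance: a path graph a-b-c-d with hypothetical costs and budget.
edges = [("a", "b"), ("b", "c"), ("c", "d")]
cost = {"a": 1.0, "b": 2.0, "c": 2.0, "d": 1.0}
budget = 3.0

summary = density_greedy(cost.keys(), cost, budget, lambda s: cut_value(s, edges))
print(sorted(summary), cut_value(summary, edges))
```

On this toy instance the selected set stays within the budget. In general, a plain greedy of this kind can perform poorly on non-monotone objectives, which is part of the motivation for algorithms with provable approximation ratios such as those the abstract describes.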

Kai Han et al. Proc. ACM Meas. Anal. Comput. Syst., Vol. 5, No. 1, Article 5. Publication date: March 2021.
