Abstract
Data summarization, i.e., selecting a representative subset of manageable size from massive data, is often modeled as a submodular optimization problem. Although many algorithms for submodular optimization exist, most of them incur large computational overheads and are therefore unsuitable for mining big data. In this work, we consider the fundamental problem of (non-monotone) submodular function maximization under a knapsack constraint, and propose simple, effective, and efficient algorithms for it. Specifically, we propose a deterministic algorithm with approximation ratio 6 and a randomized algorithm with approximation ratio 4, and show that both can be accelerated to run in nearly linear time at the cost of an additive ε in the approximation ratio. We then consider a more restrictive setting in which the whole dataset cannot be accessed at once, and propose streaming algorithms with approximation ratios of 8+ε and 6+ε that make one pass and two passes over the data stream, respectively. As a by-product, we also propose a two-pass streaming algorithm with an approximation ratio of 2+ε for the case where the submodular function is monotone. To the best of our knowledge, our algorithms achieve the best performance bounds among state-of-the-art approximation algorithms with efficient implementations for the same problem. Finally, we evaluate our algorithms on two concrete submodular data summarization applications, revenue maximization in social networks and image summarization, and the empirical results show that our algorithms outperform existing ones in both effectiveness and efficiency.
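To make the problem setting concrete, the following is a minimal sketch of knapsack-constrained submodular maximization on a toy coverage objective. It is not one of the algorithms proposed in this work: it is the classic density-greedy heuristic combined with a best-single-element check, shown only as an illustration. The names `density_greedy`, `coverage`, `COVERS`, and `COSTS`, as well as the toy data, are assumptions introduced here, and no approximation guarantee is claimed for this sketch in the non-monotone case.

```python
from typing import Callable, Dict, Set

# Hypothetical toy instance: each item covers some topics and has a cost.
COVERS: Dict[str, Set[int]] = {
    "a": {1, 2, 3},
    "b": {3, 4},
    "c": {5},
    "d": {1, 2, 3, 4, 5, 6},
}
COSTS: Dict[str, float] = {"a": 3.0, "b": 1.0, "c": 1.0, "d": 4.0}


def coverage(selected: Set[str]) -> float:
    """Number of distinct topics covered; a monotone submodular function."""
    covered: Set[int] = set()
    for item in selected:
        covered |= COVERS[item]
    return float(len(covered))


def density_greedy(
    ground: Set[str],
    f: Callable[[Set[str]], float],
    cost: Dict[str, float],
    budget: float,
) -> Set[str]:
    """Illustrative baseline: pick by marginal gain per unit cost."""
    selected: Set[str] = set()
    spent = 0.0
    remaining = set(ground)
    while remaining:
        base = f(selected)
        best, best_density = None, 0.0
        for e in remaining:
            if spent + cost[e] > budget:
                continue  # adding e would violate the knapsack constraint
            density = (f(selected | {e}) - base) / cost[e]
            if density > best_density:
                best, best_density = e, density
        if best is None:
            break  # no feasible element has positive marginal gain
        selected.add(best)
        spent += cost[best]
        remaining.discard(best)
    # Density greedy alone can miss one large, expensive element,
    # so compare against the best feasible singleton.
    singles = [e for e in ground if cost[e] <= budget]
    if singles:
        top = max(singles, key=lambda e: f({e}))
        if f({top}) > f(selected):
            return {top}
    return selected


result = density_greedy(set(COVERS), coverage, COSTS, budget=4.0)
print(result, coverage(result))
```

On this toy instance the density rule first picks the cheap items "b" and "c" (value 3), after which the single item "d" (value 6) no longer fits; the singleton comparison recovers it, which is why the check is included.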