Abstract
Modern networked systems are increasingly reconfigurable, enabling demand-aware infrastructures whose resources can be adjusted according to the workload they currently serve. Such dynamic adjustments can be exploited to improve network utilization and hence performance, by moving frequently interacting communication partners closer, e.g., collocating them in the same server or datacenter. However, dynamically changing the embedding of workloads is algorithmically challenging: communication patterns are often not known ahead of time, but must be learned. During the learning process, overheads related to unnecessary moves (i.e., re-embeddings) should be minimized. This paper studies a fundamental model that captures the tradeoff between the benefits and costs of dynamically collocating communication partners on $\ell$ servers, in an online manner. Our main contribution is a distributed online algorithm that is asymptotically almost optimal, i.e., it almost matches the lower bound (also derived in this paper) on the competitive ratio of any (distributed or centralized) online algorithm. As an application, we show that our algorithm can be used to solve a distributed union-find problem in which the sets are stored across multiple servers.
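To make the union-find application concrete, the following is a minimal sketch of a union-find structure whose sets are each stored on one of several servers. It is not the algorithm of this paper. The class name `PlacedUnionFind`, the round-robin initial embedding, the rule of moving the smaller set to the server of the larger one, and the cost of one unit per moved element are all assumptions made for illustration, and server capacities are ignored.

```python
class PlacedUnionFind:
    """Union-find whose sets each live on one server (illustrative only).

    Assumed cost model: merging two sets stored on different servers
    moves the smaller set and costs one unit per moved element.
    Server capacities are ignored in this sketch.
    """

    def __init__(self, num_elements, num_servers):
        self.parent = list(range(num_elements))
        self.size = [1] * num_elements
        # Assumed initial embedding: elements spread round-robin.
        # After merges, only the entry of a set's root is meaningful.
        self.server = [i % num_servers for i in range(num_elements)]
        self.moves = 0  # total number of element moves so far

    def find(self, x):
        # Path halving: point every other node at its grandparent.
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def server_of(self, x):
        # The server that stores the whole set containing x.
        return self.server[self.find(x)]

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        # Keep the larger set in place; ra becomes the surviving root.
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        if self.server[ra] != self.server[rb]:
            self.moves += self.size[rb]  # re-embed the smaller set
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]


uf = PlacedUnionFind(num_elements=6, num_servers=3)
uf.union(0, 1)  # different servers: one element moves
uf.union(2, 0)  # the singleton {2} joins the larger set {0, 1}
uf.union(0, 3)  # element 3 already sits on the same server: no move
print(uf.moves, uf.server_of(3))
```

Moving the smaller set is a common heuristic that bounds how often any single element is re-embedded; a competitive online algorithm additionally has to respect server capacities, which this sketch does not attempt.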
Efficient Distributed Workload (Re-)Embedding
SIGMETRICS '19: Abstracts of the 2019 SIGMETRICS/Performance Joint International Conference on Measurement and Modeling of Computer Systems