Efficient Distributed Workload (Re-)Embedding

Published: 26 March 2019

Abstract

Modern networked systems are increasingly reconfigurable, enabling demand-aware infrastructures whose resources can be adjusted according to the workload they currently serve. Such dynamic adjustments can be exploited to improve network utilization and hence performance, by moving frequently interacting communication partners closer together, e.g., collocating them in the same server or datacenter. However, dynamically changing the embedding of workloads is algorithmically challenging: communication patterns are often not known ahead of time, but must be learned. During the learning process, overheads related to unnecessary moves (i.e., re-embeddings) should be minimized. This paper studies a fundamental model which captures the tradeoff between the benefits and costs of dynamically collocating communication partners on $\ell$ servers, in an online manner. Our main contribution is a distributed online algorithm which is asymptotically almost optimal, i.e., it almost matches the lower bound (also derived in this paper) on the competitive ratio of any (distributed or centralized) online algorithm. As an application, we show that our algorithm can be used to solve a distributed union-find problem in which the sets are stored across multiple servers.
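
To make the union-find application concrete, the following is a minimal illustrative sketch, not the algorithm of the paper. It assumes a simplified setting in which every set lives entirely on one server, servers have unlimited capacity, and moving one element costs one unit. It uses the classic "move the smaller set to the larger one" heuristic to collocate merged sets. The class name `ColocatingUnionFind`, the `placement` input format, and the `moves` counter are hypothetical names introduced only for this illustration.

```python
class ColocatingUnionFind:
    """Toy disjoint-set structure in which each set is stored on one server.

    Assumptions (not from the paper): unlimited server capacity, unit cost
    per migrated element, and a smaller-into-larger merge heuristic.
    """

    def __init__(self, placement):
        # placement: dict mapping element -> initial server id
        self.server = dict(placement)
        self.parent = {x: x for x in placement}
        self.members = {x: [x] for x in placement}  # root -> elements of its set
        self.moves = 0  # total number of element migrations so far

    def find(self, x):
        # Path halving: point every visited node at its grandparent.
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return  # already in the same set, nothing to move
        # Ensure ra is the root of the larger (or equally large) set.
        if len(self.members[ra]) < len(self.members[rb]):
            ra, rb = rb, ra
        target = self.server[ra]
        # Re-embed the smaller set onto the larger set's server.
        for x in self.members[rb]:
            if self.server[x] != target:
                self.server[x] = target
                self.moves += 1
        self.parent[rb] = ra
        self.members[ra].extend(self.members.pop(rb))


uf = ColocatingUnionFind({"a": 0, "b": 1, "c": 1, "d": 2})
uf.union("a", "b")
uf.union("c", "d")
uf.union("a", "c")
print(uf.server, uf.moves)
```

Because an element only migrates when its set at least doubles in size, each element moves at most about log2(n) times under this heuristic. The model studied in the paper is more constrained than this sketch, so the code should be read only as intuition for why merging sets triggers re-embeddings.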
