Autonomous Orchestration of Distributed Discrete Event Simulations in the Presence of Resource Uncertainty

Published: 01 September 2015
Abstract

Discrete event simulations model the behavior of complex, real-world systems. Simulating a wide range of events and conditions provides a more nuanced model, but also increases its computational footprint. To manage these processing requirements in a scalable manner, discrete event simulations can be distributed across multiple computing resources. Orchestrating the simulations in a distributed setting involves coping with resource uncertainty. We consider three key aspects of resource uncertainty: resource failures, heterogeneity, and slowdowns. Each of these aspects is managed autonomously, which involves making accurate predictions of future execution times and latencies while also accounting for differences in hardware capabilities and dynamic resource consumption profiles. Further complicating matters, individual tasks within the simulation are stateful and stochastic, requiring inter-task communication and synchronization to produce accurate outcomes. We deal with these challenges through intelligent state collection and migration, active resource monitoring, and empirical evaluation of resource capabilities under changing conditions. To underscore the viability of our solution, we provide benchmarks using a production discrete event simulation that can simultaneously sustain failures, manage resource heterogeneity, and handle slowdowns while being orchestrated by our framework.
