A Stepwise Auto-Profiling Method for Performance Optimization of Streaming Applications

Published: 14 November 2017

Abstract

Data stream management systems (DSMSs) are scalable, highly available, and fault-tolerant systems that aggregate and analyze real-time data in motion. To perform analytics continuously and on the fly within the stream, state-of-the-art DSMSs host streaming applications as a set of interconnected operators, with each operator encapsulating the semantics of a specific operation. For parallel execution on a particular platform, these operators need to be appropriately replicated into multiple instances that split and process the workload simultaneously. Because the way operators are partitioned affects the resulting performance of streaming applications, it is essential for DSMSs to have a method to compare different operators and make holistic replication decisions that avoid performance bottlenecks and wasted resources. To this end, we propose a stepwise profiling approach to optimize application performance on a given execution platform. It automatically scales distributed computations over streams based on application features and the processing power of provisioned resources, and it builds the relationship between provisioned resources and application performance metrics to evaluate the efficiency of the resulting configuration. Experimental results confirm that the proposed approach successfully fulfills its goals with minimal profiling overhead.
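The abstract does not spell out the profiling algorithm, so the following is only a minimal sketch of the general idea of stepwise operator replication, not the authors' method. It assumes a linear pipeline and that profiling has already produced, for each operator, a per-instance capacity and a selectivity. The names `Operator`, `arrival_rates`, and `stepwise_scale`, and all numbers in the usage example, are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Operator:
    """One operator in a linear pipeline (all fields are illustrative)."""
    name: str
    capacity: float     # tuples/s that a single instance can process
    selectivity: float  # output tuples emitted per input tuple
    instances: int = 1  # current number of replicas


def arrival_rates(pipeline: List[Operator], source_rate: float) -> List[float]:
    """Input rate seen by each operator, propagated through selectivities."""
    rates = []
    rate = source_rate
    for op in pipeline:
        rates.append(rate)
        rate *= op.selectivity
    return rates


def stepwise_scale(pipeline: List[Operator], source_rate: float,
                   max_instances: int) -> bool:
    """Add one replica at a time to the most heavily utilized operator.

    Returns True once no operator is overloaded (utilization <= 1), or
    False if the instance budget runs out first.
    """
    while True:
        rates = arrival_rates(pipeline, source_rate)
        utils = [r / (op.instances * op.capacity)
                 for r, op in zip(rates, pipeline)]
        worst = max(range(len(pipeline)), key=lambda i: utils[i])
        if utils[worst] <= 1.0:
            return True   # no bottleneck remains
        if sum(op.instances for op in pipeline) >= max_instances:
            return False  # budget exhausted before removing the bottleneck
        pipeline[worst].instances += 1


if __name__ == "__main__":
    pipeline = [
        Operator("parse", capacity=500.0, selectivity=1.0),
        Operator("filter", capacity=1000.0, selectivity=0.5),
        Operator("aggregate", capacity=100.0, selectivity=1.0),
    ]
    ok = stepwise_scale(pipeline, source_rate=1000.0, max_instances=16)
    print(ok, [(op.name, op.instances) for op in pipeline])
```

Scaling only the current bottleneck at each step keeps replicas from being spent on operators that are not limiting throughput, which is one plausible way to avoid both of the problems the abstract names: bottlenecks and wasted resources.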

