Optimizing General-Purpose CPUs for Energy-Efficient Mobile Web Computing

Published: 20 March 2017

Abstract

Mobile applications are increasingly built using web technologies as a common substrate to achieve portability and improve developer productivity. Unfortunately, web applications often incur a large performance overhead, which directly degrades the user's quality-of-service (QoS) experience. Traditional approaches to improving mobile processor performance have mostly adopted desktop-like design techniques, such as increasing single-core microarchitecture complexity and aggressively integrating more cores. However, this desktop-oriented strategy is likely coming to an end because of the stringent energy and thermal constraints that mobile devices impose. Therefore, we must pivot away from traditional mobile processor design techniques in order to provide sustainable performance improvement while maintaining energy efficiency.

In this article, we propose combining hardware customization and specialization techniques to improve the performance and energy efficiency of mobile web applications. We first perform design-space exploration (DSE) and identify opportunities in customizing existing general-purpose mobile processors, that is, in tuning microarchitecture parameters. The thorough DSE also lets us discover sources of energy inefficiency in customized general-purpose architectures. To mitigate these inefficiencies, we propose, synthesize, and evaluate two new domain-specific specializations, called the Style Resolution Unit and the Browser Engine Cache. Our optimizations boost performance and energy efficiency at the same time while maintaining general-purpose programmability. As emerging mobile workloads rely increasingly on web technologies, the type of optimizations we propose will become important and are likely to have a long-lasting and widespread impact.


    • Published in

      ACM Transactions on Computer Systems, Volume 35, Issue 1
      February 2017
      101 pages
      ISSN: 0734-2071
      EISSN: 1557-7333
      DOI: 10.1145/3067095

      Copyright © 2017 ACM

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 20 March 2017
      • Revised: 1 January 2017
      • Accepted: 1 January 2017
      • Received: 1 January 2016
      Published in TOCS Volume 35, Issue 1


      Qualifiers

      • research-article
      • Research
      • Refereed
