Abstract
Mobile applications are increasingly built on web technologies, which serve as a common substrate that provides portability and improves developer productivity. Unfortunately, web applications often incur large performance overhead, which directly degrades the user's quality-of-service (QoS) experience. Traditional approaches to improving mobile processor performance have mostly adopted desktop-like design techniques, such as increasing single-core microarchitecture complexity and aggressively integrating more cores. However, such a desktop-oriented strategy is likely coming to an end because of the stringent energy and thermal constraints that mobile devices impose. Therefore, we must pivot away from traditional mobile processor design techniques to provide sustainable performance improvement while maintaining energy efficiency.
In this article, we propose combining hardware customization and specialization techniques to improve the performance and energy efficiency of mobile web applications. We first perform design-space exploration (DSE) to identify opportunities for customizing existing general-purpose mobile processors, that is, for tuning their microarchitecture parameters. The thorough DSE also lets us discover sources of energy inefficiency in customized general-purpose architectures. To mitigate these inefficiencies, we propose, synthesize, and evaluate two new domain-specific specializations, called the Style Resolution Unit and the Browser Engine Cache. Our optimizations boost performance and energy efficiency at the same time while maintaining general-purpose programmability. As emerging mobile workloads increasingly rely on web technologies, the kind of optimizations we propose will become more important and are likely to have a long-lasting and widespread impact.
Optimizing General-Purpose CPUs for Energy-Efficient Mobile Web Computing