Abstract
We are experiencing an explosive growth in the number of consumer devices, including smartphones, tablets, web-based computers such as Chromebooks, and wearable devices. For this class of devices, energy efficiency is a first-class concern because of limited battery capacity and thermal power budgets. We find that data movement is a major contributor to the total system energy and execution time in consumer devices. The energy and performance costs of moving data between the memory system and the compute units are significantly higher than the costs of computation. As a result, addressing data movement is crucial for consumer devices. In this work, we comprehensively analyze the energy and performance impact of data movement for several widely used Google consumer workloads: (1) the Chrome web browser; (2) TensorFlow Mobile, Google's machine learning framework; (3) video playback; and (4) video capture. Video playback and video capture are both used in many video services, such as YouTube and Google Hangouts. We find that processing-in-memory (PIM) can significantly reduce data movement for all of these workloads by performing part of the computation close to memory. Each workload contains simple primitives and functions that account for a significant share of the overall data movement. We investigate whether these primitives and functions are feasible to implement using PIM, given the tight area and power constraints of consumer devices. Our analysis shows that offloading these primitives to PIM logic, consisting of either simple cores or specialized accelerators, eliminates a large amount of data movement and significantly reduces total system energy (by an average of 55.4% across the workloads) and execution time (by an average of 54.2%).
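To put the two reported averages in more familiar terms, the short sketch below converts a percentage reduction into the fraction of the baseline that remains and into an equivalent speedup. The function names are hypothetical and chosen only for illustration. Applying the conversion to a cross-workload average is a simplification, since the individual workloads differ.

```python
def remaining_fraction(reduction_pct: float) -> float:
    """Fraction of the baseline left after a percentage reduction."""
    return 1.0 - reduction_pct / 100.0


def equivalent_speedup(time_reduction_pct: float) -> float:
    """Speedup factor implied by a percentage reduction in execution time."""
    return 1.0 / remaining_fraction(time_reduction_pct)


# Averages reported in the abstract (across the four workloads).
energy_left = remaining_fraction(55.4)   # share of baseline energy still consumed
speedup = equivalent_speedup(54.2)       # baseline time / PIM time

print(f"energy remaining: {energy_left:.3f} of baseline")
print(f"equivalent speedup: {speedup:.2f}x")
```

Under these assumptions, a 55.4% energy reduction leaves a little under half of the baseline energy, and a 54.2% reduction in execution time corresponds to roughly a 2.2x speedup.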
Google Workloads for Consumer Devices: Mitigating Data Movement Bottlenecks
ASPLOS '18: Proceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems