Abstract
DRAM cells need periodic refresh to maintain data integrity. In high-capacity DRAMs, refresh poses a significant performance bottleneck because the number of rows refreshed by each refresh command (and hence the refresh cycle time, tRFC) increases. Modern DRAMs perform refresh at the rank level, while LPDDRs used in mobile environments support refresh at the per-bank level. Rank-level refresh degrades performance significantly because none of the banks in a rank can serve on-demand requests while the refresh is in progress. Per-bank refresh alleviates some of this bottleneck because the other banks in the rank remain available for on-demand requests. Typical DRAM retention time is on the order of tens of milliseconds: 64 ms for environments operating at temperatures below 85 °C and 32 ms for environments operating above 85 °C.
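To make the cost of rank-level refresh concrete, the following minimal sketch estimates the fraction of time a rank is unavailable to on-demand requests. The figure of 8192 refresh commands per retention window and the 350 ns tRFC are illustrative assumptions that do not come from the abstract; only the 64 ms and 32 ms retention times do.

```python
def refresh_busy_fraction(retention_ms: float, refresh_cmds: int, trfc_ns: float) -> float:
    """Estimate the fraction of time a rank is blocked by refresh.

    retention_ms : retention window in milliseconds (64 or 32 in the text)
    refresh_cmds : refresh commands issued per window (assumed value)
    trfc_ns      : refresh cycle time tRFC in nanoseconds (assumed value)
    """
    # Average interval between refresh commands, in nanoseconds.
    refresh_interval_ns = retention_ms * 1e6 / refresh_cmds
    # Each command blocks the rank for tRFC out of every interval.
    return trfc_ns / refresh_interval_ns


if __name__ == "__main__":
    # Assumed parameters: 8192 commands per window, tRFC = 350 ns.
    for retention_ms in (64, 32):
        fraction = refresh_busy_fraction(retention_ms, 8192, 350)
        print(f"{retention_ms} ms retention: {fraction:.2%} of time in refresh")
```

Under these assumptions, halving the retention time doubles the share of time spent refreshing, and a larger tRFC (as in denser devices) increases it proportionally.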
With systems moving towards increased consolidation (e.g., virtualized environments), DRAM refresh becomes a significant bottleneck because it reduces the overall DRAM bandwidth available per task. In this work, we propose a hardware-software co-design that mitigates DRAM refresh overheads by exposing the hardware address mapping and the DRAM refresh schedule to the operating system (OS). We propose a novel DRAM refresh-aware process scheduling algorithm in the OS that schedules applications on cores such that none of the on-demand requests from an application are stalled by refreshes. Extensive evaluation of our proposed co-design on multi-programmed SPEC CPU2006 workloads shows significant performance improvement compared to previously proposed hardware-only approaches.
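The idea of refresh-aware scheduling can be illustrated with a toy model. This is a sketch under stated assumptions, not the algorithm from the paper: it assumes per-bank refresh that visits one bank per scheduling quantum in round-robin order, and that the OS knows which banks hold each task's pages. The names `Task`, `refreshing_bank`, and `pick_tasks` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    name: str
    banks: frozenset  # DRAM banks holding this task's pages (assumed known to the OS)


def refreshing_bank(quantum: int, num_banks: int) -> int:
    """Hypothetical refresh schedule: one bank per quantum, round-robin."""
    return quantum % num_banks


def pick_tasks(tasks, quantum: int, num_banks: int, num_cores: int):
    """Prefer tasks whose banks are not being refreshed in this quantum."""
    busy = refreshing_bank(quantum, num_banks)
    safe = [t for t in tasks if busy not in t.banks]
    stalled = [t for t in tasks if busy in t.banks]
    # Fill cores with refresh-free tasks first; fall back to the rest
    # only if cores would otherwise sit idle.
    return (safe + stalled)[:num_cores]


if __name__ == "__main__":
    tasks = [
        Task("A", frozenset({0, 1})),
        Task("B", frozenset({2, 3})),
        Task("C", frozenset({4, 5})),
    ]
    for q in range(4):
        chosen = pick_tasks(tasks, q, num_banks=8, num_cores=2)
        print(q, [t.name for t in chosen])
```

The sketch only works if each task's pages are confined to a subset of banks, which is why the co-design exposes the hardware address mapping as well as the refresh schedule.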
Hardware-Software Co-design to Mitigate DRAM Refresh Overheads: A Case for Refresh-Aware Process Scheduling
ASPLOS '17: Proceedings of the Twenty-Second International Conference on Architectural Support for Programming Languages and Operating Systems






