research-article

Hardware-Software Co-design to Mitigate DRAM Refresh Overheads: A Case for Refresh-Aware Process Scheduling

Published: 04 April 2017

Abstract

DRAM cells need periodic refresh to maintain data integrity. With high-capacity DRAMs, refresh poses a significant performance bottleneck because the number of rows to be refreshed with each refresh command (and hence the refresh cycle time, tRFC) increases. Modern DRAMs perform refresh at the rank level, while LPDDRs used in mobile environments support refresh at the per-bank level. Rank-level refresh degrades performance significantly, since none of the banks in a rank can serve on-demand requests while the refresh is in progress. Per-bank refresh alleviates some of this bottleneck, as the other banks in the rank remain available for on-demand requests. Typical DRAM retention time is on the order of tens of milliseconds, namely 64 ms for environments operating at temperatures below 85 °C and 32 ms for environments operating above 85 °C.
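To give a feel for the magnitudes involved, the following minimal sketch estimates the fraction of time a rank is unavailable under rank-level refresh. The 64 ms and 32 ms retention times come from the abstract. The count of 8192 refresh commands per retention window and the tRFC of 350 ns are assumptions chosen for illustration (values commonly quoted for DDR3/DDR4-class devices), not figures taken from the paper.

```python
def refresh_interval_ns(retention_ms: float, refresh_commands: int = 8192) -> float:
    """Average gap between refresh commands (tREFI) in nanoseconds.

    refresh_commands=8192 is an assumed value, not stated in the abstract.
    """
    return retention_ms * 1_000_000 / refresh_commands


def rank_busy_fraction(t_rfc_ns: float, retention_ms: float,
                       refresh_commands: int = 8192) -> float:
    """Fraction of time a rank is blocked by rank-level refresh (tRFC / tREFI)."""
    return t_rfc_ns / refresh_interval_ns(retention_ms, refresh_commands)


ASSUMED_T_RFC_NS = 350.0  # hypothetical tRFC for a high-capacity device

t_refi_normal = refresh_interval_ns(64.0)  # below 85 °C
t_refi_hot = refresh_interval_ns(32.0)     # above 85 °C
busy_normal = rank_busy_fraction(ASSUMED_T_RFC_NS, 64.0)
busy_hot = rank_busy_fraction(ASSUMED_T_RFC_NS, 32.0)

print(f"tREFI: {t_refi_normal:.1f} ns (64 ms), {t_refi_hot:.1f} ns (32 ms)")
print(f"rank busy: {busy_normal:.2%} (64 ms), {busy_hot:.2%} (32 ms)")
```

Under these assumptions, halving the retention time halves the interval between refresh commands and therefore doubles the share of time the rank spends refreshing. A larger tRFC, as expected for denser devices, raises that share further.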

With systems moving towards increased consolidation (e.g., virtualized environments), DRAM refresh becomes a significant bottleneck as it reduces the overall DRAM bandwidth available per task. In this work, we propose a hardware-software co-design to mitigate DRAM refresh overheads by exposing the hardware address mapping and the DRAM refresh schedule to the operating system. We propose a novel DRAM refresh-aware process scheduling algorithm in the OS that schedules applications on cores such that none of the on-demand requests from an application are stalled by refreshes. Extensive evaluation of our proposed co-design on multi-programmed SPEC CPU2006 workloads shows significant performance improvement compared to previously proposed hardware-only approaches.
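The idea of refresh-aware scheduling can be illustrated with a small sketch. This is not the authors' algorithm, which the abstract does not detail. It assumes a hypothetical model in which the OS knows which banks hold each task's pages, exactly one bank is refreshed per scheduling quantum in round-robin order, and the scheduler prefers a runnable task whose banks are not being refreshed. The names `Task`, `refreshing_bank`, and `pick_next` are invented for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Task:
    name: str
    banks: frozenset  # DRAM banks holding this task's pages (assumed known to the OS)


def refreshing_bank(quantum: int, num_banks: int) -> int:
    """Hypothetical refresh schedule: one bank per quantum, round-robin."""
    return quantum % num_banks


def pick_next(run_queue: List[Task], quantum: int, num_banks: int) -> Optional[Task]:
    """Remove and return the first queued task that avoids the refreshing bank.

    Falls back to the queue head when every task touches that bank,
    so the core is never left idle in this sketch.
    """
    if not run_queue:
        return None
    busy = refreshing_bank(quantum, num_banks)
    for i, task in enumerate(run_queue):
        if busy not in task.banks:
            return run_queue.pop(i)
    return run_queue.pop(0)


queue = [Task("A", frozenset({0, 1})), Task("B", frozenset({2, 3}))]
first = pick_next(queue, quantum=0, num_banks=4)   # bank 0 refreshing, A conflicts
second = pick_next(queue, quantum=2, num_banks=4)  # bank 2 refreshing, A is safe
print(first.name, second.name)
```

In this model, task B runs during quantum 0 because task A's data lives in the bank being refreshed. A real design would also need the memory allocator to confine each task's pages to a subset of banks, which is why the abstract stresses exposing the hardware address mapping to the OS.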

  • Published in

    ACM SIGPLAN Notices, Volume 52, Issue 4
    ASPLOS '17
    April 2017
    811 pages
    ISSN: 0362-1340
    EISSN: 1558-1160
    DOI: 10.1145/3093336

    • ASPLOS '17: Proceedings of the Twenty-Second International Conference on Architectural Support for Programming Languages and Operating Systems
      April 2017
      856 pages
      ISBN: 9781450344654
      DOI: 10.1145/3037697

    Copyright © 2017 ACM

    Publisher

    Association for Computing Machinery

    New York, NY, United States
