MaPHeA: A Framework for Lightweight Memory Hierarchy-aware Profile-guided Heap Allocation

Published: 13 December 2022

Abstract

Hardware performance monitoring units (PMUs) are a standard feature in modern microprocessors, providing a rich set of microarchitectural event samplers. Recently, numerous profile-guided optimization (PGO) frameworks have exploited them to achieve much lower profiling overhead than conventional instrumentation-based frameworks. However, existing PGO frameworks mainly focus on optimizing the layout of binaries; they overlook the rich information the PMU provides about data access behavior across the memory hierarchy. Thus, we propose MaPHeA, a lightweight Memory hierarchy-aware Profile-guided Heap Allocation framework applicable to both HPC and embedded systems. MaPHeA improves application performance by guiding and applying optimized allocation of dynamically allocated heap objects, with very low profiling overhead and without additional user intervention. To demonstrate the effectiveness of MaPHeA, we apply it to three use cases: heap object allocation in an emerging DRAM-NVM heterogeneous memory system (HMS), selective huge-page utilization, and cacheability control for objects with low temporal locality. In an HMS, by identifying frequently accessed heap objects and placing them in the fast DRAM region, MaPHeA improves the performance of memory-intensive graph-processing and Redis workloads by 56.0% on average over the default configuration that uses DRAM as a hardware-managed cache of slow NVM. By identifying large heap objects that cause frequent TLB misses and allocating them on huge pages, MaPHeA increases the performance of the read and update operations of Redis by 10.6% over the transparent huge-page implementation of Linux. Also, by distinguishing the objects that cause cache pollution due to their low temporal locality and applying write-combining to them, MaPHeA improves the performance of STREAM and RADIX workloads by 20.0% on average over the system without cacheability control.
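
To make the HMS use case concrete, the following is a minimal sketch of how sampled profile data could drive a placement decision. It is not the algorithm used by MaPHeA, which the abstract does not specify. It assumes that PMU samples have already been attributed to allocation sites, and it uses a simple greedy ranking by samples per byte under a DRAM capacity budget. The names `SiteProfile` and `choose_fast_tier`, the site labels, and all numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SiteProfile:
    site: str          # hypothetical allocation-site label
    size_bytes: int    # total bytes allocated from this site
    miss_samples: int  # PMU samples attributed to objects from this site


def choose_fast_tier(profiles, fast_capacity_bytes):
    """Greedily pick allocation sites for the fast tier (assumed heuristic).

    Sites are ranked by sampled misses per byte, then admitted in that
    order as long as they still fit in the remaining fast-tier capacity.
    """
    ranked = sorted(
        (p for p in profiles if p.size_bytes > 0),
        key=lambda p: p.miss_samples / p.size_bytes,
        reverse=True,
    )
    chosen, used = [], 0
    for p in ranked:
        if p.miss_samples == 0:
            break  # every remaining site was never sampled
        if used + p.size_bytes <= fast_capacity_bytes:
            chosen.append(p.site)
            used += p.size_bytes
    return chosen


# Illustrative, made-up profile: three allocation sites and a 4 GiB budget.
profiles = [
    SiteProfile("graph_edges", 8 << 30, 900_000),
    SiteProfile("vertex_props", 1 << 30, 600_000),
    SiteProfile("io_buffer", 2 << 30, 1_000),
]
print(choose_fast_tier(profiles, fast_capacity_bytes=4 << 30))
```

Ranking by samples per byte rather than by raw sample count favors small, hot objects, which matters when the fast tier is much smaller than the working set. A real framework would also have to map each selected site to an allocator call that targets the fast tier; that step is outside this sketch.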

  • Published in

    ACM Transactions on Embedded Computing Systems, Volume 22, Issue 1
    January 2023
    512 pages
    ISSN: 1539-9087
    EISSN: 1558-3465
    DOI: 10.1145/3567467
    Editor: Tulika Mitra

    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    • Published: 13 December 2022
    • Online AM: 31 March 2022
    • Accepted: 20 March 2022
    • Revised: 10 January 2022
    • Received: 15 October 2021

    Qualifiers

    • research-article
    • Refereed
