ABSTRACT
The TLB is increasingly a bottleneck for big data applications. In most designs, the number of TLB entries is highly constrained by latency requirements and is growing much more slowly than the working sets of applications. Many solutions to this problem, such as huge pages, perforated pages, or TLB coalescing, rely on physical contiguity for performance gains, yet the cost of defragmenting memory can easily nullify these gains. This paper introduces mosaic pages, which increase TLB reach by compressing multiple, discrete translations into one TLB entry. Mosaic leverages virtual contiguity for locality, but does not use physical contiguity. Mosaic relies on recent advances in hashing theory to constrain memory mappings, in order to realize this physical address compression without reducing memory utilization or increasing swapping. This paper presents a full-system prototype of Mosaic, in gem5 and modified Linux. In simulation and with comparable hardware to a traditional design, Mosaic reduces TLB misses in several workloads by 6-81%. Our results show that Mosaic’s constraints on memory mappings do not harm performance: we never see conflicts before memory is 98% full in our experiments, at which point a traditional design would also likely swap. Once memory is over-committed, Mosaic swaps fewer pages than Linux in most cases. Finally, we present timing and area analysis for a Verilog implementation of the hashing function required on the critical path for the TLB, and show that on a commercial 28nm CMOS process, the circuit runs at a maximum frequency of 4 GHz, indicating that a Mosaic TLB is unlikely to affect clock frequency.
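The idea of compressing translations by constraining mappings with a hash can be illustrated with a small sketch. This is not the paper's implementation: the abstract does not specify the hash function, the number of candidate frames per page, or the TLB entry layout, so `NUM_FRAMES`, `CANDIDATES`, `INDEX_BITS`, the use of BLAKE2b, and all function names below are assumptions chosen for illustration. The sketch only shows the general principle: if each virtual page may live in one of a few hash-determined frames, a translation can be stored as a short index rather than a full frame number, and several such indices for virtually contiguous pages fit in one entry.

```python
import hashlib

# All constants are hypothetical; the abstract gives no concrete values.
NUM_FRAMES = 1 << 16   # assumed number of physical frames
CANDIDATES = 8         # assumed number of frames each virtual page may use
INDEX_BITS = 3         # log2(CANDIDATES): bits stored per translation


def candidate_frames(vpn):
    """Hash a virtual page number to its small set of allowed frames."""
    frames = []
    for i in range(CANDIDATES):
        digest = hashlib.blake2b(f"{vpn}:{i}".encode(), digest_size=8).digest()
        frames.append(int.from_bytes(digest, "little") % NUM_FRAMES)
    return frames


def compress(vpn, pfn):
    """Encode a frame as its position in the page's candidate list."""
    return candidate_frames(vpn).index(pfn)


def decompress(vpn, idx):
    """Recover the full frame number from the short index."""
    return candidate_frames(vpn)[idx]


def pack_entry(base_vpn, pfns):
    """Pack the indices of virtually contiguous pages into one integer."""
    entry = 0
    for off, pfn in enumerate(pfns):
        entry |= compress(base_vpn + off, pfn) << (off * INDEX_BITS)
    return entry


def lookup(entry, base_vpn, vpn):
    """Translate one page using the packed entry."""
    off = vpn - base_vpn
    idx = (entry >> (off * INDEX_BITS)) & (CANDIDATES - 1)
    return decompress(vpn, idx)


# Usage: four virtually contiguous pages placed in unrelated frames.
base = 0x4000
placed = [candidate_frames(base + k)[(3 * k + 1) % CANDIDATES] for k in range(4)]
entry = pack_entry(base, placed)
translated = [lookup(entry, base, base + k) for k in range(4)]
```

In this sketch the four frames need not be physically adjacent, yet the packed entry uses only `4 * INDEX_BITS` bits for all four translations. The cost is that the allocator must place each page in one of its candidate frames, which is the kind of mapping constraint the abstract refers to.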
Index Terms
Mosaic Pages: Big TLB Reach with Small Pages