skip to main content
10.1145/3466752.3480075acmconferencesArticle/Chapter ViewAbstractPublication PagesmicroConference Proceedingsconference-collections
research-article

COSPlay: Leveraging Task-Level Parallelism for High-Throughput Synchronous Persistence

Published: 17 October 2021 Publication History

Abstract

A key challenge in programming crash-consistent applications for Persistent Memory (PM) is achieving high performance while controlling the order of PM updates. Managing persist ordering from the CPU typically requires frequent synchronization points, which expose the PM’s high persist latency on the execution’s critical path. To mitigate this overhead, prior proposals relax the persistency model and decouple persistence from the program’s volatile execution, delegating persistence ordering to specialized hardware mechanisms such that persistent state lags behind volatile state. In this work, we identify the opportunity to mitigate the effect of persist latency by leveraging the task-level parallelism available in many PM applications, while preserving the stricter semantics of synchronous persistence and the familiar x86 persistency model.
We introduce COSPlay, a software-hardware co-design that employs coroutines and rapid userspace context switching to hide persist latency by overlapping persist operations across concurrent tasks. Modest CPU extensions enable the hardware to fully overlap persists of different contexts, while preserving intra-context ordering to meet crash consistency requirements. COSPlay boosts the throughput of crash-consistent applications by up to 1.7 × on systems with basic PM support. For systems with higher persist latency due to added backend memory operations, such as encryption and deduplication, COSPlay’s performance gains grow to 2.2 − 7.3 ×.

References

[1]
Mohammad A. Alshboul, James Tuck, and Yan Solihin. 2018. Lazy Persistency: A High-Performing and Write-Efficient Software Persistency Technique. In Proceedings of the 45th International Symposium on Computer Architecture (ISCA). 439–451.
[2]
Luiz André Barroso, Mike Marty, David A. Patterson, and Parthasarathy Ranganathan. 2017. Attack of the killer microseconds. Commun. ACM 60, 4 (2017), 48–54.
[3]
boost-coroutines [n.d.]. Performance of coroutine switching in the Boost 1.74.0 C++ library. https://www.boost.org/doc/libs/1_74_0/libs/context/doc/html/context/performance.html
[4]
Dhruva R. Chakrabarti, Hans-Juergen Boehm, and Kumud Bhandari. 2014. Atlas: leveraging locks for non-volatile memory consistency. In Proceedings of the 29th Annual ACM SIGPLAN Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA). 433–452.
[5]
co2 [n.d.]. CO2 - Coroutine II: A C++ await/yield emulation library for stackless coroutine. https://github.com/jamboree/co2
[6]
Joel Coburn, Adrian M. Caulfield, Ameen Akel, Laura M. Grupp, Rajesh K. Gupta, Ranjit Jhala, and Steven Swanson. 2011. NV-Heaps: making persistent objects fast and safe with next-generation, non-volatile memories. In Proceedings of the 16th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-XVI). 105–118.
[7]
Jeremy Condit, Edmund B. Nightingale, Christopher Frost, Engin Ipek, Benjamin C. Lee, Doug Burger, and Derrick Coetzee. 2009. Better I/O through byte-addressable, persistent memory. In Proceedings of the 22nd ACM Symposium on Operating Systems Principles (SOSP). 133–146.
[8]
cpp-coroutines [n.d.]. Coroutines in C++20. https://en.cppreference.com/w/cpp/language/coroutines
[9]
Mahesh Dananjaya, Vasilis Gavrielatos, Arpit Joshi, and Vijay Nagarajan. 2020. Lazy Release Persistency. In Proceedings of the 25th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-XXV). 1173–1186.
[10]
Vaibhav Gogte, Stephan Diestelhorst, William Wang, Satish Narayanasamy, Peter M. Chen, and Thomas F. Wenisch. 2018. Persistency for synchronization-free regions. In Proceedings of the ACM SIGPLAN 2018 Conference on Programming Language Design and Implementation (PLDI). 46–61.
[11]
Vaibhav Gogte, William Wang, Stephan Diestelhorst, Peter M. Chen, Satish Narayanasamy, and Thomas F. Wenisch. 2020. Relaxed Persist Ordering Using Strand Persistency. In Proceedings of the 47th International Symposium on Computer Architecture (ISCA). 652–665.
[12]
Dibakar Gope, Arkaprava Basu, Sooraj Puthoor, and Mitesh Meswani. 2018. A Case for Scoped Persist Barriers in GPUs. In Proceedings of the 11th Workshop on General Purpose GPUs. 2–12.
[13]
Siddharth Gupta, Alexandros Daglis, and Babak Falsafi. 2019. Distributed Logless Atomic Durability with Persistent Memory. In Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO). 466–478.
[14]
Swapnil Haria, Mark D. Hill, and Michael M. Swift. 2020. MOD: Minimally Ordered Durable Datastructures for Persistent Memory. In Proceedings of the 25th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-XXV). 775–788.
[15]
Yongjun He, Jiacheng Lu, and Tianzheng Wang. 2020. CoroBase: Coroutine-Oriented Main-Memory Database Engine. Proc. VLDB Endow. 14, 3 (2020), 431–444.
[16]
Terry Ching-Hsiang Hsu, Helge Brügner, Indrajit Roy, Kimberly Keeton, and Patrick Eugster. 2017. NVthreads: Practical Persistence for Multi-threaded Applications. In Proceedings of the 2017 EuroSys Conference. 468–482.
[17]
Intel [n.d.]. Intel Cascade Lake Microarchitecture. https://en.wikichip.org/wiki/intel/microarchitectures/cascade_lake
[18]
Intel Corporation. 2020. Intel® architecture instruction set extensions and future features programming reference.https://www.intel.com/content/www/us/en/products/memory-storage/optane-dc-persistent-memory.html
[19]
Joseph Izraelevitz, Jian Yang, Lu Zhang, Juno Kim, Xiao Liu, Amirsaman Memaripour, Yun Joon Soh, Zixuan Wang, Yi Xu, Subramanya R. Dulloor, Jishen Zhao, and Steven Swanson. 2019. Basic Performance Measurements of the Intel Optane DC Persistent Memory Module. CoRR abs/1903.05714(2019).
[20]
Christopher Jonathan, Umar Farooq Minhas, James Hunter, Justin J. Levandoski, and Gor V. Nishanov. 2018. Exploiting Coroutines to Attack the ”Killer Nanoseconds”. Proc. VLDB Endow. 11, 11 (2018), 1702–1714.
[21]
Arpit Joshi, Vijay Nagarajan, Marcelo Cintra, and Stratis Viglas. 2015. Efficient persist barriers for multicores. In Proceedings of the 48th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO). 660–671.
[22]
Arpit Joshi, Vijay Nagarajan, Stratis Viglas, and Marcelo Cintra. 2017. ATOM: Atomic Durability in Non-volatile Memory through Hardware Logging. In Proceedings of the 23rd IEEE Symposium on High-Performance Computer Architecture (HPCA). 361–372.
[23]
Anuj Kalia, Michael Kaminsky, and David G. Andersen. 2016. FaSST: Fast, Scalable and Simple Distributed Transactions with Two-Sided (RDMA) Datagram RPCs. In Proceedings of the 12th Symposium on Operating System Design and Implementation (OSDI). 185–201.
[24]
Aasheesh Kolli, Vaibhav Gogte, Ali G. Saidi, Stephan Diestelhorst, Peter M. Chen, Satish Narayanasamy, and Thomas F. Wenisch. 2017. Language-level persistency. In Proceedings of the 44th International Symposium on Computer Architecture (ISCA). 481–493.
[25]
Aasheesh Kolli, Jeff Rosen, Stephan Diestelhorst, Ali G. Saidi, Steven Pelley, Sihang Liu, Peter M. Chen, and Thomas F. Wenisch. 2016. Delegated persist ordering. In Proceedings of the 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO). 58:1–58:13.
[26]
Chuanpeng Li, Chen Ding, and Kai Shen. 2007. Quantifying the cost of context switch. In Experimental Computer Science. 2.
[27]
Changhui Lin, Vijay Nagarajan, and Rajiv Gupta. 2014. Fence Scoping. In Proceedings of the 2014 ACM/IEEE Conference on Supercomputing (SC). 105–116.
[28]
Mengxing Liu, Mingxing Zhang, Kang Chen, Xuehai Qian, Yongwei Wu, Weimin Zheng, and Jinglei Ren. 2017. DudeTM: Building Durable Transactions with Decoupling for Persistent Memory. In Proceedings of the 22nd International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-XXII). 329–343.
[29]
Sihang Liu, Korakit Seemakhupt, Gennady Pekhimenko, Aasheesh Kolli, and Samira Manabi Khan. 2019. Janus: optimizing memory and storage support for non-volatile memory systems. In Proceedings of the 46th International Symposium on Computer Architecture (ISCA). 143–156.
[30]
Sara Mahdizadeh-Shahri, Seyed Armin Vakil-Ghahani, and Aasheesh Kolli. 2020. (Almost) Fence-less Persist Ordering. In Proceedings of the 53rd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO). 539–554.
[31]
Amirsaman Memaripour, Joseph Izraelevitz, and Steven Swanson. 2020. Pronto: Easy and Fast Persistence for Volatile Data Structures. In Proceedings of the 25th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-XXV). 789–806.
[32]
Sanketh Nalli, Swapnil Haria, Mark D. Hill, Michael M. Swift, Haris Volos, and Kimberly Keeton. 2017. An Analysis of Persistent Memory Use with WHISPER. In Proceedings of the 22nd International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-XXII). 135–148.
[33]
Jacob Nelson, Brandon Holt, Brandon Myers, Preston Briggs, Luis Ceze, Simon Kahan, and Mark Oskin. 2015. Latency-Tolerant Software Distributed Shared Memory. In Proceedings of the 2015 USENIX Annual Technical Conference (ATC). 291–305.
[34]
Tri Minh Nguyen and David Wentzlaff. 2018. PiCL: A Software-Transparent, Persistent Cache Log for Nonvolatile Main Memory. In Proceedings of the 51st Annual IEEE/ACM International Symposium on Microarchitecture (MICRO). 507–519.
[35]
Stanko Novakovic, Yizhou Shan, Aasheesh Kolli, Michael Cui, Yiying Zhang, Haggai Eran, Boris Pismenny, Liran Liss, Michael Wei, Dan Tsafrir, and Marcos K. Aguilera. 2019. Storm: a fast transactional dataplane for remote data structures. In Proceedings of the 12th ACM International Conference on Systems and Storage (SYSTOR). 97–108.
[36]
optane [n.d.]. Intel Optane DC Persistent Memory. https://www.intel.com/content/www/us/en/products/memory-storage/optane-dc-persistent-memory.html
[37]
Steven Pelley, Peter M. Chen, and Thomas F. Wenisch. 2014. Memory persistency. In Proceedings of the 41st International Symposium on Computer Architecture (ISCA). 265–276.
[38]
Georgios Psaropoulos, Thomas Legler, Norman May, and Anastasia Ailamaki. 2017. Interleaving with Coroutines: A Practical Approach for Robust Index Joins. Proc. VLDB Endow. 11, 2 (2017), 230–242.
[39]
Georgios Psaropoulos, Ismail Oukid, Thomas Legler, Norman May, and Anastasia Ailamaki. 2019. Bridging the Latency Gap between NVM and DRAM for Latency-bound Operations. In Proceedings of the 15th International Workshop on Data Management on New Hardware (DaMoN). 13:1–13:8.
[40]
J.P. Shen and M.H. Lipasti. 2013. Modern Processor Design: Fundamentals of Superscalar Processors. Waveland Press.
[41]
Seunghee Shin, Satish Kumar Tirukkovalluri, James Tuck, and Yan Solihin. 2017. Proteus: a flexible and fast software supported hardware logging approach for NVM. In Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO). 178–190.
[42]
Seunghee Shin, James Tuck, and Yan Solihin. 2017. Hiding the Long Latency of Persist Barriers Using Speculative Execution. In Proceedings of the 44th International Symposium on Computer Architecture (ISCA). 175–186.
[43]
G. Edward Suh, Dwaine E. Clarke, Blaise Gassend, Marten van Dijk, and Srinivas Devadas. 2003. Efficient Memory Integrity Verification and Encryption for Secure Processors. In Proceedings of the 36th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO). 339–350.
[44]
Daniel Sánchez and Christos Kozyrakis. 2013. ZSim: fast and accurate microarchitectural simulation of thousand-core systems. In Proceedings of the 40th International Symposium on Computer Architecture (ISCA). 475–486.
[45]
Dan Tsafrir. 2007. The context-switch overhead inflicted by hardware interrupts (and the enigma of do-nothing loops). In Experimental Computer Science. 4.
[46]
Alexander van Renen, Lukas Vogel, Viktor Leis, Thomas Neumann, and Alfons Kemper. 2020. Building blocks for persistent memory. VLDB J. 29, 6 (2020), 1223–1241.
[47]
Haris Volos, Andres Jaan Tack, and Michael M. Swift. 2011. Mnemosyne: lightweight persistent memory. In Proceedings of the 16th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-XVI). 91–104.
[48]
Xingda Wei, Zhiyuan Dong, Rong Chen, and Haibo Chen. 2018. Deconstructing RDMA-enabled Distributed Transactions: Hybrid is Better!. In Proceedings of the 13th Symposium on Operating System Design and Implementation (OSDI). 233–251.
[49]
Jishen Zhao, Sheng Li, Doe Hyun Yoon, Yuan Xie, and Norman P. Jouppi. 2013. Kiln: closing the performance gap between systems with and without persistence support. In Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO). 421–432.
[50]
Pengfei Zuo, Yu Hua, Ming Zhao, Wen Zhou, and Yuncheng Guo. 2018. Improving the Performance and Endurance of Encrypted Non-Volatile Main Memory through Deduplicating Writes. In Proceedings of the 51st Annual IEEE/ACM International Symposium on Microarchitecture (MICRO). 442–454.

Cited By

View all
  • (2023)JASS: A Tunable Checkpointing System for NVM-Based Systems2023 IEEE 30th International Conference on High Performance Computing, Data, and Analytics (HiPC)10.1109/HiPC58850.2023.00032(164-173)Online publication date: 18-Dec-2023

Recommendations

Comments

Information & Contributors

Information

Published In

cover image ACM Conferences
MICRO '21: MICRO-54: 54th Annual IEEE/ACM International Symposium on Microarchitecture
October 2021
1322 pages
ISBN:9781450385572
DOI:10.1145/3466752
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 17 October 2021

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. coroutines
  2. crash consistency
  3. persist ordering
  4. persistent memory
  5. task-level parallelism

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

MICRO '21
Sponsor:

Acceptance Rates

Overall Acceptance Rate 484 of 2,242 submissions, 22%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)61
  • Downloads (Last 6 weeks)0
Reflects downloads up to 28 Jan 2025

Other Metrics

Citations

Cited By

View all
  • (2023)JASS: A Tunable Checkpointing System for NVM-Based Systems2023 IEEE 30th International Conference on High Performance Computing, Data, and Analytics (HiPC)10.1109/HiPC58850.2023.00032(164-173)Online publication date: 18-Dec-2023

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

HTML Format

View this article in HTML Format.

HTML Format

Figures

Tables

Media

Share

Share

Share this Publication link

Share on social media