research-article

M3: A Hardware/Operating-System Co-Design to Tame Heterogeneous Manycores

Published: 25 March 2016

Abstract

Over the last decade, the number of available cores has increased and heterogeneity has grown. In this work, we ask whether the design of current operating systems (OSes) remains appropriate if these trends continue and lead to abundantly available but heterogeneous cores, or whether this development forces a fundamental rethinking of how systems are designed. We argue that: (1) hiding heterogeneity behind a common hardware interface unifies, to a large extent, the control and coordination of cores and accelerators in the OS; (2) isolating at the network-on-chip rather than with processor features (such as privileged mode or a memory management unit) allows running untrusted code on arbitrary cores; and (3) providing OS services via protocols over the network-on-chip, instead of via system calls, makes them accessible to arbitrary types of cores as well.

In summary, this turns accelerators into first-class citizens and enables a single and convenient programming environment for all cores without the need to trust any application.

In this paper, we introduce network-on-chip-level isolation, present the design of our microkernel-based OS, M3, and the common hardware interface, and evaluate the performance of our prototype in comparison to Linux. Somewhat surprisingly, even without using accelerators, M3 outperforms Linux in some application-level benchmarks by more than a factor of five.
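
The second and third arguments of the abstract (isolation at the network-on-chip, and OS services reached by messages rather than system calls) can be illustrated with a toy model. The sketch below is not M3's actual interface: the names NocGate, Endpoint, Noc, Kernel, and grant_send are hypothetical, and the model assumes that each core sits behind a gate whose send permissions only the kernel may configure. In real hardware that restriction would be enforced by the gate itself; plain Python cannot enforce it, so here it is only a convention.

```python
from dataclasses import dataclass, field


@dataclass
class Endpoint:
    """A send permission: the one (core, slot) pair a local slot may target."""
    dest_core: int
    dest_slot: int


@dataclass
class NocGate:
    """Hypothetical per-core gate between a core and the network-on-chip."""
    core_id: int
    endpoints: dict = field(default_factory=dict)  # local slot -> Endpoint
    inbox: list = field(default_factory=list)      # received (slot, payload) pairs

    def send(self, noc, slot, payload):
        # Code on the core names only a local slot; the gate, not the core,
        # decides where the message goes, so no privileged mode is needed.
        ep = self.endpoints.get(slot)
        if ep is None:
            raise PermissionError(
                f"core {self.core_id}: slot {slot} is not configured")
        noc.deliver(ep.dest_core, ep.dest_slot, payload)


class Noc:
    """Toy network-on-chip: one gate per core, delivery by appending to an inbox."""

    def __init__(self, num_cores):
        self.gates = [NocGate(i) for i in range(num_cores)]

    def deliver(self, core, slot, payload):
        self.gates[core].inbox.append((slot, payload))


class Kernel:
    """Assumed to be the only party allowed to write gate configuration."""

    def __init__(self, noc):
        self._noc = noc

    def grant_send(self, src_core, src_slot, dest_core, dest_slot):
        gate = self._noc.gates[src_core]
        gate.endpoints[src_slot] = Endpoint(dest_core, dest_slot)


if __name__ == "__main__":
    noc = Noc(num_cores=3)
    kernel = Kernel(noc)
    # Core 2 runs an untrusted application; core 1 hosts a service.
    kernel.grant_send(src_core=2, src_slot=0, dest_core=1, dest_slot=0)

    app = noc.gates[2]
    app.send(noc, 0, ("stat", "/etc/hosts"))  # a service request as a message
    print(noc.gates[1].inbox)

    try:
        app.send(noc, 1, ("stat", "/"))       # slot 1 was never granted
    except PermissionError as err:
        print("blocked:", err)
```

Because the request is an ordinary message, the sender in this model could equally be an accelerator that has no notion of a system call, which is the point of the third argument.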

        • Published in

          ACM SIGPLAN Notices, Volume 51, Issue 4
          ASPLOS '16
          April 2016
          774 pages
          ISSN: 0362-1340
          EISSN: 1558-1160
          DOI: 10.1145/2954679
          • Editor: Andy Gill

          • ASPLOS '16: Proceedings of the Twenty-First International Conference on Architectural Support for Programming Languages and Operating Systems
            March 2016
            824 pages
            ISBN: 9781450340915
            DOI: 10.1145/2872362
            • General Chair: Tom Conte
            • Program Chair: Yuanyuan Zhou

          Copyright © 2016 ACM

          Publisher

          Association for Computing Machinery

          New York, NY, United States
