Abstract
In the last decade, the number of available cores increased and heterogeneity grew. In this work, we ask the question whether the design of the current operating systems (OSes) is still appropriate if these trends continue and lead to abundantly available but heterogeneous cores, or whether it forces a fundamental rethinking of how systems are designed. We argue that: 1. hiding heterogeneity behind a common hardware interface unifies, to a large extent, the control and coordination of cores and accelerators in the OS, 2. isolating at the network-on-chip rather than with processor features (like privileged mode, memory management unit, ...), allows running untrusted code on arbitrary cores, and 3. providing OS services via protocols over the network-on-chip, instead of via system calls, makes them accessible to arbitrary types of cores as well.
In summary, this turns accelerators into first-class citizens and enables a single and convenient programming environment for all cores without the need to trust any application.
In this paper, we introduce network-on-chip-level isolation, present the design of our microkernel-based OS, M3, and the common hardware interface, and evaluate the performance of our prototype in comparison to Linux. A bit surprising, without using accelerators, M3 outperforms Linux in some application-level benchmarks by more than a factor of five.
- BusyBox. http://www.busybox.net/. last checked: 01/19/2015.Google Scholar
- An introduction to the Intel® QuickPath interconnect. http://www.intel.de/content/dam/doc/white-paper/quick-path-interconnect-introduction-paper.pdf. last checked: 01/19/2015.Google Scholar
- J. Ahn, M. Fiorentino, R. G. Beausoleil, N. Binkert, A. Davis, D. Fattal, N. P. Jouppi, M. McLaren, C. M. Santori, R. S. Schreiber, S. M. Spillane, D. Vantrease, and Q. Xu. Devices and architectures for photonic chip-scale integration. Applied Physics A, 95(4):989--997, 2009.Google Scholar
Cross Ref
- R. Alpert, C. Dubnicki, E.W. Felten, and K. Li. Design and implementation of NX message passing using Shrimp virtual memory mapped communication. In Proceedings of the 1996 International Conference on Parallel Processing, volume 1, pages 111--119, Aug 1996.Google Scholar
Cross Ref
- Oliver Arnold, Emil Matus, Benedikt Noethen, Markus Winter, Torsten Limberg, and Gerhard Fettweis. Tomahawk: Parallelism and heterogeneity in communications signal processing MPSoCs. ACM Transactions on Embedded Computing Systems, 13(3s):107:1--107:24, March 2014.Google Scholar
Digital Library
- F.J. Ballesteros, N. Evans, C. Forsyth, G. Guardiola, J. McKie, R. Minnich, and E. Soriano-Salvador. NIX: A case for a manycore system for cloud computing. Bell Labs Technical Journal, 17(2):41--54, 2012.Google Scholar
Digital Library
- Antonio Barbalace, Marina Sadini, Saif Ansary, Christopher Jelesnianski, Akshay Ravichandran, Cagil Kendir, Alastair Murray, and Binoy Ravindran. Popcorn: Bridging the programmability gap in heterogeneous-ISA platforms. In Proceedings of the Tenth European Conference on Computer Systems (EuroSys '15), pages 29:1--29:16, New York, NY, USA, 2015. ACM.Google Scholar
Digital Library
- Andrew Baumann, Paul Barham, Pierre-Evariste Dagand, Tim Harris, Rebecca Isaacs, Simon Peter, Timothy Roscoe, Adrian Schüpbach, and Akhilesh Singhania. The multikernel: A new OS architecture for scalable multicore systems. In Proceedings of the ACM SIGOPS 22nd Symposium on Operating Systems Principles (SOSP '09), pages 29--44, New York, NY, USA, 2009. ACM.Google Scholar
Digital Library
- Cadence. Xtensa customizable processor. http://ip.cadence.com. last checked: 01/19/2015.Google Scholar
- Koushik Chakraborty, Philip M. Wells, and Gurindar S. Sohi. Computation spreading: Employing hardware migration to specialize CMP cores on-the-fly. In Proceedings of the 12th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS XII), pages 283--292, New York, NY, USA, 2006. ACM.Google Scholar
Digital Library
- Emilio G. Cota, Paolo Mantovani, Giuseppe Di Guglielmo, and Luca P. Carloni. An analysis of accelerator coupling in heterogeneous architectures. In Proceedings of the 52nd Annual Design Automation Conference (DAC '15), pages 202:1--202:6, New York, NY, USA, 2015. ACM.Google Scholar
Digital Library
- R.H. Dennard, V.L. Rideout, E. Bassous, and A.R. LeBlanc. Design of ion-implanted MOSFET's with very small physical dimensions. Solid-State Circuits, IEEE Journal of, 9(5):256--268, Oct 1974.Google Scholar
- Jack B. Dennis and Earl C. Van Horn. Programming semantics for multiprogrammed computations. Communications of the ACM, 9(3):143--155, March 1966.Google Scholar
Digital Library
- Manuel Fahndrich, Mark Aiken, Chris Hawblitzel, Orion Hodson, Galen Hunt, James R. Larus, and Steven Levi. Language support for fast and reliable message-based communication in Singularity OS. In Proceedings of the 1st ACM SIGOPS/EuroSys European Conference on Computer Systems, pages 177--190, New York, NY, USA, 2006. ACM.Google Scholar
Digital Library
- Norman Feske. A case study on the cost and benefit of dynamic RPC marshalling for low-level system components. ACM SIGOPS Operating Systems Review, 41(4):40--48, July 2007.Google Scholar
Digital Library
- L. Fiorin, G. Palermo, S. Lukovic, V. Catalano, and C. Silvano. Secure memory accesses on networks-on-chip. IEEE Transactions on Computers, 57(9):1216--1229, Sept 2008.Google Scholar
Digital Library
- Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung. The Google file system. In Proceedings of the Nineteenth ACM Symposium on Operating Systems Principles (SOSP '03), pages 29--43, New York, NY, USA, 2003. ACM.Google Scholar
Digital Library
- N. Hardavellas, M. Ferdman, B. Falsafi, and A. Ailamaki. Toward dark silicon in servers. IEEE Micro, 31(4):6--15, July 2011.Google Scholar
Digital Library
- Norman Hardy. KeyKOS architecture. ACM SIGOPS Operating Systems Review, 19(4):8--25, October 1985.Google Scholar
Digital Library
- John Heinlein, Kourosh Gharachorloo, Scott Dresser, and Anoop Gupta. Integration of message passing and shared memory in the Stanford FLASH multiprocessor. In Proceedings of the Sixth International Conference on Architectural Support for Programming Languages and Operating Systems, pages 38--50, New York, NY, USA, 1994. ACM.Google Scholar
Digital Library
- K.U. Jarvinen and J.O. Skytta. High-speed elliptic curve cryptography accelerator for koblitz curves. In 16th International Symposium on Field-Programmable Custom Computing Machines (FCCM '08), pages 109--118, April 2008.Google Scholar
Digital Library
- Sangman Kim, Seonggu Huh, Yige Hu, Xinya Zhang, Amir Wated, Emmett Witchel, and Mark Silberstein. GPUnet: Networking abstractions for GPU programs. In Proceedings of the International Conference on Operating Systems Design and Implementation, pages 6--8, 2014.Google Scholar
- Gerwin Klein, Kevin Elphinstone, Gernot Heiser, June Andronick, David Cock, Philip Derrin, Dhammika Elkaduwe, Kai Engelhardt, Rafal Kolanski, Michael Norrish, Thomas Sewell, Harvey Tuch, and Simon Winwood. seL4: Formal verification of an OS kernel. In Proceedings of the ACM SIGOPS 22nd Symposium on Operating Systems Principles, pages 207--220, New York, NY, USA, 2009. ACM.Google Scholar
Digital Library
- George Kurian, Jason E. Miller, James Psota, Jonathan Eastep, Jifeng Liu, Jurgen Michel, Lionel C. Kimerling, and Anant Agarwal. ATAC: A 1000-core cache-coherent processor with on-chip optical network. In Proceedings of the 19th International Conference on Parallel Architectures and Compilation Techniques (PACT '10), pages 477--488, New York, NY, USA, 2010. ACM.Google Scholar
Digital Library
- J. Kuskin, D. Ofelt, M. Heinrich, J. Heinlein, R. Simoni, K. Gharachorloo, J. Chapin, D. Nakahira, J. Baxter, M. Horowitz, A. Gupta, M. Rosenblum, and J. Hennessy. The Stanford FLASH multiprocessor. In Proceedings of the 21st Annual International Symposium on Computer Architecture, pages 302--313, Apr 1994.Google Scholar
Digital Library
- Adam Lackorzynski and Alexander Warg. Taming subsystems: Capabilities as universal resource access control in L4. In Proceedings of the Second Workshop on Isolation and Integration in Embedded Systems (IIES '09), pages 25--30, New York, NY, USA, 2009. ACM.Google Scholar
Digital Library
- J. Liedtke. On micro-kernel construction. In Proceedings of the Fifteenth ACM Symposium on Operating Systems Principles (SOSP '95), pages 237--250, New York, NY, USA, 1995. ACM.Google Scholar
Digital Library
- Kevin Lim, David Meisner, Ali G. Saidi, Parthasarathy Ranganathan, and Thomas F. Wenisch. Thin servers with smart pipes: Designing SoC accelerators for memcached. In Proceedings of the 40th Annual International Symposium on Computer Architecture (ISCA '13), pages 36--47, New York, NY, USA, 2013. ACM.Google Scholar
Digital Library
- Felix Xiaozhu Lin, Zhen Wang, and Lin Zhong. K2: A mobile operating system for heterogeneous coherence domains. In Proceedings of the 19th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS '14), pages 285--300, New York, NY, USA, 2014. ACM.Google Scholar
Digital Library
- Daofu Liu, Tianshi Chen, Shaoli Liu, Jinhong Zhou, Shengyuan Zhou, Olivier Teman, Xiaobing Feng, Xuehai Zhou, and Yunji Chen. PuDianNao: A polyvalent machine learning accelerator. In Proceedings of the Twentieth International Conference on Architectural Support for Programming Languages and Operating Systems, pages 369--381. ACM, 2015.Google Scholar
Digital Library
- K. Mackenzie, J. Kubiatowicz, M. Frank, W. Lee, W. Lee, A. Agarwal, and M.F. Kaashoek. Exploiting two-case delivery for fast protected messaging. In Fourth International Symposium on High-Performance Computer Architecture, pages 231--242, Feb 1998.Google Scholar
Cross Ref
- Avantika Mathur, Mingming Cao, Suparna Bhattacharya, Andreas Dilger, Alex Tomas, and Laurent Vivier. The new ext4 filesystem: current status and future plans. In Proceedings of the Linux Symposium, volume 2, pages 21--33, 2007.Google Scholar
- Edmund B. Nightingale, Orion Hodson, Ross McIlroy, Chris Hawblitzel, and Galen Hunt. Helios: Heterogeneous multiprocessing with satellite kernels. In Proceedings of the ACM SIGOPS 22nd Symposium on Operating Systems Principles (SOSP '09), pages 221--234, New York, NY, USA, 2009. ACM.Google Scholar
Digital Library
- Mike Parker, Al Davis, and Wilson Hsieh. Message-passing for the 21st century: Integrating user-level networks with SMT. In Proceedings of the 5th Workshop on Multithreaded Execution, Architecture and Compilation, 2001.Google Scholar
- Rob Pike, Dave Presotto, Ken Thompson, Howard Trickey, et al. Plan 9 from Bell Labs. In Proceedings of the Summer 1990 UKUUG Conference, pages 1--9. London, UK, 1990.Google Scholar
- J. Porquet, A. Greiner, and C. Schwarz. NoC-MPU: A secure architecture for flexible co-hosting on shared memory MPSoCs. In Design, Automation Test in Europe Conference Exhibition (DATE), 2011, pages 1--4, March 2011.Google Scholar
Cross Ref
- Wajahat Qadeer, Rehan Hameed, Ofer Shacham, Preethi Venkatesan, Christos Kozyrakis, and Mark Horowitz. Convolution engine: Balancing efficiency and flexibility in specialized computing. Communications of the ACM, 58(4):85--93, March 2015.Google Scholar
Digital Library
- Ohad Rodeh, Josef Bacik, and Chris Mason. BTRFS: The Linux B-tree filesystem. ACM Transactions on Storage (TOS), 9(3):9:1--9:32, August 2013.Google Scholar
- Christopher J. Rossbach, Jon Currey, Mark Silberstein, Baishakhi Ray, and Emmett Witchel. PTask: Operating system abstractions to manage GPUs as compute devices. In Proceedings of the Twenty-Third ACM Symposium on Operating Systems Principles (SOSP '11), pages 233--248, New York, NY, USA, 2011. ACM.Google Scholar
Digital Library
- Mark Silberstein, Bryan Ford, Idit Keidar, and Emmett Witchel. GPUfs: Integrating a file system with GPUs. In Proceedings of the Eighteenth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS '13), pages 485--498, New York, NY, USA, 2013. ACM.Google Scholar
Digital Library
- Livio Soares and Michael Stumm. FlexSC: Flexible system call scheduling with exception-less system calls. In Proceedings of the 9th USENIX Conference on Operating Systems Design and Implementation (OSDI '10), pages 1--8, Berkeley, CA, USA, 2010. USENIX Association.Google Scholar
- Udo Steinberg and Bernhard Kauer. NOVA: A microhypervisor-based secure virtualization architecture. In Proceedings of the 5th European Conference on Computer Systems (EuroSys '10), pages 209--222, New York, NY, USA, 2010. ACM.Google Scholar
Digital Library
- M.B. Taylor. A landscape of the new dark silicon design regime. IEEE Micro, 33(5):8--19, Sept 2013.Google Scholar
Digital Library
- David Wentzlaff and Anant Agarwal. Factored operating systems (fos): The case for a scalable operating system for multicores. ACM SIGOPS Operating Systems Review, 43(2):76--85, April 2009.Google Scholar
Digital Library
- Jonathan Woodruff, Robert N.M. Watson, David Chisnall, Simon W. Moore, Jonathan Anderson, Brooks Davis, Ben Laurie, Peter G. Neumann, Robert Norton, and Michael Roe. The CHERI capability model: Revisiting RISC in an age of risk. In Proceeding of the 41st Annual International Symposium on Computer Architecuture (ISCA '14), pages 457--468, Piscataway, NJ, USA, 2014. IEEE Press.Google Scholar
Cross Ref
- Lisa Wu, Raymond J. Barker, Martha A. Kim, and Kenneth A. Ross. Navigating big data with high-throughput, energy-efficient data partitioning. In Proceedings of the 40th Annual International Symposium on Computer Architecture (ISCA '13), pages 249--260, New York, NY, USA, 2013. ACM.Google Scholar
Digital Library
- Wei Yu and Yun He. A high performance CABAC decoding architecture. IEEE Transactions on Consumer Electronics, 51(4):1352--1359, Nov 2005.Google Scholar
Digital Library
Index Terms
M3: A Hardware/Operating-System Co-Design to Tame Heterogeneous Manycores
Recommendations
M3: A Hardware/Operating-System Co-Design to Tame Heterogeneous Manycores
ASPLOS '16: Proceedings of the Twenty-First International Conference on Architectural Support for Programming Languages and Operating SystemsIn the last decade, the number of available cores increased and heterogeneity grew. In this work, we ask the question whether the design of the current operating systems (OSes) is still appropriate if these trends continue and lead to abundantly ...
M3: A Hardware/Operating-System Co-Design to Tame Heterogeneous Manycores
ASPLOS'16In the last decade, the number of available cores increased and heterogeneity grew. In this work, we ask the question whether the design of the current operating systems (OSes) is still appropriate if these trends continue and lead to abundantly ...
Rinnegan: Efficient Resource Use in Heterogeneous Architectures
PACT '16: Proceedings of the 2016 International Conference on Parallel Architectures and CompilationCurrent processors provide a variety of different processing units to improve performance and power efficiency. For example, ARM's big.LITTLE, AMD's APUs, and Oracle's M7 provide heterogeneous processors, on-die GPUs, and on-die accelerators. However, ...







Comments