Abstract
In shared-memory multiprocessing, fine-grain synchronization is challenging because it requires frequent communication. As technology scaling delivers larger manycore chips, this pattern is expected to remain costly to support. In this paper, we propose to address this challenge by using on-chip wireless communication. Each core has a transceiver and an antenna to communicate with all the other cores. This environment supports very low-latency global communication. Our architecture, called WiSync, uses a per-core Broadcast Memory (BM). When a core writes to its BM, all the other 100+ BMs are updated in less than 10 processor cycles. We also use a second wireless channel with cheaper transfers to execute barriers efficiently. WiSync supports multiprogramming, virtual memory, and context switching. Our evaluation with simulations of 128-threaded kernels and 64-threaded applications shows that WiSync speeds up synchronization substantially. Compared to using advanced conventional synchronization, WiSync attains an average speedup of nearly one order of magnitude for the kernels, and of 1.12x for PARSEC and SPLASH-2.
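The Broadcast Memory idea described above can be illustrated with a small software model. This is only a sketch under stated assumptions, not the WiSync hardware design: the class name `BroadcastMemoryModel`, its methods, and the constant `ASSUMED_BROADCAST_CYCLES` are hypothetical, and the model ignores channel arbitration, collisions, and the separate barrier channel. It shows only the property the abstract states: a write by one core becomes visible in every core's BM replica, after which each core reads its own local copy.

```python
# Toy model of a per-core Broadcast Memory (BM). All names are illustrative
# assumptions; the real mechanism is a wireless hardware broadcast.

# Assumed cost of one broadcast write, taken as the upper bound quoted in the
# abstract ("less than 10 processor cycles"). Not a measured value.
ASSUMED_BROADCAST_CYCLES = 10


class BroadcastMemoryModel:
    """Each core holds a full replica of a small memory; writes update all replicas."""

    def __init__(self, num_cores: int, num_words: int) -> None:
        self.replicas = [[0] * num_words for _ in range(num_cores)]
        self.cycles = 0  # accumulated cost of broadcast writes in this model

    def write(self, core: int, addr: int, value: int) -> None:
        # The writing core is only checked for validity: in this model the
        # update is applied to every replica, including the writer's own.
        if not 0 <= core < len(self.replicas):
            raise IndexError("unknown core id")
        for replica in self.replicas:
            replica[addr] = value
        self.cycles += ASSUMED_BROADCAST_CYCLES

    def read(self, core: int, addr: int) -> int:
        # Reads are served from the reading core's own replica.
        return self.replicas[core][addr]


# Usage: core 5 publishes a flag; every other core sees it in its local BM.
bm = BroadcastMemoryModel(num_cores=128, num_words=4)
bm.write(core=5, addr=2, value=1)
flag_seen_everywhere = all(bm.read(core, 2) == 1 for core in range(128))
```

Because every read hits a local replica, waiting on a flag in this model generates no further communication, which is the intuition behind using BM for fine-grain synchronization.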
WiSync: An Architecture for Fast Synchronization through On-Chip Wireless Communication
ASPLOS '16: Proceedings of the Twenty-First International Conference on Architectural Support for Programming Languages and Operating Systems