research-article

WiSync: An Architecture for Fast Synchronization through On-Chip Wireless Communication

Published: 25 March 2016

Abstract

In shared-memory multiprocessing, fine-grain synchronization is challenging because it requires frequent communication. As technology scaling delivers larger manycore chips, this communication pattern is expected to remain costly to support. In this paper, we propose to address this challenge by using on-chip wireless communication. Each core has a transceiver and an antenna to communicate with all the other cores. This environment supports very low-latency global communication. Our architecture, called WiSync, uses a per-core Broadcast Memory (BM). When a core writes to its BM, all the other 100+ BMs are updated in fewer than 10 processor cycles. We also use a second wireless channel with cheaper transfers to execute barriers efficiently. WiSync supports multiprogramming, virtual memory, and context switching. Our evaluation with simulations of 128-threaded kernels and 64-threaded applications shows that WiSync speeds up synchronization substantially. Compared to using advanced conventional synchronization, WiSync attains an average speedup of nearly one order of magnitude for the kernels, and of 1.12 for PARSEC and SPLASH-2.
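
As a rough illustration of the Broadcast Memory idea described in the abstract, the toy Python model below keeps one replica per core and applies every write to all replicas. The names `BroadcastMemory`, `write`, `read`, and `barrier_arrive` are hypothetical and are not taken from the paper. The model ignores latency, the wireless channel, and contention, and the counter-based barrier is a generic software scheme, not the paper's second-channel barrier mechanism.

```python
class BroadcastMemory:
    """Toy model: every core holds a full replica of a small shared array."""

    def __init__(self, num_cores, num_words):
        # One private replica per core; all replicas start at zero.
        self.replicas = [[0] * num_words for _ in range(num_cores)]

    def write(self, core, addr, value):
        # A write by any core is applied to every replica (the "broadcast").
        # `core` identifies the writer; it does not limit who is updated.
        for replica in self.replicas:
            replica[addr] = value

    def read(self, core, addr):
        # Reads are purely local: a core only consults its own replica.
        return self.replicas[core][addr]


def barrier_arrive(bm, core, counter_addr, num_threads):
    """Assumed generic counter barrier: bump the count, report if all arrived."""
    arrived = bm.read(core, counter_addr) + 1
    bm.write(core, counter_addr, arrived)
    return arrived == num_threads


bm = BroadcastMemory(num_cores=4, num_words=2)
bm.write(core=0, addr=1, value=42)
visible_everywhere = all(bm.read(c, 1) == 42 for c in range(4))

# Sequentially simulate four threads arriving at a barrier counter at address 0.
results = [barrier_arrive(bm, c, 0, 4) for c in range(4)]
```

Because the arrivals are simulated one after another, the read-then-write in `barrier_arrive` needs no atomicity here; real hardware would have to order concurrent writers, which this sketch does not model.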



    Published in

      ACM SIGPLAN Notices, Volume 51, Issue 4
      ASPLOS '16
      April 2016
      774 pages
      ISSN: 0362-1340
      EISSN: 1558-1160
      DOI: 10.1145/2954679
      Editor: Andy Gill

      ASPLOS '16: Proceedings of the Twenty-First International Conference on Architectural Support for Programming Languages and Operating Systems
      March 2016
      824 pages
      ISBN: 9781450340915
      DOI: 10.1145/2872362
      General Chair: Tom Conte
      Program Chair: Yuanyuan Zhou

      Copyright © 2016 ACM

      Publisher

      Association for Computing Machinery

      New York, NY, United States

