Spatial: a language and compiler for application accelerators

Published: 11 June 2018

Abstract

Industry is increasingly turning to reconfigurable architectures like FPGAs and CGRAs for improved performance and energy efficiency. Unfortunately, adoption of these architectures has been limited by their programming models. HDLs lack abstractions for productivity and are difficult to target from higher level languages. HLS tools are more productive, but offer an ad-hoc mix of software and hardware abstractions which make performance optimizations difficult.

In this work, we describe a new domain-specific language and compiler called Spatial for higher level descriptions of application accelerators. We describe Spatial's hardware-centric abstractions for both programmer productivity and design performance, and summarize the compiler passes required to support these abstractions, including pipeline scheduling, automatic memory banking, and automated design tuning driven by active machine learning. We demonstrate the language's ability to target FPGAs and CGRAs from common source code. We show that applications written in Spatial are, on average, 42% shorter and achieve a mean speedup of 2.9x over SDAccel HLS when targeting a Xilinx UltraScale+ VU9P FPGA on an Amazon EC2 F1 instance.
