research-article

Liquid Silicon-Monona: A Reconfigurable Memory-Oriented Computing Fabric with Scalable Multi-Context Support

Published: 19 March 2018

Abstract

With the recent trend of promoting Field-Programmable Gate Arrays (FPGAs) to first-class citizens for accelerating compute-intensive applications in networking, cloud services, and artificial intelligence, FPGAs face two major challenges in sustaining competitive performance and energy efficiency for diverse cloud workloads: (1) limited configuration capability for supporting lightweight computation and on-chip data storage to accelerate emerging search-/data-intensive applications, and (2) lack of architectural support to hide reconfiguration overhead for assisting virtualization in a cloud computing environment. In this paper, we propose a reconfigurable memory-oriented computing fabric, namely Liquid Silicon-Monona (L-Si), enabled by emerging nonvolatile memory technology (RRAM), to address these two challenges. Specifically, L-Si addresses the first challenge with a new architecture comprising a 2D array of physically identical but functionally configurable building blocks. For the first time, it extends the configuration capability of existing FPGAs from computation alone to the whole spectrum ranging from computation to data storage. This allows users to better customize hardware by flexibly partitioning hardware resources between computation and memory, greatly benefiting emerging search- and data-intensive applications. To address the second challenge, L-Si provides scalable multi-context architectural support that minimizes reconfiguration overhead for assisting virtualization. In addition, we provide compiler support to facilitate programming applications written in high-level languages (e.g., OpenCL) and frameworks (e.g., TensorFlow, MapReduce) while fully exploiting the unique architectural capabilities of L-Si. Our evaluation shows that, compared with an FPGA baseline, L-Si achieves 99.6% area reduction, 1.43× throughput improvement, and 94.0% power reduction on search-intensive benchmarks.
On neural network benchmarks, L-Si achieves, on average, 52.3× speedup, 113.9× energy reduction, and 81% area reduction over the FPGA baseline. In addition, the multi-context architecture of L-Si reduces context-switching time to ∼10 ns, compared with ∼100 ms on an off-the-shelf FPGA, greatly facilitating virtualization.
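The two headline ideas in the abstract, a fabric of identical blocks that can each be configured as either compute or memory, and multi-context support in which "reconfiguration" is merely selecting another preloaded context, can be illustrated with a toy software model. This is a behavioral sketch only: the class and method names (`Tile`, `load_context`, `switch`) are illustrative assumptions, not the paper's actual hardware interfaces.

```python
# Toy behavioral model of an L-Si-style fabric (illustrative only).
# Each tile holds several configuration contexts in its nonvolatile
# (RRAM) storage; a context switch is a pointer update rather than a
# full bitstream download, which is why it can take ~10 ns instead of
# the ~100 ms needed to reconfigure an off-the-shelf FPGA.

from dataclasses import dataclass, field


@dataclass
class Tile:
    contexts: list = field(default_factory=list)  # (role, payload) pairs
    active: int = 0                               # index of the live context

    def load_context(self, role: str, payload: str) -> int:
        """Slow path: write a new configuration into a free context slot."""
        self.contexts.append((role, payload))
        return len(self.contexts) - 1

    def switch(self, ctx: int) -> None:
        """Fast path: select an already-loaded context."""
        self.active = ctx

    @property
    def role(self) -> str:
        return self.contexts[self.active][0]


# A small fabric of physically identical tiles. Each tile preloads one
# compute context and one memory context, so the compute/memory split
# can be repartitioned per workload without reloading configurations.
fabric = [Tile() for _ in range(4)]
for i, tile in enumerate(fabric):
    tile.load_context("compute", f"logic config, tile {i}")
    tile.load_context("memory", f"storage config, tile {i}")

# Workload A uses all tiles as compute; workload B flips half to memory.
for tile in fabric[:2]:
    tile.switch(1)

print([t.role for t in fabric])
# → ['memory', 'memory', 'compute', 'compute']
```

The point of the sketch is the asymmetry between the two paths: `load_context` models the expensive write of configuration bits, while `switch` models the cheap selection among contexts already resident in the tile, which is what enables fast virtualization-friendly context switching.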

