
Multicore-based vector coprocessor sharing for performance and energy gains

Published: 30 September 2013

Abstract

In most applications that use a dedicated vector coprocessor, the coprocessor's resources are underutilized because sustained data parallelism is lacking, often as a result of vector-length variations in dynamic environments. Our work is motivated by: (a) the mandate for multicore designs to use on-chip resources efficiently for low power and high performance; (b) the ubiquity of vector operations in high-performance scientific and emerging embedded applications; (c) the frequent need to handle a variety of vector sizes; and (d) the diverse computation needs of vector kernels in application suites. We present a robust design framework for vector coprocessor sharing in multicore environments that maximizes vector unit utilization and performance at substantially reduced energy cost. For our adaptive vector unit, which is attached to multiple cores, we propose three basic sharing policies: coarse-grain, fine-grain, and vector-lane sharing. We benchmark these policies on a dual-core system and evaluate them in terms of floating-point performance, resource utilization, and power/energy consumption. Benchmarks for FIR filtering, FFT, matrix multiplication, and LU factorization show that the sharing policies achieve high utilization and performance at low energy cost. Compared to a system with a single core and an attached vector coprocessor, the proposed policies provide speedups of 1.2 to 2 and reduce energy consumption by about 50%. With performance expressed in clock cycles, the sharing policies achieve speedups of 3.62 to 7.92 over optimized Xeon runs. We also introduce performance and empirical power models that the runtime system can use to estimate the effectiveness of each policy in a hybrid system that can implement this suite of sharing policies simultaneously.
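
The three sharing policies differ mainly in how the vector unit's time and lanes are divided between the cores. As a rough illustration only, the Python sketch below models two cores sharing one vector unit under simplifying assumptions that are ours, not the paper's: each core alternates scalar work with a single vector instruction and blocks until that instruction completes; a vector instruction of length VL on k lanes takes ceil(VL/k) cycles, with pipeline latency and memory effects ignored; coarse-grain sharing gives one core the whole unit for its entire kernel; fine-grain sharing interleaves instructions from both cores first-come, first-served; and vector-lane sharing splits the lanes evenly. The function names, the workload format, and the energy constants (`static_per_cycle`, `dyn_per_element`) are hypothetical, and the energy estimate is a simple static-plus-dynamic formula, not the empirical power model described in the paper.

```python
from math import ceil, inf

# Hypothetical workload format: a list of (scalar_gap, vector_length) pairs.
# The core does `scalar_gap` cycles of scalar work, issues one vector
# instruction of `vector_length` elements, and waits for it to complete.
Workload = list[tuple[int, int]]


def vector_cycles(vl: int, lanes: int) -> int:
    """Cycles to stream vl elements through `lanes` lanes (pipeline fill ignored)."""
    return ceil(vl / lanes)


def run_core(w: Workload, lanes: int, vp_ready: int = 0) -> int:
    """Finish time of one core that owns `lanes` lanes from cycle `vp_ready` on."""
    t = 0
    for gap, vl in w:
        t = max(t + gap, vp_ready)  # scalar work, then wait for the vector unit
        t += vector_cycles(vl, lanes)
    return t


def coarse_grain(a: Workload, b: Workload, lanes: int) -> int:
    """Core A owns the whole vector unit for its entire kernel, then core B."""
    t_a = run_core(a, lanes)
    return max(t_a, run_core(b, lanes, vp_ready=t_a))


def fine_grain(a: Workload, b: Workload, lanes: int) -> int:
    """Both cores use all lanes; vector instructions are served first-come, first-served."""
    work = (a, b)
    idx = [0, 0]     # index of each core's next vector instruction
    t_core = [0, 0]  # cycle at which each core's previous instruction finished
    vp_free = 0      # cycle at which the vector unit becomes idle
    while idx[0] < len(a) or idx[1] < len(b):
        # Cycle at which each core's next vector instruction is ready to issue.
        ready = [t_core[c] + work[c][idx[c]][0] if idx[c] < len(work[c]) else inf
                 for c in (0, 1)]
        c = 0 if ready[0] <= ready[1] else 1  # earliest request wins
        _, vl = work[c][idx[c]]
        start = max(ready[c], vp_free)
        vp_free = start + vector_cycles(vl, lanes)
        t_core[c] = vp_free
        idx[c] += 1
    return max(t_core)


def lane_sharing(a: Workload, b: Workload, lanes: int) -> int:
    """The lanes are split between the cores, which then run independently."""
    half = lanes // 2
    return max(run_core(a, half), run_core(b, lanes - half))


def pick_policy(a: Workload, b: Workload, lanes: int,
                static_per_cycle: int = 4, dyn_per_element: int = 1):
    """Estimate (cycles, energy) per policy and return the lowest-energy one."""
    elements = sum(vl for _, vl in a + b)
    estimates = {}
    for name, model in (("coarse", coarse_grain), ("fine", fine_grain),
                        ("lane", lane_sharing)):
        cycles = model(a, b, lanes)
        estimates[name] = (cycles, static_per_cycle * cycles + dyn_per_element * elements)
    best = min(estimates, key=lambda n: estimates[n][1])
    return best, estimates


if __name__ == "__main__":
    a = [(4, 64)] * 3   # core A: long vectors, little scalar work
    b = [(10, 16)] * 3  # core B: short vectors, more scalar work
    print(pick_policy(a, b, lanes=8))
    # ('fine', {'coarse': (62, 488), 'fine': (38, 392), 'lane': (60, 480)})
```

In this toy model, fine-grain sharing wins for this mix because under coarse-grain sharing the vector unit sits idle during the owning core's scalar work, while lane splitting halves the lanes available to core A's long vectors. Other mixes change the ranking: two cores issuing short vectors that fill only half the lanes favor vector-lane sharing, which is the kind of per-policy estimate a runtime system would need before choosing a policy.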
