Dynamic translation of structured Loads/Stores and register mapping for architectures with SIMD extensions

Published: 21 June 2017

Abstract

A growing number of modern processors support non-contiguous SIMD data accesses. However, the translation of such instructions has been overlooked in Dynamic Binary Translation (DBT). For example, the popular QEMU dynamic binary translator emulates guest memory instructions with strides as a sequence of scalar instructions, leaving significant room for performance improvement when the host machine has SIMD instructions available. Structured loads/stores, such as VLDn/VSTn in ARM NEON, are one type of strided SIMD data access instruction. They are widely used in signal processing, multimedia, mathematical, and 2D matrix transposition applications. Efficient translation of such structured loads/stores is a critical issue when migrating ARM executables to other ISAs. It is also quite challenging: translating structured loads/stores is itself non-trivial, and the differences between guest and host register configurations must be taken into account. In this work, we present the design and implementation of structured load/store translation in DBT, including target code generation and efficient SIMD register mapping. Our proposed register mapping mechanisms are not limited to structured loads/stores; they can be extended to handle ordinary SIMD instructions. On a set of OpenCV benchmarks, our QEMU-based system achieves a maximum speedup of 5.41x, with an average improvement of 2.93x. On a set of BLAS benchmarks, it achieves a maximum speedup of 2.19x and an average improvement of 1.63x.
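
To make the notion of a structured load/store concrete, the following is a minimal Python sketch of the semantics of a VLDn/VSTn-style access: an n-way structured load de-interleaves n-element structures from memory into n registers, and the matching store re-interleaves them. The function names `structured_load` and `structured_store`, the list-based memory model, and the RGB example data are illustrative assumptions, not the paper's translation scheme or QEMU code. The element-by-element loops mirror what a scalar emulation has to do; a SIMD host could instead combine contiguous vector loads with shuffles.

```python
def structured_load(memory, base, n, lanes):
    """Model an n-way structured load (hypothetical helper).

    Field `field` of structure `lane` is placed in lane `lane`
    of register `field`, i.e. the data is de-interleaved.
    """
    return [[memory[base + lane * n + field] for lane in range(lanes)]
            for field in range(n)]


def structured_store(memory, base, regs):
    """Model the matching structured store: re-interleave the registers."""
    n = len(regs)
    for lane in range(len(regs[0])):
        for field in range(n):
            memory[base + lane * n + field] = regs[field][lane]


# Four interleaved RGB pixels (made-up values): R0 G0 B0 R1 G1 B1 ...
pixels = [10, 20, 30, 11, 21, 31, 12, 22, 32, 13, 23, 33]

# A 3-way load splits the stream into one register per colour channel.
r, g, b = structured_load(pixels, 0, n=3, lanes=4)

# Storing the three registers back restores the interleaved layout.
out = [0] * len(pixels)
structured_store(out, 0, [r, g, b])
```

Here `r`, `g`, and `b` each hold one colour channel, which is the layout that per-channel SIMD arithmetic expects.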

Published in

ACM SIGPLAN Notices, Volume 52, Issue 5
LCTES '17
May 2017
120 pages
ISSN: 0362-1340
EISSN: 1558-1160
DOI: 10.1145/3140582

LCTES 2017: Proceedings of the 18th ACM SIGPLAN/SIGBED Conference on Languages, Compilers, and Tools for Embedded Systems
June 2017
120 pages
ISBN: 9781450350303
DOI: 10.1145/3078633
General Chair: Vijay Nagarajan
Program Chair: Zili Shao

Copyright © 2017 ACM

Publisher: Association for Computing Machinery, New York, NY, United States
