Fast additions on masked integers

Published: 01 May 2006

Abstract

Suppose the bits of a computer word are partitioned into d disjoint sets, each of which is used to represent one of a d-tuple of Cartesian indices into d-dimensional space. Then, regardless of the partition, simple group operations and comparisons can be implemented for each index on a conventional processor in a sequence of two or three register operations. These indexings allow any blocked algorithm from linear algebra to use non-standard matrix orderings that increase locality and enhance performance. The underlying implementations were designed for alternating bit positions to index Morton-ordered matrices, but they apply as well to any bit partitioning. A hybrid ordering of the elements of a matrix therefore becomes possible, with row- or column-major ordering within cache-sized blocks and Morton ordering of the blocks themselves. One can thus enjoy the temporal locality of nested blocks as well as compiler optimizations on row- or column-major ordering in base blocks.
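
To make the "two or three register operations" concrete, here is a minimal Python sketch of a widely used idiom for arithmetic on integers whose bits occupy only the positions of a mask. It is not taken from the article itself. The names `masked_add`, `masked_sub`, `dilate`, the 64-bit word model, and the particular hybrid masks (16x16 row-major base blocks under Morton-ordered blocks) are all assumptions chosen for illustration.

```python
# Sketch only: models a 64-bit machine word (an assumption).
WORD = (1 << 64) - 1

# Alternating-bit masks, as used for Morton order in two dimensions.
EVEN = 0x5555555555555555  # bits 0, 2, 4, ... (say, the column index)
ODD = 0xAAAAAAAAAAAAAAAA   # bits 1, 3, 5, ... (say, the row index)

# Hypothetical hybrid partition: row-major inside a 16x16 base block
# (column in bits 0-3, row in bits 4-7), Morton order above bit 7.
COL_HYBRID = 0x0F | (EVEN & ~0xFF)
ROW_HYBRID = 0xF0 | (ODD & ~0xFF)


def masked_add(a, b, mask):
    """Add two values whose bits lie only in the positions of `mask`.

    Three operations: OR, ADD, AND. Filling the gaps of `a` with ones
    lets a carry ripple across them to the next mask position.
    """
    return ((a | (~mask & WORD)) + b) & mask


def masked_sub(a, b, mask):
    """Subtract masked values. Two operations: SUB, AND.

    The gaps hold zeros in both operands, so a borrow ripples across
    them unaided; the AND then discards what it leaves behind.
    """
    return (a - b) & mask


def dilate(x, mask):
    """Slow reference conversion: scatter the low bits of `x` into the
    set positions of `mask`, lowest first. Used here only for checking."""
    out, src = 0, 0
    for pos in range(64):
        if (mask >> pos) & 1:
            out |= ((x >> src) & 1) << pos
            src += 1
    return out
```

Because the bits of one index keep their relative order under any such partition, two values that share a mask compare with the ordinary unsigned `<` and `==`. When several indices share one word, the other fields would first be cleared with `w & mask`. A full Morton index in this sketch would be `dilate(row, ODD) | dilate(col, EVEN)`.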
