Abstract
Suppose the bits of a computer word are partitioned into d disjoint sets, each of which is used to represent one of a d-tuple of Cartesian indices into d-dimensional space. Then, regardless of the partition, simple group operations and comparisons can be implemented for each index on a conventional processor in a sequence of two or three register operations. These indexings allow any blocked algorithm from linear algebra to use non-standard matrix orderings that increase locality and so enhance its performance. The underlying implementations were designed for alternating bit positions to index Morton-ordered matrices, but they apply as well to any bit partitioning. A hybrid ordering of the elements of a matrix therefore becomes possible, with row- or column-major ordering within cache-sized blocks and Morton ordering of those blocks themselves. So one can enjoy the temporal locality of nested blocks, as well as compiler optimizations on row- or column-major ordering in base blocks.
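The register-level operations described above can be illustrated with a minimal C sketch. It assumes 32-bit indices and a hypothetical hybrid layout: 4×4 row-major base blocks (column in bits 0-1, row in bits 2-3), Morton-ordered above that (column on even bits 4, 6, ..., row on odd bits 5, 7, ...). The mask constants and every function name here are illustrative assumptions, not taken from the paper. Addition fills the gaps of one operand with ones so carries ripple across them (OR, ADD, AND: three operations). Subtraction needs only a final mask, because borrows propagate through zero gaps on their own (SUB, AND: two operations). Comparison is plain unsigned comparison.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 32-bit hybrid masks: 4x4 row-major base blocks, Morton above.
   COL_MASK: bits 0,1 then even bits 4..30; ROW_MASK: bits 2,3 then odd bits 5..31. */
#define COL_MASK UINT32_C(0x55555553)
#define ROW_MASK UINT32_C(0xAAAAAAAC)

/* Scatter the low bits of v into the set bits of mask m (slow loop, setup only). */
static uint32_t to_masked(uint32_t v, uint32_t m) {
    uint32_t r = 0;
    while (m) {
        uint32_t low = m & (~m + 1u); /* lowest remaining mask bit */
        if (v & 1u) r |= low;
        v >>= 1;
        m &= m - 1u;                  /* clear that mask bit */
    }
    return r;
}

/* Gather the bits of x selected by m back into an ordinary integer. */
static uint32_t from_masked(uint32_t x, uint32_t m) {
    uint32_t r = 0, out = 1;
    while (m) {
        uint32_t low = m & (~m + 1u);
        if (x & low) r |= out;
        out <<= 1;
        m &= m - 1u;
    }
    return r;
}

/* x + y for masked integers: ones in x's gaps carry across to the next mask bit. */
static uint32_t masked_add(uint32_t x, uint32_t y, uint32_t m) {
    assert((x & ~m) == 0 && (y & ~m) == 0); /* precondition; compiled out by NDEBUG */
    return ((x | ~m) + y) & m;
}

/* x - y for masked integers: borrows already propagate through zero gaps. */
static uint32_t masked_sub(uint32_t x, uint32_t y, uint32_t m) {
    return (x - y) & m;
}

/* x + 1 for a masked integer: x - m equals (x | ~m) + 1 when x has no gap bits. */
static uint32_t masked_inc(uint32_t x, uint32_t m) {
    return (x - m) & m;
}

/* Element offset of (i, j) in the hypothetical hybrid layout. */
static uint32_t hybrid_index(uint32_t i, uint32_t j) {
    return to_masked(i, ROW_MASK) | to_masked(j, COL_MASK);
}
```

For example, `hybrid_index(5, 6)` is 54: the fourth 4×4 block starts at offset 48, and row 1, column 2 within it adds 6. Applying `masked_inc` to the column part of `(0, 3)` steps straight to `(0, 4)` at offset 16, the first column of the next Morton block, without unpacking the index. The converters are deliberately simple loops for setting up indices; on x86 processors with BMI2, the `_pdep_u32` and `_pext_u32` intrinsics perform the same scatter and gather.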
Index Terms
Fast additions on masked integers