Abstract
The matricized-tensor times Khatri-Rao product (MTTKRP) is the computational bottleneck for algorithms computing CP decompositions of tensors. In this work, we develop shared-memory parallel algorithms for MTTKRP involving dense tensors. The algorithms cast nearly all of the computation as matrix operations in order to use optimized BLAS subroutines, and they avoid reordering tensor entries in memory. We use our parallel implementation to compute a CP decomposition of a neuroimaging data set and achieve a speedup of up to 7.4X over existing parallel software.
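To make the operation concrete, here is a minimal sketch of a mode-0 MTTKRP for a 3-way dense tensor, written in plain Python on nested lists. It is an illustration of the definition only, not the algorithm developed in this work: it forms the matricized tensor and the Khatri-Rao product explicitly, whereas the abstract describes algorithms that rely on optimized BLAS calls. The function names (`khatri_rao`, `unfold_mode0`, `mttkrp_mode0`, `mttkrp_mode0_direct`) and the row-major index ordering are assumptions chosen for this example.

```python
from itertools import product


def khatri_rao(B, C):
    """Column-wise Kronecker product: row (j, k) holds B[j][r] * C[k][r]."""
    R = len(B[0])
    return [[B[j][r] * C[k][r] for r in range(R)]
            for j, k in product(range(len(B)), range(len(C)))]


def unfold_mode0(X):
    """Mode-0 matricization: row i lists X[i][j][k] with k varying fastest."""
    return [[x for row in slab for x in row] for slab in X]


def matmul(A, B):
    """Plain matrix product on nested lists (stand-in for a BLAS GEMM call)."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


def mttkrp_mode0(X, B, C):
    """MTTKRP as (matricized tensor) times (Khatri-Rao product)."""
    return matmul(unfold_mode0(X), khatri_rao(B, C))


def mttkrp_mode0_direct(X, B, C):
    """Same result from the elementwise definition:
    M[i][r] = sum over j, k of X[i][j][k] * B[j][r] * C[k][r]."""
    I, J, K, R = len(X), len(B), len(C), len(B[0])
    return [[sum(X[i][j][k] * B[j][r] * C[k][r]
                 for j in range(J) for k in range(K))
             for r in range(R)]
            for i in range(I)]


# Hypothetical 2 x 2 x 2 tensor and rank-2 factor matrices.
X = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]
B = [[1, 0], [0, 1]]
C = [[1, 2], [3, 4]]

M = mttkrp_mode0(X, B, C)
print(M)  # [[7, 22], [23, 46]]
```

Both functions compute the same matrix. The matrix-product form is the one that maps onto library routines, which is why casting the computation as matrix operations matters for performance.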
Shared-memory parallelization of MTTKRP for dense tensors
PPoPP '18: Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming