nTD: Noise-Profile Adaptive Tensor Decomposition

Tensor decomposition is used for many web and user data analysis operations, including clustering, trend detection, anomaly detection, and correlation analysis. However, many tensor decomposition schemes are sensitive to noisy data, an inevitable problem in the real world that can lead to false conclusions. The problem is compounded by over-fitting when the user data is sparse. Recent research has shown that it is possible to avoid over-fitting by relying on probabilistic techniques. However, these have two major deficiencies: (a) they assume that all the data and intermediary results can fit in the main memory, and (b) they treat the entire tensor uniformly, ignoring potential non-uniformities in the noise distribution. In this paper, we propose a Noise-Profile Adaptive Tensor Decomposition (nTD) method, which aims to tackle both of these challenges. In particular, nTD leverages a grid-based two-phase decomposition strategy for two complementary purposes: firstly, the grid partitioning helps ensure that the memory footprint of the decomposition is kept low; secondly (and perhaps more importantly), any a priori knowledge about the noise profiles of the grid partitions enables us to develop a sample assignment strategy (or s-strategy) that best suits the noise distribution of the given tensor. Experiments show that nTD's performance is significantly better than conventional CP decomposition techniques on noisy user data tensors.


INTRODUCTION
Tensors are commonly used for representing multidimensional data, such as user-centered document collections in the web and user interactions in social networks [21, 14]. [20], for example, incorporates contextual information into the traditional HITS algorithm, formulating the task as a tensor decomposition. In [5], the authors analyze the ENRON email social network using tensor decomposition, and in [33], the authors use tensors to incorporate user click information to improve web search.
Consequently, tensor decomposition operations (such as CP [13] and Tucker [35]) are increasingly being used to implement various data analysis tasks, from clustering, anomaly detection [21], and correlation analysis [32] to pattern discovery [17]. Yet, the tensor decomposition process is subject to several major challenges. One major challenge is its computational complexity: decomposition algorithms have high computational costs and, in particular, incur large memory overheads (also known as the intermediary data blow-up problem); thus, basic algorithms and naive implementations are not suitable for large problems. Parallel implementations, such as GridParafac [27], GigaTensor [18], HaTen2 [16], and TensorDB [19,24,23], have been proposed to deal with the high computational cost of the task.
A second problem tensor decomposition faces is that the process can be negatively affected by noise and low quality in the data, which is especially a concern for web-based user data [6,39,40,9]. In particular, for sparse data, avoiding over-fitting to the noisy data can be a significant challenge. Recent research has shown that it is possible to avoid such over-fitting by relying on probabilistic techniques [38]: by introducing priors on the parameters, these techniques can effectively average over various models and ease the burden of parameter tuning. Unfortunately, existing probabilistic approaches have two major deficiencies: (a) they assume that all the data and intermediary results can fit in the main memory, and (b) they treat the entire tensor uniformly, ignoring possible non-uniformities in the distribution of noise in the given tensor.
In this paper, we propose a Noise-Profile Adaptive Tensor Decomposition (nTD) method, which leverages a priori information about noise in the data (which may be user provided or obtained through automated techniques [30,12]) to improve decomposition accuracy. nTD partitions the user data tensor into multiple sub-tensors (Figure 1) and then decomposes each sub-tensor probabilistically through Bayesian factorization; the resulting decompositions are then recombined to obtain the decomposition for the whole tensor. Most importantly, nTD provides a resource allocation strategy that accounts for the impact of the noise density of one sub-tensor on the decomposition accuracies of the other sub-tensors. In other words, a priori knowledge about the noise distribution among the sub-tensors (noise profiles depicted in Figures 1 and 2) is used to obtain a resource assignment strategy that best suits the noise distribution of the given tensor.
This paper is organized as follows: In the next section, we present the related work. Section 3 presents the relevant notations and the background. Section 4 describes the overview of the grid based probabilistic tensor decomposition scheme. Section 5 introduces the proposed sample assignment strategy (s-strategy) to adapt to different noise profiles, leading to a novel noise adaptive tensor decomposition (nTD) approach. Section 6 experimentally evaluates the effectiveness of the nTD and its alternative implementations. Experiments show that nTD indeed improves the decomposition accuracy of noise polluted tensors and the proposed sample assignment strategy (s-strategy) helps optimize the nTD performance under different noise scenarios. We conclude the paper in Section 7.

RELATED WORK
As we discussed in the previous section, tensor analysis is a commonly used technique for user-centered data analysis [29,25,1,22,15,36,37]. Alternating least squares (ALS) is the most conventional method for tensor decomposition [13]: at each iteration, ALS estimates one factor matrix while keeping the other matrices fixed; this process is repeated for each factor matrix associated with the modes of the input tensor until a convergence condition is reached. There are two widely used toolboxes for tensor manipulation: the Tensor Toolbox for Matlab [4] (for sparse tensors) and the N-way Toolbox for Matlab [3] (for dense tensors). Yet, due to the significant cost [21] of tensor decompositions, various parallel algorithms and systems have been developed. [34] proposes MACH, a randomized algorithm that speeds up the Tucker decomposition while providing accuracy guarantees. More recently, in [26], the authors propose PARCUBE, a sampling based, parallel, and sparsity promoting approximate PARAFAC decomposition scheme. Scalability is achieved through sketching of the tensor (using biased sampling) and parallelization of the decomposition operations onto the resulting sketches. TensorDB [19,24] leverages a block-based framework to store and retrieve data, extends array operations to tensor operations, and introduces optimization schemes for in-database tensor decomposition. HaTen2 [16] focuses on sparse tensors and presents a scalable suite of Tucker and PARAFAC decomposition methods on a MapReduce framework. SCOUT [17] is a recent coupled matrix-tensor factorization framework, also built on MapReduce. In addition to parallelism, it also leverages computation reordering as well as data transformation and reuse to reduce the computational cost of the process. In [11], the authors develop a probabilistic framework, pTucker, for modeling structural dependency in partially observed multi-dimensional arrays.
[41] implements a deterministic Bayesian inference algorithm, which formulates CP factorization with a hierarchical probabilistic model and employs a Bayesian treatment by incorporating a sparsity-inducing prior over multiple latent factors and appropriate hyperpriors over all hyperparameters, resulting in automatic rank determination. [28] proposes a Bayesian framework for low-rank decomposition of multiway tensor data with missing observations. The method helps with the discovery of the decomposition rank from the data; moreover, inference scales linearly with the observation size, which helps the approach scale very well. [10] proposes a loss function that helps the tensor decomposition process handle both Gaussian and grossly non-Gaussian perturbations.

BACKGROUND AND NOTATIONS
Intuitively, the tensor model maps a schema with N attributes to an N-modal array (where each potential tuple is a tensor cell). The tensor decomposition process generalizes the matrix decomposition process to tensors and rewrites the given tensor in the form of a set of factor matrices (one for each mode of the input tensor) and a core tensor (which, intuitively, describes the spectral structure of the given tensor). The two most popular tensor decomposition algorithms are the Tucker [35] and the CANDECOMP/PARAFAC (CP) [13] decompositions. While CP decomposes the input tensor into a sum of rank-one component tensors (leading to a diagonal core), Tucker decomposition results in a dense core. In this paper, we focus on the CP decomposition of user data tensors.

CP Decomposition
Given a tensor X, CP factorizes the tensor into factor matrices with F columns each (where F is a user supplied non-zero integer value, also referred to as the rank of the decomposition). For the simplicity of the discussion, let us consider a 3-mode tensor X ∈ R^{I×J×K}. CP would decompose X into three matrices A, B, and C, such that

X ≈ Σ_{f=1}^{F} a_f ∘ b_f ∘ c_f,

where a_f ∈ R^I, b_f ∈ R^J, c_f ∈ R^K, and ∘ denotes the vector outer product. The factor matrices A, B, and C are the combinations of the rank-one component vectors into matrices; e.g., A = [a_1 a_2 · · · a_F]. Since tensors may not always be exactly decomposed, the new tensor, X̃, obtained by recomposing the factor matrices A, B, and C is often different from the input tensor, X. The accuracy of the decomposition is often measured by considering the Frobenius norm of the difference tensor.
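For concreteness, the following is a minimal sketch of the standard CP-ALS procedure for a dense 3-mode tensor, using only NumPy. The helper names (`unfold`, `khatri_rao`, `cp_als`) are illustrative, not from the paper.

```python
import numpy as np

def unfold(X, mode):
    """Mode-n matricization of a 3-mode tensor."""
    return np.moveaxis(X, mode, 0).reshape(X.shape[mode], -1)

def khatri_rao(U, V):
    """Column-wise Khatri-Rao product of U (J x F) and V (K x F) -> (J*K) x F."""
    F = U.shape[1]
    return (U[:, None, :] * V[None, :, :]).reshape(-1, F)

def cp_als(X, F, iters=300, seed=0):
    """Rank-F CP decomposition of a dense 3-mode tensor via ALS:
    each factor is solved in closed form while the other two are held fixed."""
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((X.shape[0], F))
    B = rng.standard_normal((X.shape[1], F))
    C = rng.standard_normal((X.shape[2], F))
    for _ in range(iters):
        # X_(0) ~ A (B kr C)^T, so A = X_(0) (B kr C) [(B^T B) * (C^T C)]^+
        A = unfold(X, 0) @ khatri_rao(B, C) @ np.linalg.pinv((B.T @ B) * (C.T @ C))
        B = unfold(X, 1) @ khatri_rao(A, C) @ np.linalg.pinv((A.T @ A) * (C.T @ C))
        C = unfold(X, 2) @ khatri_rao(A, B) @ np.linalg.pinv((A.T @ A) * (B.T @ B))
    return A, B, C
```

The Frobenius norm of X − X̃, with X̃ recomposed from A, B, and C (e.g., via `np.einsum('if,jf,kf->ijk', A, B, C)`), gives the accuracy measure mentioned above.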

Parameters of Tensor Noise Profile
Noise distribution: Noise can be distributed in a tensor in several ways:
• In uniform (uni) noise (Figure 2(a)), there is no underlying pattern and noise is not clustered across any slice or region of the tensor.
• Slice-concentrated (sc) noise (Figure 2(b)) is clustered on one or more slices of the tensor across one or more modes. For example, a particular data source (represented by one or more slices) may be known to provide low-quality, untrusted information.
• In multi-modal (mm) noise, again, the noise is clustered; however, in this case the noise is expected to occur when a combination of a subset of the values across two or more modes is considered together, as in Figure 2(c).
Noise density: This is the ratio of the cells that are subject to noise. In this paper, without loss of generality, we assume noise occurs on cells that have values (i.e., the observed values can be faulty, but there are no spurious observations) and, thus, we define noise density as a ratio of the non-null cells.
Dependent vs. independent noise: Noise may impact the observed values in the tensor in different ways: in value-independent noise, the correct data may be overwritten by a completely random new value, whereas in value-correlated noise existing values may be perturbed (often with a Gaussian noise, defined by a standard deviation, σ).
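The noise parameters above can be emulated on synthetic data. Below is a hedged sketch for a dense tensor with nulls encoded as NaN, covering the uniform distribution only; slice-concentrated or multi-modal noise would first restrict the candidate cells to selected slices or mode-value combinations. All names are illustrative.

```python
import numpy as np

def add_noise(X, density, value_independent=True, sigma=0.5, seed=0):
    """Inject noise into a `density` fraction of the non-null (non-NaN) cells.
    value_independent=True overwrites with a random new rating in 1..5;
    otherwise the existing value is perturbed with Gaussian noise (std sigma)."""
    rng = np.random.default_rng(seed)
    Xn = X.copy()
    cells = np.argwhere(~np.isnan(X))            # noise only hits observed cells
    n = int(density * len(cells))                # noise density = ratio of non-null cells
    hit = cells[rng.choice(len(cells), n, replace=False)]
    idx = tuple(hit.T)
    if value_independent:                        # value-independent: overwrite
        Xn[idx] = rng.integers(1, 6, n).astype(float)
    else:                                        # value-correlated: perturb
        Xn[idx] = X[idx] + rng.normal(0.0, sigma, n)
    return Xn
```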

GRID BASED PROBABILISTIC TENSOR DECOMPOSITION (GPTD)
As we described above, noise may not be uniformly distributed on a tensor. In order to take into account the underlying non-uniformities, we propose to partition the tensor into a grid and treat each grid partition differently based on its noise profile. In this section, we present a Grid Based Probabilistic Tensor Decomposition (GPTD) approach, which extends the holistic Probabilistic Tensor Decomposition (PTD [38]) into a grid-based framework. Note that, in and of itself, GPTD does not leverage a priori knowledge about noise distribution but, as we see in Section 5, it provides a framework in which noise-profile based adaptation can be implemented.

Algorithm 1 Phase 1: Monte Carlo based Bayesian decomposition of each sub-tensor (extension of [38] to more than 3 modes)
Input: sub-tensor X_k, sampling number L
Output: decomposed factors U_k^(1), ..., U_k^(N)
1. Initialize the latent factors and model parameters.
2. For l = 1, ..., L:
   (a) Sample the hyper-parameter, α.
   (b) For each mode j = 1, ..., N:
      i. Sample the corresponding hyper-parameter, Θ_U^(j).
      ii. For i_j = 1, ..., I_j, sample the mode-j latent factor vectors (in parallel).
Let us consider an N-mode tensor, X ∈ R^{I_1×I_2×...×I_N}, partitioned into a set (or grid) of sub-tensors X = {X_k | k ∈ K}, where K is the set of sub-tensor indexes. Without loss of generality, let us assume that K partitions mode i into K_i equal partitions; i.e., |K| = Π_{i=1}^{N} K_i. Given a target decomposition rank, F, the first step of the proposed decomposition (GPTD) scheme is to decompose each sub-tensor X_k ∈ X with target rank F, such that

X_k ≈ I ×_1 U_k^(1) ×_2 U_k^(2) · · · ×_N U_k^(N),

where U^(i) = {U_k^(i) | k ∈ K} denotes the set of F-rank sub-factors corresponding to the sub-tensors in X along mode i and I is the N-mode F × F × ... × F identity tensor, where the diagonal entries are all 1s and the rest are all 0s. Intuitively, given a sub-tensor X_k, each entry X_k(i_1, i_2, ..., i_N) can be expressed as the inner-product of N F-dimensional vectors:

X_k(i_1, ..., i_N) ≈ Σ_{f=1}^{F} U_k^(1)(i_1, f) × · · · × U_k^(N)(i_N, f).

We discuss the sub-tensor decomposition process next.
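A minimal sketch of this grid partitioning step, splitting an N-mode array into |K| = Π_i K_i sub-tensors, is given below; the function name and the dictionary-of-blocks representation are illustrative choices.

```python
import numpy as np
from itertools import product

def grid_partition(X, parts):
    """Split an N-mode array into a grid of sub-tensors.
    parts[i] = number of (roughly equal) partitions along mode i.
    Returns a dict mapping the grid index k = (k_1, ..., k_N) to the block."""
    # index ranges per mode, split into roughly equal pieces
    splits = [np.array_split(np.arange(d), p) for d, p in zip(X.shape, parts)]
    grid = {}
    for k in product(*[range(p) for p in parts]):
        # np.ix_ builds the open mesh selecting this block's cells
        idx = np.ix_(*[splits[i][k[i]] for i in range(X.ndim)])
        grid[k] = X[idx]
    return grid
```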

Phase 1: Monte Carlo based Bayesian Decomposition of Sub-tensors
For decomposing individual sub-tensors, we rely on the probabilistic approach proposed in [31,38]: i.e., we describe the fit between the observed data and the predicted latent factor matrices, probabilistically, as

X_k(i_1, ..., i_N) ∼ N(⟨U_k^(1)(i_1), ..., U_k^(N)(i_N)⟩, α^{-1}),

where ⟨·⟩ denotes the generalized inner product of the N F-dimensional latent feature vectors [U_k^(1)(i_1), ..., U_k^(N)(i_N)] and α is the observation precision. We also impose independent Gaussian priors on the modes:

U_k^(j)(i_j) ∼ N(μ^(j), (Λ^(j))^{-1}),   i_j = 1, ..., I_j,

where I_j is the dimensionality of the j-th mode. Given this, one can estimate the latent features by maximizing the posterior p(U_k^(1), ..., U_k^(N) | X_k). One difficulty with this approach, however, is the tuning of the hyper-parameters of the model: α and Θ_U^(j). [38] notes that one can avoid the difficulty underlying the estimation of these parameters through a fully Bayesian approach, complemented with a sampling-based Markov Chain Monte Carlo (MCMC) method to address the lack of an analytical solution. The process is visualized in Algorithm 1 in pseudo-code form.

Figure 3: Illustration of sub-tensor based tensor decomposition: the input tensor is partitioned into smaller blocks, each block is decomposed (potentially in parallel), and the partial decompositions are stitched together through an iterative improvement process.
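To make the sampling step concrete, the following is a heavily simplified sketch of one Gibbs sweep for a 2-mode (matrix) version of this model, with the observation precision α and prior precision λ held fixed rather than sampled from hyperpriors as in the fully Bayesian treatment of [38]; all names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def gibbs_sweep(X, mask, A, B, alpha=2.0, lam=1.0):
    """One Gibbs sweep for the 2-mode simplification:
    X_ij ~ N(<a_i, b_j>, 1/alpha), with priors a_i, b_j ~ N(0, (1/lam) I).
    mask[i, j] is True where X_ij is observed."""
    F = A.shape[1]
    for i in range(A.shape[0]):                    # sample each row of A
        obs = mask[i]                              # observed columns in row i
        Bo = B[obs]
        prec = lam * np.eye(F) + alpha * Bo.T @ Bo # posterior precision of a_i
        cov = np.linalg.inv(prec)
        mean = alpha * cov @ Bo.T @ X[i, obs]      # posterior mean of a_i
        A[i] = rng.multivariate_normal(mean, cov)
    for j in range(B.shape[0]):                    # symmetric update for B
        obs = mask[:, j]
        Ao = A[obs]
        prec = lam * np.eye(F) + alpha * Ao.T @ Ao
        cov = np.linalg.inv(prec)
        mean = alpha * cov @ Ao.T @ X[obs, j]
        B[j] = rng.multivariate_normal(mean, cov)
    return A, B
```

In the full algorithm, α and the per-mode prior parameters would themselves be resampled each sweep, and the sweep would run over all N modes of a sub-tensor rather than two.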

Phase 2: Iterative Refinement
Once the individual sub-tensors are decomposed, the next step is to stitch the resulting sub-factors into the full F-rank factors, A^(i) (one along each mode), for the input tensor, X. Let us partition each factor A^(i) into K_i parts corresponding to the block boundaries along mode i:

A^(i) = [A^(i)_(1); A^(i)_(2); ...; A^(i)_(K_i)].

Given this partitioning, each sub-tensor X_k, k = [k_1, ..., k_i, ..., k_N] ∈ K, can be described in terms of the corresponding factor parts A^(1)_(k_1) through A^(N)_(k_N), which Phase 2 iteratively refines until the overall decomposition converges.

Algorithm 2 The outline of the GPTD process
Input: input tensor X, partitioning pattern K, decomposition rank F, and per sub-tensor sample count L
Output: tensor decomposition X̃
1. Phase 1: decompose each sub-tensor X_k ∈ X with sample count L using Algorithm 1.
2. Phase 2: iteratively refine and stitch the resulting sub-factors into the full factors A^(i).
3. Return X̃.

Overview of GPTD
The two phases of the decomposition process are visualized in Algorithm 2 and Figure 3.

NOISE-PROFILE ADAPTIVE TENSOR DECOMPOSITION
One crucial piece of information that the basic grid based decomposition process fails to account for is potentially available knowledge about the distribution of the noise across the input tensor. Note that, in the second phase of the process, each A^(i)_(k_i) is maintained incrementally by using, for all 1 ≤ j ≤ N, (a) the current estimates for A^(j)_(k_j) and (b) the decompositions in U^(j), i.e., the F-rank sub-factors of the sub-tensors in X along the different modes of the tensor. This implies that a sub-tensor which is poorly decomposed due to noise may negatively impact decomposition accuracies also for other parts of the tensor. Consequently, it is important to properly allocate resources to prevent a few noisy sub-tensors from negatively impacting the overall accuracy.
In [23], we studied how to allocate resources in a way that takes into account the user's non-uniform accuracy preferences for different parts of the tensor. In this paper, we develop a novel noise-profile adaptive tensor decomposition (nTD) scheme that focuses on resource allocation based on noise distribution. More specifically, user-provided or automatically discovered [2,39,40] a priori knowledge about the noise profiles of the grid partitions enables us to develop a sample assignment strategy (or s-strategy) that best suits the noise distribution in a given tensor. In particular, nTD assigns the ranks and samples to different sub-tensors in a way that maximizes the overall decomposition accuracy of the whole tensor without negatively impacting the efficiency of the decomposition process. Since probabilistic decomposition can be costly, nTD considers a priori knowledge about each sub-tensor's noise density to decide the appropriate number of Gibbs samples to achieve good accuracy with the given number of samples.

Noise Sensitive Sample Assignment: First Naive Attempt
As we experimentally show in Section 6, there is a direct relationship between the amount of noise a (sub-)tensor has and the number of Gibbs samples it requires for accurate decomposition. On the other hand, the number of samples also directly impacts the cost of the probabilistic decomposition process. Consequently, given a set of sub-tensors with different amounts of noise, uniform assignment of the number of samples, L = L_(total) / |K|, where L_(total) is the total number of samples for the whole tensor and |K| is the number of sub-tensors, may not be the best choice. In fact, the numbers of Gibbs samples allocated to different sub-tensors X_k in Algorithm 1 do not need to be the same. As we have seen in Section 4.1, the Phase 1 decomposition of each sub-tensor is independent from the others and, thus, the numbers of Gibbs samples of different sub-tensors can be different. This observation, along with the observation that more samples can provide better accuracy for noisy sub-tensors, can be used to improve the overall decomposition accuracy for a given number of Gibbs samples. More specifically, the number of samples a noisy sub-tensor, X_k, is allocated should be proportional to the density, nd_k, of the noise it contains:

L_k = L_min + γ × nd_k,    (5)

where L_min is the minimum number of samples a (non-noisy) tensor of the given size would need for accurate decomposition and γ is a control parameter. Note that the value of γ is selected such that the total number of samples needed is equal to the number, L_(total), of samples allocated for the whole tensor:

Σ_{k ∈ K} L_k = L_(total).    (6)
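Since the two constraints above fix γ, the allocation can be computed directly. The sketch below uses a continuous relaxation (rounding the L_k to integers is left out for clarity); the function name is illustrative.

```python
import numpy as np

def allocate_samples(nd, L_total, L_min):
    """Naive noise-sensitive s-strategy: L_k = L_min + gamma * nd_k,
    with gamma chosen so that sum_k L_k = L_total."""
    nd = np.asarray(nd, dtype=float)
    slack = L_total - L_min * len(nd)   # samples left after the uniform floor
    gamma = slack / nd.sum()            # solve sum_k L_k = L_total for gamma
    return L_min + gamma * nd
```

For example, with 3 sub-tensors, L_min = 9, and a total budget of 30 samples, the 3 uncommitted samples are split in proportion to the noise densities.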

Noise Sensitive Sample Assignment: Second Naive Attempt
Equations 5 and 6, above, help allocate samples across sub-tensors based on their noise densities. However, they ignore the relationships among the sub-tensors. In Section 4.2, we have seen that, during the iterative refinement process of Phase 2, inaccuracies in decomposition of one sub-tensor can propagate across the rest of the sub-tensors. Therefore, a better approach could be to consider how errors can propagate across sub-tensors when allocating samples.

Accounting for Accuracy Inter-dependencies among Sub-Tensors
More specifically, in this section, we note that if we could assign a significance score to each sub-tensor, X k , that takes into account not only its noise density, but also the position of the sub-tensor relative to other sub-tensors, we could use this information to allocate samples.
Let X be a tensor partitioned into a set (or grid) of sub-tensors X = {X_k | k ∈ K}. According to the update rule (Equation 4) in Section 4.2, if two sub-tensors are lined up along one of the modes of the tensor, they can be used to revise each other's estimates. This means that the update rule ties each sub-tensor's accuracy directly to the Σ_{1≤i≤N} K_i other sub-tensors that line up with the given sub-tensor along one of the N modes (see Figure 4). Moreover, we see that if two sub-tensors are similarly distributed along the modes that they share, then they are likely to have high impacts on each other's decomposition; in contrast, if they are dissimilar, their impacts on each other will be minimal. In other words, given two sub-tensors X_j and X_l, we can compute an alignment score between them as

align(X_j, X_l) = cos(X_j^l, X_l^j),

where cos() is the cosine similarity function and X_a^b is the version of the sub-tensor X_a compressed, using the standard Frobenius norm, onto the modes along which the sub-tensors X_a and X_b are aligned (Figure 5). Intuitively, this pairwise alignment score describes how the decomposition of one sub-tensor will impact another and also indicates the degree of numeric error propagation. A sub-tensor which is not aligned with the other sub-tensors is likely to have minimal impact on the accuracy of the overall decomposition even if it contains a significant amount of noise. In contrast, a sub-tensor which is well-aligned with a large portion of the other sub-tensors may have a large impact on the other sub-tensors and, consequently, on the whole tensor. Consequently, while the former sub-tensor may not deserve a significant amount of resources, the accuracy of the latter sub-tensor is critical, and hence that sub-tensor should be allocated more resources to ensure better overall accuracy.
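A sketch of this alignment computation is given below, assuming dense sub-tensors and that the shared modes are known from the grid layout; `compress` implements the Frobenius-norm compression onto the shared modes (names illustrative).

```python
import numpy as np

def compress(X, shared_modes):
    """Collapse X onto `shared_modes` by taking the Frobenius norm
    over all the other modes."""
    other = tuple(m for m in range(X.ndim) if m not in shared_modes)
    return np.sqrt((X ** 2).sum(axis=other))

def align(Xj, Xl, shared_modes):
    """Pairwise alignment score: cosine similarity of the two sub-tensors
    compressed onto the modes along which they line up."""
    a = compress(Xj, shared_modes).ravel()
    b = compress(Xl, shared_modes).ravel()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0
```

Since the compressed profiles are nonnegative, the score lies in [0, 1], with 1 for identically distributed sub-tensors.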

Sub-Tensor Centrality based Sample Assignment
Therefore, given pairwise alignment scores among the sub-tensors, one option is to measure the significance of a sub-tensor relative to the other sub-tensors using a centrality measure like PageRank (PR [7]), which computes the significance of each node in a (weighted) graph relative to the other nodes. More specifically, given a graph, G(V, E), the PageRank score p[i] of a node v_i ∈ V is obtained by solving p = (1 − β) A p + β s, where A denotes the transition matrix, β is a parameter controlling the random walk likelihood, and s is a teleportation vector such that, for v_j ∈ V, s[j] = 1/|V|. Therefore, given (a) the set (or grid) of sub-tensors X = {X_k | k ∈ K} and (b) their pairwise alignment scores, we can associate a significance score to each sub-tensor X_k by computing the PageRank scores described by the vector p. Given this score, we can then rewrite Equation 5 as

L_k = L_min + γ × p[k] × nd_k,

taking into account both the noise density of the sub-tensor and its relationship to the other sub-tensors.

S-Strategy for Sample Assignment
The above formulation considers the position of each sub-tensor in the whole tensor to compute its significance and then multiplies this with the corresponding noise density to decide how much resources to allocate to that sub-tensor. This, however, may not properly take into account the relationships among the noisy sub-tensors and the positioning of the sub-tensors relative to the noisy ones.
In this paper, we note that a better approach would be to consider the noise densities of the sub-tensors directly when evaluating the significance of each sub-tensor. More specifically, instead of relying on PageRank, we propose to use a measure like personalized PageRank (PPR [8]), which computes the significance of each node in a (weighted) graph relative to a given set of seed nodes. Given a graph, G(V, E), and a set, S ⊆ V, of seed nodes, the PPR score p[i] of a node v_i ∈ V is obtained by solving p = (1 − β) A p + β s, where A denotes the transition matrix, β is a parameter controlling the overall importance of the seeds, and s is a seeding vector such that s[i] = 1/|S| if v_i ∈ S, and s[i] = 0, otherwise. Therefore, given (a) the set (or grid) of sub-tensors X = {X_k | k ∈ K}, (b) their pairwise alignment scores, and (c) a seeding vector marking the noisy sub-tensors, we associate a noise-sensitive significance score to each sub-tensor X_k based on the PPR scores, described by the vector p, relative to the noisy tensors. Given this score, we rewrite Equation 5 as

L_k = L_min + γ × p[k],

where the noise densities are now accounted for through the seeding of the PPR scores.
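The s-strategy scoring can be sketched with a small power-iteration PPR over the sub-tensor alignment graph; seeding on the noisy sub-tensors follows the text, while the specific column-normalization and iteration count are illustrative assumptions.

```python
import numpy as np

def ppr_scores(W, seeds, beta=0.15, iters=100):
    """Personalized PageRank over the sub-tensor alignment graph.
    W[j, l] = align(X_j, X_l) (assumed to have nonzero column sums);
    `seeds` = indices of the noisy sub-tensors."""
    n = W.shape[0]
    A = W / W.sum(axis=0, keepdims=True)   # column-stochastic transition matrix
    s = np.zeros(n)
    s[seeds] = 1.0 / len(seeds)            # seeding vector concentrated on noisy blocks
    p = np.full(n, 1.0 / n)
    for _ in range(iters):
        p = (1 - beta) * A @ p + beta * s  # p = (1 - beta) A p + beta s
    return p
```

The resulting p[k] can then be plugged into the noise-sensitive version of Equation 5, with γ again chosen so that the allocations sum to L_(total).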

Overview of nTD
The pseudo-code of the proposed noise adaptive tensor decomposition (nTD) process is visualized in Algorithm 3.

Algorithm 3 The outline of the nTD process
Input: original tensor X, partitioning pattern K, noisy sub-tensor set K_P, decomposition rank F, and total sampling number L
Output: tensor decomposition X̃
1. Obtain the noise profile of the sub-tensors of X.
2. For each sub-tensor k ∈ K, assign a decomposition rank F_k = F and a sampling number L_k based on the noise-sensitive sample allocation strategy described in Section 5.3.
3. Obtain the decomposition, X̃, of X using the GPTD algorithm (Algorithm 2), given the partitioning pattern K, the decomposition ranks {F_k | k ∈ K}, and the sampling numbers {L_k | k ∈ K}.
4. Return X̃.

EXPERIMENTAL EVALUATION
In this section, we report experiments that aim to assess the effectiveness of the proposed noise adaptive tensor decomposition approach. In particular, we compare the proposed approach against another grid based strategy, GridParafac. We further assess the proposed noise-sensitive sample assignment strategy (s-strategy) by comparing the performance of nTD, which leverages this strategy, against GPTD with uniform sample assignment, on user-centered data.

Experiment Setup
Key parameters and their values are reported in Table 1. In all three data sets, the tensor cells contain rating values between 1 and 5 or (if the rating does not exist) a special "null" symbol.
Noise. In these experiments, uniform value-independent noise was introduced by modifying the existing ratings in the data set. More specifically, given a uniform noise profile and density, we selected a subset of the existing ratings (ignoring "null" values) and altered the existing values by selecting a completely new rating (which we refer to as value-independent noise).
Evaluation Criteria. We use the root mean squares error (RMSE) inaccuracy measure to assess the decomposition effectiveness. We also report the decomposition times and memory consumptions. Unless otherwise reported, the execution time of the overall process is reported as if the sub-tensor decompositions in Phase 1 and Phase 2 are all executed serially, without leveraging any sub-tensor parallelism.
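The RMSE criterion over the observed (non-null) cells can be computed as follows; this is an illustrative sketch with nulls encoded as NaN.

```python
import numpy as np

def rmse(X, Xhat):
    """Root mean squares error over the observed (non-NaN) cells of X."""
    obs = ~np.isnan(X)
    return float(np.sqrt(np.mean((X[obs] - Xhat[obs]) ** 2)))
```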

Discussion of the Results
We start the discussion of the results by studying the impact of the s-strategy for leveraging noise profiles.
Impact of Leveraging Noise Profiles. In Figure 6, we compare the performance of nTD with noise-sensitive sample assignment (i.e., s-strategy) against GPTD with uniform sample assignment and the two naive noise adaptations, presented in Sections 5.1 and 5.2, respectively. Note that, in the scenario considered in this figure, we have 640 total Gibbs samples for 64 sub-tensors, providing on average 10 samples per sub-tensor. In these experiments, we set L_min to 9 (i.e., very close to this average), thus requiring that 576 (= 64 × 9) samples are uniformly distributed across the sub-tensors; this leaves only 64 samples to be distributed adaptively across the sub-tensors based on their noise profiles and their relationships to other sub-tensors. As we see in this figure, the proposed nTD is able to leverage these 64 uncommitted samples to significantly reduce RMSE relative to GPTD with uniform sample assignment. Moreover, we also see that naive noise adaptations can actually hurt the overall accuracy: nTD-naive 1 and 2 bias the sampling based only on the noisy blocks and on the centrality of the sub-tensors, respectively, and thus perform worse than the uniform assignment. Together, these results show that the proposed s-strategy is highly effective in leveraging rough knowledge about noise distributions to better allocate the Gibbs samples across the tensor. Note that, as expected, nTD is costlier than GPTD as it requires additional preprocessing to compute sub-tensor alignments for Phase 2. However, the required pre-processing is trivially parallelizable, as discussed next.
Impact of Sub-Tensor Parallelism. (Table: Impact of sub-tensor parallelism on nTD; 4×4×4 grid; uniform noise; value-independent noise; noise density 10%; F = 10; num. Gibbs samples per sub-tensor = 3; max. num. of P2 iterations = 1000; 4 sub-tensors with noise; Ciao data set.) As the corresponding results show, Phase 1 of the process is highly parallelizable, as the sub-tensors resulting from grid partitioning can be decomposed in parallel. Similarly, the pre-processing needed for computing the sample assignment strategy in Phase 2 is also highly parallelizable: the most expensive step of the process is the compression of the sub-tensors on modes shared with their neighbors (since the resulting sub-tensor graph is small, the PPR computation has negligible cost), and that work can be done in parallel for each sub-tensor, or even for each cell in the resulting compressed representation. Unfortunately, Phase 2 itself, involving incremental stitching and refinement of the factor matrices (see Algorithm 2), cannot be trivially parallelized by assigning different sub-tensors to different processors, as the refinement rules need to simultaneously access data from multiple sub-tensors.
GPTD vs. GridParafac in the Presence of Noise. In its Phase 1, nTD relies on a grid based probabilistic decomposition strategy. We next compare this grid probabilistic tensor decomposition (GPTD) against the more conventional GridParafac. As we see in Figure 7, GPTD provides significantly better accuracy than the conventional approaches and also requires significantly less memory. As expected, we also see that increasing the number of sub-tensors results in a significant drop in the per-sub-tensor memory requirement (thereby improving the scalability of the tensor decomposition process), though the execution time of the second phase of the process (where the initial decompositions of the sub-tensors are stitched together) increases due to the existence of more sub-tensors to consider.
An important observation in Figure 7(b) is that the memory requirement of the conventional techniques is very sensitive to data density: while the MovieLens tensor has smaller dimensionality than the other two, it has a slightly higher density (3.15 × 10^-5 vs. 1.7 × 10^-6). Consequently, for this data set, the memory consumptions of the conventional techniques (especially when the number of grid partitions used is low) are significantly higher than their memory consumptions for the other two data sets. In contrast, the results show that the probabilistic approach is not sensitive to data density and GPTD has similar memory usage for all three data sets.
Impact of Noise Density. These results are confirmed in Figures 8(a) and (c), where we vary the noise density between 10% and 80%: as we see here, for all considered noise densities and for all three data sets, the RMSE provided by GPTD is significantly better than the RMSE provided by the conventional GridParafac, and this RMSE gain does not come with a significant execution time penalty.
Impact of the Number of Samples. A key parameter of the GPTD algorithm is the number of Gibbs samples used per sub-tensor in Phase 1. As we see in Figures 8(b) and (d), and as we would expect, increasing the number of Gibbs samples helps reduce the decomposition error (measured using RMSE); however, having more samples increases the execution time of the algorithm. It is important to note that, when the number of Gibbs samples is low, the algorithm is very fast, indicating that the worst case complexity of the Bayesian iterations arises only when the number of Gibbs samples is very high.
Most critically, as we have already seen in Figures 7 and 8(a), the GPTD algorithm does not need many Gibbs samples: using a few (in these experiments, even just one) Gibbs samples per sub-tensor is sufficient to provide significantly better accuracy than GridParafac, as reported in Figure 7(a), with similar or better time overhead, as reported in Figure 7(c).

CONCLUSIONS
Web-based user data can be noisy. Recent research has shown that it is possible to improve the resilience of the tensor decomposition process to overfitting (an important challenge in the presence of noisy data) by relying on probabilistic techniques. However, existing techniques assume that all the data and intermediary results can fit in the main memory and (more critically) they treat the entire tensor uniformly, ignoring potential non-uniformities in the noise distribution. In this paper, we proposed a novel noise-adaptive tensor decomposition (nTD) technique that leverages rough information about the noise distribution to improve tensor decomposition performance. nTD partitions the tensor into multiple sub-tensors and then decomposes each sub-tensor probabilistically through Bayesian factorization. The noise profiles of the grid partitions and their alignments are then leveraged to develop a sample assignment strategy (or s-strategy) that best suits the noise profile of a given tensor. Experiments with user-centered web data show that nTD is significantly better than conventional CP decomposition on noisy tensors.