A Critical Reexamination of Intra-List Distance and Dispersion

Diversification of recommendation results is a promising approach for coping with the uncertainty associated with users' information needs. Of particular importance in diversified recommendation is to define and optimize an appropriate diversity objective. In this study, we revisit the most popular diversity objective called intra-list distance (ILD), defined as the average pairwise distance between selected items, and a similar but lesser-known objective called dispersion, which is the minimum pairwise distance. Owing to their simplicity and flexibility, ILD and dispersion have been used in a plethora of diversified recommendation studies. Nevertheless, we do not actually know what kinds of items they prefer. We present a critical reexamination of ILD and dispersion from theoretical and experimental perspectives. Our theoretical results reveal that these objectives have potential drawbacks: ILD may select duplicate items that are very close to each other, whereas dispersion may overlook distant item pairs. As a competitor to ILD and dispersion, we design a diversity objective called Gaussian ILD, which can interpolate between ILD and dispersion by tuning its bandwidth parameter. We verify our theoretical results through experiments on real-world data and confirm the extreme behavior of ILD and dispersion in practice.


INTRODUCTION
Figure 1: An example such that intra-list distance and dispersion select very different items (ILD prefers two extremes; dispersion selects scattered items).

In recommender systems, solely improving the prediction accuracy of user preferences, as a single objective, is known to have the risk of recommending over-specialized items to a user, resulting in low user satisfaction [31]. The primary approach for addressing such issues arising from the uncertainty associated with users' information needs is the introduction of beyond-accuracy objectives [26] such
as diversity, novelty, and serendipity. Among the most important beyond-accuracy objectives is diversity, which refers to the internal differences between items recommended to a user. Recommending a set of diverse items may increase the chance of satisfying a user's needs. However, defining diversity is a nontrivial task because the contribution of a particular item depends on the other selected items. Of particular importance in diversified recommendation is thus to define and optimize an appropriate diversity objective.
In this study, we revisit two diversity objectives. One is the intra-list distance (ILD), which is arguably the most frequently used objective for diversity. The ILD [42,53] is defined as the average pairwise distance between selected items for a particular distance metric, and it is easy to use and popular in diversified recommendation research. The other is dispersion, a similar but lesser-known objective defined as the minimum pairwise distance. We reexamine ILD and dispersion in the hope that we can characterize what they represent and reveal their drawbacks. We first identify the following potential drawbacks of ILD and dispersion based on our theoretical comparisons (Section 4): ILD selects items in a well-balanced manner if the entire item set is separated into two clusters; however, it may generally select duplicate items that are very close to each other. The items chosen by dispersion are well scattered, but distant item pairs may be overlooked.
We then conduct numerical experiments to verify the assertions based on our theoretical analysis (Section 6). Our empirical results using MovieLens [24] and Amazon Review [33] demonstrate that ILD can readily select many items that are similar or even identical, which is undesirable if we wish to recommend very few items. Figure 1 shows a cloud of points in an ellipse such that ILD and dispersion select very different item sets. Our theoretical and empirical results imply that the items selected via ILD are biased toward two distant groups; items in the middle of the ellipse are never chosen. In contrast, the items selected by dispersion are well scattered.
To better understand the empirical behaviors of ILD and dispersion, we design a new distance-based objective that generalizes ILD and dispersion as a competitor (Section 5). The designed objective, Gaussian ILD (GILD), is defined as the average of the Gaussian kernel distances [35] between selected items. GILD has a bandwidth parameter σ, and we prove that GILD approaches ILD as σ → ∞ and approaches dispersion as σ → 0; i.e., it can interpolate between them. We experimentally confirm that GILD partially circumvents the issues caused by the extreme behaviors of ILD and dispersion, thereby achieving a sweet spot between them (Section 6).
Finally, we examine the recommendation results obtained by enhancing ILD, dispersion, and GILD (Section 7). The experimental results demonstrate that (1) ILD frequently selects duplicate items, and thus it is not an appropriate choice; and (2) if the relevance of the recommended items is highly prioritized, dispersion fails to diversify the recommendation results for some users.
In summary, ILD is not appropriate for either evaluating or enhancing distance-based diversity, whereas dispersion is often suitable for improving diversity, but not necessarily for measuring diversity.

RELATED WORK
Diversity enhancement has various motivations [12]; e.g., (1) because a user's preference is uncertain owing to the inherent sparsity of user feedback, recommending a set of diverse items has the potential to satisfy the user's needs; (2) users desire diversity in recommended items owing to their variety-seeking behavior. Other beyond-accuracy objectives include novelty, serendipity, and coverage; see, e.g., Castells et al. [12], Kaminskas and Bridge [26], and Zangerle and Bauer [51]. Generally, there are two types of diversity. One is individual diversity, which represents the diversity of recommended items for each user. The other is aggregate diversity [1,2], which represents the diversity across users and promotes long-tail items. We review the definitions and enhancement algorithms for individual diversity, which is simply referred to as diversity throughout this paper.
Defining Diversity Objectives. The intra-list distance (ILD) (also known as the average pairwise distance), due to Smyth and McClave [42] and Ziegler et al. [53], is among the earliest diversity objectives in recommendation research. Owing to its simplicity and flexibility in the choice of a distance metric, ILD has been used in a plethora of subsequent works [8,9,13,17,21,25,41,44,47,50,52]. Dispersion is another distance-based diversity objective that is similar to ILD. Maximizing the dispersion value is known as the k-dispersion problem in operations research and is motivated by applications in facility location [18,19,27,37]. Notably, only a few studies on recommender systems [16,21] adopt dispersion as the diversity objective. Determinantal point processes (DPPs) are probabilistic models that express the negative correlation among items using the determinant [10,30]. DPP-based objectives have recently been applied to recommender systems [36]; see Kulesza and Taskar [28] for more details. Topical diversity objectives use predefined topic information to directly evaluate how many topics are covered by the selected items and/or the extent to which topic redundancy is avoided [3,5,6,46]. Such topic information is often readily available in many domains, such as movies, music, and books. In this paper, we do not compare against DPPs or topical diversity because we deeply investigate ILD and dispersion, which are more commonly used.
Gollapudi and Sharma [21] use an axiomatic approach, in which they design a set of axioms that a diversity objective should satisfy, and prove that no objective, including ILD and dispersion, can satisfy all the axioms simultaneously. Amigó et al. [4] present another axiomatic analysis of diversity-aware evaluation measures. Our study is orthogonal to these works because we focus on elucidating what diversity objectives represent.
Diversity Enhancement Algorithms. We review algorithms for enhancing the diversity of recommended items. The basic approach simultaneously optimizes both relevance and diversity. Given the relevance rel(i) for each item i and a diversity objective div(·) (e.g., ILD), we can formulate an objective function as a linear combination of the average relevance and diversity of the selected items S, i.e.,

max_S (1 − λ) · (1/|S|) Σ_{i∈S} rel(i) + λ · div(S),   (1)

where λ ∈ (0, 1) is the trade-off parameter. The maximal marginal relevance (MMR) [11] is an initial attempt using this approach, which applies a greedy heuristic to Eq. (1). Greedy-style algorithms are widely used in many diversified recommendation studies [3,6,13,21,25,41,46,48,50]. Other algorithms include local search [50], binary quadratic programming [25,52], and multi-objective optimization [38,39]. However, even (Pareto-)optimal solutions are undesirable unless we choose an "appropriate" objective to be optimized. We investigate whether the greedy maximization of one diversity objective is useful for enhancing another objective.
Learning-to-rank approaches aim to directly learn the optimal ranking of recommended items for each user under a particular definition of the loss function. Notably, the underlying function that models diversity often originates from existing diversity objectives, including ILD [13,48]. Thus, our study helps in understanding the impact of the underlying diversity modeling on recommendation results.
Diversity has also been studied in information retrieval (IR); we do not consider such diversity-aware IR measures, which assume that a distribution over intents is available for each query.

PRELIMINARIES
Notations. For a nonnegative integer n, let [n] ≜ {1, 2, . . . , n}. For a finite set S and an integer k, we write (S choose k) for the family of all size-k subsets of S. Vectors and matrices are written in bold (e.g., v and A), and the i-th entry of a vector v in R^n is denoted v(i).
Recap of Two Diversity Objectives. We formally define two popular distance-based diversity objectives. We assume that a pairwise distance d(i, j) is given between every pair of items i, j. One objective is the intra-list distance (ILD), which is defined for an item set S as

ILD(S) ≜ Σ_{{i,j} ⊆ S} d(i, j) / (|S| choose 2).

The definition of ILD is intuitive, as it simply takes the average of the pairwise distances between all the items in S. The other is dispersion, which is defined as the minimum pairwise distance between selected items:

disp(S) ≜ min_{i≠j ∈ S} d(i, j).

Dispersion is stricter than ILD in that it evaluates the pairwise distances among S in the worst-case sense. We can flexibly choose any distance function d depending on the application. Such a distance function is often a metric; i.e., the following three axioms are satisfied for any items i, j, k: (1) identity of indiscernibles: d(i, j) = 0 ⇔ i = j; (2) symmetry: d(i, j) = d(j, i); (3) triangle inequality: d(i, j) + d(j, k) ≥ d(i, k). Commonly used distance metrics in diversified recommendation include the Euclidean distance [6,41], i.e., d(i, j) ≜ ∥x_i − x_j∥, where x_i and x_j are the feature vectors of items i and j, respectively, the cosine distance [13,26], and the Jaccard distance [21,26,50].
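As a concrete reference, the two objectives can be sketched in a few lines of Python (a minimal illustration; the item representation and distance function are up to the caller):

```python
from itertools import combinations

def ild(items, dist):
    """Intra-list distance: average of dist over all unordered pairs."""
    pairs = list(combinations(items, 2))
    return sum(dist(i, j) for i, j in pairs) / len(pairs)

def dispersion(items, dist):
    """Dispersion: minimum of dist over all unordered pairs."""
    return min(dist(i, j) for i, j in combinations(items, 2))

d = lambda i, j: abs(i - j)            # items as points on a line
print(ild([0.0, 0.0, 1.0], d))         # -> 0.666...: two distant pairs hide the duplicate
print(dispersion([0.0, 0.0, 1.0], d))  # -> 0.0: the duplicate pair dominates
```

The same three points already foreshadow the paper's theme: the duplicate pair is invisible to ILD but fatal to dispersion.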
Greedy Heuristic. Here, we explain a greedy heuristic for enhancing diversity. This heuristic has been frequently used in diversified recommendation, and thus we use it for theoretical and empirical analyses of ILD and dispersion in Sections 4, 6 and 7.
Consider the problem of selecting a set of k items that maximizes the value of a particular diversity objective f. This problem is NP-hard, even if f is restricted to ILD [45] or dispersion [18,37]. However, we can obtain an approximate solution to this problem using the simple greedy heuristic shown in Algorithm 1. Given a diversity objective f : 2^[n] → R_+ on n items and an integer k ∈ [n] representing the number of items to be recommended, the greedy heuristic iteratively selects an item of [n], not having been chosen so far, that maximizes the value of f. This heuristic has the following advantages from both theoretical and practical perspectives: (1) it is efficient because it requires at most nk evaluations of f; (2) it provably finds a 1/2-approximate solution to the maximization of ILD [7] and dispersion [37], and it performs nearly optimally in practice.
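Algorithm 1 itself is only a few lines; the sketch below assumes the objective f maps a list of item indices to a score and that ties are broken by index:

```python
from itertools import combinations

def greedy_select(n, k, f):
    """Greedy heuristic (Algorithm 1): iteratively add the item of [n]
    maximizing f on the augmented set; evaluates f at most n*k times."""
    selected = []
    for _ in range(k):
        rest = [i for i in range(n) if i not in selected]
        selected.append(max(rest, key=lambda i: f(selected + [i])))
    return selected

# Greedily maximize ILD over four points on a line.
pts = [0.0, 0.2, 0.5, 1.0]
d = lambda i, j: abs(pts[i] - pts[j])

def ild_obj(S):
    pairs = list(combinations(S, 2))
    return sum(d(i, j) for i, j in pairs) / len(pairs) if pairs else 0.0

print(greedy_select(len(pts), 3, ild_obj))  # -> [0, 3, 1]: the two extremes come first
```

Note that once both extremes are selected, every remaining point yields the same marginal ILD, so the greedy tie-break decides the rest.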

THEORETICAL COMPARISON
We present a theoretical comparison between ILD and dispersion. Our goal is to elucidate the correlation between the two diversity objectives. Once we establish that enhancing a diversity objective f increases another objective g to some extent, we can simply maximize f to obtain items that are diverse with respect to both f and g. In contrast, if there is no such correlation, we shall characterize what f and g represent or enhance. The remainder of this section is organized as follows: Section 4.1 describes our analytical methodology, Section 4.2 summarizes our results, and Section 4.3 is devoted to lessons learned from our results.

Our Methodology
We explain how to quantify the correlation between two diversity objectives. Suppose we are given a diversity objective f : 2^[n] → R_+ over n items and an integer k ∈ [n] denoting the output size (i.e., the number of items to be recommended). We define f-diversification as the following optimization problem:

max_{S ∈ ([n] choose k)} f(S).

Hereafter, the optimal item set is denoted S*_{f,k}, and the optimal value is denoted OPT_{f,k}; namely, we define OPT_{f,k} ≜ f(S*_{f,k}) = max_{S ∈ ([n] choose k)} f(S). We also denote by S^Gr_{f,k} the set of k items selected using the greedy heuristic on f. We omit the subscript "k" when it is clear from the context. We also introduce concepts related to approximation algorithms.
Definition 4.1. We say that a k-item set S is an α-approximation to f-diversification for some α ≤ 1 if it holds that f(S) ≥ α · OPT_{f,k}. Parameter α is called the approximation factor.
For example, the greedy heuristic returns a 1/2-approximation for ILD-diversification; i.e., ILD(S^Gr_ILD) ≥ (1/2) · OPT_ILD. We now quantify the correlation between a pair of diversity objectives f and g. The primary logic is to think of the optimal set S*_{f,k} for f-diversification as an algorithm for g-diversification. The correlation is measured using the approximation factor of this algorithm for g-diversification, i.e.,

g(S*_{f,k}) / OPT_{g,k}.   (2)

Intuitively, if this factor is sufficiently large, then we can simply maximize the value of f; e.g., if Eq. (2) is 0.99, then any item set having the optimum f is also nearly optimal with respect to g. However, when Eq. (2) is very low, such an item set is not necessarily good with respect to g; namely, f-diversification does not imply g-diversification. Note that we can replace S*_{f,k} with the greedy solution, whose approximation factor is g(S^Gr_{f,k}) / OPT_{g,k}. Our analytical methodology is twofold:
1. We prove a guarantee on the approximation factor; i.e., there exists a factor α such that g(S*_{f,k}) / OPT_{g,k} ≥ α for every set of items with a distance metric.
2. We construct an input indicating inapproximability; i.e., there exists a (small) factor α′ such that g(S*_{f,k}) / OPT_{g,k} < α′ for some item set with a distance metric. Such an input demonstrates a case in which f and g differ significantly; thus, we can use it to characterize what f and g represent.
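On tiny instances, Eq. (2) can be evaluated exactly by brute force, which makes it easy to check worst-case constructions numerically (a sketch; real datasets require the greedy surrogate used in Section 6):

```python
from itertools import combinations

def ild(S, d):
    ps = list(combinations(S, 2))
    return sum(d(a, b) for a, b in ps) / len(ps)

def disp(S, d):
    return min(d(a, b) for a, b in combinations(S, 2))

def approx_factor(points, k, f, g, d):
    """g-value of the f-optimal k-subset, relative to the g-optimum (Eq. (2))."""
    subsets = list(combinations(points, k))
    f_opt = max(subsets, key=lambda S: f(S, d))
    g_opt = max(subsets, key=lambda S: g(S, d))
    return g(f_opt, d) / g(g_opt, d)

d = lambda a, b: abs(a - b)
# With a duplicate at 0, an ILD-optimal 3-subset tolerates the duplicate pair,
# so its dispersion collapses to 0.
print(approx_factor([0.0, 0.0, 1.0, 2.0], 3, ild, disp, d))  # -> 0.0
```

The reverse direction on the same input is benign: the dispersion-optimal subset retains the full optimal ILD here.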

Our Results
We now present our results, each of which (i.e., a theorem or claim) is followed by a remark devoted to its intuitive implication. Given that ILD and dispersion differ only in that the former takes the average and the latter the minimum over all pairs of items, an item set with a large dispersion value is expected to possess a large ILD value. We first justify this intuition. We define the diameter D of n items as the maximum pairwise distance, i.e., D ≜ max_{i≠j ∈ [n]} d(i, j), and denote by d*_k the maximum dispersion among k items, i.e., d*_k ≜ OPT_{disp,k}. Our first result is the following, whose proof is deferred to Appendix A.

Theorem 4.2. The following inequalities hold for any input and distance metric:

ILD(S*_{disp,k}) ≥ (d*_k / D) · OPT_{ILD,k}  and  ILD(S^Gr_{disp,k}) ≥ max{1/k, d*_k / (2D)} · OPT_{ILD,k}.

In other words, the optimal size-k set for disp-diversification is a (d*_k/D)-approximation to ILD-diversification, and Algorithm 1 on disp returns a max{1/k, d*_k/(2D)}-approximation to ILD-diversification.
Remark: Theorem 4.2 implies that the larger the dispersion, the larger the ILD, given that D is not significantly large. In contrast, if the maximum dispersion d*_k is much smaller than D, the approximation factor d*_k/D becomes less attractive. Fortunately, the greedy heuristic exhibits a 1/k-approximation, which provides a data-independent guarantee.
We demonstrate that Theorem 4.2 is almost tight; the proof is deferred to Appendix A.

Claim 4.3. There exists an input such that the pairwise distance is the Euclidean distance between feature vectors and the following holds:

ILD(S*_{disp,k}) = O(max{1/k, d*_k / D}) · OPT_{ILD,k}.

In particular, Theorem 4.2 is tight up to a constant.
Remark: The input used in the proof of Claim 4.3 consists of two "clusters" such that the intra-cluster distance within each cluster is extremely small (specifically, ε) and the inter-cluster distance between them is large. The ILD value is maximized when the same number of items is selected from each cluster. However, any set of three or more items has a dispersion of at most ε; namely, we cannot distinguish the largest-ILD case from a small-ILD case based on the value of dispersion.
In the reverse direction, we provide a very simple input such that no matter how large the ILD value is, the dispersion value can be 0 (Claim 4.4); the proof is deferred to Appendix A.
Remark: The input used in the proof of Claim 4.4 consists of points (duplicates allowed) on a line segment. Dispersion naturally selects distinct points. In contrast, ILD prefers points at the two ends of the segment, which are redundant.
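The line-segment construction behind this remark is easy to reproduce numerically: duplicating the two endpoints yields a higher ILD than evenly spaced points, yet its dispersion collapses to 0 (a small sanity check, not the appendix proof):

```python
from itertools import combinations

def ild(S):
    ps = list(combinations(S, 2))
    return sum(abs(a - b) for a, b in ps) / len(ps)

def disp(S):
    return min(abs(a - b) for a, b in combinations(S, 2))

endpoints = [0.0, 0.0, 1.0, 1.0]     # a duplicate at each end of the segment
spread    = [0.0, 1/3, 2/3, 1.0]     # evenly spaced points

print(ild(endpoints), disp(endpoints))  # -> 0.666... 0.0
print(ild(spread), disp(spread))        # -> 0.555... 0.333...
```

ILD strictly prefers the duplicated endpoints (0.667 > 0.556), while dispersion correctly rejects them.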

Lessons Learned
Based on the theoretical investigations so far, we discuss the pros and cons of ILD and dispersion. Figure 2 shows two illustrative inputs for which maximizing ILD and dispersion results in very different solutions; each item is a 2-dimensional vector, and the distance between items is measured by the Euclidean distance.
• Pros of ILD: If the entire item set is separated into two "clusters" as shown in Figure 2a, ILD selects items in a well-balanced manner; i.e., a nearly equal number of items from each cluster are chosen (supported by Claim 4.3).
• Cons of ILD: ILD may select duplicate items that are very close (or even identical) to each other. Suppose that we are given the feature vectors in the ellipse shown in Figure 2b. Then, ILD would select items from the left and right ends, each of which consists of similar feature vectors (supported by Claim 4.4); moreover, items in the middle of the ellipse are never chosen.
In practice, if item features are given by dense vectors such as those generated by deep neural networks, ILD is undesirable because it selects many nearly-identical vectors.
• Pros of dispersion: If the entire item set is "well-dispersed" as in Figure 2b, then so are the items chosen by dispersion as well.
• Cons of dispersion: Dispersion may overlook distant item pairs that would have contributed to ILD. Suppose that we are given the feature vectors in the two circles in Figure 2a. Because the dispersion value of any (three or more) items is small whereas the diameter is large, we cannot distinguish distant items from close items using only the dispersion value. Thus, dispersion may select items in an unbalanced manner in the worst case (as in Claim 4.3). In practice, if item features are given by sparse (e.g., 0-1) vectors, such as indicator vectors defined by genre or topic information, dispersion may not be favorable because its value becomes 0 whenever two or more items with the same features are selected.

GAUSSIAN INTRA-LIST DISTANCE
In Section 4.3, we discussed that ILD and dispersion exhibit their own extreme behaviors. We now argue that they can be viewed as two limits of a single objective defined via a kernel function over items; specifically, we apply the Gaussian kernel to ILD. The Gaussian kernel for two vectors x, y ∈ R^d is defined as K_σ(x, y) ≜ exp(−∥x − y∥² / (2σ²)), where σ > 0 is a bandwidth parameter that controls the smoothness of the estimated function in kernel methods. Since the kernel function can be regarded as a similarity score, we can define the kernel distance [35] as d_σ(x, y) ≜ √(2 − 2 K_σ(x, y)). Using this kernel distance, we define the Gaussian ILD (GILD) as

GILD_σ(S) ≜ Σ_{{i,j} ⊆ S} √(2 − 2 exp(−d(i, j)² / (2σ²))) / (|S| choose 2),

where d is a distance metric and σ is a bandwidth parameter.¹ The following asymptotic analysis shows that GILD interpolates between ILD and dispersion; the proof is deferred to Appendix A.
Theorem 5.1. GILD approaches ILD as σ → ∞, and it approaches dispersion as σ → 0 (up to scaling and addition of a constant).
Theorem 5.1 implies that GILD behaves as a compromise between ILD and dispersion via tuning of the bandwidth parameter σ: the value of σ must be small if we do not want the selected items to be close to each other, and σ must be large if we want to include (a few) distant items.
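Theorem 5.1 can be sanity-checked numerically. Below, GILD is implemented directly from its definition; for large σ, σ·GILD approaches ILD (first-order Taylor expansion of the exponential), and for small σ, GILD is governed essentially by the minimum pairwise distance alone (a numerical illustration, not a proof):

```python
import math
from itertools import combinations

def ild(S, d):
    ps = list(combinations(S, 2))
    return sum(d(a, b) for a, b in ps) / len(ps)

def gild(S, d, sigma):
    """Average Gaussian kernel distance sqrt(2 - 2*exp(-d(i,j)^2 / (2*sigma^2)))."""
    ps = list(combinations(S, 2))
    return sum(math.sqrt(2 - 2 * math.exp(-d(a, b) ** 2 / (2 * sigma ** 2)))
               for a, b in ps) / len(ps)

d = lambda a, b: abs(a - b)
S1 = [0.0, 0.3, 1.0]
S2 = [0.0, 0.3, 0.8]   # same minimum distance as S1, but smaller ILD

# sigma -> infinity: sigma * GILD converges to ILD.
assert abs(1e3 * gild(S1, d, 1e3) - ild(S1, d)) < 1e-4

# sigma -> 0: GILD depends (almost) only on the minimum distance,
# so S1 and S2 become numerically indistinguishable although their ILDs differ.
assert abs(gild(S1, d, 0.05) - gild(S2, d, 0.05)) < 1e-12
assert abs(ild(S1, d) - ild(S2, d)) > 0.1
```

The specific tolerances are ours; they only illustrate the direction of the two limits.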
We use GILD to better understand the empirical behavior of ILD and dispersion. In particular, we are interested in whether GILD can avoid the extreme behavior of ILD and dispersion.

Choosing the Value of 𝜎
Here, we briefly describe how we choose the value of σ in Section 6. As will be shown in Section 6.2.3, GILD with a naively chosen σ usually exhibits extreme behaviors like ILD or dispersion; we wish to determine a value of σ for which GILD interpolates between them. Suppose that we have selected k items, denoted S. In Eq. (6) in the proof of Theorem 5.1, for the first two terms to be dominant, the bandwidth must be small relative to the pairwise distances in S; this motivates dividing a representative pairwise distance of S by √(2 log((k choose 2) − 1)). Based on this, we propose the following two schemes for determining the value of σ, referred to as the adjusted minimum and the adjusted median:

σ_min^S ≜ min_{i≠j ∈ S} d(i, j) / √(2 log((k choose 2) − 1))  and  σ_med^S ≜ median_{i≠j ∈ S} d(i, j) / √(2 log((k choose 2) − 1)).
Note that σ_min^S ≤ σ_med^S, and the adjusted median mimics the median heuristic [20,22] in kernel methods. In Section 6, we empirically justify that dividing by √(2 log((k choose 2) − 1)) is necessary.
¹ Note that we have replaced the Euclidean distance in the exponent by d so that we can use any distance metric.
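A sketch of the two schemes, assuming they divide the minimum (resp. median) pairwise distance of the selected set by √(2 log((k choose 2) − 1)); the helper name and this exact normalization are our reconstruction, not the paper's verbatim formula:

```python
import math
from itertools import combinations
from statistics import median

def adjusted_sigma(S, d, scheme="median"):
    """Adjusted-minimum / adjusted-median bandwidth for GILD over the
    currently selected set S (reconstruction; assumes |S| >= 3)."""
    dists = [d(a, b) for a, b in combinations(S, 2)]
    base = min(dists) if scheme == "min" else median(dists)
    m = len(dists)                       # m = C(k, 2)
    return base / math.sqrt(2 * math.log(m - 1))

d = lambda a, b: abs(a - b)
S = [0.0, 0.2, 0.5, 1.0]
print(adjusted_sigma(S, d, "min"), adjusted_sigma(S, d, "median"))
```

Both schemes shrink the bandwidth as the candidate pool of pairs grows, which is what keeps the smallest-distance term dominant in the small-σ expansion.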

EMPIRICAL COMPARISON
We report the results of an empirical comparison among the diversity objectives analyzed in Sections 4 and 5. The theoretical results in Section 4 demonstrate that each objective captures its own notion of diversity; thus, enhancing one objective is generally unhelpful in improving another. One may think that such results, based on worst-case analysis, are too pessimistic to apply in practice; for instance, ILD may be able to enhance dispersion on real data even though no positive approximation guarantee is possible. Thus, we empirically analyze the approximation factor for the diversity objectives examined thus far.

Settings
6.1.1 Datasets. We use two real-world datasets, which include feedback and genre information, and two synthetic datasets.
1. MovieLens 1M (ML-1M) [23,24]: Genre information is associated with each movie; there are 18 genres. We extracted the subset in which users and movies have at least 20 ratings, resulting in 995 thousand ratings on 3,000 movies from 6,000 users.
2. Amazon Review Data Magazine Subscriptions (Amazon) [32,33]: Each product contains categorical information, and there are 165 categories. We extracted the subset in which all users and products have at least five ratings, resulting in 4,200 reviews of 720 products from 664 users.
3. Random points in two separated circles (TwoCircles, Figure 2a): consists of 1,000 random points in two circles whose radius is 1/4 and whose centers are −3/4 and 3/4.
4. Random points in an ellipse (Ellipse, Figure 2b): consists of 1,000 random points in an ellipse of flattening 3/4.
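The two synthetic datasets can be regenerated along the following lines (the seed and the rejection-sampling approach are our choices, not the paper's):

```python
import random

def sample_disk(cx, cy, r, n, rng):
    """Rejection-sample n uniform points from the disk of radius r at (cx, cy)."""
    pts = []
    while len(pts) < n:
        x, y = rng.uniform(-r, r), rng.uniform(-r, r)
        if x * x + y * y <= r * r:
            pts.append((cx + x, cy + y))
    return pts

rng = random.Random(0)
# TwoCircles: 1,000 points split between two disks of radius 1/4 centered at +-3/4.
two_circles = (sample_disk(-0.75, 0.0, 0.25, 500, rng)
               + sample_disk(0.75, 0.0, 0.25, 500, rng))
# Ellipse: flattening 3/4, i.e. the minor axis is 1/4 of the major axis.
ellipse = [(x, 0.25 * y) for (x, y) in sample_disk(0.0, 0.0, 1.0, 1000, rng)]
```

Scaling a unit disk along one axis is a convenient way to obtain uniform points in the ellipse.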
6.1.2 Distance Metrics. We use two types of distance metrics for the real-world datasets.
1. Implicit feedback (feedback for short): Let X be a user-item implicit feedback matrix over m users and n items, such that X_{u,i} is 1 if user u interacts with item i, and 0 if there is no interaction. We run singular value decomposition on X with dimension d ≜ 32 to obtain X ≈ UV^⊤. The feature vector of item i is then defined as v_i, the i-th row of V, and the distance between two items i, j is given by the Euclidean distance d(i, j) ≜ ∥v_i − v_j∥.
2. Genre information (genre for short): We denote by G_i the set of genres to which item i belongs. The distance between two items i, j is given by the Jaccard distance d(i, j) ≜ 1 − |G_i ∩ G_j| / |G_i ∪ G_j|. Multiple items may have the same genre set; i.e., d(i, j) = 0 for some i ≠ j.
For two synthetic datasets, we simply use the Euclidean distance.
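Both metrics are straightforward to implement. The SVD-based item factors below follow one plausible reading of the construction; the exact scaling of the factors (e.g., whether singular values are folded into the item vectors) is our assumption:

```python
import numpy as np

rng = np.random.default_rng(0)
X = (rng.random((100, 50)) < 0.1).astype(float)   # toy user-item 0-1 feedback

# Feedback metric: truncated SVD; item i's feature vector is row i of V_d.
dim = 8                                            # 32 in the paper
U, s, Vt = np.linalg.svd(X, full_matrices=False)
V = Vt[:dim].T

def feedback_dist(i, j):
    return float(np.linalg.norm(V[i] - V[j]))

# Genre metric: Jaccard distance between (nonempty) genre sets.
def jaccard_dist(gi, gj):
    return 1.0 - len(gi & gj) / len(gi | gj)

print(jaccard_dist({"Action", "Comedy"}, {"Comedy", "Drama"}))  # -> 0.666...
print(jaccard_dist({"Action"}, {"Action"}))                     # -> 0.0 (duplicate genre sets)
```

The second print illustrates the caveat above: items with identical genre sets are at distance 0, which is exactly the situation in which dispersion collapses.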
6.1.3 Diversity Enhancement Algorithms. We apply the greedy heuristic (Algorithm 1) to ILD, dispersion, and GILD with the adjusted median. A baseline that returns a random set of items (denoted Random) is also implemented. Experiments were conducted on a Linux server with an Intel Xeon 2.20GHz CPU and 62GB RAM. All programs were implemented using Python 3.9.

Results
We calculate the empirical approximation factor for each pair of diversity objectives f and g as follows. First, we run the greedy heuristic on f and on g to obtain S^Gr_{f,k} and S^Gr_{g,k}; we then compute g(S^Gr_{f,k}) / g(S^Gr_{g,k}) for each k ∈ [128]. This factor usually takes a value between 0 and 1 and is simply referred to as the relative score of f to g. Unlike the original definition in Eq. (2), we do not use OPT_{g,k} because its computation is NP-hard. Tables 1 to 6 report the average relative score over k = 2, . . . , 128.
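Putting the earlier pieces together, the relative score is computed by running the greedy heuristic once per objective. A self-contained toy example on six points on a line (the paper uses the real datasets above):

```python
from itertools import combinations

def greedy(n, k, f):
    S = []
    for _ in range(k):
        S.append(max((i for i in range(n) if i not in S), key=lambda i: f(S + [i])))
    return S

pts = [0.0, 0.0, 0.2, 0.5, 0.9, 1.0]   # note the duplicate at 0
d = lambda i, j: abs(pts[i] - pts[j])

def ild_f(S):
    ps = list(combinations(S, 2))
    return sum(d(a, b) for a, b in ps) / len(ps) if ps else 0.0

def disp_f(S):
    ps = list(combinations(S, 2))
    return min(d(a, b) for a, b in ps) if ps else float("inf")

def relative_score(f, g, k):
    """g(greedy set for f) / g(greedy set for g): the surrogate for Eq. (2)."""
    return g(greedy(len(pts), k, f)) / g(greedy(len(pts), k, g))

print(relative_score(ild_f, disp_f, 4))  # -> 0.0: ILD-greedy picked the duplicate
print(relative_score(disp_f, ild_f, 4))  # dispersion-greedy retains most of the ILD
```

Even this toy reproduces the asymmetry reported below: ILD is a poor proxy for dispersion, while dispersion is a decent proxy for ILD.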

6.2.1 ILD vs. Dispersion vs. GILD in Practice.
We first investigate the relative score of ILD to dispersion, for which we proved that no approximation guarantee is possible (Claim 4.4). In almost all cases, the relative score is extremely low, the highest being 0.424. This is because multiple items with almost-the-same features were selected, resulting in a small (or even 0) value of dispersion. Figure 3 shows that ILD selects items that have similar feature vectors when k = 34; we thus confirmed the claim in Section 4.3 that ILD selects nearly identical items in the case of dense feature vectors. Moreover, Figure 4 shows that it selects duplicate items that share the same genre set at k = 23.
We then examine the relative score of dispersion to ILD, for which we provided an approximation guarantee (Theorem 4.2). Tables 1 to 6 show that the relative score is better than 0.859 except for Ellipse, which is better than expected from the 1/k factor. Figure 5 also indicates that the relative score does not decay significantly; e.g., at k = 100, the relative score is better than 0.94 even though the worst-case approximation factor is 1/k = 0.01. It is evident that GILD has a higher relative score to ILD than dispersion does, and a higher relative score to dispersion than ILD does, for all settings. That is, GILD finds an intermediate set between ILD and dispersion, suggesting that ILD and dispersion exhibit extreme behavior in practice, as discussed in Section 4.

6.2.2 Qualitative Analysis via Visualization. We qualitatively assess the diversity objectives based on the visualization of the synthetic datasets. We first investigate Ellipse, on which ILD may select duplicate items (see Section 4.3). Figure 6 shows the items of Ellipse selected by each diversity objective; Figure 8 shows the histogram of the pairwise Euclidean distances between the selected items. The items selected by ILD can be partitioned into two groups: the left and right ends of the ellipse (Figure 6a). The histogram further shows that the inter-group distance is approximately 1.8 whereas the intra-group distances are close to 0.
Thus, the drawback of ILD discussed in Section 4.3 occurs empirically. Unlike ILD, the items selected by dispersion are well dispersed (Figure 6b); however, dispersion misses many pairs of distant items, as shown in Figure 8. One reason is that, because dispersion is the minimum pairwise distance, maximizing its value does not lead to the selection of distant item pairs, as discussed in Section 4.3. In contrast, the items chosen by GILD are not only scattered (Figure 6c) but also include more dissimilar items than dispersion, as shown in the histogram. This observation can be explained by the GILD mechanism, which takes the sum of the kernel distances over all pairs. We then examine TwoCircles. Figure 7 shows that each diversity objective selects almost the same number of items from each cluster. In particular, the potential drawback of dispersion discussed in Section 4.3, i.e., the imbalance of selected items in the worst case, does not occur empirically.

Figure 9: Trade-off between ILD and dispersion. For each value of σ, we plot the ILD and dispersion of the greedily obtained set S^Gr_{GILD_σ,k}.

6.2.3 Investigation of the Effect of σ on GILD. We investigate the empirical effect of the value of σ on the behavior of GILD. Specifically, we examine how GILD interpolates between ILD and dispersion as σ changes, as suggested by Theorem 5.1. Setting the value of σ to each of 64 equally spaced numbers on a log scale from 0.02 to 1, we greedily maximize GILD_σ for feedback on ML-1M to obtain a k-item set S^Gr_{GILD_σ,k}. We also run the adaptive greedy heuristic, which is oblivious to the value of σ, to obtain a k-item set S^Gr_{GILD,k}. Figure 9 plots the values of ILD and dispersion for each obtained set S^Gr_{GILD_σ,k} of size k = 16, 128. The vertical lines correspond to the adjusted minimum σ_min, the adjusted median σ_med, the minimum pairwise distance min_{i≠j ∈ S^Gr_{GILD,k}} d(i, j), and the median pairwise distance median_{i≠j ∈ S^Gr_{GILD,k}} d(i, j). The horizontal lines correspond to ILD(S^Gr_ILD) ≈ OPT_ILD, ILD(S^Gr_disp), disp(S^Gr_disp) ≈ OPT_disp, and disp(S^Gr_ILD). Observe first that ILD is monotonically increasing in σ and approaches OPT_ILD, whereas disp is approximately decreasing in σ and attains OPT_disp for a "moderately small" value of σ, which coincides with Theorem 5.1.
Observe also that both ILD and disp degrade for very small values of σ. The reason is that each term exp(−d(i, j)² / (2σ²)) in GILD becomes extremely small, causing floating-point rounding errors. Setting σ to the (unadjusted) minimum or median results in a dispersion value of disp(S^Gr_ILD) when k = 16; i.e., the obtained set is almost identical to S^Gr_ILD. In contrast, setting σ = σ_min (and similarly σ = σ_med) yields a set whose dispersion is between disp(S^Gr_disp) and disp(S^Gr_ILD) and whose ILD is between ILD(S^Gr_ILD) and ILD(S^Gr_disp). Thus, using the adjusted median, including the division by √(2 log((k choose 2) − 1)), is crucial for avoiding trivial sets.

We now discuss the empirical behavior of ILD, dispersion, and GILD. Arguably, ILD easily selects many items that are similar or identical. As shown in Figure 6a, the chosen items are biased toward two distant groups, and items in the middle of the two groups never appear. This is undesirable if we wish to recommend very few items. Such drawbacks of ILD can be resolved via dispersion. Greedy maximization of dispersion also empirically enhances the ILD value. However, it may overlook distant item pairs, as discussed in Section 6.2.2. We also note that dispersion is not suitable for measuring diversity. As shown in Figure 10, the value of dispersion drops to nearly 0 once a moderate number of items is selected, and it never returns to a positive value. Owing to this nature, dispersion may not be suitable for comparing large item sets.

Discussions
The empirical results for GILD imply that ILD and dispersion are not appropriate for improving and/or evaluating distance-based diversity. GILD partially circumvents the issues caused by the extreme behavior of ILD and dispersion, thereby achieving a sweet spot between them. On the one hand, GILD extracts dissimilar items such that the dispersion value does not drop to 0. On the other hand, GILD can select more dissimilar items than dispersion. Similar to dispersion, GILD cannot be used to compare the diversity among distinct sets, as shown in Table 3, which indicates that even Random can have the highest GILD value. This is because GILD with the adjusted median is designed to evaluate the next item to be selected given a fixed set of already-selected items. To sum up, GILD works successfully as an optimization objective interpolating between ILD and dispersion and as a tool for analyzing them empirically.

DIVERSIFIED RECOMMENDATION RESULTS
Having developed a better understanding of the behavior of the diversity objectives from both theoretical (Section 4) and empirical (Section 6) perspectives, we now incorporate them into recommendation methods.

Settings
7.1.1 Dataset. To investigate the results produced by a recommendation method using ILD, dispersion, and GILD, we use the ML-1M dataset, the details of which are described in Section 6.1. We extracted the subset in which users and movies have at least 20 and 100 ratings, respectively, resulting in 370 thousand ratings on 2,000 movies from 2,800 users. The obtained subset was further split into training, validation, and test sets in a 60/20/20 ratio according to weak generalization; i.e., they may not be disjoint in terms of users.
7.1.2 Algorithms. We adopt Embarrassingly Shallow AutoEncoder (EASE^R) [43] to estimate the predictive score rel_u(i) of item i for user u from a user-item implicit feedback matrix. EASE^R has a hyperparameter for ℓ2-norm regularization, whose value is tuned on the validation set. We construct a distance metric based on the implicit feedback in Section 6.1 to define ILD, dispersion, and GILD. We then apply the greedy heuristic to a linear combination of relevance and diversity. Specifically, given a set S_{ℓ−1} of already selected ℓ − 1 items, we select the next item i_ℓ that maximizes the following objective:

F_{u,f,λ}(i) ≜ (1 − λ) · rel_u(i) + λ · f(S_{ℓ−1} ∪ {i}),    (5)

where λ ∈ [0, 1] is a trade-off parameter between relevance and diversity and f is a diversity objective. We run the greedy heuristic for each f, each value of λ = 0, 0.1, 0.2, ..., 0.9, 0.99, 0.999, 1, and each user u to retrieve a list of k ≜ 50 items to be recommended to u, denoted S_{u,f,λ}. Experiments were conducted in the same environment as described in Section 6. We normalize the ILD and dispersion of each obtained set by ILD(S^Gr_{u,ILD}) and disp(S^Gr_{u,disp}), respectively, where S^Gr_{u,f} is the set of k items obtained by greedily maximizing f on the set of items that do not appear in the training or validation set; the normalized values are denoted nILD and ndisp. We then take the mean of nDCG, nILD, and ndisp over all users.
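The greedy heuristic can be sketched as follows, using toy one-dimensional items (the data, `ild`, and `disp` implementations here are illustrative; `ild` and `disp` are the standard average and minimum pairwise distance):

```python
from itertools import combinations

def ild(items, dist):
    """Average pairwise distance (0 for fewer than two items)."""
    pairs = list(combinations(items, 2))
    return sum(dist(a, b) for a, b in pairs) / len(pairs) if pairs else 0.0

def disp(items, dist):
    """Minimum pairwise distance (0 for fewer than two items)."""
    pairs = list(combinations(items, 2))
    return min(dist(a, b) for a, b in pairs) if pairs else 0.0

def greedy(relevance, dist, k, lam, diversity):
    """Greedily pick k items maximizing (1 - lam) * rel(i) + lam * f(S + {i})."""
    remaining = set(range(len(relevance)))
    selected = []
    for _ in range(k):
        best = max(remaining, key=lambda i: (1 - lam) * relevance[i]
                                            + lam * diversity(selected + [i], dist))
        selected.append(best)
        remaining.remove(best)
    return selected

# Toy example: three items on a line; item 2 is far away but barely relevant.
pos = [0.0, 0.01, 5.0]
rel = [1.0, 0.9, 0.1]
d = lambda a, b: abs(pos[a] - pos[b])

print(greedy(rel, d, 2, 0.0, ild))  # [0, 1]: pure relevance
print(greedy(rel, d, 2, 0.9, ild))  # [0, 2]: diversity dominates
```

The trade-off parameter plays the same role as λ above: at 0 the method reduces to top-k relevance ranking, and near 1 it approaches pure greedy diversity maximization.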

Results
Figure 11 shows the relation between each pair of nDCG, nILD, and ndisp. First, we observe a clear trade-off between relevance and diversity with regard to λ. In particular, when diversity is not introduced into the objective (i.e., λ = 0), the mean ndisp is 0, which implies that for most users, two or more of the selected items have the same genre set. As shown in Section 6, incorporating ILD does not avoid the case of ndisp = 0. In contrast, dispersion and GILD with a moderate value of λ enhance nILD and ndisp without substantially sacrificing accuracy. Comparing dispersion and GILD, we observe that GILD achieves a slightly higher nILD than dispersion: when the mean nDCG is close to 0.25, the means of nILD for GILD and dispersion are 0.966 and 0.948, respectively, and the means of ndisp are 0.987 and 0.992, respectively.
Although dispersion and GILD exhibit a similar trade-off in the high-relevance case (i.e., mean nDCG ≥ 0.4), which is often the realistic situation, they produce different results at the individual level. To investigate this, we select λ such that the two objectives are nearly identical on average. Specifically, we choose λ = 0.2 for dispersion and λ = 0.7 for GILD, for which the means of nDCG, nILD, and ndisp are 0.457, 0.870, and 0.009 for dispersion, and 0.445, 0.877, and 0.001 for GILD, respectively. The left panel of Figure 12 plots the nDCG of S_{u,disp,0.2} and S_{u,GILD,0.7} for each user u. Observe that dispersion and GILD show a similar trend; the standard deviation of nDCG is 0.161 for dispersion and 0.160 for GILD. In contrast, as shown in the right panel of Figure 12, dispersion often has a smaller nILD than GILD. Furthermore, the standard deviation of nILD for dispersion (0.051) is larger than that for GILD (0.038). This difference is possibly due to the potential drawback of dispersion (see Section 4.3): since the dispersion value for most users becomes 0 at a particular iteration of the greedy heuristic, the objective F_{u,disp,0.2}(i) in Eq. (5) reduces to 0.8 · rel_u(i) in the subsequent iterations; i.e., the greedy heuristic merely selects the item with the highest relevance. Consequently, dispersion fails to diversify some users' recommendation results, which is not the case for GILD. In summary, ILD and dispersion are not appropriate choices for the diversity objective to be optimized in diversified recommendation.
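The failure mode described here is mechanical: once two selected items are at distance 0, disp(S ∪ {i}) is 0 for every candidate i, so the diversity term of Eq. (5) contributes nothing. A tiny sketch with hypothetical one-dimensional items:

```python
from itertools import combinations

def disp(points):
    """Minimum pairwise distance (0 for fewer than two points)."""
    pairs = list(combinations(points, 2))
    return min(abs(a - b) for a, b in pairs) if pairs else 0.0

# Two already-selected items with identical genre sets, i.e., distance 0.
selected = [0.0, 0.0]

# Whatever candidate we add, dispersion stays 0, so the greedy objective
# 0.8 * rel(i) + 0.2 * disp(S + [i]) degenerates to 0.8 * rel(i).
print([disp(selected + [x]) for x in (0.5, 3.0, 10.0)])  # [0.0, 0.0, 0.0]
```

From that iteration onward, the greedy heuristic is blind to diversity, which matches the per-user behavior observed in Figure 12.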

CONCLUSIONS
To investigate the behavior of two common diversity objectives, ILD and dispersion, we performed a comparative analysis. Our results revealed drawbacks of both: ILD may select duplicate items, while dispersion may overlook distant item pairs. To analyze these drawbacks empirically, we designed Gaussian ILD (GILD) as an interpolation between ILD and dispersion. In the personalized recommendation setting, we demonstrated that neither ILD nor dispersion is consistently successful in enhancing diversity at the individual level. As future work, we plan to develop an evaluation measure of diversity in lieu of ILD and dispersion.

When we run the greedy heuristic on dispersion, we can assume without loss of generality that the first selected item is x_1. Then, we would have selected y_j for some j as the second item. In the remaining iterations, we may select k − 2 vectors, all from X, in the worst case, which bounds ILD(S^Gr_disp). Observe that for any pair (i, j), [equation omitted].
Using a Taylor expansion of [equation omitted], we obtain [equation omitted], where m is the number of pairs (i, j) with d(i, j) = disp(S). Observing that the error term tends to 0 as σ → 0, we obtain the claimed limit, completing the proof of the second statement. □
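The σ → 0 limit can be sketched as follows, under the assumption (consistent with the exp(−d(i, j)²/(2σ²)) term quoted in Section 6) that GILD involves a sum of such Gaussian terms over item pairs:

```latex
\sum_{\{i,j\}} \exp\!\left(-\frac{d(i,j)^2}{2\sigma^2}\right)
  = m\,\exp\!\left(-\frac{\operatorname{disp}(S)^2}{2\sigma^2}\right)
    \left(1 + \varepsilon_\sigma\right),
\quad
\varepsilon_\sigma
  = \frac{1}{m} \sum_{\{i,j\}:\, d(i,j) > \operatorname{disp}(S)}
    \exp\!\left(-\frac{d(i,j)^2 - \operatorname{disp}(S)^2}{2\sigma^2}\right)
  \;\xrightarrow{\sigma \to 0}\; 0,
```

where m is the number of pairs attaining the minimum distance disp(S). As σ → 0, the pairs at the minimum distance dominate the sum, which is why the objective recovers dispersion-like behavior in this limit.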

Claim 4.4. There exists an input such that the pairwise distance is the Euclidean distance and disp(S*_ILD)/OPT_disp = disp(S^Gr_ILD)/OPT_disp = 0.

Figure 2: Two inputs for which maximization of ILD and dispersion results in very different solutions. (a) Two separated circles (cf. Claim 4.3): ILD selects items in a balanced manner, whereas dispersion may be imbalanced in the worst case. (b) An ellipse (cf. Claim 4.4): dispersion selects scattered items, whereas ILD prefers the two extremes.

Figure 3: Relative score of each objective to dispersion for feedback on ML-1M.

Figure 8: Histogram of the pairwise distances of the selected items on Ellipse.

Figure 6: 128 points (big red circles) of Ellipse selected by greedily maximizing each objective with the Euclidean distance.

Figure 7: 128 points (big red circles) of TwoCircles selected by greedily maximizing each objective with the Euclidean distance.

Figure 10: Dispersion of items for genre on ML-1M.

Figure 11: Relation between each pair of nDCG, nILD, and ndisp with regard to the trade-off parameter λ.

Table 1: Average rel. score of each pair of diversity objs. for feedback on ML-1M.

Table 2: Average rel. score of each pair of diversity objs. for feedback on Amazon.

Table 3: Average rel. score of each pair of diversity objs. for TwoCircles.

Table 4: Average rel. score of each pair of diversity objs. for genre on ML-1M.

Table 5: Average rel. score of each pair of diversity objs. for genre on Amazon.

Table 6: Average rel. score of each pair of diversity objs. for Ellipse.
7.1.3 Evaluation. We evaluate the accuracy and diversity of the obtained sets as follows. Let I*_u denote the set of items relevant to user u (i.e., those with which u interacted) in the test set. We calculate the normalized Discounted Cumulative Gain (nDCG) as

nDCG@k(S_{u,f,λ}; I*_u) ≜ (1/Z) Σ_{ℓ=1}^{k} 1[ℓ-th ranked item of S_{u,f,λ} is in I*_u] / log₂(ℓ + 1),

where Z is a normalization constant given by the DCG of the ideal ranking.
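A small sketch of this metric with binary relevance (the normalizer `ideal` is the DCG of a perfect ranking, the standard nDCG convention; names are illustrative):

```python
import math

def ndcg_at_k(ranked, relevant, k):
    """nDCG@k with binary relevance: DCG of the list / DCG of an ideal list."""
    dcg = sum(1.0 / math.log2(pos + 1)
              for pos, item in enumerate(ranked[:k], start=1)
              if item in relevant)
    ideal = sum(1.0 / math.log2(pos + 1)
                for pos in range(1, min(k, len(relevant)) + 1))
    return dcg / ideal if ideal > 0 else 0.0

# Relevant items ranked first give nDCG = 1.
print(ndcg_at_k(["a", "b", "c"], {"a", "b"}, k=3))  # 1.0
# Pushing a relevant item down the list lowers the score.
print(ndcg_at_k(["c", "a", "b"], {"a", "b"}, k=3))
```

The log₂(ℓ + 1) discount rewards placing relevant items near the top of the recommended list, which is why nDCG complements the set-based diversity measures nILD and ndisp.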