Discrete Conditional Diffusion for Reranking in Recommendation

Reranking plays a crucial role in modern multi-stage recommender systems by rearranging the initial ranking list to model interplay between items. Considering the inherent challenges of reranking, such as the combinatorial search space, some previous studies have adopted the evaluator-generator paradigm, with a generator producing feasible sequences and an evaluator selecting the best one based on estimated listwise utility. Inspired by the remarkable success of diffusion generative models, this paper explores the potential of diffusion models for generating high-quality sequences in reranking. However, we argue that it is nontrivial to use diffusion models as the generator in the context of recommendation. Firstly, diffusion models primarily operate in continuous data spaces, whereas item permutations live in a discrete data space. Secondly, the recommendation task differs from conventional generation tasks because the purpose of recommender systems is to fulfill user interests. Lastly, real-life recommender systems require efficiency, which poses challenges for the multi-step inference of diffusion models. To overcome these challenges, we propose a novel Discrete Conditional Diffusion Reranking (DCDR) framework for recommendation. DCDR extends traditional diffusion models by introducing a discrete forward process with tractable posteriors, which adds noise to item sequences through step-wise discrete operations (e.g., swapping). Additionally, DCDR incorporates a conditional reverse process that generates item sequences conditioned on expected user responses. Extensive offline experiments conducted on public datasets demonstrate that DCDR outperforms state-of-the-art reranking methods. Furthermore, DCDR has been deployed in a real-world video app with over 300 million daily active users, significantly enhancing online recommendation quality.


INTRODUCTION
Multi-stage recommender systems are widely adopted in online platforms like YouTube, TikTok, and Kuaishou. As the final stage of a recommender system, the reranking stage takes top-ranking items as input and generates a reordered sequence of items for recommendation, thereby directly affecting users' experience and satisfaction [16,20].
Different from preliminary stages (e.g., matching, ranking) that predict the score of a candidate item based on the item itself, the reranking stage further considers the listwise context (cross-item interplay) [31]. It is widely acknowledged that whether a user is interested in an item is also determined by the other items in the same list [20]. Therefore, the key to reranking models is to model the listwise context and produce the optimal sequence.
Generative models are well-suited for the reranking task considering the exponentially large space of item permutations [7,12]. Previous studies have adopted the evaluator-generator paradigm, with a generator to generate feasible permutations and an evaluator to evaluate the listwise utility of each permutation [25]. In this paradigm, the capacity of the generator is of great importance. Despite the successes of traditional generative models like GANs and VAEs [8,26], their limitations, such as unstable optimization and posterior collapse, hinder their application to reranking in recommendation. Recently, diffusion models [9,22] have achieved remarkable success in computer vision and other domains. Diffusion models usually involve a forward process that corrupts the input with noise in a step-wise manner, and a reverse process that iteratively recovers the original input from the corrupted one with a denoising model. In light of this success, this paper explores the potential of diffusion models for generating high-quality sequences in reranking. However, we find it nontrivial to use diffusion models as the generator due to the following challenges:
• Firstly, most diffusion models [9,17,23] are designed for continuous data domains, whereas item permutations in recommender systems live in a discrete data space. This inherent discreteness of item sequences makes it challenging to apply diffusion models.
• Secondly, the recommendation task differs from conventional generation tasks because the purpose of recommender systems is to fulfill user interests. The generated sequence is expected to receive positive user feedback, and hence the diffusion model should be controllable in terms of user feedback.
• Thirdly, real-life recommender systems serve a huge number of users, so the inference of the reranking model must be efficient. Since diffusion models generate samples in a step-wise manner, inference efficiency becomes a challenge.
To tackle the aforementioned challenges, we propose a novel Discrete Conditional Diffusion Reranking (DCDR) framework, which extends traditional diffusion models with a discrete forward process and a conditional reverse process for sequence generation. Specifically, in each step of the forward process, DCDR uses a discrete operation to add noise to the input sequence. We propose two discrete operations: a permutation-level operation and a token-level operation. In the reverse process, DCDR introduces user feedback into the denoising model as the condition for generation. In each step, the denoising model takes the condition and the noisy sequence as input and estimates the distribution of the denoised sequence for the next step. This enables the reverse process to generate sequences with the expected feedback during inference. To train the denoising model, we derive the formal objective function by introducing carefully designed sequence encodings and transition matrices for both discrete operations. Moreover, for efficient and robust inference, we propose a series of techniques that enable the deployment of DCDR in real-life recommender systems. We conduct extensive offline experiments and online A/B experiments; the comparison between DCDR and other state-of-the-art reranking methods demonstrates the superiority of DCDR.
DCDR serves as a general framework for leveraging diffusion in reranking. The discrete operations and model architectures in DCDR are not exhaustive and can vary according to specific application scenarios. We believe that DCDR will provide valuable insights for future investigations on diffusion-based multi-stage recommender systems. The contributions of this paper can be summarized as follows:
• To the best of our knowledge, this is the first attempt to introduce diffusion models into the reranking stage of real-life multi-stage recommender systems.

RELATED WORK
2.1 Reranking in Recommendation
Reranking in recommendation focuses on rearranging the input item sequence, considering correlations between items, to achieve optimal user feedback. Therefore, reranking models [19,20,30] take the whole list of items (listwise context) as input and generate a reordered list as output. This differs from ranking models [3,15] in preliminary stages (e.g., matching, ranking) that consider a single candidate item at a time. Existing studies on reranking can be roughly divided into two lines: the first line of research [1,19,20] focuses on modeling item relations and directly learns a single ranking function that ranks items greedily by the ranking score; the other line of work [7,12] divides the reranking task into two components, sequence generation and sequence evaluation, with a generator producing feasible sequences and an evaluator selecting the best one based on estimated listwise utility.
Our work adopts the evaluator-generator paradigm and endeavors to explore the potential of diffusion models as the generator for producing high-quality item sequences, thereby enhancing the performance of reranking models.

Diffusion Models
Diffusion models [5,9,11,22] have achieved significant success in generation tasks over continuous data domains, such as image synthesis and audio generation. Some studies [2,21] attempt to apply diffusion models to tasks in discrete data domains like text generation. One line of research [2,10] designs corruption and reconstruction processes directly on discrete data. Another line of research [4,13] applies continuous diffusion models to discrete data domains by adding noise to the embedding space of the data or to real-valued vector spaces. DiffusionLM [13] is one of the state-of-the-art methods, generating text sequences with continuous diffusion models. While diffusion models have achieved success, their potential for generating high-quality item sequences in recommendation remains under-explored. Recently, some studies have attempted to apply diffusion models to sequential recommendation [14,27,28], where the focus is to generate the next item based on the user's historical interactions.
However, it is important to note that the reranking task addressed in this paper is distinct from typical sequential recommendation.Specifically, reranking aims to generate feasible item permutations rather than focusing on the next item embedding [14] or user vector [27], which poses significant challenges for the application of diffusion models in the reranking stage.

Diffusion Model
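Before we go deep into the details of DCDR, we first provide a brief introduction to diffusion models. Typical diffusion models consist of two processes, a forward process and a reverse process, which are illustrated in Fig. The forward process gradually corrupts a data sample $x_0$ with Gaussian noise, $q(x_t \mid x_{t-1}) = \mathcal{N}(x_t; \sqrt{1-\beta_t}\, x_{t-1}, \beta_t \mathbf{I})$, and the reverse process iteratively denoises it:
$$p_\theta(x_{t-1} \mid x_t) = \mathcal{N}\big(x_{t-1}; \mu_\theta(x_t, t), \Sigma_\theta(x_t, t)\big),$$
where $\mu_\theta(x_t, t)$ and $\Sigma_\theta(x_t, t)$ are modeled with neural networks.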
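Training: The canonical objective function [9] is the variational lower bound:
$$L = \mathbb{E}_q\Big[\underbrace{D_{KL}\big(q(x_T \mid x_0)\,\|\,p(x_T)\big)}_{L_T} + \sum_{t=2}^{T}\underbrace{D_{KL}\big(q(x_{t-1} \mid x_t, x_0)\,\|\,p_\theta(x_{t-1} \mid x_t)\big)}_{L_t}\;\underbrace{-\log p_\theta(x_0 \mid x_1)}_{L_0}\Big]. \quad (2)$$
Since $L_T$ is a constant, it is usually removed from the loss function. $L_0$ represents the reconstruction error, and $L_t$ represents the denoising error between the denoised data at each step and the corresponding corrupted data in the forward process.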
Inference: During inference, diffusion models start from a noisy sample $x_T$ and draw denoised samples with $p_\theta(x_{t-1} \mid x_t)$ step by step. After $T$ steps, the generation process arrives at $x_0$, i.e., the final generated sample.

DISCRETE CONDITIONAL DIFFUSION RERANKING FRAMEWORK
In this section, we provide a detailed introduction to DCDR. First, we introduce the overall framework in Section 4.1. Then, we elaborate on each component from Section 4.2 to Section 4.5. The characteristics of DCDR are discussed in Section 4.6.

Overview of DCDR
The framework of DCDR is illustrated in Fig. 2 and mainly consists of 1) a discrete forward process and 2) a conditional reverse process. Specifically, the discrete forward process defines a discrete operation with tractable posteriors to add noise to the input sequence. Here we introduce a permutation-level operation and a token-level operation as examples, while other discrete operations can also be incorporated. The conditional reverse process contains a denoising model that tries to recover the input from a noisy sequence at each step. Different from traditional diffusion models, we introduce the expected feedback of the original sequence as the condition for generating the previous-step sequence, which is more consistent with the recommendation task.
During training, DCDR first adds noise to the user impression list with the discrete forward process in a step-wise manner. Then, the denoising model in the reverse process is trained to recover the previous-step sequence, conditioned on the user responses to the original impression list. During inference, an initial ranked item list is fed as input, and we set the expected feedback of each item as the condition. The conditional reverse process then generates sequences step by step. To further improve generation quality, we keep a fixed number of sequences with the top probabilities at each step (beam search) and introduce an extra sequence evaluator to select the optimal sequence for the final recommendation.

Discrete Forward Process
The forward process in vanilla diffusion models adds Gaussian noise to continuous signals like images according to a specified schedule. However, this is sub-optimal for sequence generation, as explained in the previous sections. Therefore, we propose a discrete forward process for sequence generation in the reranking task.
Recall that, to train a vanilla diffusion model, the variational lower bound in Eq. (2) involves the posteriors $q(x_{t-1} \mid x_t, x_0)$ in $L_t$, which can be rewritten by applying Bayes' theorem:
$$q(x_{t-1} \mid x_t, x_0) = \frac{q(x_t \mid x_{t-1}, x_0)\, q(x_{t-1} \mid x_0)}{q(x_t \mid x_0)} = \frac{q(x_t \mid x_{t-1})\, q(x_{t-1} \mid x_0)}{q(x_t \mid x_0)},$$
where the second equality uses the Markov property of the forward process. In the continuous setting with Gaussian noise, $q(x_t \mid x_{t-1})$ and $q(x_t \mid x_0)$ are easy to compute and the posterior follows a Gaussian distribution [9]. To enable a discrete forward process, we need to design discrete operations that also yield tractable forward-process posteriors $q(R_{t-1} \mid R_t, R_0)$. Here we introduce two examples of the discrete operation, i.e., the permutation-level operation and the token-level operation (shown in Fig. 3). Note that the DCDR framework is not limited to these two operations; other discrete operations can also be incorporated in the future.

Permutation-level Operation.
We first propose to encode the item sequence in the permutation space. In this setting, each sequence $R_t$ is mapped to an integer $\pi(R_t) \in [0, n!)$ and represented as $s_t \in \mathbb{R}^{1 \times n!}$, a one-hot vector of length $n!$. Here $n$ is the length of the final recommendation list. Then we design a simple yet effective discrete operation that forms a Markov chain in the permutation space: at each step of the forward process, we either keep the sequence the same as in the last step or randomly swap a pair of items in the sequence. This corresponds to the following transition matrix $Q_t \in \mathbb{R}^{n! \times n!}$ at step $t$:
$$[Q_t]_{ij} = \begin{cases} 1 - \beta_t, & i = j, \\ \beta_t / C_n^2, & \pi^{-1}(j) \text{ is obtained from } \pi^{-1}(i) \text{ by swapping one pair of items}, \\ 0, & \text{otherwise}, \end{cases}$$
where $\pi^{-1}(\cdot)$ derives the counterpart permutation, $C_n^2$ is the number of possible swap candidates, and $\beta_t \in [0, 1]$ controls the noise strength at each step¹. The difference between two sequences, denoted $d(\cdot, \cdot)$, determines which transitions are possible. We use an example in Fig. 3(a) to illustrate the permutation-level operation. Since $d(R_t, R_{t-1}) = d(R_{t-1}, R_t)$, i.e., a swap can always be undone by the same swap, the transition matrix is symmetric. Moreover, we show that this transition matrix induces a Markov chain with a uniform stationary distribution, which means the corrupted sequence eventually becomes a fully random sequence. The detailed proof can be found in the Appendix.
1 In this work, we set $\forall t: \beta_t = \beta$ as a single hyper-parameter and leave advanced noise schedule mechanisms as future work.
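To make the swap construction concrete, the following Python sketch builds $Q$ for a short list by enumerating all $n!$ permutations, checks that it is symmetric and doubly stochastic, and shows a one-hot $s_0$ drifting toward the uniform distribution under repeated application. It is an illustration under assumptions, not the authors' implementation: it uses NumPy, a constant $\beta$ (as in footnote 1), and the helper name `permutation_transition_matrix` is hypothetical. Enumerating $n!$ states is only practical for small $n$ (e.g., $n = 6$ gives 720 states).

```python
import itertools
from math import comb

import numpy as np


def permutation_transition_matrix(n: int, beta: float):
    """Hypothetical helper: the n! x n! swap transition matrix Q with constant beta.

    Q[i, i] = 1 - beta, and Q[i, j] = beta / C(n, 2) when permutation j is obtained
    from permutation i by swapping exactly one pair of items; all other entries are 0.
    """
    perms = list(itertools.permutations(range(n)))  # index -> permutation (role of pi^{-1})
    index = {p: k for k, p in enumerate(perms)}      # permutation -> index (role of pi)
    Q = np.zeros((len(perms), len(perms)))
    for k, p in enumerate(perms):
        Q[k, k] = 1.0 - beta                         # keep the sequence unchanged
        for a, b in itertools.combinations(range(n), 2):
            swapped = list(p)
            swapped[a], swapped[b] = swapped[b], swapped[a]
            Q[k, index[tuple(swapped)]] += beta / comb(n, 2)  # one of C(n, 2) swaps
    return Q, perms


Q, perms = permutation_transition_matrix(n=3, beta=0.3)
print(np.allclose(Q, Q.T))                   # True: symmetric
print(np.allclose(Q.sum(axis=0), 1.0))       # True: columns sum to 1 (doubly stochastic)

s0 = np.zeros(len(perms))
s0[0] = 1.0                                  # one-hot encoding of R_0
st = s0 @ np.linalg.matrix_power(Q, 200)     # distribution of R_t after 200 forward steps
print(np.round(st, 4))                       # every entry is approximately 1/6
```

The diagonal entry $1-\beta$ is what makes the chain aperiodic; with $\beta = 1$ the swap chain on permutations alternates between even and odd permutations and would not converge.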
With the sequence encoding $s_t$ and the transition matrix $Q_t$, the discrete forward process at each step can be formulated as:
$$q(R_t \mid R_{t-1}) = \mathrm{Cat}(s_t;\, p = s_{t-1} Q_t),$$
where $\mathrm{Cat}(\cdot\,; p)$ denotes a categorical distribution with probability $p$.
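After $t$ steps of noise adding, $R_t$ given $R_0$ can be represented as:
$$q(R_t \mid R_0) = \mathrm{Cat}(s_t;\, p = s_0 \bar{Q}_t),$$
where $\bar{Q}_t = Q_1 Q_2 \cdots Q_t$. As a result, we can compute the posterior $q(R_{t-1} \mid R_t, R_0)$ as follows:
$$q(R_{t-1} \mid R_t, R_0) = \frac{q(R_t \mid R_{t-1})\, q(R_{t-1} \mid R_0)}{q(R_t \mid R_0)} = \mathrm{Cat}\Big(s_{t-1};\, p = \frac{s_t Q_t^\top \odot s_0 \bar{Q}_{t-1}}{s_0 \bar{Q}_t s_t^\top}\Big). \quad (5)$$
This enables us to calculate the KL-divergence in $L_t$ during training. Computing the posterior at time $t$ requires $\bar{Q}_t$, which can be time-consuming because it involves multiplying matrices of size $n! \times n!$. However, the matrix is fixed once the sequence length $n$ and the noise strength $\beta$ are determined. Since both are determined before training, $\bar{Q}_t$ can be computed and stored in advance. Meanwhile, as $s_0$ is a one-hot vector, computing $s_0 \bar{Q}_t$ amounts to selecting a row of $\bar{Q}_t$, and $s_0 \bar{Q}_t s_t^\top$ to selecting a single entry. Thus the computation of the posterior is very efficient during training and inference.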
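The precompute-then-select idea can be sketched as follows. This is an illustration under assumptions, not the authors' code: it uses a compact version of the swap matrix (constant $\beta$, small $n$), caches $\bar{Q}_t$ for every $t$ once, and evaluates Eq. (5) using only a column of $Q_t$, a row of $\bar{Q}_{t-1}$, and a single entry of $\bar{Q}_t$. The function names are hypothetical. Because $Q_t$ is symmetric, the column `Q[:, it]` equals $s_t Q_t^\top$.

```python
import itertools
from math import comb

import numpy as np


def swap_matrix(n, beta):
    """Compact n! x n! swap transition matrix (hypothetical helper, constant beta)."""
    perms = list(itertools.permutations(range(n)))
    index = {p: k for k, p in enumerate(perms)}
    Q = (1.0 - beta) * np.eye(len(perms))
    for k, p in enumerate(perms):
        for a, b in itertools.combinations(range(n), 2):
            q = list(p)
            q[a], q[b] = q[b], q[a]
            Q[k, index[tuple(q)]] += beta / comb(n, 2)
    return Q


def cumulative_matrices(Q, T):
    """Precompute Q_bar[t] = Q^t for t = 0..T once, before training (Q_bar[0] = I)."""
    bars = [np.eye(Q.shape[0])]
    for _ in range(T):
        bars.append(bars[-1] @ Q)
    return bars


def posterior(i0, it, t, Q, bars):
    """q(R_{t-1} | R_t, R_0) over all n! states, given indices i0 = pi(R_0), it = pi(R_t)."""
    likelihood = Q[:, it]          # s_t Q_t^T: q(R_t | R_{t-1} = j) for every j
    prior = bars[t - 1][i0]        # s_0 Q_bar_{t-1}: a row lookup, since s_0 is one-hot
    evidence = bars[t][i0, it]     # s_0 Q_bar_t s_t^T: a single entry
    return likelihood * prior / evidence


Q = swap_matrix(n=3, beta=0.3)
bars = cumulative_matrices(Q, T=5)
post = posterior(i0=0, it=4, t=3, Q=Q, bars=bars)
print(post.sum())                  # 1.0 up to floating-point error
```

The denominator is exactly the normalizer of the numerator, since summing $[Q]_{j,i_t}[\bar{Q}_{t-1}]_{i_0,j}$ over $j$ gives $[\bar{Q}_t]_{i_0,i_t}$.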

Token-level Operation.
Besides the permutation-level operation, we also propose a token-level operation beyond the permutation space. Here, we encode an item sequence $R_t$ differently, as multiple token-level vectors $z_t$ (one per position), with a token-level transition matrix $O_t$ whose entries are governed by a noise parameter that controls the noise strength at each step. This forward process also induces a uniform stationary distribution, and the detailed proof can be found in the Appendix.
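Then, we can formulate the discrete forward process as:
$$q(R_t \mid R_{t-1}) = \prod_{i} \mathrm{Cat}(z_t^{i};\, p = z_{t-1}^{i} O_t).$$
Similarly, we can define $\bar{O}_t = O_1 O_2 \cdots O_t$, and the probability of $R_t$ given $R_0$ is represented as:
$$q(R_t \mid R_0) = \prod_{i} \mathrm{Cat}(z_t^{i};\, p = z_0^{i} \bar{O}_t).$$
This leads to the following posterior:
$$q(R_{t-1} \mid R_t, R_0) = \prod_{i} \mathrm{Cat}\Big(z_{t-1}^{i};\, p = \frac{z_t^{i} O_t^\top \odot z_0^{i} \bar{O}_{t-1}}{z_0^{i} \bar{O}_t (z_t^{i})^\top}\Big). \quad (7)$$
Note that computing the posterior also requires $\bar{O}_t$, which can similarly be calculated and stored in advance.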
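The exact token-level transition matrix is not reproduced here, so the sketch below assumes one plausible form: at each step, every position independently keeps its item with probability $1-\gamma$ or resamples it uniformly from $N$ candidate items, i.e. $O = (1-\gamma)I + \frac{\gamma}{N}\mathbf{1}\mathbf{1}^\top$ with a constant $\gamma$. Under this assumption $O$ is doubly stochastic, $\bar{O}_t$ has the closed form $(1-\gamma)^t I + (1-(1-\gamma)^t)\frac{1}{N}\mathbf{1}\mathbf{1}^\top$, and independent per-position corruption can produce duplicated items. The symbol $\gamma$, the per-position factorization, and all function names are assumptions for illustration.

```python
import numpy as np


def token_transition_matrix(num_candidates: int, gamma: float) -> np.ndarray:
    """Assumed O: keep the item with prob. 1 - gamma, else resample uniformly over N candidates."""
    N = num_candidates
    return (1.0 - gamma) * np.eye(N) + (gamma / N) * np.ones((N, N))


def position_posterior(x0: int, xt: int, t: int, O: np.ndarray) -> np.ndarray:
    """q(z_{t-1} | z_t, z_0) at a single position, following the form of Eq. (7)."""
    prior = np.linalg.matrix_power(O, t - 1)[x0]       # z_0 O_bar_{t-1}
    likelihood = O[:, xt]                              # q(z_t = xt | z_{t-1} = j) for every j
    evidence = np.linalg.matrix_power(O, t)[x0, xt]    # z_0 O_bar_t z_t^T
    return prior * likelihood / evidence


def corrupt(seq, t, O, rng):
    """Sample R_t ~ q(R_t | R_0) position by position; duplicated items can appear."""
    O_bar = np.linalg.matrix_power(O, t)
    return [int(rng.choice(len(O), p=O_bar[x])) for x in seq]


gamma, N, t = 0.2, 8, 3
O = token_transition_matrix(N, gamma)
closed_form = (1 - gamma) ** t * np.eye(N) + (1 - (1 - gamma) ** t) / N * np.ones((N, N))
print(np.allclose(np.linalg.matrix_power(O, t), closed_form))   # True
print(corrupt([0, 1, 2, 3], t, O, np.random.default_rng(0)))    # a noisy sequence, possibly with duplicates
```

The possibility of duplicates in `corrupt` is consistent with the need, noted for inference, to filter out sequences with duplicated items when the token-level operation is used.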

Conditional Reverse Process
The reverse process in diffusion models attempts to recover the original sample from the noisy sample. In most diffusion models, this is parameterized as $p_\theta(x_{t-1} \mid x_t)$ (i.e., the denoising model). However, the recommendation task differs from conventional generation tasks because the purpose of recommender systems is to fulfill user interests. Besides, users only respond to the displayed item sequence; the utilities of other sequences are unknown. It may therefore be problematic to train the denoising model only to recover the impression list. As a result, we condition the reverse process on user feedback (e.g., like, effective view). In this way, the denoising model models $p_\theta(R_{t-1} \mid R_t, c)$, where $c$ is a response sequence representing the expected feedback of sequence $R_0$. During training, we use the real feedback as the condition, while at the inference stage, $c$ can be set according to the application scenario (e.g., all positive feedback). Here we give an instantiation of the denoising model based on the attention mechanism [24], shown in Fig. 4(a). Each item is first mapped into an embedding vector. Then the sequence $R_t$ goes through a contextual encoding layer, which consists of a self-attention layer among the items in the sequence and a history attention layer that uses the items in the sequence as queries to aggregate the user history. The outputs of these two layers are concatenated position-wise as the contextual-encoded sequence representations. To introduce user feedback as the condition, we map the feedback at each position to a condition embedding. Then we use these condition embeddings as queries to aggregate the contextual-encoded sequence representations of $R_t$. The outputs serve as the expected sequence representations of $R_{t-1}$. The probability of drawing $R_{t-1}$ is then computed from the position-wise cosine similarities between the expected representations of $R_{t-1}$ and the contextual-encoded sequence representations of $R_{t-1}$. The set of possible next sequences $R_{t-1}$ depends on the discrete operation chosen in the forward process.
4: Compute $q(R_{t-1} \mid R_t, R_0)$ according to Eq. (5) or Eq. (7);
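A forward-pass sketch of the attention-based denoiser described above may help. It uses NumPy with random, untrained parameters, single-head scaled dot-product attention without learned projections, binary feedback conditions, and a candidate set consisting of the current sequence and its single-swap neighbours (the permutation-level case). Turning the position-wise cosine similarities into a probability by summing them over positions and applying a softmax over candidates is our assumption; the text only states that the probability is computed from these similarities. All names are hypothetical.

```python
import itertools

import numpy as np


def attention(queries, keys, values):
    """Single-head scaled dot-product attention, softmax(Q K^T / sqrt(d)) V, no learned projections."""
    scores = queries @ keys.T / np.sqrt(keys.shape[1])
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ values


def contextual_encoding(items, history):
    """Self attention among items and history attention over user history, concatenated per position."""
    return np.concatenate([attention(items, items, items), attention(items, history, history)], axis=1)


def cosine(a, b):
    """Row-wise cosine similarity between two (n, d) arrays."""
    return (a * b).sum(axis=1) / (np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1) + 1e-12)


def denoise_step_probs(seq_t, candidates, cond, item_emb, cond_emb, history):
    """Sketch of p_theta(R_{t-1} | R_t, c) over an explicit list of candidate sequences."""
    ctx_t = contextual_encoding(item_emb[list(seq_t)], history)
    expected = attention(cond_emb[list(cond)], ctx_t, ctx_t)   # condition embeddings act as queries
    scores = np.array([
        cosine(expected, contextual_encoding(item_emb[list(c)], history)).sum()
        for c in candidates
    ])
    probs = np.exp(scores - scores.max())                      # assumed: softmax over candidates
    return probs / probs.sum()


rng = np.random.default_rng(0)
d = 8
item_emb = rng.normal(size=(20, d))          # 20 hypothetical items
cond_emb = rng.normal(size=(2, 2 * d))       # embeddings for feedback 0 / 1
history = rng.normal(size=(10, d))           # 10 items from the user history
seq_t = (3, 7, 1, 12)
candidates = [seq_t]
for a, b in itertools.combinations(range(len(seq_t)), 2):
    s = list(seq_t)
    s[a], s[b] = s[b], s[a]
    candidates.append(tuple(s))
probs = denoise_step_probs(seq_t, candidates, (1, 1, 1, 1), item_emb, cond_emb, history)
print(len(probs), probs.sum())               # 7 candidates; probabilities sum to 1
```

In a real model the embeddings and attention projections would be learned, and the candidate set would follow whichever discrete operation the forward process uses.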

Inference of DCDR
The inference procedure in vanilla diffusion models starts with pure Gaussian noise $x_T$ and samples $x_{t-1}$ step by step. To accommodate the reranking scenario in recommendation, we make several changes that enable the deployment of DCDR in real-life recommender systems.
• Condition setting: To fulfill user interests as much as possible, we set the expected condition $c$ as a sequence with all positive feedback during inference.
• Starting sequence: In traditional diffusion models, inference starts with pure Gaussian noise. However, pure noise may eliminate important information like user preferences, thus hurting recommendation quality. More importantly, starting from pure noise demands far more steps for high-quality generation. Therefore, we use the ordered sequence from the previous stage as the starting sequence for generation.
• Beam search: To improve the robustness of the sequence generation process, we adopt beam search to generate a specific number of sequences and further adopt a sequence evaluator model (detailed in Section 4.5) to select the final recommended sequence. Starting from the first noisy sequence, we use the diffusion model to generate multiple sequences for each candidate and keep a fixed number of sequences with the top probabilities at each step².
• Early stop: Despite inference optimizations like the starting sequence and beam search, the multi-step reverse process may still be time-consuming. Therefore, we further introduce an early-stop mechanism into the inference procedure. At each step, we compute the likelihood that the generated sequences match the expected conditions. As the number of diffusion steps increases, the likelihood is expected to increase accordingly, and the reverse process terminates when the likelihood stops increasing or the increase becomes marginal.
The detailed inference algorithm of DCDR is described in Alg. 2.
2 Notice that when the token-level operation is adopted, the reverse process may generate sequences containing duplicated items. We filter out such sequences and retain only sequences without duplicated items at each step.
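The inference changes above (starting from the upstream ranking, beam search, and early stopping) can be sketched for the permutation-level operation as follows. It is a simplified illustration, not Alg. 2 itself: `step_log_prob` stands in for the trained denoiser and `condition_score` for the likelihood that a sequence matches the expected condition, both hypothetical callables, and candidates are limited to the current sequence and its single-swap neighbours. Beams are ranked by cumulative log-probability, and the loop stops once the best condition score improves by less than `min_gain`. The default beam size of 6 mirrors the best setting in the beam-size analysis; the other defaults are arbitrary.

```python
import itertools
from typing import Callable, List, Tuple

Seq = Tuple[int, ...]


def swap_neighbors(seq: Seq) -> List[Seq]:
    """The sequence itself plus every sequence reachable by swapping one pair of items."""
    neighbors = [seq]
    for a, b in itertools.combinations(range(len(seq)), 2):
        s = list(seq)
        s[a], s[b] = s[b], s[a]
        neighbors.append(tuple(s))
    return neighbors


def beam_search_inference(
    initial: Seq,
    step_log_prob: Callable[[Seq, Seq], float],   # hypothetical log p_theta(R_{t-1} | R_t, c)
    condition_score: Callable[[Seq], float],      # hypothetical likelihood of matching c
    beam_size: int = 6,
    max_steps: int = 5,
    min_gain: float = 1e-3,
) -> List[Seq]:
    beam = [(0.0, initial)]                       # start from the upstream ranking, not pure noise
    best = condition_score(initial)
    for _ in range(max_steps):
        expanded = {}
        for score, seq in beam:
            for cand in swap_neighbors(seq):
                total = score + step_log_prob(seq, cand)
                if total > expanded.get(cand, float("-inf")):
                    expanded[cand] = total        # keep the best path to each sequence
        beam = sorted(((s, c) for c, s in expanded.items()), reverse=True)[:beam_size]
        new_best = max(condition_score(c) for _, c in beam)
        if new_best - best < min_gain:            # early stop: the likelihood stopped increasing
            break
        best = new_best
    return [c for _, c in beam]                   # candidates handed to the sequence evaluator


target = (0, 1, 2, 3)
mismatch = lambda s: sum(a != b for a, b in zip(s, target))
result = beam_search_inference(
    (3, 1, 2, 0),
    step_log_prob=lambda seq, cand: -float(mismatch(cand)),
    condition_score=lambda seq: -float(mismatch(seq)),
)
print(result[0])   # (0, 1, 2, 3)
```

In the toy usage, one swap reaches the target at the first step, the condition score stops improving at the second step, and the loop terminates early instead of running all five steps.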

Sequence Evaluator
The sequence evaluator model mainly focuses on estimating the overall utility of a given sequence. Note that many existing sequence evaluators [6,29] can be incorporated into our DCDR framework. To keep the focus of this paper on the overall framework, we adopt an intuitive design of the sequence evaluator. The architecture of the evaluator model is depicted in Fig. 4(c). Specifically, the input sequence is encoded by the same contextual encoding layer as in the denoising model. Then, the representation at each position is fed into an MLP to predict the score of a given feedback label. The overall utility of the sequence is measured by the rank-weighted sum of the scores at each position.
Notice that the feedback label may vary across different recommendation tasks, such as clicks and purchases in e-commerce services, likes and forwards in online social media.Without loss of generality, we utilize the same feedback label as that in the conditional reverse process, and the objective function is a binary cross-entropy loss.
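The evaluator's final aggregation, a rank-weighted sum of position-wise scores, can be sketched as below. The weights are not specified in the text; this sketch assumes DCG-style weights $w_i = 1/\log_2(i+2)$ for 0-based position $i$, and `predict_scores` is a hypothetical stand-in for the contextual encoding layer plus MLP. Selecting the final sequence from the beam is then an argmax over utilities.

```python
import math
from typing import Callable, Hashable, Sequence


def rank_weighted_utility(position_scores: Sequence[float]) -> float:
    """Sum of position-wise scores weighted by an assumed DCG-style discount 1 / log2(i + 2)."""
    return sum(score / math.log2(i + 2) for i, score in enumerate(position_scores))


def select_best(candidates: Sequence[Hashable],
                predict_scores: Callable[[Hashable], Sequence[float]]):
    """Return the candidate sequence with the highest estimated listwise utility."""
    return max(candidates, key=lambda seq: rank_weighted_utility(predict_scores(seq)))


# Hypothetical predicted feedback probabilities for two orderings of the same two items.
predicted = {("a", "b"): [0.9, 0.2], ("b", "a"): [0.2, 0.9]}
print(select_best(list(predicted), predicted.__getitem__))   # ('a', 'b')
```

With decreasing weights, placing the item with the higher predicted feedback first yields the higher utility, which is the behaviour a rank-weighted sum is meant to reward.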

Discussion
The proposed DCDR provides a general and flexible framework for utilizing diffusion models in recommendation. Different discrete operations applied in the forward process yield different diffusion models. For instance, the proposed permutation-level operation is well-suited for scenarios where the set of items to be displayed has been fixed (i.e., the input sequence has the same length as the final list). The diffusion model learns to generate the optimal permutation by iteratively swapping items conditioned on the expected feedback. Conversely, the token-level operation is suitable when the final displayed list is a subset of the initial sequence (i.e., the input sequence is longer than the final list). The diffusion model learns to generate the best sequence by step-wise substituting items with candidate items. Researchers can also develop other discrete operations tailored to specific application scenarios. Moreover, the architectures of the denoising model and the sequence evaluator model are flexible enough to incorporate specific contextual features. Consequently, we believe that DCDR will provide valuable insights for future investigations on diffusion-based multi-stage recommender systems.

EXPERIMENTS
We conduct extensive experiments in both offline and online environments to demonstrate the effectiveness of DCDR. Three research questions are investigated in the experiments:
• RQ1: How does DCDR perform in comparison with state-of-the-art reranking methods in terms of recommendation accuracy and generation quality?
• RQ2: How do different settings of DCDR affect the performance?
• RQ3: How does DCDR perform in real-life recommender systems?
5.1 Experiment Setup
5.1.1 Dataset. For consistency between the datasets and the problem setup, we require each sample to be a real item sequence displayed in a complete session rather than a manually constructed list. Therefore, we collect two datasets for offline experiments:
• Avito: this is a public dataset consisting of user search logs. Each sample corresponds to a search page with multiple ads and thus is a real impressed list with feedback. The data contains over 53M lists with 1.3M users and 36M ads. The impressions from the first 21 days are used for training, and the impressions from the last 7 days are used for testing. Each list has a length of 5.
• VideoRerank: this dataset is collected from Kuaishou, a popular short-video app with over 300 million daily active users. Each sample in the dataset is also a real item sequence displayed to a certain user in a complete session. The dataset contains 100,102 users, 1,243,877 items, and 871,834 lists, where each list contains 6 items.

5.1.2 Baselines. We compare the proposed DCDR with several state-of-the-art reranking methods and a modified continuous diffusion method for text generation.
• PRM [20]: PRM is one of the state-of-the-art models for reranking, which uses self-attention to capture the relations between items. It has been used for reranking in the Taobao recommender system.
• DLCM [1]: DLCM adopts gated recurrent units to model cross-item relations sequentially. DLCM is optimized with an attention-based loss function, which also contributes to its predictive accuracy.
• SetRank [19]: SetRank learns a permutation-invariant model for reranking by removing positional encodings, and generates the sequence by greedily selecting items.
• EGRerank [12]: EGRerank consists of a sequence generator and a sequence discriminator. The generator is optimized with reinforcement learning to maximize the expected user feedback under the guidance of the evaluator. Notably, EGRerank has been used for reranking in AliExpress.
• DiffusionLM-R: a modified version of DiffusionLM [13] for reranking. We treat items as words and modify DiffusionLM to take the same conditions as DCDR as inputs for controllable generation.

5.1.3 Implementation Details.
For different datasets, we use different user feedback as the condition in the reverse process of DCDR. For Avito, we use the click behavior as feedback. For VideoRerank, we set the feedback condition as a binary value indicating whether a video has been watched to completion. The hyper-parameter settings of DCDR can be found in Section 5.2.2. For the other baselines, we carefully tune the corresponding hyper-parameters to achieve the best performance.

5.1.4 Metrics. We use two widely adopted metrics, namely AUC and NDCG, to evaluate the performance of different methods in the offline experiments.
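For reference, a minimal Python sketch of the two metrics follows. The text does not specify the exact variants or cut-offs, so this assumes binary feedback labels as relevance, log2 position discounts for NDCG@k with the ideal ordering of the same list as normalizer, and AUC computed as the fraction of correctly ordered positive/negative pairs (ties count one half).

```python
import math
from typing import Sequence


def ndcg_at_k(labels_in_ranked_order: Sequence[float], k: int) -> float:
    """NDCG@k using the labels as gains and log2(position + 1) discounts."""
    def dcg(labels):
        return sum(label / math.log2(i + 2) for i, label in enumerate(labels[:k]))
    ideal = dcg(sorted(labels_in_ranked_order, reverse=True))
    return dcg(list(labels_in_ranked_order)) / ideal if ideal > 0 else 0.0


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Fraction of (positive, negative) pairs ranked correctly; ties count one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


print(ndcg_at_k([0, 0, 1], k=3))          # 0.5
print(auc([0.9, 0.1, 0.5], [1, 0, 1]))    # 1.0
```

The pairwise AUC is quadratic in list size, which is fine for short reranking lists; a rank-based formula would be preferable for large evaluation sets.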

Offline Experiment Results
For all offline experiments, we first pretrain a ranking model with LambdaMART and use its ranked list as the initial sequence for the reranking task. For DCDR, the variant using the permutation-level operation is denoted as DCDR-P, and the variant using the token-level operation is denoted as DCDR-T.

Performance Comparison (RQ1).
The performance of different approaches is listed in Table 1. DCDR achieves the best performance among all approaches, which verifies the quality of the item sequences it generates. Moreover, the comparison between DCDR and DiffusionLM-R indicates that the discrete diffusion models in DCDR significantly outperform DiffusionLM-R in generation quality. This verifies the benefits of discrete diffusion models for discrete data domains.

Analysis of DCDR (RQ2).
In this section, we provide a detailed analysis of DCDR from multiple aspects.
Beam size: We vary the beam size in the reverse process during inference and present the results in Fig. 5. The performance is best when the beam size is set to 6. This makes sense, since a moderate beam size improves recommendation robustness without involving too many sequences to evaluate.
Reverse steps: We vary the number of reverse steps from 1 to 5 and present the performance in Fig. 5. As shown in the figure, the performance improves as the number of diffusion steps increases. However, the improvement becomes marginal once the number of steps reaches a certain value, which is consistent with the intuition behind adopting early stopping for efficiency.
Noise scale: We vary the swapping probability from 0.1 to 0.5 during training and present the results in Fig. 5. The results indicate that too much noise makes learning more difficult, while a moderate amount of noise leads to satisfactory performance.

Online Experiment Results (RQ3)
We conduct online A/B experiments on the Kuaishou app.

Experiment Setup.
In the online A/B experiments, the traffic of the whole app is split uniformly into ten buckets. 20% of the traffic is assigned to the online baseline PRM, while another 10% is assigned to DCDR-P. As revealed in [32], Kuaishou serves over 320 million users daily, so the results collected from 10% of the traffic over a whole week cover a very large user population. The initial video sequence comes from the early stage of the recommender system, which greedily ranks the items by point-wise ranking scores, and we directly use this initial sequence as the noisy sequence $R_T$ for generation. To enable controllable video sequence generation, we set two conditions for the diffusion model: first, we expect users to finish watching each video in the sequence; second, we expect positive feedback from users (e.g., liking the video).

Experiment Results.
The experiments have been launched on the system for a week, and the results are listed in Table 2.
The metrics for online experiments include views, likes, follows (subscriptions to authors), collects (adding videos to collections), and downloads. DCDR-P outperforms the baseline on all of these metrics by a large margin, which verifies the quality of the video sequences recommended by DCDR-P. As diffusion models generate samples in a step-wise manner, computation cost and latency become challenges for deployment in real-life recommender systems. The computation costs and service latency are listed in Table 3. The step-wise generation inevitably increases the latency of the recommendation service; however, the cost is acceptable in our system since it does not jeopardize the user experience.

CONCLUSION
In this paper, we make the first attempt to enhance the reranking stage in recommendation by leveraging diffusion models, which poses many challenges such as the discrete data space, the incorporation of user feedback, and efficiency requirements. To address these challenges, we propose a novel framework called Discrete Conditional Diffusion Reranking (DCDR), involving a discrete forward process with tractable posteriors and a conditional reverse process that incorporates user feedback. Moreover, we propose several optimizations for efficient and robust inference of DCDR, enabling its deployment in a real-life recommender system with over 300 million daily active users. Extensive offline evaluations and online experiments demonstrate the effectiveness of DCDR. The proposed DCDR also serves as a general framework: various discrete operations and contextual features can be flexibly incorporated to suit different application scenarios. We believe DCDR will provide valuable insights for future investigations on diffusion-based multi-stage recommender systems. In the future, we plan to study the impact of the noise schedule and explore methods to enhance the efficiency and controllability of the generation process.

APPENDIX
Lemma 7.1.Let P be the transition matrix of a Markov chain.If P is a doubly-stochastic matrix, then the Markov chain defined by P has a uniform stationary distribution.
Proof. Let $e$ be the all-ones row vector and $N$ the number of states. As $P$ is doubly stochastic, every row and every column of $P$ sums to 1. In particular, the column sums give $eP = e$, i.e., $e/N$ is a left eigenvector of $P$ with eigenvalue 1. Therefore, $\pi = e/N$ satisfies $\pi = \pi P$, indicating that the uniform distribution is a stationary distribution of the Markov chain. □
Lemma 7.2. A finite-state Markov chain is ergodic if and only if it satisfies 1) connectivity: $\forall i, j: P^n(i, j) > 0$ for some $n$, and 2) aperiodicity: $\forall i: \gcd\{n : P^n(i, i) > 0\} = 1$. Any ergodic Markov chain has a unique stationary distribution.
Proof. The details can be found in [18]. □
Theorem 7.3. The Markov chain for the discrete forward process with the permutation-level operation or the token-level operation has a unique uniform stationary distribution.
Proof. First, it is easy to verify that both transition matrices are doubly stochastic: $\forall j: \sum_i [Q]_{ij} = 1$ and $\forall j: \sum_i [O]_{ij} = 1$ (the rows sum to 1 because they are transition matrices). Therefore, the uniform distribution is a stationary distribution of both processes according to Lemma 7.1.
Meanwhile, we can show that both Markov chains are ergodic. For the permutation-level operation, any permutation can be reached through a finite number of swaps. For the token-level operation, it is easy to verify that each item has a chance to appear at each position. Therefore, both forward processes satisfy the connectivity condition. Besides, notice that $[\mathbf{Q}]_{ii} > 0$ and $[\mathbf{O}]_{ii} > 0$ (a sequence can remain unchanged in one step), so the sets $\{t : \mathbf{Q}^t(i, i) > 0\}$ and $\{t : \mathbf{O}^t(i, i) > 0\}$ both contain 1. Hence $\gcd\{t : \mathbf{Q}^t(i, i) > 0\} = 1$ and $\gcd\{t : \mathbf{O}^t(i, i) > 0\} = 1$, which satisfies the aperiodicity condition. Therefore, both forward processes are ergodic, and by Lemma 7.2 each has only one stationary distribution.
Combining the above conclusions, both processes have a unique uniform stationary distribution, which means that after a sufficient number of steps the noised sequence is close to uniformly random. □
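The argument can be checked numerically on a small case. The Python sketch below assumes a simplified, hypothetical permutation-level operation (not necessarily the exact operation used by DCDR): at each step the sequence stays unchanged with probability `1 - swap_prob`, and otherwise the items at a uniformly chosen pair of positions are swapped. For three items it builds the 6 × 6 transition matrix over all permutations with exact fractions, checks the doubly stochastic, aperiodicity, and connectivity conditions used in the proof, and runs the chain from a point mass to show convergence to the uniform distribution. All function names are illustrative.

```python
from fractions import Fraction
from itertools import combinations, permutations


def swap_transition_matrix(n_items, swap_prob):
    """Transition matrix of a toy (hypothetical) permutation-level process:
    stay unchanged with probability 1 - swap_prob, otherwise swap one of the
    n(n-1)/2 position pairs chosen uniformly at random."""
    states = list(permutations(range(n_items)))
    index = {s: i for i, s in enumerate(states)}
    pairs = list(combinations(range(n_items), 2))
    size = len(states)
    P = [[Fraction(0)] * size for _ in range(size)]
    for s in states:
        i = index[s]
        P[i][i] += 1 - swap_prob
        for a, b in pairs:
            t = list(s)
            t[a], t[b] = t[b], t[a]
            P[i][index[tuple(t)]] += swap_prob / len(pairs)
    return states, P


def is_doubly_stochastic(P):
    rows_ok = all(sum(row) == 1 for row in P)
    cols_ok = all(sum(P[i][j] for i in range(len(P))) == 1 for j in range(len(P)))
    return rows_ok and cols_ok


def is_irreducible(P):
    # Connectivity: every state is reachable from every other state.
    size = len(P)
    for start in range(size):
        seen, stack = {start}, [start]
        while stack:
            u = stack.pop()
            for v in range(size):
                if P[u][v] > 0 and v not in seen:
                    seen.add(v)
                    stack.append(v)
        if len(seen) != size:
            return False
    return True


def step(pi, P):
    # One step of the chain for a row vector: pi_{t+1} = pi_t P.
    return [sum(pi[i] * P[i][j] for i in range(len(P))) for j in range(len(P))]


states, P = swap_transition_matrix(3, Fraction(1, 2))
uniform = [Fraction(1, len(states))] * len(states)
pi = [Fraction(1)] + [Fraction(0)] * (len(states) - 1)  # start at the identity
for _ in range(30):
    pi = step(pi, P)

print(is_doubly_stochastic(P))                  # True (Lemma 7.1 applies)
print(all(P[i][i] > 0 for i in range(len(P))))  # True (aperiodicity)
print(is_irreducible(P))                        # True (connectivity)
print(step(uniform, P) == uniform)              # True (uniform is stationary)
print(max(abs(float(p) - 1 / 6) for p in pi))   # close to 0
```

Setting `swap_prob = 1` makes the diagonal zero; every step then flips the parity of the permutation, so the chain alternates between even and odd permutations with period 2. This is exactly the situation that the positive-diagonal argument in the proof rules out.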

Figure 2: An illustration of the DCDR framework, which mainly consists of 1) a discrete forward process and 2) a conditional reverse process. In the forward process, two discrete operations with tractable posteriors are introduced, while in the reverse process, user feedback is introduced as the condition to control generation.

Figure 3: Illustrations of permutation-level (left) and token-level (right) operations in the discrete forward process.

Figure 4: Illustrative instantiations of the denoising model in the conditional reverse process and of the sequence evaluator model. Note that the model architectures shown are not exhaustive and can vary across application scenarios.

4.3.1 Denoising Model Architecture. Note that the DCDR framework does not restrict the concrete architecture of the denoising model $p_\theta(\mathbf{R}_{t-1} \mid \mathbf{R}_t, \mathbf{c})$.

4.3.2 Denoising Model Training. The training objective of the denoising model $p_\theta(\mathbf{R}_{t-1} \mid \mathbf{R}_t, \mathbf{c})$ in the reverse process is the typical training loss of diffusion models:

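5: Compute $p_\theta(\mathbf{R}_{t-1} \mid \mathbf{R}_t, \mathbf{c})$;
6: Compute $\mathcal{L}$ and update $\theta$ with $\nabla_\theta \mathcal{L}$;
7: until converged

process (i.e., sample based on $q(\mathbf{R}_t \mid \mathbf{R}_0)$). Then the denoising model is optimized with $\mathcal{L}$ accordingly. The training algorithm is presented in Alg. 1.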
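The sampling step of training can be sketched as follows. This Python sketch assumes a hypothetical swap-based forward step (with probability `swap_prob`, two distinct positions are swapped) and draws $\mathbf{R}_t \sim q(\mathbf{R}_t \mid \mathbf{R}_0)$ by simulating $t$ forward steps from $\mathbf{R}_0$. Simulating the chain is the simplest correct way to sample; a closed form for $q(\mathbf{R}_t \mid \mathbf{R}_0)$, where available, would avoid the loop. The function names are illustrative, and the denoising model and loss are deliberately left out.

```python
import random


def forward_step(seq, swap_prob, rng):
    """One permutation-level noising step (assumed form): with probability
    swap_prob, swap the items at two distinct, uniformly chosen positions."""
    seq = list(seq)
    if len(seq) >= 2 and rng.random() < swap_prob:
        a, b = rng.sample(range(len(seq)), 2)
        seq[a], seq[b] = seq[b], seq[a]
    return seq


def sample_noised(r0, t, swap_prob, rng):
    """Draw R_t ~ q(R_t | R_0) by simulating t forward steps from R_0."""
    seq = list(r0)
    for _ in range(t):
        seq = forward_step(seq, swap_prob, rng)
    return seq


def sample_training_input(r0, num_steps, swap_prob, rng):
    """Sample a step t uniformly from {1, ..., num_steps} and the noised R_t.

    The pair (t, R_t), together with R_0 and the user-feedback condition c,
    is what a denoising model would be trained on; the model and the loss
    themselves are outside the scope of this sketch.
    """
    t = rng.randint(1, num_steps)
    return t, sample_noised(r0, t, swap_prob, rng)


rng = random.Random(0)
r0 = ["item_a", "item_b", "item_c", "item_d", "item_e"]
t, r_t = sample_training_input(r0, num_steps=10, swap_prob=0.5, rng=rng)
print(t, r_t)  # a step index and a shuffled copy of r0
```

Because every step only swaps positions, $\mathbf{R}_t$ always contains exactly the items of $\mathbf{R}_0$, which is what makes the forward process stay inside the discrete space of item permutations.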

Figure 5: Performance of DCDR-P and DCDR-T on the Avito dataset under different hyper-parameter settings, including the beam size in the reverse process for inference, and the number of diffusion steps and the noise scale in the forward process.
• A novel Discrete Conditional Diffusion Reranking (DCDR) framework is presented. We carefully design two discrete operations for the forward process and introduce user feedback as the condition to guide the reverse generation process.
• We provide practical approaches for deploying DCDR in Kuaishou, a popular video app serving over 300 million users daily. Online A/B experiments demonstrate the effectiveness and efficiency of DCDR.
is the length of the final recommendation list. In practice,   is usually less than 10 and   can either be equal to or larger than   .
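$q(\mathbf{x}_t \mid \mathbf{x}_{t-1}) = \mathcal{N}(\mathbf{x}_t; \sqrt{1-\beta_t}\,\mathbf{x}_{t-1}, \beta_t \mathbf{I})$, where $\beta_t$ is the scale of the added noise at step $t$.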
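For reference, the continuous Gaussian forward step of standard DDPM-style diffusion fits in a few lines. This is a generic sketch over plain Python lists (the function name and the three-step schedule are illustrative): it sets $\mathbf{x}_t = \sqrt{1-\beta_t}\,\mathbf{x}_{t-1} + \sqrt{\beta_t}\,\boldsymbol{\epsilon}$ with $\boldsymbol{\epsilon} \sim \mathcal{N}(\mathbf{0}, \mathbf{I})$, which is a sample from $\mathcal{N}(\mathbf{x}_t; \sqrt{1-\beta_t}\,\mathbf{x}_{t-1}, \beta_t \mathbf{I})$.

```python
import math
import random


def gaussian_forward_step(x_prev, beta, rng):
    """Sample x_t ~ N(sqrt(1 - beta) * x_prev, beta * I), elementwise."""
    scale = math.sqrt(1.0 - beta)
    noise_std = math.sqrt(beta)
    return [scale * x + noise_std * rng.gauss(0.0, 1.0) for x in x_prev]


rng = random.Random(0)
x = [1.0, -2.0, 0.5]
for beta in [0.1, 0.2, 0.3]:  # illustrative noise schedule beta_1..beta_3
    x = gaussian_forward_step(x, beta, rng)
print(x)
```

Each step rescales and perturbs a real-valued vector, so it has no direct analogue on item permutations; this is the gap the discrete forward operations of DCDR are designed to fill.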
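Algorithm 2 Inference Algorithm for DCDR
Require: $\mathbf{R}_T$: initial item sequence; $\mathbf{c}$: expected user feedback; $p_\theta(\mathbf{R}_{t-1} \mid \mathbf{R}_t, \mathbf{c})$: conditional denoising model; $T$: maximal step; $k$: number of candidate sequences; $\mathcal{S}$: sequence candidate set; $f(\mathbf{R})$: sequence evaluator model
1: $\mathcal{S} = \{\mathbf{R}_T\}$;
2: for $t = T, T-1, \ldots, 1$ do
3:   Compute $p_\theta(\mathbf{R}_{t-1} \mid \mathbf{R}_t, \mathbf{c})$ given each $\mathbf{R}_t \in \mathcal{S}$ and $\mathbf{c}$;
4:   Sample the $k$ item sequences with the highest probabilities according to $p_\theta(\mathbf{R}_{t-1} \mid \mathbf{R}_t, \mathbf{c})$;
5:   Merge the sampled item sequences into $\mathcal{S}$ and keep the $k$ item sequences with the overall highest probabilities;
6: end for
7: Select the final sequence with $f(\mathbf{R})$ for recommendation;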
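The inference loop of Algorithm 2 behaves like a beam search over denoising steps. The Python sketch below assumes a hypothetical `denoise_topk(seq, cond, k)` that returns up to $k$ candidate sequences $\mathbf{R}_{t-1}$ with their probabilities under $p_\theta$, and a hypothetical `evaluate(seq)` standing in for the sequence evaluator $f(\mathbf{R})$. Both are replaced by toy functions (a swap-based proposal and a position-discounted item-score utility) so the code runs end to end. "Overall highest probabilities" is read here as the highest cumulative log-probability along each path, which is an assumption.

```python
import heapq
import math
from itertools import combinations


def dcdr_beam_inference(r_init, cond, num_steps, k, denoise_topk, evaluate):
    """Beam-search-style reverse process in the spirit of Algorithm 2.

    `candidates` plays the role of the set S and holds
    (cumulative log-probability, sequence) pairs; it starts as {R_T}.
    """
    candidates = [(0.0, tuple(r_init))]
    for _ in range(num_steps, 0, -1):  # t = T, T-1, ..., 1
        merged = {}
        for logp, seq in candidates:
            for nxt, prob in denoise_topk(seq, cond, k):
                score = logp + math.log(prob)
                # If a sequence is reached along several paths, keep the best.
                if score > merged.get(nxt, -math.inf):
                    merged[nxt] = score
        # Keep the k sequences with the highest overall probability.
        candidates = heapq.nlargest(k, ((score, seq) for seq, score in merged.items()))
    # The evaluator picks the final list among the surviving candidates.
    return max((seq for _, seq in candidates), key=evaluate)


# --- Toy stand-ins (hypothetical, for illustration only) ---
ITEM_SCORE = {"a": 0.9, "b": 0.7, "c": 0.4, "d": 0.1}


def toy_utility(seq):
    # Position-discounted sum of item scores: high-scoring items first is better.
    return sum(ITEM_SCORE[x] / math.log2(pos + 2) for pos, x in enumerate(seq))


def toy_denoise_topk(seq, cond, k):
    """Propose 'no change' plus every single swap, with probabilities given by
    a softmax over toy_utility; `cond` is ignored in this toy."""
    proposals = [tuple(seq)]
    for i, j in combinations(range(len(seq)), 2):
        s = list(seq)
        s[i], s[j] = s[j], s[i]
        proposals.append(tuple(s))
    weights = [math.exp(5 * toy_utility(p)) for p in proposals]
    total = sum(weights)
    ranked = sorted(zip(proposals, (w / total for w in weights)),
                    key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


best = dcdr_beam_inference(("d", "c", "b", "a"), cond=None, num_steps=4, k=3,
                           denoise_topk=toy_denoise_topk, evaluate=toy_utility)
print(best)
```

Accumulating log-probabilities instead of multiplying raw probabilities avoids numerical underflow over many steps. Deduplicating sequences reached along different paths is a further choice of this sketch, since Algorithm 2 does not specify how duplicates are handled.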

Table 2: Online experiment results. All values are relative improvements of DCDR-P over the baseline PRM. In online A/B tests at Kuaishou, an improvement of over 0.5% in positive interactions (Like, Follow, Collect, Download) and of over 0.3% in Views is considered very significant.

Table 3: The additional computation cost to the system and the additional latency of the reranking service. Computation is measured in CPU time. Avg (Comp) and Max (Comp) measure the average and maximum computation costs, respectively, during the launch period of the experiments.
Efficiency of DCDR.