Latent User Intent Modeling for Sequential Recommenders

Sequential recommender models are essential components of modern industrial recommender systems. These models learn to predict the next items a user is likely to interact with based on the user's interaction history on the platform. Most sequential recommenders, however, lack a higher-level understanding of the user intents that often drive user behavior online. Intent modeling is thus critical for understanding users and optimizing long-term user experience. We propose a probabilistic modeling approach that formulates user intent as latent variables, which are inferred from user behavior signals using a variational autoencoder (VAE). The recommendation policy is then adjusted according to the inferred user intent. We demonstrate the effectiveness of latent user intent modeling via offline analyses as well as live experiments on a large-scale industrial recommendation platform.


INTRODUCTION
Modern recommender systems power many online platforms. Sequential recommendation models, which consider the order of user-item interactions, are becoming increasingly popular [7, 18, 34, 38]. Most existing sequential recommender models rely mainly on the item-level interaction history of a user to capture their topical interests. These models often lack a higher-level understanding of user intent, i.e., what a user wants from the platform at request time, for example exploring new content, continuing with content from the last session, or playing background music. User intents often span across sessions, so intent understanding is crucial to optimizing long-term user experience.
User intent can be explicitly defined. For example, search intents are commonly classified into navigational, informational, and transactional [5, 16, 32]. With this approach, we can annotate the training data with explicitly defined user intents and cast intent prediction as a supervised learning task. This formulation has clear advantages: it allows for good interpretability and reliable evaluation of model predictions. It also comes with an obvious downside, however: it requires expert knowledge to manually define and enumerate user intents. Compared with the search use case, user intent in organic recommendation is often multifaceted and subconscious, making it much more challenging to define manually [17].
Rather than explicitly defining user intent, we propose to model it with latent variable models, which have been extensively studied and applied across various fields including statistics [11], machine learning [4], and econometrics [1]. A latent variable model is a probabilistic model relating observed variables, or evidence, to latent variables. Specifically, it defines a joint distribution over the observed and latent variables; the corresponding distribution of the observed variables is then obtained through marginalization.
User behavior signals on an online platform, for example search, browse, click, and consumption activity, are often good indicators of the underlying user intent. They can therefore serve as the observed variables in the latent variable model. In other words, we formulate user intent as latent variables and jointly model them with user behavior signals, avoiding the need to manually define intents.
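To make the marginalization concrete, here is a toy illustration (our own, not the paper's model): a discrete latent intent z generates an observed behavior signal y, and summing over the hypothetical intents recovers the marginal distribution of the observed variable. All probabilities below are made up for illustration.

```python
# Hypothetical intents and their prior probabilities p(z).
p_z = {"explore": 0.3, "continue_session": 0.5, "background": 0.2}

# Hypothetical conditional distribution p(y|z) of an observed behavior.
p_y_given_z = {
    "explore":          {"search": 0.6, "browse": 0.3, "click": 0.1},
    "continue_session": {"search": 0.1, "browse": 0.3, "click": 0.6},
    "background":       {"search": 0.2, "browse": 0.6, "click": 0.2},
}

def marginal(y):
    """p(y) = sum over z of p(z) * p(y|z)."""
    return sum(p_z[z] * p_y_given_z[z][y] for z in p_z)

p_search = marginal("search")  # 0.3*0.6 + 0.5*0.1 + 0.2*0.2 = 0.27
```

The same marginalization carries over to continuous latent variables, with the sum replaced by an integral, which is precisely what makes exact inference intractable and motivates the variational approach later in the paper.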
Together, we make the following contributions:
• Propose a probabilistic model that formulates user intent as latent variables and relates it to behavior and context signals.

RELATED WORK
Sequential recommendation systems. Sequential recommenders are a class of recommender systems based on modeling the order of user-item interactions [38]. Various architectures have been applied to capture long-term and complex dependencies in user-item interactions, for example recurrent neural networks (RNN) [3, 7, 13, 42], convolutional neural networks (CNN) [34, 35, 43], and self-attentive models [18]. However, most methods focus mainly on the item-level interaction history and often lack a higher-level understanding of user intent.
User intent modeling. Multiple approaches have been proposed to model user intent for improving recommender systems [6, 8, 21, 22, 28, 31, 36]; in particular, the implicit user intent approach has gained popularity. Wang et al. [39] use the mixture-channel purpose routing network (MCPRN) to learn users' different purchase purposes for each item. Li et al. [24] propose methods to discover new intents based on existing explicitly defined ones. Implicitly deduced intent from user interactions has also been used to understand user satisfaction in music recommendation [30]. As far as we know, our proposal to bring latent variable models into intent modeling and leverage behavior signals is a novel contribution.
Latent variable models and variational autoencoders. Latent variable models are a class of probabilistic models that assume the observed variables are generated by a set of unobserved, or latent, variables. The models learn the joint distribution of observed and latent variables; the distribution of the observed ones is then obtained by marginalization [1, 4, 11]. Variational autoencoders (VAE) provide a principled framework for learning deep latent variable models and the corresponding inference models [19, 20]. Several extensions of the VAE were later proposed, including the conditional VAE, which models conditional distributions [33], and the variational RNN for modeling sequential data [10]. Applications of VAEs to recommender systems have also been an active research topic [23, 25-27]. In this work, we propose a novel application of latent variable models to user intent modeling.

LATENT INTENT MODELING
In this section, we provide a detailed exposition of the proposed latent intent modeling technique. Section 3.1 discusses the model assumptions in the form of a probabilistic graphical model. Section 3.2 describes the model architecture and training objective of the latent intent module. Finally, Section 3.3 summarizes how the module is incorporated into a sequential recommender model.

Probabilistic Model
We start with a directed probabilistic graphical model (PGM), or Bayesian network, to factorize the joint distribution of the random variables of interest. The high-level structural assumption of the model is that past user behavior and context information before a user request indicate the user's current intent, which is in turn manifested in future user behavior. Figure 1 provides a graphical representation of the model. Following the convention for visual representations of probabilistic graphical models, we use shaded nodes to denote observed variables and transparent nodes for latent ones. The nodes in the graph represent the following variables:
• Node x: past user behavior (e.g., number of clicks/searches in the past 15 minutes) and context (e.g., time of day, device);
• Node y: future user behavior (e.g., number of clicks/searches in the next 15 minutes);
• Node z: the latent variable (i.e., user intent).
Note that x and y are assumed to be conditionally independent given z. In other words, the link between past user behavior and context x and future behavior y is mediated by the latent intent z.
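The factorization p(x) p(z|x) p(y|z) implied by the graph can be sketched as ancestral sampling: draw x, then the intent z given x, then future behavior y given only z. The distributions below are hypothetical stand-ins chosen for illustration, not the paper's learned networks.

```python
import random

def sample_x(rng):
    # Past behavior/context, e.g., number of searches in the past 15 minutes.
    return rng.choice([0, 1, 2, 3])

def sample_z_given_x(x, rng):
    # Latent intent: users who searched recently are more likely "exploring".
    p_explore = 0.8 if x > 0 else 0.2
    return "explore" if rng.random() < p_explore else "continue"

def sample_y_given_z(z, rng):
    # Future behavior depends on x only through z (conditional independence).
    mean_count = 2.0 if z == "explore" else 0.5
    # Crude non-negative count via rounding an exponential draw.
    return round(rng.expovariate(1.0 / mean_count))

rng = random.Random(0)
x = sample_x(rng)
z = sample_z_given_x(x, rng)
y = sample_y_given_z(z, rng)
```

Note that sample_y_given_z never sees x: that is exactly the conditional-independence assumption the graph encodes.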

Inference with Variational Autoencoders
From a probabilistic modeling perspective, the goal of the latent intent model is to capture the conditional distribution p(y|x), where the link between x and y is mediated by the latent variable z. Latent variable models are known to be difficult to learn and infer from, due to intractable posterior distributions. To make learning and inference tractable and to scale efficiently to large datasets, we adopt a variational inference algorithm, i.e., the conditional VAE [19, 33]. The high-level idea is to introduce a variational distribution q(z|x, y) that approximates the true posterior distribution and to maximize a lower bound of the log-likelihood log p(y|x), casting inference as an optimization problem. We refer interested readers to Kingma and Welling [20] and references therein for further details about variational inference and VAEs.
The model is composed of the following networks/distributions:
• Prior network: p(z|x) = N(μ_prior, Σ_prior);
• Encoder/variational distribution: q(z|x, y) = N(μ_enc, Σ_enc);
• Decoder: p(y|z).
All distributions are parameterized as multivariate Gaussians with diagonal covariance matrices; the means and log-variances are the outputs of multi-layer perceptrons (MLP), with ReLU as the activation function.
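A minimal sketch of the encoder, under our own illustrative layer sizes: a one-hidden-layer MLP maps the concatenated (x, y) to the mean and log-variance of a diagonal Gaussian q(z|x, y), and a sample is drawn via the reparameterization trick z = μ + σ ⊙ ε. The prior network p(z|x) would look the same with x alone as input.

```python
import numpy as np

rng = np.random.default_rng(0)
dim_xy, dim_h, dim_z = 8, 16, 4  # illustrative sizes, not from the paper

W1 = rng.normal(0, 0.1, (dim_xy, dim_h)); b1 = np.zeros(dim_h)
W_mu = rng.normal(0, 0.1, (dim_h, dim_z)); b_mu = np.zeros(dim_z)
W_lv = rng.normal(0, 0.1, (dim_h, dim_z)); b_lv = np.zeros(dim_z)

def encode(xy):
    """MLP with ReLU: returns (mean, log-variance) of q(z|x, y)."""
    h = np.maximum(0.0, xy @ W1 + b1)
    return h @ W_mu + b_mu, h @ W_lv + b_lv

def reparameterize(mu, log_var, rng):
    """z = mu + sigma * eps, with eps ~ N(0, I); keeps sampling differentiable."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

xy = rng.standard_normal(dim_xy)       # concatenated (x, y) features
mu, log_var = encode(xy)
z = reparameterize(mu, log_var, rng)   # a sample from q(z|x, y)
```

Parameterizing the log-variance rather than the variance keeps σ strictly positive without any constraint on the network output.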
The parameters of the networks are trained to maximize the evidence lower bound (ELBO):

L_ELBO = E_{q(z|x,y)}[log p(y|z)] − D_KL(q(z|x, y) ∥ p(z|x)),   (1)

where D_KL(·∥·) denotes the Kullback-Leibler (KL) divergence. It can be shown that the ELBO is a lower bound of the conditional log-likelihood, L_ELBO ≤ log p(y|x), which is the quantity we are interested in maximizing [33].
The first term of the ELBO in Equation 1 can be regarded as a reconstruction loss. Given a pair of past user behavior and context x and future user behavior y, if we pass it through the encoder and sample a latent intent z from q(z|x, y), then z should contain relevant information about y, and the model should be able to faithfully reconstruct y. The dimensionality of z is often chosen to be smaller than that of y so that it acts as an information bottleneck [2]. The second term of the ELBO is a regularization loss; it encourages the approximate posterior distribution q(z|x, y) not to deviate too much from the prior distribution p(z|x).
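Because both q(z|x, y) and p(z|x) are diagonal Gaussians, the regularization term has a standard closed form. The sketch below evaluates it with made-up parameters to show the qualitative behavior: zero when posterior and prior coincide, and growing as the posterior moves away from the prior.

```python
import numpy as np

def kl_diag_gaussians(mu_q, log_var_q, mu_p, log_var_p):
    """D_KL(N(mu_q, diag(exp(log_var_q))) || N(mu_p, diag(exp(log_var_p)))),
    summed over latent dimensions (standard closed-form identity)."""
    var_q, var_p = np.exp(log_var_q), np.exp(log_var_p)
    return 0.5 * np.sum(
        log_var_p - log_var_q + (var_q + (mu_q - mu_p) ** 2) / var_p - 1.0
    )

mu = np.zeros(4); log_var = np.zeros(4)
# Identical distributions: zero "surprise".
kl_same = kl_diag_gaussians(mu, log_var, mu, log_var)          # 0.0
# A posterior shifted one unit away from the prior in every dimension.
kl_shift = kl_diag_gaussians(mu + 1.0, log_var, mu, log_var)   # 2.0
```

This same quantity is reused later in the paper's latent-space analysis as a measure of surprise.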
Figure 2 shows a diagram of the latent intent module. Note that one adaptation from the original conditional VAE model is that the decoder does not take x as input, i.e., p(y|z) instead of p(y|x, z). This keeps the module consistent with the probabilistic graphical model defined in Figure 1; that is, y is conditionally independent of x given z.
Training Stability. In practice, we observed training instability in the latent intent module due to an exploding KL-divergence term.
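The extracted text cuts off before the paper's exact remedy, so the sketch below shows two common stabilization tricks for a misbehaving KL term, presented as assumptions rather than the paper's recipe: linearly annealing the KL weight from zero, and clamping the per-example KL from below ("free bits") so the posterior is not crushed onto the prior.

```python
def kl_weight(step, warmup_steps=10_000, max_weight=1.0):
    """Linear KL annealing: ramp the KL coefficient from 0 to max_weight
    over the first warmup_steps training steps."""
    return max_weight * min(1.0, step / warmup_steps)

def free_bits_kl(kl_value, floor=0.5):
    """Free bits: stop penalizing the KL once it falls below a floor,
    so the loss (and its gradient) is flat in that regime."""
    return max(kl_value, floor)

w_start = kl_weight(0)      # 0.0 at the start of training
w_mid = kl_weight(5_000)    # 0.5 halfway through warmup
```

Both techniques trade a looser bound early in training for a better-behaved optimization trajectory.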

Incorporation into Sequential Recommenders
The latent intent model introduced in Section 3.2 is a standalone module with its own training objective. In this section, we describe how it is incorporated into the main recommendation model, in particular a REINFORCE sequential recommendation model [7]. Note that the proposed technique can be easily applied to other types of sequential recommendation models, for example self-attentive ones [18]. We briefly describe the components of the recommendation model that are pertinent to this work and refer interested readers to Chen et al. [7] for more details. The sequential recommendation model uses an RNN to summarize a user's interaction history on the platform. The final RNN hidden state is concatenated with other context information and passed through the post-fusion layers (an MLP with ReLU activations), whose output is used as the user representation. A softmax policy is defined on top of that, together with item representations. The model is trained using the REINFORCE algorithm [41] to optimize long-term user satisfaction; we denote the loss function by L_rec.
Among the three components of the latent intent model (prior, encoder, and decoder), only the prior network interacts directly with the main recommender model. Recall that the prior network p(z|x) infers user intent z given past user behavior and context x. At both training and serving time, we pass x through the prior network and draw a sample z from p(z|x). To further decouple the latent intent module from the main recommendation model, we apply a stop-gradient operation to z. The sampled user intent z is then concatenated with the final RNN hidden state and passed to the post-fusion layers. The output is the user representation augmented with the inferred user intent, which further conditions the recommendation policy. The training objective of the sequential recommendation model, L_rec, remains unchanged. The overall loss function is L = L_rec − α·L_ELBO, where α > 0 is a hyperparameter controlling the relative strength of the ELBO loss. The overall model architecture is illustrated in Figure 3.

LATENT SPACE ANALYSIS
One drawback of VAEs is the limited interpretability of the latent space. In our use case, the inferred user intent z is a compressed representation of user intent in a continuous latent space. To gain more insight into what the latent space encodes, we adopt an analysis technique from Chung et al. [10].
Recall that the KL-divergence term in Equation 1 is computed between the approximate posterior q(z|x, y) and the prior distribution p(z|x); in other words, it can be seen as the difference in inferred user intent with and without knowledge of future user behavior. It can therefore be regarded as a measure of surprise. In fact, the KL-divergence between posterior and prior distributions has long been used in neuroscience to quantify surprise [12, 14].
Intuitively, when a user consumes an item that is "new", the amount of surprise, and hence the KL-divergence, should be higher than for an "old" item. We can therefore compare the KL-divergence between new and old items consumed by users to corroborate this statement. Two definitions of new items are considered: (1) item-level: an item that has not been consumed by the user before; (2) topic-level: an item that belongs to a topic cluster that has not been consumed by the user before. We also consider scenarios where there are sudden changes in user search behavior; the KL-divergence should be higher when such changes occur. In particular, we focus on the number of search queries a user issues in the past 15 minutes, n_past, and in the next 15 minutes, n_future. We say there is a change in user search behavior if a user searched in the past but not in the future (n_past > 0 and n_future = 0) or vice versa (n_past = 0 and n_future > 0); otherwise, user search behavior is deemed unchanged.
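The change-in-search-behavior criterion above is a simple predicate, written out here for clarity:

```python
def search_behavior_changed(n_past, n_future):
    """True iff the user searched in the past 15 minutes but not the next
    15 minutes, or vice versa; otherwise behavior is deemed unchanged."""
    return (n_past > 0 and n_future == 0) or (n_past == 0 and n_future > 0)

search_behavior_changed(3, 0)  # True: searched before, stopped after
search_behavior_changed(0, 0)  # False: unchanged (no search on either side)
```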
Figure 4 shows the average KL-divergence against the training steps. Figures 4a and 4b compare "new" and "old" items at the item- and topic-levels, respectively: when a user consumes a new item or topic, the KL-divergence is higher. Figure 4c shows the KL-divergence when user search behavior is changed versus unchanged: when there is a change in user search behavior, the KL-divergence is higher. These results indicate that the latent space is able to capture salient information and detect transitions in user behavior. (Topic clusters are generated by 1) considering items consumed by the same user consecutively; 2) performing matrix factorization to generate an embedding for each item; 3) using k-means to cluster the learned embeddings into 10,000 clusters; and 4) assigning the nearest cluster to each item.)
We conduct A/B experiments in a live system serving billions of users to measure the benefits of the proposed latent intent modeling technique. The sequential recommender system [7] is built to retrieve hundreds of candidates from a corpus of millions of items upon each user request. The retrieved candidates, along with those returned by other sources, are scored and ranked by a separate ranking system before the top results are shown to the user.

LIVE EXPERIMENTS
Experiments are run for three weeks, during which both the control and experiment models are trained continuously, with new interactions and feedback used as training data. We focus on two metrics: (1) users' overall enjoyment of the platform; and (2) diversity of user-item interactions, i.e., the number of unique topic clusters users have interacted with. It has been shown that consumption diversity is an effective surrogate for long-term user experience [40].
The experiment and control models are the sequential recommender models with and without the latent intent module, respectively. Figure 5 summarizes the live experiment results. On the x-axis is the date, and on the y-axis is the relative difference of a metric, in percent, between the experiment and control. We report the means and 95% confidence intervals of the metrics. Relative to the control, the experiment model improves overall enjoyment by +0.07%, with a 95% confidence interval of (+0.02%, +0.11%). Diversity of user-item interactions improves by +0.10%, with a 95% confidence interval of (+0.08%, +0.13%). Furthermore, there is an upward trend in the overall enjoyment metric, suggesting a user learning effect, i.e., user states change in response to the recommendation policy.
The proposed model has been deployed to the production system, in the manner described at the beginning of this section, for more than two weeks.

CONCLUSION
In this work, we propose the latent user intent model, a probabilistic modeling approach to capturing user intent that complements existing sequential recommenders. The variational inference technique we adopt is efficient and scalable, and techniques to stabilize and interpret the model are also studied. Finally, the effectiveness of the proposed method is validated in large-scale live experiments. One future research direction is to further improve the interpretability of latent user intent by designing a discrete latent space, using the Gumbel softmax [15, 29] or the vector quantized variational autoencoder (VQ-VAE) [37].

Figure 1 :
Figure 1: Probabilistic graphical model. The connection from past user behavior and context x to future user behavior y is mediated by the user intent z. Note that x and y are observed variables whereas z is latent.

Figure 2 :
Figure 2: The latent intent module consists of three networks: prior, encoder, and decoder. The loss can be written as two terms: a reconstruction loss and a regularization loss.

Figure 3 :
Figure 3: The overall architecture incorporating the latent intent module into the sequential recommendation model. Only the prior network is used, and a stop gradient is applied to further decouple the modules.

Figure 4 :
Figure 4: Latent space analysis. The KL-divergence between the approximate posterior q(z|x, y) and the prior p(z|x) is plotted against the training steps. The curves correspond to new and old items at the item- and topic-levels in (a) and (b), and to changed/unchanged search behavior in (c).

Figure 5 :
Figure 5: Live experiment results. On the x-axis is the date; on the y-axis is the relative difference, in percent, between the experiment and control.

• Apply variational inference techniques for efficient and scalable inference.
• Shed light on the training stability of probabilistic models in industrial use cases.
• Conduct an analysis to gain insights into the semantics of the latent space.
• Demonstrate the benefits of the proposed technique in large-scale live experiments on a commercial recommendation platform serving billions of users and millions of items.