Target-constrained Bidirectional Planning for Generation of Target-oriented Proactive Dialogue

Target-oriented proactive dialogue systems aim to lead conversations from a dialogue context toward a pre-determined target, such as making recommendations on designated items or introducing new specific topics. To this end, it is critical for such dialogue systems to plan reasonable actions to drive the conversation proactively, and meanwhile, to plan appropriate topics to move the conversation forward to the target topic smoothly. In this work, we mainly focus on effective dialogue planning for target-oriented dialogue generation. Inspired by decision-making theories in cognitive science, we propose a novel target-constrained bidirectional planning (TRIP) approach, which plans an appropriate dialogue path by looking ahead and looking back. By formulating the planning as a generation task, our TRIP bidirectionally generates a dialogue path consisting of a sequence of <action, topic> pairs using two Transformer decoders. They are expected to supervise each other and converge on consistent actions and topics by minimizing the decision gap and contrastive generation of targets. Moreover, we propose a target-constrained decoding algorithm with a bidirectional agreement to better control the planning process. Subsequently, we adopt the planned dialogue paths to guide dialogue generation in a pipeline manner, where we explore two variants: prompt-based generation and plan-controlled generation. Extensive experiments are conducted on two challenging dialogue datasets, which are re-purposed for exploring target-oriented dialogue. Our automatic and human evaluations demonstrate that the proposed methods significantly outperform various baseline models.


INTRODUCTION
Human-machine dialogue systems have made significant progress in chatting with users for entertainment, e.g., open-domain dialogues [14,58,70], and assisting users in accomplishing specific tasks, e.g., task-oriented dialogues [33,57,61]. Besides passively responding to users, dialogue systems can also take a more proactive role [2,62] to introduce new interesting topics to users. Such a target-oriented proactive dialogue system looks more intelligent, sociable, and capable of directing the users towards topic areas that the system knows how to talk about [8].
Fig. 1. An illustrative example from the re-purposed DuRecDial [30] dataset. Given a pre-determined target and a dialogue context, our objective is to generate utterances that proactively and smoothly lead the conversation to achieve the target.
However, previous studies [8,42,48,62] mainly focus on the scenario of open-domain dialogues. They define the target as a commonsense topic and explore bridging an initial dialogue context and the given topic. Such a scenario is difficult to generalize to real-world applications.
In this work, we take a further step towards a more challenging target-oriented dialogue scenario, where the target is defined as an <action, topic> pair, such as providing recommendations for a specific topic that possibly attracts users. It requires the system to take more engaging actions to achieve the target, such as social chitchat, user exploration, topic elicitation, recommendation, etc. As the example in Figure 1 shows, suppose there is an explicit target, i.e., to recommend a specific movie named "Dearest". The system (i.e., Bot) is required to lead the conversation (e.g., "greeting" → "ask user" → "chat about the star" → "movie recommendation") so as to recommend the target movie when appropriate. It needs to consider the pre-determined target, dialogue history, and grounding domain knowledge (and user profile, if any). In particular, the grounding domain knowledge associated with domain-specific topics and relevant attributes is crucial to enable multiple topic transitions (e.g., warm-up chitchat → "Get in, and Go" → "Bo Huang" → "Dearest"). It is non-trivial to solve target-oriented dialogue generation for two reasons: (1) the system needs to keep the conversation engaging and proactively drive the conversation; (2) the system is expected to move the conversation forward to the target topic coherently and arouse the user's interest in the target topic to be recommended.
To address the above challenges, we observe that effective dialogue planning [55,56] is essential for target-oriented dialogue generation. In order to achieve its target, the system needs to plan reasonable actions and appropriate topics to smoothly move the conversation forward to the target topic before generating each system utterance. According to decision-making theories [15,51] in cognitive science, humans tend to look ahead (forward) and look back (backward) when making decisions to achieve a long-term goal. Such bidirectional thinking alleviates short-sighted cognition and drives people to think more about the complete decision path. Similarly, in target-oriented proactive dialogue, the target has been designated in advance and should be bounded at the end of the dialogue path to be planned. Backward path planning is effective in leveraging target-side information but insensitive to the coherence of the dialogue context. In contrast, forward path planning is more effective in generating a starting path point that is coherent with the dialogue context, but it lacks the target-driven ability to bound the target at the end of the planned dialogue path. With this in mind, we propose a TaRget-constrained bIdirectional Planning (TRIP) method. The key point is to plan dialogue paths from both look-ahead (i.e., present-to-target) and look-back (i.e., target-to-present) directions. Generally, a plan is more appropriate when the look-ahead decision path is consistent with the look-back decision path. By formulating the planning as a generation task, our TRIP bidirectionally generates dialogue paths consisting of a sequence of <action, topic> pairs (see Figure 1) based on an encoder-decoder architecture. Concretely, we first take widely-used pre-trained language models, e.g., BERT [4], to encode complex input texts efficiently. Then, we employ two individual Transformer [54] decoders for dialogue path generation, with one to generate a dialogue path in the target-to-present direction and the other to generate one in the present-to-target direction. By minimizing the decision gap between the two directions, the two decoders are expected to provide supervision to each other and converge on a consistent dialogue path. In addition, we propose a contrastive generation mechanism (see Section 4.2) to enhance TRIP with the ability to better distinguish between the given target and non-targets. It makes TRIP more robust in generating the necessary target in the planned dialogue path. During inference, we propose a target-constrained decoding algorithm (see Section 4.3) with a bidirectional agreement, which reduces the gap between inference and training and helps the model generate an appropriate dialogue path as the ultimate output.
Since each planned dialogue path outlines how to achieve the pre-determined target step by step, it is expected to help a dialogue system distill necessary knowledge and steer the system to generate more proper utterances with control. We adopt the planned dialogue path to guide dialogue generation in a pipeline manner, where we explore two variants: prompt-based generation (see Section 5.1) and plan-controlled generation (see Section 5.2). As part of this work, we repurpose two existing recommendation-oriented dialogue datasets, namely DuRecDial [30] and DuRecDial 2.0 [29], for target-oriented dialogue generation through automatic target construction. Extensive experiments are conducted to show the effect of planning and the performance of different dialogue generation methods. Both automatic and human evaluations demonstrate that our proposed methods significantly outperform various baseline models.
Overall, our contributions are summarized as follows:
• We introduce the target-oriented dialogue generation task and discuss its relation to and difference from existing works (Sections 2 and 3).
• We propose a novel target-constrained bidirectional planning (TRIP) approach (Section 4) for target-oriented proactive dialogue systems. Our TRIP plans a dialogue path consisting of a sequence of <action, topic> pairs that outline how to achieve the designated target proactively and smoothly.
• We investigate both the prompt-based and plan-controlled methods (Section 5) to leverage planned dialogue paths to guide dialogue generation effectively.
• Experimental results show that our method achieves state-of-the-art performance in both automatic and human evaluations. Our extensive analysis provides some new insights into how planning affects target-oriented dialogue generation.

RELATED WORK
Our work is mainly related to dialogue systems and content planning. We briefly review related work and clarify key differences compared with our work as follows.

Target-oriented Dialogue
Target-oriented dialogue systems work on the task of generating responses guided by the given target. According to the variety of the target, previous works have mainly focused on using a keyword [40,48,72], a topic [42,62], and a concept or a sentence [8] as the guided target. For example, [48] introduced some coarse-grained keywords to control the intended content of the responses in open-domain dialogues, while [72] leveraged external commonsense knowledge graphs for keyword transitions. As a follow-up study, steering a dialogue towards a given keyword, or dialogue strategy learning, has also been explored in past work, including graph grounded policies [64,65] and conversational lines [6]. For topic-guided dialogues, [62] investigated using an entity over a factual knowledge graph as the target topic, which requires the system to achieve a smooth transition from an initial topic to the given target topic. A new dataset called OTTers [42] was collected to explore one-turn topic transitions for open-domain response generation. More recently, [8] proposed to identify a bridging path of commonsense knowledge concepts between the dialogue context and the target sentence using data augmentation. Our work is more related to prior settings [26] on target topics and target sentences. However, existing works mainly focus on the scenario of open-domain target-guided dialogue, where they mainly consider guiding chitchat conversations to the target with transitions on commonsense topics. In comparison, we work on a more challenging setup that aims to achieve the target action for a designated target topic. It requires the system to take more engaging dialogue actions, such as social chitchat, user exploration, topic elicitation, and recommendation, to attract users so as to complete the target. We also clarify that existing studies on goal-oriented dialogue [7,45] focus on the user-side goal or task, while our work explores the system-side target (or a specific goal).

Recommendation-oriented Dialogue
As a special type of task-oriented dialogue system, a recommendation-oriented dialogue system is expected to make recommendations through natural conversations with users. The emergence of various recommendation-oriented dialogue datasets, such as GoRecDial [16], TG-ReDial [74], INSPIRED [9], and DuRecDial [30], has helped push forward the research in this area. As follow-up studies, CR-Walker [31] was proposed to perform tree-structured reasoning over knowledge graphs, which can then be mapped into hierarchical dialogue acts to guide both item and response generations. MGCG [30] and KERS [68] explored the transition policy from a non-recommendation dialogue to a recommendation-oriented one. There is another similar research area called conversational recommender systems (CRS) [23,27,47]. Compared with recommendation-oriented dialogue systems, the main task of CRS lies in discovering user preferences [63,66], asking clarifying questions about item attributes [20,24], and searching for optimal candidate items [25,69,73]. In addition, [3] unified item recommendation and response generation into the same sequence-to-sequence (Seq2Seq) paradigm using prompt-based learning. Nonetheless, most existing systems passively respond to a user, where they provide recommendations according to the user's expressed interests or requirements. Our work aims to endow a dialogue system with a more proactive role that can attract the user's interests and naturally lead user-engaged dialogues to achieve a pre-determined target.

Content Planning for Language Generation
There is a line of work [12,34,39,46] that separates natural language generation into content planning and surface realization. Content planning mainly focuses on selecting the key contents (e.g., key phrases and entities) and arranging their orderings [34,39], followed by a neural generation stage that focuses only on realization. Different strategies have been explored for content planning. For example, [44] proposed a hierarchical variational model for planning-based data-to-text generation, where a global latent variable models the diversity of planning and a sequence of local latent variables controls sentence realization. [13] presented a planning framework with iterative refinement to leverage large pre-trained language models for argument generation and article writing. For long-form text generation tasks, several studies [10,11] conducted dynamic content planning while generating the output based on mixed language models to bridge the gap between content planning and sentence realization. Compared to these prior studies, our work is more related to planning for dialogue generation [55,67]. We aim to address a more challenging dialogue generation task, where we propose a novel target-constrained bidirectional planning method to guide pre-trained language models to generate dialogue utterances more effectively.

PRELIMINARIES
In this section, we provide preliminaries about the problem formulation and introduce the essential sub-tasks accordingly. For the $i$-th dialogue, $\mathcal{H}_i$ denotes the dialogue content with a total of $T$ turns. $\mathcal{P}_i = \{(a_{i,l}, z_{i,l})\}_{l=1}^{L}$ denotes an annotated dialogue path for the $i$-th dialogue, where each path span specifies an action-topic pair (a dialogue action $a_{i,l}$ and a dialogue topic $z_{i,l}$) and $L$ is the number of unique action-topic pairs. Here, the dialogue topics are mainly constructed upon the domain knowledge $\mathcal{K}_i$, and each action/topic may affect multiple turns of dialogue. In some scenarios, there also exists a user profile $\mathcal{U}_i$ grounded on the $i$-th dialogue, which can be personal attributes or certain preferences.
Given a target $\mathcal{G}' = (a_{T'}, z_{T'})$ consisting of a target action $a_{T'}$ and a target topic $z_{T'}$, a dialogue history $\mathcal{H}'$, and a set of relevant domain knowledge $\mathcal{K}'$ (and a user profile $\mathcal{U}'$, if any), our objective is to generate coherent utterances to engage the user in the conversation so as to achieve the target $\mathcal{G}'$ when appropriate. Due to the complexity of the problem, it can be decomposed into three sub-tasks: (1) action planning, i.e., plan actions to determine where the conversation should go to lead the conversation proactively; (2) topic planning, i.e., plan appropriate topics to move forward to the target topic smoothly; (3) dialogue generation, i.e., generate an appropriate utterance to achieve the planned action and topic at each turn. To address the above tasks, we propose a target-constrained bidirectional planning method to guide dialogue generation in a pipeline manner. The target-constrained bidirectional planning aims to solve the sub-tasks of action planning and topic planning simultaneously: it plans a reasonable dialogue path consisting of a sequence of dialogue actions and topics with proper orderings. At each turn, the planned path drives the system to distill necessary knowledge from the grounding domain knowledge and meanwhile guides the system to generate a proper utterance. We describe the details of the target-constrained bidirectional planning in Section 4 and plan-guided dialogue generation in Section 5, respectively.

TARGET-CONSTRAINED BIDIRECTIONAL PLANNING
In this section, we propose a TaRget-constrained bIdirectional Planning (TRIP) model to help the system lead the conversation to achieve the pre-determined target. The overview of our TRIP is shown in Figure 2. Our TRIP is built with an encoder-decoder architecture, where we adopt two encoders to represent complex input texts and two individual decoders (i.e., a backward decoder and a forward decoder) to complete bidirectional planning.

Input Encoding
To efficiently represent various types of input, we take the widely-used pre-trained language model BERT [4] as our basic encoder. As shown in Figure 2, we concatenate the domain knowledge $\mathcal{K}$ and dialogue history $\mathcal{H}$ (and the user profile, if any) as the context. We separate them with a special token [SEP], which is consistent with the processing in BERT. Then, the context sequence is encoded using a BERT encoder, denoted as $\mathrm{Enc}_c$. For the given target consisting of a target action $a_{T'}$ and a target topic $z_{T'}$, we refer to the concatenated text of $a_{T'}$ and $z_{T'}$ as the target $\mathcal{T}$. We adopt two new special tokens [A] and [T] to differentiate $a_{T'}$ and $z_{T'}$, e.g., "[A] Movie Recommendation [T] Dearest". Then, the target $\mathcal{T}$ is encoded using another BERT encoder $\mathrm{Enc}_t$. Briefly, the encoding of input is formulated as follows:
$$\mathbf{M} = [\mathrm{Enc}_c(\mathcal{K}; \mathcal{H}); \mathrm{Enc}_t(\mathcal{T})]$$
where ";" denotes concatenation and $\mathbf{M}$ is the encoded hidden representation passed to the decoders.
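The input layout described above can be sketched in a few lines. This is a minimal illustration, not the authors' preprocessing code: the helper names `build_context` and `build_target`, the ordering of the segments, and the sample knowledge triple are all assumptions made for the example. Only the use of [SEP] between context parts and of [A] and [T] inside the target comes from the text.

```python
def build_context(knowledge, history, profile=None, sep=" [SEP] "):
    """Concatenate knowledge, dialogue history, and an optional user profile.

    The segment order (knowledge, history, profile) is an assumption.
    """
    segments = [" ".join(knowledge), " ".join(history)]
    if profile:
        segments.append(" ".join(profile))
    return sep.join(segments)


def build_target(action, topic):
    """Mark the target action and topic with the special tokens [A] and [T]."""
    return f"[A] {action} [T] {topic}"


# Hypothetical sample data for illustration only.
context = build_context(
    knowledge=["Dearest | stars | Bo Huang"],
    history=["Hello!", "Hi, how are you doing today?"],
)
target = build_target("Movie Recommendation", "Dearest")
```

In a real system the two strings would be tokenized and passed to the context encoder and the target encoder, with [A] and [T] registered as additional special tokens.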

Backward-Forward Path Generation
Our TRIP aims to plan a reasonable dialogue path consisting of a sequence of dialogue actions and topics with proper orderings, and this planning process is performed in a generation-based manner. We let TRIP generate a forward (present-to-target) path and a backward (target-to-present) path, respectively. It should be noted that the target action $a_{T'}$ and target topic $z_{T'}$ are bounded at the end of the path to be planned. For example, at the $t$-th turn, a forward dialogue path is "[A] $a_t$ [T] $z_t$ $\cdots$ [A] $a_{T'}$ [T] $z_{T'}$", while the corresponding backward dialogue path lists the same action-topic pairs in the reverse order, starting from the target. Planning a dialogue path from two opposite directions lets each direction provide supervision to the other during training, and is expected to derive more reasonable dialogue action-topic pairs that compose the ultimate dialogue path, imitating humankind's bidirectional thinking.
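To make the two path directions concrete, the following sketch serializes one list of action-topic pairs into a forward string and a backward string. The function name `serialize_path` and the sample path are hypothetical; the [A], [T], and [EOS] markers follow the token format described in this section, and treating the backward path as the exact reversal of the forward path is an assumption.

```python
def serialize_path(pairs, backward=False):
    """Turn (action, topic) pairs into a "[A] action [T] topic ... [EOS]" string.

    With backward=True the pairs are emitted from the target back to the present.
    """
    ordered = list(reversed(pairs)) if backward else list(pairs)
    spans = [f"[A] {action} [T] {topic}" for action, topic in ordered]
    return " ".join(spans) + " [EOS]"


# Hypothetical path, loosely modeled on the example in Figure 1.
path = [
    ("Greeting", "NULL"),
    ("Chat about the star", "Bo Huang"),
    ("Movie recommendation", "Dearest"),
]
forward_path = serialize_path(path)
backward_path = serialize_path(path, backward=True)
```

The forward string ends with the target pair and the backward string begins with it, which is the property the two decoders are trained to respect.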
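In detail, our TRIP generates the two paths based on the Transformer [54] decoder architecture. We adopt two individual Transformer decoders $\mathrm{Dec}_b$ and $\mathrm{Dec}_f$ to generate the backward and forward paths, respectively. Both decoders take the encoded hidden representation $\mathbf{M}$ as input and then output a dialogue path token by token, i.e., "[A] $a$ [T] $z$ $\cdots$ [EOS]", in an autoregressive manner. Here, $a$ denotes an action token, $z$ denotes a topic token, [A] and [T] are two special tokens shared with the encoder $\mathrm{Enc}_t$ to differentiate an action and a topic, and [EOS] denotes the end of the path sequence. For the backward decoder $\mathrm{Dec}_b$, suppose the output dialogue path $\mathbf{y}$ is represented at the token level, i.e., $\mathbf{y} = (y_1, y_2, \cdots, y_n)$ with a sequence length of $n$, and it is conditioned on the input text sequence (denoted as $\mathbf{x}$). The conditional distribution is approximated as follows:
$$p(\mathbf{y} \mid \mathbf{x}) = \prod_{j=1}^{n} p(y_j \mid \mathbf{y}_{<j}, \mathbf{x}), \qquad p(y_j \mid \mathbf{y}_{<j}, \mathbf{x}) = \mathrm{softmax}(\mathbf{W} \mathbf{h}_j + \mathbf{b})$$
where $\mathbf{h}_j \in \mathbb{R}^{d}$ is the decoder hidden state at step $j$, and $\mathbf{W} \in \mathbb{R}^{|V| \times d}$, $\mathbf{b} \in \mathbb{R}^{|V|}$ denote trainable parameters, with $|V|$ the vocabulary size. We train the backward decoder $\mathrm{Dec}_b$ by minimizing the negative log likelihood for given $N$ observations $\{(\mathbf{x}^{(i)}, \mathbf{y}^{(i)})\}_{i=1}^{N}$ as follows:
$$\mathcal{L}_b = -\frac{1}{N} \sum_{i=1}^{N} p(\mathbf{y}^{(i)}) \log p_{\theta}(\hat{\mathbf{y}}^{(i)})$$
where $p(\mathbf{y}^{(i)})$ is the distribution of the ground-truth path sequence, while $p_{\theta}(\hat{\mathbf{y}}^{(i)})$ is the distribution of the approximated output path sequence, and $\theta$ denotes all trainable parameters. Similarly, we train the forward decoder $\mathrm{Dec}_f$ following the above equations, with the loss function denoted as $\mathcal{L}_f$.
Reducing Gap between Backward-Forward Paths. Although the backward and forward paths are different, agreement on the dialogue actions and topics derived from the two paths is necessary since the two paths are planned for the same dialogue. By minimizing the decision gap between the backward path and the forward path, the two decoders (i.e., $\mathrm{Dec}_b$ and $\mathrm{Dec}_f$) are expected to provide supervision to each other and converge on consistent dialogue actions and topics. In detail, we adopt the composition of a linear transformation with the ReLU [35] activation function and an average pooling to obtain the fixed-sized representation of a path, given by:
$$\tilde{\mathbf{h}}_j = \mathrm{ReLU}(\mathbf{W}_p \mathbf{h}_j + \mathbf{b}_p) \quad (6)$$
$$\mathbf{h} = \frac{1}{n} \sum_{j=1}^{n} \tilde{\mathbf{h}}_j \quad (7)$$
where $\mathbf{W}_p$ and $\mathbf{b}_p$ are trainable parameters, and $\mathbf{h}_j$ stands for the decoder hidden state. Applying Eq. (6) and Eq. (7) to the two decoders yields $\mathbf{h}^b \in \mathbb{R}^{d}$ and $\mathbf{h}^f \in \mathbb{R}^{d}$, the fixed-sized representations of the backward path and the forward path, respectively. Then, we reduce the gap between the two paths by minimizing the $L_2$ distance between $\mathbf{h}^b$ and $\mathbf{h}^f$ as follows:
$$\mathcal{L}_g = \lVert \mathbf{h}^b - \mathbf{h}^f \rVert_2 \quad (8)$$
where the distance $\mathcal{L}_g$ is added to the training loss as a regularization term.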
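The gap term can be illustrated with a small, dependency-free sketch: each sequence of decoder hidden states is passed through a linear layer with ReLU, average-pooled into one vector, and the two pooled vectors are compared with the L2 distance. The function names, the use of plain Python lists, and the toy weights are assumptions for illustration; whether the two decoders share the projection parameters is not stated in the text, so the sketch simply takes the weights as arguments.

```python
import math


def relu_linear(vec, weight, bias):
    """Apply ReLU(W x + b) to a single hidden-state vector."""
    return [
        max(0.0, sum(w * x for w, x in zip(row, vec)) + b)
        for row, b in zip(weight, bias)
    ]


def path_representation(hidden_states, weight, bias):
    """Project every hidden state, then average-pool over the sequence."""
    projected = [relu_linear(h, weight, bias) for h in hidden_states]
    n = len(projected)
    return [sum(column) / n for column in zip(*projected)]


def l2_distance(u, v):
    """Euclidean distance between two fixed-sized path representations."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# Toy hidden states and an identity projection, purely for illustration.
identity = [[1.0, 0.0], [0.0, 1.0]]
zero_bias = [0.0, 0.0]
backward_states = [[1.0, -2.0], [3.0, 4.0]]
forward_states = [[1.0, 0.0], [3.0, 0.0]]

h_b = path_representation(backward_states, identity, zero_bias)
h_f = path_representation(forward_states, identity, zero_bias)
gap = l2_distance(h_b, h_f)
```

Because the pooled vectors have a fixed size, the distance is well defined even when the two generated paths differ in length.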
Contrastive Generation of Targets. Since our dialogue path generation model is trained with teacher forcing and never exposed to incorrectly generated actions or topics during training, it is insufficient to distinguish between the given target action/topic and other actions or topics. Hence, the model may struggle to ensure that the given target is generated in the path. To remedy such a situation, we propose a contrastive generation framework (see Figure 2) to expose the model to various incorrect output targets for a given input target $\mathcal{T}$. Following the contrastive learning framework [19] for conditional text generation, we train the model to learn the representations of the ground-truth dialogue path by contrasting the positives with the negatives. The critical difference is that we construct the perturbed negative examples by replacing the target topic in the ground-truth path with multiple randomly sampled topics $\{z_k\}_{k=1}^{m}$ ($z_k \neq z_{T'}$) from the domain knowledge $\mathcal{K}$, such that the training paths are difficult for the model to distinguish correctly. By identifying which features make the output path negative, these perturbed negative examples are expected to push the encoders and decoders to learn an adequate representation of the target. This, in turn, helps our model generate the necessary target in the path.
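In detail, for the text span consisting of the target action $a_{T'}$ and topic $z_{T'}$ separated with the special tokens [A] and [T] in the two decoders, we project their hidden representations into the latent space following Eq. (6) and Eq. (7), obtaining fixed-sized target representations $\mathbf{h}^b_{T'}$ and $\mathbf{h}^f_{T'}$, respectively. Similarly, for the constructed negative examples, we also project those negative targets into the latent space following Eq. (6) and Eq. (7), obtaining the corresponding neg-target representations. Since the pre-determined target $\mathcal{T}$ is encoded by the encoder $\mathrm{Enc}_t$, we pull the ground-truth target representations in the decoders toward the encoded target representation while pushing the neg-target representations in the decoders far away from the encoded target representation (see Figure 2). Then, we maximize the similarity between the pair of the encoder-decoder targets, while minimizing the similarity between the negative pairs as follows:
$$\mathcal{L}_c^{b} = -\log \frac{\exp(\mathrm{sim}(\mathbf{h}_{\mathcal{T}}, \mathbf{h}^b_{T'}) / \tau)}{\exp(\mathrm{sim}(\mathbf{h}_{\mathcal{T}}, \mathbf{h}^b_{T'}) / \tau) + \sum_{\mathbf{h}^- \in \mathcal{N}_b} \exp(\mathrm{sim}(\mathbf{h}_{\mathcal{T}}, \mathbf{h}^-) / \tau)}$$
$$\mathcal{L}_c^{f} = -\log \frac{\exp(\mathrm{sim}(\mathbf{h}_{\mathcal{T}}, \mathbf{h}^f_{T'}) / \tau)}{\exp(\mathrm{sim}(\mathbf{h}_{\mathcal{T}}, \mathbf{h}^f_{T'}) / \tau) + \sum_{\mathbf{h}^- \in \mathcal{N}_f} \exp(\mathrm{sim}(\mathbf{h}_{\mathcal{T}}, \mathbf{h}^-) / \tau)}$$
where $\mathbf{h}_{\mathcal{T}}$ denotes the averaged representation of the target $\mathcal{T}$ in the encoder $\mathrm{Enc}_t$ after transformation following Eq. (6) and Eq. (7), and $\mathbf{h}^b_{T'}$ and $\mathbf{h}^f_{T'}$ are the ground-truth target representations in the two decoders, respectively. $\mathcal{N}_b$ and $\mathcal{N}_f$ stand for the sets of neg-target representations in the two decoders, respectively. $\mathrm{sim}(\cdot, \cdot)$ is a cosine similarity function, and $\tau$ is a temperature coefficient. Furthermore, we use the averaged result between $\mathcal{L}_c^{b}$ and $\mathcal{L}_c^{f}$ as the contrastive generation loss:
$$\mathcal{L}_c = \frac{1}{2} (\mathcal{L}_c^{b} + \mathcal{L}_c^{f})$$
Training. During training, we train our TRIP model by minimizing all the losses introduced above. We use two hyperparameters $\alpha$ and $\beta$ to control the importance of gap reducing and contrastive generation, given by:
$$\mathcal{L} = \mathcal{L}_b + \mathcal{L}_f + \alpha \mathcal{L}_g + \beta \mathcal{L}_c$$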
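The contrastive objective and the combined training loss can be sketched as follows. This is an illustrative reading, assuming a standard InfoNCE-style formulation with cosine similarity and a temperature; the function names, the default temperature of 0.1, and the unweighted sum of the two decoder likelihood losses are assumptions rather than details confirmed by the text.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def contrastive_loss(anchor, positive, negatives, tau=0.1):
    """InfoNCE-style loss: pull the positive toward the anchor, push negatives away.

    anchor is the encoded target; positive and negatives are decoder-side
    target representations. The default tau is an assumption.
    """
    pos = math.exp(cosine(anchor, positive) / tau)
    neg = sum(math.exp(cosine(anchor, n) / tau) for n in negatives)
    return -math.log(pos / (pos + neg))


def total_loss(nll_b, nll_f, gap, con_b, con_f, alpha, beta):
    """Combine likelihood, gap, and averaged contrastive terms."""
    return nll_b + nll_f + alpha * gap + beta * 0.5 * (con_b + con_f)


# Toy vectors: the aligned positive should give a much smaller loss.
aligned = contrastive_loss([1.0, 0.0], [1.0, 0.0], [[0.0, 1.0]])
misaligned = contrastive_loss([1.0, 0.0], [0.0, 1.0], [[1.0, 0.0]])
```

A lower temperature sharpens the contrast between the ground-truth target and the sampled negative topics, which is why it is usually tuned together with the loss weights.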

Target-constrained Decoding
After training, our TRIP model can be directly used to generate a dialogue path consisting of a sequence of dialogue actions and topics during inference. That is, we can either use the forward decoder $\mathrm{Dec}_f$ to generate a path from the present to the target (denoted as "forward generation"), or use the backward decoder $\mathrm{Dec}_b$ to generate a path from the target to the present (denoted as "backward generation"). In order to take advantage of the bidirectional decoders, we propose a simple yet effective target-constrained decoding algorithm with a bidirectional agreement based on the widely-used beam search decoding algorithm.
First, each dialogue path should be generated with lexical constraints, i.e., the target action and the target topic should be generated at the end of the path for "forward generation" and at the beginning of the path for "backward generation". To this end, we adopt two additional strategies to fulfill the lexical constraints. For the forward decoder $\mathrm{Dec}_f$, we employ the dynamic beam allocation (DBA) [38] algorithm with a beam size of $k$ to perform lexically constrained decoding, where the required constraint is defined as the given target action and topic. For the backward decoder $\mathrm{Dec}_b$, we directly take the target tokens (i.e., a text span consisting of the target action and the target topic) as the beginning of the path to be generated.
Second, after the two decoders finish the search process, we obtain $k$ backward candidates (i.e., path sequences) and $k$ forward candidates. As shown in Figure 3, to select the best path sequence as the decoding output, we rank the backward candidates with a scoring function (Eq. (13)) defined over the set of backward candidates $\mathcal{Y}_b$. It combines two terms: $p_{\theta}(y_b^{(i)})$, the likelihood of the candidate $y_b^{(i)}$, and $\mathcal{L}_g(\cdot, \cdot)$, the $L_2$ distance between a backward candidate and a forward candidate, which is obtained by passing each pair of backward-forward candidates into the model and computed following Eq. (8). Intuitively, the scoring function ranks the backward candidates by likelihood and gives a partial reward to candidates that reach higher agreement (i.e., shorter distance) with the forward candidates, which reduces the gap between inference and training and helps the model select a better candidate. Here, $\lambda$ is a hyperparameter controlling the weight of the reward term. Note that we can also select the best path sequence from the forward candidates using a similar scoring function, which performs slightly worse in most cases in our preliminary experiments. Therefore, by default, we select the best dialogue path sequence from the backward candidates as the ultimate planning output using Eq. (13).
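The re-ranking step can be sketched as below. The exact form of the scoring function is not recoverable from the text, so this sketch makes two explicit assumptions: the score is the candidate log-likelihood minus a weighted distance, and the distance to the forward beam is taken as the minimum over forward candidates. The names `rerank_backward` and `toy_distance` are hypothetical, and `toy_distance` is only a stand-in for the model-based L2 distance of Eq. (8).

```python
def rerank_backward(backward, forward, distance, lam):
    """Pick the backward candidate with the best likelihood-plus-agreement score.

    backward and forward are lists of (path, log_likelihood) pairs.
    Assumed score: log_likelihood - lam * (distance to the closest forward path).
    """
    def score(candidate):
        path, log_likelihood = candidate
        closest = min(distance(path, f_path) for f_path, _ in forward)
        return log_likelihood - lam * closest

    return max(backward, key=score)[0]


def toy_distance(backward_path, forward_path):
    """Stand-in distance: mismatches after reversing the backward path."""
    reversed_b = list(reversed(backward_path))
    mismatches = sum(x != y for x, y in zip(reversed_b, forward_path))
    return mismatches + abs(len(reversed_b) - len(forward_path))


# Hypothetical beams with log-likelihoods.
backward_beam = [
    (["rec:Dearest", "chat:Star A"], -1.0),
    (["rec:Dearest", "chat:Bo Huang"], -1.2),
]
forward_beam = [(["chat:Bo Huang", "rec:Dearest"], -0.9)]

with_agreement = rerank_backward(backward_beam, forward_beam, toy_distance, lam=0.5)
likelihood_only = rerank_backward(backward_beam, forward_beam, toy_distance, lam=0.0)
```

With the agreement term switched on, the slightly less likely backward candidate wins because it matches the forward beam, which is the behavior the bidirectional agreement is meant to encourage.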

PLAN-GUIDED DIALOGUE GENERATION
As mentioned in the preliminaries, we adopt the planned dialogue path (denoted as "plan path" $\mathcal{P}$ for short) to guide dialogue generation in a pipeline manner. We expect these plan paths to help a dialogue system distill necessary knowledge and steer the system to generate more proper utterances with control. To achieve plan-guided dialogue generation, we devise two variants and describe them below.

Prompt-based Generation
Motivated by previous works that employ prompt-based learning for dialogue systems [32,71], we regard each plan path $\mathcal{P}$ as the natural language prompt and then adopt a pre-trained generative language model (LM), e.g., GPT-2 [41], for dialogue generation. The overview of our prompt-based dialogue generation is shown in Figure 4. Formally, the plan path $\mathcal{P}$ is concatenated with the given dialogue history $\mathcal{H}$ and domain knowledge $\mathcal{K}$ (and the user profile, if any), formulating the input context $\mathcal{X}$ as follows:
$$\mathcal{X} = [\mathcal{K}; \mathcal{H}; \mathcal{P}] \quad (14)$$
where ";" denotes concatenation. Here, the plan path $\mathcal{P}$ provides essential information that outlines how to achieve the target step by step. With the power of pre-trained LMs, the plan path $\mathcal{P}$ aims to distill necessary knowledge from both the input context and the LM. In particular, the input context $\mathcal{X}$ is fed into the pre-trained GPT-2 [41] model to generate the system utterance $Y = \{y_t\}_{t=1}^{m}$, where each token $y_t$ is generated autoregressively:
$$y_t \sim p_{\theta}(y_t \mid y_{<t}, \mathcal{X})$$
We fine-tune GPT-2 for a few epochs using ground-truth plan paths in the dataset during training, while we adopt the plan paths generated by our TRIP model during inference.

Plan-controlled Generation
Considering that the plan path $\mathcal{P}$ outlines how to achieve the target step by step with a sequence of dialogue actions and topics, we expect to better leverage such critical information to control the attribute (e.g., switching or target topics) of dialogue generation. Inspired by plug-and-play language models for controllable language generation [1], we propose a plan-controlled dialogue generation method (see Figure 5). Built upon the pre-trained LM $p(y)$, e.g., GPT-2, we employ a simple plan model $p(a \mid y)$ to act as the attribute controller, which guides the generation of the LM $p(y)$ through gradients. Considering that the generation of system utterances follows the conditional form $p(y \mid a) \propto p(y) \cdot p(a \mid y)$, we shift the hidden states of generation in the direction of the sum of two gradients: one toward higher log-likelihood of the unmodified LM $p(y)$ and one toward higher log-likelihood of the attribute $a$ under the conditional plan model $p(a \mid y)$. Combining the two factors makes it possible to guide dialogue generation in a given direction (i.e., the plan path $\mathcal{P}$) with specified strength.

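[Figure 5: Plan-controlled dialogue generation with the LM $p(y)$. Step 1: forward pass. Step 2: backward pass with gradients (satisfaction of plan path).]
Concretely, as shown in Figure 5, we take the concatenated context $\mathcal{X}$ following Eq. (14) as input and employ the pre-trained GPT-2 (denoted as LM $p(y)$) for dialogue generation. Let us define the cached hidden representations $H_t$ of the LM $p(y)$ as all key-value pairs from the past, i.e., $H_t = [(K_t^{(1)}, V_t^{(1)}), \cdots, (K_t^{(l)}, V_t^{(l)})]$, where $(K_t^{(i)}, V_t^{(i)})$ are the key-value pairs of the $i$-th layer up to time-step $t$. The LM then produces the next hidden vector and the next token as follows:
$$o_{t+1}, H_{t+1} = \mathrm{LM}(y_t, H_t)$$
$$y_{t+1} \sim p_{t+1} = \mathrm{softmax}(\mathbf{W} o_{t+1}) \quad (17)$$
where $\mathbf{W}$ is a linear transformation that maps the hidden vector $o_{t+1}$ to a vector of vocabulary size. On top of that, we build a simple plan model (denoted as $p(a \mid y)$) using a Transformer [54] decoder. The plan model $p(a \mid y)$ aims to re-generate the given plan path $\mathcal{P}$ conditioning on hidden vectors $\{o_0, o_1, \cdots, o_t\}$ of the LM $p(y)$ across all time-steps from $0$ to $t$. Here, the plan model $p(a \mid y)$ acts as a generative discriminator that gives the LM $p(y)$ a higher reward for having the desired generation direction, i.e., the plan path $\mathcal{P}$. During training, we jointly train the plan model $p(a \mid y)$ and fine-tune the LM $p(y)$ by maximizing log-likelihood.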
During inference, we use the plan model $p(a \mid y)$ to control the output of the LM $p(y)$ at every generation step $t$, following [1]. As shown in Figure 5, a forward pass is performed first through the LM $p(y)$ to compute the unmodified likelihood. In step 2, a backward pass updates the cached key-value pairs $H_t$ with gradients from the plan model $p(a \mid y)$. Let $\Delta H_t$ be the update to $H_t$, such that the generation with $(H_t + \Delta H_t)$ shifts the distribution of the generated utterance so that it is more likely to satisfy the plan path $\mathcal{P}$. $\Delta H_t$ is initialized at zero and updated as follows:
$$\Delta H_t \leftarrow \Delta H_t + \eta \nabla_{\Delta H_t} \log p(a \mid H_t + \Delta H_t)$$
where $\eta$ is the step size. This updating step can be repeated multiple times, while in practice we update once for computational efficiency. Subsequently, we use the updated key-value pairs to recompute the perturbed hidden vector $\tilde{o}_{t+1}$, given by:
$$\tilde{o}_{t+1}, H_{t+1} = \mathrm{LM}(y_t, \tilde{H}_t), \quad \text{where } \tilde{H}_t = H_t + \Delta H_t$$
The perturbed $\tilde{o}_{t+1}$ is then used to generate the next token $y_{t+1}$ following Eq. (17).
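The gradient-based control step can be illustrated on a toy problem. The sketch below is not the plan model described above: it replaces the cached key-value pairs with a single hidden vector and the Transformer-based plan model with a hypothetical logistic attribute scorer, so that the gradient has a closed form. It only shows the mechanics of starting the update at zero, taking a gradient ascent step on the attribute log-likelihood, and using the shifted state afterwards.

```python
import math


def attribute_log_prob(hidden, v):
    """Toy attribute model: log sigmoid(v . hidden). A stand-in for the plan model."""
    z = sum(h * w for h, w in zip(hidden, v))
    return -math.log1p(math.exp(-z))


def attribute_grad(hidden, v):
    """Gradient of log sigmoid(v . hidden) with respect to hidden."""
    z = sum(h * w for h, w in zip(hidden, v))
    scale = 1.0 - 1.0 / (1.0 + math.exp(-z))
    return [scale * w for w in v]


def perturb(hidden, v, step_size, num_steps=1):
    """Accumulate an update (initialized at zero) by gradient ascent, then apply it."""
    delta = [0.0] * len(hidden)
    for _ in range(num_steps):
        shifted = [h + d for h, d in zip(hidden, delta)]
        grad = attribute_grad(shifted, v)
        delta = [d + step_size * g for d, g in zip(delta, grad)]
    return [h + d for h, d in zip(hidden, delta)]


# Hypothetical hidden state and attribute direction.
hidden_state = [0.5, -1.0]
attribute_direction = [1.0, 2.0]
shifted_state = perturb(hidden_state, attribute_direction, step_size=0.1)
```

A single update already raises the attribute log-likelihood of the shifted state, which mirrors the choice of updating once per generation step for efficiency.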

EXPERIMENTAL SETUP

Datasets and Processing
Datasets. The task of target-oriented dialogue generation is still relatively under-explored. Although many publicly available dialogue datasets exist, we find that DuRecDial [30] and DuRecDial 2.0 [29] are, to the best of our knowledge, the most suitable datasets for this task. In these two datasets, the system often leads the dialogue proactively instead of passively responding to users, with rich interactive actions such as chitchat, question answering, recommendation, etc. We first briefly introduce the two datasets and then describe how we re-purpose them for target-oriented dialogue generation.
The original DuRecDial and DuRecDial 2.0 datasets were collected from crowdsourced human-to-human dialogues. In each dialogue, one person was defined as the seeker (the user's role) and the other as the recommender (the system's role). The recommender was required to proactively lead the dialogue and make recommendations by introducing new topics. Each seeker was equipped with a user profile containing user attributes (e.g., name, age range) and his/her past preference information. To converse smoothly with the seeker, the recommender was given a domain knowledge graph consisting of domain-specific topics (e.g., movies, music, and food) with related attributes. More importantly, a dialogue path composed of dialogue actions and topics was annotated for the recommender (or the system) from the beginning to the end of the dialogue. The original DuRecDial dataset contains about 10k multi-turn Chinese dialogues and 156k utterances, while the DuRecDial 2.0 dataset has 8.2k dialogues aligned across English and Chinese.
Data Processing. Since no explicit targets are annotated in the original DuRecDial and DuRecDial 2.0 datasets, we re-purpose the two datasets through automatic target construction for target-oriented dialogue generation, following [55]. For all dialogues that are proactively led by the system, we treat the topic that the user has accepted at the end of each dialogue as the target topic, and view the system's corresponding action (e.g., movie recommendation, point-of-interest recommendation, etc.) as the target action. We filter out dialogues that do not introduce any new recommendation topic. In addition, we discard all user reviews in the original domain knowledge triples because user reviews do not belong to domain knowledge. We further enrich the grounding domain knowledge triples of each dialogue with additional triples sampled from those within two hops of the target topics in the dataset, which makes knowledge selection and topic planning more challenging. Note that each target topic is guaranteed to be grounded in the domain knowledge triples corresponding to the dialogue. Statistics of all the system's dialogue actions on the re-purposed DuRecDial and DuRecDial 2.0 datasets are shown in Figure 6. The total numbers of topics are 640 and 628 (each including a NULL topic) in DuRecDial and DuRecDial 2.0, respectively.
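The automatic target construction described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the preprocessing script used in this work: the record layout (a list of per-step dicts with `action` and `topic` keys), the `NULL_TOPIC` spelling, and the function name `extract_target` are hypothetical, and "the topic the user accepted at the end" is approximated by the last recommendation step with a non-NULL topic.

```python
from typing import Dict, List, Optional, Tuple

NULL_TOPIC = "NULL"  # assumed spelling of the NULL topic


def extract_target(path: List[Dict[str, str]]) -> Optional[Tuple[str, str]]:
    """Return (target_action, target_topic) from an annotated system path.

    `path` is the annotated sequence of system steps for one dialogue.
    The last recommendation step with a non-NULL topic is taken as the target
    (an approximation of "the topic accepted at the end of the dialogue").
    Dialogues that never introduce a recommendation topic yield None,
    which corresponds to filtering them out.
    """
    for step in reversed(path):
        is_recommendation = "recommendation" in step["action"].lower()
        if is_recommendation and step["topic"] != NULL_TOPIC:
            return step["action"], step["topic"]
    return None
```

A dialogue for which `extract_target` returns `None` would simply be dropped from the re-purposed dataset in this sketch.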
Following the splitting criterion in [30,55], we split the re-purposed DuRecDial dataset into train/dev/test sets with 4,440/633/1,266 dialogues, respectively. Similarly, we obtain the train/dev/test sets of the DuRecDial 2.0 dataset with 4,256/608/1,216 dialogues, respectively. To investigate the performance of different methods for target-oriented dialogue generation, we further use the processed datasets with two types of splits for the test set: 1) an In-Domain (ID) split and 2) an Out-Of-Domain (OOD) split, similar to [8,42]. The OOD split ensures that none of the target topics in the test set are present in the train set. In the ID split, the target topics in the test set are allowed to appear in the train set. Statistics of the two re-purposed datasets are reported in Table 1. We observe an average of 4.3 to 4.8 action-topic transitions (i.e., the average length of the plan path) from the beginning toward the target.
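One way to realize the ID/OOD distinction is to partition test dialogues by whether their target topic was seen during training. The sketch below illustrates only that criterion; the `target_topic` field name and the function `split_test_by_domain` are assumptions made for illustration, and the actual split procedure may differ.

```python
from typing import Dict, List, Set, Tuple


def split_test_by_domain(
    train_target_topics: Set[str],
    test_dialogues: List[Dict[str, str]],
) -> Tuple[List[Dict[str, str]], List[Dict[str, str]]]:
    """Partition test dialogues into (in_domain, out_of_domain).

    A dialogue is out-of-domain when its target topic never occurs as a
    target topic in the train set; otherwise it is in-domain.
    """
    in_domain, out_of_domain = [], []
    for dialogue in test_dialogues:
        if dialogue["target_topic"] in train_target_topics:
            in_domain.append(dialogue)
        else:
            out_of_domain.append(dialogue)
    return in_domain, out_of_domain
```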

Baseline Methods
Dialogue Generation. To validate the effectiveness of our proposed two variants for target-oriented dialogue generation, we first compare them with the following dialogue generation methods based on pre-trained language models:
• DialoGPT [70]: It is an autoregressive generation model pre-trained on large-scale dialogue corpora for conversational response generation. We fine-tune the pre-trained model on the English dataset. For the Chinese dataset, we fine-tune the pre-trained Chinese version [59].
• GPT-2 [41]: It is a pre-trained autoregressive generation model for language generation. We use the publicly available GPT-2 base model and Chinese GPT-2 base model for fine-tuning on the English and Chinese datasets, respectively.
• BART [21]: It is an encoder-decoder pre-trained language model with denoising for natural language generation. We use the publicly available BART-base model and Chinese BART-base model for fine-tuning on the English and Chinese datasets, respectively.
Note that these models concatenate all parts of the input texts described in the problem definition as the model input and are fine-tuned to generate system utterances directly.
We also compare our plan-guided dialogue generation methods with several competitive models that are closely related to target-oriented dialogue generation and follow either the predict-then-generate paradigm or the planning-enhanced generation paradigm:
• MGCG_G [30]: It employs the predicted next dialogue action and next topic to guide system utterance generation. Following our problem setting, we re-run the officially released code on the two re-purposed datasets.
• KERS [68]: It has a knowledge-enhanced mechanism for recommendation dialogue generation built upon Transformer [54] architectures. Similarly, we re-run the officially released code on the two re-purposed datasets.
• TCP-Dial [55]: It proposes a target-driven conversation planning method that explicitly extracts necessary knowledge and then guides dialogue generation built upon various backbone models. We adopt GPT-2 [41] as the backbone model for comparison in this work and re-run the officially released code on the two re-purposed datasets.
Dialogue Planning. To explore the performance of planning for target-oriented dialogue systems, we compare our TRIP model with the following dialogue planning methods:
• MGCG [30]: It employs a convolutional neural network [17] to conduct multi-task predictions for the next dialogue action and the next topic. However, it assumes that ground-truth historical dialogue actions and topics are known to the system. In this work, we only provide the target (i.e., a target action paired with a target topic), while the system itself should plan all interim dialogue actions and topics to achieve the target. For a fair comparison, we take the same input as our problem definition to conduct multi-task predictions.
• KERS [68]: It aims to generate the next dialogue action and the next topic based on a Transformer [54] network. Similarly, we take the same input as our problem definition for KERS.
• BERT [4]: Based on the intuition of multi-task predictions, we fine-tune the widely-used pre-trained language model BERT [4] by adding two fully-connected layers to jointly predict the system's next dialogue action and topic. We use the publicly available BERT-base-uncased model and the Chinese BERT-base model for fine-tuning on the English and Chinese datasets, respectively.
• TCP [55]: It is a target-driven planning framework that aims to plan a path consisting of dialogue actions and topics in a generation-based manner. To the best of our knowledge, TCP is the work most closely related to ours on dialogue planning for the target-oriented dialogue generation task.

Evaluation Metrics
Automatic Evaluation. Following many previous studies [30,55] in dialogue generation, we adopt widely-used metrics for automatic evaluation as follows:
• Perplexity (PPL) and distinct (DIST) [22]: Perplexity and distinct measure the fluency and the diversity of generated system utterances, respectively.
• F1: The F1 score estimates the precision and recall of each generated utterance at the word level (the character level when evaluating Chinese datasets).
• BLEU [37]: The BLEU score calculates n-gram overlaps between generated utterances and gold utterances.
• Knowledge F1 (Know. F1) [30]: It evaluates the performance of generating correct knowledge (e.g., topics, attributes) from the domain knowledge triples. However, no knowledge labels are annotated for the gold system utterances in the datasets. We first conduct strict string matching to search for the entities from the domain knowledge that also occur in each gold system utterance and use them as the knowledge label. Since some knowledge entries (the object in a triple ⟨subject, relation, object⟩) are in the form of long texts (e.g., topic-associated attributes) and can be paraphrased during conversations, we then compute word-based recall scores between knowledge entries and gold system utterances. We take the knowledge entries whose recall scores are greater than a threshold of 0.55 as the pseudo label. For evaluating knowledge F1, we use the same threshold (i.e., 0.55) to examine whether a knowledge entry is hit in the generated utterances.
• Goal success rate (Goal Succ.): It is essential to validate how well a model achieves the pre-determined target, where the target topic can be used for automatic evaluation. Similar to [55], we choose the dialogues at the target turn in the test dataset and compute, for each model, the ratio of utterances that correctly generate the target topic as the goal success rate.
To evaluate dialogue planning, we adopt the following metrics:
• F1: It estimates the micro-averaged precision and recall of the predicted action or topic. For generation-based models, we take the generated action or topic at the evaluating turn for a fair comparison. We report dialogue action F1 and topic F1 scores in the experimental results, respectively.
• Bigram F1 (Bi. F1): Due to the nature of dialogues, multiple temporary planning strategies can be reasonable before completing the target. Following [75], we also expand gold labels by taking the system's actions and topics within the previous turn and the following turn into account, formulating the bigram F1.
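The word-level F1 and the recall-based knowledge matching described above can be sketched as follows. This is an illustrative reading of the metric definitions, not the evaluation script used in this work: tokenization is assumed to have been done already (words for English, characters for Chinese), and the function names are hypothetical.

```python
from collections import Counter
from typing import List


def token_f1(pred_tokens: List[str], gold_tokens: List[str]) -> float:
    """Harmonic mean of token-level precision and recall for one utterance."""
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


def knowledge_recall(entry_tokens: List[str], utterance_tokens: List[str]) -> float:
    """Fraction of a knowledge entry's tokens that also occur in the utterance."""
    if not entry_tokens:
        return 0.0
    overlap = sum((Counter(entry_tokens) & Counter(utterance_tokens)).values())
    return overlap / len(entry_tokens)


def is_knowledge_hit(
    entry_tokens: List[str],
    utterance_tokens: List[str],
    threshold: float = 0.55,
) -> bool:
    """A knowledge entry counts as hit when its recall exceeds the threshold."""
    return knowledge_recall(entry_tokens, utterance_tokens) > threshold
```

Using recall rather than exact matching lets a long, partially paraphrased attribute still count as grounded, while the threshold keeps incidental word overlap from being counted.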
Human Evaluation. Similar to [30], we conduct human evaluation from both turn-level and dialogue-level aspects. For turn-level evaluation, we randomly select 50 samples from the test-ID dataset and 50 samples from the test-OOD dataset and ask each model to produce system utterances. Three well-educated annotators are asked to score the different models in terms of i) appropriateness and ii) informativeness. Appropriateness measures whether a generated system utterance completes the current plan and responds to the dialogue context appropriately, and informativeness measures whether a model makes full use of the grounding domain knowledge to generate an informative utterance. For fairness, all model names are masked from annotators during the evaluation process.
For dialogue-level evaluation, we let each model interact with human annotators, which means that a model's generated utterance in the current turn is used as part of the dialogue history in the next turn. To ensure that the evaluation covers a wide range of targets, we randomly sample 5 different target actions from the test sets, each paired with 10 different target topics. In total, 50 different dialogue targets are evaluated. To examine whether a model can lead the conversation to the pre-determined target proactively and smoothly, we do not expose the target to human annotators during human-model interactions. Besides, human annotators are asked to be consistent with the given user profile, if any. All human-model dialogues are limited to no more than 15 turns. At the end of each dialogue, we expose the pre-determined target to the human annotators and ask each annotator to score the different models in terms of i) proactivity, which measures whether a model can proactively introduce new actions and topics in the conversation, ii) coherence, which measures the overall fluency and naturalness of the whole dialogue, and iii) goal success, which estimates how well the pre-determined target is achieved.
For all the above metrics, human evaluation scores are chosen from {0, 1, 2}, where a higher score denotes better performance. The agreement among the annotators is measured by Fleiss's kappa [5]. The score averaged over the human annotators is reported as the final human evaluation result for each model.
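For reference, Fleiss's kappa for a fixed number of annotators per item can be computed as in the sketch below. It implements the standard textbook formula; the input layout (one row per rated item, one column per score category, each cell holding the number of annotators who chose that score) and the function name are choices made for this illustration.

```python
from typing import List


def fleiss_kappa(counts: List[List[int]]) -> float:
    """Fleiss's kappa for a matrix of per-item category counts.

    counts[i][j] is the number of annotators who assigned item i to
    category j. Every item must be rated by the same number of annotators.
    """
    n_items = len(counts)
    n_raters = sum(counts[0])
    n_categories = len(counts[0])

    # Observed agreement, averaged over items.
    per_item = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_observed = sum(per_item) / n_items

    # Chance agreement from the overall category proportions.
    proportions = [
        sum(row[j] for row in counts) / (n_items * n_raters)
        for j in range(n_categories)
    ]
    p_expected = sum(p * p for p in proportions)

    if p_expected == 1.0:  # all ratings fall into a single category
        return 1.0
    return (p_observed - p_expected) / (1.0 - p_expected)
```

With three annotators and scores in {0, 1, 2}, each row has three columns that sum to 3.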

Implementation Details
Our TRIP model and plan-guided generation methods are implemented in PyTorch. During planning, we adopt the BERT-base model (12 layers, 768 dimensions, 12 heads, and 110M parameters) and the Chinese BERT-base model released in Huggingface's Transformers [60] library as input encoders for the DuRecDial 2.0 dataset and the DuRecDial dataset, respectively. Both the backward and forward decoders are stacked to 6 layers with 8 attention heads. The hidden size is set to be consistent with the BERT encoders, i.e., 768. The embeddings of the two decoders are randomly initialized, with the vocabulary size consistent with the BERT encoders. For the contrastive generation of targets, we randomly sample 3 negatives. The temperature coefficient is set to 0.1. The two hyperparameters are set to 0.1 and 1.0, respectively. We set the batch size to 6 due to memory constraints and train our TRIP model for a maximum of 10 epochs. We adopt the Adam [18] optimizer with an initial learning rate of 2e-5 and warm up over the first 3,000 training steps with linear decay. We select the best model based on the performance on the validation set. For the target-constrained decoding, the beam size is set to 3, with a maximum decoding length of 80. The hyperparameter that controls the weight of the agreement reward is set to 1.0.
For our plan-guided dialogue generation, we fine-tune the GPT-2 base model and Chinese GPT-2 base model released in Huggingface's Transformers [60] library on the DuRecDial 2.0 dataset and the DuRecDial dataset, respectively. The length of the concatenated input text is limited to 512. In addition, the plan model in the plan-controlled generation employs a lightweight Transformer decoder with 3 layers and 8 attention heads. The embeddings of the plan model are copied from the embeddings of the LM (i.e., GPT-2). The step size is set to 0.01. Both variants employ greedy search decoding during generation, with a maximum decoding length of 100. All the experiments are conducted on a machine with a single NVIDIA GeForce 3090 GPU. Our code and data are available at https://github.com/iwangjian/TRIP.
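The learning-rate schedule (warm-up over the first 3,000 steps followed by linear decay) can be sketched as a plain function of the training step. This is a minimal sketch under stated assumptions: the peak rate is read as 2e-5, `total_steps` is a hypothetical value because the total number of training steps is not stated here, and the function name is illustrative.

```python
def linear_warmup_decay_lr(
    step: int,
    peak_lr: float = 2e-5,      # assumed reading of the stated learning rate
    warmup_steps: int = 3000,   # warm-up length stated in the text
    total_steps: int = 10000,   # hypothetical; depends on data size and epochs
) -> float:
    """Learning rate at a given step: linear warm-up, then linear decay to 0."""
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / (total_steps - warmup_steps)
```

In a PyTorch training loop, the same shape is typically obtained by attaching a scheduler to the optimizer rather than computing the rate by hand; the function above only makes the shape explicit.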

EXPERIMENTAL RESULTS
Our experiments and detailed analysis aim to answer the following research questions:
• RQ1: How does the proposed planning for generation perform on the end task of target-oriented dialogue generation compared to existing methods?
• RQ2: How does the proposed TRIP model perform on each sub-task, including action planning and topic planning, compared to existing methods?
• RQ3: How does each proposed component or strategy contribute to the overall performance?
• RQ4: What are the merits and limitations of the pipelined approach in this work?

Evaluation Results of Dialogue Generation (RQ1)
Our automatic evaluation results of dialogue generation on the DuRecDial and DuRecDial 2.0 datasets are reported in Table 2 and Table 3, respectively. The best result for each metric is highlighted in boldface. As shown in Table 2, MGCG_G and KERS obtain better results than DialoGPT on the in-domain (ID) test set in terms of F1, BLEU, and DIST. Considering that the two models are trained without using pre-trained language models, their competitive performance mainly benefits from the prediction of the next dialogue action and topic, which guides the model to generate more informative and reasonable utterances. However, MGCG_G, KERS, and DialoGPT obtain poor goal success rates, which drop sharply on the out-of-domain (OOD) test set in particular. This shows that they still struggle to lead dialogues to reach the target when necessary. In comparison, GPT-2 and BART perform much better than the other baseline models over various metrics on both the in-domain (ID) and out-of-domain (OOD) test sets. We note that in terms of DIST-1/2 scores, BART is significantly better than the other baselines because BART seldom generates repeated words, making the generated utterances more diverse in many cases. However, GPT-2 performs better in most cases in generating utterances with higher n-gram overlap (see BLEU-1/2) and correct knowledge (see Know. F1). We employ GPT-2 as our backbone model due to its strong generation ability and ease of incorporation in our plan-controlled generation. For TCP-Dial, the goal success rate deteriorates remarkably on the OOD test dataset (16.72%) compared to the ID test dataset (69.88%). This is because TCP-Dial explicitly extracts topic-centric knowledge triples according to the planned topic, which may discard necessary domain knowledge when the target topic is not correctly planned, especially on the OOD test dataset, making it difficult to generate a proper utterance containing the target topic.
Compared to baseline methods, our proposed plan-guided generation methods achieve significant improvements over most evaluation metrics. For example, our prompt-based generation method achieves much better knowledge F1 scores, i.e., 53.12% and 48.32% on the in-domain (ID) and out-of-domain (OOD) test sets (see Table 2). This shows that our model is more likely to generate correct knowledge (e.g., topics, attributes) from the domain knowledge triples. In terms of the goal success rate according to Table 2, our prompt-based generation method obtains a much higher goal success rate than the baseline models. More importantly, our model is still able to maintain a high goal success rate when evaluated on the out-of-domain (OOD) test set. In contrast to GPT-2, our model mainly benefits from our dialogue planning, which verifies the effectiveness of the proposed planning for generation on the end task of target-oriented dialogue generation. Moreover, our plan-controlled generation method further improves on the prompt-based generation method, demonstrating that each planned dialogue path can further steer the model by controlling the generation process of each utterance.
We observe similar trends in Table 3 regarding automatic evaluation results on the DuRecDial 2.0 dataset. Both our prompt-based generation and plan-controlled generation methods outperform existing baseline models over most evaluation metrics. We note that all baseline models and our methods obtain lower goal success rates than on the DuRecDial dataset. This is because, in the DuRecDial 2.0 dataset, the domain knowledge triples grounding each dialogue are noisier than those in the DuRecDial dataset, making it non-trivial for these models to distinguish the target topic and to generate it in the utterance when necessary. Nonetheless, our methods still achieve better goal success rates, especially when evaluated on the out-of-domain (OOD) test set. Overall, experimental results reported in Table 2

Evaluation Results of Dialogue Planning (RQ2)
To validate the performance of dialogue action planning and topic planning, we compare our proposed TRIP model with existing dialogue planning models. The automatic evaluation results on the DuRecDial and DuRecDial 2.0 datasets are reported in Table 4 and Table 5, respectively. As shown in Table 4, it is more difficult for all models to predict or generate dialogue topics correctly than dialogue actions because the number of topics is much larger than the number of actions in the dataset. For example, MGCG and KERS achieve comparable F1 and Bi. F1 scores on action planning, while they perform much worse on topic planning than the other baseline models (i.e., BERT and TCP) that employ pre-trained language models. More obviously, we find that all models obtain much lower F1 and Bi. F1 scores in terms of topic planning when evaluated on the out-of-domain (OOD) test set. Since the target topics in the OOD test set do not appear in the train set, it is challenging for all models to capture the semantics of the target topics and to predict or generate them correctly. In contrast, our TRIP model achieves substantial improvements in both dialogue action planning and topic planning. In particular, TRIP improves the topic F1 score from 70%-80% to over 90% on the in-domain (ID) test set. It still maintains a much higher topic F1 score of 69.76% on the challenging out-of-domain (OOD) test set. Similar trends are also observed in Table 5 when all these methods are evaluated on the DuRecDial 2.0 dataset. We can conclude that our TRIP is able to plan a dialogue path consisting of more accurate dialogue actions and more reasonable topics. It is this effective dialogue planning that makes it possible to steer the system to lead the conversation toward the target proactively and smoothly.
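The strict and relaxed planning scores discussed above can be illustrated with a short sketch. It relies on one known fact and one assumption: when every turn receives exactly one predicted label, micro-averaged precision, recall, and F1 all equal the fraction of correct predictions; and the relaxed (bigram-style) score is assumed here to count a prediction as correct if it matches the gold label of the previous, current, or following system turn. The function name and data layout are illustrative only.

```python
from typing import Sequence, Tuple


def strict_and_relaxed_scores(
    preds: Sequence[str],
    golds: Sequence[str],
) -> Tuple[float, float]:
    """Return (strict, relaxed) hit rates over the turns of one dialogue.

    preds[t] is the predicted action (or topic) at system turn t and
    golds[t] is the annotated one. The relaxed score also accepts the gold
    labels of the neighbouring turns t-1 and t+1.
    """
    if len(preds) != len(golds) or not golds:
        raise ValueError("preds and golds must be non-empty and aligned")
    strict_hits = 0
    relaxed_hits = 0
    for t, pred in enumerate(preds):
        window = golds[max(0, t - 1): t + 2]  # previous, current, next turn
        strict_hits += int(pred == golds[t])
        relaxed_hits += int(pred in window)
    n = len(golds)
    return strict_hits / n, relaxed_hits / n
```

The relaxed score is never lower than the strict one, which matches the intent of tolerating plans that reach the right step one turn early or late.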

Ablation Study of TRIP (RQ3)
To explore why our TRIP achieves superior performance in dialogue planning, we conducted an ablation study to verify the effectiveness of the modules and mechanisms proposed in TRIP. We focus on the following settings for ablation experiments: (1) without the forward decoder, where only the backward decoder is employed to generate the dialogue path from the target turn to the present turn, followed by vanilla beam search decoding (the proposed target-constrained decoding algorithm is not applicable in such a case); (2) without the backward decoder, where only the forward decoder is employed to generate the dialogue path from the present turn to the target turn, similarly followed by vanilla beam search decoding.
From the ablation study results shown in Table 6, we observe that each module or mechanism contributes to dialogue planning. The performance of TRIP drops sharply when removing either the backward decoder or the forward decoder. In particular, the topic F1 score decreases from 69.76% to 46.52% (without the forward decoder) and 45.09% (without the backward decoder) on the out-of-domain (OOD) test set. These ablation results support our basic idea that employing two decoders for bidirectional planning is viable and effective. We also observe that removing the backward decoder hurts performance more than removing the forward decoder. This is because the backward decoder directly takes the target as the beginning input and generates the dialogue path in a target-to-present direction, which helps leverage the target-side information to guide planning more effectively. For the ablations without reducing the gap between backward-forward paths and without the contrastive generation of targets, the results show that both objectives benefit the model in planning, as we expect. In terms of the target-constrained decoding, we find that the final performance deteriorates rapidly when removing the lexical constraints (w/o LC) or the bidirectional agreement (w/o BA); in particular, the topic F1 score decreases from 69.76% to 51.34% (w/o LC) and 52.49% (w/o BA) on the OOD test set. This indicates that our target-constrained decoding plays a vital role in dialogue planning since it controls the model's attention to the target-side information during inference, even when handling out-of-domain target topics.

Analysis of Parameters (RQ3)
We quantitatively analyzed some critical parameters of our methods, including 1) the hyperparameter that controls the weight of the bidirectional agreement reward in the planning stage, and 2) the step size that controls the updating step in the plan-controlled dialogue generation.
Impact of the agreement reward weight. To investigate the impact of this hyperparameter in the planning stage, we conducted target-constrained decoding by varying its value in {0, 0.5, 1.0, 1.5, 2.0}. Experimental results are shown in Figure 7(a). We observe that our model achieves the best action F1 and topic F1 scores when the weight is 1.0, and a smaller value results in lower action F1 and topic F1 scores. In particular, the model performs much worse without any reward of bidirectional agreement, i.e., when the weight is 0, indicating that our target-constrained decoding with a bidirectional agreement is crucial in generating a more reasonable dialogue path.
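To make the role of the agreement weight concrete, the toy sketch below re-ranks forward-decoder candidates by adding a weighted agreement term to their log-probabilities. It is only an illustration of how such a weight trades off likelihood against agreement, not the decoding algorithm of TRIP: the overlap-based `agreement` function, the candidate format, and all names are assumptions. Setting the weight to 0 recovers ranking by log-probability alone.

```python
from typing import List, Tuple

Step = Tuple[str, str]  # one (action, topic) pair of a dialogue path


def agreement(candidate: List[Step], reference: List[Step]) -> float:
    """Fraction of candidate steps that also occur in the reference path."""
    if not candidate:
        return 0.0
    reference_steps = set(reference)
    return sum(step in reference_steps for step in candidate) / len(candidate)


def rerank_with_agreement(
    candidates: List[Tuple[List[Step], float]],
    backward_path: List[Step],
    weight: float = 1.0,
) -> List[List[Step]]:
    """Sort forward candidates by log-prob + weight * agreement (descending)."""
    scored = [
        (log_prob + weight * agreement(path, backward_path), path)
        for path, log_prob in candidates
    ]
    scored.sort(key=lambda item: item[0], reverse=True)
    return [path for _, path in scored]
```

In this toy setting, a candidate that is slightly less likely under the forward decoder can overtake a more likely one once it agrees with the backward path, which is the qualitative effect the weight is meant to control.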
Impact of the step size. To investigate the impact of the step size in the plan-controlled dialogue generation, we varied its value in {0, 0.001, 0.003, 0.01, 0.1, 0.3, 1.0}. Experimental results are shown in Figure 7(b). We observe that the step size mainly affects the goal success rate, while it has a slighter impact on the knowledge F1 score. If no updating step is performed during plan-controlled dialogue generation, i.e., the step size is 0, the output distribution of the dialogue generation model (i.e., the LM) is not controlled by the plan, especially for those utterances in which the target topics should explicitly appear. By default, we choose 0.01 as the step size since neither a larger value nor a smaller one brings any performance gain.
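The effect of the step size can be illustrated with a toy gradient-ascent update on a hidden state. The sketch assumes, for illustration only, a linear-softmax plan model over a small hidden vector and a plain (unnormalized) gradient step; the actual plan model is a Transformer decoder and the exact update rule is not restated here, so all names and shapes below are hypothetical.

```python
import math
from typing import List


def _softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def log_prob_of_plan(hidden: List[float], weights: List[List[float]], plan_id: int) -> float:
    """log p(plan | hidden) for a toy linear-softmax plan model."""
    logits = [sum(w * h for w, h in zip(row, hidden)) for row in weights]
    return math.log(_softmax(logits)[plan_id])


def controlled_update(
    hidden: List[float],
    weights: List[List[float]],
    plan_id: int,
    step_size: float = 0.01,
) -> List[float]:
    """One gradient-ascent step on log p(plan | hidden) w.r.t. the hidden state."""
    logits = [sum(w * h for w, h in zip(row, hidden)) for row in weights]
    probs = _softmax(logits)
    # Gradient of log softmax(W h)[a] w.r.t. h is W[a] - sum_k p_k * W[k].
    grad = [
        weights[plan_id][d] - sum(p * row[d] for p, row in zip(probs, weights))
        for d in range(len(hidden))
    ]
    return [h + step_size * g for h, g in zip(hidden, grad)]
```

With a step size of 0 the hidden state is returned unchanged, so the plan exerts no influence; a small positive step nudges the state toward values under which the desired plan is more probable.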

Human Evaluation Results (RQ4)
We selected several representative models for human evaluation, including MGCG_G, GPT-2, TCP-Dial, and ours. The evaluation results are shown in Figure 8. The Fleiss's kappa scores are mainly distributed in [0.4, 0.6], denoting moderate inter-annotator agreement. For turn-level evaluation, we observe that GPT-2, TCP-Dial, and ours obtain comparable scores in informativeness since they utilize powerful pre-trained language models and thus can generate informative utterances. In terms of appropriateness, our method obtains the highest human score on average, demonstrating the ability to generate more appropriate system utterances in response to the dialogue context. On the other hand, dialogue-level evaluation (i.e., proactivity, coherence, and goal success) is more challenging for all models because errors might be propagated as the dialogue goes on. We find that our method obtains better results on average compared to all baseline models. Notably, our method achieves the highest proactivity and goal success scores, indicating that our method is more likely to drive the dialogue to reach the target successfully.

Case Study (RQ4)
To illustrate the quality of different methods for target-oriented dialogue generation, we conducted some case studies. We selected the same target with the same initial dialogue context and investigated the utterances generated by three different models in the dialogue-level human evaluation, including MGCG_G, GPT-2, and ours (plan-controlled generation). We show some generated cases in Figure 9. As shown in Figure 9(a), we observe that MGCG_G is incapable of generating fluent and coherent utterances. Although MGCG_G conducts planning first to predict the next dialogue action and topic, it fails to predict a correct topic when necessary, causing the model to fail to achieve the target (i.e., recommend the movie "The Art of Action: Martial Arts in Motion Picture") at the end of the dialogue. For the case of GPT-2 shown in Figure 9(b), we find that GPT-2 is able to generate more fluent and informative utterances in general. However, it fails to achieve the target since it has no dialogue planning, making it not proactive enough to lead the dialogue towards the pre-determined target. In such cases, GPT-2 is not effective at generating the target topic as the dialogue goes on. In contrast, the case shown in Figure 9(c) demonstrates that our TRIP model can plan a dialogue path with reasonable actions and appropriate topics that outlines how to achieve the target step by step. With the guidance of the planned path, our plan-controlled generation method knows when and what to talk about to move the dialogue forward proactively. More importantly, our method succeeds in achieving the target since our TRIP plans a correct topic (i.e., the target topic "The Art of Action: Martial Arts in Motion Picture") when appropriate.

Additional Discussions (RQ4)
According to the human evaluation results and case study, our proposed methods effectively plan reasonable dialogue paths to guide dialogue generation. The advantages of such a pipelined framework are: (1) It provides our model with better explanations because each planned dialogue path tells the dialogue generation model how to achieve the target step by step with specific actions and essential topics. (2) It is controllable for the end task of target-oriented dialogue generation. Our methods divide the complicated end task into two stages, making it more flexible to improve the overall performance stage by stage. Therefore, our methods are more practical and can be extended to real-world applications.
After analyzing the cases with low human evaluation scores, we also identify some limitations and discuss potential solutions: (1) Our pipelined framework suffers from error propagation, which is a typical issue of most existing pipelined methods. We find that the performance of dialogue generation is prone to drop once our TRIP model fails to plan a dialogue path appropriately. We intend to alleviate this issue by introducing techniques from cascaded generation, such as noisy channel models [28,43]. (2) Our plan-guided dialogue generation method is still not robust enough. Although we have achieved significantly better planning results by a large margin compared to baseline models on both datasets, we observe that the performance gain in terms of the goal success rate is much less prominent on one dataset than on the other. One possible direction is to study how to improve dialogue generation with adaptive control at the turn with the target action and the target topic.

CONCLUSION AND FUTURE WORK
In this work, we explore the task of target-oriented proactive dialogue and focus on effective dialogue planning for dialogue generation. We propose a novel target-constrained bidirectional planning (TRIP) approach to plan dialogue paths from both backward and forward directions. Our TRIP formulates planning as a generation task and bidirectionally generates dialogue paths consisting of reasonable actions and appropriate topics. To better control path generation, we devise a novel target-constrained decoding algorithm to achieve bidirectional agreement. We adopt the planned dialogue paths to guide dialogue generation in a pipeline manner, with two explored variants: prompt-based generation and plan-controlled generation. Experimental results on two re-purposed datasets show that the proposed methods achieve state-of-the-art performance on all sub-tasks. Extensive analysis and discussions demonstrate the advantages of our methods.
We observe that the emergence of large language models (LLMs) [36,49,50] has substantially boosted the research field of dialogue systems. LLMs will generally perform better for dialogue generation in terms of some aspects, such as fluency, informativeness, and human likeness. However, for the target-oriented proactive dialogue generation task, more critical dimensions should be considered, including proactivity, coherence, and target achievement success rate. Our work shows that dialogue planning plays a vital role in improving dialogue generation performance in these dimensions. Recent studies [52,53] indicate that the planning capabilities of LLMs are still far from those of humans. In the future, we intend to apply our proposed bidirectional approach to LLMs for dialogue planning and generation, since our methods are agnostic to the backbone model. We are also interested in strengthening the planning capabilities of LLMs to solve other complex tasks.

Fig. 3. Illustration of our target-constrained beam search decoding with bidirectional agreement.

The remaining ablation settings are: (3) without reducing the gap between backward-forward paths; (4) without the contrastive generation of targets; (5) without the lexical constraints in the target-constrained decoding (w/o LC); (6) without the bidirectional agreement in the target-constrained decoding (w/o BA).

Fig. 7. Quantitative results by varying the value of different parameters.

Fig. 9. Illustrative cases from the dialogue-level human evaluation. The bot's utterances are generated by (a) MGCG_G, (b) GPT-2, and (c) Ours (plan-controlled generation), respectively. The topics and topic-related attributes that also appear in the domain knowledge are marked with underlines.
Then, we briefly introduce our proposed method with respect to addressing the problem effectively.Suppose we have a target-oriented dialogue corpus D = {(K  , P  , H  )}  =1 , where  denotes the number of dialogue samples.K  = { , }   =1 denotes a set of domain knowledge facts relevant to -th dialogue with each element  , in form of a ⟨subject, relation, object⟩ triple.H  = {( , ,  , )} and  ′ denote context length and target length respectively,  is the hidden size.Here, both H  and H  are tokenlevel hidden representations.To maintain full input information for the subsequent planning, we concatenate H  and H  as the final input representation, denoted as M = [H  ; H  ].
where $(K_t^{(i)}, V_t^{(i)})$ corresponds to the key-value pairs from the $i$-th layer generated at all time-steps from 0 to $t$. Efficient computations of the LM to generate the next token at step $t+1$ using the cached $H_t$ are summarized as:
ACM Trans. Inf. Syst., Vol. 1, No. 1, Article 1. Publication date: March 2024.

Table 2. Evaluation results of dialogue generation on the DuRecDial dataset. Significant improvements over baseline models are marked with * (t-test, p < 0.05).

Table 3. Evaluation results of dialogue generation on the DuRecDial 2.0 dataset. Significant improvements over baseline models are marked with * (t-test, p < 0.05).

Table 4. Experimental results of dialogue planning on the DuRecDial dataset. Significant improvements over baseline models are marked with * (t-test, p < 0.05).

Table 5. Experimental results of dialogue planning on the DuRecDial 2.0 dataset. Significant improvements over baseline models are marked with * (t-test, p < 0.05).
and Table 3 demonstrate that, compared to existing methods, our proposed two variants are effective in generating more appropriate utterances on the end task of target-oriented dialogue generation.

Table 6. Ablation study results of our TRIP model on the DuRecDial dataset.
Fig. 8. Human evaluation results of different models. $\kappa$ denotes Fleiss's kappa.