A Dual Prompt Learning Framework for Few-Shot Dialogue State Tracking

The dialogue state tracking (DST) module is an essential component of task-oriented dialog systems for understanding users' goals and needs. Collecting dialogue state labels, including slots and values, can be costly, requiring experts to annotate all (slot, value) information for each turn in dialogues. It is also difficult to define all possible slots and values in advance, especially with the wide application of dialogue systems in more and more newly emerging applications. In this paper, we focus on improving the DST module to generate dialogue states in circumstances with limited annotations and limited knowledge about the slot ontology. To this end, we design a dual prompt learning framework for few-shot DST. The dual framework explores how to utilize the language understanding and generation capabilities of pre-trained language models for DST efficiently. Specifically, we consider the learning of slot generation and value generation as dual tasks, and two kinds of prompts are designed based on this dual structure to incorporate task-related knowledge of these two tasks respectively. In this way, the DST task can be formulated as a language modeling task efficiently under few-shot settings. To evaluate the proposed framework, we conduct experiments on two task-oriented dialogue datasets. The results demonstrate that the proposed method not only outperforms existing state-of-the-art few-shot methods, but also can generate unseen slots. This indicates that DST-related knowledge can be probed from pre-trained language models and utilized to address low-resource DST efficiently with the help of prompt learning.


INTRODUCTION
The dialogue state tracking (DST) module, which aims to extract dialogue states during a conversation [42], is an important component of task-oriented dialog systems for understanding users' goals and needs [21,38]. Dialogue states are sets of slots and their corresponding values, as shown in Figure 1. A slot describes an attribute of the user's need (e.g., "price range") and a value is the value of the given attribute (e.g., "cheap" for "price range"). Collecting state labels can be costly [3], requiring experts to annotate all (slot, value) information for each turn in dialogues. In addition, dialogue states vary across different dialogue systems. For example, for different goods on an e-commerce platform, the types of users' needs are very different (e.g., "size" for clothes and "CPU" for computers). Thus, it is difficult to define all possible slots and their values in advance, especially with the wide application of dialogue systems in more and more newly emerging applications. These challenges require DST models to be able to generate dialogue states in circumstances with limited annotations and limited knowledge about the slot ontology.
To reduce the dependency on large amounts of training data, several few-shot methods have recently been proposed for low-resource DST. Most of them apply domain transfer-based methods [20,33,41], which rely on the assumption of similarity among different domains and thus do not generalize well to completely unseen slots. Some approaches have tried to exploit external knowledge. Chen et al. [6] and Hudecek et al. [17] consider slots and frames as similar semantic units and use FrameNet semantic parsers to automatically induce slots. Wu et al. [39] fine-tune BERT on a task-oriented dialogue dataset and utilize it for the downstream DST task. These methods rely on a pre-defined slot ontology and cannot generate unseen dialogue states.
In this paper, we rethink the DST task as a generation task. Considering slots and values as core semantic information that can be generated from dialogues, DST resembles a hybrid summarization task including both extractive and abstractive summarization, since the target information may or may not appear verbatim in the original text. From the generation perspective, slots and values do not need to be predefined. We require a model that can understand such semantics and generate them as dialogue states.
Recently, a new paradigm, "Pre-train, Prompt and Predict" [28], which aims at utilizing PLMs in a more effective way, has attracted widespread attention. This paradigm can efficiently "probe" target task-related knowledge with a textual prompt, and its superiority has been shown in many few-shot scenarios such as few-shot text classification [14] and text summarization [23]. Su et al. [37] use "translate dialogue to belief state:" as the prompt to generate dialogue state sequences; such a simple prompt incorporates limited knowledge about the DST task. Thus, this promising paradigm remains under-explored for low-resource DST. To further exploit the potential of prompt learning, we design a dual prompt learning framework (DPL) for few-shot DST. Different from existing work, which generates both slot and value in a single sequence, we consider the learning of slot generation and value generation as dual tasks, and two prompts are designed based on this dual structure to incorporate task-related knowledge of the two tasks respectively. In this way, the DST task can be formulated as a language modeling task efficiently under few-shot settings.
As shown in Figure 2, we first design value prompt functions, which equip the textual prompt with values and the dialogue history. A value prompt function is a textual template, e.g., "[c] belief states: value = [v], slot = [s]". Given the dialogue history c ("...Plan a train to London") and value candidate v (London), the prompt becomes: "...Plan a train to London. belief states: value = London, slot = [s]", where [s] is supposed to be generated as destination by the PLM. Further, prompting values via slots can be seen as a dual task of prompting slots via values. Thus, we design a slot prompt mechanism, as the dashed lines in Figure 2 show. While training, after slots s are generated via the value prompt, the slots are presented to the slot prompt function f_s. This process aims to generate the corresponding value v′, which is supposed to be close to the original input v. Naturally, there exists an internal correlation between these two types of prompt tasks, and they can benefit each other, especially under few-shot settings. The slot prompt can also help self-checking and restrict the output of the value prompt: if a generated slot can be used to prompt the original value, the value belongs to that slot with a larger probability. Finally, a simple but effective ensemble method is used to leverage the complementary advantages of different prompt functions while testing.
The main contributions of our work can be summarized as follows:
• We reformulate DST as a language modeling task and propose to split it into two dual tasks: slot generation and value generation.
• A novel dual prompt learning framework is designed to help PLMs understand the essence of DST with few labels and to utilize the generation ability of PLMs efficiently.
• Experimental results show that our model can generate unseen dialogue states and outperforms state-of-the-art few-shot approaches.

PRELIMINARY

2.1 Prompt Learning
Prompt learning, which aims to utilize pre-trained language models more effectively with the help of prompts, is a recently proposed NLP paradigm ("Pre-train, Prompt and Predict"). Usually, the original task input x is used to construct a prompt that reformulates the original task into a language modeling task. Take emotion classification as an example: when recognizing the emotion of a social media post, "I missed the bus today", we may continue with a prompt "I felt so __" and ask the PLM to fill the blank with an emotion-bearing word. With appropriate prompts, the PLM can be pushed to generate the task-related output directly. Given the prompt function f, which maps the original input x to the prompt, the goal is to learn P(z | f(x)), where z is the answer to be generated or filled in. In DST, z can be a word in the dialogue state sequence.
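The reformulation above can be sketched as a small prompt function; the template wording and function name here are illustrative, not taken from any specific implementation:

```python
# A prompt function f maps the original task input x to a cloze-style
# template, turning classification into a language modeling problem.
def emotion_prompt(post: str) -> str:
    """Map a social media post to a cloze prompt for a PLM."""
    return f"{post} I felt so [MASK]."

prompt = emotion_prompt("I missed the bus today.")
# A masked-language-model PLM is then asked to fill [MASK]; the
# emotion-bearing word with the highest fill probability (e.g. "sad")
# determines the predicted label.
```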

2.2 Dialogue State Tracking
We consider each conversation as T turns of utterances alternating between the user and the system: C = {u_1, a_1, ..., u_T, a_T}, where u_t and a_t represent the user's and the system's utterances respectively. Given the dialogue history c_t (including the current user utterance u_t and the former utterances, c_t = {u_1, a_1, ..., a_{t-1}, u_t}), a DST model aims to extract the dialogue state (belief state) B_t for u_t, which comprises multiple tuples of slots s and their associated values v: B_t = {(s_1, v_1), ..., (s_J, v_J)}. For example, given the dialogue history c_t ("...Plan a train to London on this Tuesday"), the DST model is supposed to generate the belief state B_t = {(destination, London), (day, this Tuesday)}. The goal is to learn the probability distribution P(B_t | c_t) for the t-th turn [31]. If B_t is considered as a word sequence [15], DST is essentially a language modeling task. Large-scale pre-trained language models (PLMs) show outstanding language modeling and generation ability. Following the existing paradigm (pre-train and fine-tune), we would need to fine-tune a PLM on a task-related dataset; fine-tuning with only a few labeled examples, however, may lead to over-fitting. Thus, an effective way to help the PLM understand the DST task in its familiar form (language modeling) while exploiting its generation ability is important, which inspires the exploration of prompt learning for few-shot DST.

METHOD

3.1 Dual Prompt Learning for Few-shot Dialogue State Tracking
To utilize the few-shot generation ability of PLMs via prompt learning, previous work [7,18,36] shows that the design of the prompt function f is a key factor influencing the final performance. The main question is how to formulate the downstream task as a language modeling task so as to utilize the generation ability of PLMs efficiently. A natural idea is to consider slots and values as the same kind of semantics: the dialogue history can be used as the input and fed into GPT-2 to generate the sequence of dialogue states directly [24]. However, this method needs plenty of annotations, as the learning process lacks knowledge about the target task. Some existing work uses slots as prompts and generates values [22]. For example, given c ("...Plan a train to London.") and the slot destination, the input of the PLM becomes "...Plan a train to London. Where is the destination the user wants to reach? [v]", where [v] is supposed to be generated as "London". This method relies on a known ontology of slot types. For few-shot DST, the slots that appear in the small labeled dataset may not cover all possible needs. In addition, defining all possible slots is difficult given the rapid adoption in newly emerging domains and users' continuously changing needs. In real-world applications, the candidate set of slots may be unknown and changeable.
Actually, values and slots are both core semantic units in utterances that describe users' needs. Generating values with slots can be seen as a dual task of generating slots with values. Naturally, these two types of tasks are supposed to hold an intrinsic correlation and can benefit each other, especially in few-shot settings. Thus, we split the DST task into two dual sub-tasks (slot generation and value generation) and propose a dual prompt learning framework (DPL) for few-shot DST, as shown in Figure 2. While training, a few labeled examples can help PLMs understand DST under the dual prompt framework. Next, we describe the framework in detail. Take the first template f_1 as an example ("[c] belief states: value = [v], slot = [s]"): given the value candidate v = "London", f_1(c, v) = "[c] belief states: value = London, slot = [s]". The goal is to learn the probability of the slot given c and the value v, i.e., P(s | f_1(c, v)).

3.1.1 Value Prompt. Four intuitive templates for value prompts are shown in Table 1.
The overall learning objective of this generation process is to minimize the negative log-likelihood of slots in the training dataset D. As a turn may contain multiple values and slots, each (slot, value) pair constitutes an instance for training and testing. It is worth mentioning that the slot types are not static: we generate the slots over the whole vocabulary space, which makes generating unseen slots possible.

Table 1: Different value prompt functions. [c] is the dialogue history, [v] is the input value candidate, and [s] is the slot to be generated.
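As a concrete sketch, the first value prompt template can be implemented as a simple string function (the template string follows the running example; the function name is illustrative, not from the paper's code):

```python
# Build the value prompt f_1(c, v): fill the template
# "[c] belief states: value = [v], slot = [s]" with the dialogue
# history c and a value candidate v, leaving the slot for the PLM
# to generate as a continuation.
def value_prompt(history: str, value: str) -> str:
    return f"{history} belief states: value = {value}, slot ="

p = value_prompt("...Plan a train to London.", "London")
# The PLM continues this sequence, ideally producing "destination".
```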
3.1.2 Slot Prompt. Although slots and values are both core semantics of a dialogue, they are expressed differently. Slots are types, which are more often implicitly indicated in the dialogue, whereas values are the specific needs users express, so they are more likely to appear explicitly in the dialogue. We analyze the multi-domain dialogue dataset MultiWOZ 2.1 [10] and find that 89.36% of values can be matched in the original dialogue. Therefore, the slot prompt is treated as an auxiliary task for the value prompt. Our goal is to utilize it to help the PLM understand the task and further tune the output: if a slot can be used to prompt the original value, there is a larger probability that the value belongs to the generated slot.
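The matching analysis mentioned above can be sketched as a verbatim containment check; this is a rough illustration only, as the paper's actual analysis would additionally need value normalization (casing, aliases such as "centre"/"center"):

```python
# Estimate what fraction of annotated values appear verbatim in the
# dialogue text (a crude lower bound on "explicit" values).
def matchable_ratio(examples):
    """examples: list of (dialogue_history, value) pairs."""
    hits = sum(1 for hist, val in examples if val.lower() in hist.lower())
    return hits / len(examples)

data = [
    ("plan a train to London", "London"),    # explicit value: matched
    ("i absolutely need free wifi", "yes"),  # implicit value: not matched
]
ratio = matchable_ratio(data)  # 1 of 2 values matched -> 0.5
```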
Thus, we design the slot prompt as shown in Figure 2. While training, the slot s is presented to the slot prompt function f_s. The slot prompt process aims to generate the corresponding value v′, which is supposed to be close to the original input v. We take "[c] belief states: [s] = [v]" as the template in f_s. We use teacher forcing for training, and the loss function L_s is the negative log-likelihood of the values in the training dataset (Eq. 5). The final loss combines the two objectives as L = L_v + λ L_s (Eq. 6), where λ is a decimal in (0, 1) used to balance the two tasks. The training process is described in Algorithm 1.

Algorithm 1 Training process
Require: training dataset D, value prompt function f_v, slot prompt function f_s, and a PLM M
Ensure: the fine-tuned model M
1: repeat
2:   for batch b ∈ D do
3:     Get value prompts: p_v ← f_v(v_i, c_i)
4:     Get slot prompts: p_s ← f_s(s_i, c_i)
5:   end for
6:   Calculate L_v on p_v in Eq. 4
7:   Calculate L_s on p_s in Eq. 5
8:   Calculate the final loss L in Eq. 6
9:   Update M to minimize L
10: until convergence

The value candidate generator is further tuned via self-critical sequence training (SCST) [35]: after values are generated, they are constructed into value prompts and fed into the fine-tuned model M to generate slots. The loss of the tuning process is:

Value Candidate Generator

Here, v̂ denotes the generated values and r(v̂) is the reward, defined as the generation probability of the target slot when v̂ is used as the input of the slot generator. The final loss (Eq. 8) combines this reward-based objective with the negative log-likelihood loss, where γ is a decimal in (0, 1) used to balance the two losses.
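Both Eq. 6 and Eq. 8 follow the same pattern: a main loss plus an auxiliary loss scaled by a small weight in (0, 1). A minimal numeric sketch (function and variable names are illustrative, not from the paper's code):

```python
# Combine a main objective with a weighted auxiliary objective, as in
# L = L_v + lambda * L_s (Eq. 6) or the gamma-weighted mix of SCST and
# NLL losses (Eq. 8).
def weighted_loss(main_loss: float, aux_loss: float, weight: float) -> float:
    assert 0.0 < weight < 1.0, "the weight is a decimal in (0, 1)"
    return main_loss + weight * aux_loss

# Example: value-prompt loss 2.0, slot-prompt loss 1.5, lambda = 0.1
loss = weighted_loss(2.0, 1.5, 0.1)  # 2.0 + 0.1 * 1.5 = 2.15
```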

Prompt Ensemble

In Section 3.1.1, we described methods to generate a set of value prompt functions, as shown in Table 1.
Each of these prompts may be more or less effective at eliciting knowledge from PLMs, so we must decide how to use the generated prompts at test time. Unfortunately, under few-shot settings, it is hard to obtain training and development sets large enough to automatically select or generate the best-performing prompt [1,8,14,18,29]. We therefore introduce a multi-prompt learning method (prompt ensemble) for the few-shot DST task to utilize the different prompts effectively. Prompt ensemble methods use multiple unanswered prompts as input at inference time to make predictions [28]. This leverages the complementary advantages of different prompts and avoids the cost of choosing a single best-performing prompt. There is relatively little work on prompt ensemble for generation tasks. A simple ensemble in this case is to train a separate model for each prompt and, at test time, generate the output based on the vocabulary distributions learned by the several models. The probability of slot s is calculated as the weighted sum P(s) = Σ_{k=1}^{K} w_k P(s | f_k(c, v)), where f_k is the k-th prompt function, w_k is its weight, and K is the number of prompt functions.

We adopt the standard metric in DST [41]: joint goal accuracy (JGA). The metric compares the entire predicted belief state to the gold one at each dialogue turn; a prediction is considered correct if and only if all predicted (slot, value) pairs exactly match the ground truth.
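The ensemble rule above can be sketched as a weighted average over prompt-specific output distributions; the toy dictionaries below stand in for real PLM vocabulary distributions, and names are illustrative:

```python
# Average K slot distributions, one per prompt-specific model, with
# weights w_k (uniform 1/K by default, as in the paper's experiments).
def ensemble_slot_probs(dists, weights=None):
    k = len(dists)
    weights = weights or [1.0 / k] * k
    combined = {}
    for w, dist in zip(weights, dists):
        for slot, p in dist.items():
            combined[slot] = combined.get(slot, 0.0) + w * p
    return combined

d1 = {"destination": 0.7, "departure": 0.3}  # model trained with prompt 1
d2 = {"destination": 0.5, "departure": 0.5}  # model trained with prompt 2
probs = ensemble_slot_probs([d1, d2])        # destination: 0.6, departure: 0.4
best = max(probs, key=probs.get)             # "destination"
```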

EXPERIMENTS 4.1 Experimental Setup
4.1.2 Implementation Details. We choose SOLOIST [31] as our base model. SOLOIST is initialized with the 12-layer GPT-2 [32] and further trained on multiple task-oriented dialogue corpora (Schema [34] and Taskmaster [4]) for two dialogue-related tasks (belief prediction and response generation). Specifically, the belief prediction task accepts an utterance as input and generates the belief states as a word sequence (e.g., "Belief state: destination = London"). Thus, we suppose that knowledge about DST may already be learned by SOLOIST, and what we need is an effective way to "probe" this knowledge and apply it to few-shot scenarios. In addition, the moderate size of SOLOIST (117M parameters) makes fine-tuning with the task-related prompts computationally efficient. The weight w_k for each prompt function in Eq. 9 is set to the same value (1/4), λ in Eq. 6 is 0.1, and γ in Eq. 8 is set to 0.1. Our code will be released after the review process.

Few-shot Experiments
For the few-shot experiments, we compare our method with several strong baselines capable of few-shot inference that achieve SoTA on the MultiWOZ 2.0 dataset. They fall into two classes: those that require a slot ontology and those that do not. Baselines requiring a slot ontology include: (1) TRADE [41], which requires slot embeddings as inputs and uses a soft copy mechanism to either copy the corresponding values from utterance pairs or generate them with an RNN. (2) Self-Sup [40], which adds two self-supervised objectives to TRADE: preserving latent consistency and modeling conversational behavior. (3) TOD-BERT [39], which trains BERT on several task-oriented dialogue-relevant tasks (masked language modeling and response generation) with large-scale corpora (100k dialogues across over 60 different domains); for DST, it learns a classifier to predict the value over the pre-defined possible value set for each known slot. Baselines that do not need a slot ontology consider DST as a sequence generation task: (1) SimpleTOD [15] uses a single causal language model to generate all outputs given the dialogue context. (2) MinTL [27] jointly learns DST and dialogue response generation and introduces Levenshtein belief spans. (3) SOLOIST [31] is the base model of DPL. (4) PPTOD [37] integrates different dialogue modules into a unified model with prompts.
To simulate few-shot scenarios, we randomly select a limited quantity of labeled training data. To compare with previous work, we randomly sample a given ratio (1%, 5%, 10%, 20% and 25%) of the training set for training and test on the whole test set (Table 2). Some baselines report results with 20% of the training set while others report 25%; for comparison, we evaluate our model under both the 20% and 25% settings. "N/A" denotes results not presented in the original paper.
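The evaluation metric, joint goal accuracy, can be sketched as an exact set match per turn (data structures here are illustrative):

```python
# Joint goal accuracy (JGA): a turn counts as correct only if the
# predicted set of (slot, value) pairs exactly matches the gold set.
def joint_goal_accuracy(preds, golds):
    """preds/golds: lists of per-turn sets of (slot, value) tuples."""
    correct = sum(1 for p, g in zip(preds, golds) if p == g)
    return correct / len(golds)

preds = [{("destination", "london")},
         {("destination", "london"), ("day", "tuesday")}]
golds = [{("destination", "london")},
         {("destination", "cambridge"), ("day", "tuesday")}]
jga = joint_goal_accuracy(preds, golds)  # only turn 1 matches -> 0.5
```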
Compared to previous approaches, our model achieves consistently higher JGA (3.9% higher on average) under the different data-ratio settings. The improvement is especially large when only 1% of the training set is available (4.0% over the strong baseline PPTOD). This indicates the superiority of our model in low-resource scenarios and verifies the strong task-related generation ability of PLMs under prompt learning.

Few-shot Cross-domain Experiments
In the few-shot cross-domain experiments, models are first trained on four domains and then fine-tuned with 1%, 5% and 10% of the target-domain data. We compare with several strong models with reported results: TRADE, DSTQA and T5DST [26]. DSTQA treats DST as a question-answering task and needs slot information to construct questions. T5DST is a strong prompt baseline that uses slot descriptions as prompts. They all rely on a known slot ontology.
The experiments are also conducted on MultiWOZ 2.0 for comparison with previous work. Table 4 summarizes the evaluation results. In all domains, our model outperforms these strong baselines, especially in the 1% training data setting for the Hotel domain.

Unseen Slot Generation
We present the slots of each domain in Table 3. Some domains share slots with other domains; for example, all slots of Attraction can be found in Hotel. On the contrary, some domains hold slots that are not seen in other domains: Hotel has four unseen slots (parking, book stay, stars and internet) and Restaurant has two (food and time). Here, we consider a slot "unseen" if it appears neither in the labeled training data nor among the slot names of the source domains.
To observe the extension and generation ability for unseen slots, we design two zero-shot experiments: leave Hotel or Restaurant out as the held-out domain respectively, and train on the other four domains. We report slot accuracy, which evaluates whether the slot is correctly generated in turns where the values are correctly generated. From the results in Figures 4 and 5, we find that: (1) For seen slots that share names with those of the source domains, our model generates them with high accuracy. For example, area in the Hotel domain is a common slot in two other source domains (Attraction and Restaurant) and can be generated with 96.92% accuracy, indicating good transfer ability across domains.
(2) For some unseen slots (book stay and stars in Hotel in Figure 4, book time and food in Restaurant in Figure 5), our model generates them with more than 87% accuracy. For example, given the dialogue history "...yes, please book it for 1 person and for 5 nights starting Friday.", the model successfully generates "book stay" for "5" even though it has never seen instances of book stay during training. Without known slot types, our model can infer the hidden semantics, which are supposed to be the slot, from the value and context.
(3) For the two remaining unseen slots (internet and parking), the values are "yes". We find that the value generator can generate such implicit values, as shown in Table 6, and the PLM can then generate the corresponding slots with large probabilities (73.15% and 84.31%).

Ablation Studies
The rule-based candidate generator achieves 32.65% turn-level accuracy. Our trained generator outperforms it by over 14% with only 1% of the training data, indicating the superiority of the learned value generator. In addition, we find that "tuning" improves the results of value generation. Although the value generator does not achieve very high turn-level accuracy under the few-shot setting, our model still outperforms the others in JGA, as Table 2 shows. This is attributed to the high accuracy of slot generation in turns where the values are correctly generated.

4.5.2 Prompt Functions. We further observe the performance of different components, including different value prompt functions and the prompt ensemble. We train separate models with each value prompt ("DPL" with f_1, ..., f_4). Then, we apply prompt ensemble ("En") to the trained models. Experiments with 1% training data are shown in Table 7.
• The first four numerals in the first row show the original performance with the different prompt functions. Among the four prompts, f_2 performs best, which may be attributed to its format being similar to the output sequences used in the base model's pre-training.
• Removing the slot prompt ("DPL w/o slot prompt" in Table 7) decreases performance for all prompt functions, indicating the importance of the slot prompt. For f_2, the decrease is relatively small (0.3%); this may be because the slot prompt ("belief states: [s] = [v]") and the value prompt ("belief states: [v] = [s]") are too similar to learn complementary knowledge.
Further, we conduct experiments to observe the influence of the weight λ in Eq. 6, with λ set to {0, 0.1, 0.3, 0.5}. Experiments using 1% training data and different value prompts are shown in Figure 6. We find that JGA first increases with λ and then begins to decrease. This suggests that the slot prompt is indeed an auxiliary task that provides useful knowledge when its weight is relatively small. All four prompts perform best when λ is 0.1, so we set it to 0.1 in all experiments.

Figure 6: The influence of the weight λ for the slot prompt using different prompt functions f_v. The x-axis is the value of λ and the y-axis is JGA. Experiments with λ = 0.1 always perform best for all prompt functions.

RELATED WORK

5.1 Few-Shot Dialogue State Tracking
Some few-shot methods use data augmentation to obtain more labeled data for training. Campagna et al. [5] and Hou et al. [16] propose to synthesize dialogues for a new domain using a small number of domain templates derived from observing a small dataset and the ontology of the domain. These methods depend on the slot ontology of the target domain.
Most of the existing work focuses on transferring from other resource-rich DST domains. Lee and Jha [20] and Rastogi et al. [33] utilize slot descriptions to transfer reusable concepts across domains. Wu et al. [39] learn similarity functions between slots and values and transfer them to unseen domains. Dingliwal et al. [9] introduce meta-learning and use source domains to meta-learn the parameters used to initialize the fine-tuning process on the target domain. One constraint of such methods is that they rely on domain similarity for transfer and therefore cannot be applied to arbitrary domains.
Another thread of approaches tries to exploit external knowledge. Chen et al. [6] and Hudecek et al. [17] utilize FrameNet-style [11] semantic frames and named entity recognition (NER) as weak supervision for slot candidates. Gao et al. [13], Gao et al. [12], Li et al. [22] and Lin et al. [25] reformulate DST as a reading comprehension (RC) task and make use of abundant RC data and frameworks to overcome the data scarcity issue in DST. Wu et al. [40] investigate two self-supervised objectives: preserving latent consistency and modeling conversational behavior. However, these methods have limited performance owing to the limited common knowledge they incorporate.

5.2 Prompt Learning
With the rapid development of large-scale pre-trained language models (PLMs), a new paradigm has attracted widespread attention: "pre-train, prompt, and predict" [28]. Instead of adapting PLMs to downstream tasks via objective engineering, prompt learning reformulates downstream tasks to look more like those solved during the original PLM training with the help of a textual prompt. The GPT-3 model [2] achieves remarkable few-shot performance solely by leveraging a few task demonstrations as input context (e.g., "Translate English into French") and a natural-language prompt (e.g., "cheese ==> "). However, training such a huge model (175B parameters) is difficult. A more common prompt learning method is "prompt-based fine-tuning": utilize a moderately sized PLM for which fine-tuning is computationally efficient and fine-tune it with the task-related prompts. It shows good performance in many few-shot scenarios. Gao et al. [14] use RoBERTa-large and design automatic prompt generation for text classification. Li and Liang [23] add continuous task-specific vectors as prompts to each transformer layer and achieve improvements in low-resource text summarization. For the DST task, Lee et al. [19] use slots directly as prompts and generate the corresponding values, which requires a lot of labeled training data for fine-tuning the PLM. For few-shot DST, prompt learning-based methods remain under-explored.

CONCLUSION
To address the lack of labeled data in practical DST tasks, we design a dual prompt learning framework, which consists of two main components (value prompt and slot prompt). Our model can effectively probe DST-related knowledge from pre-trained language models and utilize it for the DST task. Experiments show that our model outperforms existing state-of-the-art methods under different levels of resources. In addition, the framework does not rely on a known ontology of slot types. With extensive experiments, we find that it can generate, with high accuracy, slots that are neither seen in source domains nor pre-defined. In the future, we will focus on improving the performance of extracting value candidates.

A1: Good morning. What can I help you?
U1: I want a cheap hotel.
A2: Okay, what day would you like your booking for?
U2: Please book it for Wednesday for 5 people.

Figure 1: Dialogue state tracking (DST) task. U and A represent the user's and the system's utterances respectively. DST aims to extract dialogue state pairs (slot, value) for each user utterance. Values are usually the explicit needs expressed in the utterances.

3.1.3 Training Process. The final loss function L consists of the loss functions of slot generation L_s and value generation L_v:

Figure 3: The process of value candidate generation. Dashed lines denote the tuning process: the slot generator accepts the generated value v̂ to construct the value prompt, then feeds it into the fine-tuned PLM to generate slots and obtain a reward for tuning the value candidate generator.

Figure 4: Slot accuracy of each slot in the Hotel domain under zero-shot settings. The x-axis is slot accuracy and the y-axis is the slot. Red bars mark unseen slots.

Figure 5: Slot accuracy of each slot in the Restaurant domain.

4.5.1 Value Candidate Generation. We then analyze the results of value generation given the corresponding ratio of training data. Table 5 presents turn-level accuracy, which measures the ratio of turns in which all predicted values exactly match the ground truth values.

Generated dialogue states: (slot = destination, value = London)

Figure 2: Overview of the dual prompt learning framework for few-shot DST. While training, two dual prompts are constructed: the value prompt and the slot prompt. The value prompt is constructed with a value and given to the PLM to generate the corresponding slots. The slot prompt is constructed with slots and used to generate values. While testing, value candidates are first generated by a pre-trained value candidate generator and then used to construct value prompts and generate slots.

3.2.1 Value Candidate Generation. At training time, the labels of values are annotated and used for training; while testing, they are unknown. Existing work [30] extracts adjectives, named entities and others as value candidates. However, many values are implicit or do not belong to these pre-defined types. We consider the values in a turn as a sequence (e.g., "... | 17:00") and the generation of values as a few-shot summarization task. As shown in Figure 3, the input is the dialogue history c concatenated with a prompt " => v : " and the output is the value sequence "v_1 | v_2 | v_3 | ...". The training dataset is the same as that used in the training process of DPL, and the loss function is likewise the negative log-likelihood loss L′_v. Further, the trained model M is utilized to tune the value candidate generator towards generating values that can be used to generate correct slots via self-critical sequence training (SCST) [35].

Table 4: Few-shot cross-domain JGA results of TRADE, DSTQA and T5DST with 1%, 5% and 10% of target-domain data.

Table 5: Turn-level accuracy on the test set of the value generator under different ratios of training data. "w/o tuning" means removing the process of using the output of slot generation to tune value generation.

Dialogue history: ... [user] no, i do not care where it is. i like 3 stars and i absolutely need free wifi.
Gold values: don't care, 3, yes
Generated values: don't care, 3, yes

Table 6: A test instance whose values are generated by the trained value generator with 25% training data. It shows that the value generator can generate implicit values ("yes").

4.5.3 Dual Framework. In our dual framework, if we remove the branch of the slot prompt (value generation), the model can still learn to generate slots based on the value prompt. So we remove the slot prompt to observe its effect on the entire framework. Experimental results are reported in the "DPL w/o slot prompt" row of Table 7.

Table 7: JGA results for our models trained with 1% data given different prompt functions (f_1 to f_4). "w/o slot prompt" means removing the training process of the slot prompt. "En" shows the result of the ensemble of models trained on different prompt functions, with and without the slot prompt.