Siamese Network-based Framework for Open-set Domain Generalization

Deep learning has made great progress in many fields, such as computer vision and natural language processing. However, the performance of traditional deep learning models degrades severely under domain shift, i.e., when the distribution of the test data differs significantly from that of the training data. A large number of Domain Generalization (DG) methods have been proposed to enhance the generalizability of models. However, traditional DG methods rest on the assumption that the category spaces of the training and test data are identical, which is often untenable in practice. This paper therefore studies the open-set domain generalization problem, in which the category spaces of the training and test data are inconsistent. We propose an open-set domain generalization framework based on a Siamese network, which generates images of unknown categories through patch-shuffling and treats the generated images as negative samples to negatively supervise the model. The model is thus forced to learn the critical feature representations, its overfitting is reduced, and its performance on open-set domain generalization tasks is enhanced. Experimental results show that the proposed framework achieves state-of-the-art results on two open-set domain generalization benchmarks.


INTRODUCTION
In recent years, machine learning has made tremendous progress in many fields, such as computer vision and natural language processing. In traditional machine learning tasks, we generally design a model for a specific task and train it on labeled training data. Generally speaking, if the distributions of the training and test data are reasonably consistent, a model trained on the training data will perform well on the test data. In practical applications, however, the distributions of the test and training data may differ considerably, causing a large loss in model performance; this is the so-called domain shift problem. Domain generalization (DG) aims to solve this problem by improving the generalizability of the model, that is, by improving the robustness of the model on various unknown test domains.
In traditional domain generalization tasks, the source domain data used for training and the target domain data used for testing share the same category space. However, this assumption can hardly be satisfied in practice. Since the distribution of the target domain data is completely unknown, it may contain categories that do not appear in the source domain training data. Therefore, domain generalization under the open-set setting has attracted increasing attention in recent years. Open-set domain generalization studies the DG problem when the category spaces of the data in different domains are inconsistent. The model is expected not only to classify the known categories correctly, but also to detect the unknown categories in the unseen target domain. Compared with traditional DG tasks, open-set domain generalization is more practical and challenging.
Traditional DG methods suffer from overconfidence on open-set domain generalization tasks: the model misclassifies images of unknown categories as known categories with high confidence, owing to overfitting during training. To address this problem, this paper proposes an open-set domain generalization framework based on a Siamese network. By generating new images that belong to unknown categories during training and using them to negatively supervise the model, the model learns the representations that are critical for classification, thereby reducing overfitting and improving performance on open-set domain generalization tasks.

METHOD
This section introduces the Siamese network-based framework, the negative sample generation method, and the relevant supervision methods. An overview of the framework is shown in Figure 1.

Random patch-shuffle
First, we perform random patch-shuffling on the original input images to generate new images belonging to unknown categories; the generated images are treated as negative samples in the subsequent training procedure. Given an input image $x$, we simply cut the image into a number of patches and randomly swap their positions to form a new image $x'$. The new image generated by patch-shuffling retains the local information of the original image while losing the global information, owing to the change of patch positions. If we use such images to negatively supervise the model, the model is inclined to classify according to the global information of the input image, rather than outputting a high-confidence classification result based on some local information in the image, thus alleviating the overconfidence problem of the model when processing images of unknown categories.
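The patch-shuffling operation can be sketched as follows; the function name, the square grid layout, and the default grid size are illustrative choices, not details specified in the text:

```python
import numpy as np

def patch_shuffle(image, grid=3, rng=None):
    """Cut an HxWxC image into a grid x grid set of patches and randomly
    permute their positions, destroying global layout while keeping
    local texture. Assumes H and W are divisible by `grid`."""
    rng = rng or np.random.default_rng()
    h, w = image.shape[:2]
    ph, pw = h // grid, w // grid
    # Collect the patches in row-major order.
    patches = [image[i * ph:(i + 1) * ph, j * pw:(j + 1) * pw]
               for i in range(grid) for j in range(grid)]
    order = rng.permutation(len(patches))
    out = np.empty_like(image)
    for k, idx in enumerate(order):
        i, j = divmod(k, grid)
        out[i * ph:(i + 1) * ph, j * pw:(j + 1) * pw] = patches[idx]
    return out
```

Since patches are only moved, not altered, the shuffled image contains exactly the same pixels as the original, just in a different spatial arrangement.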

Siamese network-based framework
Through the random patch-shuffle method proposed above, we can generate a large number of negative samples from the input images. How should we utilize these negative samples to supervise the model? A simple and direct approach is to use the negative samples to negatively supervise the model directly. But this approach damages the model's ability to extract local features of images, because the negative samples contain the local information of the original images. To address this issue, we propose a framework based on the Siamese network, which effectively utilizes the generated negative samples to negatively supervise the model while maintaining the model's ability to extract the local information of images. The Siamese network-based framework consists of three components: a Siamese network $F$; an original classification network $G$, which processes the features of original input images; and an open-set classification network $\hat{G}$, which processes both the features of the original input images and the features of the generated negative samples. We first feed the original image $x$ into the Siamese network to extract the original features $f = F(x)$, and feed these features into the original classification network to compute the prediction $p = G(f)$ of the original image. We adopt the classic cross-entropy loss function to compute the loss between the classification result and the ground-truth label $y$:

$$\mathcal{L}_{cls} = -\sum y \log \tilde{p}, \quad \tilde{p} = \mathrm{softmax}(p),$$

and use this loss to supervise the original classification network and the Siamese network. At the same time, we patch-shuffle the input image to generate a negative sample $x'$, which is also fed into the Siamese network to extract the negative feature $f' = F(x')$. Both the original feature and the negative feature are input into the subsequent open-set classification network, yielding the prediction $P = \hat{G}(f)$ for the original image and the prediction $P' = \hat{G}(f')$ for the negative sample. The loss is then computed using the two predictions jointly:

$$\mathcal{L}_{open} = -\sum [y, \mathbf{0}] \log \hat{p}, \quad \hat{p} = \mathrm{softmax}([P, P']),$$

where $[\cdot, \cdot]$ denotes the row-wise splicing operation and $\mathbf{0}$ denotes zero labels. This loss is used to supervise only the open-set classification network. By processing the original images and the negative samples separately in this framework, the feature extractor can effectively extract local information from the images; meanwhile, under the negative supervision of the negative samples, the open-set classification network tends not to output high-confidence classification results based on partial local information of the input image, and therefore has a stronger ability to detect images of unknown categories.
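The forward pass through the three components can be sketched in PyTorch as follows; the backbone, layer sizes, input shape, and class count are illustrative placeholders rather than the actual architecture, and detaching the features before the open-set head is one way to restrict the open-set loss to that head:

```python
import torch
import torch.nn as nn

class SiameseOpenSetModel(nn.Module):
    """Sketch of the framework: a shared Siamese feature extractor F,
    a closed-set head G for original images, and an open-set head G_hat
    for both original and negative features."""
    def __init__(self, in_dim=3 * 32 * 32, feat_dim=128, num_classes=7):
        super().__init__()
        self.F = nn.Sequential(nn.Flatten(), nn.Linear(in_dim, feat_dim), nn.ReLU())
        self.G = nn.Linear(feat_dim, num_classes)      # closed-set classifier
        self.G_hat = nn.Linear(feat_dim, num_classes)  # open-set classifier

    def forward(self, x, x_neg):
        # Both branches share the same weights (the Siamese property).
        f, f_neg = self.F(x), self.F(x_neg)
        p = self.G(f)  # closed-set prediction of the original image
        # The open-set loss should supervise only G_hat, so the
        # features are detached from the extractor's graph here.
        P, P_neg = self.G_hat(f.detach()), self.G_hat(f_neg.detach())
        return p, P, P_neg
```

In a training step, the closed-set cross-entropy on `p` would update `F` and `G`, while the joint loss on `P` and `P_neg` would update only `G_hat`.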

Joint supervision with positive and negative samples
How should the generated negative samples negatively supervise the model? A simple way is to feed the negative samples into the model and maximize the entropy of the output logits over the classes, thereby minimizing the confidence of the output. This approach reduces the logit of the class to which the original image corresponding to the current negative sample belongs, and increases the logits of the other classes. But intuitively, since a negative sample does not belong to any category, it is unreasonable to increase the logit of any category during training; the logits of all categories should be reduced. Another approach is to directly minimize all of the logits, but this can hardly alleviate the overconfidence of the model, because the classification result depends on the differences between the logits of different categories, not on the specific value of each logit. Minimizing the logit of every category reduces all logits together, but does not change the result of the softmax computation, nor does it affect the confidence of the classification result.
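The shift-invariance of softmax that makes "minimize all logits" ineffective can be checked directly: subtracting the same amount from every logit leaves the predicted probabilities, and hence the confidence, unchanged.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())  # numerically stable softmax
    return e / e.sum()

logits = np.array([4.0, 1.0, 0.5])
shifted = logits - 3.0  # uniformly "minimized" logits
# Identical probabilities: the confidence of the prediction is untouched.
assert np.allclose(softmax(logits), softmax(shifted))
```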
To carry out effective negative supervision of the model, this paper proposes a joint supervision module that uses positive and negative samples together. The module concatenates the logits of the original images and the logits of the negative samples side by side to obtain a set of joint logits, and likewise concatenates the category labels of the original images and the labels of the negative samples to obtain a set of joint labels; the labels of the negative samples are all zero. The module then feeds the joint logits and joint labels into a regular cross-entropy loss function to compute the loss and uses it to supervise the model. Such supervision maximizes the output logits corresponding to the categories to which the original images belong and minimizes the logits of the other categories; at the same time, all logits of the negative samples are minimized. Compared with the approach that directly maximizes the entropy of the classification results, the joint supervision module effectively supervises the open-set classification network, ensuring that the negative samples play their due role during training. Without the Siamese network framework, the joint supervision module alone can still significantly improve the performance of the model; but without the joint supervision module, the Siamese network framework hardly takes effect. This is because, although the Siamese network extracts high-quality features, the open-set classification network can hardly be well trained under conventional entropy-based supervision, so the model performs poorly.
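A minimal sketch of the joint supervision described above, under the assumption that "side by side" means splicing along the class dimension: the joint label is then the one-hot of the true class over the first K slots and zeros over the negative-sample half, which is equivalent to ordinary cross-entropy with class index `y` over the 2K joint logits.

```python
import torch
import torch.nn.functional as F

def joint_supervision_loss(P, P_neg, y):
    """P, P_neg: (B, K) logits for originals and negatives;
    y: (B,) integer class labels of the originals."""
    joint = torch.cat([P, P_neg], dim=1)  # spliced joint logits, shape (B, 2K)
    # Cross-entropy over 2K classes: pushes up the true-class logit of P
    # and pushes down every other joint logit, including all of P_neg.
    return F.cross_entropy(joint, y)
```

The gradient of this loss increases the true-class logit of the original image while decreasing all other logits, in particular every logit produced for the negative sample.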

CONCLUSION
This paper proposes a novel Siamese network-based open-set domain generalization framework, which effectively overcomes the overconfidence problem of models in open-set domain generalization tasks by generating and utilizing negative samples. We first generate negative samples through patch-shuffling, which destroys the global information of the original images while retaining the local information. We then propose a Siamese network-based framework with a joint supervision module, which utilizes both the original images and the negative samples to jointly supervise the model, improving its ability to detect unknown samples while maintaining its generalizability. Experiments on two open-set domain generalization datasets demonstrate the effectiveness of the proposed framework.

Figure 1 :
Figure 1: The structure of the Siamese network-based framework.

Table 1 :
Comparison to prior works on PACS and OfficeHome datasets

Table 2 :
Results of the ablation study on the PACS dataset.

We conducted ablation experiments on the PACS dataset to verify the effectiveness of negative sample generation based on patch-shuffling, the Siamese network framework, and the joint supervision with positive and negative samples. The experimental results are shown in Table 2, in which N represents utilizing the generated negative samples to negatively supervise the model, S represents the Siamese network framework, and J represents the joint supervision module. If we do not perform patch-shuffling to generate negative samples to supervise the model, our method degenerates into the ERM method. Therefore, in the ablation experiments of the Siamese network framework and the joint supervision module, we use the generated negative samples by default.