Handling Label Uncertainty for Camera Incremental Person Re-Identification

Incremental learning for person re-identification (ReID) aims to develop models that can be trained on a continuous data stream, which is a more practical setting for real-world applications. However, existing incremental ReID methods make two strong assumptions: that the cameras are fixed, and that newly emerging data is class-disjoint from previous classes. This is unrealistic, as previously observed pedestrians may re-appear and be captured again by new cameras. In this paper, we investigate person ReID in an unexplored scenario named Camera Incremental Person ReID (CIPR), which advances existing lifelong person ReID by taking into account the class overlap issue. Specifically, new data collected from new cameras may contain an unknown proportion of identities seen before. Because previous data cannot be accessed due to privacy concerns, this in turn leads to a lack of cross-camera annotations for the new data. To address these challenges, we propose a novel framework, ExtendOVA. First, to handle the class overlap issue, we introduce an instance-wise seen-class identification module to discover previously seen identities at the instance level. Then, we propose a criterion for selecting confident ID-wise candidates and devise an early learning regularization term to correct noise in the pseudo labels. Furthermore, to compensate for the lack of previous data, we resort to a prototypical memory bank to create surrogate features, along with a cross-camera distillation loss to further retain the inter-camera relationship. Comprehensive experimental results on multiple benchmarks show that ExtendOVA significantly outperforms state-of-the-art methods.


INTRODUCTION
Person Re-IDentification (ReID) aims to match the same identity across non-overlapping camera views. The success of the modern offline supervised person ReID paradigm [30,38,43] is largely attributed to the availability of large-scale cross-camera annotations and the assumption that the surveillance system is fixed. The problem arises when the model needs to acquire new knowledge from newly installed cameras over time, which may require re-collecting data and retraining the model. However, manually establishing cross-camera annotations of all the identities from new and old cameras and then retraining the model is expensive and cumbersome. Moreover, those methods are susceptible to catastrophic forgetting [21] when adapted to real-world dynamic surveillance systems, particularly when data privacy concerns are taken into account.
Recently, there has been increasing attention [13,25,36] on incremental (or lifelong) learning for person ReID (ILReID), which aims to address the practical requirement of continuously learning person ReID models from a stream of incoming data. As new data arrives, old data is not available for re-training due to privacy concerns. However, as shown in Fig. 1(a), existing ILReID methods commonly assume that the classes of new data are entirely different from the old ones. This assumption is not consistent with real-world scenarios, as previously observed pedestrians may re-appear and be captured again by the camera.
Motivated by this gap, in this paper we introduce a new task setting named Camera Incremental Person ReID (CIPR), which naturally meets the demand of incrementally updating the model from newly installed cameras without access to previous data. As shown in Fig. 1(b), unlike previous incremental settings in person ReID [13,25,36] that rely heavily on the class-disjoint assumption, the proposed CIPR allows for partial class overlap between the old and new cameras. In fact, previous methods have overlooked a critical limitation: annotations in person ReID are based solely on numerical IDs that distinguish between individuals, rather than on specific categories (e.g., "cat"). This means that when previous data is no longer available, it becomes difficult to determine whether new data belongs to an existing or a new class, resulting in uncertainty in the cross-camera labels of the new data. CIPR is therefore a more realistic scenario, since annotations can only be performed independently for each camera. Despite the above differences, CIPR still faces the risk of catastrophic forgetting due to the lack of prior data. In general, the challenges of CIPR stem from two main aspects:
1) How to recognize and associate seen classes without any prior data (termed the class-overlap issue). As these seen classes should not be learned as new ones, any accumulated errors can lead to performance degradation over time.
2) How to learn more informative knowledge from new cameras while also retaining previously acquired knowledge.
To handle the above challenges in CIPR, a novel framework, ExtendOVA, is proposed. Specifically, to eliminate the detrimental effect of the class overlap issue, we first incorporate a One-vs-All (OVA) detector [22] that can identify unknown samples in new data. Nevertheless, directly applying the vanilla OVA detector to the CIPR task is problematic for two main reasons. On one hand, the OVA detector only models instance-level recognition, so it cannot inherently identify whether a given class is unseen or not. On the other hand, the OVA detector is trained on the original camera data, leading to a domain shift with respect to the new camera. As a result, potentially seen classes will be misidentified as unseen classes. To achieve ID-wise cross-camera identification, we extend the OVA detector in two ways: 1) we propose a simple yet effective criterion for selecting confidently seen classes; 2) we devise an early learning regularization term to address the domain shift and rectify potentially noisy labels. In addition, to compensate for the lack of previous data (the second challenge of CIPR), we resort to a prototypical memory bank to create surrogate features based on the prototypes and the Batch-Normalization (BN) layer statistics. We also present a cross-camera distillation loss to retain the inter-camera relationship. In conclusion, our contributions can be summarized as follows:
• We introduce a novel yet more practical ReID task, named Camera Incremental Person ReID (CIPR), which is fundamentally different from the existing lifelong person ReID tasks. It demands continuous learning of more generalizable representations from the data of newly installed cameras with only intra-camera supervision.
• We carefully design a novel framework, ExtendOVA, which builds an ID-wise pseudo label generation module against the peculiar class overlap issue under the camera incremental setting.
• For extensive assessment, we build a simple baseline in addition to ExtendOVA to tackle CIPR. Experimental results show that the proposed approach gains significant advantages over the compared methods.

RELATED WORK
Person Re-identification. Offline person ReID settings can be roughly divided into three categories: supervised person ReID, unsupervised person ReID, and intra-camera supervised person ReID. Supervised person ReID methods [20,31,34] are usually superior in performance but less scalable, since they rely on a large amount of cross-camera annotations. In contrast, unsupervised person ReID [3,37,41,42,47] is more challenging and employs either clustering algorithms to generate pseudo labels or extra labeled source data to boost performance. Moreover, intra-camera supervised (ICS) person ReID is another way to reduce annotation labor [5,23,45], where cross-camera association labels are removed from the training data. However, these settings assume that the training data is pre-collected, so that camera relations can be learned by matching cross-camera images; they are not suited to incrementally adding new cameras over time. Our task of incrementally learning person ReID models from newly installed cameras is related to the problem of intra-camera supervision, but with the added challenge that privacy concerns prevent access to cross-camera images for associating positive pairs.

Lifelong Person Re-identification. Recently, there has been significant interest in incremental learning for person ReID, which aims to continuously learn new knowledge without experiencing catastrophic forgetting [26]. Various methods have been proposed to prevent forgetting, which can be broadly divided into two categories: replay-based and data-free methods. Replay-based methods maintain a memory bank of a limited number of samples that are recorded for replay. However, this approach requires additional storage, and maintaining raw data poses a risk to privacy. In contrast, data-free methods do not rely on any old samples. In this paper, we focus on a data-free incremental learning pipeline.
Previous research [4,19,25,36] primarily focuses on incremental scenarios where new identities keep increasing in fixed camera systems. However, contemporary surveillance systems are dynamic, meaning cameras can be installed at any time. In this paper, we consider a more practical scenario for lifelong person ReID, which aims to optimize the model when one or more cameras are introduced into an existing surveillance system. Our approach does not require any strict class-disjoint assumption for model training, and it also considers a scenario where cross-camera labels are unavailable in the training data.
Out-of-Distribution Detection. Out-of-distribution (OOD) detection is a binary classification problem that concerns the ability of a model to distinguish between in-distribution and out-of-distribution samples during inference. There are various approaches to OOD detection. Some model different scoring functions, such as the maximum softmax probability [10,16] or the entropy [2,17], to estimate confidence and identify OOD samples. Others [24,46] utilize generative models to learn the distribution of in-distribution data. A recent approach [22] uses neural One-vs-All (OVA) classifiers to handle out-of-distribution detection. In our work, we incorporate the OVA detector to differentiate between "unseen" and potential "seen" samples. However, it is important to note that the OVA detector cannot perform ID-wise prediction and may not be robust enough to handle data with domain gaps.

PRELIMINARY

Problem Formulation
Consider a CIPR problem with several steps, where each incremental step introduces a new camera with a set of classes to learn. Formally, in the $t$-th step, we have the training data $\mathcal{D}^t = \{(x_i^t, y_i^t)\}$ with intra-camera annotations $y_i^t \in \mathcal{Y}^t$ captured by the newly installed camera $c^t$, and $K^t$ is the number of classes in $\mathcal{Y}^t$. We note that the training data $\mathcal{D}^t$ can contain classes that overlap with those in $\mathcal{D}^{t-1}$, while the old training data $\mathcal{D}^{t-1}$ is not available due to privacy concerns. Hence, we first need to identify how many classes are genuinely new in order to extend the model correctly. The goal of CIPR is to learn a robust ReID model that generalizes to unseen classes from all encountered cameras.

A CIPR Baseline
We first present a straightforward baseline for the CIPR task. In the $t$-th step ($t > 1$), the feature extractor $f(\theta^t)$, initialized by $f(\theta^{t-1})$, is updated to learn a set of classes using $\mathcal{D}^t$, and the classifier $g(\phi^t)$ is also extended to the corresponding new dimension [12] so that it can predict all the classes seen so far. As in common incremental learning baselines, in addition to the ReID loss [9] (e.g., ID loss $\mathcal{L}_{\mathrm{ID}}$ + triplet loss $\mathcal{L}_{\mathrm{Triplet}}$ [11]), a knowledge distillation (KD) loss $\mathcal{L}_{\mathrm{KD}}$ is employed to prevent catastrophic forgetting, which can be formulated as:

$$\mathcal{L}_{\mathrm{KD}} = \sum_{i} \mathrm{KL}\left(z_i^{t-1} \,\|\, z_i^{t}\right), \qquad (1)$$

where $\mathrm{KL}(\cdot)$ is the Kullback-Leibler (KL) divergence, and $z_i^{t-1}$ and $z_i^{t}$ denote the logit outputs of the old and new models, respectively.
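The distillation term of the baseline can be illustrated with a minimal Python sketch. It assumes that the logits are softmax-normalized before the KL divergence is taken and that only the outputs for the old classes are compared, since the new classifier has extra outputs; both choices, as well as the names `softmax` and `kd_loss`, are illustrative assumptions rather than details given in the text.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kd_loss(old_logits, new_logits, eps=1e-12):
    """KL(p_old || p_new) for one sample.

    Assumption: the new classifier was extended with extra outputs,
    so only its first len(old_logits) entries are compared.
    """
    k = len(old_logits)
    p_old = softmax(old_logits)
    p_new = softmax(new_logits[:k])
    return sum(p * math.log(p / max(q, eps))
               for p, q in zip(p_old, p_new) if p > 0.0)
```

The loss is zero when the new model reproduces the old model's distribution over old classes and grows as the two distributions diverge.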
To discriminate between seen and unseen identities without accessing the old data, a straightforward method is to leverage the softmax prediction score. We assume that samples belonging to unseen classes produce smooth probability distributions, since all old classes are equally wrong and ambiguous for them. Therefore, we treat an image as belonging to a seen class if its maximum softmax score is above a threshold $\tau$. For samples identified as a new class, we add a new ID on top of the existing old classes. For samples classified into old classes, we use the model prediction as the pseudo label. Then we minimize the cross-entropy with these global pseudo labels. The resulting loss (Eq. 2) is $\mathcal{L}_{ce}(x_i, y'_i)$ averaged over the samples, where $y'_i$ is the pseudo label of sample $x_i$ and $\mathcal{L}_{ce}$ is the cross-entropy loss function.
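The thresholding rule of the baseline can be sketched as follows. The sketch assumes that rejected samples sharing the same intra-camera label receive the same new global ID; the text only says that a new ID is added, so this grouping and the function name `baseline_pseudo_labels` are assumptions made for illustration.

```python
def baseline_pseudo_labels(probs, local_labels, tau, num_old):
    """Assign global pseudo labels from softmax scores.

    probs        : list of softmax vectors over the old classes
    local_labels : intra-camera labels of the same samples
    tau          : confidence threshold (hypothetical value chosen by the user)
    num_old      : number of classes known before this step
    """
    new_ids = {}  # intra-camera label -> newly created global ID
    pseudo = []
    for p, y in zip(probs, local_labels):
        best = max(range(len(p)), key=p.__getitem__)
        if p[best] > tau:
            # Confident prediction: treated as a previously seen identity.
            pseudo.append(best)
        else:
            # Ambiguous prediction: treated as an unseen identity.
            if y not in new_ids:
                new_ids[y] = num_old + len(new_ids)
            pseudo.append(new_ids[y])
    return pseudo
```

For instance, with two old classes and `tau = 0.6`, a sample scored `[0.9, 0.1]` keeps old class 0, while samples scored `[0.5, 0.5]` and `[0.45, 0.55]` that share one intra-camera label are both mapped to the new ID 2.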
Overall, the optimization objective of the baseline CIPR model (Eq. 3) combines the ReID loss, the knowledge distillation loss, and the cross-entropy loss on the global pseudo labels.

METHODOLOGY
The filtering mechanism in our baseline method is one way to address the class-overlap issue. However, the manually set threshold $\tau$ is not robust enough to identify old classes, and the classifier is biased toward classes with many samples over classes with few samples [10]. Therefore, in this section, we introduce a new framework for CIPR.

Overview of Framework
The graphical illustration of our framework is depicted in Fig. 2. We replace the linear classifier with a non-parametric memory bank, which stores the moving average of the cluster prototypes, to alleviate the over-confidence issue [1]. The old model is fixed, and the new model is updated via an Exponential Moving Average (EMA) scheme during the optimization. We then elaborate our ExtendOVA in three parts to tackle the CIPR problem. The first technical novelty comes from taking advantage of the One-vs-All (OVA) detector for instance-wise seen class identification before training (section 4.2). Then the samples are assigned global pseudo labels via our criterion and an early learning regularization term $\mathcal{L}_{\mathrm{Aux}}$, as detailed in section 4.3. Finally, in section 4.4, surrogate features are sampled based on the memory bank to guide the cross-camera distillation objective.

Instance-wise Seen Class Identification
In this section, we elaborate on the process of instance-wise seen class identification. We first describe the training of the One-vs-All detector before describing the remaining steps.

One-vs-All Detector. The One-vs-All (OVA) detector [22,29] was first proposed for out-of-distribution detection; it extends a binary classifier to a multi-class classifier to learn a boundary between inliers and outliers. Specifically, the OVA detector consists of multiple binary sub-classifiers, each of which is trained to distinguish its class from all other classes, i.e., samples belonging to this class are positive while the others are negative. To learn a boundary for identifying unknown identities more effectively, we only pick hard negative samples to compute the loss. Formally, we denote $p(\hat{y}^k \mid x)$ as the positive softmax output for the class $k$. The optimization objective for a sample $x_i$ with label $y_i$ can be formulated as:

$$\mathcal{L}_{\mathrm{OVA}}(x_i, y_i) = -\log p(\hat{y}^{y_i} \mid x_i) - \min_{k \neq y_i} \log\left(1 - p(\hat{y}^{k} \mid x_i)\right), \qquad (4)$$

where the second term keeps only the hardest negative.

Seen Class Identification. When new data arrives, we first retrieve the nearest prototype and take the output of the corresponding sub-classifier of the OVA detector to determine whether the sample belongs to a seen or an unseen class, as illustrated in Fig. 2. Essentially, each sub-classifier of the OVA detector corresponds to the latent space of its class, and if a sample falls outside the boundaries of that space, it is recognized as an outlier. The advantage of the OVA detector is that it can learn an adaptive threshold between seen and unseen classes.
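The two steps above can be sketched in a few lines of Python. The sketch assumes that the loss follows the hard-negative formulation of the OVA detector in [22], that each sub-classifier exposes a positive probability, and that a sample is accepted as seen when that probability exceeds 0.5 (the natural decision point of a two-way softmax); the helper names and the 0.5 value are assumptions, not details stated in the text.

```python
import math

def ova_loss(pos_probs, label, eps=1e-12):
    """Hard-negative One-vs-All loss for one sample.

    pos_probs[k] is the positive output of the k-th binary sub-classifier.
    """
    pos_term = -math.log(pos_probs[label] + eps)
    # Keep only the non-target sub-classifier with the largest negative loss,
    # i.e. the one that is most wrongly confident about this sample.
    neg_term = max(-math.log(1.0 - p + eps)
                   for k, p in enumerate(pos_probs) if k != label)
    return pos_term + neg_term

def cosine(a, b):
    """Cosine similarity between two feature vectors (0.0 for a zero vector)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0.0 and nb > 0.0 else 0.0

def is_seen(feature, prototypes, pos_probs, threshold=0.5):
    """Return (index of nearest prototype, whether the sample is judged seen)."""
    nearest = max(range(len(prototypes)),
                  key=lambda k: cosine(feature, prototypes[k]))
    return nearest, pos_probs[nearest] > threshold
```

Only the sub-classifier of the nearest prototype is consulted at identification time, so the decision boundary is specific to each class rather than a single global threshold.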

ID-wise Pseudo Label Generation
Although the OVA detector is effective and more robust for instance-level prediction, it may still introduce noisy labels, especially for hard samples. As a result, two images of the same class may be contradictorily predicted as a new class and an old one, which can affect the class-level prediction. Furthermore, we argue that due to the domain gap, the latent space trained for each class cannot effectively represent the data from future cameras, as illustrated in Fig. 2. To this end, we propose an ID-wise pseudo label generation module (IPLG) to correct noisy labels and associate the samples with the same local label to the same global pseudo label. We detail the operation of this module below.

Pseudo Label Initialize. We propose a simple criterion to select confidently seen classes. Given a batch of samples $\{(x_i^t, y_i^t)\}_{i=1}^{B}$ that follows PK sampling, we first analyze the OVA detector outputs of the samples that share the same label $y^t$. An identity $y^t$ is a seen class if and only if all samples with label $y^t$ are predicted to belong to a seen class. Denoting the set of seen classes as $\mathcal{S}^t$, for $y^t \in \mathcal{S}^t$ we use the nearest class predicted by the prototype classifier as the pseudo label. The remaining classes (denoted by $\mathcal{U}^t$), which are excluded by our criterion, are re-labeled as new classes. Formally, the pseudo labels are assigned as in Eq. 5: a sample of a class in $\mathcal{S}^t$ receives the index $k$ of its nearest prototype $m_k$, and each class in $\mathcal{U}^t$ receives a new index, where $m_k \in \mathbb{R}^{d}$ stands for the $k$-th column (an ID-wise prototype) of the memory bank $M \in \mathbb{R}^{d \times K^{t-1}}$, $d$ is the feature dimension, and $K^{t-1}$ is the number of prototypes stored before step $t$. Note that we only choose the most frequently predicted one if there are multiple $k$ for one class. Correspondingly, we expand the memory bank with the prototypes of the newly created classes.

To rectify the noisy pseudo labels caused by the domain shift, an auxiliary loss is designed to regularize the early-stage learning. Concretely, leveraging the initialized pseudo labels, the network can be trained via a softmax loss to identify both seen and unseen classes in the new data, ensuring that features are closest to their corresponding prototypes (Eq. 6). The key is to keep the old prototypes fixed throughout the process, so that they serve as a template for learning domain-invariant features.
Additionally, the hope is that samples assigned to unseen classes also output their second largest probability for the potential nearest old prototype, which is encouraged by the auxiliary loss $\mathcal{L}_{\mathrm{Aux}}$ (Eq. 7). Here, $\tilde{y} \in [1, K^{t-1}]$ is the index of the nearest old prototype, obtained in the same way as in Eq. 5, with $K^{t-1}$ the number of old prototypes. The motivation behind this approach lies in the consensus [7,28] that seen classes are usually clustered to form high-density regions in the latent space. Hence, this regularizer encourages these samples to be close to a shared and real prototype. On the other hand, the unseen classes are often distributed in low-density regions, leading to optimization conflicts where the samples struggle to simultaneously approach the current prototype and the nearest old prototype.
After the early-stage regularization, we apply the selection criterion again to obtain refined pseudo labels, which are then used for the remaining model updates. Meanwhile, new prototypes that were created for classes falsely identified as unseen are removed from the memory bank.
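The selection criterion, which is used both for initialization and for refinement, can be sketched as follows. The sketch takes the per-sample seen/unseen decisions and nearest-prototype indices as given inputs; the function name `id_wise_pseudo_labels` and the convention that new classes are numbered consecutively after the old ones are assumptions made for illustration.

```python
from collections import Counter, defaultdict

def id_wise_pseudo_labels(local_labels, seen_flags, nearest_protos, num_old):
    """Map intra-camera identities to global pseudo labels.

    local_labels   : intra-camera label of each sample
    seen_flags     : per-sample decision of the seen/unseen detector
    nearest_protos : per-sample index of the nearest old prototype
    num_old        : number of prototypes stored before this step
    Returns (per-sample global labels, total number of classes afterwards).
    """
    groups = defaultdict(list)
    for i, y in enumerate(local_labels):
        groups[y].append(i)

    mapping = {}
    next_new = num_old
    for y, idxs in groups.items():
        if all(seen_flags[i] for i in idxs):
            # Seen only if every sample of the identity is judged seen;
            # ties between prototypes are resolved by majority vote.
            votes = Counter(nearest_protos[i] for i in idxs)
            mapping[y] = votes.most_common(1)[0][0]
        else:
            # Otherwise the whole identity is re-labeled as a new class.
            mapping[y] = next_new
            next_new += 1
    return [mapping[y] for y in local_labels], next_new
```

Requiring unanimity trades recall for precision: a single rejected image is enough to treat an identity as new, which is why the criterion is applied a second time after the early-stage regularization.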

Cross-camera Distillation
In incremental learning, models are susceptible to forgetting previously learned knowledge without relearning the old data. To address this, previous work [19] has proposed to use a GAN [6] to reconstruct old data in the image space. In our method, we generate substitute samples in the feature space instead. Concretely, to estimate the distribution of the old data, we assume a class-conditioned multivariate Gaussian distribution denoted as $\mathcal{N}(\mu_k^{t-1}, \Sigma_k^{t-1})$, where $\mu_k^{t-1}$ is the mean of the Gaussian distribution and can be approximated using our prototypical memory bank. To estimate the covariance matrices, we utilize the statistics of BatchNorm (BN) layers. During training, a BN layer normalizes the features, which implicitly captures the means and variances of the data [40], thereby enabling the estimation of the covariance matrices of the old data. Overall, we estimate the distribution of the data in the previous step by $\tilde{\mathcal{N}}(m_k, \mathrm{BN}(\sigma))$ (Eq. 8), where $m_k$ is the prototype of class $k$ and $\mathrm{BN}(\sigma)$ denotes the variance statistics of the BN layer. Then we can sample surrogate features $\tilde{f}_k \sim \tilde{\mathcal{N}}(m_k, \mathrm{BN}(\sigma))$. Based on these surrogate features, we present a cross-camera distillation loss that regularizes forgetting by ensuring that the cosine distance is maintained across different cameras. Formally, given a batch of samples $X = \{(x_i, \hat{y}_i)\}_{i=1}^{B}$ along with a batch of sampled surrogate features $F = \{\tilde{f}_1, \tilde{f}_2, \ldots, \tilde{f}_n\}$, the loss $\mathcal{L}_{\mathrm{CD}}$ is computed from the cosine similarities $\cos(\cdot,\cdot)$ between the sample features and the surrogate features (Eq. 9). The distillation loss $\mathcal{L}_{\mathrm{CD}}$ improves stability to an extent commensurate with the ability of previous data to maintain past structure.
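A minimal sketch of surrogate sampling and of a distillation term of this kind is given below. It rests on two assumptions that the text does not spell out: the covariance is taken to be diagonal, with the BN running variance as its diagonal, and the loss is taken to be the mean squared difference between the cosine similarities produced by the old and the new model. The function names are likewise illustrative.

```python
import math
import random

def cosine(a, b):
    """Cosine similarity between two feature vectors (0.0 for a zero vector)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0.0 and nb > 0.0 else 0.0

def sample_surrogate(prototype, bn_var, rng):
    """Draw one surrogate feature around a class prototype.

    Assumption: diagonal covariance given by the BN running variance.
    """
    return [rng.gauss(m, math.sqrt(v)) for m, v in zip(prototype, bn_var)]

def cross_camera_distillation(old_feats, new_feats, surrogates):
    """Penalize changes in sample-to-surrogate cosine similarity.

    old_feats / new_feats: features of the same batch from the old / new model.
    Assumption: squared difference, averaged over all pairs.
    """
    total = 0.0
    for f_old, f_new in zip(old_feats, new_feats):
        for s in surrogates:
            total += (cosine(f_old, s) - cosine(f_new, s)) ** 2
    return total / (len(old_feats) * len(surrogates))
```

Because the surrogates stand in for features of old cameras, keeping these similarities stable constrains the relationship between new-camera samples and old-camera identities without storing any old image.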

Optimization Summary
In summary, the overall objective function of our ExtendOVA framework (Eq. 10) is a weighted combination of the losses introduced above, where $\lambda_1$ and $\lambda_2$ are the weighting coefficients. To enhance the model's stability during optimization, we utilize the Exponential Moving Average (EMA) technique [33], wherein the student model's parameters are initially shared with the teacher model $f(\theta^t)$. After each iteration, the student model is updated using the EMA parameters computed from the teacher model's parameters by

$$\theta_{s} \leftarrow \alpha\,\theta_{s} + (1 - \alpha)\,\theta^{t}, \qquad (11)$$

where $\theta_s$ denotes the student parameters and $\alpha$ is a smoothing factor typically set to 0.99 [39]. In the test phase, we use the student model to extract feature representations. By incorporating EMA into the training process, the updates to the student model's parameters are smoothed, leading to improved stability and better generalization performance.
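The EMA update can be sketched as below. Following the description, the sketch reads the "student" as the smoothed model used at test time and the "teacher" as the model updated by gradient descent; this reading, the flat list representation of parameters, and the function name `ema_update` are assumptions.

```python
def ema_update(ema_params, online_params, alpha=0.99):
    """Return smoothed parameters after one EMA step.

    ema_params    : parameters of the smoothed model (used for testing)
    online_params : parameters of the model being optimized
    alpha         : smoothing factor; 0.99 is the value quoted in the text
    """
    return [alpha * e + (1.0 - alpha) * p
            for e, p in zip(ema_params, online_params)]

# Usage: start from a shared initialization, then smooth after every iteration.
online = [0.0, 1.0]
smoothed = list(online)
for step in range(3):
    online = [w + 0.1 for w in online]  # stand-in for a gradient step
    smoothed = ema_update(smoothed, online)
```

With `alpha` close to 1, the smoothed parameters move slowly, which damps the effect of noisy pseudo labels in any single iteration.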

EXPERIMENTS

Datasets and Evaluation Metrics
Datasets. To evaluate and compare different methods under the Camera Incremental Person Re-Identification (CIPR) setting, three large-scale person Re-Identification (ReID) datasets are used: Market-1501 [43], MSMT17 [35] and DukeMTMC [44] (only for academic use, without displaying images of persons). We form the intra-camera annotations based on the provided labels. To obtain a realistic scenario for incremental learning in person re-identification, we simulate the deployment of a surveillance system that starts with multiple cameras and gradually adds new cameras over time. For example, we select 4 cameras from Market-1501 for initial training and incrementally add 1 more camera in each subsequent step. Similarly, we create a five-step incremental training setup for MSMT17 and DukeMTMC. It is worth noting that we do not employ all classes of the initial cameras in the first step, but instead perform sampling to generate multiple setups that suit various conditions. The statistics of the datasets are shown in Fig. 3; section 5.2 goes into detail about the setups.

Testing Protocols. Two commonly used metrics, mean Average Precision (mAP) and Rank-1 (R-1) accuracy, are used to evaluate the performance of CIPR. To measure the model's ability to adapt and learn new knowledge, we evaluate its performance on unseen classes of all encountered cameras during the incremental learning process. Additionally, to assess the model's anti-forgetting ability, we evaluate its performance on unseen classes of the initial cameras as a measure of retention of previously learned knowledge.
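For reference, the two metrics can be computed from ranked retrieval results as in the sketch below. Each query is represented by a list of booleans marking whether the gallery image at each rank shares the query identity. The sketch omits the usual ReID step of filtering out gallery images of the same identity captured by the same camera, and the function names are illustrative assumptions.

```python
def average_precision(ranked_matches):
    """Average precision of one query from its ranked match flags."""
    hits = 0
    total = 0.0
    for rank, is_match in enumerate(ranked_matches, start=1):
        if is_match:
            hits += 1
            total += hits / rank  # precision at this recall point
    return total / hits if hits else 0.0

def evaluate(all_ranked_matches):
    """Return (mAP, Rank-1 accuracy) over a list of queries."""
    n = len(all_ranked_matches)
    mean_ap = sum(average_precision(r) for r in all_ranked_matches) / n
    rank1 = sum(1 for r in all_ranked_matches if r and r[0]) / n
    return mean_ap, rank1
```

Rank-1 only checks the top result, whereas mAP rewards placing every true match early in the ranking, which is why the two are reported together.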

Implementation Details
We use the widely adopted ResNet-50 [8] as the backbone network. To obtain 2048-dimensional features, a Batch Normalization (BN) layer [14] is placed after the last layer of the network. The batch size is set to 64, comprising 16 identities with 4 images per identity. The Adam optimizer with a learning rate of $3.5 \times 10^{-4}$ is used for optimization at the initial step, and the learning rate of the backbone is reduced to one tenth of this value during incremental learning. The model is trained for 40 epochs per step, and the early-stage learning regularization is performed during the first ten epochs. The hyper-parameters $\tau$, $\lambda_1$, and $\lambda_2$ are set to 0.5, 0.9 and 0.6 (see the hyper-parameter analysis in the supplemental material).

General setup assumes that in most cases, there are more unseen classes initially emerging in a new camera than seen ones, and that the number of unseen classes will increase linearly over time. As shown in Fig. 3(a), the identity distribution of the new data is controlled by sampling the data from the initial cameras. For example, we sampled 300, 500, and 300 identities in the first step for Market-1501, MSMT17 and DukeMTMC, respectively, yielding 110/131, 54/83, and 74/144 seen/unseen classes in the second step.

Exceptional setup is further considered for extreme scenarios where the majority of the classes captured by new cameras are old ones. This can be achieved by increasing the number of classes sampled in the initial step, thereby increasing the proportion of old classes in the new cameras.

Table 1: Comparison of the final-step incremental results with the state-of-the-art methods in different setups. Joint-T refers to the upper-bound result. Red and blue: the best and second-best results.
For comparative experiments, we reproduce state-of-the-art methods on our setting, i.e., the data-free methods LwF [15], AKA [25], AGD [19], and PatchKD [32], the replay-based methods iCaRL [27] and PTKP [4], and a distribution alignment method, MMD [18]. It is noteworthy that these methods are based on a class-disjoint setting and do not match our setting. Therefore, when implemented in our setting, they can only treat old classes as new ones. For a more extensive assessment, we design some other comparative methods, including the baseline described in section 3.2, a fine-tune method that fine-tunes the model on new data, and Joint-T, which denotes an upper bound obtained by training the model on all data seen so far.

Comparative Results with Different Settings
We compare our ExtendOVA with the current state-of-the-art. The evaluation is conducted on all cameras encountered so far, and the final-step results are reported in Table 1 for both the general and exceptional setups. We summarize the results as follows:
• Our proposed ExtendOVA outperforms the current state-of-the-art methods by a clear margin and is the closest to the upper bound Joint-T. Notably, it even achieves comparable or better results than the replay-based methods, validating the effectiveness of our proposed solutions.
• Previous methods, designed for the non-overlapping setting, achieve poor performance. We attribute this to two aspects. Firstly, in the absence of cross-camera labels, these methods fail to learn cross-view representations. Secondly, turning old classes into new ones results in false-positive predictions of spurious classes, leading to accumulated errors.
• Interestingly, our proposed baseline outperforms the data-free methods LwF, AKA and AGD, demonstrating the potential improvement from addressing the class-overlap issue.

Ablation Study
A closer look at early stage regularization.In Fig. 4, we plot the per-sample probability distribution of confidence scores generated by different methods.As can be seen, the baseline method uses the maximum probability as the confidence score, resulting in significant confusion between seen and unseen classes.Higher threshold values will reject a large number of seen classes.While the output of the OVA is more discriminative, there is still significant noise introduced due to domain shift.Our method improves upon the OVA detector by incorporating early regularization learning, which significantly mitigates the noise caused by domain shift.
To further observe the impact of early regularization learning on the ID-wise predictive performance, Fig. 5 shows the training curves of ID loss and model accuracy of seen classes during the training process.We can observe that during the training process, both models show a decreasing trend in loss and eventually converge.In the initial iterations of training, the accuracy of both models shows an increasing trend, indicating that the models have not yet started fitting to the noise.After a certain number of iterations, the accuracy of the model without early regularization starts to gradually decrease, while our method corrects the noise in the early stage, resulting in an increase in accuracy.
Effectiveness of the different components. We conduct ablation studies in the three-step exceptional setup to evaluate the effectiveness of each module in each step. To evaluate the effectiveness of our ID-wise pseudo label generation module, we conduct experiments where we disable the $\mathcal{L}_{\mathrm{Aux}}$ component and also compare the results to a baseline method, i.e., LwF, which is trained using the ReID loss supervised by intra-camera labels. First, as shown in Table 2, removing $\mathcal{L}_{\mathrm{Aux}}$ decreases the final performance by 0.9% to 3.7% in mAP. Second, the combination of $\mathcal{L}_{\mathrm{Aux}}$ and $\mathcal{L}_{\mathrm{ID}^*}$ brings gains in the range of 3.3% to 8.2% in mAP compared with the baseline. This suggests that simply optimizing the cross-entropy loss with intra-camera supervision is not sufficient.
To evaluate the contribution of the EMA scheme, we conduct experiments by removing it.Without the EMA technique, the performance drop ranges between 2.1% and 4.3% in the second step in terms of mAP, and the degradation becomes more significant as the incremental training phase proceeds.This clearly indicates that incorporating this design is crucial for overall performance.While the effect of L CD is not as pronounced as the EMA scheme, it still has an obvious impact on the performance.When both terms are eliminated, there is a significant decline in performance, suggesting that the L CD plays a role in maintaining the overall performance.
Comparison with different distillation losses. Table 3 compares the performance of our method with different distillation losses in the general setup. Specifically, we evaluate the impact of the knowledge distillation (KD) loss and the cross-camera distillation (CD) loss on the performance of the baseline model trained with the ID loss ($\mathcal{L}_{\mathrm{ID}^*}$) and the auxiliary loss ($\mathcal{L}_{\mathrm{Aux}}$). From the table, we observe that incorporating a distillation loss, either $\mathcal{L}_{\mathrm{KD}}$ or $\mathcal{L}_{\mathrm{CD}}$, improves the performance of the baseline model in terms of mAP and Rank-1 accuracy. Notably, adding $\mathcal{L}_{\mathrm{CD}}$ achieves higher performance than adding $\mathcal{L}_{\mathrm{KD}}$, indicating that $\mathcal{L}_{\mathrm{CD}}$ is more effective in preserving structure-wise knowledge.

Anti-Forgetting Evaluation
We evaluate the anti-forgetting properties of our proposed method by measuring the performance on the test set of the first step after each step. Fig. 6 plots the forgetting trend on DukeMTMC and MSMT17 in the general setup. Our method shows superior anti-forgetting properties, with no performance degradation and even a slight improvement on the previous tasks. Our baseline model exhibits less forgetting than the data-free methods, clearly indicating that class overlap is an issue that must be addressed. AGD employs DeepInversion [40] to generate synthetic exemplars from previously learned classes; however, it still suffers from catastrophic forgetting. This is due to its reliance on a unified classifier: treating overlapping classes as new ones can result in the generation of a significant amount of noise.

Further discussion
Seen classes identification. To further study the potential of identifying seen classes, we compare our method with the baseline and OVA. We use precision (prec) and recall as metrics. Precision is calculated as the percentage of truly seen classes among the selected classes, and recall is calculated as the percentage of selected seen classes among all seen classes in the new data. The results in Table 4 show that our proposed method outperforms both the baseline and OVA methods in identifying seen classes, with comparable precision but higher recall on all datasets. This indicates that our method can effectively identify the seen classes in the new data, even when the data exhibits a domain shift.
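The two metrics defined above can be written down directly. The sketch below operates on sets of class identifiers; the function name `seen_precision_recall` and the convention of returning 0.0 for an empty denominator are assumptions.

```python
def seen_precision_recall(selected, truly_seen):
    """Precision and recall of seen-class identification.

    selected   : classes the method declared as seen
    truly_seen : classes in the new data that really appeared before
    """
    selected, truly_seen = set(selected), set(truly_seen)
    correct = len(selected & truly_seen)
    precision = correct / len(selected) if selected else 0.0
    recall = correct / len(truly_seen) if truly_seen else 0.0
    return precision, recall
```

A strict selection rule tends to raise precision at the cost of recall, so the two values need to be read together when comparing detectors.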
Extension to multiple introduced cameras. To validate that our method can be applied when several cameras are added at once, we conducted further evaluations by including 2 cameras in each incremental step. Table 5 presents the final-step performance in this setting. Notably, our method continues to outperform other state-of-the-art methods even when additional cameras are introduced. This finding suggests that our method is capable of addressing a more general CIPR problem.

CONCLUSION
In this paper, we introduce a new yet very practical task, i.e., Camera Incremental person ReID (CIPR). We particularly emphasize the class-overlap issue brought by CIPR, where the new camera might contain identities seen before and the ideal global cross-camera annotations are absent. To approach this task, we design a novel framework called ExtendOVA. In ExtendOVA, we address the class overlap issue by exploiting a One-vs-All detector combined with an early-stage regularization term to achieve global pseudo-label assignment. Extensive experiments verify the effectiveness and superiority of ExtendOVA.

Figure 1: The comparison between the camera incremental setting and the previous setting in incremental person ReID. (a) The existing setting assumes that identities in new data are completely disjoint from previous data. (b) Our setting relaxes the strict class-disjoint assumption. Under our camera incremental setting, the new data will only have intra-camera annotations, and may also contain previously seen people.

Figure 2: The proposed framework consists of three parts. The first part, called Instance-wise Seen Class Identification, is used for detecting the seen and unseen samples using a One-vs-All detector before training. The second part generates ID-wise pseudo labels and further corrects noisy labels by L Aux at the early training stage. The third part is cross-camera distillation, which leverages sampled surrogate features to regularize forgetting by forcing the relationship between cameras to be maintained.

Figure 3: The distribution of identity number throughout training with different setups.

Figure 4: The confidence score distribution of seen and unseen samples produced by the baseline (left), the OVA detector (middle) and ExtendOVA (right) on Market-1501 in the general setup.

Figure 5: Curves of CE loss and model performance on seen classes. ELR: Early stage learning regularization.

Figure 6: Anti-forgetting evaluation on MSMT17 and DukeMTMC in the general setup. mAP and Rank-1 scores on the test set of the original cameras (the test set of step 1) during the training process.

Table 2: Ablation study of the contribution of ExtendOVA components during every incremental step in the exceptional setup.

Table 3: Comparison with different distillation losses in the general setup. Base. is trained with L ID * + L Aux.

Table 4: Performance (%) of seen classes identification by our proposed method. We report the scores obtained in the second step.

Table 5: Results on the multiple-camera introduced setup.