Abstract
Person re-identification (Re-ID) has advanced considerably thanks to the success of convolutional neural networks (CNNs). However, CNN-based Re-ID methods consume substantial computation and memory, which hinders their deployment on resource-limited devices such as next-generation AI chips. As a result, CNN binarization, which leads to binary neural networks (BNNs), has attracted increasing attention. In this article, we propose a new BNN-based framework for efficient person Re-ID (BiRe-ID). We find that the significant performance drop of binarized models on the Re-ID task is caused by the degraded representation capacity of kernels and features. To address these issues, we propose kernel refinement and feature refinement based on generative adversarial learning (KR-GAL and FR-GAL) to enhance the representation capacity of BNNs. We first introduce an adversarial attention mechanism to refine the binarized kernels based on their real-valued counterparts. Specifically, we introduce a scale factor to restore the scale of 1-bit convolution, and we employ a generative adversarial learning method to train this attention-aware scale factor. Furthermore, we introduce a self-supervised generative adversarial network to refine the low-level features using the corresponding high-level semantic information. Extensive experiments demonstrate that BiRe-ID can be effectively implemented on various mainstream backbones for the Re-ID task. In terms of performance, BiRe-ID surpasses existing binarization methods by significant margins, even reaching a level comparable to its real-valued counterparts. For example, on Market-1501, BiRe-ID achieves 64.0% mAP with a ResNet-18 backbone, with a theoretical 12.51× speedup and 11.75× storage saving.
In particular, the KR-GAL and FR-GAL methods generalize well across multiple tasks, including Re-ID, image classification, object detection, and 3D point cloud processing.
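To make the role of a scale factor in 1-bit convolution concrete, the following is a minimal sketch in plain Python. It is not the KR-GAL method: the abstract describes a learned, attention-aware scale factor trained adversarially, whereas this sketch assumes the simpler closed-form choice of the mean absolute weight, purely for illustration. The function names and the example kernel are hypothetical.

```python
# Illustrative only: binarize a kernel and restore its magnitude with one scale factor.
# ASSUMPTION: the scale is the mean absolute weight, a simple closed-form stand-in
# for the learned, attention-aware scale factor described in the abstract.

def sign(x):
    """Map a real weight to +1 or -1 (zero is treated as +1)."""
    return 1.0 if x >= 0 else -1.0

def binarize_kernel(weights):
    """Return (binary, alpha) so that alpha * binary approximates weights."""
    binary = [sign(w) for w in weights]
    alpha = sum(abs(w) for w in weights) / len(weights)
    return binary, alpha

def reconstruction_error(weights, binary, alpha):
    """Squared L2 distance between the real kernel and its scaled binary version."""
    return sum((w - alpha * b) ** 2 for w, b in zip(weights, binary))

kernel = [0.5, -0.25, 0.75, -1.0]  # hypothetical flattened real-valued kernel
binary, alpha = binarize_kernel(kernel)
err_scaled = reconstruction_error(kernel, binary, alpha)
err_unscaled = reconstruction_error(kernel, binary, 1.0)

print(binary, alpha)             # signs of the weights and the restored scale
print(err_scaled, err_unscaled)  # the scaled version is closer to the real kernel
```

In this toy case the scaled binary kernel is closer to the real-valued one than the unscaled signs alone, which is the gap a refined scale factor is meant to narrow.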
BiRe-ID: Binary Neural Network for Efficient Person Re-ID