Abstract
In recent years, convolutional neural networks (CNNs) have achieved impressive performance across a variety of visual recognition tasks. CNNs trained on large labeled datasets not only achieve strong results on the most challenging benchmarks but also learn powerful representations that transfer to a wide range of other tasks. However, deep neural networks require massive amounts of training data, a major drawback when the available data are limited or imbalanced. Fine-tuning is an effective way to transfer knowledge learned on a source dataset to a target task. In this article, we introduce and systematically investigate several factors that influence the performance of fine-tuning for visual recognition, including parameters of the retraining procedure (e.g., the initial learning rate) and the distribution of the source and target data (e.g., the number of categories in the source dataset and the distance between the source and target datasets). We analyze these factors quantitatively and qualitatively, evaluate their influence, and report many empirical observations. The results reveal how fine-tuning changes CNN parameters and provide useful, evidence-backed intuition about how to apply fine-tuning in computer vision tasks.
Multifaceted Analysis of Fine-Tuning in a Deep Model for Visual Recognition