research-article
Open Access

Multifaceted Analysis of Fine-Tuning in a Deep Model for Visual Recognition

Published: 12 March 2020

Abstract

In recent years, convolutional neural networks (CNNs) have achieved impressive performance across a wide range of visual recognition scenarios. CNNs trained on large labeled datasets not only obtain strong results on the most challenging benchmarks but also yield powerful representations that transfer to many other tasks. However, these models require massive amounts of training data, which is a major drawback in practice, as the data available for a given task are often limited or imbalanced. Fine-tuning is an effective way to transfer knowledge learned on a source dataset to a target task. In this article, we introduce and systematically investigate several factors that influence the performance of fine-tuning for visual recognition. These factors include parameters of the retraining procedure (e.g., the initial learning rate of fine-tuning) and the distribution of the source and target data (e.g., the number of categories in the source dataset and the distance between the source and target datasets), among others. We quantitatively and qualitatively analyze these factors, evaluate their influence, and report many empirical observations. The results reveal insights into how fine-tuning changes CNN parameters and provide useful, evidence-backed intuition about how to implement fine-tuning for computer vision tasks.
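To make the retraining-procedure factors concrete, the sketch below models two knobs commonly tuned when fine-tuning a pretrained CNN: how many early layers to freeze and what initial learning rate to use for the remaining layers. This is an illustrative sketch, not code from the article; the function name, the layer list, and the 0.1x backbone scale are assumptions chosen to mirror common fine-tuning practice (frozen early layers keep generic source features, a reduced rate gently adapts the pretrained backbone, and the newly initialized classifier head trains at the full rate).

```python
def finetune_plan(layer_names, freeze_until, base_lr, backbone_scale=0.1):
    """Assign a per-layer learning rate for fine-tuning.

    Layers with index < `freeze_until` are frozen (lr = 0); the
    remaining pretrained layers train at a reduced rate; the new
    classifier head (assumed to be the last layer) uses the full
    `base_lr`.
    """
    plan = {}
    last = len(layer_names) - 1
    for i, name in enumerate(layer_names):
        if i < freeze_until:
            plan[name] = 0.0                       # frozen: preserve source features
        elif i < last:
            plan[name] = base_lr * backbone_scale  # pretrained backbone: gentle adaptation
        else:
            plan[name] = base_lr                   # new head: full learning rate
    return plan

# Hypothetical 4-layer network: freeze the first two conv layers,
# fine-tune conv3 at a reduced rate, train the new head at base_lr.
layers = ["conv1", "conv2", "conv3", "fc_new"]
print(finetune_plan(layers, freeze_until=2, base_lr=0.01))
```

In a deep-learning framework these per-layer rates would typically be realized as optimizer parameter groups; the article's experiments study how choices like the initial rate interact with properties of the source and target data.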

