research-article

Joint Augmented and Compressed Dictionaries for Robust Image Classification

Published: 24 February 2023

Abstract

Dictionary-based Classification (DC) is a promising learning approach in multimedia computing. Previous studies have focused on learning a discriminative dictionary, together with the sparsest representation over that dictionary, to cope with the complex conditions found in real-world applications. However, the robustness achieved by learning a single dictionary is far from optimal. Worse, a single dictionary cannot take advantage of techniques proven in modern machine learning, such as data augmentation, to mitigate the problem. In this work, we propose a novel method that uses joint Augmented and Compressed Dictionaries for Robust Dictionary-based Classification (ACD-RDC). For optimization under the noise model introduced by real-world conditions, the objective function of ACD-RDC incorporates only two simple but well-designed constraints: a sparsity constraint enhanced by general data augmentation, which requires less sophisticated case-by-case tuning, and a discriminative constraint solved with a jointly learned dictionary. The optimization of the objective function is then theoretically reduced to an approximate linear problem. The sparsity and discrimination enhanced by data augmentation guarantee robust image classification under various conditions, making this the first positive case of using data augmentation to obtain robust dictionary-based classification. Numerous experiments have been conducted on popular facial and object image datasets. The results demonstrate that ACD-RDC obtains better classification results on diversely collected images than current dictionary-based classification methods. ACD-RDC is also confirmed to be a state-of-the-art classification method when using deep features as inputs.

      • Published in

        ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 19, Issue 3s
        June 2023
        270 pages
        ISSN: 1551-6857
        EISSN: 1551-6865
        DOI: 10.1145/3582887
        Editor: Abdulmotaleb El Saddik

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 24 February 2023
        • Online AM: 1 December 2022
        • Accepted: 20 November 2022
        • Revised: 5 April 2022
        • Received: 6 June 2021