Abstract
With the rapid development of black-box machine learning algorithms, particularly deep neural networks, the practical demand for reliability assessment is increasing. On the basis of the concept that “Bayesian deep learning knows what it does not know,” the uncertainty of deep neural network outputs has been investigated as a reliability measure for classification and regression tasks. By treating an embedding task as a regression task, several existing studies have quantified the uncertainty of embedded features and improved the retrieval performance of cutting-edge models by model averaging. However, in image-caption embedding-and-retrieval tasks, well-known samples are not always easy to retrieve. This study shows that the existing method performs poorly at reliability assessment and investigates another aspect of image-caption embedding-and-retrieval tasks. We propose posterior uncertainty, obtained by treating the retrieval task as a classification task, which can accurately assess the reliability of retrieval results. Both uncertainty measures behave consistently across different datasets (MS-COCO and Flickr30k), different deep-learning architectures (dropout and batch normalization), and different similarity functions. To the best of our knowledge, this is the first study to perform a reliability assessment on image-caption embedding-and-retrieval tasks.
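The contrast between the two views of uncertainty can be illustrated with a minimal sketch. It is not the paper's formulation: the function names, the cosine similarity, the softmax temperature, and the use of entropy are all assumptions made for illustration. The first function treats embedding as regression and measures how much the embedded feature varies across stochastic forward passes (for example, with dropout left active). The second treats retrieval as classification over the candidate set and measures the entropy of the pass-averaged distribution over candidates.

```python
import numpy as np

def feature_uncertainty(query_samples):
    """Regression view (assumed form): total variance of the embedding
    across T stochastic forward passes. query_samples has shape (T, D)."""
    return float(query_samples.var(axis=0).sum())

def posterior_uncertainty(query_samples, candidates, temperature=1.0):
    """Classification view (assumed form): entropy of the softmax over
    candidate similarities, averaged across T stochastic passes.
    query_samples has shape (T, D); candidates has shape (N, D)."""
    # Cosine similarity between every sampled query and every candidate.
    q = query_samples / np.linalg.norm(query_samples, axis=1, keepdims=True)
    c = candidates / np.linalg.norm(candidates, axis=1, keepdims=True)
    logits = q @ c.T / temperature                # shape (T, N)
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)
    mean_probs = probs.mean(axis=0)               # model averaging over passes
    return float(-(mean_probs * np.log(mean_probs + 1e-12)).sum())

# Toy data (hypothetical): three candidates, four identical stochastic passes.
candidates = np.eye(3)
ambiguous_query = np.tile(np.array([1.0, 1.0, 1.0]), (4, 1))
print(feature_uncertainty(ambiguous_query))                # no spread across passes
print(posterior_uncertainty(ambiguous_query, candidates))  # candidates are tied
```

In this toy case the embedding does not vary across passes, so the feature-level measure reports no uncertainty, yet the query is equally similar to every candidate, so the candidate-level measure is at its maximum. This is one way a sample can look well known and still be hard to retrieve.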
Exploring Uncertainty Measures for Image-caption Embedding-and-retrieval Task