Abstract
Advances in image-based dietary assessment methods have allowed nutrition professionals and researchers to improve the accuracy of dietary assessment, where images of food consumed are captured using smartphones or wearable devices. These images are then analyzed using computer vision methods to estimate the energy and nutrient content of the foods. Food image segmentation, which determines the regions in an image where foods are located, plays an important role in this process. Current methods are data dependent and thus cannot generalize well to different food types. To address this problem, we propose a class-agnostic food image segmentation method. Our method uses a pair of eating scene images, one taken before eating starts and one taken after eating is completed. Using information from both the before-eating and after-eating images, we can segment food images by finding the salient missing objects without any prior information about the food class. We model a paradigm of top-down saliency, which guides the attention of the human visual system based on a task, to find the salient missing objects in a pair of images. Our method is validated on food images collected from a dietary study and shows promising results.
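As a rough illustration of the before/after idea only, the sketch below marks pixels whose color changed between two images. It is not the method proposed in the paper, which relies on top-down saliency; plain per-pixel differencing is swapped in here as a simple stand-in. The function name `missing_object_mask`, the `threshold` value, and the toy images are all hypothetical, and the sketch assumes the two images are already aligned and have RGB values in [0, 1].

```python
import math


def missing_object_mask(before, after, threshold=0.2):
    """Flag pixels whose color differs between two aligned images.

    before, after: nested lists (rows of (r, g, b) tuples), values in [0, 1].
    threshold: hypothetical cutoff on the normalized color distance.
    """
    same_rows = len(before) == len(after)
    if not same_rows or any(len(b) != len(a) for b, a in zip(before, after)):
        raise ValueError("images must be aligned and share the same shape")
    scale = math.sqrt(3.0)  # largest possible RGB distance for values in [0, 1]
    return [
        [math.dist(pb, pa) / scale > threshold for pb, pa in zip(row_b, row_a)]
        for row_b, row_a in zip(before, after)
    ]


# Toy scene: a gray "plate" with a 3x3 red "food" patch that is gone afterwards.
plate = (0.5, 0.5, 0.5)
food = (1.0, 0.0, 0.0)
after = [[plate] * 8 for _ in range(8)]
before = [row[:] for row in after]
for r in range(2, 5):
    for c in range(3, 6):
        before[r][c] = food

mask = missing_object_mask(before, after)
changed = sum(cell for row in mask for cell in row)
print(changed)  # 9 pixels, the area of the missing patch
```

No food class label appears anywhere in the sketch, which is the sense in which a before/after comparison is class-agnostic. Real eating scene pairs would also differ in viewpoint, lighting, and leftover items such as utensils, which simple differencing does not handle.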
Saliency-Aware Class-Agnostic Food Image Segmentation