research-article
Open Access

Saliency-Aware Class-Agnostic Food Image Segmentation

Published: 15 July 2021

Abstract

Advances in image-based dietary assessment, in which images of consumed food are captured using smartphones or wearable devices, have allowed nutrition professionals and researchers to improve the accuracy of dietary assessment. These images are then analyzed using computer vision methods to estimate the energy and nutrient content of the foods. Food image segmentation, which determines the regions in an image where foods are located, plays an important role in this process. Current methods are data dependent and thus cannot generalize well to different food types. To address this problem, we propose a class-agnostic food image segmentation method. Our method uses a pair of eating scene images, one taken before eating begins and one taken after eating is completed. Using information from both the before and after eating images, we can segment food images by finding the salient missing objects without any prior information about the food class. We model a paradigm of top-down saliency, in which a task guides the attention of the human visual system, to find the salient missing objects in a pair of images. Our method is validated on food images collected from a dietary study and shows promising results.
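
The before/after pairing described in the abstract can be illustrated with a deliberately simple sketch. This is not the method proposed in the article, which models top-down saliency; it is plain pixel differencing, shown only to make the notion of "missing objects" in an image pair concrete. The function name `missing_object_mask`, the `threshold` parameter, and the toy images are all hypothetical, and the sketch assumes the two images are already aligned and have the same size.

```python
import numpy as np

def missing_object_mask(before, after, threshold=0.2):
    """Crude stand-in for 'missing objects': pixels that changed strongly.

    Assumes `before` and `after` are aligned images of identical shape.
    """
    before = np.asarray(before, dtype=np.float64)
    after = np.asarray(after, dtype=np.float64)
    if before.shape != after.shape:
        raise ValueError("before and after images must have the same shape")
    # Per-pixel absolute difference, averaged over colour channels if present.
    diff = np.abs(before - after)
    if diff.ndim == 3:
        diff = diff.mean(axis=2)
    # Normalise to [0, 1] so the threshold does not depend on intensity scale.
    peak = diff.max()
    if peak > 0:
        diff = diff / peak
    return diff > threshold

# Toy example: a 6x6 "table" with a 2x2 "food" patch that is gone afterwards.
before = np.zeros((6, 6, 3))
before[2:4, 2:4] = 1.0
after = np.zeros((6, 6, 3))
mask = missing_object_mask(before, after)
```

Differencing of this kind responds to any change between the two images, including lighting shifts and moved utensils, so it should be read only as a naive baseline for the idea of comparing the two images.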

Published in

ACM Transactions on Computing for Healthcare, Volume 2, Issue 3
Survey Paper
July 2021
226 pages
ISSN: 2691-1957
EISSN: 2637-8051
DOI: 10.1145/3476113

Copyright © 2021 ACM

Publisher

Association for Computing Machinery
New York, NY, United States

Publication History

• Published: 15 July 2021
• Accepted: 1 November 2020
• Revised: 1 April 2020
• Received: 1 July 2019

Qualifiers

• research-article
• Research
• Refereed
