research-article
Open Access

Understanding Human-side Impact of Sampling Image Batches in Subjective Attribute Labeling

Published: 18 October 2021

Abstract

Capturing human annotators' subjective responses in image annotation has become crucial as vision-based classifiers expand into more application areas. Although image annotation interface design has progressed significantly in general, relatively little research has examined how to elicit reliable and cost-efficient human annotation when the task involves a degree of subjectivity. To bridge this gap, we investigate how different sampling methods in image batch labeling, a design that lets human annotators label a batch of images simultaneously, affect human annotation performance. In particular, we developed three strategies for forming image batches: (1) uncertainty-based labeling (UL), which prioritizes images that a classifier predicts with the highest uncertainty; (2) certainty-based labeling (CL), the reverse of UL; and (3) random, a baseline that selects images at random. Although UL and CL select images to be labeled solely from the classifier's point of view, we hypothesized that human-side perception and labeling performance may also vary with the sampling strategy. In our study, participants perceived different levels of cognitive load across the three conditions (CL the easiest and UL the most difficult). We also observed a trade-off between annotation task effectiveness (CL and UL were more reliable than random) and task efficiency (UL was the most efficient and CL the least efficient). Based on these results, we discuss design implications and possible future research directions for image batch labeling.
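
The three batch-forming strategies can be illustrated with a minimal sketch. The abstract does not state how classifier uncertainty is measured, so this example assumes Shannon entropy over the predicted class probabilities; the function names, the `seed` parameter, and the sample probabilities are hypothetical and are not taken from the paper.

```python
import math
import random


def predictive_entropy(probs):
    """Shannon entropy of one image's class probabilities (higher = less certain).

    Assumption: entropy stands in for the classifier's uncertainty.
    """
    return -sum(p * math.log(p) for p in probs if p > 0)


def sample_batch(all_probs, batch_size, strategy, seed=None):
    """Return the indices of the images chosen for one labeling batch."""
    indices = list(range(len(all_probs)))
    if strategy == "random":
        # Baseline: pick images uniformly at random, ignoring the classifier.
        return random.Random(seed).sample(indices, batch_size)
    scores = [predictive_entropy(p) for p in all_probs]
    if strategy == "UL":
        # Uncertainty-based labeling: highest entropy first.
        return sorted(indices, key=lambda i: -scores[i])[:batch_size]
    if strategy == "CL":
        # Certainty-based labeling: lowest entropy first.
        return sorted(indices, key=lambda i: scores[i])[:batch_size]
    raise ValueError(f"unknown strategy: {strategy}")


# Hypothetical classifier outputs for four images and one binary attribute.
example_probs = [[0.50, 0.50], [0.99, 0.01], [0.70, 0.30], [0.90, 0.10]]
ul_batch = sample_batch(example_probs, 2, "UL")
cl_batch = sample_batch(example_probs, 2, "CL")
random_batch = sample_batch(example_probs, 2, "random", seed=0)
```

With these made-up probabilities, UL selects the two images closest to a 50/50 prediction, and CL selects the two with the most confident predictions. Other uncertainty measures, such as the margin between the top two classes, would fit the same structure.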
