Abstract
Capturing human annotators' subjective responses in image annotation has become crucial as vision-based classifiers expand into new application areas. While image annotation interface design has progressed significantly in general, relatively little research has examined how to elicit reliable and cost-efficient human annotation when the task involves a degree of subjectivity. To bridge this gap, we aim to understand how different sampling methods in image batch labeling, a design that allows human annotators to label a batch of images simultaneously, affect human annotation performance. In particular, we developed three strategies for forming image batches: (1) uncertainty-based labeling (UL), which prioritizes the images that a classifier predicts with the highest uncertainty, (2) certainty-based labeling (CL), the reverse of UL, and (3) random, a baseline approach that selects images at random. Although UL and CL select images to be labeled solely from a classifier's point of view, we hypothesized that human-side perception and labeling performance may also vary with the sampling strategy. In our study, participants perceived different levels of cognitive load across the three conditions (CL the easiest and UL the most difficult). We also observed a trade-off between annotation task effectiveness (CL and UL more reliable than random) and task efficiency (UL the most efficient and CL the least efficient). Based on these results, we discuss design implications and possible future research directions for image batch labeling.
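The three batch-forming strategies can be illustrated with a minimal sketch. It assumes that a classifier's uncertainty about an image is measured by the Shannon entropy of its predicted class probabilities; the abstract does not state which uncertainty measure the authors used, so that choice, the function names `entropy` and `sample_batch`, and the toy probabilities are all illustrative assumptions rather than the paper's implementation.

```python
import math
import random


def entropy(probs):
    """Shannon entropy (in nats) of a predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def sample_batch(predictions, batch_size, strategy, rng=None):
    """Pick image ids for one labeling batch.

    predictions: dict mapping image id -> list of class probabilities
    strategy: "UL" (most uncertain first), "CL" (most certain first),
              or "random" (baseline)
    """
    ids = list(predictions)
    if strategy == "random":
        rng = rng or random.Random()
        return rng.sample(ids, batch_size)
    if strategy not in ("UL", "CL"):
        raise ValueError(f"unknown strategy: {strategy}")
    # UL sorts by descending entropy, CL by ascending entropy.
    ranked = sorted(
        ids,
        key=lambda i: entropy(predictions[i]),
        reverse=(strategy == "UL"),
    )
    return ranked[:batch_size]


# Hypothetical classifier outputs for four images (binary attribute).
predictions = {
    "img_a": [0.50, 0.50],
    "img_b": [0.90, 0.10],
    "img_c": [0.99, 0.01],
    "img_d": [0.60, 0.40],
}

ul_batch = sample_batch(predictions, 2, "UL")
cl_batch = sample_batch(predictions, 2, "CL")
random_batch = sample_batch(predictions, 2, "random", rng=random.Random(0))
print(ul_batch)  # ['img_a', 'img_d']
print(cl_batch)  # ['img_c', 'img_b']
```

In this sketch UL and CL differ only in sort direction, which mirrors the abstract's description of CL as the reverse of UL; other uncertainty measures (for example, least confidence or margin) would slot into the `key` function in the same way.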
Understanding Human-side Impact of Sampling Image Batches in Subjective Attribute Labeling