
"I Hope This Is Helpful": Understanding Crowdworkers' Challenges and Motivations for an Image Description Task

Published: 15 October 2020

Abstract

AI image captioning challenges encourage broad participation in designing algorithms that automatically create captions for a variety of images and users. To create the large datasets these challenges require, researchers typically employ a shared crowdsourcing task design for image captioning. This paper discusses findings from our thematic analysis of 1,064 comments left by Amazon Mechanical Turk workers who used this task design to caption images taken by people who are blind. Workers discussed difficulties in understanding how to complete the task, suggested ways to improve it, explained or clarified their work, and described why they found this particular task rewarding or interesting. Our analysis provides insights both into this particular genre of task and into broader considerations for how to employ crowdsourcing to generate large datasets for developing AI algorithms.

References

  1. Chadia Abras, Diane Maloney-Krichmar, and Jenny Preece. 2004. User-centered design. In Encyclopedia of Human-Computer Interaction, W. Bainbridge (Ed.). Vol. 37. Thousand Oaks: Sage Publications, 445--456.Google ScholarGoogle Scholar
  2. Harsh Agrawal, Karan Desai, Xinlei Chen, Rishabh Jain, Dhruv Batra, Devi Parikh, Stefan Lee, and Peter Anderson. 2018a. Nocaps: Novel Object Captioning at Scale. arXiv preprint arXiv:1812.08658 (2018).Google ScholarGoogle Scholar
  3. Harsh Agrawal, Karan Desai, Yufei Wang, Xinlei Chen, Rishabh Jain, Mark Johnson, Dhruv Batra, Devi Parikh, Stefan Lee, and Peter Anderson. 2018b. Nocaps: Novel object captioning at scale. arXiv:1812.08658 [cs] (Dec. 2018). http://arxiv.org/abs/1812.08658 arXiv: 1812.08658.Google ScholarGoogle Scholar
  4. Lora Aroyo, Lucas Dixon, Nithum Thain, Olivia Redfield, and Rachel Rosen. 2019. Crowdsourcing Subjective Tasks: The Case Study of Understanding Toxicity in Online Discussions. In Companion Proceedings of The 2019 World Wide Web Conference (WWW '19). Association for Computing Machinery, San Francisco, USA, 1100--1105. https://doi.org/10.1145/3308560.3317083Google ScholarGoogle Scholar
  5. Cynthia L. Bennett, Jane E, Martez E. Mott, Edward Cutrell, and Meredith Ringel Morris. 2018. How Teens with Visual Impairments Take, Edit, and Share Photos on Social Media. In Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems (CHI '18). Association for Computing Machinery, Montreal QC, Canada, 1--12. https://doi.org/10.1145/3173574.3173650Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. Nilavra Bhattacharya, Qing Li, and Danna Gurari. 2019. Why Does a Visual Question Have Different Answers?. In Proceedings of the IEEE International Conference on Computer Vision. 4271--4280.Google ScholarGoogle ScholarCross RefCross Ref
  7. Jeffrey P. Bigham, Chandrika Jayant, Hanjie Ji, Greg Little, Andrew Miller, Robert C. Miller, Robin Miller, Aubrey Tatarowicz, Brandyn White, Samual White, and Tom Yeh. 2010a. VizWiz: Nearly Real-time Answers to Visual Questions. In Proceedings of the 23Nd Annual ACM Symposium on User Interface Software and Technology (UIST '10). ACM, New York, NY, USA, 333--342. https://doi.org/10.1145/1866029.1866080 event-place: New York, New York, USA.Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. Jeffrey P. Bigham, Chandrika Jayant, Andrew Miller, Brandyn White, and Tom Yeh. 2010b. VizWiz:: LocateIt-Enabling Blind People to Locate Objects in Their Environment. In Computer Vision and Pattern Recognition Workshops (CVPRW), 2010 IEEE Computer Society Conference On. IEEE, 65--72.Google ScholarGoogle ScholarCross RefCross Ref
  9. Anne E. Bowser, Oliver L. Haimson, Edward F. Melcer, and Elizabeth F. Churchill. 2015. On vintage values: The experience of secondhand fashion reacquisition. In Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing Systems (CHI '15). ACM, New York, NY, USA, 897--906. https://doi.org/10.1145/2702123.2702394 event-place: Seoul, Republic of Korea.Google ScholarGoogle Scholar
  10. Daren C. Brabham. 2008. Crowdsourcing as a model for problem solving: An introduction and cases. Convergence, Vol. 14, 1 (Feb. 2008), 75--90. https://doi.org/10.1177/1354856507084420Google ScholarGoogle Scholar
  11. Daren C. Brabham. 2012. Motivations for participation in a crowdsourcing application to improve public engagement in transit planning. Journal of Applied Communication Research, Vol. 40, 3 (Aug. 2012), 307--328. https://doi.org/10.1080/00909882.2012.693940Google ScholarGoogle ScholarCross RefCross Ref
  12. Erin Brady. 2015. Getting fast, free, and anonymous answers to questions asked by people with visual impairments. SIGACCESS Access. Comput. 112 (July 2015), 16--25. https://doi.org/10.1145/2809915.2809918Google ScholarGoogle Scholar
  13. Erin Brady, Meredith Ringel Morris, and Jeffrey P. Bigham. 2015. Gauging Receptiveness to Social Microvolunteering. In Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing Systems (CHI '15). ACM, New York, NY, USA, 1055--1064. https://doi.org/10.1145/2702123.2702329 event-place: Seoul, Republic of Korea.Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. Jonathan Bragg, Mausam, and Daniel S. Weld. 2018. Sprout: Crowd-Powered Task Design for Crowdsourcing. In Proceedings of the 31st Annual ACM Symposium on User Interface Software and Technology (UIST '18). ACM, New York, NY, USA, 165--176. https://doi.org/10.1145/3242587.3242598 event-place: Berlin, Germany.Google ScholarGoogle Scholar
  15. Virginia Braun and Victoria Clarke. 2006. Using thematic analysis in psychology. Qualitative Research in Psychology, Vol. 3, 2 (2006), 77--101.Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. Alice M. Brawley and Cynthia L. S. Pury. 2016. Work experiences on MTurk: Job satisfaction, turnover, and information sharing. Computers in Human Behavior, Vol. 54 (Jan. 2016), 531--546. https://doi.org/10.1016/j.chb.2015.08.031Google ScholarGoogle Scholar
  17. Erin Buehler, Stacy Branham, Abdullah Ali, Jeremy J. Chang, Megan Kelly Hofmann, Amy Hurst, and Shaun K. Kane. 2015. Sharing is Caring: Assistive Technology Designs on Thingiverse. In Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing Systems (CHI '15). Association for Computing Machinery, Seoul, Republic of Korea, 525--534. https://doi.org/10.1145/2702123.2702525Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. Michele A. Burton, Erin Brady, Robin Brewer, Callie Neylan, Jeffrey P. Bigham, and Amy Hurst. 2012. Crowdsourcing Subjective Fashion Advice Using VizWiz: Challenges and Opportunities. In Proceedings of the 14th International ACM SIGACCESS Conference on Computers and Accessibility. ACM, 135--142.Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. Dana Chandler and Adam Kapelner. 2013. Breaking monotony with meaning: Motivation in crowdsourcing markets. Journal of Economic Behavior & Organization, Vol. 90 (June 2013), 123--133. https://doi.org/10.1016/j.jebo.2013.03.003Google ScholarGoogle ScholarCross RefCross Ref
  20. Joseph Chee Chang, Saleema Amershi, and Ece Kamar. 2017a. Revolt: Collaborative Crowdsourcing for Labeling Machine Learning Datasets. In Proceedings of the 2017 CHI Conference on Human Factors in Computing Systems (CHI '17). ACM, New York, NY, USA, 2334--2346. https://doi.org/10.1145/3025453.3026044 event-place: Denver, Colorado, USA.Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. Joseph Chee Chang, Saleema Amershi, and Ece Kamar. 2017b. Revolt: Collaborative Crowdsourcing for Labeling Machine Learning Datasets. In Proceedings of the 2017 CHI Conference on Human Factors in Computing Systems (CHI '17). Association for Computing Machinery, Denver, Colorado, USA, 2334--2346. https://doi.org/10.1145/3025453.3026044Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. Joseph Chee Chang, Aniket Kittur, and Nathan Hahn. 2016. Alloy: Clustering with Crowds and Computation. In Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems (CHI '16). Association for Computing Machinery, San Jose, California, USA, 3180--3191. https://doi.org/10.1145/2858036.2858411Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. Jianfu Chen, Polina Kuznetsova, David Warren, and Yejin Choi. 2015b. Déja image-captions: A corpus of expressive descriptions in repetition. In Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 504--514.Google ScholarGoogle ScholarCross RefCross Ref
  24. Xinlei Chen, Hao Fang, Tsung-Yi Lin, Ramakrishna Vedantam, Saurabh Gupta, Piotr Dollár, and C. Lawrence Zitnick. 2015a. Microsoft COCO Captions: Data Collection and Evaluation Server. arXiv preprint arXiv:1504.00325 (2015).Google ScholarGoogle Scholar
  25. Chun-Wei Chiang, Anna Kasunic, and Saiph Savage. 2018. Crowd Coach: Peer Coaching for Crowd Workers' Skill Growth. Proc. ACM Hum.-Comput. Interact., Vol. 2, CSCW (Nov. 2018), 37:1--37:17. https://doi.org/10.1145/3274306Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. Tai-Yin Chiu, Yinan Zhao, and Danna Gurari. 2020. Assessing Image Quality Issues for Real-World Problems. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 3646--3656.Google ScholarGoogle ScholarCross RefCross Ref
  27. Djellel Difallah, Elena Filatova, and Panos Ipeirotis. 2018. Demographics and dynamics of Mechanical Turk workers. In Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining (WSDM '18). ACM, New York, NY, USA, 135--143. https://doi.org/10.1145/3159652.3159661 event-place: Marina Del Rey, CA, USA.Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. Desmond Elliott and Frank Keller. 2013. Image description using visual dependency representations. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing. 1292--1302.Google ScholarGoogle Scholar
  29. Enrique Estellés-Arolas and Fernando González-Ladrón-de Guevara. 2012. Towards an integrated crowdsourcing definition. Journal of Information Science, Vol. 38, 2 (April 2012), 189--200. https://doi.org/10.1177/0165551512437638Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. Alexandra Eveleigh, Charlene Jennett, Ann Blandford, Philip Brohan, and Anna L. Cox. 2014. Designing for dabblers and deterring drop-outs in citizen science. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. ACM, 2985--2994.Google ScholarGoogle Scholar
  31. Facebook. 2015. Facebook: Milestones. https://www.facebook.com/facebook'sk=infoGoogle ScholarGoogle Scholar
  32. Ali Farhadi, Mohsen Hejrati, Mohammad Amin Sadeghi, Peter Young, Cyrus Rashtchian, Julia Hockenmaier, and David Forsyth. 2010. Every picture tells a story: Generating sentences from images. In European Conference on Computer Vision. Springer, 15--29.Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. Casey Fiesler and Blake Hallinan. 2018. "We are the product": Public reactions to online data sharing and privacy controversies in the media. In Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems (CHI '18). ACM, New York, NY, USA, 53:1--53:13. https://doi.org/10.1145/3173574.3173627 event-place: Montreal QC, Canada.Google ScholarGoogle ScholarDigital LibraryDigital Library
  34. Chuang Gan, Zhe Gan, Xiaodong He, Jianfeng Gao, and Li Deng. 2017. Stylenet: Generating attractive visual captions with styles. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 3137--3146.Google ScholarGoogle ScholarCross RefCross Ref
  35. Mary L. Gray and Siddharth Suri. 2019. Ghost Work: How to stop Silicon Valley from building a new global underclass .Houghton Mifflin Harcourt. Google-Books-ID: 8AmXDwAAQBAJ.Google ScholarGoogle Scholar
  36. Mary L. Gray, Siddharth Suri, Syed Shoaib Ali, and Deepti Kulkarni. 2016. The crowd is a collaborative network. In Proceedings of the 19th ACM Conference on Computer-Supported Cooperative Work & Social Computing (CSCW '16). ACM, New York, NY, USA, 134--147. https://doi.org/10.1145/2818048.2819942 event-place: San Francisco, California, USA.Google ScholarGoogle ScholarDigital LibraryDigital Library
  37. Neha Gupta, David Martin, Benjamin V. Hanrahan, and Jacki O'Neill. 2014. Turk-life in India. In Proceedings of the 18th International Conference on Supporting Group Work. ACM, 1--11.Google ScholarGoogle ScholarDigital LibraryDigital Library
  38. Danna Gurari and Kristen Grauman. 2017. CrowdVerge: Predicting If People Will Agree on the Answer to a Visual Question. In Proceedings of the 2017 CHI Conference on Human Factors in Computing Systems. ACM, 3511--3522.Google ScholarGoogle ScholarDigital LibraryDigital Library
  39. Danna Gurari, Yinan Zhao, Meng Zhang, and Nilavra Bhattacharya. 2020. Captioning Images Taken by People Who Are Blind. arXiv:2002.08565 [cs] (Feb. 2020). http://arxiv.org/abs/2002.08565 arXiv: 2002.08565.Google ScholarGoogle Scholar
  40. Benjamin V. Hanrahan, David Martin, Jutta Willamowski, and John M. Carroll. 2018. Investigating the Amazon Mechanical Turk market through tool design. Comput. Supported Coop. Work, Vol. 27, 3--6 (Dec. 2018), 1255--1274. https://doi.org/10.1007/s10606-018--9312--6Google ScholarGoogle ScholarDigital LibraryDigital Library
  41. Kotaro Hara, Abigail Adams, Kristy Milland, Saiph Savage, Chris Callison-Burch, and Jeffrey P. Bigham. 2018. A Data-Driven Analysis of Workers' Earnings on Amazon Mechanical Turk. In Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems (CHI '18). ACM, New York, NY, USA, 449:1--449:14. https://doi.org/10.1145/3173574.3174023 event-place: Montreal QC, Canada.Google ScholarGoogle Scholar
  42. Kotaro Hara, Abigail Adams, Kristy Milland, Saiph Savage, Benjamin V. Hanrahan, Jeffrey P. Bigham, and Chris Callison-Burch. 2019. Worker demographics and earnings on Amazon Mechanical Turk: An exploratory analysis. In Extended Abstracts of the 2019 CHI Conference on Human Factors in Computing Systems (CHI EA '19). ACM, New York, NY, USA, LBW1217:1--LBW1217:6. https://doi.org/10.1145/3290607.3312970 event-place: Glasgow, Scotland Uk.Google ScholarGoogle ScholarDigital LibraryDigital Library
  43. David Harwath and James Glass. 2015. Deep multimodal semantic embeddings for speech and images. In 2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU). IEEE, 237--244.Google ScholarGoogle ScholarCross RefCross Ref
  44. William Havard, Laurent Besacier, and Olivier Rosec. 2017. Speech-Coco: 600k visually grounded spoken captions aligned to Mscoco data set. arXiv preprint arXiv:1707.08435 (2017).Google ScholarGoogle Scholar
  45. Micah Hodosh, Peter Young, and Julia Hockenmaier. 2013. Framing Image Description as a Ranking Task: Data, Models and Evaluation Metrics. Journal of Artificial Intelligence Research, Vol. 47 (2013), 853--899.Google ScholarGoogle ScholarDigital LibraryDigital Library
  46. Jonathan Hook, Sanne Verbaan, Peter Wright, and Patrick Olivier. 2013. Exploring the Design of technologies and services that support do-it-yourself assistive technology practice. Proceedings of DE, Vol. 2013 (2013).Google ScholarGoogle Scholar
  47. Lilly C. Irani and M. Six Silberman. 2013. Turkopticon: Interrupting Worker Invisibility in Amazon Mechanical Turk. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI '13). ACM, New York, NY, USA, 611--620. https://doi.org/10.1145/2470654.2470742 event-place: Paris, France.Google ScholarGoogle ScholarDigital LibraryDigital Library
  48. Lilly C. Irani and M. Six Silberman. 2016. Stories We Tell About Labor: Turkopticon and the Trouble with "Design". In Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems (CHI '16). ACM, New York, NY, USA, 4573--4586. https://doi.org/10.1145/2858036.2858592 event-place: San Jose, California, USA.Google ScholarGoogle Scholar
  49. Mainak Jas and Devi Parikh. 2015. Image Specificity. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2727--2736.Google ScholarGoogle Scholar
  50. Sanjay Kairam and Jeffrey Heer. 2016. Parting Crowds: Characterizing Divergent Interpretations in Crowdsourced Annotation Tasks. In Proceedings of the 19th ACM Conference on Computer-Supported Cooperative Work & Social Computing (CSCW '16). Association for Computing Machinery, San Francisco, California, USA, 1637--1648. https://doi.org/10.1145/2818048.2820016Google ScholarGoogle ScholarDigital LibraryDigital Library
  51. Toni Kaplan, Susumu Saito, Kotaro Hara, and Jeffrey P. Bigham. 2018. Striving to Earn More: A Survey of Work Strategies and Tool Use Among Crowd Workers. In Sixth AAAI Conference on Human Computation and Crowdsourcing. https://www.aaai.org/ocs/index.php/HCOMP/HCOMP18/paper/view/17920Google ScholarGoogle Scholar
  52. Aniket Kittur, Ed H. Chi, and Bongwon Suh. 2008. Crowdsourcing user studies with Mechanical Turk. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI '08). ACM, New York, NY, USA, 453--456. https://doi.org/10.1145/1357054.1357127Google ScholarGoogle ScholarDigital LibraryDigital Library
  53. Chen Kong, Dahua Lin, Mohit Bansal, Raquel Urtasun, and Sanja Fidler. 2014. What are you talking about? Text-to-image coreference. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 3558--3565.Google ScholarGoogle ScholarDigital LibraryDigital Library
  54. Ranjay Krishna, Yuke Zhu, Oliver Groth, Justin Johnson, Kenji Hata, Joshua Kravitz, Stephanie Chen, Yannis Kalantidis, Li-Jia Li, and David A. Shamma. 2017. Visual genome: Connecting language and vision using crowdsourced dense image annotations. International Journal of Computer Vision, Vol. 123, 1 (2017), 32--73.Google ScholarGoogle ScholarDigital LibraryDigital Library
  55. Girish Kulkarni, Visruth Premraj, Vicente Ordonez, Sagnik Dhar, Siming Li, Yejin Choi, Alexander C. Berg, and Tamara L. Berg. 2013. Babytalk: Understanding and Generating Simple Image Descriptions. IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 35, 12 (2013), 2891--2903.Google ScholarGoogle ScholarDigital LibraryDigital Library
  56. Polina Kuznetsova, Vicente Ordonez, Tamara L. Berg, and Yejin Choi. 2014. TreeTalk: Composition and Compression of Trees for Image Descriptions. Transactions of the Association for Computational Linguistics, Vol. 2 (Dec. 2014), 351--362. https://doi.org/10.1162/tacl_a_00188 Publisher: MIT Press.Google ScholarGoogle Scholar
  57. Laura Lascau, Sandy J. J. Gould, Anna L. Cox, Elizaveta Karmannaya, and Duncan P. Brumby. 2019. Monotasking or multitasking: Designing for crowdworkers' preferences. In Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems (CHI '19). ACM, New York, NY, USA, 419:1--419:14. https://doi.org/10.1145/3290605.3300649 event-place: Glasgow, Scotland Uk.Google ScholarGoogle Scholar
  58. Edith Law, Krzysztof Z. Gajos, Andrea Wiggins, Mary L. Gray, and Alex Williams. 2017. Crowdsourcing as a tool for research: Implications of uncertainty. In Proceedings of the 2017 ACM Conference on Computer Supported Cooperative Work and Social Computing (CSCW '17). ACM, New York, NY, USA, 1544--1561. https://doi.org/10.1145/2998181.2998197 event-place: Portland, Oregon, USA.Google ScholarGoogle ScholarDigital LibraryDigital Library
  59. A. M. Layas and Helen Petrie. 2016. Exploring intrinsic and extrinsic motivations to participate in a crowdsourcing project to support blind and partially sighted students. Universal Design 2016: Learning from the past, designing for the future (Proceedings of the 3rd International Conference on Universal Design, UD2016). (Aug. 2016). http://eprints.whiterose.ac.uk/118514/Google ScholarGoogle Scholar
  60. Siming Li, Girish Kulkarni, Tamara L. Berg, Alexander C. Berg, and Yejin Choi. 2011. Composing simple image descriptions using web-scale n-grams. In Proceedings of the Fifteenth Conference on Computational Natural Language Learning (CoNLL '11). Association for Computational Linguistics, Portland, Oregon, 220--228.Google ScholarGoogle ScholarDigital LibraryDigital Library
  61. Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C. Lawrence Zitnick. 2014. Microsoft Coco: Common Objects in Context. In European Conference on Computer Vision. Springer, 740--755.Google ScholarGoogle Scholar
  62. Kiel Long, John Vines, Selina Sutton, Phillip Brooker, Tom Feltwell, Ben Kirman, Julie Barnett, and Shaun Lawson. 2017. "Could you define that in bot terms"?: Requesting, creating and using bots on Reddit. In Proceedings of the 2017 CHI Conference on Human Factors in Computing Systems (CHI '17). ACM, New York, NY, USA, 3488--3500. https://doi.org/10.1145/3025453.3025830 event-place: Denver, Colorado, USA.Google ScholarGoogle ScholarDigital LibraryDigital Library
  63. V. K. Chaithanya Manam and Alexander J. Quinn. 2018. WingIt: Efficient Refinement of Unclear Task Instructions. In Sixth AAAI Conference on Human Computation and Crowdsourcing. https://www.aaai.org/ocs/index.php/HCOMP/HCOMP18/paper/view/17931Google ScholarGoogle Scholar
  64. Andrew Mao, Ece Kamar, Yiling Chen, Eric Horvitz, Megan E. Schwamb, Chris J. Lintott, and Arfon M. Smith. 2013b. Volunteering versus work for pay: Incentives and tradeoffs in crowdsourcing. In First AAAI conference on human computation and crowdsourcing.Google ScholarGoogle Scholar
  65. Andrew Mao, Ece Kamar, and Eric Horvitz. 2013a. Why stop now? Predicting worker engagement in online crowdsourcing. In First AAAI Conference on Human Computation and Crowdsourcing.Google ScholarGoogle ScholarCross RefCross Ref
  66. Catherine C. Marshall and Frank M. Shipman. 2013. Experiences surveying the crowd: Reflections on methods, participation, and reliability. In Proceedings of the 5th Annual ACM Web Science Conference (WebSci '13). ACM, New York, NY, USA, 234--243. https://doi.org/10.1145/2464464.2464485 event-place: Paris, France.Google ScholarGoogle Scholar
  67. David Martin, Benjamin V. Hanrahan, Jacki O'Neill, and Neha Gupta. 2014. Being a turker. In Proceedings of the 17th ACM conference on Computer supported cooperative work & social computing (CSCW '14). Association for Computing Machinery, Baltimore, Maryland, USA, 224--235. https://doi.org/10.1145/2531602.2531663Google ScholarGoogle ScholarDigital LibraryDigital Library
  68. David Martin, Jacki O'Neill, Neha Gupta, and Benjamin V. Hanrahan. 2016. Turking in a global labour market. Comput. Supported Coop. Work, Vol. 25, 1 (Feb. 2016), 39--77. https://doi.org/10.1007/s10606-015--9241--6Google ScholarGoogle ScholarDigital LibraryDigital Library
  69. Winter Mason and Siddharth Suri. 2012. Conducting behavioral research on Amazon?s Mechanical Turk. Behavior Research Methods, Vol. 44, 1 (March 2012), 1--23. https://doi.org/10.3758/s13428-011-0124--6Google ScholarGoogle ScholarCross RefCross Ref
  70. Brian McInnis, Dan Cosley, Chaebong Nam, and Gilly Leshed. 2016. Taking a HIT: Designing around rejection, mistrust, risk, and workers' experiences in Amazon Mechanical Turk. In Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems (CHI '16). ACM, New York, NY, USA, 2271--2282. https://doi.org/10.1145/2858036.2858539 event-place: San Jose, California, USA.Google ScholarGoogle ScholarDigital LibraryDigital Library
  71. Lydia Michie, Madeline Balaam, John McCarthy, Timur Osadchiy, and Kellie Morrissey. 2018. From her story, to our story: Digital storytelling as public engagement around abortion rights advocacy in Ireland. In Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems (CHI '18). ACM, New York, NY, USA, 357:1--357:15. https://doi.org/10.1145/3173574.3173931 event-place: Montreal QC, Canada.Google ScholarGoogle ScholarDigital LibraryDigital Library
  72. Margaret Mitchell, Xufeng Han, Jesse Dodge, Alyssa Mensch, Amit Goyal, Alex Berg, Kota Yamaguchi, Tamara Berg, Karl Stratos, and Hal Daumé. 2012. Midge: generating image descriptions from computer vision detections. In Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics (EACL '12). Association for Computational Linguistics, Avignon, France, 747--756.Google ScholarGoogle ScholarDigital LibraryDigital Library
  73. Valerie S. Morash, Yue-Ting Siu, Joshua A. Miele, Lucia Hasty, and Steven Landau. 2015. Guiding novice web workers in making image descriptions using templates. ACM Transactions on Accessible Computing (TACCESS), Vol. 7, 4 (2015), 12.Google ScholarGoogle Scholar
  74. Meredith Ringel Morris, Jazette Johnson, Cynthia L. Bennett, and Edward Cutrell. 2018. Rich Representations of Visual Content for Screen Reader Users. In Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems. ACM, 59.Google ScholarGoogle ScholarDigital LibraryDigital Library
  75. Meredith Ringel Morris, Annuska Zolyomi, Catherine Yao, Sina Bahram, Jeffrey P. Bigham, and Shaun K. Kane. 2016. "With most of it being pictures now, I rarely use it": Understanding Twitter's Evolving Accessibility to Blind Users. In Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems (CHI '16). Association for Computing Machinery, San Jose, California, USA, 5506--5516. https://doi.org/10.1145/2858036.2858116Google ScholarGoogle ScholarDigital LibraryDigital Library
  76. Babak Naderi. 2018. Motivation of workers on microtask crowdsourcing platforms .Springer, Cham, Switzerland. OCLC: 1020790439.Google ScholarGoogle Scholar
  77. Midas Nouwens and Clemens Nylandsted Klokmose. 2018. The application and its consequences for non-standard knowledge work. In Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems (CHI '18). ACM, New York, NY, USA, 399:1--399:12. https://doi.org/10.1145/3173574.3173973 event-place: Montreal QC, Canada.Google ScholarGoogle ScholarDigital LibraryDigital Library
  78. Jeremiah Parry-Hill, Patrick C. Shih, Jennifer Mankoff, and Daniel Ashbrook. 2017. Understanding Volunteer AT Fabricators: Opportunities and Challenges in DIY-AT for Others in e-NABLE. In Proceedings of the 2017 CHI Conference on Human Factors in Computing Systems (CHI '17). Association for Computing Machinery, Denver, Colorado, USA, 6184--6194. https://doi.org/10.1145/3025453.3026045Google ScholarGoogle ScholarDigital LibraryDigital Library
  79. L. G. Pee, E. Koh, and M. Goh. 2018. Trait motivations of crowdsourcing and task choice: A distal-proximal perspective. International Journal of Information Management, Vol. 40 (June 2018), 28--41. https://doi.org/10.1016/j.ijinfomgt.2018.01.008Google ScholarGoogle Scholar
  80. Helen Petrie, Chandra Harrison, and Sundeep Dev. 2005. Describing Images on the Web: A Survey of Current Practice and Prospects for the Future. Proceedings of Human Computer Interaction International (HCII), Vol. 71 (2005).Google ScholarGoogle Scholar
  81. Cyrus Rashtchian, Peter Young, Micah Hodosh, and Julia Hockenmaier. 2010. Collecting image annotations using Amazon's Mechanical Turk. In Proceedings of the NAACL HLT 2010 Workshop on Creating Speech and Language Data with Amazon's Mechanical Turk. Association for Computational Linguistics, 139--147.Google ScholarGoogle ScholarDigital LibraryDigital Library
  82. Ludovico Orlando Russo, Giuseppe Airò Farulla, and Carlo Boccazzi Varotto. 2018. Hackability: A Methodology to Encourage the Development of DIY Assistive Devices. In Computers Helping People with Special Needs (Lecture Notes in Computer Science ), Klaus Miesenberger and Georgios Kouroupetroglou (Eds.). Springer International Publishing, Cham, 156--163. https://doi.org/10.1007/978--3--319--94274--2_22Google ScholarGoogle Scholar
  83. Susumu Saito, Chun-Wei Chiang, Saiph Savage, Teppei Nakano, Tetsunori Kobayashi, and Jeffrey P. Bigham. 2019. TurkScanner: Predicting the hourly wage of microtasks. In The World Wide Web Conference (WWW '19). ACM, New York, NY, USA, 3187--3193. https://doi.org/10.1145/3308558.3313716 event-place: San Francisco, CA, USA.Google ScholarGoogle Scholar
  84. Niloufar Salehi, Lilly C. Irani, Michael S. Bernstein, Ali Alkhatib, Eva Ogbe, Kristy Milland, and Clickhappier. 2015. We are Dynamo: Overcoming stalling and friction in collective action for crowd workers. In Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing Systems (CHI '15). ACM, New York, NY, USA, 1621--1630. https://doi.org/10.1145/2702123.2702508 event-place: Seoul, Republic of Korea.Google ScholarGoogle ScholarDigital LibraryDigital Library
  85. Elliot Salisbury, Ece Kamar, and Meredith Ringel Morris. 2017. Toward Scalable Social Alt Text: Conversational Crowdsourcing as a Tool for Refining Vision-to-Language Technology for the Blind. Proceedings of HCOMP 2017 (2017).Google ScholarGoogle Scholar
  86. Mike Schaekermann, Joslin Goh, Kate Larson, and Edith Law. 2018. Resolvable vs. Irresolvable Disagreement: A Study on Worker Deliberation in Crowd Work. Proceedings of the ACM on Human-Computer Interaction, Vol. 2, CSCW (Nov. 2018), 154:1--154:19. https://doi.org/10.1145/3274423Google ScholarGoogle Scholar
  87. Kurt Shuster, Samuel Humeau, Hexiang Hu, Antoine Bordes, and Jason Weston. 2018. Engaging image captioning via personality. arXiv preprint arXiv:1810.10665 (2018).Google ScholarGoogle Scholar
  88. Luiz Fernando Silva Pinto and Carlos Denner dos Santos Júnior. 2018. Motivations of crowdsourcing contributors. RAI: Revista de Administração e Inovação; São Paulo, Vol. 15, 1 (2018), 58--72. http://search.proquest.com/docview/2063479696/abstract/648431A1613B4846PQ/1Google ScholarGoogle Scholar
  89. Jesper Simonsen and Toni Robertson (Eds.). 2013. Routledge international handbook of participatory design .Routledge, New York. OCLC: 754734489.Google ScholarGoogle Scholar
  90. Abigale Stangl, Meredith Ringel Morris, and Danna Gurari. 2020. "Person, Shoes, Tree. Is the Person Naked?" What People with Vision Impairments Want in Image Descriptions. Honolulu, HI, USA, 13. https://doi.org/10.1145/3313831.3376404Google ScholarGoogle Scholar
  91. Abigale J. Stangl, Esha Kothari, Suyog D. Jain, Tom Yeh, Kristen Grauman, and Danna Gurari. 2018. BrowseWithMe: An Online Clothes Shopping Assistant for People with Visual Impairments. In ACM SIGACCESS Conference on Computers and Accessibility (ASSETS).Google ScholarGoogle ScholarDigital LibraryDigital Library
  92. Rebekah Steele and Marjorie Derven. 2015. Diversity & Inclusion and innovation: A virtuous cycle. Industrial and Commercial Training, Vol. 47, 1 (Jan. 2015), 1--7. https://doi.org/10.1108/ICT-09--2014-0063Google ScholarGoogle ScholarCross RefCross Ref
  93. Twitter. 2015. About Twitter, Inc. https://about.twitter.com/companyGoogle ScholarGoogle Scholar
  94. Ramakrishna Vedantam, C. Lawrence Zitnick, and Devi Parikh. 2015. Cider: Consensus-Based Image Description Evaluation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 4566--4575.Google ScholarGoogle ScholarCross RefCross Ref
  95. Luis Von Ahn, Shiry Ginosar, Mihir Kedia, Ruoran Liu, and Manuel Blum. 2006. Improving accessibility of the web with a computer game. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. ACM, 79--82.Google ScholarGoogle ScholarDigital LibraryDigital Library
  96. Violeta Voykinska, Shiri Azenkot, Shaomei Wu, and Gilly Leshed. 2016. How Blind People Interact with Visual Content on Social Networking Services. In Proceedings of the 19th ACM Conference on Computer-Supported Cooperative Work & Social Computing (CSCW '16). Association for Computing Machinery, San Francisco, California, USA, 1584--1595. https://doi.org/10.1145/2818048.2820013Google ScholarGoogle ScholarDigital LibraryDigital Library
  97. Meihong Wang, Yuling Sun, Jing Yang, and Liang He. 2018. Enabling the Disagreement among Crowds: A Collaborative Crowdsourcing Framework. In 2018 IEEE 22nd International Conference on Computer Supported Cooperative Work in Design (CSCWD). 790--795. https://doi.org/10.1109/CSCWD.2018.8465368
  98. Huichuan Xia, Yang Wang, Yun Huang, and Anuj Shah. 2017. "Our privacy needs to be protected at all costs": Crowd workers' privacy experiences on Amazon Mechanical Turk. Proc. ACM Hum.-Comput. Interact., Vol. 1, CSCW (Dec. 2017), 113:1--113:22. https://doi.org/10.1145/3134748
  99. Chun-Ju Yang, Kristen Grauman, and Danna Gurari. 2018. Visual question answer diversity. In Sixth AAAI Conference on Human Computation and Crowdsourcing.
  100. Yezhou Yang, Ching Lik Teo, Hal Daumé III, and Yiannis Aloimonos. 2011. Corpus-guided sentence generation of natural images. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP '11). Association for Computational Linguistics, Edinburgh, United Kingdom, 444--454.
  101. Yuya Yoshikawa, Yutaro Shigeto, and Akikazu Takeuchi. 2017. STAIR Captions: Constructing a large-scale Japanese image caption dataset. arXiv preprint arXiv:1705.00823 (2017).
  102. Peter Young, Alice Lai, Micah Hodosh, and Julia Hockenmaier. 2014. From image descriptions to visual denotations: New similarity metrics for semantic inference over event descriptions. Transactions of the Association for Computational Linguistics, Vol. 2 (2014), 67--78.
  103. Licheng Yu, Eunbyung Park, Alexander C. Berg, and Tamara L. Berg. 2015. Visual Madlibs: Fill in the Blank Description Generation and Question Answering. In Proceedings of the IEEE International Conference on Computer Vision. 2461--2469.
  104. Yuhang Zhao, Shaomei Wu, Lindsay Reynolds, and Shiri Azenkot. 2017. The effect of computer-generated descriptions on photo-sharing experiences of people with visual impairments. Proceedings of the ACM on Human-Computer Interaction, Vol. 1, CSCW (2017), 121.
  105. Yuxiang Zhao and Qinghua Zhu. 2014. Effects of extrinsic and intrinsic motivation on participation in crowdsourcing contest: A perspective of self-determination theory. Online Information Review, Vol. 38, 7 (2014), 896--917. https://doi.org/10.1108/OIR-08-2014-0188
  106. Haichao Zheng, Dahui Li, and Wenhua Hou. 2011. Task design, motivation, and participation in crowdsourcing contests. International Journal of Electronic Commerce, Vol. 15, 4 (July 2011), 57--88. https://doi.org/10.2753/JEC1086-4415150402
  107. Yu Zhong, Walter S. Lasecki, Erin Brady, and Jeffrey P. Bigham. 2015. RegionSpeak: Quick comprehensive spatial descriptions of complex images for blind users. In Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing Systems. ACM, 2353--2362.
  108. C. Lawrence Zitnick, Devi Parikh, and Lucy Vanderwende. 2013. Learning the visual interpretation of sentences. In Proceedings of the IEEE International Conference on Computer Vision. 1681--1688.
