Abstract
AI image captioning challenges encourage broad participation in designing algorithms that automatically create captions for a variety of images and users. To create the large datasets necessary for these challenges, researchers typically employ a shared crowdsourcing task design for image captioning. This paper discusses findings from our thematic analysis of 1,064 comments left by Amazon Mechanical Turk workers who used this task design to create captions for images taken by people who are blind. Workers discussed difficulties in understanding how to complete the task, suggested ways to improve it, explained or clarified their work, and described why they found this particular task rewarding or interesting. Our analysis provides insights into both this particular genre of task and broader considerations for how to employ crowdsourcing to generate large datasets for developing AI algorithms.
"I Hope This Is Helpful": Understanding Crowdworkers' Challenges and Motivations for an Image Description Task