Help Me to Help You: Machine Augmented Citizen Science

Published: 14 November 2019

Abstract

The growing size of the datasets confronting researchers in many domains has prompted a range of creative responses, including the deployment of modern machine learning techniques and the advent of large-scale “citizen science” projects. However, the ability of the latter to provide suitably large training sets for the former is stretched as the size of the problem (and the competition for attention amongst projects) grows. We explore the application of unsupervised learning to leverage the structure that exists in an initially unlabelled dataset. We simulate grouping similar points before presenting those groups to volunteers to label. Citizen science labelling of grouped data is more efficient, and the gathered labels can be used to further improve the efficiency of labelling future data.
To demonstrate these ideas, we perform experiments using data from the Pan-STARRS Survey for Transients (PSST), with volunteer labels gathered by the Zooniverse project Supernova Hunters, and a simulated project using the MNIST handwritten digit dataset. Our results show that, in the best case, we might expect to reduce the required volunteer effort by 87.0% and 92.8% for the two datasets, respectively. These results illustrate a symbiotic relationship between machine learning and citizen scientists in which each empowers the other, with important implications for the design of future citizen science projects.
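
The grouping idea can be illustrated with a minimal sketch. It is not the authors' pipeline: it assumes plain k-means on hypothetical two-dimensional toy data, and it simulates a volunteer by taking the majority true class of each group. The names `kmeans` and `simulate` and all parameter values are invented for illustration. The effort figure it reports is just the ratio of groups to points in the toy setup and says nothing about the 87.0% and 92.8% results above.

```python
import random


def kmeans(points, k, iters=20, seed=0):
    """Plain k-means on 2-D points; returns one cluster index per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # initial centres drawn from the data
    assign = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centre by squared Euclidean distance.
        assign = [
            min(range(k),
                key=lambda c: (p[0] - centers[c][0]) ** 2
                + (p[1] - centers[c][1]) ** 2)
            for p in points
        ]
        # Update step: move each centre to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centers[c] = (sum(m[0] for m in members) / len(members),
                              sum(m[1] for m in members) / len(members))
    return assign


def simulate(n_per_class=50, k=4, seed=0):
    """Label whole groups instead of single points and report the saving."""
    rng = random.Random(seed)
    # Hypothetical toy data: two well-separated Gaussian classes.
    true_centers = {0: (0.0, 0.0), 1: (10.0, 10.0)}
    points, truth = [], []
    for label, (cx, cy) in true_centers.items():
        for _ in range(n_per_class):
            points.append((cx + rng.gauss(0, 1), cy + rng.gauss(0, 1)))
            truth.append(label)

    assign = kmeans(points, k, seed=seed)

    # Simulated volunteer: one label per non-empty group (majority true class).
    group_label = {}
    for c in set(assign):
        members = [t for t, a in zip(truth, assign) if a == c]
        group_label[c] = max(set(members), key=members.count)

    predicted = [group_label[a] for a in assign]
    accuracy = sum(p == t for p, t in zip(predicted, truth)) / len(truth)
    # Effort saved: one label request per group instead of one per point.
    effort_saved = 1 - len(group_label) / len(points)
    return accuracy, effort_saved, len(group_label)


if __name__ == "__main__":
    acc, saved, groups = simulate()
    print(f"groups labelled: {groups}, accuracy: {acc:.2f}, effort saved: {saved:.2%}")
```

In this sketch the saving depends only on how many groups are shown relative to the number of points, while the accuracy depends on how pure each group is. Real data would need a learned representation before clustering, and mixed groups would need a fallback to per-point labelling.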

Published In

ACM Transactions on Social Computing  Volume 2, Issue 3
September 2019
90 pages
EISSN:2469-7826
DOI:10.1145/3372281
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 14 November 2019
Accepted: 01 September 2019
Revised: 01 September 2019
Received: 01 March 2019
Published in TSC Volume 2, Issue 3

Author Tags

  1. Deep learning
  2. citizen science
  3. clustering
  4. crowdsourcing
  5. machine learning

Qualifiers

  • Research-article
  • Research
  • Refereed

Cited By

  • (2024) Narratives of epistemic agency in citizen science classification projects: ideals of science and roles of citizens. AI & Society 39:2 (523-540). DOI: 10.1007/s00146-022-01428-9. Online publication date: 1-Apr-2024
  • (2022) Human-machine-learning integration and task allocation in citizen science. Humanities and Social Sciences Communications 9:1. DOI: 10.1057/s41599-022-01049-z. Online publication date: 9-Feb-2022
  • (2021) The Partnership of Citizen Science and Machine Learning: Benefits, Risks, and Future Challenges for Engagement, Data Collection, and Data Quality. Sustainability 13:14 (8087). DOI: 10.3390/su13148087. Online publication date: 20-Jul-2021
  • (2021) Galaxy Zoo DECaLS: Detailed visual morphology measurements from volunteers and deep learning for 314 000 galaxies. Monthly Notices of the Royal Astronomical Society 509:3 (3966-3988). DOI: 10.1093/mnras/stab2093. Online publication date: 30-Sep-2021
  • (2021) From Green Peas to STEVE: Citizen Science Engagement in Space Science. Space Science and Public Engagement (185-219). DOI: 10.1016/B978-0-12-817390-9.00009-9. Online publication date: 2021
  • (2021) Snapshot Wisconsin: networking community scientists and remote sensing to improve ecological monitoring and management. Ecological Applications 31:8. DOI: 10.1002/eap.2436. Online publication date: 12-Sep-2021
