Abstract
Scientific discoveries are often driven by analogies to distant domains, but the growing number of papers makes it difficult to find relevant ideas within a single discipline, let alone distant analogies in others. To provide computational support for finding analogies across domains, we introduce SOLVENT, a mixed-initiative system. Humans annotate aspects of research papers that denote their background (the high-level problems being addressed), purpose (the specific problems being addressed), mechanism (how the purpose was achieved), and findings (what was learned or achieved); a computational model then constructs a semantic representation from these annotations that can be used to find analogies among the papers. We demonstrate that this system finds more analogies than baseline information-retrieval approaches; that annotators and annotations can generalize beyond a single domain; and that the resulting analogies are useful to experts. These results demonstrate a novel path towards computationally supported knowledge sharing in research communities.
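To make the idea of aspect-based matching concrete, here is a minimal sketch of ranking papers by similarity on a single annotated aspect. It is an illustration under stated assumptions, not the model used in SOLVENT: the bag-of-words vectors, the cosine scoring, the helper names (`bow`, `cosine`, `rank_by_aspect`), and the toy papers are all hypothetical stand-ins for whatever semantic representation the system actually builds.

```python
import math
from collections import Counter


def bow(text):
    """Lower-cased bag-of-words count vector for one annotated aspect."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_by_aspect(query, papers, aspect):
    """Return paper ids sorted by similarity to the query on one aspect."""
    q = bow(query[aspect])
    scored = [(cosine(q, bow(p[aspect])), p["id"]) for p in papers]
    return [pid for _, pid in sorted(scored, reverse=True)]


# Toy annotations (invented for illustration only).
query = {"purpose": "reduce noise in sensor signals",
         "mechanism": "kalman filter"}
papers = [
    {"id": "A", "purpose": "reduce noise in audio signals",
     "mechanism": "spectral subtraction"},
    {"id": "B", "purpose": "classify images of birds",
     "mechanism": "kalman filter tracking"},
]

by_purpose = rank_by_aspect(query, papers, "purpose")
by_mechanism = rank_by_aspect(query, papers, "mechanism")
```

In this toy setup, matching on purpose alone surfaces paper A, which tackles a similar problem with a different mechanism; matching on mechanism surfaces paper B instead. Separating the aspects is what lets a search target one kind of similarity while ignoring the other.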
Supplemental Material
Data and code for Study 1 and 3
Index Terms
SOLVENT: A Mixed Initiative System for Finding Analogies between Research Papers