GeCoAgent: A Conversational Agent for Empowering Genomic Data Extraction and Analysis

Published: 15 October 2021

Abstract

With the availability of reliable and low-cost DNA sequencing, human genomics is relevant to a growing number of end-users, including biologists and clinicians. Typical interactions require applying comparative data analysis to huge repositories of genomic information for building new knowledge, taking advantage of the latest findings in applied genomics for healthcare. Powerful technology for data extraction and analysis is available, but broad use of the technology is hampered by the complexity of accessing such methods and tools.

This work presents GeCoAgent, a big-data service for clinicians and biologists. GeCoAgent uses a dialogic interface, animated by a chatbot, for supporting the end-users’ interaction with computational tools accompanied by multi-modal support. While the dialogue progresses, the user is accompanied in extracting the relevant data from repositories and then performing data analysis, which often requires the use of statistical methods or machine learning. Results are returned using simple representations (spreadsheets and graphics), while at the end of a session the dialogue is summarized in textual format. The innovation presented in this article is concerned with not only the delivery of a new tool but also our novel approach to conversational technologies, potentially extensible to other healthcare domains or to general data science.


Published in

ACM Transactions on Computing for Healthcare, Volume 3, Issue 1
January 2022
255 pages
ISSN: 2691-1957
EISSN: 2637-8051
DOI: 10.1145/3485154

Copyright © 2021 Association for Computing Machinery.

Publisher

Association for Computing Machinery
New York, NY, United States

Publication History

• Published: 15 October 2021
• Accepted: 1 April 2021
• Revised: 1 March 2021
• Received: 1 July 2020

Qualifiers

• research-article
• Refereed
