research-article
Open Access

Type-directed synthesis of visualizations from natural language queries

Published: 31 October 2022

Abstract

We propose a new program-synthesis technique for automatically generating visualizations from natural language queries. Our method parses the query into a refinement type specification using the intents-and-slots paradigm, then uses type-directed synthesis to generate the set of visualization programs most likely to match the user's intent. The refinement type system captures useful hints present in the query and lets the synthesis algorithm reject visualizations that violate well-established design guidelines for the input data set. We have implemented these ideas in a tool called Graphy and evaluated it on NLVCorpus, which consists of 3 popular datasets and over 700 real-world natural language queries. Our experiments show that Graphy significantly outperforms state-of-the-art natural-language-based visualization tools, including both transformer-based and rule-based ones.
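
To make the pipeline in the abstract concrete, here is a minimal Python sketch. It is not Graphy's implementation. The column names, the `PLOT_RULES` table, the keyword-matching `parse` function, and the `Spec` and `Vis` types are all hypothetical, and plain enumerate-and-filter stands in for the paper's type-directed search over refinement types. It only illustrates the shape of the idea: a query yields a partial specification, and candidates that contradict either the specification or a design rule are rejected.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical column kinds for a toy table; not taken from the paper.
COLUMNS = {"origin": "nominal", "horsepower": "quantitative", "mpg": "quantitative"}

# Hypothetical design rules: the column kinds each plot type accepts on (x, y).
# A y kind of None means the plot takes no y field.
PLOT_RULES = {
    "bar": ("nominal", "quantitative"),
    "scatter": ("quantitative", "quantitative"),
    "histogram": ("quantitative", None),
}


@dataclass(frozen=True)
class Spec:
    """Partial specification recovered from the query; plot=None means unconstrained."""
    plot: Optional[str]
    fields: frozenset


@dataclass(frozen=True)
class Vis:
    """A candidate visualization program: a plot type plus its field bindings."""
    plot: str
    x: str
    y: Optional[str]


def parse(query: str) -> Spec:
    """Keyword matching as a stand-in for an intents-and-slots parser (an assumption)."""
    words = query.lower().split()
    plot = next((p for p in PLOT_RULES if p in words), None)
    fields = frozenset(c for c in COLUMNS if c in words)
    return Spec(plot, fields)


def well_typed(vis: Vis, spec: Spec) -> bool:
    """Accept a candidate only if it obeys the design rules and the query's hints."""
    x_kind, y_kind = PLOT_RULES[vis.plot]
    if COLUMNS[vis.x] != x_kind:
        return False
    if (vis.y is None) != (y_kind is None):
        return False
    if vis.y is not None and COLUMNS[vis.y] != y_kind:
        return False
    if spec.plot is not None and vis.plot != spec.plot:
        return False
    used = {vis.x} | ({vis.y} if vis.y is not None else set())
    return spec.fields <= used


def synthesize(query: str) -> list:
    """Enumerate every candidate and keep those consistent with the specification."""
    spec = parse(query)
    candidates = []
    for plot in PLOT_RULES:
        for x in COLUMNS:
            candidates.append(Vis(plot, x, None))
            for y in COLUMNS:
                if y != x:
                    candidates.append(Vis(plot, x, y))
    return [v for v in candidates if well_typed(v, spec)]
```

In this sketch, `synthesize("show mpg by origin")` names no plot type, yet only a bar chart of `origin` against `mpg` survives, because the hypothetical rules reject every other pairing of a nominal and a quantitative column. A real system would prune the search using the types rather than enumerate every candidate, and would rank the survivors by likelihood.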

