research-article

PlanIT: planning and instantiating indoor scenes with relation graph and spatial prior networks

Published: 12 July 2019

Abstract

We present a new framework for interior scene synthesis that combines a high-level relation graph representation with spatial prior neural networks. We observe that prior work on scene synthesis is divided into two camps: object-oriented approaches (which reason about the set of objects in a scene and their configurations) and space-oriented approaches (which reason about what objects occupy what regions of space). Our insight is that the object-oriented paradigm excels at high-level planning of how a room should be laid out, while the space-oriented paradigm performs well at instantiating a layout by placing objects in precise spatial configurations. With this in mind, we present PlanIT, a layout-generation framework that divides the problem into two distinct planning and instantiation phases. PlanIT represents the "plan" for a scene via a relation graph, encoding objects as nodes and spatial/semantic relationships between objects as edges. In the planning phase, it uses a deep graph convolutional generative model to synthesize relation graphs. In the instantiation phase, it uses image-based convolutional network modules to guide a search procedure that places objects into the scene in a manner consistent with the graph. By decomposing the problem in this way, PlanIT generates scenes of comparable quality to those generated by prior approaches (as judged by both people and learned classifiers), while also providing the modeling flexibility of the intermediate relationship graph representation. These graphs allow the system to support applications such as scene synthesis from a partial graph provided by a user.
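As a rough illustration of the relation graph described above (objects as nodes, spatial/semantic relationships as edges), the following minimal sketch shows one way such a graph and a user-provided partial graph might be represented. It is not the authors' implementation: the class names, the `extends` helper, and the relation labels `"adjacent"` and `"supports"` are all hypothetical, and the generative model and the image-based instantiation modules are not modeled here.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Edge:
    src: int        # index of the source object node
    dst: int        # index of the target object node
    relation: str   # hypothetical label, e.g. "adjacent" or "supports"


@dataclass
class RelationGraph:
    categories: list[str] = field(default_factory=list)  # one category per object node
    edges: set[Edge] = field(default_factory=set)

    def add_node(self, category: str) -> int:
        """Append an object node and return its index."""
        self.categories.append(category)
        return len(self.categories) - 1

    def add_edge(self, src: int, dst: int, relation: str) -> None:
        """Add a directed, labeled relation between two existing nodes."""
        n = len(self.categories)
        if not (0 <= src < n and 0 <= dst < n):
            raise IndexError("edge endpoints must be existing nodes")
        self.edges.add(Edge(src, dst, relation))

    def relations_of(self, node: int) -> list[tuple[str, str]]:
        """Sorted (relation, target category) pairs for edges leaving `node`."""
        return sorted(
            (e.relation, self.categories[e.dst]) for e in self.edges if e.src == node
        )


def extends(full: RelationGraph, partial: RelationGraph) -> bool:
    """True if `full` keeps every node and edge of `partial` at the same indices."""
    k = len(partial.categories)
    return full.categories[:k] == partial.categories and partial.edges <= full.edges


# A user-provided partial graph: a bed with a nightstand next to it.
partial = RelationGraph()
bed = partial.add_node("bed")
stand = partial.add_node("nightstand")
partial.add_edge(bed, stand, "adjacent")

# A hand-written completion standing in for what a planning phase might produce.
full = RelationGraph(list(partial.categories), set(partial.edges))
lamp = full.add_node("lamp")
full.add_edge(stand, lamp, "supports")
```

Under these assumptions, `extends(full, partial)` checks that a completed graph preserves everything the user specified, which is the property a partial-graph completion would need to maintain.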

          Published in ACM Transactions on Graphics, Volume 38, Issue 4 (August 2019), 1480 pages
          ISSN: 0730-0301
          EISSN: 1557-7368
          DOI: 10.1145/3306346

          Copyright © 2019 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

          Publisher

          Association for Computing Machinery

          New York, NY, United States
