GRAINS: Generative Recursive Autoencoders for INdoor Scenes

Published: 14 February 2019

Abstract

We present a generative neural network that enables us to generate plausible 3D indoor scenes in large quantities and varieties, easily and highly efficiently. Our key observation is that indoor scene structures are inherently hierarchical. Hence, our network is not convolutional; it is a recursive neural network, or RvNN. Using a dataset of annotated scene hierarchies, we train a variational recursive autoencoder, or RvNN-VAE, which performs scene object grouping during its encoding phase and scene generation during decoding. Specifically, a set of encoders is recursively applied to group 3D objects based on support, surround, and co-occurrence relations in a scene, encoding information about objects’ spatial properties, semantics, and relative positioning with respect to other objects in the hierarchy. By training a variational autoencoder (VAE), the resulting fixed-length codes roughly follow a Gaussian distribution. A novel 3D scene can be generated hierarchically by the decoder from a code randomly sampled from the learned distribution. We coin our method GRAINS, for Generative Recursive Autoencoders for INdoor Scenes. We demonstrate the capability of GRAINS to generate plausible and diverse 3D indoor scenes and compare with existing methods for 3D scene synthesis. We show applications of GRAINS including 3D scene modeling from 2D layouts, scene editing, and semantic scene segmentation via PointNet, whose performance is boosted by the large quantity and variety of 3D scenes generated by our method.
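
The recursive encoding idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the class names (`Leaf`, `Group`, `RecursiveEncoder`), the code length `CODE_DIM`, the 4-value object features, the strictly binary grouping, and the untrained random weights are all assumptions made for illustration. It shows only that one encoder per relation type, applied bottom-up over a scene hierarchy, yields a fixed-length code at the root regardless of how many objects the scene contains, and that a VAE-style latent can then be drawn with the standard reparameterization step.

```python
import math
import random

CODE_DIM = 4  # hypothetical fixed code length


def make_linear(n_in, n_out, rng):
    """Untrained random weights (n_out x n_in) and zero bias, for illustration only."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def apply_linear(layer, x):
    """Affine map followed by tanh, so every output lies in [-1, 1]."""
    weights, bias = layer
    return [math.tanh(sum(w * v for w, v in zip(row, x)) + b)
            for row, b in zip(weights, bias)]


class Leaf:
    """A single object; `features` is a hypothetical size/label vector."""
    def __init__(self, features):
        self.features = features


class Group:
    """An internal hierarchy node joining two children under one relation."""
    def __init__(self, relation, left, right):
        self.relation = relation
        self.left = left
        self.right = right


class RecursiveEncoder:
    """One leaf encoder plus one merge encoder per relation type."""
    def __init__(self, leaf_dim, relations, seed=0):
        rng = random.Random(seed)
        self.leaf_enc = make_linear(leaf_dim, CODE_DIM, rng)
        self.group_enc = {r: make_linear(2 * CODE_DIM, CODE_DIM, rng)
                          for r in relations}

    def encode(self, node):
        # Leaves map features to a code; groups merge their children's codes.
        if isinstance(node, Leaf):
            return apply_linear(self.leaf_enc, node.features)
        left = self.encode(node.left)
        right = self.encode(node.right)
        return apply_linear(self.group_enc[node.relation], left + right)


def sample_latent(mu, log_var, rng):
    """Reparameterization: z = mu + sigma * eps, with eps ~ N(0, 1)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]


# Hypothetical four-object bedroom hierarchy.
bed = Leaf([2.0, 0.5, 1.6, 0.0])
nightstand = Leaf([0.5, 0.5, 0.5, 1.0])
desk = Leaf([1.2, 0.7, 0.6, 2.0])
lamp = Leaf([0.2, 0.4, 0.2, 3.0])
scene = Group("co-occurrence",
              Group("surround", bed, nightstand),
              Group("support", desk, lamp))

encoder = RecursiveEncoder(leaf_dim=4,
                           relations=["support", "surround", "co-occurrence"])
root_code = encoder.encode(scene)

# In a trained VAE, mu and log_var would both be predicted from the root code;
# here the root code stands in for mu and log_var is a placeholder of zeros.
z = sample_latent(root_code, [0.0] * CODE_DIM, random.Random(1))
```

A decoder would run the same recursion in reverse, splitting a code into child codes until it reaches leaves; that half is omitted here because the abstract gives no detail on how node types are predicted during decoding.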

      • Published in

        ACM Transactions on Graphics  Volume 38, Issue 2
        April 2019
        112 pages
        ISSN:0730-0301
        EISSN:1557-7368
        DOI:10.1145/3313807

        Copyright © 2019 ACM

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 14 February 2019
        • Accepted: 1 December 2018
        • Revised: 1 October 2018
        • Received: 1 July 2018

        Qualifiers

        • research-article
        • Research
        • Refereed
