DOI: 10.1145/3450618.3469163
Poster

Text-Based Motion Synthesis with a Hierarchical Two-Stream RNN

Published: 06 August 2021

ABSTRACT

We present a learning-based method for generating animated 3D pose sequences that depict multiple sequential or superimposed actions described in long, compositional sentences. We propose a hierarchical two-stream sequential model that learns a finer, joint-level mapping between natural-language sentences and the corresponding 3D pose sequences. We learn two manifold representations of the motion: one for the upper-body movements and one for the lower-body movements. We evaluate our model on the publicly available KIT Motion-Language Dataset [5], which pairs 3D pose data with human-annotated sentences. Experimental results show that our model advances the state of the art in text-based motion synthesis by a margin of 50% on objective evaluations.
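To make the two-stream design concrete, the sketch below (PyTorch) shows one plausible shape for such a decoder: a shared sentence embedding, e.g. a pooled BERT encoding [2], initializes two recurrent streams, one for the upper body and one for the lower body, and their per-frame outputs are concatenated into a full-body pose. All names, the choice of GRUs, and the joint counts here are illustrative assumptions, not the authors' implementation.

    import torch
    import torch.nn as nn

    class TwoStreamPoseDecoder(nn.Module):
        """Hypothetical two-stream pose decoder: a shared sentence code
        drives separate upper-body and lower-body GRU streams whose
        outputs are concatenated into full-body poses per frame."""

        def __init__(self, sent_dim=768, hidden_dim=256,
                     upper_joints=14, lower_joints=10, joint_dim=3):
            super().__init__()
            # Project the sentence embedding into each stream's manifold.
            self.upper_init = nn.Linear(sent_dim, hidden_dim)
            self.lower_init = nn.Linear(sent_dim, hidden_dim)
            # One recurrent stream per body half (GRUs assumed here).
            self.upper_rnn = nn.GRU(hidden_dim, hidden_dim, batch_first=True)
            self.lower_rnn = nn.GRU(hidden_dim, hidden_dim, batch_first=True)
            # Per-stream regressors to 3D joint positions.
            self.upper_out = nn.Linear(hidden_dim, upper_joints * joint_dim)
            self.lower_out = nn.Linear(hidden_dim, lower_joints * joint_dim)
            self.hidden_dim = hidden_dim

        def forward(self, sent_emb, num_frames):
            # sent_emb: (batch, sent_dim), e.g. a pooled BERT embedding.
            batch = sent_emb.size(0)
            h_up = torch.tanh(self.upper_init(sent_emb)).unsqueeze(0)
            h_lo = torch.tanh(self.lower_init(sent_emb)).unsqueeze(0)
            # Feed the projected sentence code at every time step.
            in_up = h_up.transpose(0, 1).expand(batch, num_frames, self.hidden_dim)
            in_lo = h_lo.transpose(0, 1).expand(batch, num_frames, self.hidden_dim)
            up_seq, _ = self.upper_rnn(in_up.contiguous(), h_up.contiguous())
            lo_seq, _ = self.lower_rnn(in_lo.contiguous(), h_lo.contiguous())
            # Concatenate the streams into full-body poses:
            # (batch, num_frames, (upper_joints + lower_joints) * joint_dim).
            return torch.cat([self.upper_out(up_seq),
                              self.lower_out(lo_seq)], dim=-1)

    # Usage: decode 60 frames of full-body pose from one sentence code.
    model = TwoStreamPoseDecoder()
    sentence = torch.randn(1, 768)          # stand-in for a BERT embedding
    poses = model(sentence, num_frames=60)  # shape (1, 60, 72)

Keeping the two body halves in separate streams gives each its own latent motion manifold, which is the property the abstract attributes to the hierarchical two-stream design.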


Supplemental Material

3450618.3469163.mp4 (video)

References

  1. Chaitanya Ahuja and Louis-Philippe Morency. 2019. Language2Pose: Natural Language Grounded Pose Forecasting. In 2019 International Conference on 3D Vision (3DV). 719–728. https://doi.org/10.1109/3DV.2019.00084
  2. Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. arXiv preprint arXiv:1810.04805 (2018).
  3. Eva Hanser, Paul Mc Kevitt, Tom Lunney, and Joan Condell. 2009. SceneMaker: Intelligent Multimodal Visualisation of Natural Language Scripts. In Irish Conference on Artificial Intelligence and Cognitive Science. Springer, 144–153.
  4. Angela S. Lin, Lemeng Wu, Rodolfo Corona, Kevin Tai, Qixing Huang, and Raymond J. Mooney. 2018. Generating Animated Videos of Human Activities from Natural Language Descriptions. In Visually Grounded Interaction and Language Workshop, NeurIPS.
  5. Matthias Plappert, Christian Mandery, and Tamim Asfour. 2016. The KIT Motion-Language Dataset. Big Data 4, 4 (Dec. 2016), 236–252. https://doi.org/10.1089/big.2016.0028

Published in

SIGGRAPH '21: ACM SIGGRAPH 2021 Posters
August 2021, 90 pages
ISBN: 9781450383714
DOI: 10.1145/3450618

      Copyright © 2021 Owner/Author

      Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery, New York, NY, United States

      Publication History

      • Published: 6 August 2021


      Qualifiers

      • poster
      • Research
      • Refereed limited

      Acceptance Rates

Overall acceptance rate: 1,822 of 8,601 submissions, 21%
