research-article
Open Access

Disentangling random and cyclic effects in time-lapse sequences

Published: 22 July 2022

Abstract

Time-lapse image sequences offer visually compelling insights into dynamic processes that are too slow to observe in real time. However, playing a long time-lapse sequence back as a video often results in distracting flicker due to random effects, such as weather, as well as cyclic effects, such as the day-night cycle. We introduce the problem of disentangling time-lapse sequences in a way that allows separate, after-the-fact control of overall trends, cyclic effects, and random effects in the images, and describe a technique based on data-driven generative models that achieves this goal. This enables us to "re-render" the sequences in ways that would not be possible with the input images alone. For example, we can stabilize a long sequence to focus on plant growth over many months, under selectable, consistent weather.

Our approach is based on Generative Adversarial Networks (GANs) conditioned on the time coordinate of the time-lapse sequence. Our architecture and training procedure are designed so that the networks learn to model random variations, such as weather, using the GAN's latent space, and to disentangle overall trends and cyclic variations by feeding the conditioning time label to the model using Fourier features with specific frequencies.
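
As an illustration of this conditioning scheme, the sketch below (Python/NumPy, not the authors' code; the chosen periods, the trend term, and the way the features are combined with the latent code are illustrative assumptions) shows one way a scalar time label could be mapped to Fourier features whose frequencies match the cycles to be disentangled.

    import numpy as np

    def time_fourier_features(t_days, periods=(1.0, 365.25)):
        # Encode a timestamp (in days) as sin/cos pairs at fixed frequencies:
        # a 1-day period for the day-night cycle and a ~365-day period for the
        # yearly cycle. A slowly varying term is appended so that the model can
        # also capture the overall trend.
        feats = []
        for p in periods:
            phase = 2.0 * np.pi * t_days / p
            feats.extend([np.sin(phase), np.cos(phase)])
        feats.append(t_days / 365.25)  # low-frequency trend component
        return np.asarray(feats, dtype=np.float32)

    # Example: noon on the 120th day of the capture.
    cond = time_fourier_features(120.5)
    # In a conditional GAN, `cond` would be supplied alongside the latent code z,
    # leaving z free to model random variation such as weather while the Fourier
    # features carry the cyclic components of the time label.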

We show that our models are robust to defects in the training data, enabling us to amend some of the practical difficulties in capturing long time-lapse sequences, such as temporary occlusions, uneven frame spacing, and missing frames.
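
One hedged reading of this robustness is that conditioning on a continuous timestamp lets the training set be treated as an unordered collection of (image, timestamp) pairs, so missing frames and uneven spacing only change the sampling density rather than breaking a fixed frame index. The minimal sketch below assumes a hypothetical file-naming convention of the form t<days>.jpg.

    import glob, os, random

    def load_training_pairs(directory):
        # Hypothetical layout: each frame is stored as "t<days>.jpg",
        # e.g. "t0120.50.jpg". Frames that were never captured simply
        # do not appear in the listing.
        pairs = []
        for path in glob.glob(os.path.join(directory, "t*.jpg")):
            t_days = float(os.path.basename(path)[1:-len(".jpg")])
            pairs.append((path, t_days))
        return pairs

    pairs = load_training_pairs("timelapse_frames")
    path, t_days = random.choice(pairs)  # draw one (frame, timestamp) training sample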


Supplemental Material

3528223.3530170.mp4 (presentation)



Published in

ACM Transactions on Graphics, Volume 41, Issue 4 (July 2022), 1978 pages
ISSN: 0730-0301
EISSN: 1557-7368
DOI: 10.1145/3528223

          Copyright © 2022 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

          Publisher

          Association for Computing Machinery

          New York, NY, United States


