🔎 3D Motion Magnification:
Visualizing Subtle Motions with Time-Varying Neural Fields

1University of Maryland   2Google Research   3Massachusetts Institute of Technology
International Conference on Computer Vision (ICCV) 2023

We deploy our method on in-the-wild videos containing subtle motion, ranging from a sleeping baby breathing to celebrities attempting the Mannequin Challenge.


Abstract

Video motion magnification helps us visualize subtle, imperceptible motion. Prior methods, however, are only applicable to 2D videos. We present 3D motion magnification techniques that allow us to magnify subtle motions in dynamic scenes while supporting rendering from novel views. Our core idea is to represent the dynamic scene with time-varying radiance fields and leverage the Eulerian principle for motion magnification to analyze and amplify the embedding features from a fixed point over time. We study and validate the capability of 3D motion magnification for both implicit and explicit/hybrid NeRF models. We evaluate the effectiveness of our approaches on both synthetic and real-world dynamic scenes under various capture setups.


Approach

Overview figure:

a. We adopt the tri-plane representation, which associates each 3D point with a feature vector; the feature vector is fed to an MLP that produces the color and opacity used for volume rendering.

b. To represent the dynamic scene, we learn one feature tri-plane for each observed timestep. All timesteps share the same MLP that decodes color and opacity, so the tri-plane features are solely responsible for producing the subtle temporal variations. We first learn a tri-plane for a single timestep as the initialization, then finetune the features for each remaining timestep. After learning the feature tri-planes, we split them into three feature videos and process each one with phase-based video motion magnification, yielding three motion-magnified feature videos.

c. We recompose the three motion-magnified feature videos into a single motion-magnified tri-plane, which can be used for volume rendering without further modification.
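As a rough illustration of step b, the sketch below applies a simplified variant of phase-based Eulerian magnification to one tri-plane feature video. It uses global Fourier phases per frame rather than the local phases of a complex steerable pyramid used by phase-based video magnification, and the function name, temporal band, and array shapes are assumptions for illustration only.

import numpy as np

def phase_magnify_plane_video(planes, alpha, f_lo, f_hi, fps):
    """Simplified phase-based Eulerian magnification of a feature-plane video.

    planes: (T, H, W, C) features of one tri-plane over T timesteps.
    alpha:  magnification factor.
    f_lo, f_hi: temporal band to amplify, in Hz; fps: capture frame rate.
    """
    T = planes.shape[0]
    freqs = np.fft.fftfreq(T, d=1.0 / fps)
    band = ((np.abs(freqs) >= f_lo) & (np.abs(freqs) <= f_hi)).astype(float)
    out = np.empty_like(planes)
    for c in range(planes.shape[-1]):
        # per-frame spatial FFT of this feature channel
        spec = np.fft.fft2(planes[..., c], axes=(1, 2))
        amp, phase = np.abs(spec), np.unwrap(np.angle(spec), axis=0)
        # temporally band-pass the phase of each spatial frequency
        delta = np.fft.ifft(np.fft.fft(phase, axis=0) * band[:, None, None], axis=0).real
        # amplify the band-passed phase variation and reconstruct the frames
        out[..., c] = np.fft.ifft2(amp * np.exp(1j * (phase + alpha * delta)), axes=(1, 2)).real
    return out

The same routine would be run independently on each of the three feature-plane videos before recomposing them into the motion-magnified tri-plane.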


Single-camera results

We demonstrate successful 3D motion magnification on various real-world scenes with different subtle motions, scene compositions, and handheld video captures in the wild.

Multi-camera results

We deploy our method on multi-camera captures, including 8 synthetic scenes generated with Blender, and 2 real-world multi-camera scenes focused on human subjects.

Stabilization from handheld captures


The prior 2D method (phase-based Eulerian) fails on a handheld-captured video with camera shake, as it assumes stabilized capture. Our method benefits from having a 3D representation and separates camera motion from scene motion.


Rendering from different poses


We provide motion-magnified renderings at Tracked Poses, which are estimated from the shaky handheld capture, and at a Fixed Pose, which is a static viewpoint.


Frequency selection


We capture two tuning forks with different vibration frequencies (Left: 64 Hz, Right: 128 Hz).
By temporally filtering the point embeddings, we can selectively amplify different frequencies.
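As a minimal sketch of this frequency selection, one could band-pass each point's embedding trajectory before amplification; an ideal FFT band-pass stands in here for whatever temporal filter is actually used, and the function name and shapes are illustrative assumptions.

import numpy as np

def bandpass_embeddings(embeddings, f_lo, f_hi, fps):
    """Keep only temporal frequencies in [f_lo, f_hi] Hz for one point's
    embedding trajectory of shape (T, D)."""
    T = embeddings.shape[0]
    freqs = np.fft.rfftfreq(T, d=1.0 / fps)
    mask = ((freqs >= f_lo) & (freqs <= f_hi)).astype(float)
    spec = np.fft.rfft(embeddings, axis=0)
    return np.fft.irfft(spec * mask[:, None], n=T, axis=0)

Choosing a band around 64 Hz then isolates the left fork, while a band around 128 Hz isolates the right fork (assuming the capture frame rate is high enough to sample those frequencies).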


Varying the magnification factor


We visualize the impact of varying the magnification factor.

Comparisons of different magnification strategies

Using Positional Encoding as Point Embedding Function

Position Shift predicts a 3D displacement for the input point before positional encoding, while Encoding Shift predicts a phase shift within each sine wave for the input point during positional encoding. We perform Linear Eulerian magnification by amplifying the temporal variations of the predicted shifts. Position Shift leads to false motions, while Encoding Shift reduces such artifacts.
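The following sketch contrasts the two embedding variants under a standard NeRF-style sinusoidal positional encoding; the exact encoding and the shapes of the predicted shifts are assumptions for illustration.

import numpy as np

def posenc(x, num_freqs):
    """Standard sinusoidal positional encoding of a 3D point x (shape (3,))."""
    freqs = (2.0 ** np.arange(num_freqs)) * np.pi   # (L,)
    angles = freqs[:, None] * x[None, :]            # (L, 3)
    return np.concatenate([np.sin(angles), np.cos(angles)]).ravel()

def position_shift(x, delta_x, num_freqs):
    """Position Shift: displace the point in 3D, then encode."""
    return posenc(x + delta_x, num_freqs)

def encoding_shift(x, phi, num_freqs):
    """Encoding Shift: add a per-frequency phase shift phi (shape (L, 3))
    inside the sinusoids during encoding."""
    freqs = (2.0 ** np.arange(num_freqs)) * np.pi
    angles = freqs[:, None] * x[None, :] + phi      # (L, 3)
    return np.concatenate([np.sin(angles), np.cos(angles)]).ravel()

Linear Eulerian magnification then amplifies the temporal deviations of the predicted delta_x(t) or phi(t) from their temporal means before re-encoding.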


Using Tri-Plane as Point Embedding Function

Linear - Tri-Plane applies linear Eulerian magnification on tri-plane features.
Phase - Tri-Plane applies phase-based Eulerian magnification on tri-plane features.
Linear - Tri-Plane causes clipped intensities, while Phase - Tri-Plane reduces such artifacts.
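For contrast with the phase-based sketch above, linear Eulerian magnification on tri-plane features simply amplifies each feature's deviation from its temporal mean (optionally after temporal band-passing); a minimal sketch, with illustrative names:

import numpy as np

def linear_magnify_features(features, alpha):
    """Linear Eulerian magnification: amplify deviations of per-timestep
    features (e.g. a (T, H, W, C) tri-plane feature video) from their
    temporal mean."""
    mean = features.mean(axis=0, keepdims=True)
    return mean + alpha * (features - mean)

Large magnification factors can push features far outside the range the shared MLP saw during training, which may explain the clipped intensities observed for Linear - Tri-Plane.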


Comparisons to video-based magnifications

Observed shows Blender renderings of scenes with subtle object motions. Ground Truth shows Blender renderings in which the true object motions are artificially amplified.
Linear - Video and Phase - Video are obtained by applying 2D magnification methods to the non-magnified RGB videos rendered by NeRF. In general, 3D methods that perform magnification in the embedding space produce fewer artifacts than 2D methods. Furthermore, 2D methods require rendering at a fixed viewpoint and re-running video magnification for every new rendering. In contrast, magnification on tri-planes is performed only once, and the motion-magnified tri-planes can then be rendered from new viewpoints.


BibTeX

@inproceedings{feng2023motionmag,
  author    = {Feng, Brandon Y. and AlZayer, Hadi and Rubinstein, Michael and Freeman, William T. and Huang, Jia-Bin},
  title     = {3D Motion Magnification: Visualizing Subtle Motions with Time-Varying Neural Fields},
  booktitle = {International Conference on Computer Vision (ICCV)},
  year      = {2023},
}

Template from Nerfies