Dynamic 3D scene reconstruction is essential for immersive media such as VR, MR, and XR, yet it remains challenging for long multi-view sequences with large-scale motion. Existing dynamic Gaussian approaches are either Frame-Stream methods, which offer scalability but poor temporal stability, or Clip methods, which achieve local consistency at the cost of high memory usage and limited sequence length. We propose ClipGStream, a hybrid reconstruction framework that performs stream optimization at the clip level rather than the frame level. The sequence is divided into short clips; within each clip, dynamic motion is modeled by a clip-independent spatio-temporal field with residual anchor compensation to capture local variations efficiently, while anchors and decoders inherited across clips maintain structural consistency. This Clip-Stream design enables scalable, flicker-free reconstruction of long dynamic videos with high temporal coherence and reduced memory overhead. Extensive experiments demonstrate that ClipGStream achieves state-of-the-art reconstruction quality and efficiency.
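Because the abstract compresses the pipeline into a single sentence, a minimal sketch of the intended control flow may help. This is a sketch under stated assumptions, not the authors' implementation: it assumes a simple list-based anchor representation, and every name in it (ClipState, fit_clip_field, fit_residual_anchors) is a hypothetical placeholder.

```python
# A minimal control-flow sketch of the Clip-Stream idea described in the
# abstract. All names below are hypothetical placeholders, not the
# authors' actual API.
from dataclasses import dataclass, field


@dataclass
class ClipState:
    """State inherited between clips to keep structure consistent."""
    anchors: list = field(default_factory=list)   # inherited structural anchors
    decoders: dict = field(default_factory=dict)  # shared feature decoders


def fit_clip_field(clip, anchors):
    """Placeholder: optimize a clip-independent spatio-temporal field."""
    return {"frames": len(clip), "anchors_used": len(anchors)}


def fit_residual_anchors(clip, st_field, anchors):
    """Placeholder: residual anchors compensating for local variations
    the spatio-temporal field misses."""
    return [f"residual@{clip[0]}"]


def reconstruct(frames, clip_len=10):
    """Split the sequence into short clips and stream-optimize clip by clip."""
    state = ClipState()
    results = []
    for start in range(0, len(frames), clip_len):
        clip = frames[start:start + clip_len]
        st_field = fit_clip_field(clip, state.anchors)      # local motion
        residuals = fit_residual_anchors(clip, st_field, state.anchors)
        state.anchors += residuals   # inherit anchors for the next clip
        results.append((st_field, list(state.anchors)))
    return results


if __name__ == "__main__":
    # Toy sequence of 35 "frames" split into four clips of up to 10 frames.
    for st_field, anchors in reconstruct(list(range(35)), clip_len=10):
        print(st_field, anchors)
```

Under this reading, per-clip fields bound peak optimization memory to a single clip, the inherited anchor/decoder state is what suppresses inter-clip flicker, and random access reduces to loading the state of the clip containing the queried timestamp.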
Figure: Teaser comparison on a long, large-motion sequence (panels: Ours; LocalDyGS (one clip); LocalDyGS (140 clips); 4DGaussian (one clip); Ours (140 clips)). Unlike prior approaches, our method is the first to achieve temporal consistency on long-sequence, large-motion datasets while supporting efficient random access, attaining higher fidelity for both dynamic objects (e.g., athletes) and static regions (e.g., floor textures).
Figure: Qualitative comparison (panels: 3DGStream; SpaceTimeGS; 4DGaussian; Ours). Our method achieves better dynamic modeling of fine-grained motion, such as the man's hand movements and the dogs' facial expressions.
Figure: Additional qualitative comparison (panels: 4DGaussian; Grid4D; LocalDyGS; Ours).