We introduce P-GSVC, a layered progressive 2D Gaussian splatting (2DGS) framework for scalable image and video representation.
Abstract
Gaussian splatting has emerged as an effective explicit representation for image and video reconstruction. In this work, we present P-GSVC, the first layered progressive 2D Gaussian splatting framework that provides a unified solution for scalable Gaussian representation in both images and videos. Our method organizes 2D Gaussian splats into a layered hierarchy, where the base layer reconstructs coarse structures and successive enhancement layers progressively refine details. To effectively optimize this layered representation, we propose a joint training strategy that simultaneously updates Gaussians across layers while aligning their optimization trajectories, ensuring consistency and robustness throughout training. The proposed model naturally supports progressive scalability in terms of both quality and resolution. Extensive experiments on both image and video datasets demonstrate that P-GSVC achieves high visual quality, robust layer consistency, and reliable progressive reconstruction with less than 3% overhead compared to non-scalable baselines, and achieves nearly 50% gains over existing training strategies for layered Gaussians.
Motivation
Existing 2DGS methods are primarily designed for image and video representation, without considering the challenges of scalable coding. Simple pruning-based “layers” often introduce holes and degrade visual quality. We therefore build a layered progressive 2DGS in which the base layer preserves the complete scene structure and higher layers only add details, supported by joint training to ensure consistent and robust progressive decoding.
Methodology
We propose P-GSVC, a layered progressive 2D Gaussian Splatting (2DGS) framework for scalable image and video. We organize Gaussians into multiple layers: the base layer captures the main structure of the scene, and enhancement layers progressively add details. To train the layered representation reliably, we use a joint training strategy. In each iteration, we optimize two targets in parallel: a full reconstruction using all layers, and an intermediate reconstruction using only layers up to a cyclically chosen level. This keeps lower layers reliable and ensures that quality improves smoothly as higher layers are added.
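As a rough illustration of this joint strategy, the sketch below shows one training step in PyTorch-style code. The `render` rasterizer, the per-layer parameter tensors, the simple MSE objective, and the loss weight `lam` are all simplifying assumptions for illustration, not the exact components used in P-GSVC.

```python
import torch
import torch.nn.functional as F

def joint_training_step(layers, target, render, optimizer, step, lam=1.0):
    """One hypothetical joint-training iteration.

    layers : list of per-layer Gaussian parameter tensors, layers[0] is the
             base layer (assumes at least one enhancement layer).
    target : ground-truth image/frame tensor.
    render : assumed differentiable rasterizer mapping Gaussians to an image.
    """
    num_layers = len(layers)

    # Full reconstruction using all layers.
    full = render(torch.cat(layers, dim=0))
    loss_full = F.mse_loss(full, target)

    # Intermediate reconstruction truncated at a cyclically chosen level k,
    # so every prefix of layers is repeatedly supervised during training.
    k = step % (num_layers - 1) + 1          # k in {1, ..., num_layers - 1}
    partial = render(torch.cat(layers[:k], dim=0))
    loss_partial = F.mse_loss(partial, target)

    # Optimizing both targets in parallel keeps lower layers self-contained
    # while higher layers learn residual detail.
    loss = loss_full + lam * loss_partial
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```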
Results
Image Scalability
P-GSVC consistently outperforms LIG, demonstrating the effectiveness of our joint training strategy over the sequential layer-wise approach in image representation tasks.
Video Scalability
P-GSVC also supports progressive video decoding. Intermediate layers remain reliable, while higher layers refine quality consistently across frames.
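A minimal sketch of what progressive decoding could look like on the receiver side is given below, reusing the same hypothetical `render` rasterizer and per-layer tensors as in the training sketch; the function names and data layout are assumptions for illustration only.

```python
import torch

def decode_video(video_layers, render, num_received_layers):
    """Reconstruct every frame from only the first k layers.

    video_layers : list over frames, each a list of per-layer Gaussian tensors
                   (index 0 is the base layer) -- assumed layout.
    num_received_layers : how many layers the client has received so far.
    """
    frames = []
    for frame_layers in video_layers:
        kept = frame_layers[:num_received_layers]   # base + received enhancements
        frames.append(render(torch.cat(kept, dim=0)))
    return frames

# Usage: a low-bitrate client renders with num_received_layers=1 (base only)
# and progressively upgrades quality as enhancement layers arrive.
```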
Rate–Distortion Trade-off
We report rate–distortion (RD) performance to compare P-GSVC with baselines. P-GSVC achieves competitive RD performance while providing scalable decoding, with only a small overhead (<3%) compared to non-scalable training (Monolithic).