P-GSVC:
Layered Progressive 2D Gaussian Splatting for Scalable Image and Video

National University of Singapore
The 17th ACM Multimedia Systems Conference (MMSys 2026)
Teaser figure: reconstructions from Layer 1, Layer 2, and Layer 3.

We introduce P-GSVC, a layered progressive 2D Gaussian splatting (2DGS) framework for scalable image and video representation.

Abstract

Gaussian splatting has emerged as an effective explicit representation for image and video reconstruction. In this work, we present P-GSVC, the first layered progressive 2D Gaussian splatting framework that provides a unified solution for scalable Gaussian representation in both images and videos. Our method organizes 2D Gaussian splats into a layered hierarchy, where the base layer reconstructs coarse structures and successive enhancement layers progressively refine details. To optimize this layered representation effectively, we propose a joint training strategy that simultaneously updates Gaussians across layers while aligning their optimization trajectories, ensuring consistency and robustness throughout training. The proposed framework naturally supports progressive scalability in both quality and resolution. Extensive experiments on image and video datasets demonstrate that P-GSVC achieves high visual quality, robust layer consistency, and reliable progressive reconstruction with less than 3% overhead compared to non-scalable baselines, and obtains nearly 50% gains compared to existing training strategies for layered Gaussians.

Motivation

Motivation: pruned vs layered 2DGS

Existing 2DGS methods are primarily designed for image and video representation, without considering the challenges of scalable coding. Simple pruning-based “layers” often introduce holes and degrade visual quality. We therefore build a layered progressive 2DGS in which the base layer preserves the complete scene structure and higher layers only add details, supported by joint training to ensure consistent and robust progressive decoding.

Methodology

P-GSVC pipeline

We propose P-GSVC, a layered progressive 2D Gaussian Splatting (2DGS) framework for scalable image and video. We organize Gaussians into multiple layers: the base layer captures the main structure of the scene, and enhancement layers progressively add details. To train the layered representation reliably, we use a joint training strategy. In each iteration, we optimize two targets in parallel: a full reconstruction using all layers, and an intermediate reconstruction using only layers up to a cyclically chosen level. This keeps lower layers reliable and lets quality improve smoothly as higher layers are added.
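The joint training loop above can be sketched with a toy model. This is a minimal illustration, not the paper's implementation: each "layer" is modeled as an additive residual image standing in for a set of 2D Gaussian splats, and the analytic MSE gradient replaces backpropagation through a differentiable Gaussian rasterizer. The two optimization targets (full reconstruction and a cyclically chosen intermediate reconstruction) follow the description above.

```python
import numpy as np

# Toy sketch of the joint training strategy. Assumption: each "layer" is an
# additive residual image standing in for a set of 2D Gaussians; the real
# method optimizes Gaussian parameters through a differentiable rasterizer.

rng = np.random.default_rng(0)
target = rng.random((8, 8))                 # image to represent
num_layers = 3
layers = [np.zeros_like(target) for _ in range(num_layers)]
lr = 0.1

def render(upto):
    """Progressive reconstruction: sum layer contributions from the base up to `upto`."""
    return sum(layers[:upto])

for it in range(600):
    # Target 1: full reconstruction using all layers.
    err_full = render(num_layers) - target
    # Target 2: intermediate reconstruction at a cyclically chosen level.
    k = it % num_layers + 1                 # cycles through 1, 2, ..., num_layers
    err_part = render(k) - target
    # Analytic gradient of the two squared-error losses for this toy model:
    # every layer receives the full-reconstruction error; layers below the
    # chosen level k also receive the intermediate-reconstruction error.
    for i in range(num_layers):
        grad = err_full + (err_part if i < k else 0.0)
        layers[i] -= lr * grad

mse_base = float(np.mean((render(1) - target) ** 2))          # base layer alone
mse_full = float(np.mean((render(num_layers) - target) ** 2)) # all layers
print(f"base-layer MSE: {mse_base:.5f}  full MSE: {mse_full:.5f}")
```

In this toy, both the base-layer and full reconstructions converge, mirroring the intended behavior: intermediate decodes stay reliable rather than collapsing into holes, because every truncation level is periodically trained as its own reconstruction target.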

Results

Image Scalability

P-GSVC consistently outperforms LIG, demonstrating the effectiveness of our joint training strategy over the sequential layer-wise approach in image representation tasks.

Image scalability results

Video Scalability

P-GSVC also supports progressive video decoding. Intermediate layers remain reliable, while higher layers refine quality consistently across frames.

Video scalability comparison across methods: Pruning, Sequential, P-GSVC, and Monolithic.

Rate–Distortion Trade-off

Rate–distortion trade-off results

We report rate–distortion (RD) performance to compare P-GSVC with baselines. P-GSVC achieves competitive RD while providing scalable decoding, with only a small overhead (<3%) versus non-scalable training (Monolithic).