
CLIPGaussian: Universal and Multimodal Style Transfer Based on Gaussian Splatting
- E3DGSAR
- Computer vision, 3DGS, Style transfer
- November 30, 2025
Abstract
We present CLIPGaussian, a framework that performs universal, multimodal style transfer by operating directly on Gaussian primitives, the shared representation underlying modern 2D, 3D, and 4D Gaussian Splatting pipelines. Instead of relying on heavy generative models, CLIPGaussian uses CLIP-driven text and image cues to steer both the appearance and, when beneficial, the geometry of Gaussian elements. This makes stylization a geometry-aware process: colors shift in locally meaningful ways, textures reorganize to match the target style, and structural details adapt subtly without disrupting the underlying scene. Because the method works natively on Gaussian fields, the same mechanism applies seamlessly to single images, complex 3D objects, temporally coherent videos, and dynamic 4D reconstructions.
Across modalities, CLIPGaussian produces:
- Consistent style transfer guided by natural language or reference images,
- Coherent transformations of color, texture, and fine detail without model retraining,
- Stable temporal behavior in videos and geometry-preserving edits in 3D and 4D scenes.
In essence, CLIPGaussian unifies stylistic manipulation across visual data types by treating Gaussian primitives as a flexible canvas for multimodal guidance. This enables expressive, efficient, and broadly applicable stylization while maintaining the compactness and differentiability that make Gaussian-based representations so powerful.
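To make the idea of steering Gaussian primitives with an embedding-space cue more concrete, here is a minimal NumPy sketch. It is not the authors' implementation, and it rests on several assumptions: the `encode` function is a fixed random linear projection standing in for a CLIP image encoder, `target` is a random unit vector standing in for a CLIP text or image embedding, the geometry (`centers`, `scales`) is frozen so that only colors are optimized, and all sizes and the learning rate are arbitrary. The gradient is written out by hand so the example runs without an autodiff library.

```python
import numpy as np

rng = np.random.default_rng(0)
H, W, N, D = 16, 16, 32, 64  # canvas height/width, Gaussian count, embedding size

# Hypothetical 2D Gaussian primitives: fixed geometry, learnable RGB colors.
centers = rng.uniform(0.0, 1.0, size=(N, 2))
scales = rng.uniform(0.05, 0.15, size=N)
colors = rng.uniform(0.0, 1.0, size=(N, 3))

# Pixel coordinates on the unit square, flattened to (H*W, 2).
ys, xs = np.meshgrid(np.linspace(0, 1, H), np.linspace(0, 1, W), indexing="ij")
pixels = np.stack([xs.ravel(), ys.ravel()], axis=1)

# Normalized splatting weights: every pixel is a blend of Gaussian colors.
d2 = ((pixels[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)  # (H*W, N)
weights = np.exp(-0.5 * d2 / scales[None, :] ** 2)
weights /= weights.sum(axis=1, keepdims=True) + 1e-8


def render(c):
    """Blend Gaussian colors into a flattened image of shape (H*W, 3)."""
    return weights @ c


# ASSUMPTION: a random linear map stands in for a CLIP image encoder.
proj = rng.normal(size=(D, H * W * 3)) / np.sqrt(H * W * 3)


def encode(img):
    return proj @ img.ravel()


# ASSUMPTION: a random unit vector stands in for a CLIP text/image embedding.
target = rng.normal(size=D)
target /= np.linalg.norm(target)


def style_loss(c):
    """Return 1 - cosine(encode(render(c)), target) and its gradient w.r.t. c."""
    e = encode(render(c))
    n = np.linalg.norm(e)
    cos = float(e @ target) / n
    g_e = target / n - cos * e / n**2            # d cos / d e (target has unit norm)
    g_img = (proj.T @ g_e).reshape(H * W, 3)     # back through the linear encoder
    g_c = weights.T @ g_img                      # back through the renderer
    return 1.0 - cos, -g_c


initial_loss, _ = style_loss(colors)
lr = 0.05
for _ in range(400):
    _, grad = style_loss(colors)
    # Projected gradient step: keep colors inside the valid [0, 1] range.
    colors = np.clip(colors - lr * grad, 0.0, 1.0)
final_loss, _ = style_loss(colors)

print(f"style loss: {initial_loss:.3f} -> {final_loss:.3f}")
```

The point of the sketch is only the data flow: the loss is computed on a rendered image but the update lands on the primitives themselves, so no separate generative model is trained. Extending it toward what the abstract describes would mean swapping in a real CLIP encoder and also passing gradients to the geometry parameters.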


