CLIPGaussian: Universal and Multimodal Style Transfer Based on Gaussian Splatting



Abstract

We present CLIPGaussian, a framework that performs universal, multimodal style transfer by operating directly on Gaussian primitives, the shared representation underlying modern 2D, 3D, and 4D Gaussian Splatting pipelines. Instead of relying on heavy generative models, CLIPGaussian uses CLIP-driven text and image cues to steer both the appearance and, when beneficial, the geometry of Gaussian elements. This makes stylization a geometry-aware process: colors shift in locally meaningful ways, textures reorganize to match the target style, and structural details adapt subtly without disrupting the underlying scene. Because the method works natively on Gaussian fields, the same mechanism applies to single images, complex 3D objects, temporally coherent videos, and dynamic 4D reconstructions.

Across modalities, CLIPGaussian produces:

  • Consistent style transfer guided by natural language or reference images,
  • Coherent transformations of color, texture, and fine detail without model retraining,
  • Stable temporal behavior in videos and geometry-preserving edits in 3D and 4D scenes.

In essence, CLIPGaussian unifies stylistic manipulation across visual data types by treating Gaussian primitives as a flexible canvas for multimodal guidance. This enables expressive, efficient, and broadly applicable stylization while maintaining the compactness and differentiability that make Gaussian-based representations so powerful.
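
To make the idea of steering Gaussian appearance with an embedding-space loss more concrete, here is a minimal toy sketch in plain Python. It is not the CLIPGaussian implementation. The real CLIP encoders are replaced by a hypothetical stand-in `embed` (the mean RGB of the rendered image), the renderer is a simplified normalized blend of isotropic 2D Gaussians rather than true splatting with alpha compositing, and only colors are optimized. All names (`splat_weights`, `embed`, `stylize`, `make_scene`) and all parameter values are assumptions made for illustration.

```python
import math

def splat_weights(gaussians, size):
    """Normalized per-pixel blending weights for isotropic 2D Gaussians."""
    weights = []
    for y in range(size):
        for x in range(size):
            w = [math.exp(-((x - g["mean"][0]) ** 2 + (y - g["mean"][1]) ** 2)
                          / (2.0 * g["sigma"] ** 2))
                 for g in gaussians]
            total = sum(w) + 1e-8
            weights.append([wi / total for wi in w])
    return weights

def render(gaussians, weights):
    """Blend Gaussian colors into a flat list of RGB pixels."""
    return [[sum(w[i] * g["color"][c] for i, g in enumerate(gaussians))
             for c in range(3)] for w in weights]

def embed(image):
    """Stand-in for a CLIP image encoder (assumption): mean RGB of the image."""
    n = len(image)
    return [sum(px[c] for px in image) / n for c in range(3)]

def norm(v):
    return math.sqrt(sum(x * x for x in v))

def cosine(a, b):
    return sum(x * y for x, y in zip(a, b)) / (norm(a) * norm(b) + 1e-8)

def style_loss(gaussians, weights, target):
    """1 - cosine similarity between the render embedding and the target."""
    return 1.0 - cosine(embed(render(gaussians, weights)), target)

def stylize(gaussians, weights, target, steps=200, lr=0.5):
    """Gradient descent on the style loss with respect to Gaussian colors only."""
    n_px = len(weights)
    # With a mean-color embedding, d(embed)/d(color_i) is Gaussian i's average weight.
    share = [sum(w[i] for w in weights) / n_px for i in range(len(gaussians))]
    for _ in range(steps):
        e = embed(render(gaussians, weights))
        ne, nt = norm(e), norm(target)
        cos = cosine(e, target)
        # Analytic derivative of the cosine similarity with respect to e.
        dcos = [target[c] / (ne * nt) - cos * e[c] / (ne * ne) for c in range(3)]
        for i, g in enumerate(gaussians):
            for c in range(3):
                # Ascent on cosine equals descent on the loss; clamp to valid colors.
                g["color"][c] = min(1.0, max(0.0, g["color"][c] + lr * share[i] * dcos[c]))
    return gaussians

def make_scene():
    """Four hypothetical Gaussians on an 8x8 canvas."""
    return [
        {"mean": (2, 2), "sigma": 2.0, "color": [0.2, 0.4, 0.8]},
        {"mean": (5, 2), "sigma": 2.0, "color": [0.5, 0.5, 0.5]},
        {"mean": (2, 5), "sigma": 2.0, "color": [0.3, 0.6, 0.3]},
        {"mean": (5, 5), "sigma": 2.0, "color": [0.6, 0.3, 0.7]},
    ]

if __name__ == "__main__":
    scene = make_scene()
    weights = splat_weights(scene, 8)
    warm_target = [1.0, 0.5, 0.1]  # plays the role of a text or image style embedding
    print("loss before:", style_loss(scene, weights, warm_target))
    stylize(scene, weights, warm_target)
    print("loss after: ", style_loss(scene, weights, warm_target))
```

In a realistic setting the stand-in `embed` would be a pretrained CLIP image encoder, the target would come from a text prompt or reference image, and gradients would flow through a differentiable Gaussian rasterizer via automatic differentiation. The structure of the loop (render, embed, compare to the target, update the primitives) is the part this sketch is meant to illustrate.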

Paper: Click here to read
