CLIPGaussian: Universal and Multimodal Style Transfer Based on Gaussian Splatting

Abstract: We present CLIPGaussian, a framework that performs universal, multimodal style transfer by operating directly on Gaussian primitives, the shared representation underlying modern 2D, 3D, and 4D Gaussian Splatting pipelines. Instead of relying on heavy generative models, CLIPGaussian uses CLIP-driven text and image cues to steer the appearance and, when beneficial, the geometry of Gaussian elements. This makes stylization a geometry-aware process: colors shift in locally meaningful ways, textures reorganize to match the target style, and structural details adapt subtly without disrupting the underlying scene. Because the method works natively on Gaussian fields, the same mechanism applies to single images, complex 3D objects, temporally coherent videos, and dynamic 4D reconstructions.
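
To make the idea of steering Gaussian appearance with an embedding-space cue more concrete, here is a minimal toy sketch in Python. It is not the CLIPGaussian implementation: the abstract does not specify the loss or the optimizer, so everything below is an assumption. In particular, `toy_embed` is a hypothetical stand-in for a CLIP image encoder (it simply averages the RGB colors of the Gaussians), `style_embedding` stands in for a CLIP text or image embedding of the target style, and finite differences replace the backpropagation a real pipeline would use. The names `style_loss` and `stylize` are illustrative only.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b + 1e-12)  # small guard against zero norms


def toy_embed(colors):
    """Hypothetical stand-in for an image encoder: mean RGB (NOT CLIP)."""
    n = len(colors)
    return [sum(c[k] for c in colors) / n for k in range(3)]


def style_loss(colors, style_embedding):
    """1 - cosine similarity between the toy embedding and the style cue."""
    return 1.0 - cosine_similarity(toy_embed(colors), style_embedding)


def stylize(colors, style_embedding, steps=300, lr=0.2, eps=1e-4):
    """Gradient descent on per-Gaussian colors using forward differences."""
    colors = [list(c) for c in colors]  # work on a copy; input stays intact
    for _ in range(steps):
        base = style_loss(colors, style_embedding)
        grads = []
        for i in range(len(colors)):
            g = []
            for k in range(3):
                colors[i][k] += eps
                g.append((style_loss(colors, style_embedding) - base) / eps)
                colors[i][k] -= eps
            grads.append(g)
        for i in range(len(colors)):
            for k in range(3):
                updated = colors[i][k] - lr * grads[i][k]
                colors[i][k] = min(1.0, max(0.0, updated))  # keep valid RGB
    return colors


if __name__ == "__main__":
    gaussians = [[0.2, 0.2, 0.8], [0.1, 0.3, 0.9]]  # two bluish Gaussians
    reddish_style = [1.0, 0.2, 0.1]                 # assumed style embedding
    styled = stylize(gaussians, reddish_style)
    print("loss before:", style_loss(gaussians, reddish_style))
    print("loss after: ", style_loss(styled, reddish_style))
```

In this sketch only the colors are optimized; a fuller version in the spirit of the abstract would also expose positions, scales, and opacities as parameters, which is where the geometry-aware behavior would come from.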
