Computer Vision

CLIPGaussian: Universal and Multimodal Style Transfer Based on Gaussian Splatting

Abstract: We present CLIPGaussian, a framework that performs universal, multimodal style transfer by operating directly on Gaussian primitives, the shared representation underlying modern 2D, 3D, and 4D Gaussian Splatting pipelines. Instead of relying on heavy generative models, CLIPGaussian uses CLIP-driven text and image cues to steer both the appearance and, when beneficial, the geometry of Gaussian elements. This makes stylization a geometry-aware process: colors shift in locally meaningful ways, textures reorganize to match the target style, and structural details adapt subtly without disrupting the underlying scene. Because the method works natively on Gaussian fields, the same mechanism applies to single images, complex 3D objects, temporally coherent videos, and dynamic 4D reconstructions.

MiraGe: Editable 2.5D Image Representations with Flat-Controlled 3D Gaussians

Abstract: Implicit Neural Representations (INRs) encode images as continuous functions that map pixel coordinates to RGB values, achieving compact storage and high visual fidelity. Recent work such as GaussianImage replaces neural MLPs with collections of 2D Gaussian primitives, reaching similar reconstruction quality and compression but offering limited editability. In practice, creators often need to adjust content (move objects, bend a photo, cast new shadows, or create parallax), all of which are awkward within purely 2D or purely additive Gaussian schemes.
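
To make the idea of an image built from 2D Gaussian primitives concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not the GaussianImage or MiraGe implementation: the function name `render_gaussians`, the parameter layout (centers, inverse covariances, RGB weights), and the plain additive blending are all hypothetical choices made for this example.

```python
import numpy as np

def render_gaussians(height, width, means, inv_covs, colors):
    """Additively blend 2D Gaussians into an RGB image (illustrative only).

    means:    (N, 2) centers in pixel coordinates (x, y)
    inv_covs: (N, 2, 2) inverse covariance matrices
    colors:   (N, 3) RGB weights in [0, 1]
    """
    ys, xs = np.mgrid[0:height, 0:width]
    coords = np.stack([xs, ys], axis=-1).astype(np.float64)  # (H, W, 2)
    image = np.zeros((height, width, 3))
    for mean, inv_cov, color in zip(means, inv_covs, colors):
        d = coords - mean                                    # offset of each pixel from the center
        m = np.einsum("hwi,ij,hwj->hw", d, inv_cov, d)       # squared Mahalanobis distance
        weight = np.exp(-0.5 * m)                            # Gaussian falloff, 1.0 at the center
        image += weight[..., None] * color                   # assumed: simple additive blending
    return np.clip(image, 0.0, 1.0)

# Two hypothetical primitives: a red blob and a wider blue blob.
means = np.array([[8.0, 8.0], [24.0, 16.0]])
inv_covs = np.array([np.eye(2) / 9.0, np.eye(2) / 16.0])
colors = np.array([[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
image = render_gaussians(32, 32, means, inv_covs, colors)

# A simple edit: move the red blob by changing only its center.
moved = means.copy()
moved[0] = [20.0, 8.0]
edited = render_gaussians(32, 32, moved, inv_covs, colors)
```

In this toy setting an edit such as translating an object is just a change to a few parameters, but every primitive stays confined to the image plane, which hints at why effects like bending or parallax are awkward in a purely 2D scheme.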

VeGaS: Video Gaussian Splatting

Abstract: Modern video representations often prioritize reconstruction quality and compression, but they can be difficult to edit in a precise and controllable way. VeGaS (Video Gaussian Splatting) addresses this gap by adapting Gaussian Splatting ideas to 2D videos while explicitly modeling nonlinear motion and appearance changes across time.