
Most physics simulations maintain a separation that is rarely questioned. One system handles what objects look like. Another handles how they move. Every frame, information shuttles between the two. For offline rendering pipelines, this is fine. For anything that requires real-time, physically plausible dynamics, such as robot simulation, embodied AI, or interactive world models, the separation becomes a fundamental bottleneck.
What if a single particle could be both things at once?
The standard approach, and its limits
A typical deformable simulation pipeline involves a surface mesh defining appearance and a separate physics representation (particles, voxels, or a proxy geometry) defining dynamics. The two are coupled through skinning or embedding schemes that translate between them every frame. The coupling introduces error, latency, and architectural complexity.
PhysGaussian, published at CVPR 2024, demonstrated that 3D Gaussian primitives could be enriched with mechanical attributes and evolved under physics simulation, with Gaussian shape staying consistent with local strain. This was a meaningful advance. But it remained reconstruction-first: physics bolted onto an existing scene. The geometry layer and the simulation layer remained conceptually distinct.
A unified primitive
GaussianFlesh proposes a different starting point. Each particle is a 3D Gaussian, an ellipsoidal primitive with position, orientation, and shape, that simultaneously carries the full state a physics solver needs: velocity, deformation gradient, affine velocity matrix. When a particle deforms under simulation, its visual shape deforms automatically. There is no translation step. The geometry is the simulation state.
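To make the idea concrete, here is a minimal sketch of what such a particle might store, written in 2D and plain Python for brevity. The class name, field names, and matrix helpers are illustrative assumptions, not GaussianFlesh's actual data layout. The one piece of real math is the covariance update: an affine map F turns a Gaussian with covariance Σ₀ into one with covariance F Σ₀ Fᵀ, which is why the visual shape can be read directly off the simulation state.

```python
from dataclasses import dataclass, field


def matmul(A, B):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def transpose(A):
    return [[A[j][i] for j in range(2)] for i in range(2)]


def identity():
    return [[1.0, 0.0], [0.0, 1.0]]


def zeros():
    return [[0.0, 0.0], [0.0, 0.0]]


@dataclass
class GaussianParticle:
    """Hypothetical unified particle: render state and solver state together."""
    position: list                 # Gaussian mean
    rest_covariance: list          # rest orientation and shape of the ellipsoid
    velocity: list = field(default_factory=lambda: [0.0, 0.0])
    F: list = field(default_factory=identity)   # deformation gradient
    C: list = field(default_factory=zeros)      # affine velocity matrix

    def covariance(self):
        """Current visual shape, derived from the solver state: F * Sigma0 * F^T."""
        return matmul(matmul(self.F, self.rest_covariance), transpose(self.F))


# Usage: stretching the particle in x and squashing it in y reshapes the Gaussian.
p = GaussianParticle(position=[0.0, 0.0], rest_covariance=identity())
p.F = [[2.0, 0.0], [0.0, 0.5]]
shape = p.covariance()
```

In this sketch nothing is copied between a "physics object" and a "render object": the renderer would call `covariance()` on the same record the solver updates.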
The practical consequence is that the constitutive law, the material identity of an object, becomes a runtime parameter. A cloud of 120 particles assigned rubber properties bounces and rebounds. Given metal properties, the same cloud barely deforms. Given jelly properties, it spreads laterally and oscillates before recovering. Same particles. Same solver. The only difference is the stress function.
This is not merely a software convenience. It reflects a deeper architectural commitment: the particle is designed as a physics primitive first, not adapted into one after the fact.
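One way to picture material identity as a runtime parameter is a lookup table of elastic constants that the solver consults each step. The sketch below is an assumption-laden illustration: the `Material` class, the `ParticleCloud` class, and every numeric value are hypothetical placeholders, not parameters reported for GaussianFlesh. The conversion from Young's modulus and Poisson's ratio to the Lamé parameters is the standard one for isotropic linear elasticity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Material:
    youngs_modulus: float   # stiffness E, in pascals
    poisson_ratio: float    # nu, must be below 0.5

    def lame(self):
        """Return the Lame parameters (mu, lam) for isotropic elasticity."""
        E, nu = self.youngs_modulus, self.poisson_ratio
        mu = E / (2.0 * (1.0 + nu))
        lam = E * nu / ((1.0 + nu) * (1.0 - 2.0 * nu))
        return mu, lam


# Placeholder values chosen only to give three clearly different stiffnesses.
MATERIALS = {
    "rubber": Material(youngs_modulus=1.0e6, poisson_ratio=0.45),
    "metal": Material(youngs_modulus=1.0e9, poisson_ratio=0.30),
    "jelly": Material(youngs_modulus=1.0e4, poisson_ratio=0.40),
}


@dataclass
class ParticleCloud:
    """Hypothetical container: the particles stay fixed, the material is swappable."""
    particle_count: int
    material_name: str

    def stress_parameters(self):
        return MATERIALS[self.material_name].lame()


# Usage: the same cloud, reassigned at runtime.
cloud = ParticleCloud(particle_count=120, material_name="rubber")
rubber_mu, rubber_lam = cloud.stress_parameters()
cloud.material_name = "metal"
metal_mu, metal_lam = cloud.stress_parameters()
```

Nothing about the particles changes between the two calls; only the constants fed to the stress function do.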
Two technical commitments worth understanding
Keeping a unified primitive stable over long simulations is harder than it sounds.
Most real-time physics solvers use an Updated Lagrangian formulation, where the reference configuration moves with the simulation and the deformation gradient updates multiplicatively each frame. Over short simulations this works well. Under sustained elastic deformation, small numerical errors compound with every multiplicative update, and over hundreds of frames the tracked deformation drifts away from the true one.
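A small experiment illustrates one source of this drift. The sketch below assumes the common first-order update F ← (I + Δt ∇v) F and applies it to a body that is only spinning rigidly, so the true deformation gradient is a pure rotation with determinant 1. The function names and the step size are illustrative choices, and real solvers have other error sources and mitigations that this toy example does not model.

```python
def matmul(A, B):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def det(A):
    return A[0][0] * A[1][1] - A[0][1] * A[1][0]


def updated_lagrangian_step(F, grad_v, dt):
    """Multiplicative first-order update: F <- (I + dt * grad_v) * F."""
    step = [[1.0 + dt * grad_v[0][0], dt * grad_v[0][1]],
            [dt * grad_v[1][0], 1.0 + dt * grad_v[1][1]]]
    return matmul(step, F)


def spin(omega, dt, steps):
    """Accumulate F for a rigid rotation at angular velocity omega."""
    grad_v = [[0.0, -omega], [omega, 0.0]]   # velocity gradient of pure spin
    F = [[1.0, 0.0], [0.0, 1.0]]
    for _ in range(steps):
        F = updated_lagrangian_step(F, grad_v, dt)
    return F


# A rigid spin should leave det(F) at exactly 1 (no volume change).
F_drifted = spin(omega=1.0, dt=0.01, steps=1000)
volume_error = det(F_drifted) - 1.0
```

Each step multiplies the determinant by 1 + (Δt ω)², so the error is tiny per frame but grows geometrically. After 1000 steps at these settings the accumulated F reports roughly a 10% volume gain for an object that never deformed at all.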
GaussianFlesh uses a Total Lagrangian formulation instead. The reference configuration is fixed at initialization and never rebuilt. Deformation is always measured relative to the original rest state. To prevent any residual drift, the deformation gradient is recomputed once per display frame via weighted least squares over actual particle positions. The math stays grounded.
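The weighted least-squares refit can be sketched as follows, in 2D. The source does not specify the neighborhood or the weighting kernel, so the neighbor lists, the `weights` argument, and the function name are assumptions. The fit itself is the standard one: choose F to minimize Σ wⱼ ‖F ΔXⱼ − Δxⱼ‖², where ΔXⱼ are rest-state offsets to neighbors and Δxⱼ are the current offsets, giving F = (Σ wⱼ Δxⱼ ΔXⱼᵀ)(Σ wⱼ ΔXⱼ ΔXⱼᵀ)⁻¹.

```python
def fit_deformation_gradient(rest_center, rest_neighbors,
                             cur_center, cur_neighbors, weights):
    """Least-squares 2x2 deformation gradient from rest and current positions."""
    A = [[0.0, 0.0], [0.0, 0.0]]   # sum of w * dx * dX^T
    B = [[0.0, 0.0], [0.0, 0.0]]   # sum of w * dX * dX^T
    for X, x, w in zip(rest_neighbors, cur_neighbors, weights):
        dX = [X[0] - rest_center[0], X[1] - rest_center[1]]
        dx = [x[0] - cur_center[0], x[1] - cur_center[1]]
        for i in range(2):
            for j in range(2):
                A[i][j] += w * dx[i] * dX[j]
                B[i][j] += w * dX[i] * dX[j]

    det_B = B[0][0] * B[1][1] - B[0][1] * B[1][0]
    if abs(det_B) < 1e-12:
        raise ValueError("rest neighbors are degenerate (collinear or too few)")
    B_inv = [[B[1][1] / det_B, -B[0][1] / det_B],
             [-B[1][0] / det_B, B[0][0] / det_B]]

    # F = A * B^-1
    return [[sum(A[i][k] * B_inv[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


# Usage: move a small rest neighborhood by a known affine map and recover it.
F_true = [[1.2, 0.3], [-0.1, 0.9]]
shift = [5.0, -2.0]


def deform(X):
    return [F_true[0][0] * X[0] + F_true[0][1] * X[1] + shift[0],
            F_true[1][0] * X[0] + F_true[1][1] * X[1] + shift[1]]


rest_c = [0.0, 0.0]
rest_n = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
F_fit = fit_deformation_gradient(rest_c, rest_n, deform(rest_c),
                                 [deform(X) for X in rest_n],
                                 weights=[1.0, 0.5, 2.0])
```

Because the fit reads only rest positions and current positions, it carries no memory of earlier frames, so errors from previous steps cannot accumulate in F.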
The second commitment concerns shape restoration. Most deformable simulations require explicit correction steps, periodic nudges that remind particles what shape they are supposed to hold. GaussianFlesh uses corotated linear elasticity, a constitutive model in which shape restoration is implicit in the stress calculation itself. For stiff materials, no explicit correction is needed at all. For high-restitution rubber, a small residual correction suffices. This eliminates a class of ad hoc shape-preservation hacks common in prior work.
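To show why restoration is implicit, here is a sketch of the textbook form of corotated linear elasticity in 2D. The source does not state which exact variant GaussianFlesh uses, so treat the formula and function names as assumptions. The deformation gradient is split into a rotation R and a stretch, the small strain ε = RᵀF − I is measured in the unrotated frame, and the first Piola-Kirchhoff stress is P = R(2μ ε + λ tr(ε) I).

```python
from math import hypot


def polar_rotation(F):
    """Rotation factor R of the 2x2 polar decomposition F = R * S (closed form)."""
    x = F[0][0] + F[1][1]
    y = F[1][0] - F[0][1]
    r = hypot(x, y)
    if r < 1e-12:
        raise ValueError("rotation is undefined for this deformation gradient")
    c, s = x / r, y / r
    return [[c, -s], [s, c]]


def corotated_linear_stress(F, mu, lam):
    """First Piola-Kirchhoff stress: P = 2*mu*(F - R) + lam*tr(R^T F - I)*R."""
    R = polar_rotation(F)
    # tr(R^T F) is the sum of element-wise products of R and F.
    trace_strain = sum(R[i][j] * F[i][j] for i in range(2) for j in range(2)) - 2.0
    return [[2.0 * mu * (F[i][j] - R[i][j]) + lam * trace_strain * R[i][j]
             for j in range(2)] for i in range(2)]


# A rigidly rotated particle (90 degrees here) feels no stress.
P_rotated = corotated_linear_stress([[0.0, -1.0], [1.0, 0.0]], mu=1.0, lam=2.0)

# A 10% stretch along x produces a stress that opposes the stretch.
P_stretched = corotated_linear_stress([[1.1, 0.0], [0.0, 1.0]], mu=1.0, lam=2.0)
```

The stress vanishes exactly when F is a pure rotation and grows with any departure from one, so the force that pulls a particle back toward its rest shape falls out of the constitutive law itself instead of a separate correction pass.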
What this is actually about
A bouncing ball is not the finding. It is a demonstration that the primitive works.
The broader claim is that world models, AI systems that simulate physical behavior, currently lack a grounded physics primitive. Video-based approaches learn correlations from pixels. They are impressive, but they do not represent that a rubber ball and a metal ball dropped from the same height behave differently for reasons that can be computed, not just observed.
A substrate where objects are Gaussian particles with material identity, running forward simulation in real time, in the same representational space used for scene reconstruction, closes that loop. The next step is coupling a vision-language model to existing 3D Gaussian scenes so that material properties can be inferred from appearance and simulation can begin immediately, without a separate physics representation, without a translation layer.
We think the particle is the right place to start.