MoSA: Motion-constrained Stress Adaptation

Abstract

Learning real-world dynamics from visual observations is crucial across graphics, robotics, and embodied AI. A common strategy is to calibrate simulators by estimating physical parameters, yet accuracy is ultimately bounded by the underlying physical models, which usually assume materials are homogeneous and isotropic. Even when this is reasonable, real-world objects typically exhibit mild anisotropy and heterogeneity. Once the near-isotropic backbone is calibrated, these residual effects become the key bottleneck for further closing the real-to-sim gap. Black-box neural dynamics, on the other hand, discard strong physical priors and suffer from poor data efficiency and overfitting.

We propose MoSA, a motion-constrained stress adaptation framework that targets these residual effects directly. MoSA keeps an isotropic constitutive law as a physics prior and learns a structured residual stress operator that progressively adapts stresses via microplane-constrained redistribution in a physics-informed cascaded network. We further impose motion constraints by supervising temporal and spatial derivatives of the deformation field with cues from dynamic 3D Gaussian Splatting reconstruction. On synthetic and real-world multi-view captures, MoSA achieves superior accuracy, generalization, and robustness, while learning physically meaningful residual anisotropy. Deployed in a sim-to-real robot manipulation setting, the improved dynamics translate directly into a 26-percentage-point gain in success rate over an isotropic-prior baseline.

**Figure 1.** MoSA main pipeline overview. From multi-view videos of a deforming object, MoSA recovers a physics-consistent 3D simulator: a near-isotropic backbone explains the dominant dynamics, and a learned residual stress operator captures the mild anisotropy and heterogeneity that close the remaining real-to-sim gap.

**Figure 2.** Microplane stress redistribution. Each face's Cauchy stress is replaced by $\sum_{j}(I_{ij} + c_{ij}^{\,rd})\,\sigma_{ji}$, an identity baseline plus a learned residual coefficient $c^{\,rd}$, letting the operator rescale and redistribute stress across the three coordinate planes to express directional anisotropy on top of the isotropic prior.
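As a minimal sketch of the redistribution idea in Figure 2, the snippet below mixes the rows of a 3x3 Cauchy stress with an identity-plus-residual matrix. It assumes one particular reading of the formula, namely that the stress on face $i$ becomes $\sum_j (I_{ij} + c_{ij}^{\,rd})$ times the stress on face $j$, so the whole update is $(I + c^{\,rd})\,\sigma$. The function name, the example stress, and the residual values are hypothetical, and the learned network that predicts $c^{\,rd}$ is not modeled here.

```python
import numpy as np

def redistribute_stress(sigma, c_rd):
    """Mix the face stresses of a 3x3 Cauchy stress tensor.

    Assumed reading of the caption's formula: row i of the output is
    sum_j (I_ij + c_rd[i, j]) * sigma[j, :], i.e. (I + c_rd) @ sigma.
    """
    sigma = np.asarray(sigma, dtype=float)
    c_rd = np.asarray(c_rd, dtype=float)
    return (np.eye(3) + c_rd) @ sigma

# Hypothetical stress produced by the isotropic prior (arbitrary units).
sigma_iso = np.array([[2.0, 0.5, 0.0],
                      [0.5, 1.0, 0.0],
                      [0.0, 0.0, 1.5]])

# Hypothetical residual: scale the x-face stress by 10% and add 5% of the
# y-face stress to it. All other coefficients stay at zero.
c_rd = np.zeros((3, 3))
c_rd[0, 0] = 0.10
c_rd[0, 1] = 0.05

sigma_adapted = redistribute_stress(sigma_iso, c_rd)
print(sigma_adapted)
```

With $c^{\,rd} = 0$ the function returns the isotropic-prior stress unchanged, which is the sense in which the identity term acts as a baseline. This sketch does not enforce symmetry of the adapted stress.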

What does MoSA learn?

To check that MoSA captures physically meaningful residual effects rather than overfitting, we probe two complementary aspects of the learned model: (i) directional anisotropy of the stress-adaptation operator, and (ii) spatial heterogeneity of the local material field.

Learned redistribution trend. $E_{\text{norm}}(\theta)$ and $\lVert J\rVert_{\text{norm}}(\theta)$ trace the same directional pattern, so the operator's mechanism agrees with its physical outcome.

Directional Young's modulus $E(\theta)$: GT vs. Neural Fit vs. Ours vs. Isotropic. MoSA recovers the anisotropic ground truth, while the isotropic prior misses it and a black-box neural fit overshoots.

Learned heterogeneity field $\eta(\mathbf{x})$ on Mandarin: redder = stiffer, bluer = softer.

Learned heterogeneity field $\eta(\mathbf{x})$ on Rabbit, same color code.

Directional anisotropy. We probe the learned operator on a sample with known anisotropic ground truth. (a) Mechanism check. The normalized directional Young's modulus $E_{\text{norm}}(\theta)$ and the normalized Jacobian response $\lVert J\rVert_{\text{norm}}(\theta)$ trace the same anisotropy pattern, showing that the operator's stress-redistribution mechanism is internally consistent with the directional stiffness it produces. (b) Ground-truth check. Against baselines, the isotropic prior (dashed) collapses to a perfect circle and misses the directional dependence entirely, while a black-box neural fit overshoots and hallucinates spurious anisotropy. MoSA closely tracks the GT, confirming that the residual stress operator captures physically meaningful anisotropy rather than overfitting.
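To make the "perfect circle" behavior of the isotropic prior concrete, here is a small sketch of one standard way to evaluate a directional Young's modulus from a linear-elastic compliance matrix in Voigt notation: $1/E(\mathbf{n}) = \mathbf{v}^\top S\,\mathbf{v}$ with $\mathbf{v} = (n_1^2, n_2^2, n_3^2, n_2 n_3, n_1 n_3, n_1 n_2)$. Whether MoSA computes $E(\theta)$ this way is an assumption. The material constants and the perturbed compliance entry are hypothetical values chosen for illustration.

```python
import numpy as np

def isotropic_compliance(E, nu):
    """6x6 Voigt compliance (engineering shear strains) of an isotropic solid."""
    S = np.zeros((6, 6))
    S[:3, :3] = -nu / E                               # normal-normal coupling
    S[np.arange(3), np.arange(3)] = 1.0 / E           # axial terms
    S[np.arange(3, 6), np.arange(3, 6)] = 2.0 * (1.0 + nu) / E  # 1 / G
    return S

def directional_modulus(S, theta):
    """Young's modulus along n = (cos theta, sin theta, 0) for compliance S."""
    theta = np.atleast_1d(np.asarray(theta, dtype=float))
    n1, n2, n3 = np.cos(theta), np.sin(theta), np.zeros_like(theta)
    v = np.stack([n1**2, n2**2, n3**2, n2 * n3, n1 * n3, n1 * n2], axis=-1)
    inv_E = np.einsum("ti,ij,tj->t", v, S, v)         # 1/E(n) = v^T S v
    return 1.0 / inv_E

thetas = np.linspace(0.0, 2.0 * np.pi, 73)

# Hypothetical isotropic material: E(theta) is the same in every direction.
S_iso = isotropic_compliance(E=1.0e5, nu=0.3)
E_iso = directional_modulus(S_iso, thetas)

# Hypothetical anisotropic variant: twice as stiff along the x axis.
S_aniso = S_iso.copy()
S_aniso[0, 0] = 1.0 / 2.0e5
E_aniso = directional_modulus(S_aniso, thetas)

print(E_iso.min(), E_iso.max())      # constant radius in a polar plot
print(E_aniso.min(), E_aniso.max())  # direction-dependent radius
```

Plotted in polar coordinates, `E_iso` is a circle, whereas `E_aniso` bulges along the stiffer axis.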

Spatial heterogeneity. The learned field $\eta(\mathbf{x})$ locally modulates the global material parameter and produces smooth, object-dependent stiffness patterns — confirming that the continuous field captures structured material variation rather than fitting unstructured residual noise.

Qualitative Results

Real-to-sim rollouts on our 12-camera real-world dataset.

Chick1 — elastoplastic squash, residual scaling along the head-to-tail axis.

Chick2 — held-out initial pose; physics-consistent rollout.

Gorilla — plastic squash with residual stress redistribution.

Mandarin — elastic anisotropy along the equatorial direction.

Peanut — elongated body with stiffness gradient between the lobes.

Rabbit — elastoplastic deformation under free-fall impact.

Rainbowball — heterogeneous stiffness across colored sectors.

Comparisons

Quantitative results — real-world multi-view dataset

Per-scene and mean PSNR / SSIM on our real-world multi-view dataset (7 objects). SSIM is scaled by 100.

Method	Chick1 PSNR↑	Chick1 SSIM↑	Gorilla PSNR↑	Gorilla SSIM↑	Mandarin PSNR↑	Mandarin SSIM↑	Chick2 PSNR↑	Chick2 SSIM↑	Peanut PSNR↑	Peanut SSIM↑	Rabbit PSNR↑	Rabbit SSIM↑	Rainbowball PSNR↑	Rainbowball SSIM↑	Mean PSNR↑	Mean SSIM↑
DEL	28.32	91.6	29.11	90.4	31.92	92.4	28.59	90.9	28.38	91.3	28.57	91.4	30.95	91.7	29.41	91.4
Vid2Sim	26.71	90.9	29.02	90.0	25.85	89.8	25.70	89.5	30.69	94.6	26.85	90.3	31.71	91.7	28.08	91.0
NeuMA	30.73	92.4	29.78	91.1	31.85	92.4	28.92	91.0	30.70	91.8	28.30	92.3	30.74	91.1	30.00	91.7
GIC	30.93	92.5	29.75	91.0	29.54	91.6	28.17	90.9	32.88	91.3	28.08	91.3	30.78	91.7	30.02	91.5
MoSA (Ours)	32.05	92.7	30.19	91.9	32.83	92.7	30.17	91.6	33.01	92.0	30.35	92.1	32.06	92.7	31.35	92.3
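For readers unfamiliar with the image metric in the table, the following is a minimal sketch of PSNR in its usual definition, $10 \log_{10}(\text{MAX}^2 / \text{MSE})$, for images with values in $[0, \text{MAX}]$. The evaluation protocol behind the table (masking, resolution, averaging over frames and views) is not stated on this page, so the function below is a generic illustration and its name and arguments are assumptions.

```python
import numpy as np

def psnr(pred, target, max_val=1.0):
    """Peak signal-to-noise ratio in dB between two images of equal shape."""
    pred = np.asarray(pred, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    mse = np.mean((pred - target) ** 2)
    if mse == 0.0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(max_val ** 2 / mse))

# Hypothetical 4x4 RGB frames: the prediction is uniformly 0.1 too bright.
target = np.full((4, 4, 3), 0.5)
pred = target + 0.1
print(psnr(pred, target))  # MSE = 0.01, so about 20 dB
```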

Synthetic dynamic grounding — PAC-NeRF benchmark

Per-scene and mean CD / EMD on the PAC-NeRF dataset (7 scenes). Both metrics scaled by 100.

Method	Torus CD↓	Torus EMD↓	Cat CD↓	Cat EMD↓	Playdoh CD↓	Playdoh EMD↓	Droplet CD↓	Droplet EMD↓	Cream CD↓	Cream EMD↓	Bird CD↓	Bird EMD↓	Letter CD↓	Letter EMD↓	Mean CD↓	Mean EMD↓
PAC-NeRF	21.8	11.6	9.8	14.4	18.6	5.6	10.4	3.2	20.5	12.7	19.3	21.1	12.8	8.5	16.2	11.0
DEL	21.7	10.7	7.9	12.8	12.2	2.5	9.8	1.7	19.8	9.8	17.8	20.2	12.6	7.2	14.5	9.3
GIC	20.2	9.9	7.6	12.6	12.3	2.5	10.2	1.9	19.5	10.1	16.5	19.5	10.3	7.5	13.8	9.1
MoSA (Ours)	20.1	9.8	7.3	12.4	11.4	2.3	9.6	1.5	19.4	9.5	16.3	19.2	10.1	6.6	13.5	8.8
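As a rough illustration of the geometric metric in this table, the sketch below computes a symmetric Chamfer distance (CD) between two point sets with NumPy. Conventions differ across papers (squared vs. unsquared distances, sum vs. mean over the two directions). This version uses squared distances and adds the two directional means, which is an assumption and may not match the benchmark's exact definition. EMD requires an optimal matching and is not shown.

```python
import numpy as np

def chamfer_distance(a, b):
    """Symmetric Chamfer distance between point sets a (N, 3) and b (M, 3).

    Convention assumed here: mean squared nearest-neighbor distance from
    a to b, plus the same quantity from b to a.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # Pairwise squared distances, shape (N, M).
    d2 = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return float(d2.min(axis=1).mean() + d2.min(axis=0).mean())

# Hypothetical point sets: b contains one extra point far from a.
a = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
b = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [3.0, 0.0, 0.0]])
print(chamfer_distance(a, b))  # a->b contributes 0, b->a contributes 4/3
```

The dense pairwise matrix is fine for small sets. Large particle clouds would typically use a spatial index instead.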

Sim-to-Real Robot Manipulation

We learn object dynamics from video, train a manipulation policy in the learned simulator, and transfer it zero-shot to a real robot. Better real-to-sim dynamics translate directly into more reliable sim-to-real policy execution.

Isotropic Physical Model Baseline vs. MoSA — three deformable scenes

Each scene is shown for both the Isotropic Physical Model Baseline and MoSA (Ours).

Cup: grasp and lift a deformable cup.

Rabbit: place an elastic rabbit onto a target box.

Towel: fold and hang a flexible towel.

Elastic-rabbit placement

Place a deformable rabbit onto a white box.

68 / 100 vs. 42 / 100 with isotropic prior
+26 percentage points in success rate

Tower hanging

Hang a deformable object on a peg tower.

82 / 100 vs. 55 / 100 with isotropic prior
+27 percentage points in success rate

BibTeX

@inproceedings{wang2026mosa,
  title     = {{MoSA}: Motion-constrained Stress Adaptation for Mitigating Real-to-Sim Gap
               in Continuum Dynamics via Learning Residual Anisotropy},
  author    = {Wang, Jiaxu and He, Junhao and Coauthors and Advisor},
  booktitle = {Proceedings of the 43rd International Conference on Machine Learning (ICML)},
  year      = {2026}
}
