Accepted to ICML 2026

MoSA: Motion-constrained Stress Adaptation
Mitigating the Real-to-Sim Gap via Learning Residual Anisotropy

¹HC Lab, HKUST (GZ)   ²MMLab, The Chinese University of Hong Kong   ³The University of Hong Kong
*Equal contribution  Corresponding author

Abstract

Learning real-world dynamics from visual observations is crucial across graphics, robotics, and embodied AI. A common strategy is to calibrate simulators by estimating physical parameters, yet accuracy is ultimately bounded by the underlying physical models, which usually assume materials are homogeneous and isotropic. Even when this is reasonable, real-world objects typically exhibit mild anisotropy and heterogeneity. Once the near-isotropic backbone is calibrated, these residual effects become the key bottleneck for further closing the real-to-sim gap. Black-box neural dynamics, on the other hand, discard strong physical priors and suffer from poor data efficiency and overfitting.

We propose MoSA, a motion-constrained stress adaptation framework that targets these residual effects directly. MoSA keeps an isotropic constitutive law as a physics prior and learns a structured residual stress operator that progressively adapts stresses via microplane-constrained redistribution in a physics-informed cascaded network. We further impose motion constraints by supervising temporal and spatial derivatives of the deformation field with cues from dynamic 3D Gaussian Splatting reconstruction. On synthetic and real-world multi-view captures, MoSA achieves superior accuracy, generalization, and robustness, while learning physically meaningful residual anisotropy. Deployed in a sim-to-real robot manipulation setting, the better dynamics translate directly into a 26-percentage-point gain in success rate over an isotropic-prior baseline.

Figure 1. From multi-view videos of a deforming object, MoSA recovers a physics-consistent 3D simulator: a near-isotropic backbone explains the dominant dynamics, and a learned residual stress operator captures the mild anisotropy and heterogeneity that close the remaining real-to-sim gap.
Figure 2. Microplane stress redistribution. Each face's Cauchy stress is replaced by $\sum_{j}(I_{ij} + c_{ij}^{\,rd})\,\sigma_{ji}$ — an identity baseline plus a learned residual coefficient $c^{\,rd}$ — letting the operator rescale and redistribute stress across the three coordinate planes to express directional anisotropy on top of the isotropic prior.
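One way to read the expression in Figure 2 is as a matrix product: the stress vector on coordinate plane $i$ becomes a weighted sum of the stress vectors on all three planes, with weights $I_{ij} + c_{ij}^{\,rd}$. The sketch below assumes that reading, i.e. $\sigma' = (I + c^{\,rd})\,\sigma$. The function name and the nested-list 3x3 layout are illustrative choices, not taken from the MoSA implementation.

```python
def redistribute_stress(sigma, c_rd):
    """Return (I + c_rd) @ sigma for 3x3 matrices stored as nested lists.

    sigma : Cauchy stress from the isotropic prior; row j is taken to be
            the stress vector on coordinate plane j (an assumed convention).
    c_rd  : residual redistribution coefficients (learned in MoSA; here
            simply passed in as numbers).
    """
    n = 3
    out = [[0.0] * n for _ in range(n)]
    for i in range(n):          # output plane
        for k in range(n):      # stress component on that plane
            out[i][k] = sum(
                ((1.0 if i == j else 0.0) + c_rd[i][j]) * sigma[j][k]
                for j in range(n)   # mix contributions from every plane j
            )
    return out
```

With all coefficients set to zero the prior stress is returned unchanged, which matches the identity baseline in the caption. A nonzero off-diagonal coefficient moves part of one plane's stress onto another plane, so the output is in general no longer symmetric under this reading.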

What does MoSA learn?

To check that MoSA captures physically meaningful residual effects rather than overfitting, we probe two complementary aspects of the learned model: (i) directional anisotropy of the stress-adaptation operator, and (ii) spatial heterogeneity of the local material field.

(a) Learned redistribution trend: $E_{\text{norm}}(\theta)$ and $\lVert J\rVert_{\text{norm}}(\theta)$ trace the same directional pattern, so the operator's mechanism agrees with its physical outcome.
(b) Directional Young's modulus $E(\theta)$ (GT vs. Neural Fit vs. Ours vs. Isotropic): MoSA recovers the anisotropic ground truth, while the isotropic prior misses it and a black-box neural fit overshoots.
Learned heterogeneity field $\eta(\mathbf{x})$ on Mandarin: redder = stiffer, bluer = softer.
Learned heterogeneity field $\eta(\mathbf{x})$ on Rabbit: same color code.

Directional anisotropy. We probe the learned operator on a sample with known anisotropic ground truth. (a) Mechanism check. The normalized directional Young's modulus $E_{\text{norm}}(\theta)$ and the normalized Jacobian response $\lVert J\rVert_{\text{norm}}(\theta)$ trace the same anisotropy pattern, showing that the operator's stress-redistribution mechanism is internally consistent with the directional stiffness it produces. (b) Ground-truth check. Against baselines, the isotropic prior (dashed) collapses to a perfect circle and misses the directional dependence entirely, while a black-box neural fit overshoots and hallucinates spurious anisotropy. MoSA closely tracks the GT, confirming that the residual stress operator captures physically meaningful anisotropy rather than overfitting.
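To illustrate why an isotropic prior shows up as a perfect circle in a polar plot of $E(\theta)$, the sketch below evaluates the textbook off-axis modulus of a 2D orthotropic material under plane stress, $1/E(\theta) = \cos^4\theta/E_1 + \sin^4\theta/E_2 + (1/G_{12} - 2\nu_{12}/E_1)\sin^2\theta\cos^2\theta$. This is a simplified stand-in rather than the probing procedure used for MoSA, and the parameter names `E1`, `E2`, `G12`, `nu12` are assumptions introduced for the example.

```python
import math

def directional_modulus(theta, E1, E2, G12, nu12):
    """Young's modulus along direction theta (radians) for a 2D
    orthotropic material whose principal axes are aligned with x and y."""
    c2 = math.cos(theta) ** 2
    s2 = math.sin(theta) ** 2
    # Compliance along the rotated axis (classical off-axis formula).
    inv_E = c2 * c2 / E1 + s2 * s2 / E2 + (1.0 / G12 - 2.0 * nu12 / E1) * s2 * c2
    return 1.0 / inv_E

# Isotropic special case: E1 == E2 and G = E / (2 (1 + nu)).
E, nu = 10.0, 0.3
G = E / (2.0 * (1.0 + nu))
isotropic_curve = [directional_modulus(k * math.pi / 8, E, E, G, nu) for k in range(8)]
```

In the isotropic case the mixed term reduces to $2\sin^2\theta\cos^2\theta/E$, so every entry of `isotropic_curve` equals `E` up to rounding and the polar plot is a circle. Any departure from $E_1 = E_2$ or from the isotropic shear relation makes the curve direction-dependent.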

Spatial heterogeneity. The learned field $\eta(\mathbf{x})$ locally modulates the global material parameter and produces smooth, object-dependent stiffness patterns — confirming that the continuous field captures structured material variation rather than fitting unstructured residual noise.

Qualitative Results

Real-to-sim rollouts on our 12-camera real-world dataset.

Comparisons

Quantitative results — real-world multi-view dataset

Per-scene and mean PSNR / SSIM on our real-world multi-view dataset (7 objects). SSIM is scaled by 100.

Each cell lists PSNR↑ / SSIM↑.

Method         Chick1         Gorilla        Mandarin       Chick2         Peanut         Rabbit         RBball         Mean
DEL            28.32 / 91.6   29.11 / 90.4   31.92 / 92.4   28.59 / 90.9   28.38 / 91.3   28.57 / 91.4   30.95 / 91.7   29.41 / 91.4
Vid2Sim        26.71 / 90.9   29.02 / 90.0   25.85 / 89.8   25.70 / 89.5   30.69 / 94.6   26.85 / 90.3   31.71 / 91.7   28.08 / 91.0
NeuMA          30.73 / 92.4   29.78 / 91.1   31.85 / 92.4   28.92 / 91.0   30.70 / 91.8   28.30 / 92.3   30.74 / 91.1   30.00 / 91.7
GIC            30.93 / 92.5   29.75 / 91.0   29.54 / 91.6   28.17 / 90.9   32.88 / 91.3   28.08 / 91.3   30.78 / 91.7   30.02 / 91.5
MoSA (Ours)    32.05 / 92.7   30.19 / 91.9   32.83 / 92.7   30.17 / 91.6   33.01 / 92.0   30.35 / 92.1   32.06 / 92.7   31.35 / 92.3
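For readers unfamiliar with the PSNR metric reported above, a minimal sketch of the standard definition, $10\log_{10}(\text{MAX}^2/\text{MSE})$, follows. It assumes images are flattened into equal-length sequences of floats in $[0, 1]$; the function name and the `max_val` default are illustrative, and the exact evaluation protocol behind the table (resolution, masking, averaging over frames and views) is not specified here.

```python
import math

def psnr(pred, target, max_val=1.0):
    """Peak signal-to-noise ratio in dB between two flattened images."""
    if len(pred) != len(target) or not pred:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over all pixel values.
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
    if mse == 0.0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_val ** 2 / mse)
```

Because the scale is logarithmic, a uniform error of 0.1 on a $[0, 1]$ image gives about 20 dB, and each further tenfold reduction in MSE adds 10 dB.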

Synthetic dynamic grounding — PAC-NeRF benchmark

Per-scene and mean CD / EMD on the PAC-NeRF dataset (7 scenes). Both metrics scaled by 100.

Each cell lists CD↓ / EMD↓.

Method         torus          cat            playdoh        droplet        Cream          Bird           Letter         Mean
PAC-NeRF       21.8 / 11.6    9.8 / 14.4     18.6 / 5.6     10.4 / 3.2     20.5 / 12.7    19.3 / 21.1    12.8 / 8.5     16.2 / 11.0
DEL            21.7 / 10.7    7.9 / 12.8     12.2 / 2.5     9.8 / 1.7      19.8 / 9.8     17.8 / 20.2    12.6 / 7.2     14.5 / 9.3
GIC            20.2 / 9.9     7.6 / 12.6     12.3 / 2.5     10.2 / 1.9     19.5 / 10.1    16.5 / 19.5    10.3 / 7.5     13.8 / 9.1
MoSA (Ours)    20.1 / 9.8     7.3 / 12.4     11.4 / 2.3     9.6 / 1.5      19.4 / 9.5     16.3 / 19.2    10.1 / 6.6     13.5 / 8.8
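As a reference for the CD column, the sketch below computes a symmetric Chamfer distance between two point sets by brute force. Conventions vary between papers (squared vs. unsquared distances, sum vs. mean over points), and the one used for the benchmark numbers is not stated here, so the choice of mean squared nearest-neighbor distance is an assumption. EMD needs an optimal point matching and is not covered.

```python
def chamfer_distance(a, b):
    """Symmetric Chamfer distance between point sets a and b.

    Each set is a non-empty list of equal-dimension coordinate tuples.
    Uses the mean squared nearest-neighbor distance in both directions
    (an assumed convention); cost is O(len(a) * len(b)).
    """
    def sq_dist(p, q):
        return sum((pi - qi) ** 2 for pi, qi in zip(p, q))

    def one_sided(src, dst):
        # Average, over src, of the squared distance to the closest dst point.
        return sum(min(sq_dist(p, q) for q in dst) for p in src) / len(src)

    return one_sided(a, b) + one_sided(b, a)
```

For example, a single point at the origin compared with a single point at $(1, 0, 0)$ yields 2.0, since each direction contributes a squared distance of 1. Multiplying the result by 100 would match the scaling noted for the table.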

Sim-to-Real Robot Manipulation

We learn object dynamics from video, train a manipulation policy in the learned simulator, and transfer it zero-shot to a real robot. Better real-to-sim dynamics translate directly into more reliable sim-to-real policy execution.

Isotropic Physical Model Baseline vs. MoSA — three deformable scenes

The three scenes are Cup, Rabbit, and Towel, each shown as 20 keyframes.

Elastic-rabbit placement

Place a deformable rabbit onto a white box.

MoSA: 68 / 100 successes vs. 42 / 100 with the isotropic prior
+26 percentage points in success rate

Tower hanging

Hang a deformable object on a peg tower.

MoSA: 82 / 100 successes vs. 55 / 100 with the isotropic prior
+27 percentage points in success rate

BibTeX

@inproceedings{wang2026mosa,
  title     = {{MoSA}: Motion-constrained Stress Adaptation for Mitigating Real-to-Sim Gap
               in Continuum Dynamics via Learning Residual Anisotropy},
  author    = {Wang, Jiaxu and He, Junhao and Coauthors and Advisor},
  booktitle = {Proceedings of the 43rd International Conference on Machine Learning (ICML)},
  year      = {2026}
}