Sim2Sim
This guide covers the workflow for taking a policy trained in an RL backend and replaying it in MuJoCo to check whether the policy still behaves correctly outside its original training environment.
That workflow is especially useful because training backends such as Isaac Lab often require significantly more compute than a typical personal workstation can provide. In practice, many users train on shared servers or remote GPU machines, then need a separate lightweight environment for local inspection and validation. A MuJoCo-based Sim2Sim path fills that gap: it lets you bring the exported policy back to a local machine, run quick replay tests, and debug behavior before moving on to more complex transfer or deployment steps.
This is slightly different from the usual Sim2Sim motivation in many robot-learning projects. There, the main concern is often that contact dynamics are implemented differently across simulation engines, and MuJoCo becomes the preferred replay target because its contact modeling is accurate and widely trusted. LAV2 also benefits from that property, especially for tasks where contact behavior matters, but it is not the primary reason for this workflow. Here, the central goal is still to provide a lightweight local validation environment after training has happened in a heavier backend.
In LAV2, the practical sim-to-sim path has three parts:
- export the trained policy into a deployable model file
- run the policy in MuJoCo through the common play runner
- iterate on observation, command, and timing alignment until the transferred behavior is trustworthy
```mermaid
sequenceDiagram
    autonumber
    participant Trainer as RL backend
    participant Exporter as Exporter
    participant Runner as MuJoCo play runner
    participant Policy as Exported policy
    participant Sim as MuJoCo
    Trainer->>Exporter: Export trained checkpoint
    Exporter-->>Runner: Load exported model file
    Runner->>Sim: Read state and sensors
    Runner->>Runner: Reconstruct observation
    Runner->>Policy: Run inference
    Policy-->>Runner: Return normalized action
    Runner->>Runner: Apply mapping and command interpretation
    Runner->>Sim: Step rollout with mapped command
    Runner->>Runner: Check timing and behavior alignment
```
1. Export The Trained Policy
The first step is to export the trained policy into a model format that can be loaded independently of the original training runtime.
In practice this is usually straightforward if the training library already provides an exporter:
- Isaac Lab workflows using rsl_rl can export trained policies through the corresponding tooling in that stack
- LAV2's skrl path already integrates export support inside lav2.runner.skrl.isaaclab
The LAV2 skrl runner supports exporting both TorchScript (.pt) and ONNX (.onnx) policies. Those exported files are the artifacts expected by the MuJoCo-side play scripts and by lav2.runner.skrl.eval.
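If your training stack does not already provide an exporter, a generic TorchScript export of a PyTorch policy looks roughly like the following. This is a minimal sketch, not LAV2's actual exporter in lav2.runner.skrl.isaaclab: the tiny network, observation/action sizes, and file name are illustrative stand-ins.

```python
import torch
import torch.nn as nn

# Illustrative stand-in for a trained policy network (sizes are made up).
policy = nn.Sequential(nn.Linear(48, 64), nn.ELU(), nn.Linear(64, 12))
policy.eval()

example_obs = torch.zeros(1, 48)  # batch of one flat observation

# TorchScript export: a traced graph that loads without the training code.
traced = torch.jit.trace(policy, example_obs)
traced.save("policy.pt")

# An ONNX artifact (.onnx) would come from torch.onnx.export in a
# similar way, for consumption by ONNX-based runtimes.

# The saved file can be loaded standalone, which is what the play
# scripts and lav2.runner.skrl.eval expect.
restored = torch.jit.load("policy.pt")
print(restored(example_obs).shape)  # torch.Size([1, 12])
```

The key property of the exported artifact is that it depends only on the inference runtime, not on the training library or environment code.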
2. Replay In MuJoCo
Once you have an exported policy file, the next step is to test it in MuJoCo using the common play utilities under lav2.runner.common.mujoco.
The repository already provides two entry scripts for this:
- scripts/sim2sim/LAV2_base_play.py
- scripts/sim2sim/LAV2_base_vel_play.py
These scripts wrap MuJoCoPlayConfig, MuJoCoPlayRunner, and the policy loader in lav2.runner.skrl.eval.
Typical usage looks like:
```bash
uv run python scripts/sim2sim/LAV2_base_play.py --checkpoint /path/to/policy.pt
uv run python scripts/sim2sim/LAV2_base_vel_play.py --checkpoint /path/to/policy.onnx
```
At this stage, MuJoCo replay is not limited to one fixed command source. You can test the policy with:
- keyboard commands
- gamepad commands
- specified trajectories through the trajectory hook in MuJoCoPlayConfig
That makes MuJoCo replay useful both as a quick deployment sanity check and as a richer qualitative testbed for transferred policies.
What To Check First During Replay
Start by checking that the model loads, the observation size matches the training setup, and the policy produces bounded actions. Then verify timing, command mapping, and qualitative tracking behavior.
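These first checks can be folded into a small preflight routine that runs before any interactive replay. A minimal sketch: the `preflight` helper, the dummy policy, and the sizes 48/12 are illustrative, not part of LAV2's API.

```python
def preflight(policy, obs_dim, act_dim, act_limit=1.0, trials=10):
    """Cheap sanity checks before trusting a replayed policy.

    policy: callable from a flat observation (list of floats) to an action.
    obs_dim / act_dim: the sizes the training environment used.
    act_limit: expected bound of the normalized action space.
    """
    for _ in range(trials):
        obs = [0.0] * obs_dim                 # training-sized observation
        action = policy(obs)
        assert len(action) == act_dim, "action dimension mismatch"
        assert all(abs(a) <= act_limit for a in action), "action out of bounds"
    return True

# Dummy stand-in; a real run would wrap the exported model's inference call.
dummy_policy = lambda obs: [0.5] * 12
print(preflight(dummy_policy, obs_dim=48, act_dim=12))  # True
```

If this passes, failures seen later are more likely alignment problems (timing, mapping, observations) than loading problems.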
3. Do Alignment Testing
Getting a policy to replay in MuJoCo is not enough by itself. The more important step is to confirm that the deployment-side assumptions still match what the policy saw during training.
The main alignment items to test are:
- control frequency and decimation
- normalized action mapping and command interpretation
- observation construction and ordering
Control Frequency
The deployment loop must respect the trained policy's effective action rate. That includes both the MuJoCo-side control decimation in MuJoCoPlayConfig and the original training-side assumptions about control update frequency.
If the policy was trained at one control rate and replayed at another, the result can look like poor transfer even when the model itself is fine.
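The relationship between the physics timestep, the decimation, and the policy's effective rate is worth writing down explicitly. A minimal sketch with illustrative numbers (500 Hz physics, decimation 10); these are not LAV2 defaults.

```python
physics_dt = 0.002   # MuJoCo physics timestep in seconds (illustrative)
decimation = 10      # physics steps per policy action (illustrative)

def effective_control_hz(physics_dt, decimation):
    """The policy acts once every `decimation` physics steps."""
    return 1.0 / (physics_dt * decimation)

print(effective_control_hz(physics_dt, decimation))  # effective policy rate in Hz

# Replay loop shape: run inference only on decimated steps and hold the
# last action across the intermediate physics steps.
last_action = [0.0] * 12
for step in range(100):
    if step % decimation == 0:
        last_action = [0.0] * 12   # policy inference would happen here
    # the held action would be applied to the simulator on every physics step
```

If the training environment used, say, a 50 Hz policy rate, the replay side must reproduce that same effective rate, not just the same physics timestep.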
Normalization And Mapping
The action coming out of the exported model is usually still in the normalized space expected by the training environment. It must therefore be mapped back into the controller or actuator space used in MuJoCo.
In LAV2 this is handled through the action and command logic inside lav2.runner.common.mujoco, together with the controller-side mapping utilities such as lav2.controller.mapping.
This is one of the first places to inspect when a policy appears numerically stable but produces obviously wrong motion.
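A common shape for this mapping is offset-plus-scale around default joint positions. The sketch below is illustrative only; LAV2's actual mapping lives in lav2.runner.common.mujoco and lav2.controller.mapping, and the per-joint defaults and scale here are made up.

```python
def unnormalize(action, default_pos, scale):
    """Map a normalized action in [-1, 1] to absolute joint position targets."""
    return [d + scale * a for a, d in zip(action, default_pos)]

default_pos = [0.0, 0.5, -1.5]   # per-joint default positions (illustrative)
scale = 0.25                     # action scale assumed during training

targets = unnormalize([0.0, 1.0, -1.0], default_pos, scale)
print(targets)  # [0.0, 0.75, -1.75]
```

A mismatch in either the scale or the default offsets keeps the actions numerically bounded while sending the joints to the wrong targets, which is exactly the "stable but obviously wrong motion" symptom described above.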
Observation Alignment
The MuJoCo replay observation must match the training observation in:
- channel meaning
- channel ordering
- frame convention
- normalization assumptions
This is why the default observation extraction inside MuJoCoPlayRunner matters so much. If the replay observation differs from the training observation, the policy is not really being tested on the same task anymore.
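One way to keep the replay observation honest is to build it from an explicit, ordered spec of named channels that mirrors the training configuration. A minimal sketch; the channel names, sizes, and scales below are illustrative, not LAV2's actual layout.

```python
# Ordered channel spec: (name, size, scale). The order must match training.
OBS_LAYOUT = [
    ("base_ang_vel", 3, 0.25),
    ("projected_gravity", 3, 1.0),
    ("velocity_command", 3, 1.0),
    ("joint_pos", 12, 1.0),
    ("joint_vel", 12, 0.05),
    ("last_action", 12, 1.0),
]

def build_observation(channels):
    """Flatten named channels in the fixed training order, applying scales."""
    obs = []
    for name, size, scale in OBS_LAYOUT:
        values = channels[name]
        assert len(values) == size, f"{name}: expected {size}, got {len(values)}"
        obs.extend(scale * v for v in values)
    return obs

channels = {name: [0.0] * size for name, size, _ in OBS_LAYOUT}
print(len(build_observation(channels)))  # 45
```

Keeping the layout in one declarative table makes ordering, sizing, and normalization mismatches fail loudly at replay time instead of silently degrading behavior.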
Goal Of Sim2Sim
The goal of this stage is not just to make the policy run. The goal is to make the policy run under a replay stack whose timing, mappings, and observations are close enough to the training setup that the result is meaningful.
That is the minimum standard before moving on to more difficult transfer or deployment work.
API Cross-References
- MuJoCo play utilities: lav2.runner.common.mujoco
- MuJoCo play config: MuJoCoPlayConfig
- MuJoCo play runner: MuJoCoPlayRunner
- SKRL Isaac Lab runner and exporter path: lav2.runner.skrl.isaaclab
- Exported policy loader: lav2.runner.skrl.eval
- Controller-side mapping helpers: lav2.controller.mapping