Domain Randomization

This page explains domain randomization as a simulation-side capability of the LAV2 stack itself. The focus here is not on terrain generation or generic scene variation, but on vehicle-intrinsic randomization: under one task definition, different environment instances carry different yet physically plausible platform parameters, actuator responses, and controller settings.

That distinction matters in this repository. Scene-side variation is usually owned by terrain configuration or task events, while the randomization discussed here sits closer to the platform model. It changes what the simulated vehicle is, not only what the vehicle is asked to do. For the transfer-side motivation behind this choice, see Real2Sim; this page concentrates on how the mechanism is represented and executed inside the simulation stack.

```mermaid
flowchart TD
  A[Nominal platform definition] --> B[VehicleParams]
  B --> C[Controller-side parameters]
  B --> D[Actuator and plant-side parameters]
  C --> E[Torch controllers]
  D --> F[Mixer and rotor dynamics]
  E --> G[Task backend reset hooks]
  F --> G
  G --> H[Per-env sampled runtime state]
  H --> I[Parallel training environments]
```

Design Boundary

LAV2 keeps the nominal platform description centered on VehicleParams. Domain randomization is layered on top of that nominal model rather than defined as a separate parallel parameter system. In practice, this means the repository still has one authoritative description of mass properties, actuator constants, timing assumptions, and PX4-aligned limits, while the training backends are allowed to perturb selected subsets of those quantities at runtime.

The current implementation follows a consistent pattern. A runtime component keeps a nominal copy of the parameters it owns, samples per-environment values when randomization is requested, writes those values back into the batched tensors used during execution, and then rebuilds any derived caches whose values depend on the sampled parameters. This is why the design remains compatible with large batched backends: randomization is expressed as a controlled mutation of runtime state, not as a second disconnected model definition.
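The pattern above can be sketched with a minimal, self-contained batched component. All names here (`RandomizableComponent`, `gain`, `inv_gain`) are illustrative stand-ins, not the repository's actual API; the point is the four-step sequence: keep a nominal copy, sample per environment, write back into the batched tensor, rebuild derived caches.

```python
import torch

class RandomizableComponent:
    """Illustrative batched runtime component following the pattern
    described above: nominal copy -> per-env sampling -> write-back
    into batched tensors -> rebuild of derived caches."""

    def __init__(self, nominal_gain: float, num_envs: int):
        # The nominal value is kept separately, so resampling is always
        # expressed relative to the authoritative platform description.
        self.nominal_gain = torch.full((num_envs,), nominal_gain)
        self.gain = self.nominal_gain.clone()
        self._rebuild_cache()

    def _rebuild_cache(self):
        # Derived quantity that must stay consistent with the sampled gain;
        # skipping this step would leave the runtime state incoherent.
        self.inv_gain = 1.0 / self.gain

    def randomize(self, env_ids: torch.Tensor, scale_range=(0.8, 1.2)):
        # Sample per-environment multiplicative factors for the reset envs only.
        lo, hi = scale_range
        factors = torch.empty(len(env_ids)).uniform_(lo, hi)
        self.gain[env_ids] = self.nominal_gain[env_ids] * factors
        self._rebuild_cache()

comp = RandomizableComponent(nominal_gain=2.0, num_envs=4)
comp.randomize(env_ids=torch.tensor([0, 2]))
```

Because the mutation is confined to the indexed rows of the batched tensors, the same call scales from a handful of environments to thousands without changing the component's interface.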

This repository therefore treats domain randomization as a property of the simulation components themselves. The key question is not merely whether a task can inject randomness, but whether the affected controller or dynamics module can preserve a coherent interface after the perturbation has been applied. That is also why the implementation currently lives around controllers, mixer logic, and rotor dynamics rather than being hidden entirely inside task code.

Parameter Surfaces

The first parameter surface is the controller side. In lav2.controller.torch.pid, the batched FlightController exposes FlightController.randomize, allowing the per-environment PID gains and controller limits to be resampled at reset. Conceptually, this is not the same as plant randomization. It changes the control law that interprets the same target and state signals, so it models uncertainty in the inner-loop controller configuration rather than uncertainty in the platform physics alone.
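The distinction can be made concrete with a small sketch: the same error signal maps to different commands once per-environment gains and limits have been sampled. The tensor names and ranges below are illustrative, not the actual FlightController API.

```python
import torch

num_envs = 3
nominal_kp = torch.full((num_envs,), 4.0)    # nominal proportional gain
nominal_limit = torch.full((num_envs,), 5.0) # nominal output limit

# Reset-time sampling: each environment receives its own gain and limit,
# modeling uncertainty in the inner-loop configuration, not in the plant.
kp = nominal_kp * torch.tensor([0.9, 1.0, 1.1])
limit = nominal_limit * torch.tensor([1.0, 0.8, 1.0])

# The same target/state error now produces a different command per env
# (positive commands only in this sketch, so a one-sided clamp suffices).
error = torch.full((num_envs,), 1.2)
command = torch.minimum(kp * error, limit)
```

Env 0 responds more softly, env 1 saturates at its tighter limit, and env 2 saturates at the nominal limit: three distinct control laws reading identical inputs.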

The second parameter surface is the actuator and allocation path. Mixer in lav2.controller.torch.mixer supports Mixer.randomize, which perturbs the parameters that govern thrust allocation and normalized-throttle interpretation. Because the mixer rebuilds its allocation matrix and thrust-related caches after sampling, the effect is not a superficial output disturbance; it changes how demanded wrench is mapped into rotor commands.
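Why the cache rebuild matters can be seen in a reduced sketch of one mixer quantity. The class and field names here are hypothetical; only the mechanism, resample the thrust coefficient, then recompute the throttle cache that depends on it, reflects the design described above.

```python
import torch

class MixerSketch:
    """Illustrative mixer fragment: the thrust coefficient maps
    normalized throttle^2 to thrust, and the hover-throttle cache
    must be rebuilt whenever that coefficient is resampled."""

    def __init__(self, num_envs, num_rotors=4, mass=1.0, g=9.81, kf=6.0):
        self.hover_thrust = mass * g / num_rotors  # thrust per rotor at hover [N]
        self.nominal_kf = torch.full((num_envs,), kf)
        self.kf = self.nominal_kf.clone()
        self._rebuild()

    def _rebuild(self):
        # Derived cache: the normalized throttle that produces hover
        # thrust under the *current* sampled coefficient.
        self.hover_throttle = torch.sqrt(self.hover_thrust / self.kf)

    def randomize(self, env_ids, scale):
        self.kf[env_ids] = self.nominal_kf[env_ids] * scale
        self._rebuild()

mixer = MixerSketch(num_envs=2)
mixer.randomize(env_ids=torch.tensor([0]), scale=0.8)
```

After the rebuild, env 0 needs a visibly higher normalized throttle to hover than env 1, so the perturbation changes the command mapping itself rather than adding a superficial output disturbance.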

The third parameter surface is the rotor model itself. RotorDynamics in lav2.dynamics.torch.rotor exposes RotorDynamics.randomize, which perturbs quantities such as thrust and torque coefficients, rate limits, motor inertia, and rise or fall time constants. This is an important distinction from simpler perturbation schemes that only scale final thrust. In LAV2, the randomized rotor model can also change actuator response dynamics, which is usually closer to the kinds of mismatch that appear when moving from simulator assumptions to a real propulsion chain.
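The difference between scaling final thrust and randomizing response dynamics can be shown with a first-order rotor lag whose time constant varies per environment. This is a generic sketch under assumed dynamics, not the repository's rotor model.

```python
import torch

dt = 0.01
cmd = torch.tensor([1.0, 1.0])   # commanded normalized rotor speed
omega = torch.zeros(2)           # current normalized rotor speed

# Per-env rise time constants sampled at reset: env 1 has a slower motor.
tau_rise = torch.tensor([0.05, 0.15])

# First-order lag toward the command. Randomizing tau changes the
# transient response itself, not just a static scaling of final thrust.
for _ in range(20):              # simulate 0.2 s
    alpha = dt / (tau_rise + dt)
    omega = omega + alpha * (cmd - omega)
```

Both rotors converge to the same steady-state speed, yet after 0.2 s the slow motor has covered far less of the step, which is exactly the kind of actuation mismatch a thrust-only perturbation cannot represent.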

Read together, these three surfaces define the current scope of intrinsic randomization in LAV2. The repository does not yet treat all physically meaningful quantities as randomized objects. In particular, body-side properties such as mass, inertia, and center-of-mass offset are still discussed more as an architectural direction than as a uniformly implemented runtime layer across all backends. The present design should therefore be read as a partial but coherent foundation rather than as the final abstraction boundary.

Runtime Topology

The runtime topology is intentionally stratified. The nominal vehicle description remains centralized, but the sampled parameters live inside the batched runtime objects that actually execute control and dynamics updates. This split is what allows LAV2 to preserve one model vocabulary while still supporting many environments whose parameters differ at the episode level.

```mermaid
flowchart LR
  A[VehicleParams] --> B[FlightController]
  A --> C[Mixer]
  A --> D[RotorDynamics]
  B --> E[control_action or rotor_commands]
  C --> E
  D --> E
  E --> F[Task reset event]
  F --> G[randomize controller]
  F --> H[randomize mixer]
  F --> I[randomize rotor]
  G --> J[Per-env batched tensors]
  H --> J
  I --> J
```

What matters here is that reset-time orchestration does not invent a separate parameter language. The backend only decides when and for which environments randomization is applied. The controller and plant modules themselves still own how the sampled values are represented, validated, and propagated into downstream computations. This separation keeps backend code relatively thin and avoids entangling task semantics with component-specific parameter bookkeeping.

It also clarifies why LAV2 prefers reset-time randomization. Once an episode begins, the control and dynamics modules should behave as one coherent sampled system. If parameters were resampled arbitrarily during the episode, the simulator would no longer represent a single uncertain platform instance but a time-varying plant definition. That may be useful in some research settings, but it is not the modeling assumption currently encoded in the stack.

Backend Integration

The training-oriented backends in LAV2 all use this same logic, but they do not wire it in identical ways. In Isaac Lab, lav2.tasks.isaaclab.LAV2_base.mdp.events provides the explicit event helpers that find the relevant action term, locate the attached runtime component, and invoke the component-level randomization method. The default wiring is then assembled in lav2.tasks.isaaclab.LAV2_base.LAV2_env_cfg, where reset events specify practical parameter ranges for controller, mixer, and rotor perturbations.

The mjlab path mirrors that structure closely. lav2.tasks.mjlab.LAV2_base.mdp.events exposes the same kind of event-layer indirection, and lav2.tasks.mjlab.LAV2_base.LAV2_env_cfg binds those events into the manager-based environment configuration. This structural similarity is valuable because it preserves the same mental model across backends even when the surrounding environment framework differs.

Genesis Forge currently takes a slightly different route. Instead of matching the Isaac Lab and mjlab event-module pattern exactly, lav2.tasks.genesis_forge.LAV2_base.mdp.actions applies randomization from the action-side environment wiring. The orchestration layer therefore changes, but the underlying randomized objects remain the same: controller, mixer, and rotor modules still provide the actual runtime mutation surface.

Taken together, these integrations show the design intent clearly. Domain randomization in LAV2 is not supposed to be a backend-specific trick. The backend chooses the injection point, but the randomized semantics belong to the shared control and dynamics components. That is what keeps the implementation portable across Isaac Lab, mjlab, and future backends.

Modeling Implications

The current implementation already reflects a more serious modeling stance than the simplest “multiply thrust by noise” baseline used in some projects. Once actuator time constants, thrust coefficients, limits, and controller gains can vary independently across environments, the policy is no longer trained against one frozen inner-loop stack. Instead, it is exposed to a family of related closed-loop systems that differ in physically meaningful ways.

That said, the present boundary is still narrower than a fully general domain-randomization engine. The repository does not yet provide a unified operation model covering relative scaling, additive perturbation, and absolute sampling for all parameter classes, nor does it expose one common targeting language across body, actuator, and controller objects. Those are sensible future directions, but the current documentation should describe what exists today without pretending those abstractions are already complete.

It is also important to keep controller-side and plant-side uncertainty conceptually separate. Randomizing controller gains addresses robustness to inner-loop configuration drift, while randomizing mixer and rotor parameters addresses robustness to actuation and plant mismatch. Combining them may be useful, but they answer different failure modes and should remain distinguishable in both code and documentation.

Future Extension

The most natural extension path is to complete the body-side randomization story. Mass properties, inertia structure, center-of-mass offset, and track-side actuation parameters are all plausible candidates for the same runtime pattern that rotor and controller modules already use. Extending toward those quantities would make the intrinsic randomization layer more symmetric across flight and ground modes.

Another important direction is to improve observability of sampled parameters. At the moment, runtime modules mutate their internal batched tensors, but the stack does not yet expose one uniform intrinsics buffer or randomized-parameter record that can be logged, replayed, or conditionally exposed to the policy. That kind of explicit bookkeeping would make it easier to diagnose failure regions, compare training and evaluation distributions, and reason about which sampled parameter combinations actually caused policy degradation.
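One plausible shape for such bookkeeping is a per-environment record assembled at reset. This is a design sketch only; no class of this kind exists in the stack today, and all names are invented for illustration.

```python
import torch

class IntrinsicsRecord:
    """Hypothetical per-env record of sampled parameters, filled at
    reset so sampled values can be logged, replayed, or exposed to
    the policy as a conditioning input."""

    def __init__(self, num_envs):
        self.num_envs = num_envs
        self.fields = {}

    def record(self, name, values):
        # Snapshot the sampled per-env values under a stable name.
        self.fields[name] = values.clone()

    def as_tensor(self):
        # One flat intrinsics buffer of shape (num_envs, num_params),
        # with columns in a deterministic (sorted-name) order.
        return torch.stack([self.fields[k] for k in sorted(self.fields)], dim=1)

rec = IntrinsicsRecord(num_envs=3)
rec.record("rotor_kf", torch.tensor([5.8, 6.1, 6.0]))
rec.record("pid_kp", torch.tensor([3.9, 4.2, 4.0]))
buf = rec.as_tensor()
```

A buffer like this would give training, evaluation, and logging one consistent view of which sampled platform instance each environment is actually running.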

These extensions should be understood as design continuation rather than a need to replace the current structure. The present implementation already establishes the right ownership pattern: nominal values remain centralized, runtime modules own sampled execution state, and task backends trigger randomization through reset-oriented hooks.

API Cross-References