LAV2_base_moe

rsl-rl config for the LAV2 base task with a Mixture-of-Experts trunk.

Mirrors lav2/runner/skrl/cfg/LAV2_base_moe.py but is adapted to the rsl-rl
v5 native dict format. Uses :class:`~lav2.runner.rsl_rl.models.moe.MoEMLPModel`
as the actor model class and :class:`~lav2.runner.rsl_rl.algorithms.moe.PPO_MoE`
for post-update MoE bias correction.
Usage (Isaac Lab Hydra entry point)::

    "rsl_rl_cfg_entry_point": "lav2.runner.rsl_rl.cfg.LAV2_base_moe:LAV2MoEPPORunnerCfg"

Usage (direct script)::

    from rsl_rl.runners import OnPolicyRunner

    from lav2.runner.rsl_rl.cfg.LAV2_base_moe import get_runner_cfg

    cfg = get_runner_cfg(experiment_name="my_run")
    runner = OnPolicyRunner(env, cfg, log_dir, device=device)
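
The docstring above says PPO_MoE performs a post-update MoE bias correction. As
a point of reference, below is a minimal sketch of one common such rule
(auxiliary-loss-free load balancing: nudge per-expert routing biases so
under-used experts get routed to more often). All names here are illustrative
assumptions; :class:`~lav2.runner.rsl_rl.algorithms.moe.PPO_MoE` may implement
a different rule::

    import torch

    def correct_expert_bias(expert_bias: torch.Tensor,
                            tokens_per_expert: torch.Tensor,
                            update_rate: float = 1e-3) -> torch.Tensor:
        # Hypothetical rule: raise the routing bias of under-loaded experts
        # and lower it for over-loaded ones, by a fixed step size.
        load = tokens_per_expert.float()
        return expert_bias + update_rate * torch.sign(load.mean() - load)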
Classes:

| Name | Description |
|---|---|
| `BetaDistributionCfg` | Configuration for the Beta output distribution. |
| `LAV2MoEPPORunnerCfg` | Hydra-compatible config class for the MoE task. |

Functions:

| Name | Description |
|---|---|
| `get_runner_cfg` | Return a rsl-rl v5 native config dict for the MoE task. |
BetaDistributionCfg
Configuration for the Beta output distribution.
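
For orientation: a Beta policy head samples actions in (0, 1) and affinely
rescales them to the configured bounds, which is why `action_range` (below)
only applies to the Beta distribution. A hedged sketch of that pattern; the
head names are illustrative, not actual fields of this config::

    import torch
    from torch.distributions import Beta

    def sample_bounded_action(features, alpha_head, beta_head,
                              action_range=(-1.0, 1.0)):
        # softplus + 1 keeps both concentration parameters above 1, giving a
        # unimodal Beta density (a common choice for continuous control).
        alpha = torch.nn.functional.softplus(alpha_head(features)) + 1.0
        beta = torch.nn.functional.softplus(beta_head(features)) + 1.0
        u = Beta(alpha, beta).rsample()    # sample in (0, 1), reparameterized
        low, high = action_range
        return low + (high - low) * u      # rescale to the action bounds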
LAV2MoEPPORunnerCfg
Bases: RslRlOnPolicyRunnerCfg
Hydra-compatible config class for the MoE task.
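
The `num_experts` and `k` arguments of `get_runner_cfg` below size the MoE
trunk. As a reference point, here is a minimal softmax-gated top-k routing
forward pass in the standard form; the actual `MoEMLPModel` may differ in
detail, and the names here are illustrative::

    import torch

    def moe_forward(x, gate, experts, k=2):
        # gate: Linear(features -> num_experts); experts: ModuleList of MLPs.
        probs = gate(x).softmax(dim=-1)                 # (batch, num_experts)
        weights, idx = torch.topk(probs, k, dim=-1)     # top-k experts per sample
        weights = weights / weights.sum(dim=-1, keepdim=True)  # renormalize
        out = torch.zeros_like(experts[0](x))           # buffer with output shape
        for slot in range(k):
            for e, expert in enumerate(experts):
                mask = idx[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out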
get_runner_cfg
get_runner_cfg(
    experiment_name: str = _EXPERIMENT_NAME,
    max_iterations: int = _MAX_ITERATIONS,
    num_experts: int = 4,
    k: int = 2,
    init_std: float = 1.0,
    distribution: str = 'beta',
    action_range: tuple[float, float] = (-1.0, 1.0),
) -> dict
Return a rsl-rl v5 native config dict for the MoE task.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `experiment_name` | `str` | Logging directory name. | `_EXPERIMENT_NAME` |
| `max_iterations` | `int` | Number of PPO iterations. | `_MAX_ITERATIONS` |
| `num_experts` | `int` | Number of experts in the MoE layer. | `4` |
| `k` | `int` | Top-k experts routed to per token. | `2` |
| `init_std` | `float` | Initial standard deviation (Gaussian only). | `1.0` |
| `distribution` | `str` | Output distribution, `'beta'` or `'gaussian'`. | `'beta'` |
| `action_range` | `tuple[float, float]` | Action space bounds (Beta only). | `(-1.0, 1.0)` |
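
A short usage sketch tying the parameters together. The values are arbitrary,
`'gaussian'` is assumed to be the other accepted `distribution` value (the
table above only confirms `'beta'`), and the keys of the returned dict depend
on rsl-rl v5 and are not shown::

    from lav2.runner.rsl_rl.cfg.LAV2_base_moe import get_runner_cfg

    # Beta policy with an 8-expert trunk, routing 2 experts per sample.
    cfg = get_runner_cfg(
        experiment_name="lav2_moe_beta",
        num_experts=8,
        k=2,
        distribution="beta",
        action_range=(-1.0, 1.0),
    )

    # Gaussian alternative: action_range is ignored, init_std applies instead.
    cfg_gauss = get_runner_cfg(distribution="gaussian", init_std=1.0)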