LAV2_base_moe

rsl-rl config for the LAV2 base task with a Mixture-of-Experts trunk.

Mirrors lav2/runner/skrl/cfg/LAV2_base_moe.py but is adapted to the rsl-rl
v5 native dict format. Uses :class:`~lav2.runner.rsl_rl.models.moe.MoEMLPModel`
as the actor model class and :class:`~lav2.runner.rsl_rl.algorithms.moe.PPO_MoE`
for post-update MoE bias correction.
Usage (Isaac Lab Hydra entry point)::

    "rsl_rl_cfg_entry_point": "lav2.runner.rsl_rl.cfg.LAV2_base_moe:LAV2MoEPPORunnerCfg"

Usage (direct script)::

    from rsl_rl.runners import OnPolicyRunner

    from lav2.runner.rsl_rl.cfg.LAV2_base_moe import get_runner_cfg

    cfg = get_runner_cfg(experiment_name="my_run")
    runner = OnPolicyRunner(env, cfg, log_dir, device=device)
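
The docstring above says PPO_MoE performs a post-update MoE bias correction. As
a point of reference, below is a minimal sketch of one common such rule
(auxiliary-loss-free load balancing: nudge per-expert routing biases so
under-used experts get routed to more often). All names here are illustrative
assumptions; :class:`~lav2.runner.rsl_rl.algorithms.moe.PPO_MoE` may implement
a different rule::

    import torch

    def correct_expert_bias(expert_bias: torch.Tensor,
                            tokens_per_expert: torch.Tensor,
                            update_rate: float = 1e-3) -> torch.Tensor:
        # Hypothetical rule: raise the routing bias of under-loaded experts
        # and lower it for over-loaded ones, by a fixed step size.
        load = tokens_per_expert.float()
        return expert_bias + update_rate * torch.sign(load.mean() - load)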
Classes:

| Name | Description |
|---|---|
| `BetaDistributionCfg` | Configuration for the Beta output distribution. |
| `LAV2MoEPPORunnerCfg` | Hydra-compatible config class for the MoE task. |

Functions:

| Name | Description |
|---|---|
| `get_runner_cfg` | Return a rsl-rl v5 native config dict for the MoE task. |
BetaDistributionCfg
Configuration for the Beta output distribution.
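
For orientation: a Beta policy head samples actions in (0, 1) and affinely
rescales them to the configured bounds, which is why `action_range` (below)
only applies to the Beta distribution. A hedged sketch of that pattern; the
head names are illustrative, not actual fields of this config::

    import torch
    from torch.distributions import Beta

    def sample_bounded_action(features, alpha_head, beta_head,
                              action_range=(-1.0, 1.0)):
        # softplus + 1 keeps both concentration parameters above 1, giving a
        # unimodal Beta density (a common choice for continuous control).
        alpha = torch.nn.functional.softplus(alpha_head(features)) + 1.0
        beta = torch.nn.functional.softplus(beta_head(features)) + 1.0
        u = Beta(alpha, beta).rsample()    # sample in (0, 1), reparameterized
        low, high = action_range
        return low + (high - low) * u      # rescale to the action bounds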
LAV2MoEPPORunnerCfg
Bases: RslRlOnPolicyRunnerCfg
Hydra-compatible config class for the MoE task.
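
The `num_experts` and `k` arguments of `get_runner_cfg` below size the MoE
trunk. As a reference point, here is a minimal softmax-gated top-k routing
forward pass in the standard form; the actual `MoEMLPModel` may differ in
detail, and the names here are illustrative::

    import torch

    def moe_forward(x, gate, experts, k=2):
        # gate: Linear(features -> num_experts); experts: ModuleList of MLPs.
        probs = gate(x).softmax(dim=-1)                 # (batch, num_experts)
        weights, idx = torch.topk(probs, k, dim=-1)     # top-k experts per sample
        weights = weights / weights.sum(dim=-1, keepdim=True)  # renormalize
        out = torch.zeros_like(experts[0](x))           # buffer with output shape
        for slot in range(k):
            for e, expert in enumerate(experts):
                mask = idx[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out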
get_runner_cfg
get_runner_cfg(
    experiment_name: str = _EXPERIMENT_NAME,
    max_iterations: int = _MAX_ITERATIONS,
    num_experts: int = 4,
    k: int = 2,
    init_std: float = 1.0,
    distribution: str = 'beta',
    action_range: tuple[float, float] = (-1.0, 1.0),
) -> dict
Return a rsl-rl v5 native config dict for the MoE task.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `experiment_name` | `str` | Logging directory name. | `_EXPERIMENT_NAME` |
| `max_iterations` | `int` | Number of PPO iterations. | `_MAX_ITERATIONS` |
| `num_experts` | `int` | Number of experts in the MoE layer. | `4` |
| `k` | `int` | Top-k experts routed to per token. | `2` |
| `init_std` | `float` | Initial standard deviation (Gaussian only). | `1.0` |
| `distribution` | `str` | Output distribution, `'beta'` or `'gaussian'`. | `'beta'` |
| `action_range` | `tuple[float, float]` | Action space bounds (Beta only). | `(-1.0, 1.0)` |
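
A short usage sketch tying the parameters together. The values are arbitrary,
`'gaussian'` is assumed to be the other accepted `distribution` value (the
table above only confirms `'beta'`), and the keys of the returned dict depend
on rsl-rl v5 and are not shown::

    from lav2.runner.rsl_rl.cfg.LAV2_base_moe import get_runner_cfg

    # Beta policy with an 8-expert trunk, routing 2 experts per sample.
    cfg = get_runner_cfg(
        experiment_name="lav2_moe_beta",
        num_experts=8,
        k=2,
        distribution="beta",
        action_range=(-1.0, 1.0),
    )

    # Gaussian alternative: action_range is ignored, init_std applies instead.
    cfg_gauss = get_runner_cfg(distribution="gaussian", init_std=1.0)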