moe
PPO variant with MoE load-balancing bias updates.
Mirrors `lav2/runner/skrl/cfg/LAV2_base_moe.PPO_MoE`, but adapted to
rsl-rl's `rsl_rl.algorithms.PPO` interface.
Classes:
| Name | Description |
|---|---|
| PPO_MoE | PPO with post-update MoE load-balancing bias correction. |
PPO_MoE
Bases: PPO
PPO with post-update MoE load-balancing bias correction.
After each PPO update, the expert bias terms on `MoELayer`
are adjusted based on expert utilisation, steering the router
toward balanced expert usage without an auxiliary loss.
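The exact correction rule is not spelled out here. As a minimal sketch of one common auxiliary-loss-free scheme, the snippet below nudges each expert's routing bias by a fixed step: up if the expert received fewer tokens than average, down if it received more. The helper name `update_expert_bias`, its `update_rate` parameter, and the sign-based rule are assumptions for illustration, not the actual `MoELayer` code.

```python
from typing import List, Sequence


def update_expert_bias(
    bias: Sequence[float],
    expert_counts: Sequence[int],
    update_rate: float = 0.001,
) -> List[float]:
    """Return biases nudged toward balanced usage (hypothetical helper).

    Assumed rule: move each bias by a fixed step whose sign depends on
    whether the expert was under- or over-used relative to the mean load.
    """
    mean_count = sum(expert_counts) / len(expert_counts)
    new_bias = []
    for b, count in zip(bias, expert_counts):
        if count < mean_count:
            step = update_rate    # under-used expert: make it more attractive
        elif count > mean_count:
            step = -update_rate   # over-used expert: make it less attractive
        else:
            step = 0.0            # exactly at the mean: leave unchanged
        new_bias.append(b + step)
    return new_bias


# Toy example: expert 0 received most of the tokens in the last rollout.
bias = [0.0, 0.0, 0.0, 0.0]
counts = [70, 10, 10, 10]
bias = update_expert_bias(bias, counts, update_rate=0.01)
```

Because the bias only shifts the router's expert selection and is changed outside the gradient step, a rule of this kind adds no extra term to the PPO loss.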
Methods:
| Name | Description |
|---|---|
| update | Run PPO update and then update MoE biases. |
update
update()
Run PPO update and then update MoE biases.
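The override pattern this describes might look roughly like the following sketch. To keep it runnable without rsl-rl installed, `BasePPO` is a stand-in for `rsl_rl.algorithms.PPO`, and `ToyMoELayer`, `ToyPolicy`, `update_bias`, and the `layers` attribute are all hypothetical names; the real class may locate its MoE layers and return its results differently.

```python
class BasePPO:
    """Stand-in for rsl_rl.algorithms.PPO (assumption, not the real class)."""

    def __init__(self, policy):
        self.policy = policy

    def update(self):
        # Placeholder for the regular PPO gradient steps.
        return {"surrogate": 0.0}


class ToyMoELayer:
    """Hypothetical MoE layer that tracks per-expert token counts."""

    def __init__(self, num_experts, rate=0.01):
        self.bias = [0.0] * num_experts
        self.counts = [0] * num_experts
        self.rate = rate

    def update_bias(self):
        # Assumed rule: +rate for under-used experts, -rate for over-used.
        mean = sum(self.counts) / len(self.counts)
        self.bias = [
            b + self.rate * ((c < mean) - (c > mean))
            for b, c in zip(self.bias, self.counts)
        ]
        self.counts = [0] * len(self.counts)  # start a fresh usage window


class ToyPolicy:
    """Hypothetical policy that simply holds a list of layers."""

    def __init__(self, layers):
        self.layers = layers


class PPOWithBiasUpdate(BasePPO):
    def update(self):
        result = super().update()          # 1) regular PPO update
        for layer in self.policy.layers:   # 2) then correct MoE biases
            if isinstance(layer, ToyMoELayer):
                layer.update_bias()
        return result                      # pass the base result through


layer = ToyMoELayer(num_experts=2)
layer.counts = [8, 2]  # expert 0 was used far more often than expert 1
algo = PPOWithBiasUpdate(ToyPolicy([layer]))
losses = algo.update()
```

Running the bias correction after `super().update()` keeps it separate from the gradient step, so the base update logic stays untouched.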