moe
PPO variant with MoE load-balancing bias updates.
Mirrors `lav2/runner/skrl/cfg/LAV2_base_moe.PPO_MoE`, adapted to
rsl-rl's :class:`~rsl_rl.algorithms.PPO` interface.
Classes:
| Name | Description |
|---|---|
| PPO_MoE | PPO with post-update MoE load-balancing bias correction. |
PPO_MoE
Bases: PPO
PPO with post-update MoE load-balancing bias correction.
After each PPO update, the expert bias terms on :class:`MoELayer`
are adjusted based on expert utilisation, steering the router
toward balanced expert usage without an auxiliary loss.
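
The exact correction rule is not stated here. As a rough illustration, one common auxiliary-loss-free scheme nudges each expert's routing bias by a fixed step: down if the expert received more than the mean load, up if it received less. The sketch below assumes that sign-based rule; the function name `update_expert_bias`, the `step_size` parameter, and the list-based layout are hypothetical and are not taken from `MoELayer`.

```python
def update_expert_bias(bias, expert_counts, step_size=0.001):
    """Return new per-expert biases nudged toward balanced usage.

    bias: current routing bias for each expert (hypothetical layout).
    expert_counts: number of tokens routed to each expert since the last update.
    step_size: fixed magnitude of each correction (assumed hyperparameter).
    """
    mean_load = sum(expert_counts) / len(expert_counts)
    new_bias = []
    for b, count in zip(bias, expert_counts):
        if count > mean_load:
            b -= step_size  # overloaded expert: make it less attractive
        elif count < mean_load:
            b += step_size  # underloaded expert: make it more attractive
        new_bias.append(b)
    return new_bias


# Expert 0 took most of the tokens, so its bias drops and the others rise.
balanced = update_expert_bias([0.0, 0.0, 0.0, 0.0], [70, 10, 10, 10], step_size=0.01)
```

Because the correction is applied outside the gradient step, it shifts which experts the router selects without adding a term to the PPO loss.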
Methods:
| Name | Description |
|---|---|
| update | Run PPO update and then update MoE biases. |
update
update()
Run PPO update and then update MoE biases.
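
The call order described above can be sketched with stand-in classes. Everything here is hypothetical: `BasePPO` is a stub and not rsl-rl's `PPO`, `ToyMoELayer` and its `update_bias` method are invented for illustration, and returning the base class's result unchanged is an assumption rather than documented behaviour.

```python
class BasePPO:
    """Stand-in for the real PPO base class (not rsl-rl's code)."""

    def update(self):
        # Pretend to run the policy/value optimisation and report losses.
        return {"surrogate": 0.0, "value": 0.0}


class ToyMoELayer:
    """Hypothetical MoE layer holding per-expert biases and token counts."""

    def __init__(self, num_experts):
        self.bias = [0.0] * num_experts
        self.counts = [0] * num_experts

    def update_bias(self, step_size):
        # Assumed sign-based rule: lower the bias of overloaded experts,
        # raise it for underloaded ones, then reset the usage counters.
        mean_load = sum(self.counts) / len(self.counts)
        for i, count in enumerate(self.counts):
            if count > mean_load:
                self.bias[i] -= step_size
            elif count < mean_load:
                self.bias[i] += step_size
        self.counts = [0] * len(self.counts)


class ToyPPOMoE(BasePPO):
    """Runs the base update first, then corrects the MoE biases."""

    def __init__(self, moe_layers, step_size=0.01):
        self.moe_layers = moe_layers
        self.step_size = step_size

    def update(self):
        losses = super().update()  # 1. ordinary PPO update
        for layer in self.moe_layers:  # 2. post-update bias correction
            layer.update_bias(self.step_size)
        return losses


layer = ToyMoELayer(num_experts=2)
layer.counts = [9, 1]  # expert 0 was heavily favoured during the rollout
algo = ToyPPOMoE([layer])
losses = algo.update()
```

Running the bias correction after the base update keeps it out of the gradient computation, which matches the "without an auxiliary loss" description of the class.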