moe

PPO variant with MoE load-balancing bias updates.

Mirrors lav2/runner/skrl/cfg/LAV2_base_moe.PPO_MoE, adapted to the interface of rsl-rl's rsl_rl.algorithms.PPO class.

Classes:

Name     Description
PPO_MoE  PPO with post-update MoE load-balancing bias correction.

PPO_MoE

Bases: PPO

PPO with post-update MoE load-balancing bias correction.

After each PPO update, the expert bias terms on MoELayer are adjusted based on expert utilisation, steering the router toward balanced expert usage without an auxiliary loss.
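The exact correction rule is not documented here. As an illustration only, the sketch below shows one common auxiliary-loss-free scheme: each expert's routing bias moves by a fixed step against the sign of its load error. The function name `update_expert_bias`, the `step_size` parameter, and the sign-based rule are all assumptions for this example, not the behaviour of MoELayer.

```python
def update_expert_bias(bias, expert_counts, step_size=1e-3):
    """Return new per-expert biases nudged toward balanced load.

    Hypothetical rule (an assumption, not taken from MoELayer):
    experts that received more tokens than the mean get a lower bias,
    experts that received fewer get a higher bias.
    """
    mean_load = sum(expert_counts) / len(expert_counts)
    new_bias = []
    for b, count in zip(bias, expert_counts):
        if count > mean_load:
            new_bias.append(b - step_size)  # overloaded: make routing less likely
        elif count < mean_load:
            new_bias.append(b + step_size)  # underloaded: make routing more likely
        else:
            new_bias.append(b)              # exactly balanced: leave unchanged
    return new_bias


# Three experts that handled 10, 2 and 6 tokens (mean load is 6).
balanced = update_expert_bias([0.0, 0.0, 0.0], [10, 2, 6], step_size=0.1)
```

Because the bias only shifts the router's selection scores, a rule of this kind adds no term to the PPO loss, which matches the "without an auxiliary loss" description above.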

Methods:

Name    Description
update  Run PPO update and then update MoE biases.

update

update()

Run PPO update and then update MoE biases.
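A minimal sketch of the "update, then correct biases" pattern follows. It uses stand-in classes so that it runs without rsl-rl installed: `BasePPO`, `ToyMoELayer`, `PPOWithBiasUpdate`, the `counts` attribute, `update_bias`, and the return value of `update` are all hypothetical and do not reproduce the real PPO or MoELayer APIs.

```python
class BasePPO:
    """Stand-in for the parent PPO class (hypothetical, not rsl-rl's code)."""

    def update(self):
        # A real implementation would run the PPO optimisation epochs here.
        return {"surrogate": 0.0}


class ToyMoELayer:
    """Stand-in MoE layer tracking per-expert usage counts (assumed design)."""

    def __init__(self, num_experts):
        self.bias = [0.0] * num_experts
        self.counts = [0] * num_experts

    def update_bias(self, step_size):
        # Assumed rule: step each bias against the sign of its load error.
        mean_load = sum(self.counts) / len(self.counts)
        for i, count in enumerate(self.counts):
            if count > mean_load:
                self.bias[i] -= step_size
            elif count < mean_load:
                self.bias[i] += step_size
        # Reset usage statistics for the next rollout.
        self.counts = [0] * len(self.counts)


class PPOWithBiasUpdate(BasePPO):
    """Runs the parent update first, then corrects the MoE biases."""

    def __init__(self, moe_layers, step_size=1e-3):
        self.moe_layers = moe_layers
        self.step_size = step_size

    def update(self):
        losses = super().update()      # regular PPO update
        for layer in self.moe_layers:  # post-update bias correction
            layer.update_bias(self.step_size)
        return losses


layer = ToyMoELayer(num_experts=2)
layer.counts = [8, 2]  # expert 0 was used far more often than expert 1
algo = PPOWithBiasUpdate([layer], step_size=0.5)
losses = algo.update()
```

Running the correction after `super().update()` keeps it outside the gradient step, so the optimiser state of the parent class is left untouched in this sketch.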