moe

PPO variant with MoE load-balancing bias updates.

Mirrors lav2/runner/skrl/cfg/LAV2_base_moe.PPO_MoE, adapted to the interface of rsl-rl's PPO class (rsl_rl.algorithms.PPO).

Classes:

Name Description
PPO_MoE PPO with post-update MoE load-balancing bias correction.

PPO_MoE

Bases: PPO

PPO with post-update MoE load-balancing bias correction.

After each PPO update, the expert bias terms on MoELayer are adjusted based on observed expert utilisation, steering the router toward balanced expert usage without an auxiliary loss.
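The bias correction described above can be illustrated with a minimal sketch. It assumes a sign-based rule in which under-used experts have their routing bias raised and over-used experts have it lowered by a fixed step. The function name `update_expert_bias`, the `update_rate` parameter, and the rule itself are assumptions for illustration; the actual logic in MoELayer may differ.

```python
def update_expert_bias(bias, tokens_per_expert, update_rate=0.001):
    """Hypothetical helper: nudge per-expert routing biases toward balanced load.

    bias              -- current bias per expert (list of floats)
    tokens_per_expert -- how many tokens each expert received since the last update
    update_rate       -- assumed fixed step size (not taken from the source)
    """
    mean_load = sum(tokens_per_expert) / len(tokens_per_expert)
    new_bias = []
    for b, load in zip(bias, tokens_per_expert):
        error = mean_load - load
        # Sign of the load error: +1 if under-used, -1 if over-used, 0 if balanced.
        step = (error > 0) - (error < 0)
        new_bias.append(b + update_rate * step)
    return new_bias


# Expert 0 is overloaded, experts 1 and 3 are under-used, expert 2 is at the mean.
balanced = update_expert_bias([0.0, 0.0, 0.0, 0.0], [10, 2, 4, 0])
```

Because the bias only shifts routing scores and never enters the loss, a rule of this kind leaves the PPO objective untouched, which is consistent with the "without an auxiliary loss" description.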

Methods:

Name Description
update Run PPO update and then update MoE biases.

update

update()

Run PPO update and then update MoE biases.
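The control flow of this method can be sketched as an override that delegates to the parent and then touches the MoE layers. To keep the sketch runnable without rsl-rl installed, `BasePPO` stands in for rsl_rl.algorithms.PPO and `MoELayerStub` for MoELayer; the `update_bias` method name, the `moe_layers` attribute, and the return value are assumptions, not the documented API.

```python
class BasePPO:
    """Stand-in for rsl_rl.algorithms.PPO so the sketch runs on its own."""

    def update(self):
        # The real method performs the PPO optimisation; here we return a dummy result.
        return {"surrogate": 0.0}


class MoELayerStub:
    """Stand-in for MoELayer; update_bias is a hypothetical method name."""

    def __init__(self):
        self.bias_updates = 0

    def update_bias(self):
        self.bias_updates += 1


class PPOMoESketch(BasePPO):
    def __init__(self, moe_layers):
        self.moe_layers = list(moe_layers)  # assumed attribute holding the MoE layers

    def update(self):
        result = super().update()       # 1. run the regular PPO update
        for layer in self.moe_layers:   # 2. then apply the load-balancing bias correction
            layer.update_bias()
        return result                   # pass the parent's result through unchanged


layer = MoELayerStub()
algo = PPOMoESketch([layer])
result = algo.update()
```

Running the bias correction after `super().update()` keeps it outside the gradient step, so the optimiser state for the policy is never affected by the balancing rule.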