# moe

Mixture-of-Experts MLP model for rsl-rl.

Mirrors `lav2/runner/skrl/cfg/LAV2_base_moe.py` but adapted to rsl-rl's
`MLPModel` interface.
Uses loss-free load balancing: expert bias terms are adjusted after each PPO update based on expert utilisation, without an auxiliary loss term.
References:
- https://kexue.fm/archives/10699 (Math details)
- https://kexue.fm/archives/10757 (Auxiliary-Loss-Free Load Balancing)
- https://github.com/ambisinister/lossfreebalance (Loss-Free impl)
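The loss-free scheme in the references boils down to a per-expert sign step on the routing bias. A minimal sketch, assuming utilisation is tracked as per-expert token counts; the standalone helper and its name are illustrative, since in this module the corrections are accumulated by `MoELayer` and applied by `PPOMoE`:

```python
import torch

def update_balancing_bias(bias: torch.Tensor, expert_counts: torch.Tensor,
                          bias_update_speed: float = 1e-3) -> torch.Tensor:
    """Sign-step bias update: no gradients, no auxiliary loss term."""
    # Experts that received fewer tokens than average get their routing
    # bias nudged up; over-loaded experts get it nudged down. The gate
    # values used for expert weighting are left untouched.
    error = expert_counts.float().mean() - expert_counts.float()
    return bias + bias_update_speed * torch.sign(error)
```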
Classes:

| Name | Description |
|---|---|
| `MoELayer` | Mixture-of-experts layer with sigmoid top-k routing. |
| `MoEMLPModel` | MLP model whose trunk uses a mixture-of-experts layer. |
## MoELayer

`MoELayer(input_size: int, output_size: int, num_experts: int = 4, k: int = 2, bias_update_speed: float = 0.001)`

Bases: `Module`
Mixture-of-experts layer with sigmoid top-k routing.
Uses unbiased sigmoid gate values for expert weighting and adds a
load-balancing bias to the routing logits when selecting the top-k
experts. Bias corrections are accumulated in the `bias_updates`
attribute and applied externally
(e.g. by `lav2.runner.rsl_rl.algorithms.moe.PPOMoE`).
Initialize experts, gate network, and load-balancing bias.
Methods:

| Name | Description |
|---|---|
| `forward` | Compute routed expert outputs. |
### forward

`forward(x: torch.Tensor) -> torch.Tensor`
Compute routed expert outputs.
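A minimal sketch of what the routing computes, assuming the layer holds a gate `nn.Linear`, an `nn.ModuleList` of experts, and a per-expert `bias` tensor; the attribute names and standalone form here are assumptions, not the actual implementation:

```python
import torch

def moe_forward(layer, x: torch.Tensor) -> torch.Tensor:
    logits = layer.gate(x)                     # (batch, num_experts)
    weights = torch.sigmoid(logits)            # unbiased gate values
    # The load-balancing bias affects only *which* experts are selected;
    # the combination weights stay unbiased.
    _, top_idx = torch.topk(logits + layer.bias, layer.k, dim=-1)
    out = x.new_zeros(x.shape[0], layer.output_size)
    for i, expert in enumerate(layer.experts):
        mask = (top_idx == i).any(dim=-1)      # tokens routed to expert i
        if mask.any():
            out[mask] += weights[mask, i:i + 1] * expert(x[mask])
    return out
```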
## MoEMLPModel

`MoEMLPModel(obs, obs_groups, obs_set, output_dim, hidden_dims=(64, 64), activation='elu', obs_normalization=False, distribution_cfg=None, num_experts: int = 4, k: int = 2)`

Bases: `MLPModel`
MLP model whose trunk uses a mixture-of-experts layer.
The architecture mirrors the skrl counterpart:
Linear → Act → MoELayer → Act → Linear(head).
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `num_experts` | `int` | Number of experts in the MoE layer. | `4` |
| `k` | `int` | Number of experts routed to per token (top-k). | `2` |
Build the MoE-enhanced trunk and distribution head.
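A minimal sketch of the trunk layout named above (Linear → Act → MoELayer → Act, with the head attached on top); the standalone builder, its name, and the use of ELU for the default activation are illustrative assumptions:

```python
import torch.nn as nn

def build_moe_trunk(input_dim: int, hidden_dims=(64, 64),
                    num_experts: int = 4, k: int = 2) -> nn.Sequential:
    # MoELayer is the class documented above; the final Linear head /
    # distribution is attached by MLPModel on top of this trunk.
    h0, h1 = hidden_dims
    return nn.Sequential(
        nn.Linear(input_dim, h0),           # Linear
        nn.ELU(),                           # Act
        MoELayer(h0, h1, num_experts, k),   # MoELayer
        nn.ELU(),                           # Act
    )
```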