LAV2_base_moe
SKRL configuration for the LAV2 base MoE task.
Classes:
| Name | Description |
|---|---|
| PPO_MoE | PPO variant with MoE bias updates. |
| MoELayer | Mixture-of-experts layer with top-k routing. |
| Shared | Shared actor-critic model with MoE trunk. |
Functions:
| Name | Description |
|---|---|
| get_agent_cfg | Return the PPO agent configuration. |
| get_model_class | Return the model class for the agent. |
| get_agent_class | Return the agent class. |
| get_memory_class | Return the memory class. |
| get_memory_cfg | Return the memory configuration. |
| get_trainer_class | Return the trainer class. |
| get_trainer_cfg | Return the trainer configuration. |
PPO_MoE
Bases: PPO
PPO variant with MoE bias updates.
Methods:
| Name | Description |
|---|---|
| update | Run PPO update and then update MoE biases. |
update
update(*args: Any, **kwargs: Any) -> None
Run PPO update and then update MoE biases.
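The "base update, then bias update" order can be illustrated with a minimal sketch. The source does not say how the biases are adjusted, so the sign-based load-balancing rule below is an assumption. `BasePPO`, `BiasedLayer`, `PPOWithBiasUpdate`, `update_bias`, and `step` are hypothetical stand-ins, not names from this module or from SKRL.

```python
from typing import Any, List


class BasePPO:
    """Stand-in for the real PPO base class (hypothetical, illustration only)."""

    def __init__(self) -> None:
        self.update_calls = 0

    def update(self, *args: Any, **kwargs: Any) -> None:
        # The real agent would run its PPO optimization epochs here.
        self.update_calls += 1


class BiasedLayer:
    """Hypothetical layer tracking per-expert load and a routing bias."""

    def __init__(self, num_experts: int, step: float = 0.01) -> None:
        self.bias = [0.0] * num_experts
        self.load = [0] * num_experts  # inputs routed to each expert since the last update
        self.step = step

    def update_bias(self) -> None:
        # Assumed rule: nudge each bias toward a balanced load.
        mean_load = sum(self.load) / len(self.load)
        for i, count in enumerate(self.load):
            if count < mean_load:
                self.bias[i] += self.step  # underused expert becomes more attractive
            elif count > mean_load:
                self.bias[i] -= self.step  # overused expert becomes less attractive
        self.load = [0] * len(self.load)  # reset counters for the next rollout


class PPOWithBiasUpdate(BasePPO):
    def __init__(self, layers: List[BiasedLayer]) -> None:
        super().__init__()
        self.layers = layers

    def update(self, *args: Any, **kwargs: Any) -> None:
        # First the ordinary PPO update, then the bias adjustment.
        super().update(*args, **kwargs)
        for layer in self.layers:
            layer.update_bias()
```

Keeping the bias adjustment outside the gradient step means it does not add a term to the PPO loss; whether PPO_MoE works this way is not stated in the source.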
MoELayer
MoELayer(input_size, output_size, num_experts, k)
Bases: Module
Mixture-of-experts layer with top-k routing.
Initialize experts, gate network, and load-balancing bias.
Methods:
| Name | Description |
|---|---|
| forward | Compute routed expert outputs. |
forward
forward(x: Tensor) -> torch.Tensor
Compute routed expert outputs.
Returns:
| Type | Description |
|---|---|
| Tensor | torch.Tensor: Weighted combination of top-k expert outputs. |
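As a rough illustration of top-k routing, the following framework-free sketch routes a single input vector. It is not the layer's actual implementation. It assumes the load-balancing bias affects only which experts are selected, while the mixing weights come from a softmax over the unbiased scores of the selected experts; the source confirms neither detail. `top_k_route` and its parameters are hypothetical. In the real layer the gate scores would come from the gate network, and everything would operate on batched tensors.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]


def top_k_route(
    x: Vector,
    gate_scores: Sequence[float],
    bias: Sequence[float],
    experts: Sequence[Callable[[Vector], Vector]],
    k: int,
) -> Vector:
    # Select experts by biased score (assumption: bias influences selection only).
    biased = [s + b for s, b in zip(gate_scores, bias)]
    top = sorted(range(len(experts)), key=lambda i: biased[i], reverse=True)[:k]

    # Mixing weights: numerically stable softmax over the unbiased scores
    # of the selected experts.
    peak = max(gate_scores[i] for i in top)
    exps = [math.exp(gate_scores[i] - peak) for i in top]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Weighted combination of the selected experts' outputs.
    outputs = [experts[i](x) for i in top]
    return [
        sum(w * out[d] for w, out in zip(weights, outputs))
        for d in range(len(outputs[0]))
    ]


# Usage: three toy experts that scale the input by 1, 2, and 3.
toy_experts = [
    lambda v: [1.0 * a for a in v],
    lambda v: [2.0 * a for a in v],
    lambda v: [3.0 * a for a in v],
]
balanced = top_k_route([1.0, 2.0], [1.0, 0.0, 1.0], [0.0, 0.0, 0.0], toy_experts, k=2)
shifted = top_k_route([1.0, 2.0], [1.0, 0.0, 1.0], [0.0, 5.0, -5.0], toy_experts, k=2)
```

In the second call the bias promotes expert 1 into the selected set and demotes expert 2, which changes the output even though the gate scores are identical.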
Shared
Shared(observation_space, state_space, action_space, device, clip_actions=False, clip_log_std=True, min_log_std=-20, max_log_std=2, reduction='sum')
Bases: GaussianMixin, DeterministicMixin, Model
Shared actor-critic model with MoE trunk.
Initialize the shared MoE actor-critic model.
Methods:
| Name | Description |
|---|---|
| act | Compute actions or values depending on role. |
| compute | Compute policy/value outputs depending on role. |
act
act(inputs: Any, role: str) -> Any
Compute actions or values depending on role.
Returns:
| Type | Description |
|---|---|
| Any | tuple \| None: Action outputs for the given role, or |
compute
compute(inputs: Any, role: str) -> Any
Compute policy/value outputs depending on role.
Returns:
| Type | Description |
|---|---|
| Any | tuple \| None: |
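Role-based dispatch in a model that inherits from two mixins can be sketched as below. This is a minimal illustration under assumptions: the role names "policy" and "value" follow the common SKRL convention but are not confirmed by this page, and `GaussianStub`, `DeterministicStub`, and `SharedSketch` are hypothetical stand-ins for the real mixins and model.

```python
from typing import Any, Optional, Tuple


class GaussianStub:
    """Stand-in for a stochastic-policy mixin (hypothetical)."""

    def act(self, inputs: Any, role: str) -> Tuple[str, Any]:
        return ("sampled_action", inputs)


class DeterministicStub:
    """Stand-in for a deterministic value mixin (hypothetical)."""

    def act(self, inputs: Any, role: str) -> Tuple[str, Any]:
        return ("value_estimate", inputs)


class SharedSketch(GaussianStub, DeterministicStub):
    def act(self, inputs: Any, role: str) -> Optional[Tuple[str, Any]]:
        # Call each mixin explicitly: plain method resolution would always
        # pick GaussianStub.act because it comes first in the bases.
        if role == "policy":
            return GaussianStub.act(self, inputs, role)
        if role == "value":
            return DeterministicStub.act(self, inputs, role)
        return None  # unrecognized role


model = SharedSketch()
policy_out = model.act({"states": [0.1, 0.2]}, role="policy")
value_out = model.act({"states": [0.1, 0.2]}, role="value")
```

Returning `None` for an unrecognized role matches the documented `tuple | None` return type, though the conditions under which the real methods return `None` are not stated here.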
get_agent_cfg
get_agent_cfg() -> Any
Return the PPO agent configuration.
get_model_class
get_model_class() -> Any
Return the model class for the agent.
get_agent_class
get_agent_class() -> Any
Return the agent class.
get_memory_class
get_memory_class() -> Any
Return the memory class.
get_memory_cfg
get_memory_cfg() -> Any
Return the memory configuration.
get_trainer_class
get_trainer_class() -> Any
Return the trainer class.
get_trainer_cfg
get_trainer_cfg() -> Any
Return the trainer configuration.