Description
I'm trying to do MARL with a PettingZoo environment and Tianshou's PPO implementation, but it looks like the PettingZoo AECEnv wrapper is inserting agent_id values that cause issues during batch processing. I'm not sure whether this is a bug or whether I'm just misusing some components. The OnpolicyTrainer throws the following exception in test_episode:
The offending agent_id values were likely generated by the PettingZooEnv wrapper's reset method:
def reset(self, *args: Any, **kwargs: Any) -> tuple[dict, dict]:
    self.env.reset(*args, **kwargs)
    observation, reward, terminated, truncated, info = self.env.last(self)
    if isinstance(observation, dict) and "action_mask" in observation:
        observation_dict = {
            "agent_id": self.env.agent_selection,
            "obs": observation["observation"],
            "mask": [obm == 1 for obm in observation["action_mask"]],
        }
    else:
        if isinstance(self.action_space, spaces.Discrete):
            observation_dict = {
                "agent_id": self.env.agent_selection,
                "obs": observation,
                "mask": [True] * self.env.action_space(self.env.agent_selection).n,
            }
        else:
            observation_dict = {
                "agent_id": self.env.agent_selection,
                "obs": observation,
            }
    return observation_dict, info
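To show where the agent_id comes from, here is a quick check of the wrapper's reset output, using PettingZoo's tictactoe_v3 as a stand-in for my env (I assume the resulting dict structure is the same):

# Quick check of the wrapper's reset output, using PettingZoo's tictactoe_v3
# as a stand-in for my env (assumption: the dict structure is identical).
from pettingzoo.classic import tictactoe_v3
from tianshou.env import PettingZooEnv

env = PettingZooEnv(tictactoe_v3.env())
obs, info = env.reset()
print(sorted(obs.keys()))   # ['agent_id', 'mask', 'obs']
print(obs["agent_id"])      # e.g. 'player_1'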
The test_ppo.py sample, which works fine but doesn't support MARL, is otherwise similar to my code, but it uses the CartPole Gymnasium env without the PettingZoo wrapper.
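For contrast, here is what the single-agent setup sees (my assumption about the relevant difference): CartPole's reset returns a plain ndarray, so the batch never contains an agent_id or mask field.

# CartPole reset for comparison: a flat ndarray observation, no dict keys.
import gymnasium as gym

obs, info = gym.make("CartPole-v1").reset()
print(type(obs), obs.shape)  # <class 'numpy.ndarray'> (4,)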
Here's my PettingZoo env:
https://github.com/encratite/thumper/blob/master/environment.py
Here's the Tianshou PPO training code:
https://github.com/encratite/thumper/blob/master/train.py
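Roughly, the wiring follows the usual Tianshou multi-agent pattern. The sketch below is simplified and not the exact train.py contents; make_thumper_env and make_ppo_policy are stand-ins, and keyword names vary between Tianshou versions:

# Simplified sketch of the usual Tianshou multi-agent PPO wiring (illustrative
# only; make_thumper_env/make_ppo_policy are stand-ins for my own helpers, and
# exact keyword names differ between Tianshou versions).
from tianshou.data import Collector, VectorReplayBuffer
from tianshou.env import DummyVectorEnv, PettingZooEnv
from tianshou.policy import MultiAgentPolicyManager
from tianshou.trainer import OnpolicyTrainer

env = PettingZooEnv(make_thumper_env())  # stand-in factory for my AECEnv
train_envs = DummyVectorEnv([lambda: PettingZooEnv(make_thumper_env())])
test_envs = DummyVectorEnv([lambda: PettingZooEnv(make_thumper_env())])

# One PPOPolicy per agent, wrapped in a MultiAgentPolicyManager that routes
# transitions by the "agent_id" field the wrapper puts into each observation.
policy = MultiAgentPolicyManager(
    policies=[make_ppo_policy(env) for _ in env.agents],
    env=env,
)

train_collector = Collector(policy, train_envs, VectorReplayBuffer(20_000, len(train_envs)))
test_collector = Collector(policy, test_envs)

result = OnpolicyTrainer(
    policy=policy,
    train_collector=train_collector,
    test_collector=test_collector,
    max_epoch=10,
    step_per_epoch=5_000,
    step_per_collect=2_000,
    repeat_per_collect=4,
    episode_per_test=10,
    batch_size=256,
).run()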
Any idea what I'm doing wrong? Or is this an actual incompatibility between different Tianshou components?