
Support Basic MARL #619

Closed
wants to merge 5 commits into from
Conversation

xihuai18

  • I have marked all applicable categories:
    • exception-raising fix
    • algorithm implementation fix
    • documentation modification
    • new feature
  • I have reformatted the code using make format (required)
  • I have checked the code using make commit-checks (required)
  • If applicable, I have mentioned the relevant/related issue(s)
  • If applicable, I have listed every item in this Pull Request below

As I mentioned in #494, to support the CTDE scheme or other schemes that need global information, the returned info should contain more information.

By the way, I am trying to implement some basic MARL-communication algorithms based on tianshou. Inspired by #399 (comment), I have the following design:

  • A multi-agent venv with agent_num agents and env_num envs should act as a venv with agent_num x env_num envs; the flattened env_id (agent index x env_num + env index) should be contained in info (see the sketch after this list).
  • Similar to AsyncCollector, the env_id in info indicates which agents are taking actions.
  • The MABuffer has agent_num x env_num buffers; the indices used for each agent should be the same.
  • The MAPolicy contains agent_num agents and a centralized module.
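
A minimal sketch of the flattened-indexing idea in the first item, assuming the intended mapping is flat_id = agent_id x env_num + env_id; the helper names below are illustrative only and not part of tianshou:

```python
import numpy as np

# Illustrative helpers only (not tianshou API): treat a venv with agent_num
# agents and env_num envs as a venv with agent_num * env_num virtual envs.

def flatten_id(agent_id: int, env_id: int, env_num: int) -> int:
    """Map (agent_id, env_id) to a virtual env index."""
    return agent_id * env_num + env_id

def unflatten_id(flat_id: int, env_num: int) -> tuple:
    """Recover (agent_id, env_id) from a virtual env index."""
    return divmod(flat_id, env_num)

# Example: 3 agents x 4 envs -> 12 virtual envs.
agent_num, env_num = 3, 4
flat_ids = np.arange(agent_num * env_num)
agent_ids, env_ids = np.divmod(flat_ids, env_num)
# The venv's info could then carry these ids for each transition, so the
# collector and buffer know which (agent, env) pair each sample came from.
info = {"env_id": flat_ids, "agent_id": agent_ids, "real_env_id": env_ids}
```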

Trinkle23897 (Collaborator) commented Apr 28, 2022

A multi-agent venv with agent_num agents and env_num envs should act as a venv with agent_num x env_num envs; the flattened env_id (agent index x env_num + env index) should be contained in info.

This is not a good assumption. The environment's input per step can be (action, player_id), and many players can share the same environment. My previous solution to this issue is to extend the API from an action array to an action dict, for example, https://github.com/sail-sg/envpool/blob/c2b5b6c679304c976d821e2519e17cf3e9f4f98e/envpool/python/envpool.py#L101-L109
where the action format is (4 envs, 8 actions):

{
  "env_id": [0, 1, 2, 3],
  "players.env_id": [1, 2, 0, 2, 0, 1, 1, 2],
  "players.action": [x, x, x, x, x, x, x, x],
  "players.id": [a, b, c, a, b, c, d, e],
}

It shows 4 environments and 5 agents (a-e), and generates a batch of data with size 8. Policy a generates the 0th and 3rd actions, for the 1st and 2nd environments respectively.
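
As a rough illustration (not envpool's actual API), a per-policy slice of such a dict-of-arrays batch could be taken like this, reproducing the "policy a handles rows 0 and 3 for envs 1 and 2" example:

```python
import numpy as np

# Shapes follow the 4-env / 8-action example above; values are placeholders.
batch = {
    "env_id": np.array([0, 1, 2, 3]),
    "players.env_id": np.array([1, 2, 0, 2, 0, 1, 1, 2]),
    "players.id": np.array(list("abcabcde")),
}

# Rows belonging to policy "a".
mask = batch["players.id"] == "a"
rows = np.nonzero(mask)[0]             # array([0, 3])
envs = batch["players.env_id"][mask]   # array([1, 2])

# Each policy fills only its own rows of the shared action array.
actions = np.empty(len(batch["players.id"]), dtype=object)
actions[rows] = [f"action_for_env_{e}" for e in envs]
```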

The full test script is https://github.com/sail-sg/envpool/blob/c2b5b6c679304c976d821e2519e17cf3e9f4f98e/envpool/dummy/dummy_envpool_test.cc#L26-L92

The above is the solution provided by envpool. It may not be consistent with PettingZoo; I put it here just for reference.

The other 3 items look good!

xihuai18 closed this by deleting the head repository on Jun 9, 2023