
Support Basic MARL #619

Closed
wants to merge 5 commits into from
Conversation

xihuai18

  • I have marked all applicable categories:
    • exception-raising fix
    • algorithm implementation fix
    • documentation modification
    • new feature
  • I have reformatted the code using make format (required)
  • I have checked the code using make commit-checks (required)
  • If applicable, I have mentioned the relevant/related issue(s)
  • If applicable, I have listed every item in this Pull Request below

As I mentioned in #494, to support the CTDE scheme or other schemes that need global information, the returned info should contain more information.

By the way, I am trying to implement some basic MARL-communication algorithms based on tianshou. Inspired by #399 (comment), I have the following design:

  • A multi-agent venv with agent_num agents and env_num envs should act as a venv with agent_num x env_num envs; the flattened env_id (agent index x env_num + env index) should be contained in info (see the sketch after this list).
  • Similar to AsyncCollector, the env_id in info indicates which agents are taking actions.
  • The MABuffer has agent_num x env_num buffers; the indices used for each agent should be the same.
  • The MAPolicy contains agent_num agents and a centralized module.
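
A minimal sketch of the flattened-indexing idea in the first item, assuming the intended mapping is flat_id = agent_id x env_num + env_id; the helper names below are illustrative only and not part of tianshou:

```python
import numpy as np

# Illustrative helpers only (not tianshou API): treat a venv with agent_num
# agents and env_num envs as a venv with agent_num * env_num virtual envs.

def flatten_id(agent_id: int, env_id: int, env_num: int) -> int:
    """Map (agent_id, env_id) to a virtual env index."""
    return agent_id * env_num + env_id

def unflatten_id(flat_id: int, env_num: int) -> tuple:
    """Recover (agent_id, env_id) from a virtual env index."""
    return divmod(flat_id, env_num)

# Example: 3 agents x 4 envs -> 12 virtual envs.
agent_num, env_num = 3, 4
flat_ids = np.arange(agent_num * env_num)
agent_ids, env_ids = np.divmod(flat_ids, env_num)
# The venv's info could then carry these ids for each transition, so the
# collector and buffer know which (agent, env) pair each sample came from.
info = {"env_id": flat_ids, "agent_id": agent_ids, "real_env_id": env_ids}
```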

Trinkle23897 (Collaborator) commented Apr 28, 2022

A multi-agent venv with agent_num agents and env_num envs should act as a venv with agent_num x env_num envs; the flattened env_id (agent index x env_num + env index) should be contained in info.

This is not a good assumption. The environment's input per step can be (action, player_id), and many players can share the same environment. My previous solution to this issue is to extend the API from an action array to an action dict, for example, https://github.com/sail-sg/envpool/blob/c2b5b6c679304c976d821e2519e17cf3e9f4f98e/envpool/python/envpool.py#L101-L109
where the action format is (4 envs, 8 actions):

{
  "env_id": [0, 1, 2, 3],
  "players.env_id": [1, 2, 0, 2, 0, 1, 1, 2],
  "players.action": [x, x, x, x, x, x, x, x],
  "players.id": [a, b, c, a, b, c, d, e],
}

It shows 4 environments and 5 agents (a-e), and generates a batch of data with size 8. Policy a generates the 0th and 3rd actions, for the 1st and 2nd environments respectively.
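
As a rough illustration (not envpool's actual API), a per-policy slice of such a dict-of-arrays batch could be taken like this, reproducing the "policy a handles rows 0 and 3 for envs 1 and 2" example:

```python
import numpy as np

# Shapes follow the 4-env / 8-action example above; values are placeholders.
batch = {
    "env_id": np.array([0, 1, 2, 3]),
    "players.env_id": np.array([1, 2, 0, 2, 0, 1, 1, 2]),
    "players.id": np.array(list("abcabcde")),
}

# Rows belonging to policy "a".
mask = batch["players.id"] == "a"
rows = np.nonzero(mask)[0]             # array([0, 3])
envs = batch["players.env_id"][mask]   # array([1, 2])

# Each policy fills only its own rows of the shared action array.
actions = np.empty(len(batch["players.id"]), dtype=object)
actions[rows] = [f"action_for_env_{e}" for e in envs]
```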

The full test script is https://github.com/sail-sg/envpool/blob/c2b5b6c679304c976d821e2519e17cf3e9f4f98e/envpool/dummy/dummy_envpool_test.cc#L26-L92

The above is the solution provided by envpool. It may not be consistent with PettingZoo; I put it here just for reference.

The other 3 items look good!

xihuai18 closed this by deleting the head repository on Jun 9, 2023