
Add multi-agent example: tic-tac-toe #122


Merged: 127 commits merged into thu-ml:dev on Jul 21, 2020

Conversation

@youkaichao (Collaborator) commented Jul 9, 2020

No description provided.

@Trinkle23897 Trinkle23897 changed the title add multi-agent example: tic-tac-toe WIP: add multi-agent example: tic-tac-toe Jul 9, 2020
@Trinkle23897 Trinkle23897 linked an issue Jul 9, 2020 that may be closed by this pull request
@duburcqa (Collaborator) commented Jul 9, 2020

@youkaichao As I said, I don't see why you are in a hurry to merge this PR, since you are the only one using it for now.

@Trinkle23897 (Collaborator)
No, it's not only him. More than three people have emailed me or commented on my personal social media asking for MARL features, but Kaichao is the first to work on it.

@duburcqa (Collaborator) commented Jul 9, 2020

> No, it's not only him. More than three people have emailed me or commented on my personal social media asking for MARL features, but Kaichao is the first to work on it.

So working a bit longer on this feature to improve it shouldn't be an issue if people have already been waiting for weeks.

@Trinkle23897 (Collaborator)
> > No, it's not only him. More than three people have emailed me or commented on my personal social media asking for MARL features, but Kaichao is the first to work on it.
>
> So working a bit longer on this feature to improve it shouldn't be an issue if people have already been waiting for weeks.

Sure.

@duburcqa (Collaborator) commented Jul 9, 2020

If you really want to move fast, the good practice is to use a dev branch. The master branch is dedicated to mature features, even those not yet distributed on PyPI and associated with a release version. People who want preview features should then check out dev instead of master.

@duburcqa (Collaborator) commented Jul 9, 2020

The good thing about a dev branch is that there are basically no rules to follow, since the rules only apply to master. One can submit and merge PRs even when the CI is failing, or without the approval of other members. Yet in the end, when the dev branch is merged into master, it has to follow the rules. It is a good way to release experimental features as fast as possible without compromising the master branch.
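The workflow described above can be sketched as follows (branch names are illustrative, and a remote named `origin` is assumed):

```shell
# One-time setup: create and publish the dev branch.
git checkout -b dev
git push -u origin dev

# Contributors branch off dev for experimental work; CI may be red here.
git checkout -b marl-example dev
# ... commit experimental work, open PRs against dev ...

# Later, once dev satisfies the master-branch rules (green CI, reviews):
git checkout master
git merge --no-ff dev
```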

@Trinkle23897 (Collaborator) commented Jul 9, 2020

> The good thing about a dev branch is that there are basically no rules to follow, since the rules only apply to master. One can submit and merge PRs even when the CI is failing, or without the approval of other members. Yet in the end, when the dev branch is merged into master, it has to follow the rules. It is a good way to release experimental features as fast as possible without compromising the master branch.

So I think we can release 0.2.4 first, then keep the master branch stable, and all new features will appear in dev first. Do you agree?
But I haven't figured out how to deal with API changes. For example, #123 introduces tianshou.utils.net, so running the newest tests against the older tianshou core (0.2.3) will raise an ImportError.
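One common way to cope with this kind of version skew (a sketch, not something this PR adds): probe for the new module path at test time and skip or fall back when it is absent, so the same test suite can run against both the 0.2.3 release and the dev core.

```python
import importlib.util


def module_available(name: str) -> bool:
    """Return True if `name` could be imported, without actually importing it."""
    try:
        return importlib.util.find_spec(name) is not None
    except ModuleNotFoundError:
        # Raised when a parent package in a dotted name does not exist.
        return False


# Hypothetical use in a test suite:
# if not module_available("tianshou.utils.net"):
#     pytest.skip("requires the dev core that provides tianshou.utils.net")
```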

@duburcqa (Collaborator) commented Jul 9, 2020

> But I haven't figured out how to deal with API changes. For example, #123 introduces tianshou.utils.net, so running the newest tests against the older tianshou core (0.2.3) will raise an ImportError.

It happens when a library is young. Breaking changes must be avoided if possible, but if they are necessary to improve quality, then they must be made. You cannot always avoid them, unfortunately. You must expect that such things can happen at an early stage.

@duburcqa (Collaborator) commented Jul 9, 2020

> So I think we can release 0.2.4 first, then keep the master branch stable, and all new features will appear in dev first. Do you agree?

As you wish. Let's see how it goes after the release of 0.2.4.

duburcqa previously approved these changes Jul 9, 2020
@Trinkle23897 (Collaborator) commented Jul 9, 2020

Why did you approve this, @duburcqa?

@duburcqa (Collaborator) commented Jul 9, 2020

> Why did you approve this, @duburcqa?

You said you wanted to merge it as is, no?

@Trinkle23897 (Collaborator)
I'm not willing to merge now because I think the multi-agent interface is somewhat hardcoded. In my mind, a general MAPolicy would accept a list of policies and a dispatch function indicating how to split the full data into N agents' views.
But this brings up other issues: if we use PPO+DQN as two agents, since PPO is an on-policy algorithm and DQN is off-policy, how do we deal with their sample functions over a single buffer? That can be future work.
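A minimal sketch of the interface described above (hypothetical names, not Tianshou's actual API): the wrapper holds one policy per agent and a dispatch function that splits the joint data into per-agent views.

```python
class MultiAgentPolicySketch:
    """Route each agent's slice of the joint data to that agent's policy."""

    def __init__(self, policies, dispatch_fn):
        self.policies = policies        # {agent_id: callable policy}
        self.dispatch_fn = dispatch_fn  # joint_data -> {agent_id: per-agent view}

    def forward(self, joint_data):
        views = self.dispatch_fn(joint_data)
        return {aid: self.policies[aid](view) for aid, view in views.items()}


# Toy usage: two "policies" acting on per-agent observations.
policy = MultiAgentPolicySketch(
    policies={"x": lambda obs: obs + 1, "o": lambda obs: obs * 2},
    dispatch_fn=lambda joint: joint,  # joint data already keyed by agent id
)
actions = policy.forward({"x": 1, "o": 3})  # {"x": 2, "o": 6}
```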

@duburcqa (Collaborator) commented Jul 9, 2020

> But this brings up other issues: if we use PPO+DQN as two agents, since PPO is an on-policy algorithm and DQN is off-policy, how do we deal with their sample functions over a single buffer? That can be future work.

That's not an easy task and it probably cannot be solved quickly. Did you have a look at how the libraries supporting this feature handle it?
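For reference, one simple (hypothetical) way to sidestep the mixed on/off-policy problem is to keep a separate buffer per agent: the on-policy agent drains its buffer after each update, while the off-policy agent reads without clearing, so old transitions persist for replay. A sketch:

```python
from collections import deque


class PerAgentBuffers:
    """One buffer per agent, with on-policy and off-policy access patterns."""

    def __init__(self, agent_ids, maxlen=10000):
        self.buffers = {aid: deque(maxlen=maxlen) for aid in agent_ids}

    def add(self, agent_id, transition):
        self.buffers[agent_id].append(transition)

    def drain(self, agent_id):
        """On-policy (PPO-style): take everything, then clear."""
        data = list(self.buffers[agent_id])
        self.buffers[agent_id].clear()
        return data

    def snapshot(self, agent_id):
        """Off-policy (DQN-style): read without clearing, so old data persists."""
        return list(self.buffers[agent_id])
```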

@Trinkle23897 (Collaborator)
> > But this brings up other issues: if we use PPO+DQN as two agents, since PPO is an on-policy algorithm and DQN is off-policy, how do we deal with their sample functions over a single buffer? That can be future work.
>
> That's not an easy task and it probably cannot be solved quickly. Did you have a look at how the libraries supporting this feature handle it?

Agreed. I'll take a second look at this PR.

@Trinkle23897 (Collaborator) commented Jul 9, 2020

@youkaichao please add the corresponding documentation in https://tianshou.readthedocs.io/en/latest/tutorials/cheatsheet.html, along with a minor modification to README.md.

@duburcqa (Collaborator) commented Jul 9, 2020

@Trinkle23897 Can you enable "Dismiss stale pull request approvals when new commits are pushed" on master, please?

@Trinkle23897 (Collaborator)
> @Trinkle23897 Can you enable "Dismiss stale pull request approvals when new commits are pushed" on master, please?

Done

@Trinkle23897 Trinkle23897 changed the title WIP: add multi-agent example: tic-tac-toe Add multi-agent example: tic-tac-toe Jul 10, 2020
@codecov-commenter commented Jul 10, 2020


Codecov Report

Attention: Patch coverage is 91.60839% with 12 lines in your changes missing coverage. Please review.

Project coverage is 89.10%. Comparing base (d09b69e) to head (a04ddf6).

Files with missing lines               | Patch %  | Lines
tianshou/env/basevecenv.py             | 81.25%   | 6 Missing ⚠️
tianshou/policy/multiagent/mapolicy.py | 93.75%   | 4 Missing ⚠️
tianshou/env/maenv.py                  | 84.61%   | 2 Missing ⚠️

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files
@@            Coverage Diff             @@
##              dev     #122      +/-   ##
==========================================
+ Coverage   88.81%   89.10%   +0.28%     
==========================================
  Files          31       35       +4     
  Lines        2030     2139     +109     
==========================================
+ Hits         1803     1906     +103     
- Misses        227      233       +6     
Flag      | Coverage Δ
unittests | 89.10% <91.60%> (+0.28%) ⬆️


@Trinkle23897 Trinkle23897 changed the title Add multi-agent example: tic-tac-toe WIP: Add multi-agent example: tic-tac-toe Jul 10, 2020
@Trinkle23897 Trinkle23897 changed the title WIP: Add multi-agent example: tic-tac-toe Add multi-agent example: tic-tac-toe Jul 21, 2020
@Trinkle23897 (Collaborator)
@duburcqa it should be okay now, please take a look.

@Trinkle23897 Trinkle23897 merged commit 8c32d99 into thu-ml:dev Jul 21, 2020
@youkaichao youkaichao deleted the marl-example branch July 21, 2020 09:05
BFAnas pushed a commit to BFAnas/tianshou that referenced this pull request May 5, 2024
* make fileds with empty Batch rather than None after reset

* dummy code

* remove dummy

* add reward_length argument for collector

* Improve Batch (thu-ml#126)

* make sure the key type of Batch is string, and add unit tests

* add is_empty() function and unit tests

* enable cat of mixing dict and Batch, just like stack

* bugfix for reward_length

* add get_final_reward_fn argument to collector to deal with marl

* minor polish

* remove multibuf

* minor polish

* improve and implement Batch.cat_

* bugfix for buffer.sample with field impt_weight

* restore the usage of a.cat_(b)

* fix 2 bugs in batch and add corresponding unittest

* code fix for update

* update is_empty to recognize empty over empty; bugfix for len

* bugfix for update and add testcase

* add testcase of update

* make fileds with empty Batch rather than None after reset

* dummy code

* remove dummy

* add reward_length argument for collector

* bugfix for reward_length

* add get_final_reward_fn argument to collector to deal with marl

* make sure the key type of Batch is string, and add unit tests

* add is_empty() function and unit tests

* enable cat of mixing dict and Batch, just like stack

* dummy code

* remove dummy

* add multi-agent example: tic-tac-toe

* move TicTacToeEnv to a separate file

* remove dummy MANet

* code refactor

* move tic-tac-toe example to test

* update doc with marl-example

* fix docs

* reduce the threshold

* revert

* update player id to start from 1 and change player to agent; keep coding

* add reward_length argument for collector

* Improve Batch (thu-ml#128)

* minor polish

* improve and implement Batch.cat_

* bugfix for buffer.sample with field impt_weight

* restore the usage of a.cat_(b)

* fix 2 bugs in batch and add corresponding unittest

* code fix for update

* update is_empty to recognize empty over empty; bugfix for len

* bugfix for update and add testcase

* add testcase of update

* fix docs

* fix docs

* fix docs [ci skip]

* fix docs [ci skip]

Co-authored-by: Trinkle23897 <463003665@qq.com>

* refact

* re-implement Batch.stack and add testcases

* add doc for Batch.stack

* reward_metric

* modify flag

* minor fix

* reuse _create_values and refactor stack_ & cat_

* fix pep8

* fix reward stat in collector

* fix stat of collector, simplify test/base/env.py

* fix docs

* minor fix

* raise exception for stacking with partial keys and axis!=0

* minor fix

* minor fix

* minor fix

* marl-examples

* add condense; bugfix for torch.Tensor; code refactor

* marl example can run now

* enable tic tac toe with larger board size and win-size

* add test dependency

* Fix padding of inconsistent keys with Batch.stack and Batch.cat (thu-ml#130)

* re-implement Batch.stack and add testcases

* add doc for Batch.stack

* reuse _create_values and refactor stack_ & cat_

* fix pep8

* fix docs

* raise exception for stacking with partial keys and axis!=0

* minor fix

* minor fix

Co-authored-by: Trinkle23897 <463003665@qq.com>

* stash

* let agent learn to play as agent 2 which is harder

* code refactor

* Improve collector (thu-ml#125)

* remove multibuf

* reward_metric

* make fileds with empty Batch rather than None after reset

* many fixes and refactor
Co-authored-by: Trinkle23897 <463003665@qq.com>

* marl for tic-tac-toe and general gomoku

* update default gamma to 0.1 for tic tac toe to win earlier

* fix name typo; change default game config; add rew_norm option

* fix pep8

* test commit

* mv test dir name

* add rew flag

* fix torch.optim import error and madqn rew_norm

* remove useless kwargs

* Vector env enable select worker (thu-ml#132)

* Enable selecting worker for vector env step method.

* Update collector to match new vecenv selective worker behavior.

* Bug fix.

* Fix rebase

Co-authored-by: Alexis Duburcq <alexis.duburcq@wandercraft.eu>

* show the last move of tictactoe by capital letters

* add multi-agent tutorial

* fix link

* Standardized behavior of Batch.cat and misc code refactor (thu-ml#137)

* code refactor; remove unused kwargs; add reward_normalization for dqn

* bugfix for __setitem__ with torch.Tensor; add Batch.condense

* minor fix

* support cat with empty Batch

* remove the dependency of is_empty on len; specify the semantic of empty Batch by test cases

* support stack with empty Batch

* remove condense

* refactor code to reflect the shared / partial / reserved categories of keys

* add is_empty(recursive=False)

* doc fix

* docfix and bugfix for _is_batch_set

* add doc for key reservation

* bugfix for algebra operators

* fix cat with lens hint

* code refactor

* bugfix for storing None

* use ValueError instead of exception

* hide lens away from users

* add comment for __cat

* move the computation of the initial value of lens in cat_ itself.

* change the place of doc string

* doc fix for Batch doc string

* change recursive to recurse

* doc string fix

* minor fix for batch doc

* write tutorials to specify the standard of Batch (thu-ml#142)

* add doc for len exceptions

* doc move; unify is_scalar_value function

* remove some issubclass check

* bugfix for shape of Batch(a=1)

* keep moving doc

* keep writing batch tutorial

* draft version of Batch tutorial done

* improving doc

* keep improving doc

* batch tutorial done

* rename _is_number

* rename _is_scalar

* shape property do not raise exception

* restore some doc string

* grammarly [ci skip]

* grammarly + fix warning of building docs

* polish docs

* trim and re-arrange batch tutorial

* go straight to the point

* minor fix for batch doc

* add shape / len in basic usage

* keep improving tutorial

* unify _to_array_with_correct_type to remove duplicate code

* delegate type convertion to Batch.__init__

* further delegate type convertion to Batch.__init__

* bugfix for setattr

* add a _parse_value function

* remove dummy function call

* polish docs

Co-authored-by: Trinkle23897 <463003665@qq.com>

* bugfix for mapolicy

* pretty code

* remove debug code; remove condense

* doc fix

* check before get_agents in tutorials/tictactoe

* tutorial

* fix

* minor fix for batch doc

* minor polish

* faster test_ttt

* improve tic-tac-toe environment

* change default epoch and step-per-epoch for tic-tac-toe

* fix mapolicy

* minor polish for mapolicy

* 90% to 80% (need to change the tutorial)

* win rate

* show step number at board

* simplify mapolicy

* minor polish for mapolicy

* remove MADQN

* fix pep8

* change legal_actions to mask (need to update docs)

* simplify maenv

* fix typo

* move basevecenv to single file

* separate RandomAgent

* update docs

* grammarly

* fix pep8

* win rate typo

* format in cheatsheet

* use bool mask directly

* update doc for boolean mask

Co-authored-by: Trinkle23897 <463003665@qq.com>
Co-authored-by: Alexis DUBURCQ <alexis.duburcq@gmail.com>
Co-authored-by: Alexis Duburcq <alexis.duburcq@wandercraft.eu>
Successfully merging this pull request may close: How to support multi-agent reinforcement learning

4 participants