
Add multi-agent example: tic-tac-toe #122


Merged: 127 commits merged into thu-ml:dev on Jul 21, 2020

Conversation

@youkaichao (Collaborator) commented Jul 9, 2020

No description provided.

@Trinkle23897 Trinkle23897 changed the title add multi-agent example: tic-tac-toe WIP: add multi-agent example: tic-tac-toe Jul 9, 2020
@Trinkle23897 Trinkle23897 linked an issue Jul 9, 2020 that may be closed by this pull request
@duburcqa (Collaborator) commented Jul 9, 2020

@youkaichao As I said, I don't see why you are in a hurry to merge this PR, since you are the only one using it for now.

@Trinkle23897 (Collaborator)
No, it's not only him. More than three people have emailed me or commented on my personal social media asking for MARL features, but Kaichao is the first to work on it.

@duburcqa (Collaborator) commented Jul 9, 2020

> No, it's not only him. More than three people have emailed me or commented on my personal social media asking for MARL features, but Kaichao is the first to work on it.

So working a bit longer on this feature to improve it shouldn't be an issue if people have already been waiting for weeks.

@Trinkle23897 (Collaborator)
> > No, it's not only him. More than three people have emailed me or commented on my personal social media asking for MARL features, but Kaichao is the first to work on it.
>
> So working a bit longer on this feature to improve it shouldn't be an issue if people have already been waiting for weeks.

Sure.

@duburcqa (Collaborator) commented Jul 9, 2020

If you really want to move fast, the good practice is to use a dev branch. The master branch is dedicated to mature features, even those not yet distributed on PyPI and associated with a release version. People who want preview features should then check out dev instead of master.

@duburcqa (Collaborator) commented Jul 9, 2020

The good thing about a dev branch is that there are basically no rules to follow, since the rules only apply to master. One can submit and merge PRs even when the CI is failing, or without the approval of other members. Yet in the end, when the dev branch is merged into master, it has to follow the rules. It is a good way to release experimental features as fast as possible without compromising the master branch.
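The workflow described above can be sketched as follows (branch names are illustrative, and a remote named `origin` is assumed):

```shell
# One-time setup: create and publish the dev branch.
git checkout -b dev
git push -u origin dev

# Contributors branch off dev for experimental work; CI may be red here.
git checkout -b marl-example dev
# ... commit experimental work, open PRs against dev ...

# Later, once dev satisfies the master-branch rules (green CI, reviews):
git checkout master
git merge --no-ff dev
```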

@Trinkle23897 (Collaborator) commented Jul 9, 2020

> The good thing about a dev branch is that there are basically no rules to follow, since the rules only apply to master. One can submit and merge PRs even when the CI is failing, or without the approval of other members. Yet in the end, when the dev branch is merged into master, it has to follow the rules. It is a good way to release experimental features as fast as possible without compromising the master branch.

So I think we can release 0.2.4 first, then keep the master branch stable, and all new features will appear in dev first. Do you agree?
But I haven't figured out how to deal with API changes. For example, #123 introduces tianshou.utils.net, so running the newest tests against the older tianshou core (0.2.3) will raise an ImportError.
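One common way to cope with this kind of version skew (a sketch, not something this PR adds): probe for the new module path at test time and skip or fall back when it is absent, so the same test suite can run against both the 0.2.3 release and the dev core.

```python
import importlib.util


def module_available(name: str) -> bool:
    """Return True if `name` could be imported, without actually importing it."""
    try:
        return importlib.util.find_spec(name) is not None
    except ModuleNotFoundError:
        # Raised when a parent package in a dotted name does not exist.
        return False


# Hypothetical use in a test suite:
# if not module_available("tianshou.utils.net"):
#     pytest.skip("requires the dev core that provides tianshou.utils.net")
```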

@duburcqa (Collaborator) commented Jul 9, 2020

> But I haven't figured out how to deal with API changes. For example, #123 introduces tianshou.utils.net, so running the newest tests against the older tianshou core (0.2.3) will raise an ImportError.

It happens when a library is young. Breaking changes must be avoided if possible, but if they are necessary to improve quality, then they must be made. You cannot always avoid them, unfortunately. You must expect that such things can happen at an early stage.

@duburcqa (Collaborator) commented Jul 9, 2020

> So I think we can release 0.2.4 first, then keep the master branch stable, and all new features will appear in dev first. Do you agree?

As you wish. Let's see how it goes after the release of 0.2.4.

duburcqa previously approved these changes Jul 9, 2020
@Trinkle23897 (Collaborator) commented Jul 9, 2020

Why did you approve this, @duburcqa?

@duburcqa (Collaborator) commented Jul 9, 2020

> Why did you approve this, @duburcqa?

You said you wanted to merge it as is, no?

@Trinkle23897 (Collaborator)
I'm not willing to merge now because I think the multi-agent interface is somewhat hardcoded. In my mind, a general MAPolicy would accept a list of policies and a dispatch function indicating how to split the full data into N agents' views.
But this brings up other issues: if we use PPO+DQN as two agents, since PPO is an on-policy algorithm and DQN is off-policy, how do we deal with their sample functions over a single buffer? That can be future work.
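A minimal sketch of the interface described above (hypothetical names, not Tianshou's actual API): the wrapper holds one policy per agent and a dispatch function that splits the joint data into per-agent views.

```python
class MultiAgentPolicySketch:
    """Route each agent's slice of the joint data to that agent's policy."""

    def __init__(self, policies, dispatch_fn):
        self.policies = policies        # {agent_id: callable policy}
        self.dispatch_fn = dispatch_fn  # joint_data -> {agent_id: per-agent view}

    def forward(self, joint_data):
        views = self.dispatch_fn(joint_data)
        return {aid: self.policies[aid](view) for aid, view in views.items()}


# Toy usage: two "policies" acting on per-agent observations.
policy = MultiAgentPolicySketch(
    policies={"x": lambda obs: obs + 1, "o": lambda obs: obs * 2},
    dispatch_fn=lambda joint: joint,  # joint data already keyed by agent id
)
actions = policy.forward({"x": 1, "o": 3})  # {"x": 2, "o": 6}
```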

@duburcqa (Collaborator) commented Jul 9, 2020

> But this brings up other issues: if we use PPO+DQN as two agents, since PPO is an on-policy algorithm and DQN is off-policy, how do we deal with their sample functions over a single buffer? That can be future work.

That's not an easy task and it probably cannot be solved quickly. Did you have a look at how the libraries supporting this feature handle it?
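For reference, one simple (hypothetical) way to sidestep the mixed on/off-policy problem is to keep a separate buffer per agent: the on-policy agent drains its buffer after each update, while the off-policy agent reads without clearing, so old transitions persist for replay. A sketch:

```python
from collections import deque


class PerAgentBuffers:
    """One buffer per agent, with on-policy and off-policy access patterns."""

    def __init__(self, agent_ids, maxlen=10000):
        self.buffers = {aid: deque(maxlen=maxlen) for aid in agent_ids}

    def add(self, agent_id, transition):
        self.buffers[agent_id].append(transition)

    def drain(self, agent_id):
        """On-policy (PPO-style): take everything, then clear."""
        data = list(self.buffers[agent_id])
        self.buffers[agent_id].clear()
        return data

    def snapshot(self, agent_id):
        """Off-policy (DQN-style): read without clearing, so old data persists."""
        return list(self.buffers[agent_id])
```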

@Trinkle23897 (Collaborator)
> > But this brings up other issues: if we use PPO+DQN as two agents, since PPO is an on-policy algorithm and DQN is off-policy, how do we deal with their sample functions over a single buffer? That can be future work.
>
> That's not an easy task and it probably cannot be solved quickly. Did you have a look at how the libraries supporting this feature handle it?

Agreed. I'll take a second look at this PR.

@Trinkle23897 (Collaborator) commented Jul 9, 2020

@youkaichao please add the corresponding documentation in https://tianshou.readthedocs.io/en/latest/tutorials/cheatsheet.html, along with a minor modification to README.md.

@duburcqa (Collaborator) commented Jul 9, 2020

@Trinkle23897 Can you enable "Dismiss stale pull request approvals when new commits are pushed" on master, please?

@Trinkle23897 (Collaborator)
> @Trinkle23897 Can you enable "Dismiss stale pull request approvals when new commits are pushed" on master, please?

Done

@Trinkle23897 Trinkle23897 changed the title WIP: add multi-agent example: tic-tac-toe Add multi-agent example: tic-tac-toe Jul 10, 2020
@codecov-commenter commented Jul 10, 2020


Codecov Report

Attention: Patch coverage is 91.60839% with 12 lines in your changes missing coverage. Please review.

Project coverage is 89.10%. Comparing base (d09b69e) to head (a04ddf6).

Files with missing lines               | Patch %  | Lines
tianshou/env/basevecenv.py             | 81.25%   | 6 Missing ⚠️
tianshou/policy/multiagent/mapolicy.py | 93.75%   | 4 Missing ⚠️
tianshou/env/maenv.py                  | 84.61%   | 2 Missing ⚠️

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files
@@            Coverage Diff             @@
##              dev     #122      +/-   ##
==========================================
+ Coverage   88.81%   89.10%   +0.28%     
==========================================
  Files          31       35       +4     
  Lines        2030     2139     +109     
==========================================
+ Hits         1803     1906     +103     
- Misses        227      233       +6     
Flag      | Coverage Δ
unittests | 89.10% <91.60%> (+0.28%) ⬆️


@Trinkle23897 Trinkle23897 changed the title Add multi-agent example: tic-tac-toe WIP: Add multi-agent example: tic-tac-toe Jul 10, 2020
@Trinkle23897 Trinkle23897 changed the title WIP: Add multi-agent example: tic-tac-toe Add multi-agent example: tic-tac-toe Jul 21, 2020
@Trinkle23897 (Collaborator)
@duburcqa it should be okay now, please take a look.

@Trinkle23897 Trinkle23897 merged commit 8c32d99 into thu-ml:dev Jul 21, 2020
@youkaichao youkaichao deleted the marl-example branch July 21, 2020 09:05
BFAnas pushed a commit to BFAnas/tianshou that referenced this pull request May 5, 2024
* make fileds with empty Batch rather than None after reset

* dummy code

* remove dummy

* add reward_length argument for collector

* Improve Batch (thu-ml#126)

* make sure the key type of Batch is string, and add unit tests

* add is_empty() function and unit tests

* enable cat of mixing dict and Batch, just like stack

* bugfix for reward_length

* add get_final_reward_fn argument to collector to deal with marl

* minor polish

* remove multibuf

* minor polish

* improve and implement Batch.cat_

* bugfix for buffer.sample with field impt_weight

* restore the usage of a.cat_(b)

* fix 2 bugs in batch and add corresponding unittest

* code fix for update

* update is_empty to recognize empty over empty; bugfix for len

* bugfix for update and add testcase

* add testcase of update

* make fileds with empty Batch rather than None after reset

* dummy code

* remove dummy

* add reward_length argument for collector

* bugfix for reward_length

* add get_final_reward_fn argument to collector to deal with marl

* make sure the key type of Batch is string, and add unit tests

* add is_empty() function and unit tests

* enable cat of mixing dict and Batch, just like stack

* dummy code

* remove dummy

* add multi-agent example: tic-tac-toe

* move TicTacToeEnv to a separate file

* remove dummy MANet

* code refactor

* move tic-tac-toe example to test

* update doc with marl-example

* fix docs

* reduce the threshold

* revert

* update player id to start from 1 and change player to agent; keep coding

* add reward_length argument for collector

* Improve Batch (thu-ml#128)

* minor polish

* improve and implement Batch.cat_

* bugfix for buffer.sample with field impt_weight

* restore the usage of a.cat_(b)

* fix 2 bugs in batch and add corresponding unittest

* code fix for update

* update is_empty to recognize empty over empty; bugfix for len

* bugfix for update and add testcase

* add testcase of update

* fix docs

* fix docs

* fix docs [ci skip]

* fix docs [ci skip]

Co-authored-by: Trinkle23897 <463003665@qq.com>

* refact

* re-implement Batch.stack and add testcases

* add doc for Batch.stack

* reward_metric

* modify flag

* minor fix

* reuse _create_values and refactor stack_ & cat_

* fix pep8

* fix reward stat in collector

* fix stat of collector, simplify test/base/env.py

* fix docs

* minor fix

* raise exception for stacking with partial keys and axis!=0

* minor fix

* minor fix

* minor fix

* marl-examples

* add condense; bugfix for torch.Tensor; code refactor

* marl example can run now

* enable tic tac toe with larger board size and win-size

* add test dependency

* Fix padding of inconsistent keys with Batch.stack and Batch.cat (thu-ml#130)

* re-implement Batch.stack and add testcases

* add doc for Batch.stack

* reuse _create_values and refactor stack_ & cat_

* fix pep8

* fix docs

* raise exception for stacking with partial keys and axis!=0

* minor fix

* minor fix

Co-authored-by: Trinkle23897 <463003665@qq.com>

* stash

* let agent learn to play as agent 2 which is harder

* code refactor

* Improve collector (thu-ml#125)

* remove multibuf

* reward_metric

* make fileds with empty Batch rather than None after reset

* many fixes and refactor
Co-authored-by: Trinkle23897 <463003665@qq.com>

* marl for tic-tac-toe and general gomoku

* update default gamma to 0.1 for tic tac toe to win earlier

* fix name typo; change default game config; add rew_norm option

* fix pep8

* test commit

* mv test dir name

* add rew flag

* fix torch.optim import error and madqn rew_norm

* remove useless kwargs

* Vector env enable select worker (thu-ml#132)

* Enable selecting worker for vector env step method.

* Update collector to match new vecenv selective worker behavior.

* Bug fix.

* Fix rebase

Co-authored-by: Alexis Duburcq <alexis.duburcq@wandercraft.eu>

* show the last move of tictactoe by capital letters

* add multi-agent tutorial

* fix link

* Standardized behavior of Batch.cat and misc code refactor (thu-ml#137)

* code refactor; remove unused kwargs; add reward_normalization for dqn

* bugfix for __setitem__ with torch.Tensor; add Batch.condense

* minor fix

* support cat with empty Batch

* remove the dependency of is_empty on len; specify the semantic of empty Batch by test cases

* support stack with empty Batch

* remove condense

* refactor code to reflect the shared / partial / reserved categories of keys

* add is_empty(recursive=False)

* doc fix

* docfix and bugfix for _is_batch_set

* add doc for key reservation

* bugfix for algebra operators

* fix cat with lens hint

* code refactor

* bugfix for storing None

* use ValueError instead of exception

* hide lens away from users

* add comment for __cat

* move the computation of the initial value of lens in cat_ itself.

* change the place of doc string

* doc fix for Batch doc string

* change recursive to recurse

* doc string fix

* minor fix for batch doc

* write tutorials to specify the standard of Batch (thu-ml#142)

* add doc for len exceptions

* doc move; unify is_scalar_value function

* remove some issubclass check

* bugfix for shape of Batch(a=1)

* keep moving doc

* keep writing batch tutorial

* draft version of Batch tutorial done

* improving doc

* keep improving doc

* batch tutorial done

* rename _is_number

* rename _is_scalar

* shape property do not raise exception

* restore some doc string

* grammarly [ci skip]

* grammarly + fix warning of building docs

* polish docs

* trim and re-arrange batch tutorial

* go straight to the point

* minor fix for batch doc

* add shape / len in basic usage

* keep improving tutorial

* unify _to_array_with_correct_type to remove duplicate code

* delegate type convertion to Batch.__init__

* further delegate type convertion to Batch.__init__

* bugfix for setattr

* add a _parse_value function

* remove dummy function call

* polish docs

Co-authored-by: Trinkle23897 <463003665@qq.com>

* bugfix for mapolicy

* pretty code

* remove debug code; remove condense

* doc fix

* check before get_agents in tutorials/tictactoe

* tutorial

* fix

* minor fix for batch doc

* minor polish

* faster test_ttt

* improve tic-tac-toe environment

* change default epoch and step-per-epoch for tic-tac-toe

* fix mapolicy

* minor polish for mapolicy

* 90% to 80% (need to change the tutorial)

* win rate

* show step number at board

* simplify mapolicy

* minor polish for mapolicy

* remove MADQN

* fix pep8

* change legal_actions to mask (need to update docs)

* simplify maenv

* fix typo

* move basevecenv to single file

* separate RandomAgent

* update docs

* grammarly

* fix pep8

* win rate typo

* format in cheatsheet

* use bool mask directly

* update doc for boolean mask

Co-authored-by: Trinkle23897 <463003665@qq.com>
Co-authored-by: Alexis DUBURCQ <alexis.duburcq@gmail.com>
Co-authored-by: Alexis Duburcq <alexis.duburcq@wandercraft.eu>
Successfully merging this pull request may close: How to support multi-agent reinforcement learning

4 participants