Implement ContinuousBCQPolicy and offline_bcq example #480
Conversation
Could you please add
Codecov Report
@@            Coverage Diff             @@
##           master     #480      +/-   ##
==========================================
+ Coverage   94.06%   94.24%   +0.17%
==========================================
  Files          60       61       +1
  Lines        3910     4031     +121
==========================================
+ Hits         3678     3799     +121
  Misses        232      232
finish test_bcq
This PR implements ContinuousBCQPolicy, which can be used to train an offline agent in environments with a continuous action space. An experimental result on 'halfcheetah-expert-v1', a d4rl environment (for offline reinforcement learning), is provided. Example usage is in examples/offline/offline_bcq.py.
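For reviewers unfamiliar with the algorithm, BCQ selects actions by sampling candidates from a generative model trained on the offline dataset, applying a small learned perturbation, and taking the candidate with the highest Q-value. Below is a minimal NumPy sketch of just that selection rule; the `sample`, `perturb`, and `qval` callables are toy stand-ins (in the actual policy these are a VAE and critic networks), not the PR's implementation.

```python
import numpy as np

def bcq_select_action(state, sample_actions, perturb, q_value,
                      n_candidates=10, rng=None):
    """BCQ-style action selection: sample candidate actions, perturb
    them slightly, and return the candidate with the highest Q-value."""
    rng = rng or np.random.default_rng(0)
    # Sample candidate actions from the (stand-in) generative model.
    candidates = np.stack([sample_actions(state, rng)
                           for _ in range(n_candidates)])
    # Apply a bounded perturbation (BCQ constrains it to [-Phi, Phi]).
    candidates = candidates + perturb(state, candidates)
    # Score each candidate with the (stand-in) critic and pick the best.
    q = np.array([q_value(state, a) for a in candidates])
    return candidates[np.argmax(q)]

# Toy stand-ins for the learned components (hypothetical, for illustration):
dim = 3
sample = lambda s, rng: rng.normal(size=dim)
perturb = lambda s, acts: np.clip(0.05 * acts, -0.05, 0.05)  # Phi = 0.05
qval = lambda s, a: -np.sum((a - 0.5) ** 2)  # Q peaks near a = 0.5

action = bcq_select_action(np.zeros(dim), sample, perturb, qval)
print(action.shape)  # (3,)
```

The restriction to generated-then-perturbed candidates is what keeps the policy close to the behavior distribution in the dataset, which is the core idea behind training purely offline.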
make format (required)
make commit-checks (required)