这是indexloc提供的服务,不要输入任何密码
Skip to content

Conversation

@MischaPanch
Copy link
Collaborator

@MischaPanch MischaPanch commented Nov 11, 2023

This PR adds a new method for getting actions from an env's observation and info. This is useful for standard inference and stands in contrast to batch-based methods that are currently used in training and evaluation. Without this, users have to do some kind of gymnastics to actually perform inference with a trained policy. I have also added a test for the new method.

In future PRs, this method should be included in the examples (in the the "watch" section).

To add this required improving multiple typing things and, importantly, simplifying the signature of forward in many policies! This is a breaking change, but it will likely affect no users. The input parameter of forward was a rather hacky mechanism, I believe it is good that it's gone now. It will also help with #948 .

The main functional change is the addition of compute_action to BasePolicy.

Other minor changes:

  • improvements in typing
  • updated PR and Issue templates
  • Improved handling of max_action_num

Closes #981

@MischaPanch MischaPanch changed the title Feature/get action from obs Method to get actions from observations Nov 11, 2023
@MischaPanch MischaPanch added the enhancement Feature that is not a new algorithm or an algorithm enhancement label Nov 11, 2023
@MischaPanch MischaPanch force-pushed the feature/get_action_from_obs branch from 39ad259 to bf9c971 Compare November 11, 2023 23:07
@MischaPanch MischaPanch self-assigned this Nov 11, 2023
@MischaPanch MischaPanch force-pushed the feature/get_action_from_obs branch 3 times, most recently from dad715a to ad196d0 Compare November 12, 2023 19:47
@MischaPanch
Copy link
Collaborator Author

The cql integration test is failing, as it is on master currently. I don't know why, and it doesn't happen locally.. Any idea, @Trinkle23897 ?

FAILED test/offline/test_cql.py::test_cql - assert False
 +  where False = <function test_cql.<locals>.stop_fn at 0x7fc8a49911c0>(-1202.872077798278)
===== 1 failed, 109 passed, 1 skipped, 2800 warnings in 1210.42s (0:20:10) =====

@MischaPanch MischaPanch force-pushed the feature/get_action_from_obs branch from 17e7bcf to 010cf32 Compare November 12, 2023 21:20
@codecov-commenter
Copy link

codecov-commenter commented Nov 12, 2023

⚠️ Please install the 'codecov app svg image' to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

❌ Patch coverage is 96.49123% with 4 lines in your changes missing coverage. Please review.
✅ Project coverage is 88.11%. Comparing base (6d6c85e) to head (2d9afeb).
⚠️ Report is 672 commits behind head on master.

Files with missing lines Patch % Lines
tianshou/policy/base.py 93.54% 2 Missing ⚠️
tianshou/policy/modelfree/c51.py 80.00% 1 Missing ⚠️
tianshou/policy/modelfree/fqf.py 87.50% 1 Missing ⚠️
❗ Your organization needs to install the Codecov GitHub app to enable full functionality.
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #991      +/-   ##
==========================================
+ Coverage   88.06%   88.11%   +0.05%     
==========================================
  Files          96       96              
  Lines        7505     7512       +7     
==========================================
+ Hits         6609     6619      +10     
+ Misses        896      893       -3     
Flag Coverage Δ
unittests 88.11% <96.49%> (+0.05%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@MischaPanch MischaPanch added this to the Release 1.0.0 milestone Nov 12, 2023
@nuance1979
Copy link
Collaborator

nuance1979 commented Nov 13, 2023

The cql integration test is failing, as it is on master currently. I don't know why, and it doesn't happen locally.. Any idea, @Trinkle23897 ?

FAILED test/offline/test_cql.py::test_cql - assert False
 +  where False = <function test_cql.<locals>.stop_fn at 0x7fc8a49911c0>(-1202.872077798278)
===== 1 failed, 109 passed, 1 skipped, 2800 warnings in 1210.42s (0:20:10) =====

We set an arbitrary threshold here and check to see if after 5 iterations the model can beat it. The difference is most probably due to floating point math error but could be a hidden bug.

I'd suggest lowering the threshold to -1210 to let it pass for now and revisit it as part of the benchmarking work in the future.

@Trinkle23897
Copy link
Collaborator

I'd suggest changing seed but I'm fine with lowering threshold

@MischaPanch MischaPanch changed the title Method to get actions from observations Method to compute actions from observations Nov 16, 2023
@MischaPanch MischaPanch force-pushed the feature/get_action_from_obs branch from fa9a45d to a216c14 Compare November 16, 2023 14:20
@MischaPanch
Copy link
Collaborator Author

MischaPanch commented Nov 16, 2023

@Trinkle23897 I removed the batch_size=None related changes (by force pushing) and moved them to a draft PR #993 . Feel free to take it over.

Would be nice to merge this one soon, as we need the compute_action method in downstream code in internal projects

@Trinkle23897 Trinkle23897 enabled auto-merge (squash) November 16, 2023 17:23
@Trinkle23897 Trinkle23897 merged commit 3a1bc18 into thu-ml:master Nov 16, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement Feature that is not a new algorithm or an algorithm enhancement

Projects

Development

Successfully merging this pull request may close these issues.

Update PR Template

4 participants