Method to compute actions from observations #991

MischaPanch · 2023-11-11T23:01:48Z

This PR adds a new method for computing actions from an env's observation and info. This is useful for standard inference, in contrast to the batch-based methods currently used in training and evaluation. Without it, users have to do some gymnastics to perform inference with a trained policy. I have also added a test for the new method.
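To make the contrast with batch-based methods concrete, here is a minimal plain-Python sketch of the pattern. It assumes that compute_action essentially wraps a single observation (and its info) into a batch of size one, calls the policy's batch-based forward, and strips the batch dimension from the result. ToyDiscretePolicy and its dict-based "batch" are hypothetical stand-ins; tianshou's actual policies operate on Batch objects and torch tensors.

```python
from __future__ import annotations

from typing import Any


class ToyDiscretePolicy:
    """Hypothetical policy with a batch-based forward (not tianshou's BasePolicy).

    forward() receives a dict holding a *batch* of observations and infos and
    returns a dict whose "act" entry holds one action per observation.
    """

    def forward(self, batch: dict[str, Any]) -> dict[str, Any]:
        # Greedy choice: index of the largest value in each observation.
        acts = [max(range(len(obs)), key=obs.__getitem__) for obs in batch["obs"]]
        return {"act": acts}

    def compute_action(
        self, obs: list[float], info: dict[str, Any] | None = None
    ) -> int:
        # Add a batch dimension of size one around the single observation/info.
        batch = {"obs": [obs], "info": [info if info is not None else {}]}
        acts = self.forward(batch)["act"]
        # Strip the batch dimension and return a plain int (discrete action).
        return int(acts[0])


policy = ToyDiscretePolicy()
print(policy.compute_action([0.1, 0.7, 0.2], info={}))  # 1
```

In a gym-style loop, the returned action could be passed straight to env.step without any manual batching at the call site.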

In future PRs, this method should be included in the examples (in the "watch" section).

Adding this required several typing improvements and, importantly, simplifying the signature of forward in many policies! This is a breaking change, but it will likely affect no users. The 'input' parameter of forward was a rather hacky mechanism, and I believe it is good that it's gone now. It will also help with #948.
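A minimal sketch of what dropping the 'input' kwarg means for callers, assuming the old kwarg selected which field of the batch forward reads (e.g. obs_next for a target-Q computation). Both functions are hypothetical toys operating on plain dicts, not tianshou's forward. Under the new convention, the caller builds a batch whose obs field already holds the desired observations and carries info along with it.

```python
from __future__ import annotations

from typing import Any


def forward_old(batch: dict[str, Any], input: str = "obs") -> list[float]:
    # Old convention (hypothetical toy): the caller names the batch field to read.
    # The parameter name mirrors the removed kwarg and shadows the builtin input().
    return [2.0 * x for x in batch[input]]


def forward_new(batch: dict[str, Any]) -> list[float]:
    # New convention (hypothetical toy): forward always reads batch["obs"].
    return [2.0 * x for x in batch["obs"]]


buffer_batch = {"obs": [1.0, 2.0], "obs_next": [3.0, 4.0], "info": [{}, {}]}

# To evaluate on next observations (e.g. for a target-Q value), the caller now
# builds a fresh batch whose "obs" field holds them, passing info along too.
next_batch = {"obs": buffer_batch["obs_next"], "info": buffer_batch["info"]}

print(forward_old(buffer_batch, input="obs_next"))  # [6.0, 8.0]
print(forward_new(next_batch))  # [6.0, 8.0]
```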

The main functional change is the addition of compute_action to BasePolicy.

Other minor changes:

- Improvements in typing
- Updated PR and Issue templates
- Improved handling of max_action_num

Closes #981

MischaPanch · 2023-11-12T21:12:03Z

The cql integration test is failing, just as it currently does on master. I don't know why, and it doesn't happen locally. Any idea, @Trinkle23897?

FAILED test/offline/test_cql.py::test_cql - assert False
 +  where False = <function test_cql.<locals>.stop_fn at 0x7fc8a49911c0>(-1202.872077798278)
===== 1 failed, 109 passed, 1 skipped, 2800 warnings in 1210.42s (0:20:10) =====

codecov-commenter · 2023-11-12T21:52:39Z


Codecov Report

❌ Patch coverage is 96.49123% with 4 lines in your changes missing coverage. Please review.
✅ Project coverage is 88.11%. Comparing base (6d6c85e) to head (2d9afeb).
⚠️ Report is 672 commits behind head on master.

Files with missing lines	Patch %	Lines
tianshou/policy/base.py	93.54%	2 Missing ⚠️
tianshou/policy/modelfree/c51.py	80.00%	1 Missing ⚠️
tianshou/policy/modelfree/fqf.py	87.50%	1 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##           master     #991      +/-   ##
==========================================
+ Coverage   88.06%   88.11%   +0.05%     
==========================================
  Files          96       96              
  Lines        7505     7512       +7     
==========================================
+ Hits         6609     6619      +10     
+ Misses        896      893       -3

Flag	Coverage Δ
unittests	`88.11% <96.49%> (+0.05%)`	⬆️


nuance1979 · 2023-11-13T22:17:02Z

We set an arbitrary threshold here and check to see if after 5 iterations the model can beat it. The difference is most probably due to floating point math error but could be a hidden bug.

I'd suggest lowering the threshold to -1210 to let it pass for now and revisit it as part of the benchmarking work in the future.
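The failure above can be reproduced arithmetically, assuming the test's stop_fn simply checks whether the mean test reward reaches a fixed threshold. The threshold of -1200 below is an assumption (the log only shows that -1202.87 did not pass), and make_stop_fn is a hypothetical helper.

```python
from collections.abc import Callable


def make_stop_fn(reward_threshold: float) -> Callable[[float], bool]:
    # Hypothetical helper: stop once the mean test reward reaches the threshold.
    def stop_fn(mean_rewards: float) -> bool:
        return mean_rewards >= reward_threshold

    return stop_fn


observed = -1202.872077798278  # mean reward reported in the failing CI run

print(make_stop_fn(-1200.0)(observed))  # False (assumed original threshold)
print(make_stop_fn(-1210.0)(observed))  # True (proposed lowered threshold)
```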

Trinkle23897 · 2023-11-13T22:24:58Z

I'd suggest changing the seed, but I'm fine with lowering the threshold.

MischaPanch · 2023-11-16T14:27:43Z

@Trinkle23897 I removed the batch_size=None related changes (by force pushing) and moved them to a draft PR #993 . Feel free to take it over.

It would be nice to merge this one soon, as we need the compute_action method in downstream code in internal projects.

Change policy forward interface: removed support for 'input' kwarg (aa43688)

MischaPanch changed the title ~~Feature/get action from obs~~ Method to get actions from observations Nov 11, 2023

MischaPanch added the enhancement label (Feature that is not a new algorithm or an algorithm enhancement) Nov 11, 2023

MischaPanch force-pushed the feature/get_action_from_obs branch from 39ad259 to bf9c971 November 11, 2023 23:07

MischaPanch requested review from Trinkle23897 and opcode81 November 11, 2023 23:09

MischaPanch self-assigned this Nov 11, 2023

MischaPanch force-pushed the feature/get_action_from_obs branch 3 times, most recently from dad715a to ad196d0 November 12, 2023 19:47

MischaPanch force-pushed the feature/get_action_from_obs branch from 17e7bcf to 010cf32 November 12, 2023 21:20

MischaPanch added this to the Release 1.0.0 milestone Nov 12, 2023

Trinkle23897 reviewed Nov 14, 2023

tianshou/data/buffer/base.py (outdated, resolved)

Trinkle23897 reviewed Nov 14, 2023

tianshou/policy/base.py (resolved)

MischaPanch and others added 6 commits November 16, 2023 13:19

- Added compute_action method and test, multiple typing improvements (980a0c4)
- Fixed missing info field in target-q computation (ce85f17)
- Meta: Adjusted PR and Issue templates (0b5e96d)
- Fixed dealing with max_action_num (61e1fa9)
- Minor improvements in types and fix in passing info (b931258)
- try changing seed (a216c14)

MischaPanch changed the title ~~Method to get actions from observations~~ Method to compute actions from observations Nov 16, 2023

MischaPanch force-pushed the feature/get_action_from_obs branch from fa9a45d to a216c14 November 16, 2023 14:20

Merge branch 'master' into feature/get_action_from_obs (2d9afeb)

Trinkle23897 approved these changes Nov 16, 2023

MischaPanch enabled auto-merge November 16, 2023 17:01

Trinkle23897 disabled auto-merge November 16, 2023 17:23

Trinkle23897 enabled auto-merge (squash) November 16, 2023 17:23

Trinkle23897 merged commit 3a1bc18 into thu-ml:master Nov 16, 2023

4 participants
