
Support computing custom scores and terminating/saving based on them in BaseTrainer #1202


Merged · 4 commits · Aug 14, 2024

Conversation

@anyongjin (Contributor) commented on Aug 13, 2024

This PR introduces a new concept into tianshou training: a best_score. It is computed from the appropriate Stats instance and always added to InfoStats.

Breaking Changes:

  • InfoStats has a new non-optional field best_score

Background

Currently, tianshou uses the maximum average return to select the best model. This may not match every user's needs: for example, if one model's average return is only 5% lower than another's but its standard deviation is 50% lower, the former is generally considered more stable and therefore better. A scoring function that takes the standard deviation into account could look like the sketch below.
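A minimal sketch of such a custom scoring function. It assumes the test-stats object exposes `returns_stat.mean` and `returns_stat.std` (as tianshou's `CollectStats` does); the 0.5 weight is purely illustrative and not part of this PR.

```python
def compute_score(stat) -> float:
    # Penalize volatile policies: a model whose returns fluctuate widely
    # scores lower even if its mean return is slightly higher.
    return stat.returns_stat.mean - 0.5 * stat.returns_stat.std
```

With such a function supplied, the trainer would keep the checkpoint with the highest score rather than the highest mean return.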

@MischaPanch (Collaborator) left a comment


Thanks for the PR @anyongjin, it's a good contribution!

Overall, the trainer has to become more flexible, but that would be too much to ask for right now. I think we can merge this after some slight changes and then refactor the trainer soon, taking into consideration the support for custom scoring and custom conditions for terminating the training.

@anyongjin (Contributor, Author)

In essence, average reward and test score are two different things. The former is a fixed indicator computed from the test results; the latter is a score assigned to those results, and the scoring logic may differ across tasks and users. For example, some take the standard deviation into account and some do not.
Currently, tianshou uses best_reward for both the average reward and the test score, which makes it difficult for users to implement custom scoring logic. So I suggest that best_reward be kept for the average reward only, and that best_score be added for the test score. That way, best_reward and best_score remain two distinct things. If it were called best_custom_score, people might think there is also a separate system-default score field, so I think it is better not to add 'custom'.

Update:

  • Added an explanation for InfoStats.best_score.
  • Use a lambda as the default when compute_score_fn is None, to avoid repeated if-else branches (see the sketch below).
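A sketch of the pattern described in the second bullet, not the exact trainer code. `resolve_score_fn` is a hypothetical helper name, and the default assumes the test-stats object exposes `returns_stat.mean`.

```python
from typing import Callable, Optional

def resolve_score_fn(compute_score_fn: Optional[Callable]) -> Callable:
    # Fall back to the mean test return when no custom function is given,
    # so downstream code can call the score function unconditionally
    # instead of branching on None at every use site.
    return compute_score_fn or (lambda stat: stat.returns_stat.mean)
```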

@MischaPanch changed the title from "add evaluate_test_fn to BaseTrainer (Calculate the test batch performance score to determine whether it is the best model)" to "Support computing custom scores and terminating/saving based on them in BaseTrainer" on Aug 14, 2024
@MischaPanch merged commit a38e586 into thu-ml:master on Aug 14, 2024
4 checks passed
@opcode81 (Collaborator) commented on Mar 10, 2025

@anyongjin the "terminating" part was not actually implemented. If the score is what we seek to maximize, then the early stopping criterion (stop_fn) should also use scores instead of mean returns, don't you agree?
I noticed this because I am refactoring and improving the library for v2; I will change this in the v2 branch.

@anyongjin (Contributor, Author)

Yes, if best_score is used to decide whether a model is the best one, then best_score should also be passed to stop_fn instead of the previously default average return (see the sketch below). I forgot to modify this part earlier; it should be corrected in v2.
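A hedged sketch of what stop_fn could look like once the trainer passes the custom score rather than the mean return, as proposed here for v2. The threshold is a hypothetical user choice, not a tianshou default.

```python
target_score = 195.0  # hypothetical user-chosen threshold, illustrative only

def stop_fn(score: float) -> bool:
    # With the proposed v2 change, the trainer would pass the custom score
    # here, so early stopping and best-model selection use the same quantity.
    return score >= target_score
```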
