
Support computing custom scores and terminating/saving based on them in BaseTrainer #1202


Merged · 4 commits · Aug 14, 2024

Conversation

@anyongjin (Contributor) commented on Aug 13, 2024

This PR introduces a new concept into tianshou training: a best_score. It is computed from the appropriate Stats instance and always added to InfoStats.

Breaking Changes:

  • InfoStats has a new non-optional field best_score

Background

Currently, tianshou uses the maximum average return to select the best model. This may not match every user's needs: for example, if one model's average return is only 5% lower than another's but its standard deviation is 50% lower, the former is generally considered more stable and therefore better. A scoring function that takes the standard deviation into account could look like the sketch below.
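A minimal sketch of such a custom scoring function. It assumes the test-stats object exposes `returns_stat.mean` and `returns_stat.std` (as tianshou's `CollectStats` does); the 0.5 weight is purely illustrative and not part of this PR.

```python
def compute_score(stat) -> float:
    # Penalize volatile policies: a model whose returns fluctuate widely
    # scores lower even if its mean return is slightly higher.
    return stat.returns_stat.mean - 0.5 * stat.returns_stat.std
```

With such a function supplied, the trainer would keep the checkpoint with the highest score rather than the highest mean return.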

@MischaPanch (Collaborator) left a comment


Thanks for the PR @anyongjin, it's a good contribution!

Overall, the trainer has to become more flexible, but that would be too much to ask for right now. I think we can merge this after some slight changes and then refactor the trainer soon, taking into consideration the support for custom scoring and custom conditions for terminating the training.

@anyongjin (Contributor, Author)

In essence, average reward and test score are two different things. The former is a fixed indicator computed from the test results; the latter is a score assigned to those results, and the scoring logic may differ across tasks and users. For example, some take the standard deviation into account and some do not.
Currently, tianshou uses best_reward for both the average reward and the test score, which makes it difficult for users to implement custom scoring logic. So I suggest that best_reward be kept for the average reward only, and that best_score be added for the test score. That way, best_reward and best_score remain two distinct things. If it were called best_custom_score, people might think there is also a separate system-default score field, so I think it is better not to add 'custom'.

Update:

  • Added an explanation for InfoStats.best_score.
  • Use a lambda as the default when compute_score_fn is None, to avoid repeated if-else branches (see the sketch below).
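A sketch of the pattern described in the second bullet, not the exact trainer code. `resolve_score_fn` is a hypothetical helper name, and the default assumes the test-stats object exposes `returns_stat.mean`.

```python
from typing import Callable, Optional

def resolve_score_fn(compute_score_fn: Optional[Callable]) -> Callable:
    # Fall back to the mean test return when no custom function is given,
    # so downstream code can call the score function unconditionally
    # instead of branching on None at every use site.
    return compute_score_fn or (lambda stat: stat.returns_stat.mean)
```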

@MischaPanch changed the title from "add evaluate_test_fn to BaseTrainer (Calculate the test batch performance score to determine whether it is the best model)" to "Support computing custom scores and terminating/saving based on them in BaseTrainer" on Aug 14, 2024
@MischaPanch merged commit a38e586 into thu-ml:master on Aug 14, 2024
4 checks passed
@opcode81 (Collaborator) commented on Mar 10, 2025

@anyongjin the "terminating" part was not actually implemented. If the score is what we seek to maximize, then the early stopping criterion (stop_fn) should also use scores instead of mean returns, don't you agree?
I noticed this because I am refactoring and improving the library for v2; I will change this in the v2 branch.

@anyongjin (Contributor, Author)

Yes, if best_score is used to decide whether a model is the best one, then best_score should also be passed to stop_fn instead of the previously default average return (see the sketch below). I forgot to modify this part earlier; it should be corrected in v2.
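A hedged sketch of what stop_fn could look like once the trainer passes the custom score rather than the mean return, as proposed here for v2. The threshold is a hypothetical user choice, not a tianshou default.

```python
target_score = 195.0  # hypothetical user-chosen threshold, illustrative only

def stop_fn(score: float) -> bool:
    # With the proposed v2 change, the trainer would pass the custom score
    # here, so early stopping and best-model selection use the same quantity.
    return score >= target_score
```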
