-
Notifications
You must be signed in to change notification settings - Fork 1.2k
W&B: add artifacts support #441
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
examples/atari/atari_dqn_wandb.py
Outdated
else: | ||
eps = args.eps_train_final | ||
policy.set_eps(eps) | ||
logger.write('train/eps', env_step, eps) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is the root course of that bug, I'll fix that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Codecov Report
@@ Coverage Diff @@
## master #441 +/- ##
==========================================
- Coverage 94.71% 94.21% -0.51%
==========================================
Files 60 60
Lines 3821 3853 +32
==========================================
+ Hits 3619 3630 +11
- Misses 202 223 +21
Flags with carried forward coverage won't be shown. Click here to find out more.
Continue to review full report at Codecov.
|
I'm wondering if we can add a unit test of wandb under |
@Trinkle23897 yes I'll do that before marking this ready for review. I think the easiest way to add tests would be to set up a dummy account that we can use for CI tests. |
Co-authored-by: Jiayi Weng <trinkle23897@qq.com>
I've already set up https://wandb.ai/tianshou a week before, maybe we can use a sub-namespace under this account? |
Okay. One thing to note here would be that the API key of the account used for CI will be exposed. So, it'll be better to use an account that doesn't have any important experiments. |
@Trinkle23897 Ohh I wasn't aware of that. Yes, that would work! |
This is very helpful. Thanks! |
@Trinkle23897 I was trying to log gym visualizations over time to show training progress something like this. To do that we need to monitor the videos from the gym. So I added
|
it may because env.reset() after env.reset() inside the initialization process. a quick fix is to add another wrapper that can handle this consecutive reset into a single one |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would do it, but I don't really know how to modify an in-progress PR.
@Trinkle23897 @drozzy the psrl test is failing. I think the latest gym doesn't include |
@Trinkle23897 ok I've updated the gym req. based on that link |
@Trinkle23897 Do you know how the github env secret can be accessed in python script? For the wandb test to pass we need to login in the script like |
|
@Trinkle23897 Ok now it's working |
@Trinkle23897 Hi. This is ready to be merged from my end. Do you have any changes that you'd like me to address. |
Could you please wait for me to finish one assignment for 2 days? I'll polish it. |
@Trinkle23897 sure. Take your time :) |
@Trinkle23897 Would you like to add the instructions on how to use this logger in the docs or readme? |
yep |
Awesome |
test/modelbased/test_psrl.py
Outdated
if args.logger == "wandb": | ||
logger = WandbLogger(save_interval=1, project='psrl', name='wandb_test') | ||
elif args.logger == "tensorboard": | ||
log_path = os.path.join(args.logdir, args.task, 'psrl') | ||
writer = SummaryWriter(log_path) | ||
writer.add_text("args", str(args)) | ||
logger = TensorboardLogger(writer) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible to log all args into wandb at beginning?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah. I'll add it now
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay. I've added configuration logging. Hopefully tests succeed
|
||
Creates three panels with plots: train, test, and update. | ||
This logger creates three panels with plots: train, test, and update. | ||
Make sure to select the correct access for each panel in weights and biases: | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe we can here show where the example script is?
@@ -41,6 +41,13 @@ def get_args(): | |||
) | |||
parser.add_argument('--frames-stack', type=int, default=4) | |||
parser.add_argument('--resume-path', type=str, default=None) | |||
parser.add_argument('--resume-id', type=str, default=None) | |||
parser.add_argument( | |||
'--logger', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and maybe we can add some instructions on how to use wandb (including resume) in examples/atari/README.md
instead of tensorboard?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add the instructions in doc also? I can make a separate PR tomorrow for adding the detailed instructions in examples/atari/README.md
as well as docs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just append to this pr
Great work @AyushExel, please check if my modification is correct or not (I haven't run the atari_dqn script for resume training and see if wandb still work). I change loading checkpoint under |
@Trinkle23897 Thanks. Yeah It makes sense. I'm testing the resume functionality on atari now. |
@Trinkle23897 alright I think this is good to go. I can add instructions and details in a separate PR. |
sure! |
- rename WandBLogger -> WandbLogger - add save_data and restore_data - allow more input arguments for wandb init - integrate wandb into test/modelbase/test_psrl.py and examples/atari/atari_dqn.py - documentation update
make format
(required)make commit-checks
(required)This PR fixes existing bugs in WandbLogger and adds support for artifacts.

When using
WandbLogger
, you can now resume your runs from any device. Example usage is in theexamples/atari/atari_dqn_wandb.py
:Let me know if I missed something. This is still a WIP. I'll add some more advanced visualization features.
Bug fix
It seems like the atari_dqn task is passing
floats
tologger.write
method which crashes the script. I've handled the use case but it is hacky. We might need to modify the trainer to always pass dicts