Nice repo!!!
It seems that the default policy configuration freezes all the layers of the language model we are using and only updates the lm_head.
I tried the provided example of flan-T5 here: https://colab.research.google.com/drive/1DYHt0mi6cyl8ZTMJEkMNpsSZCCvR4jM1?usp=sharing
When I changed the value of unfreeze_layer_from_past to 1 to update the weights of the final layer of flan-T5, like this:
the behavior changed and the actor started to generate empty text:
Also, after training it gave me empty text:
What is the reason for this behavior?
NOTE: I did not change anything else in the flan-t5 code example.
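For reference, the freezing pattern I understand the policy to apply can be sketched roughly as below. This is a minimal toy illustration, not the repo's actual code: the model, layer names, and the helper `freeze_all_but_head` are all hypothetical stand-ins for whatever the policy does internally with `unfreeze_layer_from_past`.

```python
import torch.nn as nn

# Toy stand-in for a seq2seq LM: a stack of "layers" plus an lm_head.
# (Purely illustrative; the real policy wraps a Hugging Face flan-T5 model.)
class ToyLM(nn.Module):
    def __init__(self):
        super().__init__()
        self.layers = nn.ModuleList([nn.Linear(8, 8) for _ in range(4)])
        self.lm_head = nn.Linear(8, 8)

def freeze_all_but_head(model: ToyLM, unfreeze_last_n: int = 0) -> ToyLM:
    """Freeze every parameter except lm_head, then optionally
    unfreeze the last `unfreeze_last_n` transformer layers."""
    for p in model.parameters():
        p.requires_grad = False          # default: everything frozen
    for p in model.lm_head.parameters():
        p.requires_grad = True           # lm_head is always trainable
    if unfreeze_last_n > 0:
        for layer in model.layers[-unfreeze_last_n:]:
            for p in layer.parameters():
                p.requires_grad = True   # e.g. unfreeze_layer_from_past=1
    return model

model = freeze_all_but_head(ToyLM(), unfreeze_last_n=1)
trainable = [n for n, p in model.named_parameters() if p.requires_grad]
```

With `unfreeze_last_n=1`, only `lm_head` and the final layer end up with `requires_grad=True`, which is the setting that produces the empty generations described above.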