fix merging bug / update boft conv2d scaling variable #2127
@BenjaminBossan I think this should fix the problem. Best.
BenjaminBossan left a comment:
Thanks for the quick fix.
I ran `pytest tests/regression/test_regression.py -s --regression -k boft` on this branch and it passed.
Fixes the bug that boft_s could not be loaded from checkpoints saved with older versions of the library.
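The actual fix lives in the PEFT source. As an illustration of the general backward-compatibility pattern involved (remapping a state-dict entry whose layout changed between versions before loading it), here is a minimal, hypothetical sketch. `TinyBoftModule` and `remap_old_boft_state_dict` are invented names for illustration, not part of the PEFT API; only the parameter name `boft_s` comes from the PR.

```python
import torch


class TinyBoftModule(torch.nn.Module):
    # Hypothetical stand-in for a BOFT layer: only the parameter name
    # ("boft_s") mirrors the real library; the shape here is made up.
    def __init__(self):
        super().__init__()
        self.boft_s = torch.nn.Parameter(torch.ones(4, 1))


def remap_old_boft_state_dict(state_dict, model):
    """Reshape legacy boft_s tensors whose layout changed between
    versions, provided the element count still matches the new model."""
    new_sd = dict(state_dict)
    for name, param in model.state_dict().items():
        if "boft_s" in name and name in new_sd:
            old = new_sd[name]
            if old.shape != param.shape and old.numel() == param.numel():
                # Same number of elements, different layout: reshape
                # the old tensor into the shape the new code expects.
                new_sd[name] = old.reshape(param.shape)
    return new_sd


model = TinyBoftModule()
legacy = {"boft_s": torch.ones(1, 4)}        # old checkpoint layout
fixed = remap_old_boft_state_dict(legacy, model)
model.load_state_dict(fixed)                 # loads without a shape error
```

In practice a fix like this is applied inside the checkpoint-loading path so that users of old checkpoints never see the shape mismatch.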