
Conversation

@Narsil (Contributor) commented Jan 20, 2023

I'm using this to run experiments, so feel free to close or ignore it.
The main goal is to be able to use the weights hosted on https://hf.co,
which are not available in NPZ format.

All models supporting safetensors: https://huggingface.co/models?library=safetensors&sort=downloads

@Narsil Narsil marked this pull request as draft January 20, 2023 17:03
@coreylowman (Owner) commented

Great idea! This will be really useful for getting pretrained weights, since I had no idea how to load them from pickle files.

@Narsil Narsil changed the title from "[WIP] Safetensors support." to "Safetensors support." Jan 20, 2023
@Narsil Narsil marked this pull request as ready for review January 20, 2023 19:15
@Narsil (Contributor, Author) commented Jan 20, 2023

Nice, I'll open this PR up then.

Biggest caveats I've seen:

- There's a copy when writing, because safetensors expects a flat [u8] buffer. There seems to be a copy here: https://github.com/coreylowman/dfdx/pull/381/files#diff-a0868f528cbb396ebf73e5726654153f4ba44c8820003f936d809e8ae45e6cdaR35
- Since every tensor has to be gathered before saving, it is quite a strain. There is experimental support for saving iteratively (huggingface/safetensors#134), i.e. writing the file tensor by tensor, but it has caveats.
- If we were able to get a &[u8] directly from Tensor, there wouldn't be a need for the copy (not sure how easy/likely that is).

It's usually not that bad on save, since saving isn't the hot path, but it's still something to take into account when training large models.
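
To make the copies concrete, here is a minimal sketch of the save path, assuming the safetensors 0.3 Rust API; `save_one` and its parameters are illustrative stand-ins, not dfdx code:

```rust
// A minimal sketch of the copies on save, assuming the safetensors 0.3 Rust
// API; `save_one` and its parameters are illustrative, not dfdx code.
use std::collections::HashMap;

use safetensors::tensor::{Dtype, TensorView};
use safetensors::{serialize, SafeTensorError};

fn save_one(name: &str, shape: Vec<usize>, data: &[f32]) -> Result<Vec<u8>, SafeTensorError> {
    // Copy 1: gather the tensor's elements into a flat little-endian byte buffer.
    let bytes: Vec<u8> = data.iter().flat_map(|x| x.to_le_bytes()).collect();
    let view = TensorView::new(Dtype::F32, shape, &bytes)?;
    // Copy 2: serialize() assembles the header plus all tensor bytes into one Vec<u8>.
    serialize([(name, view)], &None::<HashMap<String, String>>)
}
```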

In general I think we could aim for zero-copy loads with memory mapping on aligned files: https://github.com/coreylowman/dfdx/pull/381/files#diff-a0868f528cbb396ebf73e5726654153f4ba44c8820003f936d809e8ae45e6cdaR124
But it's not worth doing at the moment, I think (it would mean holding a borrowed buffer, which kills the possibility of doing gradient stuff; just mentioning it in case). A sketch of that load path follows.
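
For reference, a zero-copy load would look roughly like this, assuming the memmap2 crate on top of the safetensors 0.3 API; the tensor views borrow the mapped buffer, which is exactly the borrow problem mentioned above:

```rust
// A sketch of a zero-copy load via memory mapping, assuming the memmap2
// crate and the safetensors 0.3 API; tensor views borrow the mapped buffer.
use memmap2::MmapOptions;
use safetensors::SafeTensors;
use std::fs::File;

fn inspect(path: &str) -> Result<(), Box<dyn std::error::Error>> {
    let file = File::open(path)?;
    // Safety: the file must not be modified while the mapping is alive.
    let buffer = unsafe { MmapOptions::new().map(&file)? };
    // Only the JSON header is parsed here; no tensor bytes are copied.
    let st = SafeTensors::deserialize(&buffer)?;
    for (name, view) in st.tensors() {
        println!("{name}: {:?} {:?}", view.dtype(), view.shape());
    }
    Ok(())
}
```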

@coreylowman (Owner) commented

Yeah, I think that given the tensors may be stored on other devices, getting a &[u8] slice seems unlikely. E.g. for CUDA we're going to have to copy back to a CPU Vec<E> first anyway.

@Narsil (Contributor, Author) commented Jan 20, 2023

> Yeah, I think that given the tensors may be stored on other devices, getting a &[u8] slice seems unlikely. E.g. for CUDA we're going to have to copy back to a CPU Vec<E> first anyway.

Yes, and training is most likely going to happen on GPU most of the time. Still, extra copies generally become a bottleneck really fast, so I'm acutely aware of them.

Loading happens much more often than saving in practice (just as inference >> training overall), so as long as the load-side copies are reduced to a minimum, it's OK.

@Narsil Narsil force-pushed the safetensors_load_read branch from fa4d016 to 4ea24b9 Compare February 16, 2023 18:33
@Narsil (Contributor, Author) commented Feb 16, 2023

Any new thoughts on this PR? @coreylowman

@coreylowman (Owner) left a comment


Looks good to me. After #460 it will be straightforward to implement this for nn layers as well.

@coreylowman (Owner) left a comment


The TensorCollection API was just merged, which greatly simplified nn::SaveToNpz/LoadFromNpz. It should be really easy to add a safetensors loader/saver for nn layers now as well!
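
For what it's worth, such a saver could take roughly this shape; the trait name and signature below are illustrative, mirroring the NPZ traits, not the exact API this PR ends up adding:

```rust
// Hypothetical shape of a safetensors saver for nn layers; illustrative only.
use std::path::Path;

trait SaveToSafetensors {
    /// Walk the module's tensors (e.g. via TensorCollection) and write them
    /// all into a single .safetensors file at `path`.
    fn save_safetensors<P: AsRef<Path>>(&self, path: P) -> Result<(), safetensors::SafeTensorError>;
}
```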

@Narsil Narsil force-pushed the safetensors_load_read branch from 4ea24b9 to c76e417 Compare February 24, 2023 20:28
@Narsil (Contributor, Author) commented Feb 24, 2023

Great addition!

I updated everything; we can now simply load/save entire modules. Nice!

Cargo.toml (Outdated)
Comment on lines 36 to 37

```toml
# safetensors = { version = "0.2", default-features = false, optional = true }
safetensors = { git = "https://github.com/huggingface/safetensors", default-features = false, optional = true }
```
@coreylowman (Owner) commented


Was this a holdover from development? Which one should be used?

@Narsil (Contributor, Author) commented


I need to release 0.3; it should happen this week. (I will release a Python release candidate first. There has been a relatively big change in how things are saved: everything is backward compatible, but files on disk will be different now, to get alignment done properly.)
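
Presumably the Cargo.toml dependency then goes back to a plain version requirement, something like this (illustrative, pending the actual release):

```toml
# Illustrative: switch back from the git dependency to the crates.io release.
safetensors = { version = "0.3", default-features = false, optional = true }
```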

@coreylowman (Owner) left a comment


🔥 Will be good to merge after the version fix in Cargo.toml.

@Narsil Narsil force-pushed the safetensors_load_read branch 2 times, most recently from 512433a to 9b408dd Compare March 5, 2023 23:51
@Narsil Narsil requested a review from coreylowman March 6, 2023 09:06
@Narsil (Contributor, Author) commented Mar 6, 2023

@coreylowman I fixed it with the released version.

@coreylowman (Owner) commented

@Narsil Looks good. Didn't this used to have an impl for tensor collections?

@Narsil (Contributor, Author) commented Mar 8, 2023

I'm as confused as you; I simply rebased. I'm redoing the work, since I can't find it anymore. I'm guessing I messed up my rebase, but I can't figure out where the original code lives.

@Narsil Narsil force-pushed the safetensors_load_read branch from 9b408dd to ee6d197 Compare March 8, 2023 12:16
@Narsil (Contributor, Author) commented Mar 8, 2023

I added the test gated behind the feature, as I'm seeing a super weird trait issue in the Booleans module which doesn't seem linked to my changes. Any ideas? https://github.com/coreylowman/dfdx/actions/runs/4364586004/jobs/7632084283

Comment on lines +36 to +38

```rust
gpt2.weight
    .load_safetensor(&tensors, "wte.weight")
    .expect("Could not load tensor");
```
@coreylowman (Owner) commented


Awesome example

@coreylowman (Owner) left a comment


LGTM! I don't know why the boolean test is failing; it's passing locally for me. Will resolve that separately. Thanks for the contribution!

@coreylowman coreylowman merged commit 420a8a2 into coreylowman:main Mar 8, 2023
@Narsil Narsil deleted the safetensors_load_read branch March 8, 2023 13:25