
Conversation

@Narsil (Contributor) commented Jan 20, 2023

I'm using this to run experiments, so feel free to close or ignore it.
The main goal is to be able to use the weights hosted on https://hf.co,
which are not available in NPZ format.

All models supporting safetensors: https://huggingface.co/models?library=safetensors&sort=downloads

@Narsil Narsil marked this pull request as draft January 20, 2023 17:03
@coreylowman (Owner) commented

Great idea! This will be really useful for getting pretrained weights, since I had no idea how to load them from pickle files.

@Narsil Narsil changed the title from "[WIP] Safetensors support." to "Safetensors support." Jan 20, 2023
@Narsil Narsil marked this pull request as ready for review January 20, 2023 19:15
@Narsil (Contributor, Author) commented Jan 20, 2023

Nice, I'll open this PR up then.

Biggest caveats I've seen:

- There's a copy when writing, because safetensors expects a flat [u8] buffer. There seems to be a copy here: https://github.com/coreylowman/dfdx/pull/381/files#diff-a0868f528cbb396ebf73e5726654153f4ba44c8820003f936d809e8ae45e6cdaR35
- Since every tensor has to be gathered before saving, it is quite a strain. There is experimental support for saving iteratively (huggingface/safetensors#134), i.e. writing the file tensor by tensor, but it has caveats.
- If we were able to get a &[u8] directly from Tensor, there wouldn't be a need for the copy (not sure how easy/likely that is).

It's usually not that bad on save, since saving isn't the hot path, but it's still something to take into account when training large models.
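
To make the copies concrete, here is a minimal sketch of the save path, assuming the safetensors 0.3 Rust API; `save_one` and its parameters are illustrative stand-ins, not dfdx code:

```rust
// A minimal sketch of the copies on save, assuming the safetensors 0.3 Rust
// API; `save_one` and its parameters are illustrative, not dfdx code.
use std::collections::HashMap;

use safetensors::tensor::{Dtype, TensorView};
use safetensors::{serialize, SafeTensorError};

fn save_one(name: &str, shape: Vec<usize>, data: &[f32]) -> Result<Vec<u8>, SafeTensorError> {
    // Copy 1: gather the tensor's elements into a flat little-endian byte buffer.
    let bytes: Vec<u8> = data.iter().flat_map(|x| x.to_le_bytes()).collect();
    let view = TensorView::new(Dtype::F32, shape, &bytes)?;
    // Copy 2: serialize() assembles the header plus all tensor bytes into one Vec<u8>.
    serialize([(name, view)], &None::<HashMap<String, String>>)
}
```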

In general I think we could aim for zero-copy loads with memory mapping on aligned files: https://github.com/coreylowman/dfdx/pull/381/files#diff-a0868f528cbb396ebf73e5726654153f4ba44c8820003f936d809e8ae45e6cdaR124
But it's not worth doing at the moment, I think (it would mean holding a borrowed buffer, which kills the possibility of doing gradient stuff; just mentioning it in case). A sketch of that load path follows.
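
For reference, a zero-copy load would look roughly like this, assuming the memmap2 crate on top of the safetensors 0.3 API; the tensor views borrow the mapped buffer, which is exactly the borrow problem mentioned above:

```rust
// A sketch of a zero-copy load via memory mapping, assuming the memmap2
// crate and the safetensors 0.3 API; tensor views borrow the mapped buffer.
use memmap2::MmapOptions;
use safetensors::SafeTensors;
use std::fs::File;

fn inspect(path: &str) -> Result<(), Box<dyn std::error::Error>> {
    let file = File::open(path)?;
    // Safety: the file must not be modified while the mapping is alive.
    let buffer = unsafe { MmapOptions::new().map(&file)? };
    // Only the JSON header is parsed here; no tensor bytes are copied.
    let st = SafeTensors::deserialize(&buffer)?;
    for (name, view) in st.tensors() {
        println!("{name}: {:?} {:?}", view.dtype(), view.shape());
    }
    Ok(())
}
```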

@coreylowman (Owner) commented

Yeah, I think that given the tensors may be stored on other devices, getting a &[u8] slice seems unlikely. E.g. for CUDA we're going to have to copy back to a CPU Vec<E> first anyway.

@Narsil (Contributor, Author) commented Jan 20, 2023

> Yeah, I think that given the tensors may be stored on other devices, getting a &[u8] slice seems unlikely. E.g. for CUDA we're going to have to copy back to a CPU Vec<E> first anyway.

Yes, and training is most likely going to happen on GPU most of the time. Still, extra copies generally become a bottleneck really fast, so I'm acutely aware of them.

Loading happens much more often than saving in practice (just as inference >> training overall), so as long as the load-side copies are reduced to a minimum, it's OK.

@Narsil Narsil force-pushed the safetensors_load_read branch from fa4d016 to 4ea24b9 Compare February 16, 2023 18:33
@Narsil (Contributor, Author) commented Feb 16, 2023

Any new thoughts on this PR? @coreylowman

@coreylowman (Owner) left a comment


Looks good to me. After #460 it will be straightforward to implement this for nn layers as well.

@coreylowman (Owner) left a comment


The TensorCollection API was just merged, which greatly simplified nn::SaveToNpz/LoadFromNpz. It should be really easy to add a safetensors loader/saver for nn layers now as well!
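
For what it's worth, such a saver could take roughly this shape; the trait name and signature below are illustrative, mirroring the NPZ traits, not the exact API this PR ends up adding:

```rust
// Hypothetical shape of a safetensors saver for nn layers; illustrative only.
use std::path::Path;

trait SaveToSafetensors {
    /// Walk the module's tensors (e.g. via TensorCollection) and write them
    /// all into a single .safetensors file at `path`.
    fn save_safetensors<P: AsRef<Path>>(&self, path: P) -> Result<(), safetensors::SafeTensorError>;
}
```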

@Narsil Narsil force-pushed the safetensors_load_read branch from 4ea24b9 to c76e417 Compare February 24, 2023 20:28
@Narsil (Contributor, Author) commented Feb 24, 2023

Great addition!

I updated everything; we can now simply load/save entire modules. Nice!

Cargo.toml (Outdated)
Comment on lines 36 to 37

```toml
# safetensors = { version = "0.2", default-features = false, optional = true }
safetensors = { git = "https://github.com/huggingface/safetensors", default-features = false, optional = true }
```
@coreylowman (Owner) commented


Was this a holdover from development? Which one should be used?

@Narsil (Contributor, Author) commented


I need to release 0.3; it should happen this week. (I will release a Python release candidate first. There has been a relatively big change in how things are saved: everything is backward compatible, but files on disk will be different now, to get alignment done properly.)
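
Presumably the Cargo.toml dependency then goes back to a plain version requirement, something like this (illustrative, pending the actual release):

```toml
# Illustrative: switch back from the git dependency to the crates.io release.
safetensors = { version = "0.3", default-features = false, optional = true }
```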

@coreylowman (Owner) left a comment


🔥 Will be good to merge after the version fix in Cargo.toml.

@Narsil Narsil force-pushed the safetensors_load_read branch 2 times, most recently from 512433a to 9b408dd Compare March 5, 2023 23:51
@Narsil Narsil requested a review from coreylowman March 6, 2023 09:06
@Narsil (Contributor, Author) commented Mar 6, 2023

@coreylowman I fixed it with the released version.

@coreylowman (Owner) commented

@Narsil Looks good. Didn't this used to have an impl for tensor collections?

@Narsil (Contributor, Author) commented Mar 8, 2023

I'm as confused as you; I simply rebased. I'm redoing the work, since I can't find it anymore. I'm guessing I messed up my rebase, but I can't figure out where the original code lives.

@Narsil Narsil force-pushed the safetensors_load_read branch from 9b408dd to ee6d197 Compare March 8, 2023 12:16
@Narsil (Contributor, Author) commented Mar 8, 2023

I added the test gated behind the feature, as I'm seeing a super weird trait issue in the Booleans module which doesn't seem linked to my changes. Any ideas? https://github.com/coreylowman/dfdx/actions/runs/4364586004/jobs/7632084283

Comment on lines +36 to +38

```rust
gpt2.weight
    .load_safetensor(&tensors, "wte.weight")
    .expect("Could not load tensor");
```
@coreylowman (Owner) commented


Awesome example

@coreylowman (Owner) left a comment


LGTM! I don't know why the boolean test is failing; it's passing locally for me. Will resolve that separately. Thanks for the contribution!

@coreylowman coreylowman merged commit 420a8a2 into coreylowman:main Mar 8, 2023
@Narsil Narsil deleted the safetensors_load_read branch March 8, 2023 13:25