FIX: DynamicCache key_cache attribute deprecation #2737
Conversation
Resolves failing CI with transformers source install. The key_cache attribute on DynamicCache is deprecated and will be removed in the 4.56.0 transformers release. The deprecation message mentions that the layers attribute should be used instead, which is what this PR does.
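For context, a minimal sketch of the attribute change (assuming a recent transformers version where both attributes still exist; the toy tensors are only for illustration):

```python
import torch
from transformers import DynamicCache

cache = DynamicCache()
# populate one layer with toy key/value states of shape (batch, num_heads, seq_len, head_dim)
cache.update(torch.zeros(1, 2, 3, 8), torch.zeros(1, 2, 3, 8), layer_idx=0)

num_layers_old = len(cache.key_cache)  # deprecated, removed in transformers 4.56.0
num_layers_new = len(cache.layers)     # the replacement this PR switches to
```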
src/peft/peft_model.py (outdated)

```diff
  past_key_values.cross_attention_cache = DynamicCache()
  past_key_values.is_updated = {
-     layer_idx: False for layer_idx in range(len(past_key_values.cross_attention_cache.key_cache))
+     layer_idx: False for layer_idx in range(len(past_key_values.cross_attention_cache.layers))
  }
```
Won't we need to check both attributes for a certain time? AFAICS the old DynamicCache instance doesn't have a layers attribute.
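If both versions had to be supported for a while, a version-tolerant check might look roughly like this (a sketch with a hypothetical helper, not the fix the thread settles on):

```python
def num_cache_layers(cache):
    # newer transformers: DynamicCache stores per-layer state in `layers`
    if hasattr(cache, "layers"):
        return len(cache.layers)
    # older transformers: fall back to the (now deprecated) key_cache list
    return len(cache.key_cache)
```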
Yes, true, but while looking at this, I found another issue. Namely, we first initialize past_key_values.cross_attention_cache = DynamicCache() and then iterate over its key_cache / layers, but of course those are going to be empty. I confirmed this when running the tests. So, technically, this line is a no-op.
Now I wonder: can it be removed or should we save the is_updated from before we override cross_attention_cache with the empty DynamicCache()? When checking the attribute before, it is not empty but can have values like {0: True, 1: True}.
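To illustrate the no-op (a minimal sketch, assuming a recent transformers version):

```python
from transformers import DynamicCache

cache = DynamicCache()    # freshly constructed, holds no layers yet
print(len(cache.layers))  # 0, so range(0) is empty and the dict comprehension yields {}
```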
@zucchini-nlp I think something was messed up in #2096 to end up in this situation.
We had a major refactor of the cache recently, and probably the new format isn't 100% backward compatible. Can you remind me what the desired behavior for prefix tuning on encoder-decoder models is? Do we append virtual tokens to decoder inputs only, or to encoder ids as well?
I just checked the prefix tuning paper; the idea is to have separate prefix tokens (or rather, embeddings) for both encoder and decoder.
Ahh, I see. Overwriting cross_attention_cache actually won't change the values of is_updated AFAIK, so we still have to set them to False in order to save the cross-attention cache.
Maybe we can just iterate over cache.is_updated and change it in place, so we don't have to infer the number of layers from cache.layers, wdyt?
Ah, good suggestion. Just so I'm sure I understand correctly, currently we have:
```python
past_key_values = EncoderDecoderCache.from_legacy_cache(past_key_values)
past_key_values.cross_attention_cache = DynamicCache()
past_key_values.is_updated = {
    layer_idx: False for layer_idx in range(len(past_key_values.cross_attention_cache.layers))
}
```

which makes no sense because past_key_values.is_updated will always end up being an empty dict. Instead, we should do this, right?

```python
past_key_values = EncoderDecoderCache.from_legacy_cache(past_key_values)
past_key_values.cross_attention_cache = DynamicCache()
for key in past_key_values.is_updated.keys():
    past_key_values.is_updated[key] = False
```

And this should also work with older transformers versions.
Yep, an empty dict would throw errors, so we need an explicit False.
@githubnemo Ready for another review.
githubnemo left a comment:
LGTM otherwise
@githubnemo Could you please re-review?
In #2737, we fixed some code that relied on the deprecated attribute, but some of it was missed, as it only runs on the nightly CI with multiple GPUs. This PR fixes that. Note that the original transformers code this solution was based on no longer exists: transformers now initializes the cache lazily, so pre-allocating the keys and values on the correct device is no longer necessary there. But since prefix tuning inserts "virtual" keys/values, we still have to ensure the correct device in PEFT. I ran the failing tests locally and they pass.
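A rough sketch of the kind of device handling meant here, using the legacy tuple cache format for simplicity (move_virtual_kv_to_device is a hypothetical helper, not the actual PEFT code):

```python
import torch

def move_virtual_kv_to_device(past_key_values, device):
    # Prefix tuning prepends "virtual" key/value states to the cache; they must
    # live on the same device as the key/value states the model computes itself.
    return tuple(
        (key_states.to(device), value_states.to(device))
        for key_states, value_states in past_key_values
    )
```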