
Conversation

@BenjaminBossan
Member

Resolves failing CI with transformers source install.

The key_cache attribute on DynamicCache is deprecated and will be removed in the 4.56.0 transformers release. The deprecation message mentions that the layers attribute should be used instead, which is what this PR does.

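For reference, a minimal sketch of the attribute change described above (assuming a transformers version where DynamicCache already exposes layers; older releases only have key_cache/value_cache):

    from transformers import DynamicCache

    cache = DynamicCache()

    # deprecated, removed in the 4.56.0 transformers release:
    # num_layers = len(cache.key_cache)

    # replacement used by this PR:
    num_layers = len(cache.layers)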
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

past_key_values.cross_attention_cache = DynamicCache()
past_key_values.is_updated = {
-    layer_idx: False for layer_idx in range(len(past_key_values.cross_attention_cache.key_cache))
+    layer_idx: False for layer_idx in range(len(past_key_values.cross_attention_cache.layers))
}
Collaborator
Won't we need to check both attributes for a certain time? AFAICS the old DynamicCache instance doesn't have a layers attribute.
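A version-tolerant check could look like the sketch below (hypothetical helper, for illustration only; the discussion that follows ends up avoiding the question entirely by iterating over is_updated instead):

    from transformers import DynamicCache

    def num_cached_layers(cache: DynamicCache) -> int:
        # Prefer the new `layers` attribute, fall back to the deprecated
        # `key_cache` list on older transformers releases.
        if hasattr(cache, "layers"):
            return len(cache.layers)
        return len(cache.key_cache)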

Member Author

Yes, true, but while looking at this, I found another issue. Namely, we first initialize past_key_values.cross_attention_cache = DynamicCache() and then iterate over its key_cache / layers, but of course those are going to be empty. I confirmed this when running the tests. So, technically, this line is a no-op.

Now I wonder: can it be removed or should we save the is_updated from before we override cross_attention_cache with the empty DynamicCache()? When checking the attribute before, it is not empty but can have values like {0: True, 1: True}.

@zucchini-nlp I think something was messed up in #2096 to end up in this situation.

Member

We had a major refactor of the cache recently, and the new format probably isn't 100% backward compatible. Can you remind me what the desired behavior is for prefix tuning on encoder-decoder models? Do we append virtual tokens to the decoder inputs only, or to the encoder ids as well?

Member Author

@BenjaminBossan BenjaminBossan Aug 19, 2025

I just checked the prefix tuning paper, the idea is to have separate prefix tokens (or rather, embeddings) for both encoder and decoder.
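Roughly, the paper-level idea looks like this (purely illustrative shapes and names, not PEFT's actual implementation):

    import torch

    # Illustrative only: one learned (key, value) prefix per layer, with
    # separate prefixes for the encoder and the decoder.
    num_layers, num_heads, num_virtual_tokens, head_dim = 12, 12, 20, 64

    def make_prefix(batch_size):
        shape = (batch_size, num_heads, num_virtual_tokens, head_dim)
        return [(torch.randn(shape), torch.randn(shape)) for _ in range(num_layers)]

    encoder_prefix = make_prefix(batch_size=2)  # prepended in encoder self-attention
    decoder_prefix = make_prefix(batch_size=2)  # prepended in decoder self-attention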

Member

Ahh, I see. Overwriting cross_attention_cache actually won't change the values of is_updated AFAIK, and thus we still have to set them to False in order to save the cross-attention cache.

Maybe we can just iterate over cache.is_updated and change it in place, so we don't have to infer the number of layers from cache.layers, wdyt?

Member Author

Ah, good suggestion. Just so I'm sure I understand correctly, currently we have:

                past_key_values = EncoderDecoderCache.from_legacy_cache(past_key_values)
                past_key_values.cross_attention_cache = DynamicCache()
                past_key_values.is_updated = {
                    layer_idx: False for layer_idx in range(len(past_key_values.cross_attention_cache.layers))
                }

which makes no sense because past_key_values.is_updated will always end up being an empty dict. Instead, we should do this, right?

                past_key_values = EncoderDecoderCache.from_legacy_cache(past_key_values)
                past_key_values.cross_attention_cache = DynamicCache()
                for key in past_key_values.is_updated.keys():
                    past_key_values.is_updated[key] = False

And this should also work with older transformers versions.

Member

Yep, an empty dict would throw errors, so we need an explicit False.

@BenjaminBossan
Member Author

@githubnemo Ready for another review.

Collaborator

@githubnemo githubnemo left a comment

LGTM otherwise

@BenjaminBossan
Member Author

@githubnemo Could you please re-review?

@BenjaminBossan BenjaminBossan merged commit 2d9b22f into huggingface:main Aug 26, 2025
26 of 27 checks passed
@BenjaminBossan BenjaminBossan deleted the fix-dynamiccache-key-cache-deprecation branch August 26, 2025 08:37
BenjaminBossan added a commit to BenjaminBossan/peft that referenced this pull request Aug 27, 2025
In huggingface#2737, we fixed some code that relied on the deprecated attribute, but some of it was missed, as it only runs on the nightly CI with multiple GPUs. This PR fixes that.

Note that the original transformers code that this solution was based on
no longer exists, as transformers now initializes the cache lazily, so
pre-allocating the keys and values to the correct device is not
necessary. But since prefix tuning inserts "virtual" keys/values, we
still have to ensure the correct device in PEFT.

I have tested the failing tests locally and they pass.
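A minimal sketch of the device handling mentioned here (illustrative names, not PEFT's actual code): the virtual key/value states inserted by prefix tuning need to live on the same device as the corresponding model layer, e.g. in a multi-GPU setup.

    import torch

    def move_virtual_kv_to_device(virtual_kv, device):
        # Illustrative helper: place each layer's virtual (key, value) pair
        # on the given device before merging it into the cache.
        return [(k.to(device), v.to(device)) for k, v in virtual_kv]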
BenjaminBossan added a commit that referenced this pull request Sep 4, 2025