FIX: Exploit trust_remote_code in prompt tuning #2896
Merged
Resolves #2888
This is a test for a hypothetical exploit that would enable `trust_remote_code` (and thus RCE) when a user loads a malicious prompt tuning model. This is because PEFT would just pass on the `tokenizer_kwargs` defined in the prompt tuning config unsanitized, which means that if the tokenizer is also malicious, the malicious code would be executed.
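To make the attack surface concrete, here is a minimal sketch of what such a malicious config could look like, assuming a hypothetical Hub repo `attacker/malicious-tokenizer` that ships remote tokenizer code (the config fields are real `PromptTuningConfig` fields; the repo name is made up):

```python
from peft import PromptTuningConfig, PromptTuningInit, TaskType

# With prompt_tuning_init=TEXT, PEFT loads a tokenizer to embed the init
# text, forwarding tokenizer_kwargs as-is to AutoTokenizer.from_pretrained.
malicious_config = PromptTuningConfig(
    task_type=TaskType.CAUSAL_LM,
    prompt_tuning_init=PromptTuningInit.TEXT,
    prompt_tuning_init_text="Classify the sentiment of the text:",
    num_virtual_tokens=8,
    tokenizer_name_or_path="attacker/malicious-tokenizer",  # hypothetical repo
    # Before the fix, this dict was passed through unsanitized, so a
    # malicious config could smuggle in:
    tokenizer_kwargs={"trust_remote_code": True},
)
# get_peft_model(base_model, malicious_config) would then execute the
# tokenizer's remote code during prompt embedding initialization.
```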
For this exploit to work, a user cannot load the model with `PeftModel.from_pretrained` as normal, because the tokenizer is only loaded in training mode. Although the attacker could set `inference_mode=False` in the `adapter_config.json`, that would still not work, because prompt tuning methods cannot be loaded in training mode. Therefore, the only way for the exploit to work would be if the user manually loads the model.
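For illustration, one way to close the hole is to reject `trust_remote_code` before the kwargs ever reach `AutoTokenizer.from_pretrained`. This is a minimal sketch of that idea; the helper name and error message are hypothetical and not necessarily the exact code merged in this PR:

```python
# Hypothetical sanitization helper, sketching the idea behind the fix.
def check_tokenizer_kwargs(tokenizer_kwargs: dict) -> None:
    """Reject tokenizer kwargs that would enable remote code execution."""
    if tokenizer_kwargs.get("trust_remote_code"):
        raise ValueError(
            "trust_remote_code is not allowed in the tokenizer_kwargs of a "
            "prompt tuning config, since it could lead to remote code "
            "execution when the config comes from an untrusted source."
        )

# Called right before the tokenizer is loaded, e.g.:
# check_tokenizer_kwargs(config.tokenizer_kwargs or {})
# tokenizer = AutoTokenizer.from_pretrained(
#     config.tokenizer_name_or_path, **(config.tokenizer_kwargs or {})
# )
```

With a check like this in place, the malicious config above fails fast when a user loads it manually, instead of silently executing the tokenizer's remote code.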