Causal Language Control in Multilingual Transformers via Sparse Feature Steering

Chou, Cheng-Ting; Liu, George; Sun, Jessica; Blondin, Cole; Zhu, Kevin; Sharma, Vasu; O'Brien, Sean

arXiv:2507.13410 (cs)

[Submitted on 17 Jul 2025]

Abstract: Deterministically controlling the target generation language of large multilingual language models (LLMs) remains a fundamental challenge, particularly in zero-shot settings where neither explicit language prompts nor fine-tuning are available. In this work, we investigate whether sparse autoencoder (SAE) features, previously shown to correlate with interpretable model behaviors, can be used to steer the generated language of LLMs during inference. Using pretrained SAEs on the residual streams of Gemma-2B and Gemma-9B, we identify features whose activations differ most significantly between English and four target languages: Chinese, Japanese, Spanish, and French. By modifying just a single SAE feature at one transformer layer, we achieve controlled language shifts with up to 90% success, as measured by FastText language classification, while preserving semantic fidelity according to LaBSE (Language-Agnostic BERT Sentence Embedding) similarity. Our analysis reveals that language steering is most effective in mid-to-late transformer layers and is amplified by specific attention heads disproportionately associated with language-sensitive SAE features. These results demonstrate the promise of sparse feature steering as a lightweight and interpretable mechanism for controllable multilingual generation.
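
The two steps the abstract describes, picking the SAE feature whose activation differs most between English and a target language and then modifying that one feature at a single layer, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It makes several assumptions: the data are synthetic, the function names `select_language_feature` and `steer_residual` are hypothetical, feature selection uses a plain difference of mean activations, and "modifying a feature" is approximated by adding the feature's normalized decoder direction to the residual stream, which is one common way to steer with SAE features.

```python
import numpy as np

def select_language_feature(acts_src, acts_tgt):
    """Index of the SAE feature whose mean activation rises most from source to target text.

    acts_src, acts_tgt: arrays of shape (n_tokens, n_features).
    Assumption: a simple difference of means stands in for the paper's selection criterion.
    """
    diff = acts_tgt.mean(axis=0) - acts_src.mean(axis=0)
    return int(np.argmax(diff))

def steer_residual(resid, decoder, feature_idx, strength):
    """Add a scaled, unit-norm SAE decoder direction to every residual-stream position.

    resid: (n_positions, d_model); decoder: (n_features, d_model).
    Assumption: steering is additive along the decoder row of the chosen feature.
    """
    direction = decoder[feature_idx]
    direction = direction / np.linalg.norm(direction)
    return resid + strength * direction

# Synthetic stand-ins for SAE activations on English and target-language text.
rng = np.random.default_rng(0)
n_features, d_model = 16, 8
acts_en = rng.random((100, n_features))
acts_zh = rng.random((100, n_features))
acts_zh[:, 5] += 3.0  # synthetic: feature 5 fires strongly on target-language text

decoder = rng.normal(size=(n_features, d_model))  # hypothetical SAE decoder weights
resid = rng.normal(size=(4, d_model))             # hypothetical residual activations at one layer

idx = select_language_feature(acts_en, acts_zh)
steered = steer_residual(resid, decoder, idx, strength=10.0)
```

In a real model, the steered activations would replace the layer's residual stream during generation (for example via a forward hook), and the `strength` value would need tuning per layer and per language.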

Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2507.13410 [cs.CL]
	(or arXiv:2507.13410v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2507.13410

Submission history

From: Cheng-Ting Chou
[v1] Thu, 17 Jul 2025 06:49:16 UTC (1,944 KB)
