Marco-Bench-MIF: On Multilingual Instruction-Following Capability of Large Language

Bo Zeng; Chenyang Lyu; Sinuo Liu; Mingyan Zeng; Minghao Wu; Xuanfan Ni (倪宣凡); Tianqi Shi; Yu Zhao (宇赵, 赵宇); Yefeng Liu; Chenyu Zhu; Ruizhe Li; Jiahui Geng; Qing Li; Yu Tong; Longyue Wang; Weihua Luo; Kaifu Zhang

doi:10.18653/v1/2025.acl-long.1172

Abstract

Instruction-following capability has become a key ability to evaluate in Large Language Models (LLMs). However, existing datasets, such as IFEval, are either predominantly monolingual and English-centric or simply machine-translated into other languages, which limits their applicability in multilingual contexts. In this paper, we present Marco-Bench-MIF, a carefully curated, localized multilingual extension of IFEval covering 30 languages with varying levels of localization. Our benchmark addresses linguistic constraints (e.g., modifying capitalization requirements for Chinese) and cultural references (e.g., substituting region-specific company names in prompts) through a hybrid pipeline that combines translation with verification. Through a comprehensive evaluation of 20+ LLMs on Marco-Bench-MIF, we find that: (1) there is a 25–35% accuracy gap between high-resource and low-resource languages; (2) model scale has a large impact on performance (45–60%), yet script-specific challenges persist; and (3) machine-translated data underestimates accuracy by 7–22% compared with localized data. Our analysis identifies open challenges in multilingual instruction following, including preserving keyword consistency and adhering to compositional constraints across languages. Marco-Bench-MIF will be made publicly available to the community.
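To make the capitalization example concrete, here is a minimal sketch of IFEval-style rule-based checks, assuming simple string predicates. The function names (`all_uppercase`, `keyword_count`, `accuracy`) are hypothetical and are not taken from the Marco-Bench-MIF release. The sketch only illustrates why a case-based constraint cannot be satisfied in a caseless script such as Chinese, so such instructions must be adapted rather than merely translated.

```python
"""Hypothetical IFEval-style checks (illustrative only, not the benchmark's code)."""


def all_uppercase(response: str) -> bool:
    # Assumed form of a "respond in all capital letters" check.
    # str.isupper() requires at least one cased character, so text written
    # only in a caseless script such as Chinese can never pass it.
    return response.isupper()


def keyword_count(response: str, keyword: str) -> int:
    # Naive substring count for a "mention keyword N times" check; the keyword
    # in a localized prompt must match the form expected in the answer language.
    return response.count(keyword)


def accuracy(results: list[bool]) -> float:
    # Fraction of checks passed; returns 0.0 for an empty list.
    return sum(results) / len(results) if results else 0.0


if __name__ == "__main__":
    print(all_uppercase("HELLO WORLD"))                # True
    print(all_uppercase("你好，世界"))                    # False: no cased characters
    print(keyword_count("苹果很甜，我喜欢苹果", "苹果"))      # 2
    print(accuracy([True, False, True, True]))         # 0.75
```

A plain substring count is used here for simplicity; for languages without whitespace word boundaries, a real checker would likely need language-aware tokenization.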

Anthology ID:: 2025.acl-long.1172
Volume:: Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:: July
Year:: 2025
Address:: Vienna, Austria
Editors:: Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
Venue:: ACL
Publisher:: Association for Computational Linguistics
Pages:: 24058–24072
URL:: https://aclanthology.org/2025.acl-long.1172/
DOI:: 10.18653/v1/2025.acl-long.1172
Cite (ACL):: Bo Zeng, Chenyang Lyu, Sinuo Liu, Mingyan Zeng, Minghao Wu, Xuanfan Ni, Tianqi Shi, Yu Zhao, Yefeng Liu, Chenyu Zhu, Ruizhe Li, Jiahui Geng, Qing Li, Yu Tong, Longyue Wang, Weihua Luo, and Kaifu Zhang. 2025. Marco-Bench-MIF: On Multilingual Instruction-Following Capability of Large Language. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 24058–24072, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal):: Marco-Bench-MIF: On Multilingual Instruction-Following Capability of Large Language (Zeng et al., ACL 2025)
PDF:: https://aclanthology.org/2025.acl-long.1172.pdf