Multi-turn SFT issue Qwen3 #1398
friendshipity started this conversation in General
Replies: 1 comment 3 replies
-
For multi-turn conversations, the thinking content of previous turns should be removed, except for multi-step tool calls. The official chat template can do that automatically for you.
It would be better to reorganize that one multi-turn example into multiple examples and keep only the thinking block of the final turn. For example, [Q1, T1, A1, Q2, T2, A2, Q3, T3, A3] should be split into [Q1, T1, A1], [Q1, A1, Q2, T2, A2], and [Q1, A1, Q2, A2, Q3, T3, A3]. A sketch of this split is shown below.
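Here is a minimal sketch of that split, assuming assistant messages carry Qwen3-style reasoning wrapped in `<think>...</think>` tags inside their content; `strip_think` and `split_multi_turn` are hypothetical helper names, and the tool-call exception mentioned above is left out for brevity:

```python
import re
from typing import Dict, List

# Matches a <think>...</think> block (including newlines) plus trailing whitespace.
THINK_RE = re.compile(r"<think>.*?</think>\s*", re.DOTALL)

def strip_think(text: str) -> str:
    """Remove the reasoning block from an assistant message, keeping the answer."""
    return THINK_RE.sub("", text).lstrip()

def split_multi_turn(messages: List[Dict[str, str]]) -> List[List[Dict[str, str]]]:
    """Split one multi-turn conversation into one training example per assistant turn.

    History assistant turns keep only their final answers (thinking removed),
    while the target turn keeps its full <think> block.
    """
    examples = []
    for i, msg in enumerate(messages):
        if msg["role"] != "assistant":
            continue
        history = [
            {**m, "content": strip_think(m["content"])} if m["role"] == "assistant" else m
            for m in messages[:i]
        ]
        examples.append(history + [msg])  # thinking survives only in the last turn
    return examples

# [Q1, T1, A1, Q2, T2, A2] -> two examples:
conversation = [
    {"role": "user", "content": "Q1"},
    {"role": "assistant", "content": "<think>T1</think>A1"},
    {"role": "user", "content": "Q2"},
    {"role": "assistant", "content": "<think>T2</think>A2"},
]
for example in split_multi_turn(conversation):
    print([m["content"] for m in example])
# ['Q1', '<think>T1</think>A1']
# ['Q1', 'A1', 'Q2', '<think>T2</think>A2']
```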
-
In multi-turn SFT with Qwen3, there are multiple "think blocks" across turns, and the loss is computed only on the assistant's replies. This means that all "think blocks" from previous turns are included in the context when the loss for each assistant reply is computed. Given that these think blocks are quite long in my dataset, does it make sense to split the multi-turn text and compute the loss for each turn separately (with no history think blocks in the context), in order to avoid an excessively long token context during training?
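For what it's worth, here is a minimal sketch of that per-turn variant, assuming a Hugging Face tokenizer with Qwen3's chat template; `build_single_turn_example` is a hypothetical helper, and it relies on the approximation that the rendered history plus generation prompt is an exact token prefix of the rendered full conversation, which should be verified for the template in use:

```python
from transformers import AutoTokenizer

IGNORE_INDEX = -100  # tokens with this label are ignored by the cross-entropy loss

def build_single_turn_example(tokenizer, messages):
    """Tokenize one split-out example and train only on its final assistant turn.

    `messages` is expected to end with the assistant turn that keeps its
    <think> block; all earlier turns serve purely as context.
    """
    full_ids = tokenizer.apply_chat_template(messages, tokenize=True)
    prefix_ids = tokenizer.apply_chat_template(
        messages[:-1], tokenize=True, add_generation_prompt=True
    )
    # Mask the context so only the final assistant reply contributes to the loss.
    labels = [IGNORE_INDEX] * len(prefix_ids) + full_ids[len(prefix_ids):]
    return {"input_ids": full_ids, "labels": labels}

# Usage with the split produced above:
# tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-8B")
# batch = [build_single_turn_example(tokenizer, ex) for ex in split_multi_turn(conversation)]
```

Since each split-out example then carries only one think block, the context length grows with the number of previous answers rather than with all accumulated reasoning.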