Details about the ablation studies

Hi! Could I ask for some details about the 'RL w. Text-only CoT' setting in Table 5 of your paper?
In this setting, did you ask the model to also output the box coordinates, but without feeding the cropped image back into the LLM? Or did the model just output its thinking process, as in traditional CoT?
Looking forward to your reply. Thanks!