Thanks for the awesome work. While reading the source code, I had the following question: when calculating the tool reward and the format reward, why does the code check whether visual tokens exist in `predict_str`? My understanding is that `predict_str` holds the content generated by the model, so shouldn't it check for `<tool>` instead? https://github.com/Visual-Agent/DeepEyes/blob/main/verl/utils/reward_score/vl_agent.py#L198-L201
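
To make the question concrete, here is a minimal sketch of the two checks I am comparing. The marker strings, function names, and sample strings below are placeholders I made up for illustration; they are not the actual code in `vl_agent.py`, and the real token values may differ.

```python
# Hypothetical marker strings, chosen only for illustration. The real
# values are whatever vl_agent.py and the tokenizer actually use.
VISUAL_TOKEN = "<|vision_start|>"
TOOL_TAG = "<tool>"


def has_visual_token(predict_str: str) -> bool:
    """Check A: a visual token appears somewhere in predict_str."""
    return VISUAL_TOKEN in predict_str


def has_tool_tag(predict_str: str) -> bool:
    """Check B: a tool tag appears somewhere in predict_str."""
    return TOOL_TAG in predict_str


# Made-up response containing a tool tag but no visual token.
only_tool = "<think>zoom in</think><tool>crop</tool>"

# Made-up string where a visual token follows the tool tag.
with_image = only_tool + "<|vision_start|><|image_pad|><|vision_end|>"

for name, text in [("only_tool", only_tool), ("with_image", with_image)]:
    print(name, has_visual_token(text), has_tool_tag(text))
```

The two checks only disagree when `predict_str` contains a tool tag without a visual token (or the reverse), which is exactly the case I am asking about.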