Confused about the calculation of the tool reward #96

Thanks for the awesome work. 

While reading the source code, I had the following question:

When calculating the tool reward and the format reward, why does the code check whether visual tokens exist in `predict_str`?

My understanding is that `predict_str` holds the content generated by the model, so shouldn't the check look for `<tool>` instead?

https://github.com/Visual-Agent/DeepEyes/blob/main/verl/utils/reward_score/vl_agent.py#L198-L201
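
To make the question concrete, here is a minimal sketch of the two checks being contrasted. It is not copied from `vl_agent.py`: the marker strings `VISUAL_TOKEN` and `TOOL_TAG` and both function names are hypothetical placeholders, and the sample string is made up.

```python
# Hypothetical marker strings, for illustration only (not taken from vl_agent.py).
VISUAL_TOKEN = "<|vision_start|>"
TOOL_TAG = "<tool>"


def has_visual_tokens(predict_str: str) -> bool:
    """The kind of check the linked lines appear to perform."""
    return VISUAL_TOKEN in predict_str


def has_tool_tag(predict_str: str) -> bool:
    """The check I expected instead."""
    return TOOL_TAG in predict_str


# A made-up response in which the model emits a tool tag but no visual tokens.
sample = "<think>zoom in on the sign</think><tool>crop</tool>"
print(has_visual_tokens(sample), has_tool_tag(sample))
```

If `predict_str` contains only model-generated text, the two checks disagree on a string like `sample`. They would agree only if `predict_str` also included the visual tokens inserted after a tool call returns, which is the part I am unsure about.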
