Is the two-stage training in the code implemented by manually training twice, changing `prompt_format` between runs? I don't see the model being called twice in sequence, i.e., a first call in `QCM-R` format whose output rationale is saved and then folded into the `QCMR-A` input for answer inference.
Moreover, both the rationale R and the answer A already exist in the dataset, and both models start from the same initialization. In that case, what is the purpose of the first `QCM-R` training stage?
Or maybe I misunderstood something?
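To make the question concrete, here is a minimal sketch of the two-stage pipeline as I understand it. All names in it (`build_prompt`, `dummy_generate`, the sample fields) are hypothetical placeholders for illustration, not code from this repository:

```python
# Hypothetical sketch of the two-stage flow; every name here is a
# placeholder, not a function from this repo.

def build_prompt(sample, prompt_format, rationale=None):
    """Assemble the input from Question, Context, and Multiple options."""
    parts = [sample["question"], sample["context"], sample["options"]]
    if prompt_format == "QCMR-A" and rationale is not None:
        # Stage 2: fold the stage-1 rationale into the input.
        parts.append(rationale)
    return "\n".join(parts)

def dummy_generate(model_name, prompt):
    # Stand-in for real seq2seq generation, so the sketch runs end to end.
    return f"<{model_name} output for: {prompt[:20]}...>"

def two_stage_inference(sample):
    # Stage 1 (QCM -> R): a model trained to emit a rationale.
    rationale = dummy_generate("rationale_model",
                               build_prompt(sample, "QCM-R"))
    # Stage 2 (QCMR -> A): a model that consumes the *generated* rationale,
    # not the gold rationale from the dataset, and predicts the answer.
    return dummy_generate("answer_model",
                          build_prompt(sample, "QCMR-A", rationale=rationale))

sample = {"question": "Q?", "context": "C.", "options": "(A) ... (B) ..."}
print(two_stage_inference(sample))
```

The question, then, is where in the code this hand-off from the stage-1 output to the stage-2 input happens, if it happens at all.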