I am trying to reproduce the quality of monoT5 on the BEIR benchmark from the recent article. But after running the script finetune_monot5.py for one epoch, as stated in the description of the checkpoint "monot5-base-msmarco-10k", my results are noticeably lower.
For example, on NQ my checkpoint gives 0.5596 nDCG@10, while the original checkpoint gives 0.5676. On NFCorpus: 0.3604 nDCG@10 with my checkpoint vs. 0.3778 with the original.
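For context, this is how I understand the metric being compared (nDCG@10, as reported by BEIR): the DCG of the produced ranking divided by the DCG of the ideal ranking. A minimal self-contained sketch, with hypothetical relevance labels just for illustration:

```python
import math

def ndcg_at_k(relevances, k=10):
    """nDCG@k: DCG of the given ranking divided by DCG of the ideal ranking."""
    def dcg(rels):
        # standard log2 discount: rank i (1-based) contributes rel / log2(i + 1)
        return sum(rel / math.log2(i + 2) for i, rel in enumerate(rels[:k]))
    ideal = sorted(relevances, reverse=True)
    return dcg(relevances) / dcg(ideal) if any(relevances) else 0.0

# hypothetical graded relevance labels of the top-6 reranked documents
print(round(ndcg_at_k([3, 2, 3, 0, 1, 2], k=10), 4))  # → 0.9608
```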
So, is one epoch of training monoT5 with the PyTorch script equivalent to one epoch of training with TF? And with which hyperparameters can I reproduce the performance of "monot5-base-msmarco-10k"?