I am trying to reproduce the quality of monoT5 on the BEIR benchmark from the recent article. But after running the script finetune_monot5.py for one epoch, as stated in the description of the checkpoint "monot5-base-msmarco-10k", my results are noticeably lower.
For example, on NQ my checkpoint gives 0.5596 nDCG@10, while the original checkpoint gives 0.5676. On NFCorpus: 0.3604 nDCG@10 with my checkpoint vs. 0.3778 with the original.
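For context, this is how I understand the metric being compared (nDCG@10, as reported by BEIR): the DCG of the produced ranking divided by the DCG of the ideal ranking. A minimal self-contained sketch, with hypothetical relevance labels just for illustration:

```python
import math

def ndcg_at_k(relevances, k=10):
    """nDCG@k: DCG of the given ranking divided by DCG of the ideal ranking."""
    def dcg(rels):
        # standard log2 discount: rank i (1-based) contributes rel / log2(i + 1)
        return sum(rel / math.log2(i + 2) for i, rel in enumerate(rels[:k]))
    ideal = sorted(relevances, reverse=True)
    return dcg(relevances) / dcg(ideal) if any(relevances) else 0.0

# hypothetical graded relevance labels of the top-6 reranked documents
print(round(ndcg_at_k([3, 2, 3, 0, 1, 2], k=10), 4))  # → 0.9608
```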
So, is one epoch of training monoT5 with the PyTorch script equivalent to one epoch of training with TF? And with which hyperparameters can I reproduce the performance of "monot5-base-msmarco-10k"?