I see in your conversion notebook that you suggest the number of tokens per batch should be the same as RoBERTa: 2^18 ≈ 262k. But when I look at the RoBERTa paper, it says it uses a sequence length of 512 and a batch size of 8k sequences, which means each batch has 512 * 8k ≈ 4M tokens. Am I missing something?
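For reference, here is the arithmetic I'm doing (a quick sketch; the 512 sequence length and "8k" batch size are my reading of the RoBERTa paper, with 8k interpreted as 8192 sequences):

```python
# Tokens per batch suggested in the conversion notebook
notebook_tokens_per_batch = 2 ** 18  # 262,144 ≈ 262k tokens

# Tokens per batch as I read the RoBERTa paper: 8k sequences of length 512
seq_len = 512
batch_size = 8192  # "8k" sequences (assuming 8k means 2^13)
roberta_tokens_per_batch = seq_len * batch_size  # 4,194,304 ≈ 4M tokens

print(notebook_tokens_per_batch)                              # 262144
print(roberta_tokens_per_batch)                               # 4194304
print(roberta_tokens_per_batch // notebook_tokens_per_batch)  # 16
```

So as far as I can tell, the paper's batches are about 16x larger in token count than the notebook's suggestion.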