The report's tests list GPU memory usage of 20 GB, but in practice a 24 GB GPU runs out of memory

pipe = CogView4Pipeline.from_pretrained("THUDM/CogView4-6B", torch_dtype=torch.bfloat16).to("cuda")
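With .to("cuda") added as in the line above, a 4090 GPU with 24 GB of memory runs out of GPU memory. The only difference between the Hugging Face demo and the GitHub demo code is whether this call is present. Which one should be treated as authoritative, and what is the difference between the two?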
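
For reference, the two variants being compared can be sketched roughly as below. This is only a sketch under assumptions: it assumes the demo without .to("cuda") relies on diffusers' enable_model_cpu_offload() together with VAE slicing and tiling, and the parameter counts used in the estimate (about 6B for the transformer, about 9B for the text encoder) are rough guesses rather than measured values. The helper names bf16_weights_gib and load_pipeline are made up for illustration.

```python
def bf16_weights_gib(num_params: float) -> float:
    """Approximate size of bf16 weights (2 bytes per parameter) in GiB."""
    return num_params * 2 / 1024**3


def load_pipeline(offload: bool = True):
    """Load CogView4 either with CPU offload or fully resident on the GPU."""
    # Imported lazily so the estimate below runs without a GPU stack installed.
    import torch
    from diffusers import CogView4Pipeline

    pipe = CogView4Pipeline.from_pretrained(
        "THUDM/CogView4-6B", torch_dtype=torch.bfloat16
    )
    if offload:
        # Assumed low-memory variant: sub-models are moved to the GPU only
        # while they are in use, so .to("cuda") is not called here.
        pipe.enable_model_cpu_offload()
        pipe.vae.enable_slicing()
        pipe.vae.enable_tiling()
    else:
        # Variant from this report: every sub-model is moved to the GPU at once.
        pipe.to("cuda")
    return pipe


# Assumed parameter counts (not measured): ~6B transformer, ~9B text encoder.
total = bf16_weights_gib(6e9) + bf16_weights_gib(9e9)
print(f"Estimated bf16 weights if everything sits on the GPU: {total:.1f} GiB")
```

If those assumptions hold, the weights alone would already exceed 24 GB once everything is moved to the GPU, which would be consistent with the overflow seen with .to("cuda"), and the 20 GB figure would correspond to the offloaded variant.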