Respectfully asking: maximum images supported in multi-image understanding? #514

@ThomaswellY

Hello Qwen-VL-Chat Team!

I am currently exploring multi-image understanding tasks with Qwen-VL-Chat. I understand that the model already supports two-image comparison questions, for example:

    query = tokenizer.from_list_format([
        {'image': 'assets/mm_tutorial/Chongqing.jpeg'},
        {'image': 'assets/mm_tutorial/Beijing.jpeg'},
        # Prompt in English: "Which two cities are shown in the two pictures above? Please compare them."
        {'text': '上面两张图片分别是哪两个城市?请对它们进行对比。'},
    ])

This functionality is extremely useful.

I have tried adding more images (up to 10) in a similar format, but I observed that the model’s answers became repetitive and out of order. This led me to wonder:

  1. What is the maximum number of images that the model can accept in a single input?
  2. If more images are provided than this limit, how does the model behave (e.g., error, truncation, unexpected output)?
  3. Are there any recommended or more “compliant” ways to format multi-image inputs to ensure reliable responses, especially when the number of images exceeds two?
  4. If multi-round calls to model.chat are necessary to handle many images, do you have any suggestions or best practices to reduce time consumption during these calls?
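To make question 4 concrete, here is a minimal sketch of the chunked approach I have in mind. It only prepares the inputs: it splits a list of image paths into groups and builds one list-of-dicts per group, in the same shape passed to tokenizer.from_list_format above. The helper names (chunk_images, build_items), the file names, and the group size of 4 are all hypothetical placeholders; 4 is not a known model limit.

```python
from typing import Dict, Iterator, List


def chunk_images(paths: List[str], max_per_call: int) -> Iterator[List[str]]:
    """Yield consecutive groups of at most max_per_call image paths."""
    if max_per_call < 1:
        raise ValueError("max_per_call must be at least 1")
    for start in range(0, len(paths), max_per_call):
        yield paths[start:start + max_per_call]


def build_items(paths: List[str], question: str) -> List[Dict[str, str]]:
    """Build the list-of-dicts input: one 'image' entry per path, then the 'text' entry."""
    items = [{"image": p} for p in paths]
    items.append({"text": question})
    return items


# Hypothetical file names, for illustration only.
image_paths = [f"images/img_{i}.jpeg" for i in range(10)]

# max_per_call=4 is an arbitrary assumption, not a documented limit.
batches = [
    build_items(group, "Describe each picture briefly.")
    for group in chunk_images(image_paths, max_per_call=4)
]

# Each batch would then be passed to tokenizer.from_list_format(...) and
# model.chat(...), one call per batch (not executed in this sketch).
for batch in batches:
    print(len(batch) - 1, "images:", [d["image"] for d in batch[:-1]])
```

With 10 images and a group size of 4, this produces three batches, so three separate calls to the model. That per-batch cost is what I would like to reduce.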

Your guidance would be greatly appreciated, as it would help me design experiments effectively and ensure that I use the model in the best way possible.

Thank you very much for your time and support!

Best regards~
