[BUG]: Multi-modal Content Array Not Supported in Database Storage #4088

@bernarduswillson

Description

How are you running AnythingLLM?

Docker (remote machine)

What happened?

AnythingLLM successfully processes multi-modal prompts (text + images) and generates responses, but it fails when saving the conversation to the database: the Prisma schema expects a string for the `prompt` field, and the request supplies an array of content objects instead.

Are there known steps to reproduce?

  1. Send a request to AnythingLLM with multi-modal content format:
{
  messages: [
    {
      role: "user",
      content: [
        {
          type: "text", 
          text: "Please evaluate these image entries\n\n ID: 1\nCaption: \"This is a test caption\""
        },
        {
          type: "image_url",
          image_url: {
            url: "data:image/jpeg;base64,{base64_image}",
            detail: "low"
          }
        }
      ]
    }
  ]
}
  2. Observe that the LLM processes the request successfully and logs a valid response
  3. Check logs for database errors during conversation saving
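
For anyone trying to reproduce this, the request body from step 1 can be assembled with a small helper. This is only a sketch: `buildMultiModalRequest` and its parameter names are hypothetical and not part of AnythingLLM, and the endpoint the body is sent to is not specified here.

```javascript
// Hypothetical helper (not part of AnythingLLM): builds the request body
// from step 1 out of a text prompt and a base64-encoded image.
function buildMultiModalRequest(text, base64Image, mimeType = "image/jpeg", detail = "low") {
  return {
    messages: [
      {
        role: "user",
        // Content is an array of parts rather than a plain string;
        // this array is what later reaches the `prompt` field.
        content: [
          { type: "text", text },
          {
            type: "image_url",
            image_url: {
              url: `data:${mimeType};base64,${base64Image}`,
              detail,
            },
          },
        ],
      },
    ],
  };
}
```

Calling `buildMultiModalRequest("Please evaluate these image entries", "<base64 data>")` yields a body with the same shape as the one in step 1.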

Error logs:

Invalid `prisma.workspace_chats.create()` invocation:
{
  data: {
    workspaceId: 13,
    prompt: [
      {
        type: "text",
        text: "Please evaluate these image entries\n\n ID: 1\nCaption: \"This is a test caption\""
      },
      {
        type: "image_url",
        image_url: {
          url: "data:image/jpeg;base64,/9j/2wBDA..."
        }
      }
    ],
    ...
  }
}

Argument `prompt`: Invalid value provided. Expected String, provided (Object, Object).
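
One possible workaround, sketched below, is to coerce the message content to a string before it reaches `prisma.workspace_chats.create()`. The helper name `promptToString` is hypothetical, and this is an assumption about where a fix could go, not the maintainers' actual approach. The sketch keeps only the text parts and drops image parts, so base64 image data is not written to the `prompt` column.

```javascript
// Hypothetical helper (not part of AnythingLLM): turns message content
// into a plain string that a String column can store.
function promptToString(content) {
  // Missing content is stored as an empty string.
  if (content == null) return "";

  // Plain-string prompts pass through unchanged.
  if (typeof content === "string") return content;

  // Anything that is neither a string nor an array is serialized as JSON.
  if (!Array.isArray(content)) return JSON.stringify(content);

  // Multi-modal content: keep the text parts, skip image_url parts.
  return content
    .filter((part) => part && part.type === "text" && typeof part.text === "string")
    .map((part) => part.text)
    .join("\n");
}
```

If the image parts need to be preserved with the chat history, serializing the whole array with `JSON.stringify(content)` would also satisfy the String type, at the cost of storing the full base64 payload in every row.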

LLM response logs:

{
  "text": "```json\n{\n  \"results\": [\n    {\n      \"id\": \"1\",\n      \"is_image_valid\": false,\n      \"is_caption_valid\": false,\n      \"explanation\": \"The image appears to be a screenshot of an app interface, not suitable for a daycare setting. The caption 'This is a test caption' is not meaningful or descriptive of the image content.\"\n    }\n  ]\n}\n```",
  "sources": [],
  "type": "chat",
  "metrics": {
    "prompt_tokens": 696,
    "completion_tokens": 84,
    "total_tokens": 780,
    "outputTps": 30.08595988538682,
    "duration": 2.792
  }
}
  • LLM successfully processes the multi-modal prompt
  • Valid response is generated (as seen in logs)
  • Database save operation fails with Prisma validation error
  • Error: Argument 'prompt': Invalid value provided. Expected String, provided (Object, Object).

Labels

  • investigating: Core team or maintainer will or is currently looking into this issue
  • possible bug: Bug was reported but is not confirmed or is unable to be replicated.
