
Conversation

@tristan-stahnke-GPS (Contributor) commented Apr 24, 2025

Pull Request Type

  • 🐛 fix
  • ♻️ refactor

What is in this change?

This PR significantly refactors the AWSBedrockLLM provider (server/utils/AiProviders/bedrock/index.js) to:

  • Enable Multi-modal Input: Correctly implement support for sending image attachments alongside text prompts to compatible Bedrock models using the Converse API.
  • Fix Max Tokens Error: Address the ValidationException caused by incorrectly using the total context window limit for the maximum output tokens parameter in API calls.
  • Improve Code Quality: Enhance clarity, maintainability, and error handling.

Additional Information

The previous version of the Bedrock provider had two major issues:

No Multi-modal Support: The #generateContent function responsible for handling message attachments was unimplemented or commented out. This prevented users from sending images to multi-modal Bedrock models (like Claude 3).

Incorrect maxTokens Usage: The promptWindowLimit() function (reading AWS_BEDROCK_LLM_MODEL_TOKEN_LIMIT) was used to set the inferenceConfig.maxTokens parameter in API calls. This value represents the total context window, which is often much larger than the maximum number of tokens a model can generate in a single output. This led to ValidationException errors when the configured context window exceeded the model's specific output limit (e.g., sending 131k when the model's output limit was 8192).

Detailed Changes:

Multi-modal Input Implementation (#generateContent & Helpers)
Implemented #generateContent:

  • This private method now correctly processes both userPrompt (text) and attachments (images).

Added getImageFormatFromMime Helper:

  • Parses MIME types (e.g., image/png) from attachments.
  • Extracts the format (png).
  • Normalizes jpg to jpeg.
  • Validates the format against SUPPORTED_BEDROCK_IMAGE_FORMATS (jpeg, png, gif, webp).
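A minimal sketch of that helper, assuming the names from this description (the merged code may differ in detail):

```javascript
// Supported formats per the Bedrock Converse API image block spec.
const SUPPORTED_BEDROCK_IMAGE_FORMATS = ["jpeg", "png", "gif", "webp"];

/**
 * Extracts and normalizes an image format from a MIME type.
 * Returns null for unsupported formats so the caller can skip them.
 */
function getImageFormatFromMime(mime = "") {
  // "image/png" -> "png"
  const format = mime.toLowerCase().split("/")[1]?.trim();
  // Normalize "jpg" to "jpeg" as the Converse API expects.
  const normalized = format === "jpg" ? "jpeg" : format;
  return SUPPORTED_BEDROCK_IMAGE_FORMATS.includes(normalized)
    ? normalized
    : null;
}
```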

Added Base64 Data URI Stripping:

  • Logic within #generateContent now automatically removes the data:image/...;base64, prefix from attachment.contentString before decoding.

Added base64ToUint8Array Helper:

  • Decodes the pure base64 string into a Uint8Array using atob() and charCodeAt(). This approach was chosen to closely match the technique used in previous Langchain JS implementations.

Formatted Image Blocks:

  • #generateContent now constructs image blocks in the precise format required by the Bedrock Converse API: { image: { format: "...", source: { bytes: ... } } }.

Robustness:

  • Added checks for invalid attachment objects and handling for errors during base64 decoding: warnings/errors are logged and problematic attachments are skipped. Ensures the final content array sent to the API is never empty.
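A hedged sketch of the decode helper and the resulting image block shape (helper names follow this description; `toImageBlock` is a hypothetical wrapper for illustration):

```javascript
/**
 * Decodes a pure base64 string (no data-URI prefix) into a Uint8Array
 * using atob()/charCodeAt(), mirroring prior Langchain JS technique.
 * Requires Node 18+ or a browser for the global atob().
 */
function base64ToUint8Array(base64) {
  const binary = atob(base64);
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) bytes[i] = binary.charCodeAt(i);
  return bytes;
}

// Wraps decoded bytes in the block shape the Converse API expects.
function toImageBlock(format, base64) {
  return { image: { format, source: { bytes: base64ToUint8Array(base64) } } };
}
```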

Output Token Limit Correction (getMaxOutputTokens & API Calls)
Introduced DEFAULT_MAX_OUTPUT_TOKENS:

  • Defined a constant (4096) as a safe default for the maximum number of tokens to generate in a response.

Added getMaxOutputTokens Method:

  • Reads the optional AWS_BEDROCK_LLM_MAX_OUTPUT_TOKENS environment variable.
  • If the env var is set and valid, it uses that value.
  • Otherwise, it falls back to DEFAULT_MAX_OUTPUT_TOKENS.
  • This clearly separates the output limit from the total context window limit.
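The fallback logic above can be sketched like so (the env var name comes from the PR; the parsing details are assumptions):

```javascript
// Safe default for maximum tokens to generate in a single response.
const DEFAULT_MAX_OUTPUT_TOKENS = 4096;

/**
 * Returns the max output token count: the env var when set to a valid
 * positive number, otherwise the default. Deliberately independent of
 * AWS_BEDROCK_LLM_MODEL_TOKEN_LIMIT (the total context window).
 */
function getMaxOutputTokens() {
  const raw = Number(process.env.AWS_BEDROCK_LLM_MAX_OUTPUT_TOKENS);
  return Number.isFinite(raw) && raw > 0 ? raw : DEFAULT_MAX_OUTPUT_TOKENS;
}
```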

Updated API Calls (getChatCompletion, streamGetChatCompletion):

  • Modified the inferenceConfig in both ConverseCommand and ConverseStreamCommand to use the value returned by this.getMaxOutputTokens() for the maxTokens parameter.
  • This ensures the requested output length respects the specific model's generation capabilities, fixing the ValidationException.
  • Clarified promptWindowLimit: Renamed the static function back to promptWindowLimit (from the temporary getContextWindowLimit) for consistency. Added comments clarifying that this function reads AWS_BEDROCK_LLM_MODEL_TOKEN_LIMIT and represents the total context window, primarily used for calculating input limits (this.limits).

Prompt Construction (constructPrompt)

  • Now correctly utilizes the implemented #generateContent to format messages containing text and/or images for both chat history and the final user prompt.
  • Ensures system prompts (both real and simulated for noSystemPromptModels) are generated without attachments.
  • Improved handling of potentially missing/invalid attachments arrays in history messages.

API Call Structure

  • Correctly separates the system content block from the main messages array (containing only user/assistant turns) when calling ConverseCommand and ConverseStreamCommand, adhering to the API's expected structure.
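A minimal sketch of that separation, assuming a simple internal message shape (the function name and input format are illustrative):

```javascript
/**
 * Splits system-role entries out of a flat chat list, returning the
 * system content blocks and a messages array of user/assistant turns
 * only, matching the structure the Converse API expects.
 */
function splitSystemFromMessages(chatMessages) {
  const system = [];
  const messages = [];
  for (const msg of chatMessages) {
    if (msg.role === "system") system.push({ text: msg.content });
    else messages.push({ role: msg.role, content: [{ text: msg.content }] });
  }
  return { system, messages };
}
```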

Code Quality and Refinements

  • Added/improved JSDoc comments for all major functions and helpers, explaining parameters, return values, and logic.
  • Introduced constants for better maintainability (SUPPORTED_BEDROCK_IMAGE_FORMATS, DEFAULT_MAX_OUTPUT_TOKENS, DEFAULT_CONTEXT_WINDOW_TOKENS).
  • Refined error messages, particularly for the maxTokens validation error, providing more context to the user.
  • Improved validation of environment variables in the constructor.
  • Minor improvements to logging messages.
  • Added MODEL_MAP import (consistent with other providers, though not yet used for Bedrock limits).

Developer Validations

  • I ran yarn lint from the root of the repo & committed changes
  • Relevant documentation has been updated
  • I have tested my code functionality
  • Docker build succeeds locally

Tristan Stahnke added 4 commits April 24, 2025 12:59
…ropic Claude Sonnet models:

- Context Window defaults to an 8192 maximum, which isn't correct
- Multimodal stopped working when removing Langchain, which had been transparently converting image_url attachments into the format Sonnet expects.
@timothycarambat timothycarambat added the PR:needs review Needs review by core team label Apr 29, 2025
@timothycarambat timothycarambat merged commit b64a77f into Mintplex-Labs:master May 6, 2025
cabwds pushed a commit to cabwds/anything-llm that referenced this pull request Jul 3, 2025
… Limits (Mintplex-Labs#3714)

* Fixed two primary issues discovered while using AWS Bedrock with Anthropic Claude Sonnet models:
- Context Window defaults to an 8192 maximum, which isn't correct
- Multimodal stopped working when removing Langchain, which had been transparently converting image_url attachments into the format Sonnet expects.

* Ran `yarn lint`

* Updated .env.example to have aws bedrock examples too

* Refactor for readability
move utils for AWS specific functionality to subfile
add token output max to ENV so setting persists

---------

Co-authored-by: Tristan Stahnke <tristan.stahnke+gpsec@guidepointsecurity.com>
Co-authored-by: Timothy Carambat <rambat1010@gmail.com>
