feat: Add image analysis support for Gemini models #1
This commit introduces image analysis capabilities to `consult7`, enabling it to process and send image files to compatible multimodal models, initially targeting Google Gemini.

Key Features & Changes:
Multimodal Content Handling:
- `--include-images`: command-line flag that enables the processing of image files.
- `file_processor.py` now differentiates between text and image files based on common extensions (PNG, JPG, GIF, WebP, BMP, SVG).
- `format_content` now structures image parts as `{"inline_data": {"mime_type": ..., "data": ...}}` to comply with Google Gemini API expectations.

Provider-Specific Logic:
- `supports_images`: flag in the `model_info` dictionary (set in `consultation.py`'s `get_model_context_info`) that determines whether a model/provider combination can handle multimodal input.
- `consultation_impl` uses this flag along with the `--include-images` CLI flag to decide whether to send structured multimodal `content_parts` or a concatenated text string to the provider.
- `GoogleProvider` (`providers/google.py`) was updated to:
  - Accept `List[Dict[str, Any]]` (multimodal parts) as input.
  - Build the `contents` list for the Gemini API, including properly formatted `inline_data` parts for images.
  - Pass `config=` instead of `generation_config=` in the `generate_content` API call.

Bug Fixes & Robustness:
- `consultation_impl` argument mismatches were addressed by consistently using keyword arguments for optional and server-provided parameters in `server.py`.
- `list_tools` in `server.py`: the final step of restoring `list_tools` was deferred after confirming the core vision functionality.

Token Handling & Utilities:
- `estimate_image_tokens` added to `token_utils.py`.

Documentation:
- `README.md` updated to include the `--include-images` flag, image analysis capabilities for Gemini, supported formats, token usage, and example use cases.

This series of changes allows `consult7` to leverage Gemini's vision capabilities for tasks involving image analysis alongside text or code, while maintaining compatibility with existing text-only providers.
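
For reviewers, the flow described above (classify files by extension, wrap images as `inline_data` parts, and fall back to concatenated text when images are not enabled or not supported) might look roughly like the sketch below. It is not the code from this PR: the helper names `is_image_file`, `make_image_part`, and `build_payload` are hypothetical, the `.jpeg` extension and the base64 encoding of `data` are assumptions, and skipping images in the text-only fallback is an illustrative choice.

```python
import base64
from pathlib import Path
from typing import Any, Dict, List, Union

# Extension -> MIME type for the formats named in the description.
# ".jpeg" is an assumption added alongside ".jpg".
IMAGE_MIME_TYPES = {
    ".png": "image/png",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".gif": "image/gif",
    ".webp": "image/webp",
    ".bmp": "image/bmp",
    ".svg": "image/svg+xml",
}


def is_image_file(name: str) -> bool:
    """Classify a file as an image purely by its extension (case-insensitive)."""
    return Path(name).suffix.lower() in IMAGE_MIME_TYPES


def make_image_part(name: str, data: bytes) -> Dict[str, Any]:
    """Wrap raw image bytes in an inline_data part (base64 encoding is assumed)."""
    mime_type = IMAGE_MIME_TYPES[Path(name).suffix.lower()]
    encoded = base64.b64encode(data).decode("ascii")
    return {"inline_data": {"mime_type": mime_type, "data": encoded}}


def build_payload(
    files: Dict[str, bytes],
    supports_images: bool,
    include_images: bool,
) -> Union[str, List[Dict[str, Any]]]:
    """Return multimodal parts only if the model supports images AND the flag is set.

    Otherwise return one concatenated text string; images are skipped there.
    """
    send_images = supports_images and include_images
    parts: List[Dict[str, Any]] = []
    texts: List[str] = []
    for name, data in files.items():
        if is_image_file(name):
            if send_images:
                parts.append(make_image_part(name, data))
            continue
        text = f"File: {name}\n{data.decode('utf-8', errors='replace')}"
        texts.append(text)
        parts.append({"text": text})
    return parts if send_images else "\n\n".join(texts)


if __name__ == "__main__":
    demo = {"main.py": b"print('hi')\n", "diagram.png": b"\x89PNG"}
    print(type(build_payload(demo, supports_images=True, include_images=True)).__name__)
    print(type(build_payload(demo, supports_images=True, include_images=False)).__name__)
```

Keeping the decision in one place (`supports_images and include_images`) means text-only providers receive a plain string whether or not the flag is passed.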