@abhipatel12
Collaborator

TLDR

Fixes #10728.

This PR modifies the fallback mechanism in WebFetchTool to respect the Content-Type HTTP header of the fetched resource. Previously, html-to-text conversion was applied unconditionally to all responses in the fallback path, which corrupted non-HTML data such as JSON APIs or raw source code files.

Now, html-to-text conversion is only applied if the Content-Type indicates HTML or is missing. For all other content types, the raw text content is returned.
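The decision described above can be sketched as follows. This is a minimal illustration, not the code from the PR: `shouldConvertAsHtml`, `fallbackText`, and the injected `convert` callback are hypothetical names, and the converter is passed in so the sketch does not depend on the `html-to-text` package.

```typescript
// Hypothetical helper names; the PR's real logic lives in executeFallback.
type HtmlConverter = (html: string) => string;

// Decide whether the fallback should run HTML-to-text conversion.
function shouldConvertAsHtml(contentType: string | null): boolean {
  // A missing or empty header keeps the previous behavior: assume HTML.
  if (!contentType) {
    return true;
  }
  // Matches values such as "text/html; charset=utf-8".
  return contentType.includes('text/html');
}

// Return converted text for HTML responses and the raw body otherwise.
function fallbackText(
  rawBody: string,
  contentType: string | null,
  convert: HtmlConverter,
): string {
  return shouldConvertAsHtml(contentType) ? convert(rawBody) : rawBody;
}
```

With this shape, a JSON or plain-text body passes through untouched, while HTML and header-less responses still go through the converter.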

Dive Deeper

  • Logic Change (packages/core/src/tools/web-fetch.ts):
    • Updated WebFetchToolInvocation.executeFallback to retrieve the content-type header from the fetchWithTimeout response.
    • Implemented a check for whether contentType.includes('text/html') is true or contentType is an empty string. Only in these cases is html-to-text.convert() called.
    • Otherwise (e.g., application/json, text/plain), response.text() is used raw.
  • Tests (packages/core/src/tools/web-fetch.test.ts):
    • Added a new describe('execute (fallback)', ...) test block.
    • Mocked the primary Gemini-based fetch to fail (returning empty candidates) to force execution into the executeFallback method.
    • Added specific test cases for:
      • text/html (verified conversion happens).
      • application/json (verified raw text is used).
      • text/plain (verified raw text is used).
      • Missing Content-Type header (verified conversion happens, preserving existing behavior).
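The test strategy above (stub the primary path so it yields no candidates, then check each content type) can be sketched as a table of cases. Everything here is a hypothetical stand-in rather than the repository's test code: the names are invented for illustration, and the functions are synchronous to keep the sketch short, whereas the real fetch and fallback methods are asynchronous.

```typescript
// All names below are hypothetical stand-ins, not the repository's test helpers.
interface StubResponse {
  contentType: string | null; // null models a missing Content-Type header
  body: string;
}

interface FetchDeps {
  primaryCandidates: () => unknown[]; // stands in for the Gemini-based fetch
  fetchRaw: () => StubResponse; // stands in for fetchWithTimeout
  convertHtml: (html: string) => string; // stands in for html-to-text conversion
}

function fetchWithFallback(deps: FetchDeps): string {
  const candidates = deps.primaryCandidates();
  if (candidates.length > 0) {
    return String(candidates[0]);
  }
  // Empty candidates: fall through to the content-type aware fallback.
  const { contentType, body } = deps.fetchRaw();
  const isHtml = !contentType || contentType.includes('text/html');
  return isHtml ? deps.convertHtml(body) : body;
}

interface FallbackCase {
  name: string;
  response: StubResponse;
  expectConverted: boolean;
}

const fallbackCases: FallbackCase[] = [
  { name: 'text/html', response: { contentType: 'text/html; charset=utf-8', body: '<p>hi</p>' }, expectConverted: true },
  { name: 'application/json', response: { contentType: 'application/json', body: '{"a":"<b>"}' }, expectConverted: false },
  { name: 'text/plain', response: { contentType: 'text/plain', body: 'a < b' }, expectConverted: false },
  { name: 'missing header', response: { contentType: null, body: '<p>hi</p>' }, expectConverted: true },
];

// Run every case and return the names of the cases that failed.
function runFallbackCases(): string[] {
  const failures: string[] = [];
  for (const testCase of fallbackCases) {
    let converterCalls = 0;
    const output = fetchWithFallback({
      primaryCandidates: () => [], // force the fallback path
      fetchRaw: () => testCase.response,
      convertHtml: (html) => {
        converterCalls += 1;
        return `converted:${html}`;
      },
    });
    const converted = converterCalls > 0;
    // When no conversion is expected, the raw body must come back unchanged.
    const rawPreserved = converted || output === testCase.response.body;
    if (converted !== testCase.expectConverted || !rawPreserved) {
      failures.push(testCase.name);
    }
  }
  return failures;
}
```

Counting converter calls, rather than inspecting the converted output, keeps each case independent of how the real converter formats text.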

Reviewer Test Plan

Web fetch should still operate normally. Prompt the agent in natural language to use the web fetch tool against sites that serve different content types (for example, an HTML page, a JSON API, and a raw source file).

Testing Matrix

|          | 🍏 | 🪟 | 🐧 |
| -------- | --- | --- | --- |
| npm run  |     |     |     |
| npx      |     |     |     |
| Docker   |     |     |     |
| Podman   |     | -   | -   |
| Seatbelt |     | -   | -   |

Linked issues / bugs

@abhipatel12 abhipatel12 requested a review from a team as a code owner October 16, 2025 18:02
@gemini-code-assist
Contributor

Summary of Changes

Hello @abhipatel12, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request resolves an issue in the WebFetchTool where its fallback mechanism would indiscriminately apply HTML-to-text conversion to all fetched content, leading to data corruption for non-HTML resources like JSON APIs or raw text files. The changes introduce intelligent content type detection, ensuring that html-to-text conversion is only performed when appropriate (for HTML or unspecified content types), while other content types are preserved as raw text. This improves the robustness and accuracy of the web fetching tool.

Highlights

  • Content-Type Header Respect: The WebFetchTool's fallback mechanism now correctly respects the Content-Type HTTP header of fetched resources, preventing unintended conversion of non-HTML content.
  • Conditional HTML Conversion: The html-to-text conversion is now only applied if the Content-Type indicates HTML (e.g., text/html) or if the header is missing (assuming HTML by default).
  • Raw Text for Other Types: For all other content types, such as application/json or text/plain, the raw text content of the response is returned without HTML conversion.
  • Expanded Test Coverage: New tests have been added to specifically cover the fallback mechanism's behavior for various content types, including HTML, JSON, plain text, and cases with missing Content-Type headers.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request correctly modifies the WebFetchTool's fallback mechanism to respect the Content-Type header, preventing the corruption of non-HTML data. The added tests effectively validate this new behavior. I've identified one high-severity issue: the Content-Type check is case-sensitive, which violates the HTTP specification. I've provided a suggestion to make the check case-insensitive to ensure robustness.
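The suggested patch itself is not shown on this page, so the following is only one possible form of a case-insensitive check. Media type names are case-insensitive in HTTP, so a server may legitimately send `TEXT/HTML`. The function name `isHtmlContentTypeCaseInsensitive` is hypothetical.

```typescript
// Hypothetical helper; one way to make the Content-Type check case-insensitive.
function isHtmlContentTypeCaseInsensitive(contentType: string | null): boolean {
  // Missing or empty header: keep the existing behavior and assume HTML.
  if (!contentType) {
    return true;
  }
  // Normalize case before matching, so "Text/HTML; charset=UTF-8" also matches.
  return contentType.toLowerCase().includes('text/html');
}
```

Lowercasing the header value once before the `includes` check leaves the rest of the fallback logic unchanged.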

@github-actions

github-actions bot commented Oct 16, 2025

Size Change: +314 B (0%)

Total Size: 17.8 MB

| Filename | Size | Change |
| --- | --- | --- |
| ./bundle/gemini.js | 17.8 MB | +314 B (0%) |
| ./bundle/sandbox-macos-permissive-closed.sb | 1.03 kB | 0 B |
| ./bundle/sandbox-macos-permissive-open.sb | 830 B | 0 B |
| ./bundle/sandbox-macos-permissive-proxied.sb | 1.31 kB | 0 B |
| ./bundle/sandbox-macos-restrictive-closed.sb | 3.29 kB | 0 B |
| ./bundle/sandbox-macos-restrictive-open.sb | 3.36 kB | 0 B |
| ./bundle/sandbox-macos-restrictive-proxied.sb | 3.56 kB | 0 B |

compressed-size-action

Member

@richieforeman richieforeman left a comment

LG

@abhipatel12 abhipatel12 enabled auto-merge October 16, 2025 18:13
@abhipatel12 abhipatel12 added this pull request to the merge queue Oct 16, 2025
Merged via the queue into main with commit 05930d5 Oct 16, 2025
20 checks passed
@abhipatel12 abhipatel12 deleted the abhipatel12/web-fetch-content-fallback branch October 16, 2025 18:27
thacio added a commit to thacio/auditaria that referenced this pull request Oct 17, 2025
Millsondylan pushed a commit to Millsondylan/gemini-cli-1 that referenced this pull request Oct 19, 2025
Development

Successfully merging this pull request may close these issues.

Web Fetch Improvements - Content-Type Aware Fallback
