@angelplusultra commented Oct 13, 2025

Pull Request Type

  • ✨ feat
  • 🐛 fix
  • ♻️ refactor
  • 💄 style
  • 🔨 chore
  • 📝 docs

Relevant Issues

resolves #4508

What is in this change?

This PR introduces two main changes:

  1. YouTube Transcript Support: adds the ability to pull YouTube video transcripts using the `scrapeGenericUrl` function.
  2. Improved Introspection Logging: refactors the scrape function in the web_scraper agent tool so that the introspection logs state exactly what the scraper is doing for a given resource.

What These Changes Enable

YouTube Transcript Support

With these updates, an LLM invoked with @agent can automatically fetch a YouTube transcript through the web_scraper tool when it is given a video URL.

Example usage:

@agent Please summarize this video https://www.youtube.com/watch?v=B_H1DxOI6Xs

Additionally, users can now pass a YouTube video URL directly into the URL input field within the RAG document modal to create a document from that video, effectively bypassing the need for the dedicated YouTube data connector.

Improved Introspection Logging

When @agent calls the web_scraper tool with a URL, the tool first determines what kind of resource it is by analyzing the URL itself and making a HEAD request to retrieve its Content-Type header. Based on this information, the introspection logs tell the user which of the following the tool will do:

  • Pull the transcript and metadata for the YouTube video (if the user provides a YouTube video URL)
  • Read the content of the file (if the user provides a URL that responds with a non-HTML content type)
  • Scrape the content of the web page (if the user provides a URL that responds with HTML)
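The three-way decision above can be sketched as follows. This is a hypothetical illustration, not the PR's code: the names `chooseProcessVia`, `fetchContentType`, `looksLikeYouTubeVideo`, `describeScrape`, and the `ProcessVia` values are invented here, the log strings only paraphrase the bullets above, and treating a missing Content-Type header as HTML is an assumption. The YouTube check is deliberately minimal (watch URLs and youtu.be links only).

```typescript
// Hypothetical sketch of the routing described above; names and log text are illustrative.
type ProcessVia = "youtube-transcript" | "file" | "webpage";

// Deliberately minimal YouTube check (assumption): watch URLs and youtu.be links only.
function looksLikeYouTubeVideo(url: URL): boolean {
  const host = url.hostname.replace(/^(www|m)\./, "");
  if (host === "youtu.be") return url.pathname.length > 1;
  return host === "youtube.com" && url.pathname === "/watch" && url.searchParams.has("v");
}

// HEAD request: fetch only the headers and return the bare MIME type, e.g. "text/html".
async function fetchContentType(rawUrl: string): Promise<string | null> {
  try {
    const res = await fetch(rawUrl, { method: "HEAD", redirect: "follow" });
    const header = res.headers.get("content-type");
    return header ? header.split(";")[0].trim().toLowerCase() : null;
  } catch {
    return null; // network failure: the caller falls back to scraping as a web page
  }
}

// Pure routing step: check the URL shape first, then the Content-Type header.
function chooseProcessVia(rawUrl: string, contentType: string | null): ProcessVia {
  if (looksLikeYouTubeVideo(new URL(rawUrl))) return "youtube-transcript";
  // Assumption: a missing Content-Type is treated as HTML rather than as a file.
  if (contentType !== null && contentType !== "text/html") return "file";
  return "webpage";
}

// Introspection messages paraphrasing the three cases in the PR description.
const INTROSPECTION: Record<ProcessVia, string> = {
  "youtube-transcript": "Pulling the transcript and metadata for the YouTube video...",
  file: "Reading the content of the file...",
  webpage: "Scraping the content of the web page...",
};

async function describeScrape(rawUrl: string): Promise<string> {
  // Skip the network round trip when the URL alone identifies a YouTube video.
  const contentType = looksLikeYouTubeVideo(new URL(rawUrl))
    ? null
    : await fetchContentType(rawUrl);
  return INTROSPECTION[chooseProcessVia(rawUrl, contentType)];
}
```

Keeping `chooseProcessVia` free of network I/O means the routing rules can be checked without a live server; only `describeScrape` performs the HEAD request.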

Additional Information

Developer Validations

  • I ran yarn lint from the root of the repo & committed changes
  • Relevant documentation has been updated
  • I have tested my code functionality
  • Docker build succeeds locally

- Introduced functionality to handle YouTube URLs by validating them and fetching video transcripts.
- Updated the `processVia` logic to include a new option for processing YouTube video transcripts.
- Enhanced the scraping function to format and return transcript content as a document if required.
- Added a utility function to validate YouTube URLs.
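A YouTube URL validator of the kind mentioned above could look roughly like this. It is a sketch under stated assumptions, not the PR's actual utility: the names `extractYouTubeVideoId` and `isValidYouTubeUrl` are hypothetical, and the accepted shapes (`/watch?v=`, `/shorts/`, `/embed/`, and `youtu.be` links) are a guess at what such a helper would cover. It relies on YouTube video IDs being 11 characters drawn from letters, digits, `_`, and `-`.

```typescript
// Hypothetical helpers; the PR's real validator may accept a different set of URL shapes.
const VIDEO_ID = /^[A-Za-z0-9_-]{11}$/;
const YOUTUBE_HOSTS = new Set(["youtube.com", "www.youtube.com", "m.youtube.com"]);

// Returns the 11-character video ID, or null if the string is not a recognised video URL.
function extractYouTubeVideoId(raw: string): string | null {
  let url: URL;
  try {
    url = new URL(raw);
  } catch {
    return null; // not an absolute URL at all
  }
  if (url.protocol !== "https:" && url.protocol !== "http:") return null;

  let candidate: string | null = null;
  if (url.hostname === "youtu.be") {
    // Short links carry the ID as the first path segment: https://youtu.be/<id>
    candidate = url.pathname.slice(1).split("/")[0];
  } else if (YOUTUBE_HOSTS.has(url.hostname)) {
    if (url.pathname === "/watch") {
      candidate = url.searchParams.get("v");
    } else {
      // Shorts and embed URLs carry the ID as the second path segment.
      const match = url.pathname.match(/^\/(shorts|embed)\/([^/]+)/);
      candidate = match ? match[2] : null;
    }
  }
  return candidate !== null && VIDEO_ID.test(candidate) ? candidate : null;
}

function isValidYouTubeUrl(raw: string): boolean {
  return extractYouTubeVideoId(raw) !== null;
}
```

Returning the ID rather than a bare boolean lets the same helper both gate the transcript path and supply the identifier a transcript fetcher would need.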

Labels

PR:needs review Needs review by core team

Development

Successfully merging this pull request may close these issues.

[FEAT]: @agent YouTube Transcript Analysis