DipFlip (Contributor) commented Aug 4, 2024

Pull Request Type

  • ✨ feat
  • πŸ› fix
  • ♻️ refactor
  • πŸ’„ style
  • πŸ”¨ chore
  • πŸ“ docs

Relevant Issues

resolves #1959

What is in this change?

This PR enables the GitHub and GitLab data collectors to accept ignore patterns such as

*, !*/, !**/*.pdf

which together select only PDF files.

It fixes the issue by updating GithubRepoCollector to use the GithubRepoLoader from the latest @langchain/community/document_loaders/web/github instead of the deprecated langchain/document_loaders/web/github. As of @langchain/community 0.2.23, a bug in GithubRepoLoader has been fixed, so negated gitignore patterns like !/* are now handled correctly.

GitlabRepoLoader also did not handle negated gitignore patterns (those starting with !) correctly. This PR uses the ignore package to check whether each file is excluded by the configured ignorePaths.
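The effect of the three patterns can be traced with a minimal TypeScript sketch of gitignore's evaluation order: patterns are tested in sequence, the last one that matches decides, a leading ! re-includes, and, per the gitignore documentation, a file cannot be re-included while one of its parent directories is excluded. This is a simplified, hypothetical stand-in written for illustration (compile, isIgnored and the sample paths are made up here, and only *, ?, **/, a leading / and a trailing / are handled); it is not the ignore package's API or the collector's actual code.

```typescript
// Simplified model of gitignore matching, for illustration only.
// Handles "*", "?", "**/", a leading "/" and a trailing "/"; it is NOT a
// replacement for the ignore package (no character classes, escapes, etc.).

interface Rule {
  negate: boolean; // pattern started with "!" (re-include)
  dirOnly: boolean; // pattern ended with "/" (matches directories only)
  regex: RegExp;
}

// Translate one glob into a regular expression over slash-separated paths.
function globToRegex(glob: string, anchored: boolean): RegExp {
  let re = "";
  let i = 0;
  while (i < glob.length) {
    if (glob.startsWith("**/", i)) {
      re += "(?:.*/)?"; // zero or more directories
      i += 3;
    } else if (glob[i] === "*") {
      re += "[^/]*"; // anything within a single path segment
      i += 1;
    } else if (glob[i] === "?") {
      re += "[^/]"; // exactly one non-slash character
      i += 1;
    } else {
      re += glob[i].replace(/[.+^$(){}|[\]\\]/g, "\\$&"); // literal character
      i += 1;
    }
  }
  // A pattern with no leading or inner slash may match at any depth.
  return new RegExp("^" + (anchored ? "" : "(?:.*/)?") + re + "$");
}

function parseRule(raw: string): Rule {
  let p = raw.trim();
  const negate = p.startsWith("!");
  if (negate) p = p.slice(1);
  const dirOnly = p.endsWith("/");
  if (dirOnly) p = p.slice(0, -1);
  // A slash at the start or in the middle anchors the pattern to the root.
  const anchored = p.includes("/");
  if (p.startsWith("/")) p = p.slice(1);
  return { negate, dirOnly, regex: globToRegex(p, anchored) };
}

// Hypothetical helper: drop blank lines and comments, then parse each pattern.
function compile(patterns: string[]): Rule[] {
  return patterns
    .map((p) => p.trim())
    .filter((p) => p !== "" && !p.startsWith("#"))
    .map(parseRule);
}

// Last matching rule wins: a plain rule excludes, a "!" rule re-includes.
function matchOne(rules: Rule[], path: string, isDir: boolean): boolean {
  let ignored = false;
  for (const rule of rules) {
    if (rule.dirOnly && !isDir) continue;
    if (rule.regex.test(path)) ignored = !rule.negate;
  }
  return ignored;
}

// Git cannot re-include a file whose parent directory is excluded, so every
// ancestor directory is checked before the file itself.
function isIgnored(rules: Rule[], filePath: string): boolean {
  const parts = filePath.split("/");
  for (let i = 1; i < parts.length; i++) {
    if (matchOne(rules, parts.slice(0, i).join("/"), true)) return true;
  }
  return matchOne(rules, filePath, false);
}

const paths = ["README.md", "manual.pdf", "docs/guide.pdf", "docs/notes.txt", "src/index.ts"];

// "*" excludes everything, "!*/" re-includes directories, "!**/*.pdf" re-includes PDFs.
const pdfOnly = compile(["*", "!*/", "!**/*.pdf"]);
console.log(paths.filter((p) => !isIgnored(pdfOnly, p)));
// → [ 'manual.pdf', 'docs/guide.pdf' ]

// Without "!*/", docs/ stays excluded, so docs/guide.pdf cannot be re-included.
const noDirs = compile(["*", "!**/*.pdf"]);
console.log(paths.filter((p) => !isIgnored(noDirs, p)));
// → [ 'manual.pdf' ]
```

The second run shows why !*/ belongs in the pattern set: without it, the docs directory itself stays excluded, so docs/guide.pdf is dropped even though it matches !**/*.pdf.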

Additional Information

Developer Validations

  • I ran yarn lint from the root of the repo & committed changes
  • Relevant documentation has been updated
  • I have tested my code functionality
  • Docker build succeeds locally

DipFlip (Contributor, Author) commented Aug 4, 2024

@timothycarambat I wasn't sure whether I should include collector/yarn.lock in this PR. Should I?

DipFlip force-pushed the 1959-filetype-filters branch from a4399a8 to 24a4d56 on August 4, 2024, 22:39
timothycarambat (Member) commented

@DipFlip only if it was updated with some dependency relevant to the PR

DipFlip (Contributor, Author) commented Aug 5, 2024

@timothycarambat OK, I've now added collector/yarn.lock, since @langchain/community@^0.2.23 is a new dependency.

timothycarambat added the "PR:needs review" label (Needs review by core team) on Aug 7, 2024

Development

Successfully merging this pull request may close these issues.

[FEAT]: filter data connector for only selected filetypes

2 participants