
@shatfield4 (Collaborator) commented Apr 26, 2024

Pull Request Type

  • ✨ feat
  • 🐛 fix
  • ♻️ refactor
  • 💄 style
  • 🔨 chore
  • 📝 docs

Relevant Issues

resolves #1190

What is in this change?

  • Creates a data connector that scrapes a site by following its links up to a configurable depth
  • Only follows links whose domain matches the starting site, so pages outside that website are never scraped
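For reviewers, here is a minimal sketch of depth-limited, same-domain link discovery. It is not the connector's actual code. It is written in Python for brevity, and `crawl`, `same_site_links`, and the injected `fetch` callable are hypothetical names. It assumes that depth counts link hops from the start URL (depth 0 returns only the start page) and that "same website" means an exact hostname match, so `www.example.com` and `example.com` count as different sites.

```python
from collections import deque
from html.parser import HTMLParser
from typing import Callable, List, Set
from urllib.parse import urldefrag, urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag it sees."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def same_site_links(page_url: str, html: str) -> List[str]:
    """Absolute, fragment-free http(s) links whose hostname matches page_url."""
    parser = LinkExtractor()
    parser.feed(html)
    host = urlparse(page_url).hostname
    links: List[str] = []
    for href in parser.hrefs:
        # Resolve relative links, then drop "#section" fragments.
        absolute, _ = urldefrag(urljoin(page_url, href))
        parsed = urlparse(absolute)
        if parsed.scheme in ("http", "https") and parsed.hostname == host:
            links.append(absolute)
    return links


def crawl(start_url: str, depth: int, fetch: Callable[[str], str]) -> List[str]:
    """Breadth-first discovery of same-site URLs up to `depth` hops away."""
    seen: Set[str] = {start_url}
    order: List[str] = [start_url]
    queue = deque([(start_url, 0)])
    while queue:
        url, level = queue.popleft()
        if level >= depth:
            continue  # pages at the depth limit are kept but not expanded
        for link in same_site_links(url, fetch(url)):
            if link not in seen:
                seen.add(link)
                order.append(link)
                queue.append((link, level + 1))
    return order


# Usage with an in-memory "site" standing in for real HTTP requests.
pages = {
    "https://example.com/": '<a href="/a">A</a> <a href="https://other.org/x">X</a>',
    "https://example.com/a": '<a href="/b#top">B</a>',
    "https://example.com/b": '<a href="/c">C</a>',
}
print(crawl("https://example.com/", 2, lambda url: pages.get(url, "")))
# → ['https://example.com/', 'https://example.com/a', 'https://example.com/b']
```

Breadth-first order means every page at depth N is discovered before any page at depth N + 1, and the `seen` set keeps pages that link to each other from being queued twice.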

Additional Information

Developer Validations

  • I ran yarn lint from the root of the repo & committed changes
  • Relevant documentation has been updated
  • I have tested my code functionality
  • Docker build succeeds locally

@shatfield4 shatfield4 linked an issue Apr 26, 2024 that may be closed by this pull request
@shatfield4 shatfield4 changed the title WIP website depth scraping, (sort of works) [FEAT] Website depth scraping data connector Apr 26, 2024
@shatfield4 shatfield4 self-assigned this Apr 26, 2024
@shatfield4 shatfield4 removed the request for review from timothycarambat April 26, 2024 20:36
@shatfield4 shatfield4 marked this pull request as ready for review April 26, 2024 21:52
@timothycarambat timothycarambat added the PR:needs review Needs review by core team label Apr 26, 2024
@shatfield4 (Collaborator, Author)

@timothycarambat, I refactored this based on what we discussed:

  • Collects every link into an array first, so the total number of links is known before the main scraping starts
  • Passes that array to the bulk scraping function
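Here is a minimal sketch of that two-phase flow. Because the complete link list exists before scraping begins, the total is known up front and can drive progress reporting and the `maxLinks` cap mentioned in the commit history. `bulk_scrape`, `scrape`, `max_links`, and `on_progress` are hypothetical names, not the connector's actual API, and skipping pages that fail to scrape is an assumption.

```python
from typing import Callable, Dict, List


def bulk_scrape(
    links: List[str],
    scrape: Callable[[str], str],
    max_links: int,
    on_progress: Callable[[int, int], None] = lambda done, total: None,
) -> Dict[str, str]:
    """Scrape a pre-collected list of links, capped at max_links."""
    # Deduplicate while preserving order, then apply the cap.
    targets = list(dict.fromkeys(links))[:max_links]
    total = len(targets)  # known before any page is scraped
    results: Dict[str, str] = {}
    for done, url in enumerate(targets, start=1):
        try:
            results[url] = scrape(url)
        except Exception:
            pass  # assumption: one failing page should not abort the batch
        on_progress(done, total)
    return results


# Usage: progress is reported as (pages processed, total pages).
progress = []
links = ["https://example.com/", "https://example.com/a", "https://example.com/b"]
bulk_scrape(links, lambda u: u.upper(), max_links=2,
            on_progress=lambda d, t: progress.append((d, t)))
print(progress)  # → [(1, 2), (2, 2)]
```

With discovery and scraping separated, the scraping phase no longer has to discover links as it goes, which is what allows an accurate "N of M" loading indicator.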

@timothycarambat timothycarambat merged commit 612a7e1 into master May 14, 2024
@timothycarambat timothycarambat deleted the 1190-feat-website-scraping-depth branch May 14, 2024 19:49
CrackerCat pushed a commit to CrackerCat/anything-llm that referenced this pull request Jul 31, 2024
* WIP website depth scraping, (sort of works)

* website depth data connector stable + add maxLinks option

* linting + loading small ui tweak

* refactor website depth data connector for stability, speed, & readability

* patch: remove console log
Guard clause on URL validity check
reasonable overrides

---------

Co-authored-by: Timothy Carambat <rambat1010@gmail.com>
CrackerCat pushed a commit to CrackerCat/anything-llm that referenced this pull request Aug 1, 2024
CrackerCat pushed a commit to CrackerCat/anything-llm that referenced this pull request Aug 2, 2024
CrackerCat pushed a commit to CrackerCat/anything-llm that referenced this pull request Aug 3, 2024
cabwds pushed a commit to cabwds/anything-llm that referenced this pull request Jul 3, 2025

Labels

PR:needs review Needs review by core team

Development

Successfully merging this pull request may close these issues.

[FEAT]: Website scraping depth

3 participants