[BUG]: Bulk link scraper tries to fetch child-links from root of website instead of defined origin #1528

### How are you running AnythingLLM?

Docker (local)

### What happened?

https://discord.com/channels/1114740394715004990/1243578185581334538

The bulk link scraper does not work when the URL you enter is not the root of the website.

If you enter `https://www.somesite.com/some/sub` with depth 1, it correctly identifies the children of that sub-path, e.g. `https://www.somesite.com/some/sub/child_#`.
However, it then tries to scrape `https://www.somesite.com/child_#`, dropping the sub-path.

E.g. if you enter `https://learn.microsoft.com/en-us/azure/well-architected/reliability`, it will try to scrape `https://learn.microsoft.com/metrics` instead of `https://learn.microsoft.com/en-us/azure/well-architected/reliability/metrics`.

![image](https://github.com/Mintplex-Labs/anything-llm/assets/6628064/295070d8-cf6f-416b-ae21-d840a88cf671)
![image](https://github.com/Mintplex-Labs/anything-llm/assets/6628064/fd025886-d6e0-4cc1-a945-ff10dbd4c0ae)
The scraped content for those links is then: `404 - Page not found\n\nWe couldn't find this page.`
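
One possible cause, sketched here as a guess rather than taken from the AnythingLLM source: the discovered child links are relative, and they get resolved against the site origin instead of the page they were found on. The snippet below uses the standard `URL` constructor to show the difference. The relative `href` value and the trailing slash on the page URL are assumptions made for illustration only.

```typescript
// Hypothetical illustration; this is not AnythingLLM's actual scraper code.
// Assumed: the page URL is treated as a directory (note the trailing slash).
const page: string =
  "https://learn.microsoft.com/en-us/azure/well-architected/reliability/";

// Assumed: the child link appears in the page HTML as a relative href.
const href: string = "metrics";

// Resolving against the site origin drops the sub-path (the behavior reported here).
const againstOrigin: string = new URL(href, new URL(page).origin).href;

// Resolving against the page URL keeps the sub-path (the expected behavior).
const againstPage: string = new URL(href, page).href;

console.log(againstOrigin);
console.log(againstPage);
```

Without the trailing slash on the page URL, standard URL resolution would replace the last path segment (`reliability`) rather than append to it, so a fix may also need to decide how to treat the entered URL.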

### Are there known steps to reproduce?

Enter a URL that is not the website's root / homepage in the bulk link scraper.
