Description
How are you running AnythingLLM?
Docker (remote machine)
What happened?
Hi!
I am building a private chatbot prototype for clients in the education/welfare sector.
My goal:
When the chatbot recommends courses (e.g. language courses), I would like it to provide the URL that links directly to the course page on the website.
Problem:
Unfortunately, AnythingLLM alters the uploaded URL considerably in the documents section. This happens both with the bulk link scraper and when uploading manually via the API upload_link() function. In both cases the original URL is changed, so the chatbot recommends a broken link.
Example for a course in German:
- Broken uploaded link in AnythingLLM documents section:
www_vhs muehldorf.de-programmberuf-karrierekursIHK-Fachkraft-Rechnungswesen-Steuerrechtliche-GrundlagenA20000.html
- Real link from website that I uploaded:
https://www.vhs-muehldorf.de/programm/beruf-karriere/kurs/IHK-Fachkraft-Rechnungswesen-Steuerrechtliche-Grundlagen/A20000
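To illustrate why the chatbot cannot rebuild the real link from the stored name, here is a small sketch. The function `flatten_url` is a hypothetical approximation of the renaming seen in the example above, not AnythingLLM's actual code, and `example.org` is a placeholder domain. It only shows that once the slashes are dropped, different URLs can end up with the same name, so the original path is not recoverable.

```python
from urllib.parse import urlparse


def flatten_url(url: str) -> str:
    """Hypothetical approximation of the renaming (not AnythingLLM's code)."""
    parsed = urlparse(url)
    # Removing every "/" merges all path segments into one string,
    # so the segment boundaries are lost.
    return f"{parsed.netloc}-{parsed.path.replace('/', '')}.html"


# Two different URLs (placeholder domain) ...
a = flatten_url("https://example.org/programm/kurs/A20000")
b = flatten_url("https://example.org/programmkurs/A20000")

# ... produce the same flattened name.
print(a)
print(a == b)
```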
Short-term solution/fix:
I am giving the bot some examples of correct links in the system prompt, but that does not always work reliably, and it consumes input tokens.
I would appreciate a solution for this in AnythingLLM. I could also collaborate or help as a software engineer. Thanks and blessings!
Helpful Information:
For example, in my loaded sources the chatbot gets the link information from the metadata and sourceDocument:
sourceDocument: www_vhs-lingen.de-programmkursPrivate-Kochkurse-nur-fuer-Euch2024H92000.html published: 10/7/2024, 7:57:11 AM </document_metadata>
However, the links inside the citations dropdown are correct, so I assume they are stored correctly somewhere. If the original URL could also be saved in the metadata, that might solve the problem. Usable links may also be available under chunkSource.
Possibly I could set the link in the metadata myself with the raw-text API function. I still need to test that.
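A rough sketch of that idea, untested against a real instance: build the request body for the raw-text upload so that the original URL travels with the document. The field names `textContent`, `metadata` and `title` are my assumption about what the raw-text endpoint expects and should be checked against the instance's API docs. The `url` metadata key and the helper `build_raw_text_payload` are hypothetical names for illustration. The sketch only builds and prints the payload and does not send anything.

```python
import json


def build_raw_text_payload(text: str, url: str, title: str) -> dict:
    """Build a request body that keeps the original URL with the text.

    Field names are assumptions; verify them against the API docs.
    """
    return {
        # Also put the URL into the text itself, so the model can see it
        # even if the metadata is not passed into the prompt.
        "textContent": f"Source URL: {url}\n\n{text}",
        "metadata": {
            "title": title,
            "url": url,  # hypothetical custom key holding the unmodified link
        },
    }


payload = build_raw_text_payload(
    text="Course description goes here.",
    url="https://www.vhs-muehldorf.de/programm/beruf-karriere/kurs/IHK-Fachkraft-Rechnungswesen-Steuerrechtliche-Grundlagen/A20000",
    title="IHK-Fachkraft Rechnungswesen",
)
print(json.dumps(payload, indent=2))
```

Note that the URL in the text only appears at the start of the document, so after chunking it may be present in the first chunk only. Whether the metadata reaches the prompt for every chunk is something I would still need to verify.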
Are there known steps to reproduce?
No response