Description
How are you running AnythingLLM?
All versions
What happened?
When attempting to pull and parse a file using either the RAG Modal or @agent mode, the process fails if the URL does not explicitly end with a file extension (e.g., .pdf, .csv). This occurs even when the server responds with a correct Content-Type header that identifies the file type.
Example:
The following URL fails to be processed, despite responding with an application/pdf content type:
https://arxiv.org/pdf/2307.10265
Observed Behavior:
The application logs display the following error:
[2] Error processing single file File extension .10265 not supported for parsing and cannot be assumed as text file type.
This error originates from the file extension guard located at:
anything-llm/collector/processSingleFile/index.js
Lines 58 to 72 in 89a0149
    if (!SUPPORTED_FILETYPE_CONVERTERS.hasOwnProperty(fileExtension)) {
      if (isTextType(fullFilePath)) {
        console.log(
          `\x1b[33m[Collector]\x1b[0m The provided filetype of ${fileExtension} does not have a preset and will be processed as .txt.`
        );
        processFileAs = ".txt";
      } else {
        trashFile(fullFilePath);
        return {
          success: false,
          reason: `File extension ${fileExtension} not supported for parsing and cannot be assumed as text file type.`,
          documents: [],
        };
      }
    }
Expected Behavior:
The system should be able to successfully pull and parse files from URLs that do not explicitly contain a file extension, provided the Content-Type header in the server's response clearly indicates the file's MIME type.
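One possible shape for this fallback is sketched below. It is only an illustration, not the collector's actual code: `MIME_TO_EXTENSION`, `urlExtension`, `extensionFromContentType`, and `resolveExtension` are hypothetical names, and the MIME table is a small assumed subset rather than the real list of supported converters. The idea is to trust the URL's extension when it is one the collector supports, and otherwise fall back to the `Content-Type` header.

```javascript
// Hypothetical MIME-to-extension table (assumed subset, not the collector's real list).
const MIME_TO_EXTENSION = new Map([
  ["application/pdf", ".pdf"],
  ["text/csv", ".csv"],
  ["text/plain", ".txt"],
  ["text/html", ".html"],
]);

// Extension of the last path segment of a URL, or "" if it has none.
// For https://arxiv.org/pdf/2307.10265 this yields ".10265", which is
// the value the current guard rejects.
function urlExtension(url) {
  const lastSegment = new URL(url).pathname.split("/").pop();
  const dot = lastSegment.lastIndexOf(".");
  return dot > 0 ? lastSegment.slice(dot).toLowerCase() : "";
}

// Map a Content-Type header value to an extension, ignoring parameters
// such as "; charset=utf-8". Returns null when the type is unknown.
function extensionFromContentType(contentType) {
  if (typeof contentType !== "string") return null;
  const mime = contentType.split(";")[0].trim().toLowerCase();
  return MIME_TO_EXTENSION.get(mime) ?? null;
}

// Prefer a supported URL extension; otherwise fall back to the header.
// If neither helps, return the URL extension unchanged so the existing
// guard can still reject the file.
function resolveExtension(url, contentType, supportedExtensions) {
  const fromUrl = urlExtension(url);
  if (supportedExtensions.includes(fromUrl)) return fromUrl;
  return extensionFromContentType(contentType) ?? fromUrl;
}
```

With a supported list of `[".pdf", ".csv", ".txt"]`, the arXiv URL from the example combined with an `application/pdf` header would resolve to `.pdf` under this sketch. Where such a check would hook into the download path is left open here, since it depends on how the collector saves the fetched file before `processSingleFile` runs.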
Are there known steps to reproduce?
No response