[BUG]: File Parsing Fails for URLs Without Explicit File Extensions

### How are you running AnythingLLM?

All versions

### What happened?

When attempting to pull and parse a file using either the RAG Modal or `@agent` mode, the process fails if the URL does not explicitly end with a file extension (e.g., `.pdf`, `.csv`). This occurs even when the server responds with a correct `Content-Type` header that identifies the file type.

**Example:**

The following URL fails to be processed, even though the server responds with `Content-Type: application/pdf`:

https://arxiv.org/pdf/2307.10265

**Observed Behavior:**

The application logs display the following error:

```
[2] Error processing single file File extension .10265 not supported for parsing and cannot be assumed as text file type.
```

This error originates from the file extension guard located at:

https://github.com/Mintplex-Labs/anything-llm/blob/89a01492b51a23150b59732166b90ebdd1843c50/collector/processSingleFile/index.js#L58-L72
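The log suggests the extension is taken from the final path segment of the URL, and the arXiv identifier `2307.10265` happens to contain a dot. The sketch below reproduces that failure mode in isolation. It assumes the collector derives the extension with something equivalent to Node's `path.extname` applied to the URL path. `naiveExtensionFromUrl` is a hypothetical helper, not code from the repository, and `example.com` is a placeholder.

```javascript
const path = require("path");

// Hypothetical helper: take the "extension" from the URL's path the way a
// naive extension check would. Not taken from the AnythingLLM codebase.
function naiveExtensionFromUrl(url) {
  const { pathname } = new URL(url);
  return path.extname(pathname);
}

console.log(naiveExtensionFromUrl("https://arxiv.org/pdf/2307.10265")); // .10265
console.log(naiveExtensionFromUrl("https://example.com/report.pdf")); // .pdf
```

Any URL whose last segment contains a dot but no real extension produces a bogus extension like this, and the guard then rejects it before the response headers are ever considered.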

**Expected Behavior:**

The system should be able to successfully pull and parse files from URLs that do not explicitly contain a file extension, provided the `Content-Type` header in the server's response clearly indicates the file's MIME type.
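One possible approach, sketched under assumptions: keep the extension from the URL when it is one the collector supports, and otherwise derive it from the response's `Content-Type` header, ignoring parameters such as `; charset=utf-8`. `MIME_TO_EXTENSION`, `parseMimeType`, and `resolveFileExtension` are hypothetical names, and the MIME list is illustrative rather than the collector's actual set of supported types.

```javascript
// Illustrative mapping only; the collector's real list of supported types
// (SUPPORTED_FILETYPE_CONVERTERS) may contain more or different entries.
const MIME_TO_EXTENSION = new Map([
  ["application/pdf", ".pdf"],
  ["text/csv", ".csv"],
  ["text/plain", ".txt"],
  [
    "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
    ".docx",
  ],
]);

// Extract the bare media type from a Content-Type header value,
// e.g. "Application/PDF; charset=binary" -> "application/pdf".
function parseMimeType(contentType) {
  if (typeof contentType !== "string") return null;
  const mime = contentType.split(";")[0].trim().toLowerCase();
  return mime || null;
}

// Prefer a supported extension from the URL; otherwise fall back to the
// extension implied by the Content-Type header. Returns null if neither helps.
function resolveFileExtension(urlExtension, contentType, supportedExtensions) {
  const fromUrl = (urlExtension || "").toLowerCase();
  if (fromUrl && supportedExtensions.has(fromUrl)) return fromUrl;

  const mime = parseMimeType(contentType);
  return (mime && MIME_TO_EXTENSION.get(mime)) || null;
}

const supported = new Set([".pdf", ".csv", ".txt", ".docx"]);
console.log(resolveFileExtension(".10265", "application/pdf", supported)); // .pdf
console.log(resolveFileExtension(".pdf", "application/octet-stream", supported)); // .pdf
console.log(resolveFileExtension(".10265", "text/html; charset=utf-8", supported)); // null
```

Checking the URL extension first keeps the current behavior for URLs that already work; the header only matters when the URL gives nothing usable.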

---

### Are there known steps to reproduce?

_No response_

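For reference, the guard from the permalink above (`collector/processSingleFile/index.js`, lines 58-72 at the linked commit):

```javascript
// No converter is registered for this extension.
if (!SUPPORTED_FILETYPE_CONVERTERS.hasOwnProperty(fileExtension)) {
  if (isTextType(fullFilePath)) {
    console.log(
      `\x1b[33m[Collector]\x1b[0m The provided filetype of ${fileExtension} does not have a preset and will be processed as .txt.`
    );
    processFileAs = ".txt";
  } else {
    // Unknown extension and not detectable as text: delete the download and fail.
    trashFile(fullFilePath);
    return {
      success: false,
      reason: `File extension ${fileExtension} not supported for parsing and cannot be assumed as text file type.`,
      documents: [],
    };
  }
}
```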
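If an extension inferred from `Content-Type` were available when the guard runs, the guard could consult it before rejecting the file. Below is a hypothetical, side-effect-free version of that decision: `supportedConverters` stands in for `SUPPORTED_FILETYPE_CONVERTERS`, `isText` for the result of `isTextType(fullFilePath)`, and `contentTypeExtension` is assumed to be computed elsewhere from the HTTP response. File trashing and logging are left out.

```javascript
// Hypothetical, side-effect-free sketch of the guard's decision; not the
// repository's code. Returns which converter key to use, or a failure.
function chooseProcessorExtension({
  fileExtension,
  contentTypeExtension, // e.g. ".pdf" inferred from Content-Type, or null
  supportedConverters, // stands in for SUPPORTED_FILETYPE_CONVERTERS
  isText, // stands in for isTextType(fullFilePath)
}) {
  const has = (key) =>
    typeof key === "string" &&
    Object.prototype.hasOwnProperty.call(supportedConverters, key);

  // 1. A converter exists for the file's own extension.
  if (has(fileExtension)) {
    return { success: true, processFileAs: fileExtension };
  }
  // 2. Proposed fallback: trust the server's declared media type.
  if (has(contentTypeExtension)) {
    return { success: true, processFileAs: contentTypeExtension };
  }
  // 3. Existing text heuristic.
  if (isText) {
    return { success: true, processFileAs: ".txt" };
  }
  // 4. Nothing matched: same failure reason the collector reports today.
  return {
    success: false,
    reason: `File extension ${fileExtension} not supported for parsing and cannot be assumed as text file type.`,
  };
}

const converters = { ".pdf": true, ".txt": true, ".csv": true };
const decision = chooseProcessorExtension({
  fileExtension: ".10265",
  contentTypeExtension: ".pdf",
  supportedConverters: converters,
  isText: false,
});
console.log(decision.processFileAs); // .pdf
```

Checking the header before the text heuristic is a judgment call: a declared `application/pdf` is more specific than a guess that the bytes look like text.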

[BUG]: File Parsing Fails for URLs Without Explicit File Extensions #4513
