Fetch, Parse, and Create Documents for Statically Hosted Files #4398
…oad and parse statically hosted files
…ponse content type, as a result unlocked all supported files
…ndard MS Word files in constants.js
@@ -0,0 +1,31 @@
const { WATCH_DIRECTORY } = require("../../utils/constants");
It might make more sense to move this single file-download util to collector/utils/files/index.js, since it only deals with files and we can probably reuse it too.
Fix in 0bba2c7
…d into getPageContent | Return explicit argument of captureAs into scrapeGenericUrl in processLink fn
…obal module | Add URL valuidation to downloadURIToFile
remove unused imports
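For context on the URL validation commit above, here is a minimal sketch of what such a check might look like. The helper name isDownloadableUrl is hypothetical and is not taken from this PR; the sketch assumes only absolute http(s) URLs should ever reach the download step.

```javascript
// Hypothetical helper (not the PR's actual code): accept only absolute
// http(s) URLs before attempting a download.
function isDownloadableUrl(candidate) {
  let parsed;
  try {
    // The WHATWG URL constructor throws on relative or malformed input.
    parsed = new URL(candidate);
  } catch {
    return false;
  }
  // Reject schemes such as file:, ftp:, or data: that should not be fetched.
  return parsed.protocol === "http:" || parsed.protocol === "https:";
}
```

Rejecting non-http(s) schemes up front keeps inputs such as file:///etc/passwd away from the downloader entirely.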
Pull Request Type
Relevant Issues
resolves #2110
What is in this change?
This PR introduces an enhancement to the document upload modal: the web scraping input field can now fetch, parse, and convert statically hosted files (e.g., https://example.com/a-file.pdf) and API endpoints returning JSON (e.g., https://jsonplaceholder.typicode.com/posts) into application documents.
Currently Supporting:
Text: .txt, .md, .org, .adoc, .rst, .html, .csv, .json
Documents: .docx (NOTE: .doc files are currently not supported, only .docx), .odt, .pdf, .epub
Presentations: .pptx, .odp
Spreadsheets: .xlsx
Email: .mbox
Audio: .wav, .mp3
Video: .mp4, .mpeg
Images: .png, .jpg
Additional Information
The core functionality hinges on the Content-Type header of the response. If the MIME type is supported, your file will be pulled and processed.
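The Content-Type check described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the PR's implementation: the SUPPORTED_MIME_TYPES map and both function names are hypothetical, and the map lists only a small subset of the formats named in this description.

```javascript
// Hypothetical subset of supported MIME types mapped to file extensions.
// The real list lives in the project's constants and may differ.
const SUPPORTED_MIME_TYPES = new Map([
  ["text/plain", ".txt"],
  ["text/html", ".html"],
  ["text/csv", ".csv"],
  ["application/json", ".json"],
  ["application/pdf", ".pdf"],
  [
    "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
    ".docx",
  ],
  ["audio/mpeg", ".mp3"],
  ["image/png", ".png"],
]);

// Strip parameters such as "; charset=utf-8" and normalize case.
function parseMimeType(contentTypeHeader) {
  if (typeof contentTypeHeader !== "string") return null;
  const mime = contentTypeHeader.split(";")[0].trim().toLowerCase();
  return mime.length > 0 ? mime : null;
}

// Return the extension to save the file under, or null when the
// response's MIME type is missing or not in the supported map.
function extensionForContentType(contentTypeHeader) {
  const mime = parseMimeType(contentTypeHeader);
  if (mime === null) return null;
  return SUPPORTED_MIME_TYPES.get(mime) ?? null;
}
```

A caller would read the Content-Type header from the response, pass it to extensionForContentType, and skip the URL when the result is null. In this sketch, application/msword (legacy .doc) is deliberately absent from the map, mirroring the note that only .docx is supported.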
Developer Validations
yarn lint from the root of the repo & committed changes