shatfield4 commented Jun 29, 2024

Pull Request Type

  • ✨ feat
  • 🐛 fix
  • ♻️ refactor
  • 💄 style
  • 🔨 chore
  • 📝 docs

Relevant Issues

resolves #1784

What is in this change?

  • Replace the LangChain PDFLoader with pdfjs
  • pdfjs exposes more information from a PDF, such as chapters and bookmarks
  • Using pdfjs will allow us to improve RAG results in the future
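
To illustrate the kind of information pdfjs makes available, here is a minimal sketch rather than the code merged in this PR. It assumes `pdf` is the document object that pdfjs-dist resolves from `getDocument(...).promise` and uses only its `numPages`, `getPage`, `getTextContent`, and `getOutline` members. The names `flattenOutline` and `loadPdfPages`, and the shape of the returned page objects, are hypothetical.

```javascript
// Sketch only: `pdf` is expected to behave like the document object that
// pdfjs-dist resolves from `getDocument(...).promise`.

// Flatten the nested outline (bookmarks) returned by `pdf.getOutline()` into
// a flat list of { title, depth } entries. pdfjs resolves to null when the
// PDF has no outline, so a missing outline yields an empty list.
function flattenOutline(outline, depth = 0) {
  if (!outline) return [];
  return outline.flatMap((node) => [
    { title: node.title, depth },
    ...flattenOutline(node.items, depth + 1),
  ]);
}

// Read the text of every page and attach the document-level bookmarks to
// each page's metadata (hypothetical output shape, chosen for illustration).
async function loadPdfPages(pdf) {
  const bookmarks = flattenOutline(await pdf.getOutline());
  const pages = [];
  // pdfjs page numbers are 1-based.
  for (let pageNumber = 1; pageNumber <= pdf.numPages; pageNumber++) {
    const page = await pdf.getPage(pageNumber);
    const content = await page.getTextContent();
    // Each text item carries its string in `str`; skip items without one.
    const text = content.items.map((item) => item.str ?? "").join(" ");
    pages.push({ pageContent: text, metadata: { pageNumber, bookmarks } });
  }
  return pages;
}
```

In real use, the `pdf` argument would come from something like `await pdfjsLib.getDocument({ data }).promise`, where `pdfjsLib` is the pdfjs-dist module and `data` holds the file's bytes. Keeping the loader free of that import makes it easy to exercise with a stub document.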

Additional Information

Developer Validations

  • I ran yarn lint from the root of the repo & committed changes
  • Relevant documentation has been updated
  • I have tested my code functionality
  • Docker build succeeds locally

shatfield4 linked an issue Jun 29, 2024 that may be closed by this pull request
shatfield4 self-assigned this Jun 29, 2024
shatfield4 marked this pull request as draft June 29, 2024 00:52
shatfield4 marked this pull request as ready for review July 1, 2024 20:57
shatfield4 added the "PR:needs review" label Jul 1, 2024
timothycarambat and others added 5 commits July 3, 2024 14:02
…784-feat-add-bookmarks-to-pdf-metadata-in-pdfloader
…f github.com:Mintplex-Labs/anything-llm into 1784-feat-add-bookmarks-to-pdf-metadata-in-pdfloader
…f github.com:Mintplex-Labs/anything-llm into 1784-feat-add-bookmarks-to-pdf-metadata-in-pdfloader
timothycarambat merged commit a870148 into master Jul 3, 2024
timothycarambat deleted the 1784-feat-add-bookmarks-to-pdf-metadata-in-pdfloader branch July 3, 2024 21:26
CrackerCat pushed a commit to CrackerCat/anything-llm that referenced this pull request Jul 31, 2024
…s#1791)

* WIP replace langchain pdfloader with pdfjs and add more context to each page

* remove extras from pdfjs and just replace langchain library

* remove unneeded dep

* fix console log in docs

---------

Co-authored-by: timothycarambat <rambat1010@gmail.com>
CrackerCat pushed a commit to CrackerCat/anything-llm that referenced this pull request Aug 1, 2024
CrackerCat pushed a commit to CrackerCat/anything-llm that referenced this pull request Aug 2, 2024
CrackerCat pushed a commit to CrackerCat/anything-llm that referenced this pull request Aug 3, 2024
cabwds pushed a commit to cabwds/anything-llm that referenced this pull request Jul 3, 2025
Labels

PR:needs review (Needs review by core team)

Development

Successfully merging this pull request may close these issues.

[FEAT]: Add Bookmarks to PDF Metadata in PDFLoader
