
Conversation

@franzbischoff
Contributor

@franzbischoff franzbischoff commented Aug 10, 2023

As discussed with @timothycarambat, this implements PDF parsing with PyMuPDF, which is quick and handles Unicode better.
It also lets us use the actual PDF metadata ('title', 'author', etc.) to populate the embedded metadata for later reference when searching.
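
A minimal sketch of the metadata idea, not the code from this PR: PyMuPDF exposes a document's info dictionary as `Document.metadata`, with keys such as `title` and `author`. The helper names `clean_pdf_metadata` and `read_pdf_metadata`, the choice of fields, and the filename fallback for a missing title are assumptions for illustration.

```python
from pathlib import Path

# Subset of the keys PyMuPDF reports in Document.metadata
# that seem useful as search references (an assumption).
PDF_FIELDS = ("title", "author", "subject", "keywords")


def clean_pdf_metadata(raw, filename):
    """Return string-only metadata; fall back to the file stem when no title is set."""
    raw = raw or {}
    meta = {}
    for key in PDF_FIELDS:
        value = raw.get(key)
        # Missing or non-string entries become empty strings.
        meta[key] = value.strip() if isinstance(value, str) else ""
    if not meta["title"]:
        meta["title"] = Path(filename).stem
    return meta


def read_pdf_metadata(path):
    """Open a PDF with PyMuPDF and return its cleaned metadata."""
    import fitz  # PyMuPDF; imported lazily so clean_pdf_metadata has no dependency

    with fitz.open(path) as doc:
        return clean_pdf_metadata(doc.metadata, path)
```

Keeping the cleaning step separate from the file access means it can be exercised with a plain dictionary, without a PDF on disk.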

  • One thing I found is that the window that shows the references for a search does not handle Unicode in the 'title' field, although it does in the text body. This should be fixed.

  • Known bug: LanceDB insists on assuming the schema is different when different file types are uploaded for embedding. Pinecone handles this fine.

I've reviewed the documentation and the actual tables created by LanceDB and could not figure out why.
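
One possible workaround, offered as an untested assumption rather than a confirmed fix: if the mismatch comes from each file type producing a different set of metadata keys, padding every record to one fixed key set before insertion would give all rows the same shape. The field list `SCHEMA_FIELDS` and the name `normalize_record` below are hypothetical.

```python
# Hypothetical fixed field set shared by every file type.
SCHEMA_FIELDS = ("title", "author", "source", "text")


def normalize_record(record, fields=SCHEMA_FIELDS):
    """Return a dict with exactly `fields` as keys, in order, with string values."""
    normalized = {}
    for key in fields:
        value = record.get(key)
        # Absent values are padded with ""; keys outside `fields` are dropped.
        normalized[key] = "" if value is None else str(value)
    return normalized


# Records from two file types that start with different keys.
pdf_row = normalize_record({"title": "Relatório", "author": "A. Author", "text": "..."})
txt_row = normalize_record({"source": "notes.txt", "text": "...", "wordcount": 12})
```

After normalization both rows carry the same keys in the same order, so a store with a fixed table schema sees a single record shape.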

The rest works fine.

@franzbischoff franzbischoff marked this pull request as draft August 10, 2023 22:57
@franzbischoff franzbischoff marked this pull request as ready for review August 16, 2023 20:04
@timothycarambat timothycarambat self-assigned this Aug 16, 2023
@franzbischoff

This comment was marked as resolved.

@franzbischoff franzbischoff marked this pull request as draft September 11, 2023 04:19
@franzbischoff franzbischoff marked this pull request as ready for review September 11, 2023 04:36
@timothycarambat
Member

moved to #241

@franzbischoff franzbischoff deleted the feature/metadata branch September 19, 2023 03:10
@franzbischoff franzbischoff restored the feature/metadata branch September 19, 2023 03:17
@franzbischoff franzbischoff deleted the feature/metadata branch November 4, 2023 02:25