Description
Do you need to file an issue?
- I have searched the existing issues and this bug is not already filed.
- I believe this is a legitimate bug, not just a question or feature request.
Describe the bug
I set up LightRAG (latest) and noticed something curious.
I ingested a (German) PDF file, a manual for a sewing machine.
After processing finished, I saw this in the processing window:
Chunk 49 of 50 extracted 7 Ent + 6 Rel chunk-fe3df1c3d9bf3e9ea1d8efc6f8a58915
Chunk 50 of 50 extracted 7 Ent + 4 Rel chunk-896d6c3bb2d8e400e2b9f135d94eccba
Merging stage 1/1: Anleitungsbuch_Naehmaschine_2259_Deutsch.pdf
Phase 1: Processing 613 entities from doc-3f651dbe208da94184b27316170623a8 (async: 24)
LLMmrg: 100m Sprint Rekord | 0+9 (dd 2)
LLMmrg: Noah Carter | 0+21 (dd 5)
LLMmrg: Tokyo | 0+9 (dd 17)
LLMmrg: World Athletics Championship | 0+25
LLMmrg: Carbon-Fiber Spikes | 0+8 (dd 2)
Phase 2: Processing 53 relations from doc-3f651dbe208da94184b27316170623a8 (async: 24)
Chunks appended from relation: c60
LLMmrg: TokyoWorld Athletics Championship | 0+10 (dd 15)
LLMmrg: 100m Sprint RekordNoah Carter | 0+11
Chunks appended from relation: c104
LLMmrg: Carbon-Fiber Spikes~Noah Carter | 0+10
Phase 3: Updating final 616(613+3) entities and 53 relations from doc-3f651dbe208da94184b27316170623a8
Completed merging: 613 entities, 3 extra entities, 53 relations
Completed processing file 1/1: Anleitungsbuch_Naehmaschine_2259_Deutsch.pdf
Enqueued document processing pipeline stopped
When I check the graph, there is a Noah Carter entity.
I checked the content of the PDF file and there is nothing about him in it.
I removed the PDF completely and ingested a different PDF.
With that one, there is no Noah Carter.
So with this PDF, Noah Carter always appears.
I read in an 8-month-old Reddit post that there seems to be demo content about him in the file "prompt.py" in a lightrag folder?!
I have attached the PDF file. Maybe someone can confirm.
Anleitungsbuch_Naehmaschine_2259_Deutsch.pdf
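A rough way to check whether the stray entity names come from text shipped with the installed package rather than from the PDF is to search the package's Python files for them. The sketch below uses only the standard library; the helper name `find_mentions` is hypothetical, and the directory to search (for example the installed lightrag folder) is an assumption that has to be supplied by whoever runs it.

```python
from pathlib import Path
import sys


def find_mentions(root, needle, pattern="*.py"):
    """Return files under `root` matching `pattern` whose text contains `needle`.

    The comparison is case-insensitive. `root` is whatever directory you
    want to inspect (assumed here to be the installed package folder).
    """
    needle_lower = needle.lower()
    hits = []
    for path in sorted(Path(root).rglob(pattern)):
        if not path.is_file():
            continue
        try:
            text = path.read_text(encoding="utf-8", errors="ignore")
        except OSError:
            # Skip unreadable files instead of aborting the whole search.
            continue
        if needle_lower in text.lower():
            hits.append(path)
    return hits


if __name__ == "__main__":
    # Usage: python find_mentions.py <directory> [search string]
    search_root = sys.argv[1] if len(sys.argv) > 1 else "."
    search_term = sys.argv[2] if len(sys.argv) > 2 else "Noah Carter"
    for hit in find_mentions(search_root, search_term):
        print(hit)
```

If a file such as prompt.py shows up in the output, the name exists in the package itself, which would be consistent with the entities leaking from example text instead of from the document.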
Steps to reproduce
Fully clear your database, then ingest the attached PDF.
Expected Behavior
No response
LightRAG Config Used
Using MS Azure and jina
Logs and screenshots
see above
Additional Information
- LightRAG Version: vv1.4.9.8/0251
- Operating System: Debian 12
- Python Version:
- Related Issues: