Description
Do you need to file an issue?
- I have searched the existing issues and this bug is not already filed.
- I believe this is a legitimate bug, not just a question or feature request.
Describe the bug
After indexing some of my Wiki pages, I ask 'which items in the setup log were marked as DONE?'. This relates to table rows mentioned on a few pages where I have kept an action log relating to my overall task (install new version of our software in test system).
The response is quite long, but completely comprises text that looks like this:
3 1 0 5 2 3 7 1 ----- <P 3 3 6 <KPG- entities' <K <K <what <entity <entity}<SEP <k ---<setup} <K<entity}<K</"NTL<ศร P <K<file <f the <tokens". <entities <KD <S <most <no <the<SEP-<related <no <your own <any <current <No<span the<the <<configuration<PAGEs" "KB<SEP"} "P <"document"}, 8 <"knowledge<w (SEP "SEP"}, "technology <SE <entity <<SEP"} 'file<SEP:<" <JSON", <category "<category</PATTER", "entities", "P" <SEP"} "last<Support <a <SEP"}, <a<SEP"} <file<s<topic} <file <the <>entities <file}<SEP} <<setup<SEP"}
There doesn't seem to be any large chunk of content; it's all single words separated by < signs, as you can see. I don't know how the software works (it just looked interesting, so I thought I'd have a play), but to me this looks like it's encoded or something. I'm hoping that the pattern of text means something to somebody.
Steps to reproduce
I'm playing with running an LLM on my laptop. I'm using the Intel "Ipex-llm" version of ollama with "bge-m3" and "llama3:8b-instruct-q4_K_M". The llama3 responds reasonably via 'ollama run', or in lightrag bypass mode, which is to say, the answers are in English, and make sense.
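For anyone trying to reproduce this outside LightRAG, here is a minimal sketch that queries the same model directly over Ollama's HTTP `/api/generate` endpoint, using the same context size as `OLLAMA_LLM_NUM_CTX` in the config further down. The helper names `build_payload` and `ask_ollama` are made up for illustration, and `localhost:11434` assumes Ollama is listening on its default port on the host. I have not run this against the Ipex-llm build.

```python
import json
import urllib.request

# Assumption: Ollama listens on its default port on the host machine.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_payload(prompt, model="llama3:8b-instruct-q4_K_M", num_ctx=32768):
    """Build a non-streaming generate request with an explicit context size."""
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,                  # return one JSON object, not chunks
        "options": {"num_ctx": num_ctx},  # mirrors OLLAMA_LLM_NUM_CTX
    }


def ask_ollama(prompt, **kwargs):
    """Send the prompt to Ollama and return the generated text."""
    data = json.dumps(build_payload(prompt, **kwargs)).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=600) as resp:
        return json.loads(resp.read())["response"]


# Example call (needs a running Ollama server):
# print(ask_ollama("Reply with one short English sentence."))
```

Comparing the reply at `num_ctx=32768` with one at a smaller value might show whether the jumbled text depends on the context size or only appears when going through LightRAG.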
The most obvious page to discover facts about this question has a table such as this:
| Task | Notes | Status |
|---|---|---|
| create-cloud\Install-ClientSpace.ps1 dev qav2 | 1. CloudShell slow to spin up - no version number\n 2. Failure to exec on cloudshellapi - port has changed to localhost:8080 | DONE |
| Add-CLient | DONE | |
| List-APIKEy | stapikey-SIVN2434-QZIE1360-QVRE3827 | DONE |
| Execute Connector Install | Default version is not 2.0.0-beta15 - needs doing at stable release | DONE |
It's unclear to me whether LightRAG can process this kind of data. That's not the problem I'm reporting, though; the jumbled output is plain wrong either way.
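To make the expected answer concrete, the sample table above can be read with a few lines of plain Python. This is only a sketch of the result I was hoping for, not a description of how LightRAG works; `parse_rows` and `done_tasks` are hypothetical helper names. Note that the "Add-CLient" row has DONE in the Notes column rather than Status, so the sketch checks both columns.

```python
# The sample table from this issue, copied verbatim (raw string keeps backslashes).
TABLE = r"""| Task | Notes | Status |
|---|---|---|
| create-cloud\Install-ClientSpace.ps1 dev qav2 | 1. CloudShell slow to spin up - no version number\n 2. Failure to exec on cloudshellapi - port has changed to localhost:8080 | DONE |
| Add-CLient | DONE | |
| List-APIKEy | stapikey-SIVN2434-QZIE1360-QVRE3827 | DONE |
| Execute Connector Install | Default version is not 2.0.0-beta15 - needs doing at stable release | DONE |
"""


def parse_rows(table):
    """Turn a simple pipe-delimited markdown table into a list of dicts."""
    lines = [ln.strip() for ln in table.strip().splitlines()]
    header = [c.strip() for c in lines[0].strip("|").split("|")]
    rows = []
    for ln in lines[2:]:  # skip the header and the |---| separator
        cells = [c.strip() for c in ln.strip("|").split("|")]
        rows.append(dict(zip(header, cells)))
    return rows


def done_tasks(rows, strict=False):
    """Return task names marked DONE.

    strict=True looks only at the Status column; otherwise Notes counts too.
    """
    if strict:
        return [r["Task"] for r in rows if r["Status"] == "DONE"]
    return [r["Task"] for r in rows if "DONE" in (r["Notes"], r["Status"])]


rows = parse_rows(TABLE)
for task in done_tasks(rows):
    print(task)
```

With the lenient check all four tasks are listed; with `strict=True` the "Add-CLient" row drops out, which is the kind of small inference the question needs.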
Expected Behavior
I would expect the response to resolve to some form of English. I don't really know what to expect from either llama 3 or LightRAG; I'm just sussing it out, and I'm not yet at a stage to judge the system's accuracy. The output I received here is just plain wrong.
I currently have 10 pages indexed. When I had one page and two pages, and used direct questions about individual facts, the response was correct. I think that this issue may be to do with my querying for an extraction of multiple facts (all tasks that are done) that in turn take some inference to determine.
LightRAG Config Used
I'm running lightrag via Docker Compose.
services:
  lightrag:
    image: ghcr.io/hkuds/lightrag:1.4.6
    container_name: lightrag
    restart: unless-stopped
    ports:
      - "8000:9621" # LightRAG Web UI + API
    env_file:
      - ./lightrag.env
    volumes:
      - lightrag_data:/app/workspace # persists LightRAG data/cache
volumes:
  lightrag_data:

# Point LightRAG to Ollama
LLM_BINDING=ollama
LLM_MODEL=llama3:8b-instruct-q4_K_M
LLM_BINDING_HOST=http://host.docker.internal:11434
OLLAMA_LLM_NUM_CTX=32768
ENABLE_LLM_CACHE=true
ENABLE_LLM_CACHE_FOR_EXTRACT=true
SUMMARY_LANGUAGE=English
OLLAMA_EMULATING_MODEL_TAG=latest
MAX_ASYNC=1
MAX_PARALLEL_INSERT=1
EMBEDDING_FUNC_MAX_ASYNC=1
EMBEDDING_BATCH_NUM=1
CHUNK_SIZE=800 # was 1200
CHUNK_OVERLAP_SIZE=50 # was 100
MAX_TOTAL_TOKENS=20000 # was 70000 (this limits giant prompts)
# Embeddings via Ollama (LightRAG also needs this)
EMBEDDING_BINDING=ollama
EMBEDDING_MODEL=bge-m3
EMBEDDING_BINDING_HOST=http://host.docker.internal:11434
EMBEDDING_DIM=1024
OLLAMA_EMBEDDING_NUM_CTX=32768
RERANK_BINDING=null
# Reasonable defaults (file-based storage + built-in vector store)
WORKING_DIR=/app/workspace
Logs and screenshots
Sorry, I shut the whole thing down and didn't capture the logs. Whole text of answer to "which items in the setup log were marked as DONE?" is below:
file Following settings 2 a general<SEP < SEP 1 JSON-based specific 4 following PUBLISH- entity < more the "PICK an entity's knowledge The "no "PENDING (no template document the "Document Current User
1 2 KEG
KPIKPG/KG 1 Document 1 entities 3<SEP - < JSON 2<SEP 4<THEKT-<SEP - 3 2<SEP < > "Technology</entities "file "<SE 4
The <entity <JSON (JSON<written</ entity<p 5
- "K} ---> } entity<SEP <description <the <& <title 1 4} } 'No</entity <most <KPI the <info <KPT<bringing <SE 2 0<SEP <data}
The "document
concise <KPGT -
The "<K <JSON You're 7<SEP (general <"Document" <user information in 1 "PATTER 3 1<SEP, type of "API <" "technology <its<written <" <any<SE the 'at sep <k<entity <K <k<SEP <kw 2 <any entity (related <this type of 4<SEP
#<currents<SEP
3 1 0 5 2 3 7 1 ----- <P 3 3 6 <KPG- entities' <K <K <what <entity <entity}<SEP <k ---<setup} <K<entity}<K</"NTL<ศร P <K<file <f the <tokens". <entities <KD <S <most <no <the<SEP-<related <no <your own <any <current <No<span the<the <<configuration<PAGEs" "KB<SEP"} "P <"document"}, 8 <"knowledge<w (SEP "SEP"}, "technology <SE <entity <<SEP"} 'file<SEP:<" <JSON", <category "<category</PATTER", "entities", "P" <SEP"} "last<Support <a <SEP"}, <a<SEP"} <file<s<topic} <file <the <>entities <file}<SEP} <<setup<SEP"}
---<SE <no<SEP</<entity <technology -<SEP"}, <any<SEP<file<b</<SEP}
and <t- <most <file". "P<file}.
file P"
custom---" "file "file<description}<path<s</entity <technology
<SEP "technology<i<b inst-sh, file <any<SEP</file <specific}<P<file"}, <Install"} <file<file <<some <the<category<installation"<em" "follow<wrong<a<Publish<spaned<SEP the <token</entity <<SEP"}
The file"
"tech</<file"}
<the<SEP"}
---<document" <P</file<s</entropy <last<Saved}<file"} <last</> <software
AP <SEP">>} <folder-<SEP <technology<description <A/<entity <section"<category</entity <file"}, " Technology":<SEP",<SEP}<P"><file"},{"description"}}, "follows
, < <the}<m<p}</t</entity}, <tr}<file <file</> } file <general<SEP} " /> <technology</entity <SEP"} "file"-</entity <a<SEP},<file</</tech-Technology<most <file}, <file <technology <file<s", <file"}, < SEP"><file}, <file"}, <file"}, <>file<span on the file <SEP} <" <technology
", description: <token"}
description":<SEP"} <familiar-} "category"} "file"}<file"}, <SEP", "file"}, <user", <<SEP",<file"<file"}, <SEP}<T-13-<entity<file"}, file Following settings 2 a general<SEP < SEP 1 JSON-based specific 4 following PUBLISH- entity < more the "PICK an entity's knowledge The "no "PENDING (no template document the "Document Current User
1 2 KEG
KPIKPG/KG 1 Document 1 entities 3<SEP - < JSON 2<SEP 4<THEKT-<SEP - 3 2<SEP < > "Technology</entities "file "<SE 4
The <entity <JSON (JSON<written</ entity<p 5
- "K} ---> } entity<SEP <description <the <& <title 1 4} } 'No</entity <most <KPI the <info <KPT<bringing <SE 2 0<SEP <data}
The "document
concise <KPGT -
The "<K <JSON You're 7<SEP (general <"Document" <user information in 1 "PATTER 3 1<SEP, type of "API <" "technology <its<written <" <any<SE the 'at sep <k<entity <K <k<SEP <kw 2 <any entity (related <this type of 4<SEP
#<currents<SEP
3 1 0 5 2 3 7 1 ----- <P 3 3 6 <KPG- entities' <K <K <what <entity <entity}<SEP <k ---<setup} <K<entity}<K</"NTL<ศร P <K<file <f the <tokens". <entities <KD <S <most <no <the<SEP-<related <no <your own <any <current <No<span the<the <<configuration<PAGEs" "KB<SEP"} "P <"document"}, 8 <"knowledge<w (SEP "SEP"}, "technology <SE <entity <<SEP"} 'file<SEP:<" <JSON", <category "<category</PATTER", "entities", "P" <SEP"} "last<Support <a <SEP"}, <a<SEP"} <file<s<topic} <file <the <>entities <file}<SEP} <<setup<SEP"}
---<SE <no<SEP</<entity <technology -<SEP"}, <any<SEP<file<b</<SEP}
and <t- <most <file". "P<file}.
file P"
custom---" "file "file<description}<path<s</entity <technology
<SEP "technology<i<b inst-sh, file <any<SEP</file <specific}<P<file"}, <Install"} <file<file <<some <the<category<installation"<em" "follow<wrong<a<Publish<spaned<SEP the <token</entity <<SEP"}
The file"
"tech</<file"}
<the<SEP"}
---<document" <P</file<s</entropy <last<Saved}<file"} <last</> <software
AP <SEP">>} <folder-<SEP <technology<description <A/<entity <section"<category</entity <file"}, " Technology":<SEP",<SEP}<P"><file"},{"description"}}, "follows
, < <the}<m<p}</t</entity}, <tr}<file <file</> } file <general<SEP} " /> <technology</entity <SEP"} "file"-</entity <a<SEP},<file</</tech-Technology<most <file}, <file <technology <file<s", <file"}, < SEP"><file}, <file"}, <file"}, <>file<span on the file <SEP} <" <technology
", description: <token"}
description":<SEP"} <familiar-} "category"} "file"}<file"}, <SEP", "file"}, <user", <<SEP",<file"<file"}, <SEP}<T-13-<entity<file"}, peer closed connection without sending complete message body (incomplete chunked read)
Additional Information
- LightRAG Version: image: ghcr.io/hkuds/lightrag:1.4.6
- Operating System: Host OS is Windows 11
- Python Version: From stated image
- Related Issues: