Status: Open
Labels: Core, discuss, enhancement, tracked
Background
LightRAG currently merges entities solely based on exact name matches (including captions). This results in multiple disconnected nodes for the same entity under different names, and may even create isolated subgraphs for identical entities, ultimately degrading query performance.
Automated Entity Merging for Variant Names
To address this, we propose an automated entity merging approach for differently named but identical entities:
- Vector Node Database Utilization:
  - Modify the node vector DB implementation to store the embedding vector of each entity name.
- Similarity Threshold Configuration:
  - Set a minimum cosine similarity threshold (e.g., 0.8) for candidate selection.
- Candidate Retrieval:
  - During merging, retrieve the top 10 most similar nodes by cosine similarity, keeping only those above the threshold.
- LLM-Based Merge Validation:
  - Submit the current entity’s name and description, along with the candidate entities’ names and descriptions, to an LLM.
  - Task the LLM to:
    - Determine whether merging is justified.
    - If merging is approved, select the best candidate for merging and return the consolidated entity name and description.
- Iterative Merging With Depth Limitation (optional):
  - Repeat the merge validation process for the newly consolidated entity returned by the LLM, up to a configured maximum depth.
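To make the proposal concrete, here is a minimal sketch of the steps above in plain Python. It is not LightRAG code: `retrieve_candidates`, `merge_entity`, `embed`, `judge`, and `max_depth` are hypothetical names chosen for illustration, in-memory dicts stand in for the node vector DB, and `judge` is a placeholder for the LLM call (it returns `None` to reject a merge, or the chosen candidate's name plus the consolidated entity).

```python
import math
from typing import Callable, Optional

Vector = list[float]
Entity = dict[str, str]  # assumed shape: {"name": ..., "description": ...}
# Hypothetical stand-in for the LLM: returns None (no merge) or
# (name of the chosen candidate, consolidated entity).
Judge = Callable[[Entity, list[Entity]], Optional[tuple[str, Entity]]]


def cosine_similarity(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve_candidates(
    query_vec: Vector,
    name_vectors: dict[str, Vector],
    threshold: float = 0.8,
    top_k: int = 10,
) -> list[tuple[str, float]]:
    """Return up to top_k (name, similarity) pairs at or above the threshold."""
    scored = [(n, cosine_similarity(query_vec, v)) for n, v in name_vectors.items()]
    kept = [pair for pair in scored if pair[1] >= threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:top_k]


def merge_entity(
    entity: Entity,
    entities: dict[str, Entity],
    name_vectors: dict[str, Vector],
    embed: Callable[[str], Vector],
    judge: Judge,
    threshold: float = 0.8,
    top_k: int = 10,
    max_depth: int = 3,
) -> Entity:
    """Merge a new entity into the store, re-validating after each merge."""
    current = entity
    for _ in range(max_depth):  # depth limit for iterative merging
        hits = retrieve_candidates(embed(current["name"]), name_vectors, threshold, top_k)
        candidates = [entities[n] for n, _ in hits if n != current["name"]]
        if not candidates:
            break
        decision = judge(current, candidates)
        if decision is None:  # the judge found no justified merge
            break
        chosen_name, merged = decision
        # The chosen candidate is absorbed into the consolidated entity.
        entities.pop(chosen_name)
        name_vectors.pop(chosen_name)
        current = merged
    entities[current["name"]] = current
    name_vectors[current["name"]] = embed(current["name"])
    return current
```

A real implementation would query the vector DB instead of scanning a dict, and would also need to re-point graph edges from the absorbed node to the consolidated one, which this sketch leaves out.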