
[PROTOTYPE] Feature Suggestion: Graphrag update #8878


Open. adrianad wants to merge 23 commits into main.


adrianad (Contributor)

What problem does this PR solve?

It started with me wanting to know how many nodes and edges my graph had. Then the graph kept resetting halfway through parsing. I think there is a race condition: the graph gets deleted before it is rebuilt (set_graph() in ragflow/graphrag/utils.py), and the next task then merges into the empty graph (maybe an issue with the lock, I'm not sure). It happened with multiple task executors, but also with just one. Splitting up the process and separating it from parsing solves a couple of issues:

  1. Race conditions, by building the graph only once entity extraction is done
  2. An inefficient workflow that requires fully reparsing documents to create knowledge graphs
  3. Suboptimal community detection, which runs too often during parsing when it only needs to run once at the end
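The suspected failure mode can be replayed deterministically. This is a single-threaded simulation under stated assumptions, not RAGFlow code: GraphStore, merge, unsafe_interleaving and build_once are hypothetical names, and the delete-then-write order is only assumed to mirror what set_graph() does. It shows how a task that reads during the delete window merges into an empty graph and later overwrites everything else, and how building the graph once after all extraction avoids the shared read-modify-write altogether.

```python
from typing import Iterable, Optional, Set, Tuple

# (nodes, edges) as plain sets; a hypothetical stand-in for the real graph type.
Graph = Tuple[Set[str], Set[Tuple[str, str]]]


class GraphStore:
    """Hypothetical shared graph storage; not RAGFlow's API."""

    def __init__(self) -> None:
        self.graph: Optional[Graph] = (set(), set())

    def load(self) -> Graph:
        # A missing graph is read back as empty, which is what makes the gap harmful.
        if self.graph is None:
            return (set(), set())
        nodes, edges = self.graph
        return (set(nodes), set(edges))

    def delete(self) -> None:
        self.graph = None

    def write(self, graph: Graph) -> None:
        self.graph = graph


def merge(base: Graph, sub: Graph) -> Graph:
    """Union of nodes and edges."""
    return (base[0] | sub[0], base[1] | sub[1])


def unsafe_interleaving(store: GraphStore, sub_a: Graph, sub_b: Graph) -> Graph:
    """Replays, step by step, the assumed interleaving of two merge tasks A and B."""
    merged_a = merge(store.load(), sub_a)    # A: read + merge
    store.delete()                           # A: delete the old graph before rebuilding
    merged_b = merge(store.load(), sub_b)    # B: reads during the gap, sees an empty graph
    store.write(merged_a)                    # A: finishes its rebuild
    store.write(merged_b)                    # B: overwrites it with only its own entities
    return store.load()


def build_once(store: GraphStore, subgraphs: Iterable[Graph]) -> Graph:
    """Proposed approach: merge all extraction results locally, then write once."""
    graph: Graph = (set(), set())
    for sub in subgraphs:
        graph = merge(graph, sub)
    store.write(graph)
    return store.load()


if __name__ == "__main__":
    sub_a: Graph = ({"A1", "A2"}, {("A1", "A2")})
    sub_b: Graph = ({"B1"}, set())

    store = GraphStore()
    store.write(({"X"}, set()))              # graph built from earlier documents
    print(sorted(unsafe_interleaving(store, sub_a, sub_b)[0]))  # ['B1']: X, A1, A2 lost

    fresh = GraphStore()
    print(sorted(build_once(fresh, [({"X"}, set()), sub_a, sub_b])[0]))
    # ['A1', 'A2', 'B1', 'X']
```

A lock held across the whole read, merge and write would also close the gap; deferring the build until extraction has finished removes the need for per-task graph writes in the first place.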

Moved all GraphRAG functionality to the Knowledge Graph tab. The tab is always shown, and all tasks can be started from there:
  • Extract Entities
  • Build Graph
  • Resolve Entities
  • Detect Communities
  • Show graph statistics with the total number of nodes, edges and communities
  • Show the progress of all tasks
  • Visualize communities
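The task order in the tab (extract, build, resolve, detect communities, then report statistics) can be sketched as a four-phase pipeline. This is an illustrative sketch, not the PR's implementation: extract_entities is a toy extractor standing in for the LLM-based one, entity resolution is reduced to merging names that differ only in case, and community detection uses connected components as a simplified stand-in for a real algorithm such as Leiden. The progress callback is likewise hypothetical and only illustrates per-phase progress reporting.

```python
from typing import Callable, Dict, List, Set, Tuple

Edge = Tuple[str, str]
Progress = Callable[[str, float], None]


def extract_entities(doc: str) -> Tuple[Set[str], Set[Edge]]:
    """Toy extractor (hypothetical): capitalized words are entities and
    consecutive entities in a document are linked. A real one would call an LLM."""
    words = [w.strip(".,!?") for w in doc.split()]
    ents = [w for w in words if w[:1].isupper()]
    edges = {tuple(sorted((a, b))) for a, b in zip(ents, ents[1:]) if a != b}
    return set(ents), edges


def resolve_entities(nodes: Set[str], edges: Set[Edge]) -> Tuple[Set[str], Set[Edge]]:
    """Simplified resolution: merge entities whose names differ only in case."""
    canon = {n: n.lower() for n in nodes}
    merged_edges: Set[Edge] = set()
    for a, b in edges:
        x, y = sorted((canon[a], canon[b]))
        if x != y:  # drop self-loops created by merging
            merged_edges.add((x, y))
    return set(canon.values()), merged_edges


def detect_communities(nodes: Set[str], edges: Set[Edge]) -> Dict[str, str]:
    """Connected components via union-find, standing in for Leiden/Louvain."""
    parent = {n: n for n in nodes}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in edges:
        parent[find(a)] = find(b)
    return {n: find(n) for n in nodes}


def run_pipeline(docs: List[str], progress: Progress) -> Dict[str, int]:
    # Phase 1: per-document extraction; nothing shared is written here.
    extracted = []
    for i, doc in enumerate(docs, start=1):
        extracted.append(extract_entities(doc))
        progress("extract", i / len(docs))

    # Phase 2: build the graph once, after all extraction has finished.
    nodes: Set[str] = set()
    edges: Set[Edge] = set()
    for n, e in extracted:
        nodes |= n
        edges |= e
    progress("build", 1.0)

    # Phase 3: entity resolution on the complete graph.
    nodes, edges = resolve_entities(nodes, edges)
    progress("resolve", 1.0)

    # Phase 4: community detection, run once at the end.
    communities = detect_communities(nodes, edges)
    progress("communities", 1.0)

    # Statistics of the kind the tab displays.
    return {"nodes": len(nodes), "edges": len(edges),
            "communities": len(set(communities.values()))}


if __name__ == "__main__":
    docs = ["Alice met Bob in Paris.", "Carol visited Dave.", "BOB called Alice."]
    print(run_pipeline(docs, lambda phase, frac: print(f"{phase}: {frac:.0%}")))
    # {'nodes': 5, 'edges': 3, 'communities': 2}
```

Because phase 1 writes nothing shared, documents can be extracted in parallel or re-extracted individually, while the graph is written and communities are computed only once per run.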

[Screenshot]

Type of change

  • New Feature (non-breaking change which adds functionality)
  • Bug Fix (non-breaking change which fixes an issue)
  • Refactoring

Additional Notes

This is a working implementation that addresses the core issues, but the code still needs cleanup and refinement. I'm looking for feedback on the overall approach and architecture before spending time polishing the implementation details. I saw on the roadmap that you are also working on the GraphRAG part, so I don't know whether this is even needed.

Key improvements:

  • All GraphRAG tasks accessible from one location
  • Real-time progress tracking
  • Graph statistics display (nodes, edges, communities)
  • Community visualization
  • Can be performed independently after document parsing

dosubot (bot) added the labels size:XXL (This PR changes 1000+ lines, ignoring generated files.), ☯️ refactor, 🐞 bug and 💞 feature on Jul 16, 2025
adrianad (Contributor, Author)

More information about the potential race condition in #8882

yingfeng requested a review from KevinHuSh on July 16, 2025, 14:38
yingfeng (Member)

Thanks a lot, please fix the conflict~

adrianad (Contributor, Author)

Mainly, I'd like to know what you think of it before I put more time into it, since no one requested this feature.

KevinHuSh (Collaborator)

Thanks a lot!
A brilliant design in terms of the trade-off between usability and technical cost.

From the screenshot, I can't find a dialog showing the progress. Is there a place that shows the progress of every step?

adrianad (Contributor, Author)

The progress is shown under status in the top left. (Video is 2x speed)

Screencast.from.2025-07-17.14-27-18_2x.webm

KevinHuSh (Collaborator)

#8932

0zxq0 commented Jul 24, 2025

"I mainly want to know what you think of it before I put more time into it, since no one requested this feature."

I need this feature


4 participants