Contributor

@mdwelsh mdwelsh commented Nov 18, 2024

This should fix the issue where SummarizeData leaves us with an incomplete materialize dir when the total doc size exceeds the LLM token limit.

# so that the materialized data is complete, even if they are not all included
# in the input prompt to the LLM.
for di, doc in enumerate(result.take_all()):
    if isinstance(doc, MetadataDocument):
Collaborator

You shouldn't need the `MetadataDocument` check; `take_all` already removes those.

# For query result caching in the executor, we need to consume the documents
# so that the materialized data is complete, even if they are not all included
# in the input prompt to the LLM.
for di, doc in enumerate(result.take_all()):
Collaborator

This works at small scale, but `take_all()` pulls every document into memory at once, so it will blow up at large scale.
Approving since this is NTSB-only, but I suggest adding a TODO.
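
One possible shape for that TODO, as a minimal sketch in plain Python rather than Sycamore code: consume the documents as a stream so that upstream side effects (such as writing materialized data) still run for every document, while only the documents that fit the prompt budget are retained. `read_and_materialize`, `build_prompt`, and `max_chars` are hypothetical names used for illustration, and a character count stands in for a token count. Whether Sycamore exposes a streaming alternative to `take_all` is not established in this thread.

```python
from typing import Iterable, Iterator, List


def read_and_materialize(docs: Iterable[str], materialized: List[str]) -> Iterator[str]:
    """Hypothetical stand-in for a pipeline stage that caches each document as it is consumed."""
    for doc in docs:
        materialized.append(doc)  # side effect: simulates writing to the materialize dir
        yield doc


def build_prompt(doc_stream: Iterable[str], max_chars: int) -> str:
    """Drain the whole stream, keeping only the documents that fit within max_chars."""
    parts: List[str] = []
    used = 0
    budget_full = False
    for doc in doc_stream:
        if not budget_full and used + len(doc) <= max_chars:
            parts.append(doc)
            used += len(doc)
        else:
            # Stop adding to the prompt, but keep iterating so the
            # upstream stage still processes every remaining document.
            budget_full = True
    return "\n".join(parts)


materialized: List[str] = []
prompt = build_prompt(
    read_and_materialize(["aaaa", "bbbb", "cccc"], materialized),
    max_chars=8,
)
```

Here only the first two documents end up in `prompt`, while `materialized` still receives all three, and at most the retained documents are held in memory at once.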

)

# First run should populate cache.
executor = SycamoreExecutor(context, cache_dir=temp_dir)
Collaborator

Is this test fast? If so, it's fine to leave it in the unit tests, but if it takes more than 5-10s, I'd like it moved to the integration tests. I initially thought it would use Ray (which basically guarantees it's slow), but I'm no longer sure.

@mdwelsh mdwelsh merged commit 29115ef into main Nov 20, 2024
11 of 14 checks passed
@HenryL27 HenryL27 deleted the matt/fix-summarize branch August 30, 2025 00:03