+
Skip to content

Conversation

austintlee
Copy link

No description provided.

assert doc.doc_id, "Retrieved invalid doc with missing doc_id"
if not doc.parent_id:
if query_params.document_cache is not None:
cached_doc = query_params.document_cache.get(f"{index_name}:{doc.doc_id}")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we probably want to move this get_key logic to a shared method?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes.

# doc_ids = list(unique_docs.keys())
doc_ids = [d for d in list(unique_docs.keys()) if d not in list(cached_docs.keys())]

# We can't safely exclude embeddings since we might need them for 'rerank', e.g.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to do embedding style reranking locally? Won't the knn result just be ordered? Or do we expect to reorder based on something else after

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. When I wrote the comment, I thought we use embeddings for reranking. In general, though, I don't know if we can safely drop them in the context of document reconstruct. Maybe document reconstruct should be a set of options not just a boolean.

doc.elements.sort(key=lambda e: e.element_index if e.element_index is not None else float("inf"))
if doc.doc_id not in cached_docs:
doc.elements.sort(key=lambda e: e.element_index if e.element_index is not None else float("inf"))
if query_params.document_cache is not None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can cached_docs have anything if document_cache is None?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It'll be an empty dict.

query: Optional[Dict] = None,
reconstruct_document: bool = False,
query_kwargs: dict[str, Any] = {},
query_kwargs=None,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you change to an optional? Just for consistency

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants

点击 这是indexloc提供的php浏览器服务,不要输入任何密码和下载