+
Skip to content

Conversation

eric-anderson
Copy link
Collaborator

  • Minor refactoring of opensearch_writer.py and utils.py to enable re-use.
  • Manual test to run against real opensearch and experiment.
  • Unit test to exercise implementation.

Next part will handle deletion/update in the source.

…ssing metadata on source or destination.

* Minor refactoring of opensearch_writer.py and utils.py to enable re-use.
* Manual test to run against real opensearch and experiment.
* Unit test to exercise implementation.

Next part will handle deletion/update in the source.
@eric-anderson eric-anderson requested a review from bsowell June 9, 2025 18:33
Copy link
Contributor

@bsowell bsowell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I went through this. I think since it's a separate connector, I'm not super worried -- we will be able to adjust as we add the other features like delete.

I was a bit confused about how to use this with DocSets like the other connectors. Is that possible/coming later?

to_be_loaded_groups[i].append(f)

for i, g in enumerate(to_be_loaded_groups):
root, splitter = self.sources[i]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've spent a while trying to figure out exactly what a splitter is supposed to be in this context -- perhaps just because "splitter" is used elsewhere in sycamore. In this case, would "explode" be the canonical splitter in, e.g., our standard docstore ingestion pipeline?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added an explanation of splitter to class, added pointer where you asked.



# Todo accept sources as docset and require it end with materialize.
class OpenSearchSync:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe this is what the todo is referring to, but I'm a bit confused about the interface here. This doesn't look like our other connectors. How do I use it if I have a DocSet and want to write to OpenSearch?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right now, you would materialize out, execute and run the connector. In the future I expect to add something that verifies you materialized as the last step, and then executes that and runs the reliable write. So you'd be able to write:
sycamore.init().read.whatever().ops().materialize("/tmp/example").write.reliable_opensearch(params)

return pid_to_parts

def os_client(self):
assert False
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this left over from debugging?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, and none of the tests use real opensearch so it wasn't caught.

@eric-anderson
Copy link
Collaborator Author

I went through this. I think since it's a separate connector, I'm not super worried -- we will be able to adjust as we add the other features like delete.

I was a bit confused about how to use this with DocSets like the other connectors. Is that possible/coming later?

Right now, you would materialize out, execute and run the connector. In the future I expect to add something that verifies you materialized as the last step, and then executes that and runs the reliable write. So you'd be able to write:
sycamore.init().read.whatever().ops().materialize("/tmp/example").write.reliable_opensearch(params)

@eric-anderson eric-anderson merged commit d41ac3e into main Jun 11, 2025
12 of 15 checks passed
@eric-anderson eric-anderson deleted the eric-opensearch-sync branch June 11, 2025 22:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants

点击 这是indexloc提供的php浏览器服务,不要输入任何密码和下载