Adding clustering and groupby in luna planner part #1183

bohou-aryn · 2025-02-13T21:31:04Z

No description provided.

baitsguy

Code looks good, some clarifying questions

baitsguy · 2025-02-13T23:13:35Z

lib/sycamore/sycamore/data/document.py

        if "lineage_id" not in self.data:
            self.update_lineage_id()

+    def get_by_path(self, path: str):


is this the same as field_to_value(..)?

They seem similar; I'll try to reuse that one.
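For context on the reuse question, a dotted-path getter of this kind usually reduces to a few lines. The sketch below is a hypothetical stand-alone helper over plain nested dicts, not the actual `get_by_path` or `field_to_value` from Sycamore; the name `lookup_path` and the dot-separated path format are assumptions made for illustration.

```python
from collections.abc import Mapping
from typing import Any


def lookup_path(data: Mapping, path: str, default: Any = None) -> Any:
    """Follow a dot-separated path through nested mappings.

    Hypothetical helper for illustration only. Returns `default` when a
    segment is missing or an intermediate value is not a mapping.
    """
    current: Any = data
    for key in path.split("."):
        if not isinstance(current, Mapping) or key not in current:
            return default
        current = current[key]
    return current


# Example document shaped like a nested dict (field names are made up).
doc = {"properties": {"entity": {"location": "Seattle"}}}
found = lookup_path(doc, "properties.entity.location")
missing = lookup_path(doc, "properties.missing.key", default="n/a")
```

If two helpers in the codebase both implement this traversal, keeping a single one avoids the two drifting apart on edge cases such as missing keys.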

baitsguy · 2025-02-14T18:28:49Z

lib/sycamore/sycamore/docset.py

+            return {"vector": doc.embedding, "cluster": -1} if field_name is None else {"vector": doc[field_name], "cluster": -1}

-        embeddings = self.plan.execute().map(init_embedding).materialize()
+        embeddings = self.plan.execute().filter(filter_meta).map(init_embedding).materialize()


what's the materialize for? also can we not do

self.plan.filter(filter_meta).map(init_embedding).execute()

I think that'll use the Ray versions of the ops, right? So theoretically faster.

You can also just pull out the filter_meta method somewhere.

That is for reusing the embeddings; otherwise they would be computed twice from the beginning. I would add an option for configuring this.
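The trade-off discussed in this thread can be illustrated with a toy lazy pipeline. This is only a sketch under stated assumptions: `LazyPipeline`, `fake_embed`, `is_data_row`, and the `is_metadata` key are all hypothetical stand-ins and do not reflect how Sycamore or Ray implement `filter`, `map`, or `materialize`.

```python
from typing import Callable, Iterable, List


class LazyPipeline:
    """Toy lazy pipeline: stages run only when results are requested."""

    def __init__(self, source: Callable[[], Iterable], stages=None):
        self._source = source
        self._stages = list(stages or [])

    def filter(self, pred):
        return LazyPipeline(self._source, self._stages + [("filter", pred)])

    def map(self, fn):
        return LazyPipeline(self._source, self._stages + [("map", fn)])

    def take_all(self) -> List:
        # Re-runs every stage from the source on each call.
        rows = list(self._source())
        for kind, fn in self._stages:
            if kind == "filter":
                rows = [r for r in rows if fn(r)]
            else:
                rows = [fn(r) for r in rows]
        return rows

    def materialize(self) -> "LazyPipeline":
        # Run the stages once and serve the cached rows afterwards.
        cached = self.take_all()
        return LazyPipeline(lambda: cached)


embed_calls = 0


def fake_embed(row):
    """Stand-in for an expensive embedding step; counts invocations."""
    global embed_calls
    embed_calls += 1
    return {"vector": [float(row["id"])], "cluster": -1}


def is_data_row(row):
    """Hypothetical predicate playing the role of filter_meta."""
    return not row.get("is_metadata", False)


rows = [{"id": 1}, {"id": 2, "is_metadata": True}, {"id": 3}]
lazy = LazyPipeline(lambda: rows).filter(is_data_row).map(fake_embed)

# Two consumers without caching: the embedding step runs twice per row.
lazy.take_all()
lazy.take_all()
calls_without_cache = embed_calls

# Two consumers after materializing: the embedding step runs once per row.
embed_calls = 0
cached = lazy.materialize()
cached.take_all()
cached.take_all()
calls_with_cache = embed_calls
```

In this toy model, filtering before the map also keeps the metadata row from ever reaching the embedding step, which is the ordering the diff introduces.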

baitsguy · 2025-02-14T18:30:42Z

lib/sycamore/sycamore/grouped_data.py

        dataset = self._docset.plan.execute()
-        grouped = dataset.map(Document.from_row).groupby(self._key)
+
+        def filter_meta(row):


pull out somewhere

baitsguy · 2025-02-14T18:34:29Z

lib/sycamore/sycamore/query/execution/sycamore_operator.py

+        raise Exception("New Top K not implemented for codegen")
+
+
+class SycamoreEmbed(SycamoreOperator):


What's this for? I don't think the planner uses it. My current thinking is that this (embedding) should just be an implementation detail within group_by etc.

Putting it inside groupby means that every time topk runs, it would redo this embedding over all items. I'm not sure what role Luna plays here; does it also write?

baitsguy · 2025-02-14T18:35:52Z

lib/sycamore/sycamore/query/execution/sycamore_operator.py

        return result, []


+class SycamoreNewTopK(SycamoreOperator):


Let's discuss the new operators we want to add.

I'd imagine we want: topK, groupBy

We should just be able to nuke the existing topK rather than adding a new one

let's call this GroupByCount
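As a rough illustration of what a GroupByCount-style operation computes, the sketch below groups rows by a key, counts each group, and optionally keeps the K largest groups. It is not the planner operator from this PR; `group_by_count`, the `cluster` field name, and the plain-dict rows are assumptions for the example.

```python
from collections import Counter
from typing import Hashable, Iterable, List, Mapping, Optional, Tuple


def group_by_count(
    rows: Iterable[Mapping], key: str, k: Optional[int] = None
) -> List[Tuple[Hashable, int]]:
    """Count rows per distinct value of `key`, largest groups first.

    Rows that lack `key` are skipped. With k=None every group is returned.
    Hypothetical helper for illustration only.
    """
    counts = Counter(row[key] for row in rows if key in row)
    return counts.most_common(k)


# Rows as they might look after clustering assigned a label to each one.
clustered = [
    {"id": 1, "cluster": 0},
    {"id": 2, "cluster": 1},
    {"id": 3, "cluster": 0},
    {"id": 4, "cluster": 2},
    {"id": 5, "cluster": 0},
    {"id": 6, "cluster": 1},
]
all_groups = group_by_count(clustered, "cluster")
top_two = group_by_count(clustered, "cluster", k=2)
```

Seen this way, topK is a group-and-count followed by a truncation, which is consistent with the suggestion to replace the existing topK rather than keep two variants.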

lib/sycamore/sycamore/docset.py

baitsguy · 2025-02-14T23:12:13Z

lib/sycamore/sycamore/query/operators/top_k.py

    could be 'Form groups of different food'"""
+
+
+class NewTopK(Node):


GroupByCount

bohou-aryn requested review from baitsguy and bsowell February 13, 2025 21:31

bohou-aryn force-pushed the groupby branch from f0aeb51 to 274518e Compare February 14, 2025 16:04

baitsguy reviewed Feb 14, 2025

bohou-aryn force-pushed the groupby branch 2 times, most recently from 4985dd6 to 6a1694f Compare February 19, 2025 19:41

baitsguy approved these changes Feb 19, 2025

bohou-aryn force-pushed the groupby branch from 6a1694f to a4533f3 Compare February 19, 2025 21:38

Adding clustering and groupby in luna executor part

68206a1

bohou-aryn force-pushed the groupby branch from a4533f3 to 68206a1 Compare February 19, 2025 21:44

bohou-aryn merged commit 5c1ce95 into main Feb 19, 2025
10 of 15 checks passed
