
bohou-aryn
Collaborator

No description provided.

self._docset = docset
self._key = key

def aggregate(self, f) -> DocSet:
Contributor

Should we consider subclassing DocSet or using a different type since most of our transforms aren't going to work with the new document structures? I think materialize on this won't work? @eric-anderson ?

Collaborator

HenryL27 left a comment

Any plans for how to implement this in ExecMode.LOCAL?
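
For context, local-mode grouping could in principle be a plain in-memory pass over the documents. The following is only a minimal sketch under stated assumptions, not the PR's implementation: `group_and_count` is a hypothetical helper, and documents are modelled as plain dicts with a `properties` mapping rather than sycamore `Document` objects.

```python
from collections import Counter
from typing import Any, Dict, Iterable, List


def group_and_count(docs: Iterable[Dict[str, Any]], key: str) -> List[Dict[str, Any]]:
    """Count documents per value of properties[key] (hypothetical helper)."""
    # Counter is a dict subclass, so distinct values keep first-seen order.
    counts = Counter(doc["properties"][key] for doc in docs)
    # Emit one summary document per distinct key value.
    return [{"properties": {key: value, "count": n}} for value, n in counts.items()]
```

A sketch like this holds every group in memory, which is likely acceptable for local execution on small inputs but would not replace the distributed path.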

self._docset = docset
self._key = key

def aggregate(self, f) -> DocSet:
Collaborator

Can I get a type for f?

Collaborator Author

Yes, I'll add one.

dataset = self._docset.plan.execute().map(Document.from_row)
grouped = dataset.groupby(self._key)
aggregated = grouped.aggregate(f)
m = aggregated.take()
Collaborator

What happens to m?

Collaborator Author

It was used for debugging; I'll remove it.

Comment on lines +18 to +24
def to_doc(row: dict):
    count = row.pop("count()")
    doc = Document(row)
    properties = doc.properties
    properties["count"] = count
    doc.properties = properties
    return doc.to_row()
Collaborator

What's actually in a row at this point? These are Ray ops, right?
Also, why not just doc.properties['count'] = count?
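
Whether the shorter form works depends on whether the `properties` getter returns the stored dict or a copy. The sketch below uses two hypothetical stand-in classes (`LiveDoc` and `CopyDoc`, neither of which is sycamore's `Document`) to illustrate the difference; which behaviour `Document` actually has is not shown in this thread.

```python
class LiveDoc:
    """Stand-in whose `properties` getter returns the stored dict itself."""

    def __init__(self, data):
        self.data = data

    @property
    def properties(self):
        return self.data.setdefault("properties", {})


class CopyDoc(LiveDoc):
    """Stand-in whose `properties` getter returns a copy of the stored dict."""

    @property
    def properties(self):
        return dict(self.data.get("properties", {}))

    @properties.setter
    def properties(self, value):
        self.data["properties"] = value


live = LiveDoc({"properties": {}})
live.properties["count"] = 3  # in-place mutation reaches the stored dict

lost = CopyDoc({"properties": {}})
lost.properties["count"] = 3  # mutates a throwaway copy; nothing is stored

kept = CopyDoc({"properties": {}})
props = kept.properties
props["count"] = 3
kept.properties = props  # get, modify, assign back: the pattern used in to_doc
```

If the getter returns the live dict, the one-line form is enough; the three-step pattern is only needed when the getter hands back a copy.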

from sycamore.data import Document


class GroupedData:
Collaborator

Does this really belong in the transforms module? Might be a top-level thing.

Collaborator Author

Yes, I might move it out.

Contributor

baitsguy left a comment

Please address Henry's comments; this is good to merge after that.

@bohou-aryn
Collaborator Author

Merged in #1123.

@bohou-aryn bohou-aryn closed this Jan 30, 2025
3 participants
