-
Notifications
You must be signed in to change notification settings - Fork 65
add groupby #1115
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add groupby #1115
Conversation
self._docset = docset | ||
self._key = key | ||
|
||
def aggregate(self, f) -> DocSet: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we consider subclassing DocSet or using a different type since most of our transforms aren't going to work with the new document structures? I think materialize on this won't work? @eric-anderson ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any plans for how to implement this in ExecMode.LOCAL
?
self._docset = docset | ||
self._key = key | ||
|
||
def aggregate(self, f) -> DocSet: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can I get a type for f
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, would add one
dataset = self._docset.plan.execute().map(Document.from_row) | ||
grouped = dataset.groupby(self._key) | ||
aggregated = grouped.aggregate(f) | ||
m = aggregated.take() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what happens to m
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
used for debug, would remove.
def to_doc(row: dict): | ||
count = row.pop("count()") | ||
doc = Document(row) | ||
properties = doc.properties | ||
properties["count"] = count | ||
doc.properties = properties | ||
return doc.to_row() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's actually in a row at this point? These are ray ops, right?
Also why not just doc.properties['count'] = count
?
from sycamore.data import Document | ||
|
||
|
||
class GroupedData: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this really belong in the transforms
module? Might be a top-level thing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, might move it out.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can address Henry's comments but good to merge after
merged in #1123 |
No description provided.