API,MAINT: Reorganize array-wrap calling and introduce return_scalar
#25409
Conversation
This also deprecates any `__array_wrap__` which does not accept `context` and `return_scalar`.
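For downstream subclasses, adapting amounts to accepting the new keyword. A minimal sketch of a conforming signature (the class name and body here are illustrative, not taken from this PR):

```python
import numpy as np

class MyArray(np.ndarray):
    # After this change, NumPy passes return_scalar when calling
    # __array_wrap__; signatures that do not accept it are deprecated.
    def __array_wrap__(self, arr, context=None, return_scalar=False):
        if return_scalar:
            # NumPy decided this 0-d result should become a scalar;
            # arr[()] extracts the scalar from a 0-d array.
            return arr[()]
        return arr.view(type(self))
```

The `return_scalar=False` default also keeps the method callable on older NumPy versions, which pass only `arr` and `context`.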
@@ -700,7 +700,7 @@ PyArray_Round(PyArrayObject *a, int decimals, PyArrayObject *out)
 finish:
     Py_DECREF(f);
     Py_DECREF(out);
-    if (ret_int) {
+    if (ret_int && ret != NULL) {
Sorry, unrelated and a bit annoying to trigger. But the deprecations triggered the bug...
Hmm, seeing this I'm really not sure it is worth breaking every package that implemented `__array_wrap__`. Another idea, which does not break the signature (but might well break implementations), is to pass the information on via `context`. But there also remains something to be said for just more generally using array scalars, and letting that be the API break. It might just make code simpler rather than more complex...

p.s. Should stress that I haven't thought enough about this! I should really look more at your follow-up PR.
It would be nice not to require this, but...
No, because 0-D doesn't imply that NumPy should be returning a scalar; after all, that is what annoys us all the time! I would like to be able to decide, per operation, whether a scalar or a 0-d array is returned. It doesn't matter much which cases are which, so long as there is no absolute uniform rule that 0-d is always array or always scalar...
Might work out mostly, until it doesn't. Which was why I thought I would just pass it explicitly. So, I would love a different idea, but the only thing I have left is piggy-backing on `context`.
This probably doesn't fix the 32-bit issue unfortunately, only the Windows ones...
Force-pushed from abe3530 to 4846b1d.
FWIW, I have updated the PR.
Maybe we should still go with this? I think it may cause less breakage than a new argument... Though perhaps I'm overly worried; not sure how many projects in fact have an `__array_wrap__` at all.
Let's say this: we currently have two possible paths, one where a 0-d result is returned as an array and one where it is converted to a scalar (both are illustrated below). And so, to some degree, I really want to add the ability to pick between the two behaviors consciously for NumPy. Now, you could just not tell `__array_wrap__` which path applies, but then subclasses have to guess.
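A small illustration of the two paths as they exist in current NumPy (the `out=` contrast is my choice of example):

```python
import numpy as np

a, b = np.array(1.0), np.array(2.0)

# Path 1: a plain ufunc call on 0-d inputs converts the 0-d
# result to a scalar:
print(type(np.add(a, b)))            # <class 'numpy.float64'>

# Path 2: with an explicit out= array, the 0-d array is returned
# as-is, with no scalar conversion:
out = np.empty(())
print(type(np.add(a, b, out=out)))   # <class 'numpy.ndarray'>
```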
Well, in the sense of requiring less code change? Yes, for sure. In the sense of breaking things disregarding the deprecation warning, not sure? Since I think some things like pandas Series will run into it either way.
Right, and if I add another step which tries to pass `context`... I don't have a strong opinion; it seems tedious, but OTOH it seems bound to be annoying for someone to not be able to match NumPy arrays.
@mhvk sorry, need to come back to this, I think it is pretty important (if just to push the follow-up `array_converter` PR, gh-24214). We had discussed this in a meeting a while ago, and I don't think anyone had serious concerns with adding it. But I do value your opinion more than most when it comes to this.

To me, the new kwarg still just seems easier than any other polymorphism that tries to guess this without it, and it is only some libraries that need to adapt.
@seberg - sorry to have forgotten about this! Looking with a fresh eye, I think your solution is for the best -- everything I can think of has the risk of subtle breakage.
So, now just a proper review of the code -- which looks great! One genuine comment about whether forcing wrapping for reductions is correct (it looks like the old code did not). Though if forced wrapping for reductions is OK, then all the calls to `npy_apply_wrap` in fact use force, so the argument could be removed. Indeed, that suggests that if we do not want to force wrapping in reductions, we could just move the `if` statement to the reduction instead (or are you using it in the follow-up PR?).
numpy/_core/memmap.py
@@ -337,7 +337,7 @@ def __array_wrap__(self, arr, context=None):
             return arr
         # Return scalar instead of 0d memmap, e.g. for np.sum with
         # axis=None
-        if arr.shape == ():
+        if arr.shape == () and return_scalar:
This can just be `if return_scalar:`, right? Or does the calling code not ensure this is set only for 0-d arrays?
You are right, thanks. When I started I hadn't ensured that yet, but it does now (and I think it is much cleaner/better).
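A sketch of how the simplified memmap check could then look (abridged; the surrounding memmap logic is omitted, so this is not the full method):

```python
def __array_wrap__(self, arr, context=None, return_scalar=False):
    arr = super().__array_wrap__(arr, context)
    # Return a scalar instead of a 0-d memmap, e.g. for np.sum with
    # axis=None -- the caller now guarantees return_scalar is only
    # set for 0-d results, so no shape check is needed:
    if return_scalar:
        return arr[()]
    return arr
```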
 * @param inputs Original input objects
 * @param out_wrap Set to the python callable or None (on success).
 * @param out_wrap_type If not NULL, set to the type belonging to the wrapper
 *     or set to NULL.
From the code, this seems no longer true: `*out_wrap_type` is always set. This means the rest of the comment can go, which is probably for the better!
 * Wrapping is always done for ufuncs because the values stored will have been
 * mutated (guarantees calling array-wrap if any mutation may have occurred;
 * this is not true for Python calls, though).
 * UFuncs always call array-wrap with base-class arrays in any case.
I think removing the comment up to this point is OK. Or was this comment meant for `npy_apply_wrap`?
Removed; this was a leftover, and it is better and sufficiently covered by `force_wrap` now...
        arr = (PyArrayObject *)obj;
    }
    else {
        /* TODO: This branch should ideally be NEVER taken! */
Tests fail if you remove this? Might be worth an issue enumerating those (if not too many...).
I didn't really try, but the point is that as long as we convert 0-D arrays to scalars somewhat randomly, it would seem rather typical to call `wrap(a + b)`, and the result of that can be a scalar.

I will expand the comment on why, rather than investigating for now (just seems more useful to me).
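The situation is easy to reproduce from Python, which is why a generic wrap helper has to tolerate non-array input:

```python
import numpy as np

a, b = np.array(1.0), np.array(2.0)
res = a + b   # 0-d operands: the result is already converted to a scalar
print(isinstance(res, np.ndarray))   # False -- wrap(a + b) receives a scalar
print(type(res))                     # <class 'numpy.float64'>
```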
        return NULL;
    for (int out_i = 0; out_i < ufunc->nout; out_i++) {
        context.out_i = out_i;
        PyObject *original_out = NULL;
I'd make this a one-liner, but up to you!
PyObject *original_out = (full_args.out) ? PyTuple_GET_ITEM(full_args.out, out_i) : NULL;
    def __array_wrap__(self, arr, context=None):
        return 'pass'


class Test2:
Can inherit from `Test1` to allow removing the `__array__` method.
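Something like the following, assuming `Test1` defines both methods (its exact body is not shown in the diff, so the method bodies here are placeholders):

```python
import numpy as np

class Test1:
    def __array__(self, dtype=None, copy=None):
        return np.zeros(1)

    def __array_wrap__(self, arr, context=None):
        return 'pass'

class Test2(Test1):
    # Inherits __array__ from Test1; only the wrap under test differs.
    def __array_wrap__(self, arr, context=None, return_scalar=False):
        return 'pass'
```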
Thanks for the review; sorry I have not gotten back earlier. I think I addressed the issues (the largest one maybe being to try just passing `context`).

Forcing wrapping in the reduction path seemed right to me, but that said, I don't care about just keeping things as they were. It doesn't really belong here, except that I wanted to slowly try to standardize behavior more.
numpy/_core/src/umath/ufunc_object.c
    }

    PyObject *wrapped_result = npy_apply_wrap(
            (PyObject *)ret, out_obj, wrap, wrap_type, NULL,
            PyArray_NDIM(ret) == 0, NPY_TRUE);
I would say it is correct, but you are right, it is a subtle change. I find it odd to not do it for reductions but do it for ufuncs; that said, ufuncs are special also in passing `context`, so I would be happy to undo it.

(I think besides this, the lookup might be unnecessary sometimes, but nothing is changed.)
Nevermind, I switched to not do force-wrap, which I think leaves the reduction result unchanged. While I would like to align them, I need this more for the other PR, so I'll delay thinking about it.
Doing this due to gh-25635, since it is super confusing with the bad retrying...
gh-25635 was a good reason to chain exceptions, because this was always terrible (really it got no worse, except now you have a chance of getting to the real error). It shows nicely how that try/except can go wrong if it isn't specific enough.
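A generic sketch of the pattern (a hypothetical helper, not the actual NumPy code): keep the `except` narrow and chain, so the real failure survives in the traceback:

```python
def call_wrap(obj, value):
    try:
        # Try the new signature first.
        return obj.__array_wrap__(value, None, False)
    except TypeError as exc:
        # Retrying with the old signature can swallow the real error;
        # chaining with `from exc` keeps it visible in the traceback.
        try:
            return obj.__array_wrap__(value)
        except TypeError:
            raise TypeError(
                "__array_wrap__ failed with both signatures") from exc
```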
Mostly small comments about the logic in `npy_find_array_wrap` (and one ref-counting issue).
    for (int i = 0; i < nin; i++) {
        PyObject *obj = inputs[i];
        if (PyArray_CheckExact(obj)) {
            if (wrap == NULL || priority < 0) {
Priority cannot be `< 0` here, right? Should the initialization above be at `-inf`? A benefit of that is that I think the test for `wrap == NULL` would not be needed any more.
I'd also write `priority < NPY_PRIORITY` and use that below for `priority = NPY_PRIORITY` -- somewhat easier to read than having to know the priority of arrays is 0.
Priority can be `< 0`; this makes sense for example for `memmap` and is used there.

I had initially thought about it, and decided it seemed dubious. Maybe all it changes is that `-inf` really means that `__array_wrap__` should never be called, which would make sense (and probably nobody uses that anyway).

There is also the oddity that the old code just skipped ndarrays, which is worse, because negative wraps take precedence over ndarray! (Yes, this is a change here.)
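For reference, how `__array_priority__` plays out on the Python side (the `Wins` class is made up for illustration; memmap's negative priority is real):

```python
import numpy as np

class Wins(np.ndarray):
    __array_priority__ = 10.0   # higher than ndarray's 0.0

print(np.memmap.__array_priority__)   # -100.0: loses even to plain ndarray

a = np.arange(3.0).view(Wins)
b = np.arange(3.0)
print(type(a + b))                    # Wins -- highest priority gets to wrap
```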
(Will switch to `NPY_PRIORITY` so as not to hardcode the 0.)
But if you initialize `priority = 0`, doesn't that mean that `np.memmap.__array_wrap__` never gets selected? Is that OK?
The code here always uses the first one (if there is more than one). That is what the `NULL` check does.
Ah, that is nicer, though, of course! I should write `i == 0`; that is much clearer...

EDIT: The only reason I even initialized priority at all is that gcc incorrectly thinks it can be used uninitialized.
Wait, no, argh... that doesn't work. How about just adding a comment early on for what the `NULL` check does?

EDIT: Done. I do think we can tweak this, but as of now this is the smallest variation from what we had before (which was very strange and didn't agree with the Python side).
            }
        }
        else if (PyArray_IsAnyScalar(obj)) {
            if (wrap == NULL || priority < NPY_SCALAR_PRIORITY) {
Same here.
Force-pushed from dbb85c0 to 606f5c3.
    for (int i = 0; i < nin; i++) {
        PyObject *obj = inputs[i];
        if (PyArray_CheckExact(obj)) {
            if (wrap == NULL || priority < NPY_PRIORITY) {
I'm still confused: if you initialize `priority = -inf` (or some other very large negative number), will this line not just always be true if `wrap` is undefined? So why is the `wrap == NULL` needed?
Not if the object defines `priority = -inf`. Maybe that would be preferable, but 🤷.
Ah, I only need it on the last branch, yes. Would that seem clearer? (Then the last branch is still special; this way priority is only initialized to silence warnings.)
I think coding against something that defines `priority = -inf` yet defines `__array_wrap__` seems unnecessary! But if you really prefer what you have, that is fine too -- my own sense is that the logic becomes clearer without the extra conditions that are needed to grab the first argument unconditionally.
I don't disagree; I think I just wanted to match one of the implementations inside NumPy. Maybe it is laziness speaking: this way no user can run into this (not even hypothesis), and it's already there...?
OK, I think we've reached (more than) good enough!
You probably want to squash & merge, or reduce the number of commits a bit.
Thanks Marten, let me squash-merge this then, since I want to rebase the other PR on this work... After all, while I think it fixes a bunch of issues (at least mid-term) and opens avenues, it was really mostly a split-out from the other PR.
Nice, on to the helper! (Ping me when rebased - that one we went over a lot already, so it should be easy to do a final round of review.)
I thought NumPy 2 would be out by now. Wasn't expecting breaking changes even now...
I wish :(, although I hope it should be mostly good. You will certainly also notice gh-25168 (more than this).
This reorganizes how array-wrap is called. It might very mildly change the semantics for reductions, I think (and for negative priorities).

Overall, it now passes a new `return_scalar=False/True` when calling `__array_wrap__` and deprecates any array-wrap which does not accept `arr, context, return_scalar`.

I have not integrated it yet, but half the reason for the reorganization is to integrate it/reuse it in the `array_converter` helper PR (gh-24214), which stumbled over trying to make the scalar handling sane.

Forcing downstream to add `return_scalar=False` to the signature is a bit annoying, but e.g. our memory maps currently try to guess at it, which seems bad. I am hoping this can be part of making the scalar vs. array return more sane.

But, maybe mainly, I hope it consolidates things (together with gh-24124 mainly; if ufuncs were the only complex place we used this, it wouldn't matter much).