Deprecation of assigning/converting out-of-bounds Python integers

I am still a bit on the fence about a deprecation that is currently on NumPy main and scheduled for release with NumPy 1.24. The reason for this change is related to NEP 50: in the future, I want:

np.array(1, dtype=np.int8) + 5000

to raise an error, because this should (approximately) be the same as:

np.array(1, dtype=np.int8) + np.int8(5000)

Currently, the operation returns an int16 result instead. So the first example, np.int8(1) + 5000, should error, but for the explicit second one we do have a choice. The error seems convenient for the implementation, but it is not strictly necessary.
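
For illustration, this is the current (pre-NEP 50) behavior that the change targets (assuming the usual import numpy as np):

np.array(1, dtype=np.int8) + 5000  # today: int16 result (5001); should error under the new rules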

So, due to the above, we decided to deprecate all out-of-bounds integer assignments and conversions for Python integers. These (with some rare exceptions) worked previously. The main examples of things that will now fail are:

np.array([5999], dtype="int8")  # will fail
np.int16(5000000)  # will fail
# And assignment
arr = np.zeros(3, dtype="int8")
arr[0] = 50000  # will fail

# As well as the unsigned ones:
np.array([-1], dtype="uint8")  # will fail
np.uint8(-1)  # will fail

Note that NumPy does allow e.g. np.array(5000).astype(np.int8) and would continue to do so (it is an unsafe cast).
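
For comparison, a sketch of that explicit cast, which stays allowed and simply wraps modulo 2**8:

np.array(5000).astype(np.int8)  # -120: unsafe cast wraps instead of raising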
The reason I am a bit unsure is that I think this change doesn't affect libraries much, but libraries are normally the most likely to give feedback on failures.

So, I am bringing this up again as a poll, because formulating an opinion is hard, but gut feelings aggregated in a poll may be good information:

Do you agree with deprecating out-of-bounds integers?
  • Yes, fully agree
  • Yes, but I am unsure
  • No, but I don’t expect issues
  • No, strongly disagree


One thing that you sometimes see is the use of -1 together with unsigned integers (to get the maximum integer). Assuming we do the deprecation, we could exempt the scalar creation functions such as np.uint8(-1), np.uint16(-10), etc.
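
For context, this is what those scalar constructors do today (they wrap modulo 2**bits):

np.uint8(-1)    # currently 255, i.e. np.iinfo(np.uint8).max
np.uint16(-10)  # currently 65526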

Should e.g. np.uint8(-1) be an exception (assuming the deprecation happens)?
  • No exception: It is surprising and not helpful enough
  • Exception for small negative integers
  • Allow even np.uint8(300) == np.uint8(44) (current behavior)


I’m interpreting this as if it had an “even” present:

i.e. an exception for all negative integers, including even small negative integers like -1. No special case; this is not special enough (Zen of Python).

Yeah, but the next line says:

Although practicality beats purity.

API design (and this is part of it) is still an art, unfortunately :).

If a lot of users rely on something like np.uint8(-1), breaking them isn’t nice. But of course you can argue that things like np.uint8(-1 % 2**8) and np.iinfo(np.uint8).max are clear enough workarounds in practice.
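
A small sketch of those workarounds, all of which keep working under the deprecation:

np.uint8(-1 % 2**8)            # 255: wrap explicitly in Python first
np.iinfo(np.uint8).max         # 255: ask for the dtype's maximum directly
np.array(-1).astype(np.uint8)  # 255: explicit (unsafe) cast remains allowed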

So the point is that np.int8(5000) errors before the addition, right? That sounds like a good change, but I’m not sure about erroring on any kind of overflow, like

np.int8(100) + np.int8(100)

By the way, maybe this is mentioned in the NEP, but are there plans to change this behavior?

>>> np.array([2**100])
array([1267650600228229401496703205376], dtype=object)

Yes, nobody is talking about normal overflows during operations, although we do want overflow warnings for scalars (the only current exception is powers, and I think we can fix that before NEP 50 is adopted).
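
For reference, this is what a normal scalar overflow looks like today, and that behavior is not part of this deprecation:

np.int8(100) + np.int8(100)  # -56, with a RuntimeWarning about the overflow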

Note that if you write an operation like np.int8(1) + 5000, there is no need for it to behave identically to np.int8(1) + np.int8(5000) (in the sense of how/if it errors).

No, I tried to keep the scope a bit smaller: NEP 50 — Promotion rules for Python scalars — NumPy Enhancement Proposals

So yes and no, since of course np.uint8(100) + 2**100 would error out, I think, and you would have to convert the NumPy integer to make it work.
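
i.e. something along these lines (my reading of it):

np.uint8(100) + 2**100       # would error out, I think: 2**100 fits no integer dtype
int(np.uint8(100)) + 2**100  # converting to a Python int keeps the arithmetic in Python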

To be clear, I wouldn’t mind getting rid of that behavior. It seems like the right thing to me. But I tried to write NEP 50 to be a bit more minimal. There are enough things to worry about without the issue of what np.uint8(-1) should do, or whether large integers can cause object arrays to be created.
(For example, I think it is still not 100% clear whether limiting the “weak” behavior to the operators (e.g. +) might not be the more consistent choice, even if it removes some consistency between np.add and +. And that is even though the actual implementation may have to live in np.add, at least for now.)

IMO clear communication about replacing things like np.uint8(-1) with things like np.iinfo(np.uint8).max is the best approach. It makes the source code of downstream packages more explicit / less magic, reducing effort for future maintainers. Well worth the one-time cost of fixing the new errors.
