In this poll we ask for your opinion on how NumPy should work when arrays, NumPy scalars, and Python scalars are mixed.
The poll is divided into three parts:
- Asking you increasingly complex questions about what answer you expect from certain NumPy operations.
- Asking two trickier questions about possible design choices (somewhat independently of your preferred choices).
- Asking your opinion on the feasibility of these changes with respect to breaking backward compatibility.
Please answer these questions without checking the actual NumPy behavior (some of it will be explained). We are interested in what you would think the best possible behavior is.
Note that in the following, `type_arr` denotes a 1-D NumPy array, i.e.

```python
uint8_arr = np.arange(100, dtype=np.uint8)  # always 1-D for simplicity
float32_arr = np.linspace(0, 1, 100, dtype=np.float32)
```

and so on. All NumPy scalars will be written as `uint8(1)`, `float32(3.)`, etc. (I further use `int64` as the default integer, which is not true e.g. on Windows.)

Many of the questions target integers, because the issues are more pronounced there; however, similar considerations always exist for `float32` vs. `float64`.
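As a quick, illustrative check of the notation above (this only inspects how NumPy types plain Python numbers, not the promotion behaviour the poll asks about): the "default integer" is platform dependent, while Python floats map to `float64`.

```python
import numpy as np

# The default integer dtype is platform dependent: historically int64 on
# most Linux/macOS builds and int32 on Windows.
default_int = np.asarray(3).dtype
print(default_int)

# Python floats always map to float64.
default_float = np.asarray(3.0).dtype
print(default_float)
```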
(Votes in the following questions are public; you can edit votes by clicking on "show results".)
Part 1: Please mark your preferred result:
The following are some basic operations, please mark the result dtype that you would expect.
`uint8_arr + int32(3) == ?`

- `uint8_arr`
- `int32_arr`
- Something else.

0 voters
`uint8_arr + 3 == ?`

- `uint8_arr`
- `int64_arr` (default integer)
- Something else.

0 voters
`uint8(3) + 3 == ?`

- `uint8`
- `int64` (default integer)
- Python integer
- Something else.

0 voters
What happens if we add an `np.asarray()` call?

`uint8_arr + np.asarray(3) == ?`

- `uint8`
- `int64` (default integer)
- Something else.

0 voters
`float32(3) + 3.1415 == ?`

- `float32`
- `float64`
- Python float
- Something else.

0 voters
`uint8_arr + 3000 == ?`

- `uint8_arr` with overflow due to `uint8(3000)` overflowing
- An exception because 3000 does not fit into a `uint8`
- `int64_arr` (default integer)
- Something else.

0 voters
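For context on the overflow option above: a `uint8` can represent only the values 0–255, so a value like 3000 cannot be stored exactly, and when wrapping occurs it is reduced modulo 256. A small sketch of the arithmetic, in plain Python so as not to give away NumPy's actual behaviour:

```python
# uint8 can represent 0..255; wrapping reduces values modulo 2**8 == 256.
value = 3000
wrapped = value % 256
print(wrapped)  # 3000 - 11 * 256 == 184
```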
Python operators behave mostly the same as the corresponding NumPy functions. For example, `+` and `np.add` do largely the same thing. What do you expect in the following example?

`np.add(uint8_arr, 3) == ?`

- Identical results to `uint8_arr + 3` (whichever that is)
- The same result as `uint8_arr + np.asarray(3)`
- All are identical: `np.add(uint8_arr, 3) == uint8_arr + 3 == uint8_arr + np.asarray(3)`
- Something else.

0 voters
Finally, one tricky floating point comparison question (note that floating point equality is always prone to difficulties and is in many cases discouraged):

```python
float32(0.31) == 0.31
float32_arr([0.31]) == 0.31
```

- Both should return `True`. (For the array, that means `[True]`.)
- Both should return `False` (because `float64(float32(0.31)) != float64(0.31)`).
- The scalar case should return `False`, the array case `[True]`.
- Something else.

0 voters
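For background on why the second option above is even on the table (this much is stated in the option text itself, and concerns floating point representation rather than the promotion rules being polled): 0.31 has no exact binary representation, and rounding it to 32-bit precision gives a different value than rounding it to 64-bit precision, so the round-trip through `float32` does not recover 0.31:

```python
import numpy as np

# 0.31 rounded to float32 differs from 0.31 rounded to float64,
# so converting the float32 back up to float64 changes the value.
as_f32 = np.float32(0.31)
print(np.float64(as_f32) == np.float64(0.31))  # the two roundings differ
print(float(as_f32))
```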
Part 2: The tricky questions about possible design choices
Operators vs. Functions
Ignoring your answers to the previous questions, and given the following behaviour:

```python
uint32(3) + 4 == uint32
uint32(3) + np.asarray(4) == int64  # because `np.asarray(4)` is `int64`
```

what do you think would be acceptable for the following operation?

`np.add(uint32(3), 4) == ?`

- `np.add` must behave the same as `+`
- `np.add` must behave the same as `uint32(3) + np.asarray(4)`
- Either option seems acceptable

0 voters
Scalar behaviour
NumPy currently behaves the following way:

```python
# For integers:
uint8_arr + 3 == uint8_arr
uint8(3) + 3 == int64  # Both are scalars, so the Python integer "wins"

# Same for floats:
float32_arr + 3.1415 == float32_arr
float32(3) + 3.1415 == float64
```
If you answered that `uint8` and `float32` are the correct results in the scalar case, you may agree that this behaviour should be modified. However, such a change may silently break code or modify results in cases such as this:
```python
def analyze_value(scalar):
    return scalar * 3 + 6  # some operation written with Python scalars

data = np.load("array_containing_uint16")
value = data[0]
result = analyze_value(value)
```
In this case, the result would previously have been an `int64` (or `int32` on some systems), or a `float64` if `data` was a `float32` array. After the change, the result would be:

- a `uint16` (or an error), which may lead to incorrect results
- a `float32`, which will lead to reduced precision (in extreme cases, incorrect results are possible)
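To make the example above concrete without relying on a file on disk, here is a self-contained variant (the array contents are made up for illustration; the point is only that indexing a `uint16` array yields a NumPy `uint16` scalar, not a Python int, so the promotion rules under discussion decide the result dtype):

```python
import numpy as np

# Hypothetical stand-in for data loaded from low-precision storage.
data = np.array([5, 7, 9], dtype=np.uint16)

def analyze_value(scalar):
    return scalar * 3 + 6  # operation written with Python scalars

value = data[0]              # a NumPy uint16 scalar, not a Python int
result = analyze_value(value)
print(type(value).__name__)  # uint16
print(result)                # 21; the result *dtype* depends on the
                             # promotion rules under discussion
```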
Part 3: General opinion about changes:
We would like to change NumPy's behaviour to preserve types more faithfully, although depending on your choices above the details may differ.
The main expected backward compatibility issues are the following:
- Some floating point equality/comparison checks may behave differently due to different floating point precisions (compare the floating point equality question above).
- Some operations would return more precise values, which might be wrong occasionally or take up additional memory.
- "Scalar values" that come from low-precision storage (the problem above) may lead to wrong or less precise results.
Overall, we expect that a major version release would be necessary to signal the extent of these changes. In your opinion, are these changes feasible (and if only some are, which)?
Changes in floating point comparisons are acceptable:

(Note: "with conditions" here and below means that you think some specific things need to be done, for example an optional warning to find potential issues, or testing large downstream packages, etc.)

- in a major release
- with conditions
- not acceptable

0 voters
Increased precision is acceptable:

- in a major release
- with conditions
- not acceptable

0 voters
Reduced precision for "scalar values" coming from a storage array is viable:

- in a major release
- with conditions (e.g. an optional warning to check whether a script may be affected)
- only if an overflow warning occurs in all affected integer cases (requires "special operators")
- not acceptable

0 voters
("Special operators" means that `np.add(uint8(3), 4)` would behave differently from `uint8(3) + 4`, as in the question above.)