Sure, and I assume you can speed up `uarray` a bit also. But right now, we seem to be, at best, at a potential “typical case” overhead of 4 * N(backends) slower than what seems easy for a dispatcher that has all info available (i.e. a strict-type dispatcher similar to `singledispatch`):
```python
from functools import singledispatch
import numbers


def plain(arg, arg2, arg3, arg4, arg5):
    return "plain", arg, arg2, arg3, arg4, arg5


@singledispatch
def disp(arg, arg2, arg3, arg4, arg5):
    return "base-impl", arg, arg2, arg3, arg4, arg5


# Super fast of course, ~73ns for me:
%timeit plain(1, 2, 3, 4, 5)
# Much slower: the wrapping adds ~320ns even if there is only one choice :).
%timeit disp(1, 2, 3, 4, 5)


@disp.register(numbers.Integral)
def _(arg, arg2, arg3, arg4, arg5):
    return "integral", arg, arg2, arg3, arg4, arg5


@disp.register(list)
def _(arg, arg2, arg3, arg4, arg5):
    return "list", arg, arg2, arg3, arg4, arg5


# Dispatch has to do more "work" now, but the cost is constant no matter
# which implementation we take: the additional work is only ~40ns and is
# a dict lookup, so we stay basically constant at ~360ns overhead.
%timeit disp(1, 2, 3, 4, 5)
%timeit disp("base", 2, 3, 4, 5)
```
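(The `%timeit` lines above are IPython magics. For reference, the same comparison can be scripted outside IPython with the stdlib `timeit` module; the absolute numbers will of course vary by machine:)

```python
import timeit
from functools import singledispatch


def plain(arg, arg2, arg3, arg4, arg5):
    return "plain", arg, arg2, arg3, arg4, arg5


@singledispatch
def disp(arg, arg2, arg3, arg4, arg5):
    return "base-impl", arg, arg2, arg3, arg4, arg5


# Average per-call time in nanoseconds for each variant.
n = 100_000
t_plain = timeit.timeit(lambda: plain(1, 2, 3, 4, 5), number=n) / n * 1e9
t_disp = timeit.timeit(lambda: disp(1, 2, 3, 4, 5), number=n) / n * 1e9
print(f"plain: {t_plain:.0f} ns/call, disp: {t_disp:.0f} ns/call")
```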
And this is pure Python using `*args, **kwargs`. So you could probably knock off more than 100ns by moving to C! Let's say the “relevant args” part adds ~100ns if it is done by a helper (as in `__array_function__` and `uarray`, but it seems like this may be a convenient API in any case), and I guess you may need to add up to 20ns for every additional “relevant arg”.
That leaves us at <500ns overhead, with lots of potential to speed things up (some of which would require technical deep dives though)! I would not be surprised if <200ns is possible in the vast majority of cases.
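To make the “relevant args” helper idea concrete, here is a minimal sketch (all names, e.g. `dispatch_on`, are invented for illustration; this is not uarray's or NumPy's actual API) of a decorator where a separate dispatcher function extracts only the arguments whose types matter, similar in spirit to how `__array_function__` dispatchers work:

```python
from functools import wraps


def dispatch_on(relevant_args_func):
    # Hypothetical helper: a user-supplied "dispatcher" picks out the
    # relevant arguments, and only their types are inspected.
    def decorator(func):
        implementations = {}

        @wraps(func)
        def wrapper(*args, **kwargs):
            # Only look at the relevant args, not everything.
            for arg in relevant_args_func(*args, **kwargs):
                impl = implementations.get(type(arg))
                if impl is not None:
                    return impl(*args, **kwargs)
            return func(*args, **kwargs)

        def register(typ):
            def inner(impl):
                implementations[typ] = impl
                return impl
            return inner

        wrapper.register = register
        return wrapper
    return decorator


# Usage: only `a` and `b` are relevant for dispatching, `out` is ignored.
@dispatch_on(lambda a, b, out=None: (a, b))
def add(a, b, out=None):
    return "default-add"


@add.register(list)
def _(a, b, out=None):
    return "list-add"
```

Unlike `singledispatch`, this toy version does an exact `type()` lookup and ignores subclasses and ABCs; a real implementation would walk the MRO, which is exactly the kind of constant extra work discussed above.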
Yeah, you can speed things up with additional API. But if that requires backend buy-in, it is harder to add at some future point than it is to add now (you have to fix all backends, rather than one frontend). And once we have the API for type-based “pruning”, it seems to me that we probably do not need `__ua_convert__` at all for type dispatching!
(There may be some convenience in supporting auto-conversion or some type of “replacer API”, but that seems like a feature that is completely independent of dispatching itself – at least of most/typical dispatching.)
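As a sketch of what type-based “pruning” without a `__ua_convert__` step might look like (names invented, heavily simplified): backends declare up front which types they can handle, and selection becomes a plain `isinstance` filter over the relevant arguments:

```python
# Hypothetical sketch: backends declare the types they handle, and the
# dispatcher prunes by type instead of asking every backend to convert.
class Backend:
    def __init__(self, name, types):
        self.name = name
        self.types = types  # tuple of types this backend can handle


_backends = []


def register_backend(name, types):
    _backends.append(Backend(name, types))


def select_backend(relevant_args):
    # Prune by type: skip any backend that cannot handle all relevant args.
    for backend in _backends:
        if all(isinstance(a, backend.types) for a in relevant_args):
            return backend
    raise TypeError("no backend for these argument types")


register_backend("list-backend", (list,))
register_backend("generic", (object,))
```

Here `select_backend(([1, 2], [3]))` would pick `"list-backend"`, while `select_backend((1.0, "x"))` falls through to `"generic"` – no conversion protocol needed for the common type-dispatch case.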
I believe there are a lot of good ideas in the concrete `uarray` proposal, but I also think we need to start by identifying them and looking at each choice individually to see whether there are better alternatives.
We really need better arguments or fixes for:
- The use of the `with` statement in its very broad form. (I think its availability to force a type-specific backend is a no-go, and there needs to be a reason why it must exist in this form, or a clarification that this is merely a possible, but discouraged, abuse.)
- Why a dispatcher and registration process primarily based on types is not preferable for type dispatching.
- Performance is one aspect here, but I think convenience is almost as important.
- Yes, there will be limits, especially once we come to backend selection, but frankly, they feel solvable (if a bit more tricky than in the current version, obviously).
- Listing all available implementations for a single function (a list of backends that merely *might* implement it does not seem good enough). I.e., I do not understand why the “domains” need to be so broad. Potentially allow libraries to blocklist outdated/buggy backend versions? …?
- … probably more, but I would hope those will be more of the “technical details” kind.
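To back up the claim that backend selection on top of type-based dispatch feels solvable, here is one possible sketch (all names invented, using `contextvars` for the `with`-style override): the preference can only choose among implementations already registered for the matching type, so it cannot force a type-incompatible backend:

```python
import contextvars
from contextlib import contextmanager

# Hypothetical: per-type implementations, optionally tagged by backend name.
_impls = {}  # (type, backend_name) -> implementation
_preferred = contextvars.ContextVar("preferred_backend", default=None)


def register(typ, backend=None):
    def inner(func):
        _impls[(typ, backend)] = func
        return func
    return inner


@contextmanager
def set_backend(name):
    # Only a *preference*: it never bypasses type matching.
    token = _preferred.set(name)
    try:
        yield
    finally:
        _preferred.reset(token)


def dispatch(arg):
    typ = type(arg)
    # The preferred backend wins only if it registered for this exact type;
    # otherwise we fall back to the default implementation for the type.
    impl = _impls.get((typ, _preferred.get()))
    if impl is None:
        impl = _impls.get((typ, None))
    if impl is None:
        raise TypeError(f"no implementation for {typ}")
    return impl(arg)


@register(list)
def _(arg):
    return "default-list"


@register(list, backend="fast")
def _(arg):
    return "fast-list"
```

With this, `dispatch([1])` uses the default implementation, while `with set_backend("fast"): dispatch([1])` picks the `"fast"` one – a backend preference layered on top of, not instead of, type dispatch.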
I will be frank: if you were asking for inclusion in NumPy, unfortunately I think I would have to veto any proposal that does not address these points much more clearly. Right now, the alternatives look better to me when it comes to type dispatching, and I do not understand why adding on backend selection should be too hard.
`uarray` to me seems to have always been designed based on `__array_ufunc__` and `__array_function__`, which means that an alternative like “multiple dispatch” was probably never thoroughly considered (happy to be proven wrong). Since then, a lot of things were fixed and solved (let's learn from those!), but I think we need to backtrack to especially those early design choices and really be sure they were right.
Taking on a concrete proposal without them addressed feels like taking on technical debt that may just be too painful to repay later. We should have the resources right now to carefully explain, and also reconsider, the design choices rather than hoping that we can fix them later. If there were no clear alternatives, that might be an argument, but they exist – they are just not implemented.