Releasing (or not) 32-bit Windows wheels

Thanks for the detailed write-up, Ralf.

This is becoming somewhat of a niche platform now, and as long as users have recourse to get the packages installed (which it sounds like they do), I think it’s OK. We have to pick our battles.

I’d be hesitant: we use several well-tested Fortran packages, and those numerical codes are tricky. Sure, they may not be the best quality, but by now we know they work. It also does not really help to port this to C++ (Fortran is likely easier to parse for most contributors?). If we do decide to go this route, we’d need a more thorough set of unit tests, and those are hard to construct after-the-fact. And, who knows, perhaps with LFortran our problems will eventually disappear!

2 Likes

It’s just a small piece of the whole, but I’d be happy to follow up my ongoing work on hyp2f1 by trying to replace all of the Fortran in special with clean, documented, bug-free Cython, and by writing thorough tests like I did for hyp2f1. If it were possible for me to work on it full time, I think I could power through in about a month. Realistically with my schedule, I think I could get through everything in about two years. At the moment I don’t have any spare bandwidth, but starting in August I should be able to put in about 2 hours per week.

1 Like

Porting from a standard, well-defined language to Cython, which is only supported by a niche tool without a lot of developer bandwidth, seems like a step in the wrong direction. My argument above was that “bug-free” is a very hard thing to guarantee. The most robust solution may be Ondrej’s suggestion: automatically generate C++ code, and then tidy it up and write more test cases.

That assumes the original Fortran code is bug-free too, which it often is not, and fixing it requires intricate “old” Fortran knowledge. Cython is quite established in SciPy at this point, so it is not that esoteric, and it is certainly more maintainer-friendly. Fortran code also often forces f2py conversion and quite a lot of array copies, which hinders performance. As long as the code is clear in functionality, it is much more readable and fixable. Say an array is overflowing: finding that in Fortran is orders of magnitude harder than in Cython code, which is much more debuggable even if you don’t know what the code is doing.
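To make the array-copy point concrete, here is a minimal sketch using plain NumPy rather than f2py itself. It only shows that a default (C-ordered) 2-D array has to be copied to obtain the column-major layout that Fortran routines expect; the actual f2py wrapper logic is more involved, and the array shapes are arbitrary.

```python
import numpy as np

# NumPy arrays are C-ordered (row-major) by default.
c_ordered = np.arange(6.0).reshape(2, 3)

# Converting to Fortran (column-major) layout has to allocate a new buffer.
converted = np.asfortranarray(c_ordered)
copy_was_made = not np.shares_memory(c_ordered, converted)

# An array that is already Fortran-ordered passes through without a copy.
f_ordered = np.zeros((2, 3), order="F")
passthrough = np.asfortranarray(f_ordered)
no_copy_needed = np.shares_memory(f_ordered, passthrough)

print(copy_was_made, no_copy_needed)  # True True
```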

The worrying part, though, is that a new Cython release is nowhere on the horizon, and many features and bug fixes are only in the 3.0 alpha, which we cannot switch to.

2 Likes

ISTM “we do not provide these prebuilt binaries, here are third-party alternatives” is a perfectly suitable answer.

Re: fortran. We can chisel off bits and pieces, and in fact we have several almost complete replacements. E.g. much of quadpack can be replaced by quad_vec. So a nice project could be to look at what is not available and try estimating the effort to port the missing bits. That would at least help with extrapolating the total effort :-).
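As a rough illustration of the quadpack point, here is a minimal sketch comparing `scipy.integrate.quad` (the QUADPACK-backed routine) with `scipy.integrate.quad_vec` on a toy problem. The integrand is made up for the example, and agreement on one easy case says nothing about how much of quadpack’s functionality is actually covered.

```python
import math

import numpy as np
from scipy.integrate import quad, quad_vec


def integrand(x):
    """Toy vector-valued integrand (illustrative only): [sin(x), cos(x)]."""
    return np.array([math.sin(x), math.cos(x)])


# quad_vec integrates all components in one adaptive pass.
vec_result, vec_err = quad_vec(integrand, 0.0, math.pi)

# QUADPACK-backed quad handles one scalar component at a time.
scalar_results = [
    quad(lambda x, i=i: integrand(x)[i], 0.0, math.pi)[0] for i in range(2)
]

print(vec_result)      # close to [2, 0]
print(scalar_results)  # close to [2, 0]
```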

That said, I actually find clean fortran code often easier to deal with than almost anything else. The key here is “clean”, which many fortran’66 ports are not quite. So if LFortran support does materialize, maybe we should think about improving the glue (f2py or Cython or whatnot) instead.

I agree for Fortran code that is relatively clean and without clear issues. special is its own beast though. Much of the code in specfun.f is clearly of very low quality. The Fortran implementation of hyp2f1 had many outstanding issues which were never fixed due to the difficulty of working with and reviewing the dense Fortran code. With the Cython implementation, I was able to clean up issues that were open for nearly 5 years. The other hypergeometric functions also have many problems and I plan to replace these as well. I haven’t looked into anything beyond hypergeometric functions though. Perhaps many of these are already OK.

Please compare the new Cython implementation with the old Fortran implementation before judging whether this was a step in the wrong direction. See also the suite of benchmarks that I wrote at special/_precompute/hyp2f1_data.py (unfortunately I’m only able to post 2 links).
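To give a flavour of what reference-based testing of a special function involves, here is a minimal sketch that checks a naive power-series evaluation of the Gauss hypergeometric function against closed-form identities. It is not the SciPy implementation or its benchmark suite: `hyp2f1_series` is a hypothetical helper, the truncated series is only valid for |z| < 1, and the test points are arbitrary.

```python
import math


def hyp2f1_series(a, b, c, z, max_terms=500, tol=1e-16):
    """Naive Gauss series sum_n (a)_n (b)_n / (c)_n * z**n / n!, for |z| < 1.

    Hypothetical helper for illustration; not how SciPy evaluates hyp2f1.
    """
    if abs(z) >= 1:
        raise ValueError("series only converges for |z| < 1")
    term = 1.0
    total = 1.0
    for n in range(max_terms):
        # Ratio of consecutive terms of the hypergeometric series.
        term *= (a + n) * (b + n) / ((c + n) * (n + 1)) * z
        total += term
        if abs(term) <= tol * abs(total):
            break
    return total


reference_cases = [
    # (a, b, c, z, closed-form value)
    # 2F1(1, 1; 2; z) = -log(1 - z) / z
    (1.0, 1.0, 2.0, 0.5, 2.0 * math.log(2.0)),
    (1.0, 1.0, 2.0, -0.3, math.log(1.3) / 0.3),
    # 2F1(a, b; b; z) = (1 - z) ** (-a)
    (2.5, 0.7, 0.7, 0.4, 0.6 ** -2.5),
]

for a, b, c, z, expected in reference_cases:
    computed = hyp2f1_series(a, b, c, z)
    assert math.isclose(computed, expected, rel_tol=1e-12), (a, b, c, z)
```

A real test suite would also need to cover the hard regions (z near 1, large parameters, cancellation between terms), which is where a naive series like this one breaks down.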

Automatic code generation seems like a great idea for old Fortran which is already at production grade, but this isn’t true for much of specfun.f. Perhaps no one can promise perfectly bug-free code, but I can almost guarantee that the new code will be an improvement for these particular cases.

The impetus for rewriting here wasn’t to remove Fortran for the sake of it, but because much of specfun.f is a bug-riddled mess which probably shouldn’t be trusted in production code. When I initially inquired about fixing hyp2f1, I was asked to use Cython for ease of review and maintenance. If there were consensus around using something like C++ instead, I think I could handle that, but it would likely slow down the pace of development.

1 Like

Thanks for the data points @mckib2 and @steppi, very interesting. And thanks @certik for weighing in. I’m following LFortran development from a distance, with great interest!

This is indeed a key part, and I’m afraid it wouldn’t be that quick. Redoing BLAS/LAPACK support in particular is challenging, and we’d need to keep supporting cython_blas and cython_lapack as a service to the rest of the ecosystem. That said, while not “quick”, I do think it’s a feasible task in principle.

They don’t work (well), unfortunately. @steppi’s qualification of specfun applies to many other vendored libraries, like ARPACK, FITPACK, QUADPACK, interpolative, and so on. They’re not well-tested libraries with a few bugs - many of them deserve to be thrown away and completely replaced; they have segfaults and correctness issues that we just aren’t able to address, in addition to the impossibility to extend them with new code in practice.

Looking at experiences over the past few years, this is clearly not true:

  • Fortran: few hard bugs get fixed, and zero new code gets written. The only significant addition of Fortran code in the past 5 (or more?) years was PROPACK I think, and as @mckib2 describes it was so frustrating that he started a port to C++. We typically also don’t have more than 1-2 maintainers who even want to review Fortran PRs.
  • Several newer maintainers and contributors are quite enthusiastic about C++. New features or rewrites of code focusing on performance do happen (e.g., scipy.spatial.cKDTree, scipy.fft, scipy.spatial.distance, scipy.special.logit/expit, earlier also the scipy.sparse matrix data structures). Some of the most significant new functionality we added recently is based on Boost and HiGHS, both high-quality C++ libraries. And it’s a lot easier to find new folks with C++ skills willing to work on high-performance numerical Python libraries than it is to find folks with Fortran skills. Finally, for existing maintainers who don’t know a language, learning C++ potentially makes sense from a career perspective. Learning how to deal with old Fortran code, not so much.

C++ has its issues and can be complex, but the reality is that we attract new talented maintainers that want to use it. For Fortran, it’s zero. And those folks that are enthusiastic about Fortran are talking about modern Fortran (!= F77/F90), which can indeed be nice - but only has a good story for HPC / on Linux. Fortran on Windows is just never-ending pain, and responsible for our worst packaging issues. It was also the worst problem for getting things to work on macOS M1. In terms of negative externalities, it is also a problem for Pyodide for example, while something like Cython or Pythran isn’t even though those tools are more niche overall (because they are transpilers, they basically work wherever C/C++ work).

There’s a whole bunch of reasons pro/con for any other language too:

  • C is most portable and simple to integrate, but limited because of no templating, and not many people write new code in it,
  • C++ is most popular with people who like writing native code and feature-full, but harder to understand than C,
  • Cython is the most approachable to write new code in for the largest number of maintainers and is nice for binding generation too, but is a pain for build system integration, creates binaries that are too large, and there are long-term maintenance worries because it relies on 1-2 maintainers only,
  • Modern Fortran: nice language for array-based algorithms and fast, but no good support for interfacing with Python, still niche, lack of compilers, and we lack maintainers/reviewers.

For Fortran as we have it in SciPy though (F77 mostly), there are just no pros at all beyond “we already have the code”, and many cons.

Yes, that is a problem. It would depend on the component whether a line-by-line translation (or auto-translation) would make sense, or a rewrite from scratch.

Yes, that would be nice if anyone is looking for a potentially high-impact project :slight_smile:

2 Likes

Thanks! That is 3 votes for “yeah no 32-bit wheels is okay” for now, and no dissenters. So we can get back to the more fun part, Fortran :wink:

(or auto-translation) would make sense

f2c, anyone? :-).
(Shudders)

I can see how it would be easier to find C++ maintainers. Fortran is, per definition, a simpler language, and the readability of C++ can vary dramatically, depending on how it is written / what features get used. Octave, e.g., is a good example of clean C++ that is relatively easy to parse; but when done wrong C++ can be a nightmare. So, its use requires appropriate restraint.

Cython is approachable for simple things, but can get hairy as complexity increases—without the advantage of being able to factor out to a standard, self-contained library. I like Cython, but worry about the longevity of code written in a specialized language.

For hyp2f1, and also hyp1f1, hyperu and hyp0f1 whose Cython rewrites were already initiated by @person142, I think it would be a waste of effort to rewrite yet again in a new language but I would be willing to have a decision made for me on what language to use for newly initiated specfun rewrites. I’d be comfortable with either of C, C++ or Cython. I think @person142 should probably have most say in the choice of language since he would be the one tasked with reviewing the PRs.

Old Fortran should be modernized. As an example, here is a modernized minpack, to which 6 people contributed so far, and an issue to make SciPy use it:

It looks like we would be able to maintain this library for you, including the Python wrappers, and SciPy could just use it. I know you commented on that issue also.

So it is definitely possible to organize the community to maintain a Fortran library.

I agree with this, and that is one of the motivations for LFortran. We are working hard to finish most semantics and to compile codes like SciPy. We already support macOS (including M1, my main development machine, where everything works) and Windows (you can link things with MSVC). We also compile to WebAssembly, so Pyodide could use it, and we support transpiling to C++ if people want that, so you are not locked into Fortran. I will be sure to let you know once we can compile SciPy, or at least parts of it, and we can investigate more.

Just wanted to let people know that, while we do not have a production solution that you can use today, we have a very solid plan for how to fix all these issues (I think), and are well on the way to implementing it.

1 Like

Thanks, that is a good point. minpack is a nice test case I guess, it certainly would be a nice upgrade for scipy.optimize. I’m not sure if that’s what modern Fortran actually looks like - I had expected not, because there are still a lot of gotos in the code (which is bad).

Sounds promising - thanks for pushing hard on that :raised_hands:

You’re right, rewriting again doesn’t make sense, that code looks good now. I think our preference of languages hasn’t really changed. @stefanv does have a point about Cython becoming hairy when the code gets more complex.

I think there’s some personal judgement that’s fine to make here. Assuming you are coming from a “more Python-like is easier” perspective, then:

  • The simplest use cases: Pythran
  • If Pythran isn’t enough: Cython
  • If Cython code grows more complex and starts to look a lot like C code: then using C or C++ is better
  • If the code is simple(-ish) or can use NumPy’s C API for ufuncs for example: C
  • If you need to support multiple dtypes and start to use macros or homegrown templating to do so: C++ (with templates, but otherwise relatively simple) is preferred
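One way to read the list above is as a small decision procedure. The sketch below is just that reading written down; the function name and the three boolean flags are hypothetical, and real decisions involve more judgement than three yes/no questions.

```python
def suggest_language(pythran_suffices, cython_stays_readable, needs_multiple_dtypes):
    """Return a language suggestion following the rules of thumb (hypothetical helper)."""
    if pythran_suffices:
        # The simplest use cases.
        return "Pythran"
    if cython_stays_readable:
        # Pythran isn't enough, but the Cython code doesn't yet look like C.
        return "Cython"
    if needs_multiple_dtypes:
        # Macros or homegrown templating would be needed in C.
        return "C++"
    # Simple(-ish) native code, e.g. using NumPy's C API for ufuncs.
    return "C"


print(suggest_language(True, True, False))    # Pythran
print(suggest_language(False, False, True))   # C++
```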

Thanks! These are good rules of thumb.

Good point, I just created an issue to fix this: Remove goto · Issue #74 · fortran-lang/minpack · GitHub. We took the old F77 style minpack, and carefully modernized it step by step, without breaking functionality. We still have to replace all goto. Besides that, if you see anything else you don’t like, please let us know!

2 Likes

Just a heads up that the latest main of Minpack now does not contain any goto. If you find anything else that you don’t like, feel free to just open up an issue and we’ll try to fix it.

Thanks a lot Ondrej, that’s great to hear!

@stefanv something is up with Ondrej’s previous post, either some infra hiccup or someone flagged it for no good reason (the content is 100% fine and had 2 likes already):

I tried to post a link to the PR that fixed the goto, and Discourse said “I cannot post a link to that site” (github), and then I noticed my previous post was automatically flagged for self promotion. :wink: Probably because it also has a link to “that site” (github).

As far as I know, the forum does not flag things automatically; someone must have clicked on it by accident. Looks like it’s fixed now (I can read it in a private browsing window, at least).

1 Like

I think it’s fixed, thanks!