A few weeks (months?) have passed. In the meantime I’ve started (and stopped and started again) using AI to work on projects. Jump to the end for my thoughts, or read the middle waffle bit if you want to hear about my adventures using AI to build more and more complex things.
One thing I’m learning is that AI tools are eager to do things, and lots of them. I used to joke that an AI tool is like a summer student, but I increasingly think this is pretty accurate. They are smart, have some knowledge of the field, are very motivated, and have lots of time to do things (no meetings, etc.). They will explore all sorts of avenues, some more promising, some less. And you can guide them by providing constraints.
AI tools seem to be very much like this. They perform much better on tasks where you have constraints that keep them on the straight and narrow, for example an existing implementation that you can declare as “the truth!”.
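To make that concrete, here is a minimal sketch of the kind of constraint I mean. The `my_softmax` function is a toy stand-in for whatever the AI is building; the point is that the trusted implementation becomes a parity test the AI has to keep green:

```python
import numpy as np
from scipy.special import softmax as reference_softmax

def my_softmax(x):
    # Toy stand-in for the new implementation under test.
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def test_matches_the_truth():
    rng = np.random.default_rng(0)
    x = rng.normal(size=(32, 10))
    # The existing implementation is "the truth": any deviation beyond
    # numerical noise fails the test and sends the AI back to work.
    np.testing.assert_allclose(
        my_softmax(x), reference_softmax(x, axis=-1), rtol=1e-6
    )
```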
The first project I tackled where I had no chance of evaluating the “code quality” was a small web app, Swiss Alpine Maps - it gives me something I’ve wanted for years but never started on because HTML+JS is quite foreign to me. Here the constraints I specified were “single file, load JS from CDN, no build steps, no react” - lots of things not to do. Inspired by reading Useful patterns for building HTML tools.
Next on my list of experiments was a random forest implementation in pure Python, using only PyTorch and its tricks/tools/infrastructure to achieve good performance. Here the constraints are provided by the existing scikit-learn implementation: you can compare against it, use it as a benchmark, etc. After a few afternoons of working on this I think I have an implementation that is mostly correct (I made the mistake of adding binning as a feature, so you can’t expect 1:1 results with scikit-learn - a lesson for the future) and about 5-10x slower than scikit-learn even on a CUDA GPU, but it uses an outrageous amount of memory. What I learnt from this is that I am not providing enough constraints and architecture specification to help the AI make progress once the basics are in place. At the moment, when tasked with improving memory or runtime performance, it benchmarks, optimises and gives up, again and again. Each time it starts the loop it basically decides the same thing is the problem, tries the same fix, and learns that this doesn’t fix it. To make progress here I think I need to spend more time setting up constraints, ways for it to keep notes on what it has learnt, etc. (I’ve not published it yet; I might do that as a way to preserve one of my early attempts at using AI to do something complicated, even if it is ultimately useless.)
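For a flavour of that loop, the comparison harness looks roughly like the sketch below. `TorchRandomForestClassifier` is a hypothetical name standing in for my implementation; the structure of the benchmark is the point:

```python
import time
import torch
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=100_000, n_features=50, random_state=0)

# Reference: scikit-learn is the correctness and speed baseline.
start = time.perf_counter()
RandomForestClassifier(n_estimators=100, random_state=0, n_jobs=-1).fit(X, y)
print(f"sklearn fit: {time.perf_counter() - start:.2f}s")

# Hypothetical: the pure-PyTorch implementation under test.
torch.cuda.reset_peak_memory_stats()
start = time.perf_counter()
TorchRandomForestClassifier(n_estimators=100).fit(
    torch.as_tensor(X, device="cuda"), torch.as_tensor(y, device="cuda")
)
print(f"torch fit:   {time.perf_counter() - start:.2f}s")
# Peak memory is where my implementation currently falls over.
print(f"peak memory: {torch.cuda.max_memory_allocated() / 1e9:.1f} GB")
```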
Based on this learning experience I spent some time creating (with the help of AI) design documents for how array API support in scikit-learn works: what patterns we have, what the anti-patterns are, what the testing strategy is, that we do not accept performance regressions for NumPy, etc. Most of these docs are great (also for humans), though some of the things in them needed fixing. After doing this I spent about a day working with the AI to convert the GaussianProcessRegressor to have array API support. The changes so far look reasonable, and I don’t think I could have done it myself in under a day (especially given that I didn’t pay super close attention to this task but attended meetings etc. at the same time - deep work this was not). Right now I’m in the phase of establishing performance, and from that I’ll decide what to do next. I have learnt that the AI is pleased with its work when it works, but this is not the end of the road - I want the torch CUDA version to be faster, or at the very least not slower, and no regression in NumPy performance.
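To make the target concrete: scikit-learn’s array API dispatch makes supporting estimators compute in the namespace of whatever arrays you pass in. A sketch of the usage being enabled (this is exactly the work in progress, so it does not run on released scikit-learn, and dispatch additionally requires the array-api-compat package):

```python
import torch
from sklearn import set_config
from sklearn.datasets import make_regression
from sklearn.gaussian_process import GaussianProcessRegressor

# Opt in to array API dispatch instead of hard-coded NumPy.
set_config(array_api_dispatch=True)

X, y = make_regression(n_samples=500, n_features=10, random_state=0)
X_t = torch.as_tensor(X, device="cuda")
y_t = torch.as_tensor(y, device="cuda")

# With the conversion in place, fit and predict stay on the GPU and
# return torch tensors instead of NumPy arrays.
gpr = GaussianProcessRegressor().fit(X_t, y_t)
pred = gpr.predict(X_t)
print(type(pred), pred.device)
```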
TL;DR: AI works a lot better if you provide architecture and design constraints: explicit instructions on patterns and anti-patterns. You can store these as markdown documents in your repository. Creating them takes a bit of time and care, but they save you a lot of time in the long run.
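For a flavour of what such a document can look like, a condensed, made-up excerpt (the details are illustrative, not the actual scikit-learn docs):

```markdown
# Array API support: patterns and anti-patterns

## Patterns
- Fetch the array namespace once per function and use it throughout.
- Every array API code path is tested with NumPy, torch CPU and torch CUDA inputs.

## Anti-patterns
- Calling `np.asarray()` on inputs: it silently moves data off the device.
- `isinstance` checks against concrete array types.

## Non-negotiables
- No performance regression for NumPy inputs.
```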
Should we spend time adding this stuff to our projects? Is it worth the maintenance effort? Does it make it easier to use AI to make contributions to the project? Is it akin to having a linter setup for contributors to “just use”?