Blog - Community Considerations Around AI Contributions

As LLM- and agent-generated workflows and PRs become commonplace, we, the Scientific Python maintainer community, have to decide how to engage with them. Much of our ecosystem was crafted by hand, with a lot of care and love, so it is unsurprising that the rise in LLM contributions may at first feel threatening. I know from personal experience that I felt somewhat deflated the first time one of these landed in front of me. There was a sense of frustration and loss, and it took me a few days to process the repercussions. Discussing this with a colleague, he rightly framed it as follows:


This is a companion discussion topic for the original entry at https://blog.scientific-python.org/scientific-python/community-considerations-around-ai
1 Like

Thanks for the post @stefanv!

I definitely agree that it’s worth Scientific Python developing guidelines that we can all live with across the community. Every project is currently struggling with these questions in isolation (we have a multi-month thread about it in the napari core team chat). Everyone will likely fall somewhere on a spectrum regarding what kind of policy they support, but we can also appreciate the value of a unified policy that we all contribute to. It’s also a very fast-moving space, and keeping up with it feels like a full-time job, so outsourcing maintenance of the policy is also great. :joy:

A few disjointed comments:

  1. The bit about a reduction in learning really resonated with me. From where I sit, we are witnessing a global deskilling of unprecedented scale, with potentially dire consequences. An adjacent example: use of an AI tool in colonoscopy led to a marginal improvement in adenoma detection rate, but removing the tool then caused a massive (20%) drop in performance. This is a big deal because we do not control these AI tools: approximately no one is using small local models for AI coding assistance; instead we use massive online models rented to us at a loss. Anthropic recently confirmed models’ potential to impede learning in an in-house study.
    Yes, as you point out, there are ways to use the models to help learning, but few among us will use them in such a disciplined way.
  2. The point has been made by some in our community that LLMs are License Laundering Machines, and this has been borne out in a recent and notorious PR to the OCaml repo. So I think the copyright infringement warning should be much stronger.
  3. Maintainers are already struggling with a large number of low quality PRs, whether they themselves like LLMs or not. The personal benefit/public cost discrepancy of LLMs may turn out to be much bigger than anyone anticipates. Combined with point (1), at a minimum, I would change the “Gain understanding” guideline from “we therefore recommend that contributors work to fully understand the changes they submit” to “we therefore request” or even “require”.
    More broadly, I agree with many voices out there that we should not “just learn to use AI or be left behind”, and that it’s worth at a minimum tapping the brakes, if not putting up more resistance. One way I find myself thinking about it is cars: they help you get places faster, but they have simultaneously made everything farther away, so the net benefit to society ends up being small while many costs (land use, pollution, noise, danger, etc.) increase massively. We can acknowledge the overall utility of cars, and we can avoid being judgemental of car users (it is very difficult not to be car-reliant in many places), while still working towards a car-free existence rather than actively encouraging car use.
3 Likes

There is some further relevant discussion on the NumPy mailing list. I particularly liked @matthew-brett’s contribution and his conclusion: “We have much to lose from careless use of AI.”

I also liked @nicholdav’s post: “I agree with Butterick that, as currently designed, [coding LLMs] break the ethical compact of open source.”

1 Like

Thank you @jni

Replying to the listserv from Gmail was not a good idea: some encoding issue made the reply hard to read, and didn’t do much to help my tone. I’ll copy it here and hope this version is slightly less painful to read.

Re: @matthew-brett’s comments about copyright, and his write-up with @_pi (see also this talk from @_pi), I want to share a couple of posts from Matthew Butterick, and a related lawsuit:

https://matthewbutterick.com/chron/this-copilot-is-stupid-and-wants-to-kill-me.html

https://githubcopilotinvestigation.com/

https://githubcopilotlitigation.com/

I’m guessing some people will be familiar with these, but if you’re not, I hope you’ll read and consider the arguments.

Perhaps most relevant (from the “investigation” post):

> Just this week, Texas A&M professor Tim Davis gave numerous examples of large chunks of his code being copied verbatim by Copilot, including when he prompted Copilot with the comment /* sparse matrix transpose in the style of Tim Davis */.

So clearly coding LLMs memorize, but as currently engineered, most do not provide citations or links to licenses.

(They could, but AFAIK they don’t. Butterick points this out as well.) Previous work suggests it’s easy to extract code but hard to extract authorship (e.g., https://arxiv.org/pdf/2012.07805, page 10), and that larger models tend to memorize more (https://dl.acm.org/doi/pdf/10.1145/3597503.3639074). Likewise, “most LLMs fail to provide accurate license information, particularly for code under copyleft licenses” (https://arxiv.org/abs/2408.02487v1).

The example of the prompt “matrix code in the style of Tim Davis” shows that it’s not as simple as “more examples in training data = more memorization”. For scientific software, the number of examples in the training data will always be vastly smaller than the number of examples of, say, CRUD apps. My guess is that if you try to generate scientific code with a very specific prompt, you will be much more likely to violate an OS license. (One could test this, of course, probably with an approach like this: https://arxiv.org/abs/2601.02671.)
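To make that concrete, here is a minimal sketch of how one could check generated code for verbatim reuse, assuming you have the generated snippet as a string and a local checkout of a candidate licensed codebase (e.g. SuiteSparse for the Tim Davis prompt). The `generate_code()` call in the usage note is a hypothetical placeholder for whatever model you are testing; the overlap check itself uses only the Python standard library, and it only detects verbatim copying, not paraphrased reuse or the license terms themselves.

```python
# Minimal sketch: flag long verbatim overlaps between LLM-generated code
# and a local corpus of license-covered source files. This is only a
# rough heuristic, not a legal determination.
from difflib import SequenceMatcher
from pathlib import Path


def longest_verbatim_overlap(generated: str, reference: str) -> str:
    """Return the longest contiguous block of text shared by the two strings."""
    matcher = SequenceMatcher(None, generated, reference, autojunk=False)
    match = matcher.find_longest_match(0, len(generated), 0, len(reference))
    return generated[match.a : match.a + match.size]


def flag_possible_memorization(generated: str, corpus_dir: str, min_chars: int = 200):
    """Yield (file, overlap) pairs where the shared block exceeds min_chars.

    The "*.c" pattern assumes a C codebase (e.g. a SuiteSparse checkout);
    adjust the glob for other languages.
    """
    for path in Path(corpus_dir).rglob("*.c"):
        reference = path.read_text(errors="ignore")
        overlap = longest_verbatim_overlap(generated, reference)
        if len(overlap) >= min_chars:
            yield path, overlap


# Usage (hypothetical; generate_code() stands in for your model of choice):
#   generated = generate_code("/* sparse matrix transpose in the style of Tim Davis */")
#   for path, overlap in flag_possible_memorization(generated, "suitesparse/"):
#       print(f"{path}: {len(overlap)} characters reproduced verbatim")
```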

I don’t want to be a random person getting sanctimonious on the mailing list.

But I really value the ethics of open source software, the amazing contributions of all the numpy developers, and of scientist-coders more broadly.

I get the appeal of coding LLMs. And I agree with Butterick that, as currently designed, they break the ethical compact of open source.

I would hate to see numpy and the ecosystem around it move in that direction.

Something I read over the weekend summed things up as “AI tooling makes the easy stuff easier (writing code) and the hard stuff harder (reviewing, knowing what you actually want to build).” I think that is a pretty good summary.

Big projects (all projects?) have been limited by reviewer bandwidth since forever. With AI that problem has gotten worse: it is now even less costly to generate code than it used to be.

I don’t know what the fix is; attracting more people to be reviewers is a long-unsolved problem. The optimist in me hopes that, because everyone now has to become a reviewer (or admit that they didn’t look at what their AI tool generated for them), we will get more people who contribute by reviewing.

Another thing to worry about, if you don’t already have enough, is that as open-source communities we have worked hard to make it easier for people from all around the world to participate and contribute. But now you have people using tools costing $200/month to play the open-source game. That is not cheap for anyone, and it is far beyond what many people on earth can spend. How to prevent this from becoming yet another hurdle to participation is a bit of an open question.

3 Likes