A policy on generative-AI-assisted contributions

I am increasingly seeing vibe-coded PRs from novice contributors, and I started a discussion about this on the SymPy mailing list. Right now there are not too many of these, but they are quite blatant, and I think this is definitely going to get worse over time.

Really these are just low-quality PRs, but I don’t think it is reasonable to engage with them in the same way as a low-quality PR that was actually written by a human. For example, if you explain in detail what the problems are with the code, then there is a good chance that the “author” will type your feedback into the LLM and have it try to vibe-code an improved PR. I have tried this, and I really don’t enjoy talking to LLMs when actually trying to get something done; talking to an LLM indirectly, via something like GitHub comments, with someone who is just relaying the comments into the LLM, is soul-destroying.

When reviewing novice PRs, the situation is generally that it would be far easier to just write the code without the novice, so there is a net loss of effort on the maintainer side. The flipside, though, is that the novice hopefully learns something from the experience and improves over time. This model is predicated on a certain effort-exchange ratio, e.g. that the novice does not open the PR in the first place without putting in some effort beforehand to understand the codebase, the workflow, the issue they are trying to fix, and so on. It is now pretty much possible to say “Claude, write some code and open a PR to fix issue #12345”, so any kind of technical barrier is gone and you can open a spam PR having spent zero time trying to understand anything.

I also don’t think that this is helpful to novices, because in the PR process the AI is helping them with all of the wrong things, like writing the code rather than thinking about how to write the code. A student at my university recently asked what “the purpose of Maths” is, given that computers (e.g. SymPy, Maple, etc.) can do all of the exercises we ask them to do by hand in their entry-level Maths exam. My answer was that you have to understand how to do some things manually before you can learn how to make effective use of a computer for more complex problems. Having the computer do your homework is like getting a robot to lift weights for you at the gym.

I don’t think that the vibe-code PR authors are malicious. People want to contribute to open source, and why wouldn’t a novice believe the hype that this is how code is written in the age of AI? Although unintentional, the resulting spam is effectively abusive to open-source projects and maintainers.

We need some kind of guidance or policy so that contributors understand what is reasonable. I don’t know how to write any policy or guidance about “responsible use of AI”, though, without first addressing the basic questions above, like whether AI-generated code is allowed at all and what it means for copyright.


Thanks for this useful write-up.

I guess anyone who is using PRs to certify contribution experience is going to run into Campbell’s law pretty soon.

It still seems to me that we can avoid these problems by saying that the contributor has to take full responsibility for copyright, and therefore should avoid AI unless the contribution is so trivial in intellectual terms that copyright is not an issue. Something like:

I have not used AI for the code in this contribution OR

I have personally reviewed the content of this PR in detail for potential copyright violations, and confirm that no such violation has occurred.

I suppose we could then add advice not to use LLMs for PR summaries, because of the problems that you’ve pointed out.

Perhaps we could use some heuristic, such as screening for signs of AI slop, sticking a label on the PR to say that slop is suspected, and asking the contributor to confirm either that AI was not used or that they have carefully checked the content for accuracy and made it concise. Reject the PR (for a contributor who has not shown previous bona fides) if slop remains.
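To make the labelling step concrete, here is a minimal sketch of what a triage helper could look like, assuming PyGithub, a token in `GITHUB_TOKEN` with permission to label and comment, and a hypothetical label name and message text; spotting the slop in the first place would of course still be a human judgement.

```python
# Rough sketch only: label a PR as suspected AI slop and ask the contributor
# to confirm how AI was used. Assumes PyGithub and a token with triage rights.
import os
from github import Github

CONFIRMATION_REQUEST = (
    "This PR looks like it may be largely AI-generated. Please confirm either "
    "that AI was not used, or that you have personally reviewed the content "
    "for accuracy, copyright concerns, and conciseness."
)

def flag_suspected_slop(repo_name: str, pr_number: int) -> None:
    """Label a PR as suspected AI slop and post the confirmation request."""
    gh = Github(os.environ["GITHUB_TOKEN"])
    repo = gh.get_repo(repo_name)
    # Labels and comments live on the issue side of a pull request.
    issue = repo.get_issue(pr_number)
    issue.add_to_labels("ai-slop-suspected")  # hypothetical label name
    issue.create_comment(CONFIRMATION_REQUEST)

# Hypothetical usage:
# flag_suspected_slop("scientific-python/some-repo", 12345)
```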


I like this idea! Perhaps a useful addition would be instructions on what to look out for? And asking the contributor to also confirm that they reviewed the AI-generated code?

I imagine that for many of the new users who seem to be a big part of this discussion, it may not be clear at all what the possible pitfalls of a fully automatically generated PR are.

That assumes that people are interested in learning. A lot of people don’t really care; they just want to get something merged as fast as possible for x/y/z reasons, and it does not really matter which. What matters to me is knowing, so that I can approach the PR with a totally different mindset. If the PR gives me some small value that I want now, I spend the review time fixing what I want and just merge. Otherwise I close it and we all move on fast.

I have personally reviewed the content of this PR in detail for potential copyright violations, and confirm that no such violation has occurred.

One problem with this is that it’s really not clear how to review the content for potential violations. For handwritten code, it’s pretty clear: don’t look at GPL-licensed code (GSL is out, R is out, etc.). With LLMs, though, how am I, as a contributor, supposed to review what the LLM gave me? Comments upthread seem to indicate that if I touched an LLM at all, the code is “dirty” right away. If it’s more nuanced than that, great, but then a contributor needs some more specific guidance, I’d think.

We could unpack this more, but basically, yes: if the LLM is generating substantial code that might be subject to copyright, you have the unpleasant job of going and checking whether it actually is subject to copyright. But for the kind of tasks that LLMs do well, like automatic refactoring of imports, swapping single for double quotes in strings, and so on, the question doesn’t arise.

So yes, the practical upshot is that we throw the burden on the contributor for substantial code, but I think that’s reasonable, given the copyright risks.

Please do unpack more! Basically, I honestly don’t understand (1) how a contributor can check for potential copyright violations, and (2) how an OSS project can check the contributor’s check.

Consider a well-meaning, diligent and competent contributor, and consider three situations:

  • a contributor asks an LLM to translate BSD-licensed Matlab code to Python, and submits it to SciPy (this was the original case which triggered the OP: Matt H did just this for some Alan Genz code);
  • a contributor uses an autocomplete functionality of their IDE to write a patch;
  • a contributor gives an LLM the link to an issue on GitHub and asks it to write a fix.

Suppose, further, that in all cases the contributor checks the generated code, modifies it as necessary, and can in full confidence check the box “I verified all the code, I understand what it does, and I believe it fixes the issue / adds the enhancement asked for”. Suppose the resulting patch is of high quality and is otherwise mergeable.

Questions:

  1. What actions does a contributor need to take in order to check for copyright violations?
  • If the problem is that the LLM training set could have contained license-incompatible code (which is the problem discussed upthread, IIUC), then the answer seems to be “there is no way, it’s a black box”.
  2. If the contributor says, basically, “I pinky swear it’s compatible”, how does this help the OSS project?
  • Again, if the problem is with the LLM training set, then no action by the contributor shields the OSS project from being in violation of copyright.

In other words, I honestly fail to see how moving the burden of a copyright check onto the contributor helps either to weed out low-quality submissions, or to deal with LLM-assisted submissions which are otherwise of high quality.


Let me reply with some questions, followed by the promised unpacking.

  • Do we care about copyright? That is: is it acceptable to us, as open-source authors, that using an LLM can easily generate code to which copyright should apply, so that, if we do nothing, copyright will in due course become very difficult to enforce and will be practically void?
  • Let’s say we do care about copyright: what cost are we prepared to pay in order to defend ourselves from voiding copyright? Clearly replies here will vary from high cost to low cost. I feel strongly about copyright, and am prepared to pay a high cost. Others may feel differently.
  • If one is engaged in that trade-off, copyright against code quality and volume, then one has to ask what benefit we expect from AI, or conversely, what benefit we will lose by constraining the use of AI. That’s a difficult question, with many facets. Could the code have been written in a similar time without AI, other than for fairly trivial and mechanical processing that is unlikely to violate copyright? (The evidence so far suggests yes.) Will contributors using AI benefit in the same way from feedback and training, our previous model of onboarding contributors? Is it in fact true that the committer of AI code understands the code and its context to the same degree as a contributor not using AI? If we are flexible and welcoming of vibe-coding and autocomplete, what kind of programmers will we encourage? And what will be the value / cost ratio of merging those contributions? And of supporting those contributors? And so on. These are the kinds of questions that Oscar is alluding to in the post that set off the current discussions.

At the top of your email:

Basically, I honestly don’t understand (1) how a contributor can check for potential copyright violations, and (2) how an OSS project can check the contributor’s check.

I’ll try and cover the first point below. For the second, as we’ve discussed before, we have to ask whether it is worth having a policy that may be effective, but is difficult to enforce. For example, it may be that contributors do change their behavior, in their desire to adapt to our norms, but that it would be hard to detect whether they have done this. I would argue that having the norm is worthwhile, even so. In general, it seems to me that we have a strong set of communities, and strong norms, which are, in fact, seldom broken by serious contributors.

Anyway, back to the personal copyright check. Let’s take the example you gave:

a contributor asks an LLM to translate a BSD-licensed Matlab code to Python, and submits it to SciPy

In that case, I would look carefully at the Matlab code and at the Python code, and confirm that it was a faithful port, with no substantial code in it that did not come directly from the Matlab code. I would state that in my PR to confirm I had done the work. If I did find substantial code that didn’t come from the original, I’d do a GitHub search for that code, and a Google search for potential source code that might have been used. I’d then check the results to see if they matched the AI output, and report the outcome on the PR.

  • a contributor uses an autocomplete functionality of their IDE to write a patch;

I’m not quite sure what you mean here. If you mean a simple one- or two-line fix, then I think the contributor would be reasonable in asserting something like “code is trivial, copyright unlikely to apply”. If the code is not trivial, then I’d do the GitHub / Google search as above, and report the results.
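As an aside, part of that search step can be scripted. Below is a minimal sketch, assuming the `requests` library and a personal access token in `GITHUB_TOKEN` (GitHub’s code-search endpoint requires authentication), with a hypothetical snippet as the query; the hits still need a human to read them and check the licenses.

```python
# Minimal sketch: search GitHub for a suspicious code snippet so the hits can
# be inspected for license compatibility. Assumes `requests` and a personal
# access token in GITHUB_TOKEN.
import os
import requests

def search_github_for_snippet(snippet: str, language: str = "python") -> None:
    """Print repositories and files where similar code appears."""
    resp = requests.get(
        "https://api.github.com/search/code",
        params={"q": f"{snippet} language:{language}"},
        headers={
            "Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}",
            "Accept": "application/vnd.github+json",
        },
        timeout=30,
    )
    resp.raise_for_status()
    for item in resp.json().get("items", []):
        # Each hit still needs a human to check the repository and its license.
        print(item["repository"]["full_name"], item["html_url"])

# Hypothetical usage, with a distinctive line lifted from the generated patch:
# search_github_for_snippet("def gauss_legendre_nodes(n, tol=1e-14):")
```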

In asking the contributor to report their findings on potential copyright violations, we make it easier for maintainers to confirm. I suspect we’ll gradually build expertise in detecting copyright violations.

I suppose the underlying question is the obvious one: is it time to throw up our hands and accept that copyright has become moot in the age of coding LLMs? I would argue no: copyright is too important, and LLM coders are not good enough, for us to do that.