A policy on generative AI assisted contributions

Please do unpack more! Basically, I honestly don’t understand (1) how a contributor can check for potential copyright violations, and (2) how an OSS project can check the contributor’s check.

Consider a well-meaning, diligent and competent contributor, and consider three situations:

  • a contributor asks an LLM to translate BSD-licensed Matlab code to Python and submits it to SciPy (this was the original case that triggered the OP: Matt H did just this for some Alan Genz code);
  • a contributor uses the autocomplete functionality of their IDE to write a patch;
  • a contributor gives an LLM a link to an issue on GitHub and asks it to write a fix.

Suppose, further, that in all three cases the contributor reviews the generated code, modifies it as necessary, and can in full confidence check the box “I verified all the code, I understand what it does, and I believe it fixes the issue / adds the requested enhancement”. Suppose the resulting patch is of high quality and otherwise mergeable.

Questions:

  1. What actions does a contributor need to take in order to check for copyright violations?
  • If the problem is that the LLM’s training set could have contained license-incompatible code (which is the problem discussed upthread, IIUC), then the answer seems to be “there is no way; it’s a black box”.
  2. If the contributor essentially says “I pinky swear it’s compatible”, how does that help the OSS project?
  • Again, if the problem is with the LLM’s training set, then no action by the contributor shields the OSS project from being in violation of copyright.

In other words, I honestly fail to see how moving the burden of a copyright check onto the contributor helps either to weed out low-quality submissions or to handle LLM-assisted submissions that are otherwise of high quality.