A policy on generative AI assisted contributions

Hi,

Given the rise of various generative AI tools, it would be nice if we could come up
with a definitive policy on dealing with genAI-assisted contributions to SciPy.
The subject has been discussed multiple times in multiple venues, so it’d be great
if we could codify our position in some form and place it somewhere in the docs
(where exactly is TBD; the SciPy Frequently Asked Questions page could be one such place; another
option is the GitHub PR template, which already has some guidelines).

For two recent discussions see, e.g.

  • a long discussion in the NumPy list [1]
  • a recent use case (which, in fact, sparked this email) [2]

[1] “Policy on AI-generated code”, NumPy-Discussion mailing list, python.org
[2] “MAINT: stats: remove `mvn` fortran calls from `multivariate_normal.cdf`” by ev-br, scipy/scipy pull request #22298, GitHub

Broadly, it seems that the concerns can be classified into two buckets:

  1. Do we accept genAI-assisted contributions at all, and if we do, what are the boundaries
    of what’s acceptable?
  2. Copyright and license concerns.

Let’s consider these two in order.

  1. Do we accept genAI contributions?
    It’s futile to attempt to block them wholesale.
    People are going to use AI assistants anyway, along with various IDEs, autocompleters, etc.
    So it seems reasonable to treat these assistants on par with other tools: just as
    it does not matter whether a contributor uses VS Code or emacs or vi, it also does not matter
    what engine powers their autocompleter. The discussion in [1] has a concise wording,
    “Only submit code that you actually understand”, and I’d just go with that.

One thing we probably do not want to see is a fully automated submission, which is
both generated and submitted by a bot, with no human intervention or supervision at all.
So we could/should stress that, too.

  2. Copyright violations / incompatible licenses

Here I lack expertise and have more questions than answers, so here is a possibly naive
attempt at asking hopefully relevant questions:

  • Are we concerned that the code generated by some tool falls under an incompatible
    license purely by virtue of being generated by that tool?

If so, using the tool is similar to copying code from, say, Numerical Recipes.
Then we should either blacklist known bad tools (do we know of any?) or ask
contributors to declare the tools they used (so that we can remove offending code
at a future date if needed).

  • Are we concerned that the training set of a tool included license-incompatible code?
    And if it did, what does that mean for us: are we then transitively non-compliant?

If this is a concern, and we at some point discover that some code is license-incompatible,
the action does not depend on how it got into the codebase, genAI or not:
we discover it, we remove it.

And then the only question IMO is whether it’s helpful to us to ask contributors
to list the AI assistants they used.


To summarize,

  • we should not block genAI-assisted contributions;
  • we should block fully automated contributions, and ask submitters to only submit
    code they understand;
  • TBD whether we want to ask contributors to list what genAI assistants they used, if any;
  • let’s add explicit (short!) guidance to the docs, TBD where precisely.

Thoughts?


I will copy/paste and expand what I wrote in the linked SciPy PR.


If I could, I would have used these ChatGPT/Copilot/Skynet/“all-seeing overlord AI” tools like there is no tomorrow. But my experience with them so far is that they are almost good: they make just one tiny mistake, like an `i` becoming a `j`, and then I spend the rest of the week troubleshooting, learning how to use GDB, failing miserably, etc. So it seems like we are swapping the manual labor of translating for the manual labor of troubleshooting, and for me it is definitely the latter that is more painful.

But maybe that is because they are not trained enough on the F77 or JurassicPACK code we have. On the other hand, I did get some useful tips when I asked for programming help, say, “show me an example of a while loop that uses the comma operator as an expression in C17”, and it brought back some nice results. I have not tried it myself, but some folks on my team use it for summarizing what a piece of code does.

Other than that, @rkern is probably the most knowledgeable among us about what the legal implications might be; however, Copilot seems to understand when you ask it to avoid GPL sources for the answer, judging by the citations it brings.

Having said all that, I have noticed a particular behavior in all AI tools. They provide a melange of things that they have in their training bank (whatever they call it these days).

I am ready to be immortalized in the history books as an idiot, but AI tools are terrible at coming up with coherent code for numerical computing. Not only can they not answer any technical computing question coherently, but when asked in plain ASCII about a certain technical problem, they almost always bring back code that looks suspiciously similar to either the GNU Scientific Library or Octave, or some stuff they knocked off from some personal gist, translated into the requested language.

I noticed this mostly because of the strange index corrections they apply before doing things on the arrays (which signals that the source must be a 1-indexed one). While the result is sometimes very close to the actual thing, it is devoid of any proper numerical checks (because they cannot judge the result).
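To make that fingerprint concrete, here is a purely made-up sketch (not actual tool output; the function and names are hypothetical) of the kind of 1-indexed translation pattern I mean:

```python
import numpy as np

def running_sum_translated(x):
    """Running sum, written the way a 1-indexed source (Octave/Fortran) would suggest."""
    n = len(x)
    y = np.zeros(n)
    for k in range(1, n + 1):          # 1-based loop carried over from the source
        if k == 1:
            y[k - 1] = x[k - 1]        # "- 1" corrections instead of simply y[0] = x[0]
        else:
            y[k - 1] = y[k - 2] + x[k - 1]
    return y

# The idiomatic 0-based equivalent would just be np.cumsum(x).
```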

Hence, in my personal opinion, before we even judge the legality of things, the sheer possibility of the code not being production ready gives me enough confidence to reject any AI-generated code. Thus I agree with summary point 2, even if the code runs correctly.

This is not the same as a developer using AI for snippets of code for a very specific thing and including parts of AI-generated code. This is very tricky to judge, as AI tools have already swallowed Stack Overflow and other sources and we have no way to backtrack to the original source. So a careful agree with summary point 1.

I don’t know how to enforce point 3 but if there is a way, that would be really nice.

Point 4, fully agree.

Out of caution (there are still many lawsuits pending) and, most importantly, for moral and ethical reasons, I am -1 on allowing any generative AI at all. Adding here the opinion piece from Paul, which I fully agree with: Current Challenges in Free Software and Open Source Development

We spend so much effort ensuring that contributions are original and respect all licensing, copyrights, and the wishes of original authors, that it would be strange to me to adopt a different position.

Furthermore, as pointed out here and in some other threads, the nature of what we do in SciPy is difficult and these tools don’t (yet) shine. Writing code is only part of the story, and oftentimes we spend our effort discussing API, architecture, etc. Hence I don’t think that it would be detrimental to disallow such tools. There is a lot of hype, and companies in the sector are trying hard to push the idea that we are doomed if we don’t use their stuff. IMHO, this is not true, and these tools are far from replacing us.

Since this is an issue that affects all projects in the ecosystem, can we collaborate to co-author a SPEC that can be widely adopted, instead of each library having to figure it out for themselves?


At the top of the file, CoPilot suggested a line :sweat_smile:

I think no further explanation is needed.


We can, but would it be a single SPEC, or would we have multiple depending on what projects decide? That’s something we should maybe get a vote on.

I think the major value of a SPEC would be assembling information about the various issues that might inform a decision into one place. But that can be done at least as well with a blog post. The ultimate decisions each project might take come down to their individual values; there isn’t a technical answer to this that might be helped by coordination amongst projects via adherence to a SPEC.


I haven’t seen any problems with this. In my experience, it’s very useful and writes high-quality code and saves time. I guess it depends on which tool you’re using.

As long as unit tests are included and human reviewers approve it, I don’t see how it’s any different in trustworthiness from code written by random humans.

But yes, copyright and license legality is a legitimate concern.

Hi,

endolith wrote on February 3:

tupui:

Furthermore, as pointed out here and in some other threads, the nature of what we do in SciPy is difficult and these tools don’t (yet) shine.

I haven’t seen any problems with this. In my experience, it’s very useful and writes high-quality code and saves time. I guess it depends on which tool you’re using.

As long as unit tests are included and human reviewers approve it, I don’t see how it’s any different in trustworthiness from code written by random humans.

But yes, copyright and license legality is a legitimate concern.

It doesn’t seem useful to argue about the usefulness of the
AI-generated code, or otherwise, at this stage. I haven’t found it
useful myself, but maybe that’s just my case, and in any case, it may
well improve.

It’s the license issue that is central. I don’t think there’s any
argument that these tools are all digesting code with licenses that at
least require attribution, and producing code indirectly (or directly,
at times) derived from that licensed code, but with all licensing
stripped from the output.

The value of a spec or similar, over a blog post, is the ability to
iterate to a shared and agreed conclusion, rather than leaving it as
the opinion of any one particular person.

Cheers,

Matthew


I think you’re right, @rkern.

We don’t have any hope of controlling user behavior through a policy document. Many people find these tools useful, so they will keep using them, and there is no sensible way to check the license of the generated snippet. For larger PRs, which may contain more overt license breaches, it’s probably less likely that AI was used wholesale, since the author needed to understand the problem well to begin with.

We don’t have any way of ensuring that people did not read NR, look at GPL code, or take anything from Stack Overflow. To me that’s the same thing. If we were to say that we don’t want people to use any AI tools, we could, and that’s the same type of self-declaration we would be asking people to make.

I think it is not that simple to discard the usefulness, though I wrote a wall of text arguing otherwise above. These tools, whether blindly parroting or contemplating, have become quite eloquent. As I mentioned above, lately I have been testing them (or they have been observing me), and though a bit of success is achieved for line completion, figuring things out from scratch is not even close.

However, once I put in a lot of comments and write the first two or three lines, they predict the context OK-ish. Then they fill in some boilerplate code and I correct here and there. Some time is saved and some productivity is gained, in turn utilizing a power plant’s worth of energy for this.
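As a made-up illustration of that workflow (the function is hypothetical, not an actual completion I received): the comments and the first couple of lines are mine, and the tool fills in the remaining boilerplate, which I then review and correct.

```python
import numpy as np

def trapezoid_weights(n, h):
    """Composite trapezoidal-rule weights for n equally spaced points."""
    # Hand-written start: validate the input and allocate the output.
    if n < 2:
        raise ValueError("need at least two points")
    w = np.full(n, h)
    # A plausible assistant completion from here on: the endpoint corrections.
    w[0] = 0.5 * h
    w[-1] = 0.5 * h
    return w
```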

I would suggest putting a few sentences in the policy docs saying that we don’t want wholesale AI code, knowing full well it would be near impossible to distinguish. But if we notice something noticeably fishy, we can point to the policy. And if someone reads the docs before contributing, they’ll get a taste of our stance.

How AI tools became like that, and what they did criminally/unethically to become like that, is something we can discuss for years.

But that train has sailed.
             - Some LLM (probably)

Licenses and policies would definitely protect the project as a whole, but I think anything more granular than that is just futile Stallman-ing against reality. Unlike licensed code from elsewhere, people will grab pieces from LLMs here and there. We don’t need to pick this fight in our limited time.

Besides, it’s not like we are getting very large contributions anyway. So I think this is not as big an issue as our discussion might suggest.

Hi,

ilayn wrote on February 3:

Licenses and policies would definitely protect the project as a whole, but I think anything more granular than that is just futile Stallman-ing against reality. Unlike licensed code from elsewhere, people will grab pieces from LLMs here and there. We don’t need to pick this fight in our limited time.

I remember that, back in the day, it was desirable to do “clean-room” porting, where the person writing the new code had not read the code with an incompatible license.

For example, one person could read the (e.g. GPL) code and describe the algorithm, and the other person could write the new code. If the person writing the code had read the GPL code, it was always possible for the GPL code to leak into the new code.

Of course, with AI, it’s all dirty room - it can produce code from anywhere.

So I think we have to have the same policy for AI as for dirty-room code: we can’t use it. Now, enforcement is a different matter; I just think we should pursue the same policies on enforcement as we did before. We say don’t do it, and if we find you did do it, we’ll complain and pull out the code.

Cheers,

Matthew


This is a more succinct way of saying what I tried to write. So I think we agree.

What I am trying to emphasize is the practicality of it. If the same code appears in 50 places because all 50 of them used AI-generated sources and MIT’d the code, I don’t think we can find the 51st, original source and point to its GPL. So it is an idealistic stance. It is still good to have, but a policy won’t solve any trouble if we find out the truth later.


I’d be happy to see a SPEC about this - I think it can be an informative SPEC (similar to an informative PEP) that doesn’t propose a single solution to adopt, but contains at least the following:

  1. Prior art - what did other well-known open source projects decide and why?

  2. Legal status - link to relevant sources, and summarize if there’s any expectation of more legal clarity arising in the near to medium term.

  3. Lay out a set of distinct options that projects can choose from, with their main pros and cons. E.g.:

    • Option 1: do nothing for now (status quo)
    • Option 2: forbid all generated code by an LLM-based tool
    • Option 3: add guidance to PR template & docs that contributors may choose to use such tools, but need to be aware and take responsibility for copyright of the code they submit
    • Option 4: … ?

None of that is project-specific, and people can contribute new arguments and options if they find there are gaps in such a SPEC.

A Discourse thread alone is a poor medium for iterating to a solution on such a complex topic. I completely agree with @rkern that it’s largely about values and preferences - it’s just a lot easier to get a sense of those, and also to help people less knowledgeable about the subject form an opinion, if there’s a structured document to read and refer to. Once a good SPEC exists, a SciPy-specific choice can more easily be made.

Hello everyone,

I have never written on this Discourse before, but since this is a topic that directly touches my current development experience, I’m sharing my approach to AI-generated code. I wouldn’t mind some feedback on the approach I chose, and I hope my experience can serve as one of the many reference approaches people have for using these tools.

The thought of possible copyright violations from generated code pulled from other sources has always scratched at the back of my mind. So far, I use generative tools to create reference sandbox scripts to test a functionality I have in mind, and I fine-tune the generation if I’m not satisfied with the result. In almost all cases, the generated scripts remain just that: I keep them locally without publishing them anywhere, and I just use them to review an idea I had.

I do not keep track of how much of the code I effectively publish is generated, and even if I did, I always work under the assumption that the AI is wrong. That’s because:

  • I don’t trust the accuracy of the prompts I input;
  • I don’t trust the comprehension of the tool receiving the prompt, however accurate the prompt might be.

In the end I always end up writing my own code, but what the tooling gives me is a better understanding of the problem I’m currently tackling.

I believe someone else in the thread already gave a -1 on completely generated PRs, which I also agree with: before even considering publishing generated code, there should be an inner understanding of what has been generated, and if it’s a copy-paste of the prompt response there’s no guarantee that this is enforced.

I am still okay with the approach of using AI as it was meant to be used, a.k.a. an assistant that gives an initial shape to an idea which then is developed on its own with no trace of the original prompt response.

In my case, I am considering adding to the documentation of my new projects a comprehensive approach to using AI tooling (it’s plugin-driven software where the idea is “bring your tool and I glue it to a broader system”), especially since the trend seems to be “I gave a prompt to ChatGPT and published a plugin for this software; here’s how YOU can do it too!”. At least that’s what I’ve heard from colleagues attending conferences on the matter. It didn’t happen literally like this, but my feeling is that this is the subtle message that (willingly or not) comes across.

Hi,

[snip]

I am still okay with the approach of using AI as it was meant to be used, a.k.a. an assistant that gives an initial shape to an idea which then is developed on its own with no trace of the original prompt response.

I’m afraid the problem of the license remains. The AI generated some
code where it is effectively impossible for you to know what
license(s) should correctly apply. Now that you’ve seen the code with
the unknowable license, you are at grave risk of reproducing aspects
of that code, thereby pulling in the unknowable license. Your
resulting code, even if you have rewritten it, cannot be asserted to
be free of the original (unknowable) license. And therefore you cannot
assert your own copyright, or freedom from other people’s copyright.

Cheers,

Matthew
