AGENTS.md and CLAUDE.md addition to SciPy repository

It is getting very common to receive LLM-generated pull requests on the repository, so it is probably better to meet these tools where they are instead of instructing the PR owner every time.

To do this, there is a pseudo-standard way to front-load context for these agents: an AGENTS.md file and, in the specific case of Claude Code, a CLAUDE.md file. My proposal is that we add these files to the repository to at least get the basics right in the LLM context.

I don’t have too much of an opinion on the content, but this is what I came up with after iterating with Claude Code itself.

# CLAUDE.md

This file is for Claude Code and other Claude-based AI assistants.

**You must read [AGENTS.md](AGENTS.md) before making any changes to this repository.** It contains essential development guidelines, build instructions, testing procedures, and contribution rules.

and a longer AGENTS.md file (see its Contributing Guidelines section):

# AGENTS.md - Guide for AI-Assisted Development

This document provides guidance for AI assistants (Claude, etc.) working on the SciPy codebase.

## Project Overview

SciPy is a fundamental library for scientific computing in Python, providing algorithms for optimization, integration, interpolation, eigenvalue problems, algebraic equations, differential equations, statistics, and many other domains.

- **License**: BSD-3-Clause
- **Python**: >= 3.12
- **NumPy**: >= 2.0.0
- **Repository**: https://github.com/scipy/scipy
- **Documentation**: https://docs.scipy.org/doc/scipy/

## Build System

SciPy uses **Meson** as its build system with **meson-python** as the build backend.

### Quick Build Commands

```bash
# Build the package (recommended way)
spin build

# Build with debug symbols
spin build --debug

# Build release version
spin build --release

# Build with AddressSanitizer
spin build --asan

# Build with specific BLAS
spin build --with-scipy-openblas    # Use scipy-openblas32
spin build --with-accelerate        # Use Apple Accelerate (macOS)
```

### Build Dependencies

- `meson-python>=0.15.0`
- `Cython>=3.0.8`
- `pybind11>=2.13.2`
- `pythran>=0.14.0`
- `numpy>=2.0.0`
- C/C++ compilers (GCC >= 9.1, Clang >= 15.0, or MSVC >= 19.20)

### Development Environment Setup

**Option 1: Pixi (Recommended)**

Pixi provides isolated, reproducible environments with pre-configured tasks:

```bash
# Build SciPy
pixi run build

# Run tests
pixi run test

# Build documentation
pixi run docs

# Run linter
pixi run lint

# Start IPython with SciPy
pixi run ipython

# Run benchmarks
pixi run bench

# Debug build
pixi run build-debug
```

Available pixi environments include: `build`, `test`, `docs`, `lint`, `bench`, `ipython`, `gdb`, `lldb`, and various array API backend environments (`jax-cpu`, `torch-cpu`, `cupy`, etc.).

**Option 2: Conda/Mamba**
```bash
conda env create -f environment.yml
conda activate scipy-dev
```

Then use `spin` commands for development tasks:
```bash
spin build              # Build SciPy
spin test               # Run tests
spin test -s linalg     # Test specific submodule
spin docs               # Build documentation
spin lint               # Run linter
spin ipython            # IPython with built SciPy
spin bench              # Run benchmarks
```

**Option 3: Pip with virtual environment**
```bash
python -m venv venv
source venv/bin/activate
pip install -e . -v --no-build-isolation
```

## Project Structure

```
scipy/
├── cluster/        # Clustering algorithms (k-means, hierarchical)
├── constants/      # Physical and mathematical constants
├── datasets/       # Dataset registry and downloading
├── differentiate/  # Numerical differentiation
├── fft/            # Fast Fourier Transform (modern API)
├── fftpack/        # FFT (legacy API, backwards compatibility)
├── integrate/      # Numerical integration (quadrature, ODE, BVP)
├── interpolate/    # Interpolation (splines, RBF)
├── io/             # File I/O (MATLAB, NetCDF, WAV, ARFF)
├── linalg/         # Linear algebra (BLAS, LAPACK wrappers)
├── ndimage/        # N-dimensional image processing
├── odr/            # Orthogonal distance regression
├── optimize/       # Optimization and root finding
├── signal/         # Signal processing
├── sparse/         # Sparse matrices and algorithms
├── spatial/        # Spatial algorithms (KDTree, distance, transforms)
├── special/        # Special mathematical functions
├── stats/          # Statistical functions and distributions
├── _lib/           # Internal utilities and vendored code
├── _build_utils/   # Build-time utilities
└── _external/      # Minimal external code
```

Each subpackage typically contains:
- `__init__.py` - Public API exports
- Implementation files (`.py`, `.pyx`, `.pxd`, `.c`, `.cpp`)
- `tests/` directory with pytest tests
- `meson.build` - Build configuration

## Testing

### Running Tests

```bash
# Run all tests (excludes slow tests by default)
spin test

# Run tests for a specific submodule
spin test -s linalg

# Run tests on a directory or file
spin test scipy/linalg
spin test scipy/linalg/tests/test_basic.py

# Run specific test module, class, or function
spin test -t scipy.linalg.tests.test_basic
spin test -t scipy.linalg.tests.test_basic::TestClassName
spin test -t scipy.linalg.tests.test_basic::TestClassName::test_method

# Run the full test suite (including slow tests)
spin test -m full

# Run tests matching a pattern
spin test -- -k "geometric"
spin test -- -k "geometric and not rgeometric"

# Run tests in parallel
spin test -j auto

# Show timing for slowest tests
spin test -d 10
```

### Test Markers

- `@pytest.mark.slow` - Slow tests (skipped by default in some CI)
- `@pytest.mark.xslow` - Extremely slow tests (not run unless explicitly requested)
- `@pytest.mark.xfail_on_32bit` - Expected failures on 32-bit platforms
- `@pytest.mark.skip_xp_backends(...)` - Skip for specific array API backends
- `@pytest.mark.thread_unsafe` - Must run single-threaded

### Test Configuration

- Config file: `pytest.ini`
- Root conftest: `scipy/conftest.py`
- Excluded from testing: `doc`, `tools`, vendored code in `_lib/`

## Code Style

### Style Rules

- **Line length**: 88 characters (not PEP8's 79)
- **Indentation**: 4 spaces (no tabs)
- **Import convention**: `import numpy as np` (enforced)
- **Docstrings**: NumPy style
- **File endings**: POSIX (LF, not CRLF)

### Linting

```bash
# Run linter
spin lint

# Or directly with ruff
python -m ruff check --config=tools/lint.toml scipy/
```

### Ruff Configuration (tools/lint.toml)

Enabled checks:
- `E`, `F` - pycodestyle and Pyflakes
- `UP` - PyUpgrade
- `PGH004` - Blanket noqa forbidden
- `B006`, `B008` - Mutable defaults and function calls in argument defaults forbidden
- `B028` - Warnings must include stacklevel
- `ICN001` - Import conventions (numpy as np)
- `W292` - Newline at end of file
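
As a small illustration of the `B006` and `B028` rules listed above, here is a sketch of code that should pass both. The function name and its behaviour are hypothetical and not taken from SciPy.

```python
import warnings


def scale_values(values, weights=None):
    """Multiply each value by its weight (hypothetical helper)."""
    # B006: use None instead of a mutable default such as `weights=[]`.
    if weights is None:
        weights = [1.0] * len(values)
    if len(weights) != len(values):
        # B028: always pass an explicit stacklevel to warnings.warn.
        warnings.warn(
            "weights truncated or padded to match values",
            RuntimeWarning,
            stacklevel=2,
        )
        weights = (list(weights) + [1.0] * len(values))[:len(values)]
    return [v * w for v, w in zip(values, weights)]
```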

### Type Checking

```bash
spin mypy
```

Type checking is selective - see `mypy.ini` for configured modules.

### Pre-commit Hook

```bash
# Install the pre-commit hook
cp tools/pre-commit-hook.py .git/hooks/pre-commit

# Run manually with auto-fix
./tools/pre-commit-hook.py --fix
```

## Documentation

### Building Docs

```bash
# Build documentation
spin docs

# Quick smoke test
spin smoke-docs

# Check reference guide
spin refguide-check
```

### Docstring Format

Use NumPy docstring style:

```python
def my_function(x, y, method='default'):
    """Short one-line summary.

    Longer description if needed, explaining the function's
    purpose and behavior.

    Parameters
    ----------
    x : array_like
        Description of x.
    y : float
        Description of y.
    method : {'default', 'alternative'}, optional
        Description of method. Default is 'default'.

    Returns
    -------
    result : ndarray
        Description of return value.

    Examples
    --------
    >>> from scipy.subpackage import my_function
    >>> my_function([1, 2, 3], 0.5)
    array([...])

    See Also
    --------
    related_function : Brief description.

    Notes
    -----
    Implementation notes, references, etc.

    References
    ----------
    .. [1] Author, "Title", Journal, year.
    """
```

## Contributing Guidelines

### PR Requirements

1. **Unit tests** - All new code must have tests
2. **Documentation** - Docstrings with parameters, returns, examples
3. **Code style** - Pass `spin lint`
4. **Benchmarks** - For performance-critical code (use ASV)
5. **License** - Code must be BSD-compatible

### API Design

- Public API: `scipy.subpackage.function_name`
- Private functions: leading underscore `_function_name`
- All public items must be in `__all__`
- Public items imported in `__init__.py`
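
To illustrate the `__all__` rule above, here is a minimal sketch of a check one could run against a module. The helper name `missing_from_all` is hypothetical, and this is not how SciPy itself enforces the rule.

```python
import types


def missing_from_all(module):
    """Return public names defined in `module` that `__all__` omits."""
    exported = set(getattr(module, "__all__", ()))
    missing = []
    for name, obj in vars(module).items():
        if name.startswith("_"):
            continue  # private by convention
        if isinstance(obj, types.ModuleType):
            continue  # imported modules are not part of the public API
        # Only consider objects defined in this module, not re-imports.
        if getattr(obj, "__module__", None) != module.__name__:
            continue
        if name not in exported:
            missing.append(name)
    return sorted(missing)
```

An empty list means every public function or class defined in the module is exported; plain constants are not covered by this sketch.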

### Important Notes

- **Discuss first**: New features should be discussed on the [SciPy forum](https://discuss.scientific-python.org/c/contributor/scipy) before implementation
- **No style-only PRs**: Don't submit PRs that only fix PEP8 issues
- **BSD license**: All contributed code must be BSD-compatible (no GPL, Apache, or unclear licenses)
- **Backwards compatibility**: Breaking changes require deprecation cycles

## Out-of-Scope Changes

**Do not submit PRs for the following types of changes:**

### Vendored and External Code

The following directories contain vendored third-party code that is maintained upstream. Do not modify these files - instead, report issues or submit fixes to the original projects:

- `scipy/_lib/array_api_compat/` - Vendored from [array-api-compat](https://github.com/data-apis/array-api-compat)
- `scipy/_lib/array_api_extra/` - Vendored from [array-api-extra](https://github.com/data-apis/array-api-extra)
- `scipy/_lib/cobyqa/` - Vendored from [COBYQA](https://github.com/cobyqa/cobyqa)
- `scipy/_lib/pyprima/` - Vendored from [PyPrima](https://github.com/pyprima/pyprima)
- `scipy/_lib/highs/` - Vendored from [HiGHS](https://github.com/ERGO-Code/HiGHS)
- `scipy/_external/` - External code with separate maintenance
- `subprojects/` - Meson subprojects (vendored dependencies)

### Low-Value Changes

The following types of PRs create review burden without meaningful improvement:

- **Style-only fixes**: PRs that only fix PEP8/linting issues without functional changes
- **Typo fixes in comments**: Minor typo corrections in code comments (documentation typos may be acceptable)
- **Reformatting code**: Changing code layout without functional benefit
- **Adding type hints to untyped code**: Unless part of a larger coordinated effort
- **Speculative refactoring**: Restructuring code without clear benefit or prior discussion
- **"Cleanup" PRs**: Broad changes labeled as cleanup without specific justification
- **Static analysis findings**: Issues flagged by static analysis tools (e.g., "possible None dereference", "unreachable code") without demonstrating a real bug via a reproducible test case. Many such findings are false positives or impossible to trigger in practice.

### When in Doubt

If you're unsure whether a change is appropriate:
1. Check if the code is in a vendored directory (see above)
2. Ask on the [SciPy forum](https://discuss.scientific-python.org/c/contributor/scipy) before starting work
3. For bug fixes, ensure you can reproduce the issue and write a failing test first

## Common Development Tasks

### Adding a New Function

1. Implement in appropriate subpackage file
2. Add to `__all__` in that file
3. Import in `__init__.py`
4. Add comprehensive tests in `tests/`
5. Add docstring with examples
6. Update `meson.build` if adding new files

### Debugging

```bash
# Run with GDB
spin gdb

# Run with LLDB
spin lldb

# Python REPL with built scipy
spin python

# IPython REPL
spin ipython
```

### Benchmarking

```bash
# Run benchmarks
spin bench

# Compare against another branch
spin bench --compare main
```

Benchmarks use [ASV (Airspeed Velocity)](https://asv.readthedocs.io/) and are located in `benchmarks/`.

## CI/CD

GitHub Actions workflows in `.github/workflows/`:

- `linux.yml` - Main Linux testing
- `macos.yml` - macOS testing
- `windows.yml` - Windows testing
- `lint.yml` - Linting checks
- `wheels.yml` - Wheel building
- `array_api.yml` - Array API compatibility

### Commit Message Prefixes

Commit messages must start with a standard prefix indicating the type of change:

| Prefix | Description |
|--------|-------------|
| `API:` | Incompatible API change |
| `BENCH:` | Changes to benchmark suite |
| `BLD:` | Build system changes |
| `BUG:` | Bug fix |
| `DEP:` | Deprecation or removal of deprecated features |
| `DEV:` | Development tool or utility |
| `DOC:` | Documentation |
| `ENH:` | Enhancement (new feature) |
| `MAINT:` | Maintenance (refactoring, typos, etc.) |
| `REV:` | Revert an earlier commit |
| `STY:` | Style fix (whitespace, PEP8) |
| `TST:` | Test additions or modifications |
| `REL:` | Release-related |

**Format:** `PREFIX: scope: short description`

Examples:
```
ENH: stats: add new statistical test for normality
BUG: sparse.linalg.gmres: add early exit when x0 already solves problem
MAINT/TST: fft: remove xp backend skips, test fftfreq device
DOC: optimize: fix typo in minimize docstring
```

Multiple prefixes can be combined (e.g., `MAINT/TST:`). Keep the first line under 72 characters.
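
The format above can be checked mechanically. The following is only a sketch: the function name `check_subject` is hypothetical, the prefix list is copied from the table above, and the `scope:` part is not validated because scopes are free-form.

```python
import re

# Prefixes copied from the table above.
PREFIXES = (
    "API", "BENCH", "BLD", "BUG", "DEP", "DEV", "DOC",
    "ENH", "MAINT", "REV", "STY", "TST", "REL",
)
_PREFIX = "|".join(PREFIXES)
# One or more prefixes joined by "/", then ": ", then a non-empty description.
_SUBJECT_RE = re.compile(rf"^(?:{_PREFIX})(?:/(?:{_PREFIX}))*: \S.*$")


def check_subject(subject, max_len=72):
    """Return a list of problems found in a commit subject line."""
    problems = []
    if not _SUBJECT_RE.match(subject):
        problems.append("missing or unknown prefix")
    if len(subject) >= max_len:
        problems.append(f"first line is not under {max_len} characters")
    return problems
```

An empty list means the subject line passes both checks.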

### Commit Message Tags for CI Control

SciPy's CI is resource-intensive. Use commit message tags to skip unnecessary workflows:

| Tag | Effect |
|-----|--------|
| `[docs only]` | Skip build/test workflows, run only documentation CI |
| `[lint only]` | Skip build/test workflows, run only linting CI |

**Example commit messages:**
```
DOC: fix typo in optimize tutorial [docs only]

STY: fix ruff warnings in stats module [lint only]
```

**When to use these tags:**
- `[docs only]` - Changes exclusively to `.rst`, `.md` files, or docstrings with no code changes
- `[lint only]` - Pure style/formatting fixes with no functional changes

**Important:** Only use these tags when you are certain your changes don't affect code behavior. If in doubt, let the full CI run.
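
A conservative way to apply the rule above is to suggest `[docs only]` only when every changed file is `.rst` or `.md`. This is a sketch with a hypothetical helper name. Docstring-only edits in `.py` files are deliberately left to human judgment, and `[lint only]` is not handled at all.

```python
from pathlib import PurePosixPath

DOC_SUFFIXES = {".rst", ".md"}


def suggest_ci_tag(changed_files):
    """Return "[docs only]" if every changed file is .rst/.md, else None."""
    files = list(changed_files)
    if files and all(PurePosixPath(f).suffix in DOC_SUFFIXES for f in files):
        return "[docs only]"
    # Anything touching code (even docstring-only edits) is left to a human.
    return None
```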

## Key Files Reference

| File | Purpose |
|------|---------|
| `pyproject.toml` | Project metadata and dependencies |
| `meson.build` | Root build configuration |
| `scipy/meson.build` | Main package build config |
| `pytest.ini` | Pytest configuration |
| `scipy/conftest.py` | Pytest fixtures and markers |
| `tools/lint.toml` | Ruff linter configuration |
| `mypy.ini` | Type checking configuration |
| `environment.yml` | Conda environment definition |
| `pixi.toml` | Pixi environment and task definitions |
| `.spin/cmds.py` | Spin CLI commands |

## Compiled Code

SciPy contains significant amounts of compiled code:

- **Cython** (`.pyx`, `.pxd`) - Python-like syntax compiled to C
- **C/C++** - Direct implementations, especially in `special/`, `sparse/`
- **Pythran** - Python to C++ transpilation for numeric code

When modifying compiled code:
1. Rebuild with `spin build`
2. Use `spin build --debug` for debugging
3. Check for memory issues with `spin build --asan`

## Useful Links

- [SciPy Documentation](https://docs.scipy.org/doc/scipy/)
- [Development Guide](https://docs.scipy.org/doc/scipy/dev/)
- [Contributing Guide](https://docs.scipy.org/doc/scipy/dev/hacking.html)
- [SciPy Forum](https://discuss.scientific-python.org/c/contributor/scipy)
- [Issue Tracker](https://github.com/scipy/scipy/issues)
- [Code of Conduct](https://docs.scipy.org/doc/scipy/dev/conduct/code_of_conduct.html)

Unfortunately, everything in the LLM world is passive and suggestive, so we can’t force anything. But this is better than nothing if someone just scans the repo for a quick-PR win. Please feel free to take this and modify it further. This is just a draft.

We can also redirect other major tools if needed, via

- `.cursorrules` → "Read AGENTS.md"
- `.github/copilot-instructions.md` → "Read AGENTS.md"
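
A minimal sketch of generating those redirect files follows. The helper name `write_redirects` is hypothetical, and the one-line file body is just the text suggested above; whether each tool actually honours such a redirect is an assumption that would need checking.

```python
from pathlib import Path

# Redirect files named above; the one-line body is the suggested text.
REDIRECTS = {
    ".cursorrules": "Read AGENTS.md\n",
    ".github/copilot-instructions.md": "Read AGENTS.md\n",
}


def write_redirects(repo_root):
    """Create each redirect file under `repo_root` unless it already exists."""
    created = []
    for rel_path, body in REDIRECTS.items():
        target = Path(repo_root) / rel_path
        if target.exists():
            continue  # never overwrite a file someone already maintains
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(body, encoding="utf-8")
        created.append(rel_path)
    return created
```

Calling `write_redirects(".")` from the repository root would create whichever of the two files is missing and return their relative paths.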

Thanks @ilayn. I spent a bit of time thinking about the developer experience here after your first post. A couple of brief thoughts:

  1. AGENTS.md in the root of the repo (and, if needed, more AGENTS.md in subdirs) seems to be becoming the standard. There is https://agents.md/, now owned by the Linux Foundation it looks like, and with wide but not universal (yet) support.
    1. We can add short redirects for one or a few of the other most popular tools that don’t yet support AGENTS.md indeed.
  2. The most tricky bit is the build/test/lint/etc. instructions. I think AGENTS.md has to contain multiple copies of the commands for those, at least for spin, pixi and regular vs. editable installs.
    1. Then, we need a user preferences system. That can be as simple as setting an environment variable. Or a ~/.config/scipy/dev-settings.toml type thing. Or both, since layering may help in some cases. I think we’ll probably want both, with the .toml being more important, given that env vars don’t propagate in some circumstances.
  3. Scope: it should be for the most common dev tasks and rules only. Growing an ever-larger esoteric set of rules should be done reluctantly/slowly, since it’s going to become a maintenance burden or a source of arguments.
    1. Rule of thumb: only include things one maintainer wants if they are not a problem for other maintainers and do not interfere with their preferences/workflows.

All sounds good to me. The developer experience is actually a bit trickier, because these tools have already moved on to more sophisticated things such as “skills” (which are difficult to summarize here) and one .md file per folder to get local scopes. For example, in C code, you can make one for /include and one for /src to handle different concerns, and then another in /test to handle testing-specific details.

Claude Code has also introduced hooks, for example for injecting something into every prompt with a UserPromptSubmit hook:

`.claude/settings.json`:

```json
{
  "hooks": {
    "UserPromptSubmit": [
      {
        "hooks": [
          {
            "type": "command",
            "command": "cat \"$CLAUDE_PROJECT_DIR/AGENTS.md\""
          }
        ]
      }
    ]
  }
}
```

This would inject the contents of AGENTS.md into the context every time you submit a prompt. However, this would be wasteful, since it adds the full file content on every message. Hence a SessionStart hook that just outputs a reminder at the very start of a session or when you run /clear:

```json
{
  "hooks": {
    "SessionStart": [
      {
        "hooks": [
          {
            "type": "command",
            "command": "echo 'Reminder: Read AGENTS.md before making code changes.'"
          }
        ]
      }
    ]
  }
}
```
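
If the reminder should only appear when the file is actually present, the `echo` could be swapped for a small script. This is a sketch under assumptions: the script would live at some hypothetical path in the repo and be invoked from the hook's `command`, and the fallback to the current directory is my own choice, not documented behaviour.

```python
import os
from pathlib import Path


def reminder(project_dir):
    """Return a reminder line if AGENTS.md exists in `project_dir`, else ''."""
    agents = Path(project_dir) / "AGENTS.md"
    if agents.is_file():
        return "Reminder: Read AGENTS.md before making code changes."
    return ""


def main():
    # CLAUDE_PROJECT_DIR is the variable used in the hook command above;
    # falling back to the current directory is an assumption of this sketch.
    project_dir = os.environ.get("CLAUDE_PROJECT_DIR", os.getcwd())
    message = reminder(project_dir)
    if message:
        print(message)


if __name__ == "__main__":
    main()
```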

All this is to say that developer experience is already quite far ahead and, to be honest, these things are better left to the developer, because the feature churn is now down to weeks for these companies.

All I want from these files is to discourage folks from submitting bad PRs. But if you folks have a good idea about DX, I can only agree.

Thanks for the context, very useful.

If there’s a good way to redirect the tools to an out-of-tree AGENTS.md, or a nested structure of them mirroring the SciPy directory structure, that would be even better. Then everyone can do whatever they like for developer experience. This is similar to rgommers/pixi-dev-scipystack (an experimental out-of-tree dev setup for NumPy, SciPy and related projects with Pixi), which lived out of tree for a long time; we only put a subset of that inside the repo (thanks again Lucas!) after building up experience with it for over a year.

Thanks for getting this started!

My experience with these files is that it is really helpful to start really minimalistic and only add what you need, as you experience the coding agent mess up. Focus on where your project differs from the average PyPI project, because otherwise you will get average-PyPI-project recommendations; e.g. the agent tries to run tests with pytest instead of spin. It should not be encyclopedic or have many optional choices. It’s okay to be opinionated here; devs driving the agents can give instructions that deviate if they decide to do things differently. For example, I wouldn’t bother documenting the various options to spin build; agents can run spin build --help and discover the options if the dev asks for Apple Accelerate and such. We can document the pixi workflow, and folks using Conda/Mamba can give their instructions on their own.

I think some of the information currently in here is more aimed at developers than agents, like the “Important Notes” and “Low-Value Changes” in the “Contributing Guidelines”. Until coding agents are much farther along, they’re not making those decisions.

I’d probably suggest starting with something more stripped down; cover basic spin and pixi usage (agents can --help the rest), the Python 3.12+ and numpy 2.0+ and Meson dependencies, commit message discipline, the basic style rules and how they are checked/enforced, that we use NumPy-style docstrings (but no need for an example, IME). The “API Design” and “Adding a New Function” sections are great. Then build up from there if people run into challenges. I think most of what you have could be useful to an agent, but probably not all the time.

An interesting experiment would be to write a simple, messy, docstringless function that would go into scipy.interpolate or somewhere, and have Claude Code do the work to turn it into a commit. Start with no AGENTS.md and see how it messes up. Try again in a fresh branch with a minimalistic one and see how it messes up. Add statements until it does a reasonable job, at least in the broad strokes.


That’s right. I generated it very quickly just to give an example. My .md file is a lot different from the one above, but that is for my maintenance work and goes well with how I ask it to make things work.

But a cold-start PR would not go too well with that file, as recently some folks scan the Python ecosystem for easy PRs. The draft above was geared more towards introducing the repo to a passing-by agent.

I can recognize my issues with the agents in your description, especially falling back to dev.py and pytest usage when the context is full. So if we are going for the best developer experience, which is also fine, that’s indeed a different story.

I don’t think anything in the AGENTS.md is going to affect the low-value PRs. I don’t think those instructions there will prevent an agent from working on a low-value issue, or even cause it to raise the concern with the developer driving the agent. So I’d aim for a minimal, opinionated file that helps guide the basics of tool usage in this repository for actual developer use.

Happy to be proven wrong by an experiment, though!

I don’t think it is an experiment we can design, since there is nothing to be measured. But at face value, documenting how to build the repo and other details makes it easier to build and create PRs of any quality, so that would go against the purpose of preventing low-quality PRs, if we were to find a way to run that experiment.

But “don’t send a PR in this part of the code” at least gives the agent a chance to mention it to its owner. So I think it does make a difference. In fact, that’s how I stumbled upon the agents file while I was traversing a bunch of repos in the last few months, since the agent was mentioning that the repo was not accepting PRs.

Also, current agents don’t have any issues whether it is 100 lines or 2000 lines in terms of precision so even the example above is still not considered long. But again, I don’t really have strong opinions.

However, as I mentioned, if we want to combine things so that we also get DX benefits from it, I would agree to that. And I don’t think such a contributor guidelines section creates bloat of any kind, since it is meant for agents and you won’t be reading it.

An experiment would be to prompt Claude to make a low-value contribution (take a recently-rejected PR and try to recreate it, for instance), with and without these clauses. If it balks with the clause, then it works! Your experience running an agent across repos indicates that it might, but it’s not clear to me if the way you were running your agent is similar to the way our would-be contributors run their agents; I kinda suspect they point it straight at an issue and tell it to work on that issue.

At about 3.7k tokens, this text is on the long side. Current recommendations really do recommend shorter. Extraneous instructions do affect adherence to other instructions, even with current models.

If the goal is to just stop mindless agent usage, go even shorter and omit any DX-improving instructions, and just wave agents away.

If Claude or other agents were to be stateless, yes that would be possible but they are not. So that blog post is not accurate. They depend on the user’s settings and hooks and skills they installed. I don’t get the same response between two computers with the same account logged in.

Also fine with me. As I said, I don’t have strong opinions. But I am trying to understand what the objection is. Do you have any examples of what you wish to have? If you have something, let’s put it in and we can work on it incrementally over time. Anything is better than nothing at this stage.

I’m looking for any demonstration that those low-quality-avoiding instructions do anything, at any time, with any configuration. Not that it works 100% of the time or even 50% of the time or even reproducibly. Rigor is not required here, just some indication that a statement might actually do anything.

I was pretty specific in my recommendation for what should be in there:

I’d probably suggest starting with something more stripped down; cover basic spin and pixi usage (agents can --help the rest), the Python 3.12+ and numpy 2.0+ and Meson dependencies, commit message discipline, the basic style rules and how they are checked/enforced, that we use NumPy-style docstrings (but no need for an example, IME). The “API Design” and “Adding a New Function” sections are great. Then build up from there if people run into challenges. I think most of what you have could be useful to an agent, but probably not all the time.

Agreed. I stumbled over that blog as well when doing a bit of research for this discussion, and it’s quite good. I think the “progressive disclosure” in particular is relevant here to keep AGENTS.md short.

We also have to think about how it affects non-newcomers. For example, if I’d use an AI tool on my local repo, then it’d not be for generating code and making commits, nor for actively taking any dev steps, but rather as an intelligent code base search engine or for specific review tasks not caught by a linter or other CI (e.g., “Please review whether the error handling in the C code in this commit is correct and complete”). Having a lengthy AGENTS.md in the root of the repo will increase cost, and might decrease performance.

I’m having a hard time coming up with a good redirect. Something at the top like “Check ../AGENTS.md first, if it exists use that instead of this file” might be annoying since it imposes a particular location that may not work for everyone. And also it’s an extra tool call to check that the location exists - a bit inefficient even if cached per session.

I already gave you anecdotal evidence, but somehow you hold the opinion that your view is more correct, so I can’t respond to that. Also, we are trying to stop the person, not the agent. So your analysis is not directed at the original problem. The original problem is that folks set up and scan the repo for anything.

Regarding other details, please put the work in if you are not happy with it, instead of handing out homework. I would also need to see that your version actually does something useful, but that type of argumentation leads nowhere, so I’ll leave it to you, as the microscopic details are getting very tiring.

This is not going to be read every time or in its entirety, so that is not an issue. The agent will not even read it if you don’t ask for a specific task related to it. And in fact, even if it reads it, it will not remember it once your context fills up and it starts to compact.

But I guess you folks have strong feelings about this so please take the initiative.

That works as advertised. You can see many projects also use it as such. But I am starting to feel that I have a credibility problem here. I guess I had better not extend the discussion, so as not to waste more time and effort.

Do you happen to have a couple of examples of open source projects with such usage? I couldn’t find good examples so quickly, neither in scientific Python projects nor in deep learning focused ones like vLLM or PyTorch. The latter has a very small one, mainly saying “do NOT do xxx”.

I did a quick test with Claude Code (on Sonnet, FWIW). I added the verbatim CLAUDE.md and AGENTS.md above. Then I branched and removed “Important Notes”, “Out-of-Scope Changes”, and “Useful Links”. Under each branch, I prompted Claude to “Tighten except Exception: clauses to more specific exception classes.” following one recent rejected PR that obviously used a coding agent.

In both cases, Claude happily followed the direct instruction, made the changes, and prepared a commit. Adherence to the spin instructions was actually closer with the quality clauses than without, so there is no evidence that they harm attention to the other rules. On the branch with the verbatim text, I asked Claude to “Review the contribution guidelines and determines if this is a suitable contribution.” It said that it was borderline, gave tedious arguments for and against, and recommended asking first. So when explicitly asked to, it did attend to the contribution rules. However, doing so basically encouraged more interaction rather than less. With the sycophancy of these models being what it is, that’s in line with what I would have expected.

I then tried the prompt “Propose an easy PR to make for this project.” Under both conditions, it looked at recent commits and found Jake Bowhay’s recent systematic work enabling numpydoc linter rules and fixing what the linter reports. Under both conditions, it recommended enabling another rule in the list and fixing that. I think both would have been fine contributions with appropriate editing.

I’m quite happy to believe that a simple, hardline “Do not use an LLM to draft contributions to this project.” would be effective. What I’m asking for is some kind of evidence that the softer language that requires judgment calls would do anything; such judgment calls are particularly what these agents are bad at. It can be an N=1 anecdote, that’s fine. That fits the conventional wisdom for writing these files: start tiny, watch it fail to do what you want, add a rule to prevent that failure, watch it succeed with the new rule, then commit to that new rule.
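
For anyone wanting to repeat the with/without comparison described above, the stripped variant can be produced mechanically. This is a sketch, not the procedure actually used for the test (the sections there were removed by hand); the helper name `drop_sections` is hypothetical.

```python
import re

_HEADING = re.compile(r"^(#{1,6})\s+(.*?)\s*$")


def drop_sections(markdown, titles):
    """Remove each heading in `titles` together with its nested content."""
    kept = []
    skip_level = None  # heading level of the section being dropped
    in_fence = False
    for line in markdown.splitlines():
        if line.lstrip().startswith("```"):
            in_fence = not in_fence
        # Lines inside fenced code (e.g. "# comment") are never headings.
        match = None if in_fence else _HEADING.match(line)
        if match:
            level, title = len(match.group(1)), match.group(2)
            if skip_level is not None and level <= skip_level:
                skip_level = None  # a same-or-higher heading ends the skip
            if skip_level is None and title in titles:
                skip_level = level
        if skip_level is None:
            kept.append(line)
    return "\n".join(kept) + "\n"
```

Calling it with `{"Important Notes", "Out-of-Scope Changes", "Useful Links"}` on the AGENTS.md text would give the second branch of the comparison.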

Here is my attempt at a more-stripped version of AGENTS.md:

# AGENTS.md - Guide for AI-Assisted Development

This document provides guidance for AI assistants (Claude, etc.) working on the SciPy codebase.

## Project Overview

SciPy is a fundamental library for scientific computing in Python, providing algorithms for optimization, integration, interpolation, eigenvalue problems, algebraic equations, differential equations, statistics, and many other domains.

- **License**: BSD-3-Clause
- **Python**: >= 3.12
- **NumPy**: >= 2.0.0
- **Repository**: https://github.com/scipy/scipy
- **Documentation**: https://docs.scipy.org/doc/scipy/

## Build System

SciPy uses **Meson** as its build system with **meson-python** as the build backend.
It uses **Pixi** to manage the development environment and run convenient
developer tools.
It uses the `spin` CLI to run other developer tools.

### Quick Build Commands

```bash
# Build the package (recommended way)
pixi run build

# Run the tests
pixi run test

# Run the linter
pixi run lint

# Build documentation
pixi run docs

# Run benchmarks
pixi run bench

# Debug build
pixi run build-debug
```

Available pixi environments include: `build`, `test`, `docs`, `lint`, `bench`, `ipython`, `gdb`, `lldb`, and various array API backend environments (`jax-cpu`, `torch-cpu`, `cupy`, etc.).

## Project Structure

The `scipy` package source code is in the `scipy/` directory, with many
subpackages. Each subpackage typically contains:
- `__init__.py` - Public API exports
- Implementation files (`.py`, `.pyx`, `.pxd`, `.c`, `.cpp`)
- `tests/` directory with pytest tests
- `meson.build` - Build configuration

Vendored code is in `submodules/`. DO NOT EDIT files in that directory.

## Testing

### Running Tests

```bash
# Run all tests (excludes slow tests by default)
pixi run test

# Run tests for a specific submodule
pixi run test -s linalg

# Run tests on a directory or file
pixi run test scipy/linalg
pixi run test scipy/linalg/tests/test_basic.py

# Run specific test module, class, or function
pixi run test -t scipy.linalg.tests.test_basic
pixi run test -t scipy.linalg.tests.test_basic::TestClassName
pixi run test -t scipy.linalg.tests.test_basic::TestClassName::test_method
```

## Code Style

### Style Rules

- **Line length**: 88 characters (not PEP8's 79)
- **Indentation**: 4 spaces (no tabs)
- **Import convention**: `import numpy as np` (enforced)
- **Docstrings**: NumPy style
- **File endings**: POSIX (LF, not CRLF)

## Documentation

### Building Docs

```bash
# Build documentation
pixi run docs

# Quick smoke test
spin smoke-docs

# Check reference guide
spin refguide-check
```

### Docstring Format

Use NumPy docstring style.
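
As a rough illustration of the expected layout, here is a minimal sketch of a NumPy-style docstring. The function `clipped_mean` is hypothetical and not part of SciPy; it is written in plain Python only so that the example stays self-contained.

```python
def clipped_mean(values, lower=0.0, upper=1.0):
    """Compute the mean of `values` after clipping them to an interval.

    Parameters
    ----------
    values : sequence of float
        Input values. Must contain at least one element.
    lower : float, optional
        Lower bound of the clipping interval. Default is 0.0.
    upper : float, optional
        Upper bound of the clipping interval. Default is 1.0.

    Returns
    -------
    mean : float
        Arithmetic mean of the clipped values.

    Raises
    ------
    ValueError
        If `values` is empty or if ``lower > upper``.

    Examples
    --------
    >>> clipped_mean([-1.0, 0.5, 3.0])
    0.5
    """
    if lower > upper:
        raise ValueError("lower must not exceed upper")
    # Clip each value into [lower, upper] before averaging.
    clipped = [min(max(float(v), lower), upper) for v in values]
    if not clipped:
        raise ValueError("values must not be empty")
    return sum(clipped) / len(clipped)
```

The section order shown here (Parameters, Returns, Raises, Examples) follows the numpydoc convention; real SciPy docstrings often add Notes and References sections as well.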

## Contributing Guidelines

### PR Requirements

1. **Unit tests** - All new code must have tests
2. **Documentation** - Docstrings with parameters, returns, examples
3. **Code style** - Pass `pixi run lint`
4. **Benchmarks** - For performance-critical code (use ASV)
5. **License** - Code must be BSD-compatible

### API Design

- Public API: `scipy.subpackage.function_name`
- Private functions: leading underscore `_function_name`
- All public items must be in `__all__`
- Public items imported in `__init__.py`

## Common Development Tasks

### Adding a New Function

1. Implement in appropriate subpackage file
2. Add to `__all__` in that file
3. Import in `__init__.py`
4. Add comprehensive tests in `tests/`
5. Add docstring with examples
6. Update `meson.build` if adding new files

### Commit Message Prefixes

Commit messages must start with a standard prefix indicating the type of change:

| Prefix | Description |
|--------|-------------|
| `API:` | Incompatible API change |
| `BENCH:` | Changes to benchmark suite |
| `BLD:` | Build system changes |
| `BUG:` | Bug fix |
| `DEP:` | Deprecation or removal of deprecated features |
| `DEV:` | Development tool or utility |
| `DOC:` | Documentation |
| `ENH:` | Enhancement (new feature) |
| `MAINT:` | Maintenance (refactoring, typos, etc.) |
| `REV:` | Revert an earlier commit |
| `STY:` | Style fix (whitespace, PEP8) |
| `TST:` | Test additions or modifications |
| `REL:` | Release-related |

**Format:** `PREFIX: scope: short description`

Examples:
```
ENH: stats: add new statistical test for normality
BUG: sparse.linalg.gmres: add early exit when x0 already solves problem
MAINT/TST: fft: remove xp backend skips, test fftfreq device
DOC: optimize: fix typo in minimize docstring
```

Multiple prefixes can be combined (e.g., `MAINT/TST:`). Keep the first line under 72 characters.
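
A minimal sketch of how these rules could be checked mechanically is shown below. It is not SciPy's actual tooling: the names `check_subject` and `VALID_PREFIXES` are hypothetical, the prefix list is copied from the table above, and the scope part is not validated.

```python
import re

# Prefixes copied from the table above.
VALID_PREFIXES = frozenset({
    "API", "BENCH", "BLD", "BUG", "DEP", "DEV", "DOC",
    "ENH", "MAINT", "REV", "STY", "TST", "REL",
})

# One or more upper-case prefixes joined by "/", then ": " and some text.
_SUBJECT_RE = re.compile(r"^([A-Z]+(?:/[A-Z]+)*): \S")

# The first line should stay under this many characters.
MAX_SUBJECT_LENGTH = 72


def check_subject(message):
    """Return a list of problems found in the first line of `message`."""
    lines = message.splitlines()
    subject = lines[0] if lines else ""
    problems = []

    match = _SUBJECT_RE.match(subject)
    if match is None:
        problems.append("subject must start with 'PREFIX: '")
    else:
        # Combined prefixes such as "MAINT/TST" are checked one by one.
        unknown = [p for p in match.group(1).split("/")
                   if p not in VALID_PREFIXES]
        if unknown:
            problems.append("unknown prefix: " + ", ".join(unknown))

    if len(subject) >= MAX_SUBJECT_LENGTH:
        problems.append("subject is 72 characters or longer")
    return problems
```

An empty list means the subject line passes; for instance, `check_subject("ENH: stats: add new statistical test for normality")` returns `[]`.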

### Commit Message Tags for CI Control

SciPy's CI is resource-intensive. Use commit message tags to skip unnecessary workflows:

| Tag | Effect |
|-----|--------|
| `[docs only]` | Skip build/test workflows, run only documentation CI |
| `[lint only]` | Skip build/test workflows, run only linting CI |

**Example commit messages:**
```
DOC: fix typo in optimize tutorial [docs only]

STY: fix ruff warnings in stats module [lint only]
```

**When to use these tags:**
- `[docs only]` - Changes exclusively to `.rst`, `.md` files, or docstrings with no code changes
- `[lint only]` - Pure style/formatting fixes with no functional changes

**Important:** Only use these tags when you are certain your changes don't affect code behavior. If in doubt, let the full CI run.
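
The "if in doubt, let the full CI run" rule can be sketched as a conservative helper that only looks at file extensions. The function `suggest_ci_tag` is hypothetical. It deliberately never suggests `[lint only]`, and it does not treat docstring-only edits in `.py` files as documentation changes, since neither can be judged from paths alone.

```python
from pathlib import PurePosixPath

# File types that count as documentation-only changes.
DOC_SUFFIXES = {".rst", ".md"}


def suggest_ci_tag(changed_paths):
    """Return "[docs only]" if every changed file is .rst/.md, else None.

    Returning None means "no tag": let the full CI run.
    """
    paths = list(changed_paths)
    if paths and all(
        PurePosixPath(p).suffix.lower() in DOC_SUFFIXES for p in paths
    ):
        return "[docs only]"
    return None
```

A commit touching any other file type, or an empty change list, gets no tag, so the default outcome is always the full CI run.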

## Compiled Code

SciPy contains significant amounts of compiled code:

- **Cython** (`.pyx`, `.pxd`) - Python-like syntax compiled to C
- **C/C++** - Direct implementations, especially in `special/`, `sparse/`
- **Pythran** - Python to C++ transpilation for numeric code

When modifying compiled code:
1. Rebuild with `pixi run build`
2. Use `pixi run build-debug` for debugging
3. Check for memory issues with `pixi run build --asan`

It could probably go further, but I think this is sound. I ran it through its paces with a “Propose an easy PR to make for this project” prompt again, and it identified some TODOs for the NumPy 2.0 upgrade, which is now unblocked. It made the updates, used pixi to run the tests and linter, and made a properly formatted git commit. Not the most important PR, but one I’d probably approve and merge.

`pixi run smoke-docs` already exists, and https://github.com/scipy/scipy/pull/24336 adds tasks for `refguide-check` and `smoke-tutorials`.