Standardized system for parameter management across Scientific Python libraries

Over at Mesa we’re working on a system for managing model/agent default values and ranges, and I believe this could be beneficial beyond Mesa. We’re exploring how to create a standardized way to define parameter spaces, including default values, ranges, and distributions, that internal and potentially external components can all use.

The idea is to develop a ParameterSpec structure that can be used for continuous, discrete, ordered, and categorical data, with flexibility for specifying distributions, sampling strategies, and default values. Each software component (samplers, visualisation, batch runners) can then extract the pieces of information it needs. Defining everything in one place makes it easier to integrate different components and is much less error-prone.
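
To make this concrete, here is a rough sketch of what such a spec could look like. All names here (ParameterSpec, kind, bounds, distribution, and so on) are hypothetical and only meant to illustrate the idea, not a settled design:

```python
from dataclasses import dataclass, field
from typing import Any, Literal, Optional, Sequence

@dataclass
class ParameterSpec:
    """Single place to declare a parameter; each component reads only what it needs."""
    name: str
    kind: Literal["continuous", "discrete", "ordered", "categorical"]
    default: Any = None
    bounds: Optional[tuple[float, float]] = None   # for continuous/discrete
    categories: Optional[Sequence[Any]] = None     # for ordered/categorical
    distribution: Optional[str] = None             # e.g. "uniform", "loguniform"
    metadata: dict[str, Any] = field(default_factory=dict)

# A sampler would read `bounds` and `distribution`, a batch runner `default`,
# and a visualisation component `bounds`/`categories` for its widgets.
infection_rate = ParameterSpec(
    name="infection_rate",
    kind="continuous",
    default=0.1,
    bounds=(0.0, 1.0),
    distribution="uniform",
)
```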

Since we also did something similar in the EMAworkbench (see parameters), we were thinking such a spec might be beneficial for other simulation libraries and scientific libraries more broadly. It might have potential for a SPEC.

We’ve started discussions here, and we’d love to hear your thoughts and see if there’s potential for broader collaboration. If there’s enough interest, we can also start a proper discussion here.


I think I’ve had similar needs recently. In my case, I gave the objects some additional properties and methods. For instance, a parameter has a mathematical symbol that corresponds to it; this is useful for automatically documenting the domain of parameters that appear in mathematical expressions. A parameter object also has a method for drawing random values of the parameter; it can draw from typical values, the full domain, the boundaries of the domain, and from outside the domain (for testing exceptional behavior). The domains of parameters can depend on other parameters. Do these sound relevant in your context, too?
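
Roughly, a stripped-down sketch of what I mean (class and method names here are just for illustration, not my actual implementation):

```python
import random

class Parameter:
    def __init__(self, name, symbol, domain, typical=None):
        self.name = name
        self.symbol = symbol      # e.g. r"\mu", used when documenting formulas
        self.domain = domain      # (low, high) tuple, or a callable of other params
        self.typical = typical

    def resolve_domain(self, **others):
        # Domains may depend on the values of other parameters.
        return self.domain(**others) if callable(self.domain) else self.domain

    def draw(self, where="domain", **others):
        low, high = self.resolve_domain(**others)
        if where == "typical" and self.typical is not None:
            low, high = self.typical
        elif where == "boundary":
            return random.choice([low, high])
        elif where == "outside":  # for testing exceptional behaviour
            return high + (high - low) + 1.0
        return random.uniform(low, high)

# The upper bound of sigma depends on the value drawn for mu.
mu = Parameter("mu", r"\mu", (0.0, 10.0), typical=(1.0, 5.0))
sigma = Parameter("sigma", r"\sigma", lambda mu: (0.0, mu / 2))
sigma_value = sigma.draw("domain", mu=mu.draw("typical"))
```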

Thanks for getting back! Glad to see there’s interest in solving this problem on a broader level.

I think that’s something that definitely can be included.

A parameter object also has a method for drawing random values of the parameter; it can draw from typical values, the full domain, the boundaries of the domain, and from outside the domain (for testing exceptional behavior).

We discussed this at some length over at Mesa. Our initial conclusion was that it might be beneficial to separate parameter specs from samplers, in such a way that the parameter spec provides the information needed for (most) samplers, and each sampler can determine what information it would use from the parameter spec. This way, different samplers can be used on the same parameter definition.
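
As a hypothetical illustration of that separation (none of these names are settled API): the spec only carries information, and each sampler decides which fields it consumes.

```python
import itertools
import random

# A spec is just data; it doesn't know how it will be sampled.
specs = [
    {"name": "infection_rate", "kind": "continuous", "bounds": (0.0, 1.0),
     "distribution": "uniform", "default": 0.1},
    {"name": "num_agents", "kind": "discrete", "bounds": (10, 100), "default": 50},
]

def random_sampler(specs, n):
    """Reads bounds and kind (and could honour `distribution`); ignores defaults."""
    for _ in range(n):
        yield {
            s["name"]: random.uniform(*s["bounds"]) if s["kind"] == "continuous"
            else random.randint(*s["bounds"])
            for s in specs
        }

def grid_sampler(specs, points=3):
    """Reads only bounds and builds a full-factorial grid."""
    axes = []
    for s in specs:
        low, high = s["bounds"]
        axes.append([low + i * (high - low) / (points - 1) for i in range(points)])
    for combo in itertools.product(*axes):
        yield dict(zip([s["name"] for s in specs], combo))
```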

The domains of parameters can depend on other parameters.

Haven’t thought about that yet, interesting idea! I can imagine this would make things slightly more complicated, but might be worth it.

Our initial conclusion was that it might be beneficial to separate parameter specs from samplers, in such a way that the parameter spec provides the information needed for (most) samplers, and each sampler can determine what information it would use from the parameter spec.

Yes, even within the same domain, there are different distributions one might want to sample from. However, I’ve found that this is also the most tedious and error-prone part - so the thing I’d most like to avoid doing all on my own.

However, I think what I would be more interested in is a library rather than a SPEC (and perhaps one already exists - I haven’t looked). A SPEC attempts to standardize the way in which various packages do something for themselves, whereas a library would just do it for them. I’m curious what the motivation is for a SPEC rather than a library?

Yes, a library might be the right implementation solution here. I was thinking of a spec to get the requirements and capabilities right across a wide set of repos, since this is not technically difficult to build (it can probably be just a dataclass with a few helper methods), but it is difficult to get the format right and get everybody aligned.

This sounds similar to how we’re doing parameter validation in scikit-learn here: scikit-learn/sklearn/utils/_param_validation.py at main · scikit-learn/scikit-learn · GitHub

We haven’t made them public, and there are discussions about how it would interact with docstrings and type hints. There might also be a way to programmatically figure out parameter ranges for AutoML use cases. But all of that is still up for discussion.
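
For those not familiar with it, the constraints in that module are declared roughly like this (written from memory against private scikit-learn API, so exact names and signatures may differ):

```python
from numbers import Integral, Real
from sklearn.utils._param_validation import Interval, StrOptions

# Declared as a class attribute on an estimator and checked when fitting.
_parameter_constraints = {
    "n_clusters": [Interval(Integral, 1, None, closed="left")],
    "init": [StrOptions({"k-means++", "random"})],
    "tol": [Interval(Real, 0, None, closed="left")],
}
```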

Thought I’d put it here since there seem to be some similarities.