Solver instability for my specific application

Hi everyone,

As a start, I just want to say that I’m not an expert either in mathematics nor coding. In fact, I’m a computational biologist which tries to develop different kind of tools to understand the cell’s behaviour.

Currently, I’m working on a framework, which allows the user to simulate a fermentation process using the ODE solver solve_ivp. During the simulation, the user basically tracks different kind of metabolites that can be produced by the cell. The right hand side function technically calculates the metabolite change over the time, which in general in the form of metabolite production - metabolite consumption by the cell, mind you, the metabolite production/consumption are the results of the solving of a linear programming (LP) probelm (we call this a genome scale model). So at each iteration, LP is solvedm and ists result is used to calculate dydt.

In many fermentation process, it is very often that the conditions fluctuate so much. (especially a fermentation process, where you feed the cell culture). In this condition, you can imagine that some specific metabolites will be consumed to zero concentration and being added again later on. This happens super fast (magnitude of second) and from my experience, this seems to “confuse” the solver’s internal error estimation and causes my program to “stuck” during integration process.

So far I tested different methods from solve_ivp including RK45, BDF, Radau, LSODA, even explicit euler. Out of my experience, RK45 works the best, but like I have said before, it’s very instable, depending on the initial conditions and how many metabolites I pass as an array to the solver.

I have observed two different “stuck” behaviour of the solvers:

  1. Solver keeps calling the right hand side definition for the same time step
  2. Solver stops completely and the terminal froze

I’m really lost at the moment and really don’t have any idea what’s going on.

If anyone has some idea on what’s going on/what I can do to improve the solver’s performance that would be very great.

My colleague suggested this might be because in the concentration array, the tracked metabolite’s concentrations have different magnitudes which is not good for the solver (the solver likes to work with entriy values with similar magnitude) But I really doubt that’s an issue at all.

I’d really appreciate any comment/feedback/suggestion/idea from the community.

Thanks a lot!

Have you been able to find where the solver is stopping or slowing and then relate that to your RHS (right hand side) logic? I wish that solve_ivp had callback function that would allow easily for this. It currently requires manually patching the OdeSolver class in scipy, which is cumbersome.

Do you see the solution going to positive or negative infinity, or have strong spurious oscillations when this behavior is occurring? This would be another symptom to look for that can aide you in the diagnosis.

I have observed two different “stuck” behaviour of the solvers:

Solver keeps calling the right hand side definition for the same time step

Solver stops completely and the terminal froze

In my experience, these symptoms are usually caused by:

  1. You are using the default tolerances of solve_ivp. The default tolerances are poor for stiff problems.
  2. A very stiff system. However, the stiff scipy solvers should be capable of overcoming this issue, potentially with very long solve times. This is usually not causing the issue if you are using the stiff solvers. See point 1 above.
  3. If you do not have a consistently specified RHS, e.g. your inner solve code spuriously leads to large changes in derivative with small changes in output. I have run into this when I have made logic/coding errors or have a nonlinear equation solve inside that was switching between roots.
  4. A discontinuity in the RHS. I highly suspect you have this based on your description. It is common for cell culture fermentations to have discrete like feeds. Particularly bad when the system is otherwise stiff. If so, how did you implement this? Assuming your discontinuity is in time, it can be helpful to force the solver to hit the timepoint of the start of the discontinuity. odeint has this functionality (see tcrit parameter) but not yet solve_ivp. Another helpful thing is to smooth out the discontinuity so it is no longer discontinuous.
  5. If you are specifying the Jacobian for the stiff solvers and it doesn’t exactly correspond to the RHS, it can slow down and even stop your integration with the stiff solvers. If you are supplying the Jacobian, try letting the solver automatically calculate it, and see if the progress improves.

In the majority of time, in my own case at least, it is usually number 3. I have made some mistake in the code.

Hi Matthew,

thank you so much for taking your time to reply to my question in detailed manner.

I think you were indeed right about the discontinuity. I have a fed-batch simulation of a big-scale reactor and try to simulate glucose consumption in the top, middle and bottom of the reactor which are separated into different compartments. In each of those compartments, I track glucose, biomass, co2 and acetate concentration.

The thing is, the derivatives of those variables are strongly dependent (calculated) by solving a Linear Programming problem (LP) inside the rhs and everytime the solution to this LP is not optimal, I would change the solution to 0. Maybe this is causing the discontinuity?

However, I noticed that when I track less metabolites (only biomass, acetate and glucose) I was able to finish my simulation and no “stuck” behaviour was observed.

Regarding your comments:

  • “Have you been able to find where the solver is stopping or slowing and then relate that to your RHS (right hand side) logic? I wish that solve_ivp had callback function that would allow easily for this. It currently requires manually patching the OdeSolver class in scipy, which is cumbersome.” -. I mean, it’s probably the LP inside the rhs that’s causing discontuinity, if that’s what you mean with relate rhs with the solver stopping/slowing?
    1. …..”it can be helpful to force the solver to hit the timepoint of the start of the discontinuity. odeint has this functionality (see tcrit parameter) but not yet solve_ivp. Another helpful thing is to smooth out the discontinuity so it is no longer discontinuous.” - how exactly you force the solver to hit the timepoint of the start of the discontinuity? and about smoothing the discontinuity, cell’s consumption rate is very rapid, I don’t know exactly how to smooth this ..
  • And regarding your comment on the Jacobian matrix, I tried to provide one, but it didn’t seem to work either ..

Once again, thanks again for your help! Looking forward to your next reply!

The thing is, the derivatives of those variables are strongly dependent (calculated) by solving a Linear Programming problem (LP) inside the rhs and everytime the solution to this LP is not optimal, I would change the solution to 0. Maybe this is causing the discontinuity?

It is really hard to know without looking deeply into the code, but this sounds very much like it could be causing the issue. How to best handle it will depend on the exact nature of what you are trying to represent.

I wish that solve_ivp had callback function that would allow easily for this.

I have a long open PR for this, but it seems stuck to get it implemented. In my own code, I use the patch I referenced based on a Stack Overflow answer that either no longer exists or I can no longer find. Without that you are stuck putting print statements inside the RHS, but then you are also getting results from internal function evaluations on every step which makes it messy.

how exactly you force the solver to hit the timepoint of the start of the discontinuity?

If your discontinuity was time based, e.g. dy/dt = 1 if 0.1<=t<0.11 else 0, you can use the tcrit parameter in odeint to ensure the solver hits exactly t=0.1. It doesn’t sound particularly like this is your specific problem.

I would try to dive more into odeint and its parameter “tcrit”. At this point, anything is worth trying. so thanks!

anyway, if you wonder how my rhs function looks like, it looks roughly like this:

def rhs_definition(self,

        t, y:ConcentrationArray

        )-> ConcentrationArray: # y is list with biomass concentration in each compartment

    print('Printing simulation step', t, 'h')

    y = self.\_ensure_metabolite_concentrations_are_within_bounds(y)

    y = self.\_check_reactor_volume(y)

    dydt = self.\_empty_dydt #recycle concentration vector to avoid memory leakage

    dydt.fill(0.0)

    for comp in self.compartments:

        biomass_idx = self.get_index(species = 'biomass', compartment=comp)

        biomass_concentration = Q\_(y\[biomass_idx\], 'g/L')

        time_step = Q\_(t-self.previous_t,'h')

        growth_rate = self.get_growth_rate(y = y,

                                                t = t,

                                                compartment = comp,

                                                time_step=time_step,

                                                biomass_concentration=biomass_concentration)

        dydt = self.determine_mass_balance_for_compartment(y = y,

                                                                time_step = Q\_(t-self.previous_t,'h'),

                                                                dydt = dydt,

                                                                compartment=comp,

                                                                biomass_concentration = biomass_concentration,

                                                                growth_rate=growth_rate,

                                                                t=t

                                                                )

    self.\_check_timestep_state(t = t)

the LP solving is happening inside of _get_growth_rate():

def get_growth_rate(self, y:ConcentrationArray,

                     t: Iterable\[float\],

                     compartment: 'Compartment',

                     time_step: Quantity,

                     biomass_concentration:Quantity

                     ) -> Quantity:

    if self.\_is_metabolic_model():

        self.\_update_metabolite_transport_relations(y=y,

                                                        time_step=time_step,

                                                        biomass_concentration=biomass_concentration,

                                                        compartment=compartment

                                                        )

        solution = self.\_optimize_model(t)

        growth_rate = Q\_(self.model.reactions.get_by_id(self.biomass_rxn_id).flux,"1/h") \\

            if solution.status == 'optimal' else Q\_(0, "1/h")

    return growth_rate

I would also consider these lines also possible causes of this issue. When the ode integrator is assessing steps, it will try various combinations of state variables, if you overwrite those values, then you might lose some meaning to the Jacobian. It is usually better to ensure your equations satisfy these constraints rather than overwrite the state variables. It is possible that this is what these functions do in effect, but it isn’t clear.

y = self.\_ensure_metabolite_concentrations_are_within_bounds(y)
y = self.\_check_reactor_volume(y)