
Problem with residuals / cost function when running PEtab / Catalyst on actual data #200

Open
yakdahl opened this issue Aug 20, 2024 · 3 comments



yakdahl commented Aug 20, 2024

Hi! I'm having an issue with using PEtab with Catalyst, and I think something is going wrong in the residual calculation. I can set up the problem as in the docs, including with data taken from actual measurement sources. No errors are thrown, and if I ask the PEtab problem to compute the simulated values for some reasonable parameter vector, I get a reasonable result. However, when I ask for the residuals or the cost function, I get non-finite values (-Inf and NaN, respectively). Not sure what is going wrong there - any ideas?

This ties in to another question I have - would it be possible to allow definition of custom loss functions that could be supplied by the user? Or would that break other parts of the PEtab code?

Here's the code, as well as an example file for the experimental measurements.

using Catalyst, PEtab
using OrdinaryDiffEq, CSV, DataFrames

rs = @reaction_network begin
10.0^k1, PdCat + ArBr --> PdCatArBr
10.0^k2, PdCatArBr + NaOTMS --> PdCatArOTMS + NaBr
10.0^k3, PdCatArOTMS + Amine --> PdCatArAmine + HOTMS
10.0^k4, PdCatArAmine --> PdCat + ArAmine
10.0^kd2, PdCatArBr --> PdCatArylated
end;

simulation_conditions = Dict("cond_2" => Dict(:NaOTMS=>0.373, :ArBr=>0.355, :PdCat=>0.003, :Amine=>0.426),
                             "cond_3" => Dict(:NaOTMS=>0.361, :ArBr=>0.688, :PdCat=>0.003, :Amine=>0.413),
                             "cond_1" => Dict(:NaOTMS=>0.373, :ArBr=>0.355, :PdCat=>0.003, :Amine=>0.426))

params = [PEtabParameter(p, lb=1e-6, ub=1e8, scale=:log10) for p in rs.ps]
@parameters sigma
_sigma = PEtabParameter(:sigma)

push!(params, _sigma)

@unpack ArAmine = rs

obs_ArAmine = PEtabObservable(ArAmine, sigma * ArAmine, transformation=:lin)

observables = Dict("obs_ArAmine" => obs_ArAmine)

measurements = CSV.read("example.csv", DataFrame)

measurements[!,"measurement"] = [maximum([x,1.0e-10]) for x in measurements[!,"measurement"]] # tried to see if forcing the measurements to be at least small positive numbers would fix things - it didn't
measurements[!,"time"] = [x for x in measurements[!,"time"]] # random workaround to make sure the columns are formatted correctly - not sure why they don't read right.
measurements[!,"simulation_id"] = [x for x in measurements[!,"simulation_id"]]
measurements[!,"obs_id"] = [x for x in measurements[!,"obs_id"]]

petab_model = PEtabModel(rs, simulation_conditions, observables, measurements, params, verbose=false)

petab_problem = PEtabODEProblem(petab_model, verbose=false)

sim_vals = petab_problem.compute_simulated_values([1.0 for i in eachindex(params)]) # returns a reasonable vector of floats
petab_problem.compute_residuals([1.0 for i in eachindex(params)]) # returns -Inf

petab_problem.compute_cost([1.0 for i in eachindex(params)]) # returns NaN

Attachment: example.csv

sebapersson (Owner) commented:

Thanks for reporting! You are correct that something is wrong with the residuals, but technically the code is not incorrect :) What causes the problem is your noise formula:

obs_ArAmine = PEtabObservable(ArAmine, sigma * ArAmine, transformation=:lin)

If either sigma or ArAmine is zero, the noise formula evaluates to 0, and since we divide by the noise formula when computing the likelihood:

residuals^2 / noise_formula

you are dividing by 0 and therefore get the NaN in compute_cost. How to solve this? Often a combined noise formula of the form:

noise_formula = sigma1 + sigma2*ArAmine

is used. With this you can never divide by zero, and there is only one more parameter to estimate (and typically the dataset holds enough information to estimate noise parameters, so you should not have any identifiability issues).
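
A minimal sketch of what that could look like with the code from the issue (the names sigma1 and sigma2 are just illustrative):

@parameters sigma1 sigma2
_sigma1 = PEtabParameter(:sigma1)
_sigma2 = PEtabParameter(:sigma2)
push!(params, _sigma1, _sigma2)   # instead of the single sigma above

# combined additive + proportional noise model; never exactly zero as long as sigma1 > 0
obs_ArAmine = PEtabObservable(ArAmine, sigma1 + sigma2 * ArAmine, transformation=:lin)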

A second thing to be aware of: the parameter vector x = [1.0 for i in eachindex(params)] is interpreted on the scale you defined for each parameter (log10 here), in particular

# sigma = x[end]
x[end] == 1.0 # interpreted as log10(sigma) = 1.0, i.e. sigma = 10, not 1

so the values you pass are not the linear-scale values you might expect. To generate a start guess within the bounds (and on the correct scale), you can do:

x0 = PEtab.generate_startguesses(petab_problem, 1)
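
and then, for example, evaluate the cost at that start guess; with the noise formula fix above it should now be finite:

petab_problem.compute_cost(x0)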

As for a custom objective function, what kind of function do you have in mind? Currently it is often not possible to use a custom objective, because some formulas for the gradient computations are hard-coded (though there might be workarounds). I am also currently updating PEtab.jl, and one of my aims is to make custom objectives possible.


yakdahl commented Aug 23, 2024

Thanks! That makes sense - hadn't quite realized how the variable transformation was working in the rest of the code. This fixes the problems in the minimal example, but there's still something else wrong in my actual problem. Let me see if I can figure out what that is and I'll get back to you on that.

In terms of loss functions: I've previously added time derivatives or similar terms (such as the slope of a line fit to the first n data points - experimental chemists use this as a measure of initial rates pretty frequently) into the loss function for kinetic models. I've found this somewhat useful for making sure that the solutions converge rapidly towards a minimum in which not only the general outline of the trace is reproduced, but also the slopes of the traces are well defined. In principle it should all be the same minimum, but the loss landscape seems to become a bit more convex, and doing this might eliminate some local minima that I've sometimes gotten stuck in.

sebapersson (Owner) commented:

> Thanks! That makes sense - hadn't quite realized how the variable transformation was working in the rest of the code. This fixes the problems in the minimal example, but there's still something else wrong in my actual problem. Let me see if I can figure out what that is and I'll get back to you on that.

Happy this solved the problem - hope you manage to fix the bigger problem as well (otherwise feel free to reach out!)

> In terms of loss functions: I've previously added time derivatives or similar terms (such as the slope of a line fit to the first n data points - experimental chemists use this as a measure of initial rates pretty frequently) into the loss function for kinetic models. I've found this somewhat useful for making sure that the solutions converge rapidly towards a minimum in which not only the general outline of the trace is reproduced, but also the slopes of the traces are well defined. In principle it should all be the same minimum, but the loss landscape seems to become a bit more convex, and doing this might eliminate some local minima that I've sometimes gotten stuck in.

You cannot add things like this directly, but there are workarounds. For example, let's say you want to compare the time derivative of the ODE solution against a linear slope; you could then split your cost function into two parts. In the first you compute the standard cost with PEtab.jl, and in the second you compute your custom cost (the comparison against the slope) with help of the utility functions. As an example:

petab_model = PEtabModel(...)
petab_problem = PEtabODEProblem(petab_model)
function cost_p1(x)
    return petab_problem.compute_cost(x)
end 
function cost_p2(x)
    # id is the simulation condition 
    odesol = get_odesol(x, petab_problem; condition_id = id)
    # do things to compute custom slope
end
cost_total(x) = cost_p1(x) + cost_p2(x)
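
As a sketch, the body of cost_p2 could for example look like this (the value of slope_data, the choice of condition "cond_1", and using the first two saved time points for the model slope are all illustrative assumptions):

function cost_p2(x; slope_data = 0.1)   # slope_data: experimentally estimated initial rate (hypothetical value)
    odesol = get_odesol(x, petab_problem; condition_id = "cond_1")
    arAmine = odesol[ArAmine]           # ArAmine is the species unpacked from rs above
    slope_model = (arAmine[2] - arAmine[1]) / (odesol.t[2] - odesol.t[1])
    return (slope_model - slope_data)^2 # penalize deviation from the measured initial rate
end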

Similarly, for cost_p2 you can build a gradient function using, for example, ForwardDiff and combine it with the PEtab.jl gradient function (differentiation is linear, so the two gradients simply add). There are some pitfalls when adding custom terms like these (e.g. it is easy to mess up the gradient computations), but feel free to reach out if you want to test something like this later and run into problems (and hopefully, once I am done updating PEtab.jl, I can figure out an easier way to add custom terms to the objective :)
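
As a rough sketch of that combination (compute_gradient! is assumed here to be available on the problem object alongside compute_cost; check the fields of your PEtabODEProblem):

using ForwardDiff

function grad_total!(g, x)
    petab_problem.compute_gradient!(g, x)    # PEtab.jl gradient of the standard cost (assumed field name)
    g .+= ForwardDiff.gradient(cost_p2, x)   # add the gradient of the custom slope term
    return g
end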
